Category: Analysis

  • The Autonomous SDLC: What’s Solved, What’s Not, and Why the Gaps Are Closing Fast

    We’re further along than most people realize. The software development lifecycle is being automated piece by piece, and the trajectory is becoming harder to ignore—not through some magical breakthrough, but through the steady elimination of bottlenecks that seemed permanent six months ago.

    This is a practitioner’s status report discussing what works in production today, what remains genuinely unsolved, and why the remaining gaps matter less than conventional wisdom suggests.

    Code Generation: Already Production-Grade

    The middle portion of the SDLC—turning specifications into working code—has crossed a threshold. Cursor CEO Michael Truell describes three eras: tab autocomplete, synchronous agents responding to prompts, and now agents tackling larger tasks independently with less human direction. At Cursor, 35% of merged PRs now come from agents running autonomously in cloud VMs. The agent PRs are “an order of magnitude more ambitious than human PRs” while maintaining higher merge rates.

    What matters isn’t the percentage—it’s that these agent-generated PRs pass the same review standards as human code. Max Woolf’s detailed experiments are instructive. Starting as a vocal skeptic who wrote about rarely using LLMs, he ended up building Rust libraries that outperformed battle-tested numpy-backed implementations by 2-30x. Not prototypes—production code passing comprehensive test suites and benchmarks.

    His conclusion after months of testing:

    I have been trying to break this damn model by giving it complex tasks that would take me months to do by myself despite my coding pedigree but Opus and Codex keep doing them correctly.

    The quality ceiling keeps rising with each model generation. This isn’t “good enough for prototypes”—it’s production-grade code that ships.

    Spec-Driven Development

Approaches to the initiation problem have largely converged. Most tools now support a planning mode: the agent reads a spec, creates an implementation plan, and follows it through. Woolf’s experience matters here:

    AGENTS.md is probably the main differentiator between those getting good and bad results with agents.

    These persistent instruction files function as system prompts that shape agent behaviour across sessions.

    This is just spec-driven development—the same methodology good engineering teams already use. The pattern works: write a detailed spec (GitHub issue, markdown file), point the agent at it, let it execute. The difference is that agents can now be the executor, and the pattern works across tools (Cursor, Claude Code, Codex) because it aligns with how reliable software gets built regardless of who’s typing.
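A minimal AGENTS.md along these lines illustrates the idea. The contents below are hypothetical, not taken from Woolf’s setup or any specific project:

```markdown
# AGENTS.md — persistent instructions picked up by coding agents

## Build & test
- Run the full test suite before proposing any change; all tests must pass.

## Conventions
- Follow the existing module layout; do not add new top-level packages.
- Prefer small, reviewable commits with descriptive messages.

## Boundaries
- Never touch files under `migrations/` or anything containing secrets.
- Open a draft PR instead of pushing directly to the main branch.
```

Tools that support the convention typically read the file from the repository root at the start of each session, which is what lets it shape agent behaviour the way a system prompt does.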

    The Feedback Loop: The Primary Gap

    Basic unit tests and regression tests work well—agents can write and run them as part of their workflow. Complex feature tests, integration tests, and UAT remain the primary gap. UI/UX testing is particularly challenging since agents can’t easily evaluate visual output.

The current workaround: human-in-the-loop for complex test evaluation, with agents handling mechanical testing. That said, coding agents can still fix bugs when given screenshots and descriptions.

    This is an active focus area. The gap is narrowing from both sides: agents getting better at generating comprehensive tests, and tooling improving for automated visual and integration testing. Satisfactory solutions within 2026 aren’t a stretch—they’re the natural next step given where the infrastructure is heading.

    Guardrails: Actively Being Solved

    Managing task boundaries and blast radius is critical for autonomous operation. Best practices are emerging around sandboxing—isolated agent execution environments, limited file system access, branch-based workflows.

    The Anthropic C compiler experiment demonstrated the pattern at scale: 16 agents working on a shared codebase over 2,000 sessions, coordinating through git locks and comprehensive test harnesses. The test infrastructure was rigorous enough to guide autonomous agents toward correctness without human review, producing a 100,000-line compiler that can build Linux.

    StrongDM took this further with their dark factory approach. They built digital twins of production dependencies—behavioral clones of Okta, Jira, Slack—using agents to replicate APIs and edge cases. This enabled validation at volumes far exceeding production limits without risk. Their rule: “Code must not be reviewed by humans.” The safety comes from comprehensive scenario testing against holdout test cases the agents never see.

    The agent infrastructure layer is building out fast. We’re seeing microVMs that boot fast enough to feel container-like, with snapshot/restore making “reset” almost free. Agent-specific sandboxed compute, identity, and API access are emerging as distinct product categories.

    The guardrails problem is increasingly an infrastructure problem, not a model problem. This converges toward a standard pattern: spec + guardrails + sandbox + automated validation = safe autonomous execution.
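That pattern can be sketched as a control loop. Everything here is a stand-in: `create_sandbox`, `run_agent`, and `run_tests` would be real infrastructure in practice (an isolated VM or branch, an agent invocation, a validation harness), so treat this as a shape, not an implementation:

```python
# Sketch of the "spec + guardrails + sandbox + automated validation" loop.
# All three helpers are illustrative stubs, not a real agent API.

def create_sandbox():
    # Stand-in for an isolated execution environment: its own branch,
    # limited file-system access, no production credentials.
    return {"branch": "agent/task", "files": {}}

def run_agent(spec, sandbox):
    # Stand-in for an agent turning the spec into a patch inside the sandbox.
    sandbox["files"]["patch"] = f"implements: {spec}"
    return sandbox["files"]["patch"]

def run_tests(sandbox):
    # Stand-in for the automated validation gate: tests, lints, benchmarks.
    return "implements:" in sandbox["files"].get("patch", "")

def autonomous_run(spec, max_attempts=3):
    sandbox = create_sandbox()
    for _ in range(max_attempts):
        patch = run_agent(spec, sandbox)
        if run_tests(sandbox):   # nothing merges without passing validation
            return patch
    return None                  # validation never passed: escalate to a human
```

The property that matters is that the agent only ever writes inside the sandbox, and nothing leaves it without passing the automated validation gate.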

    The Self-Improvement Dynamic

    Something subtle is happening. Codex optimizes code, Opus optimizes it further, Opus validates against known-good implementations. Cumulative 6x speed improvements on already-optimized code. Then you have Opus 4.6 iteratively improving its own code through benchmark-driven passes.

Folks have shown agents tuning LLMs on Hugging Face—the tooling layer being built by the tools themselves. This isn’t theoretical AGI. It’s narrow but powerful self-improvement within the coding domain. The practical implication: the rate of improvement accelerates as agents get better at improving agents. For the coding stack specifically, each generation of tools makes the next generation arrive faster.

    What This Means for Planning

    Here’s the timeline as I see it:

    2025: Code generation reliable. Spec-driven development emerging. Testing and guardrails manual.

    2026: Testing automation reaches satisfactory level. Guardrails standardize. The loop becomes semi-autonomous.

    2027+: Fully autonomous for standard applications. Human involvement shifts entirely to direction and edge cases.

    The companies planning as if these gaps will persist are making the same mistake as those who planned around slow internet in 2005. AI tools amplify existing expertise—all the practices that distinguished senior engineers (comprehensive testing, good documentation, strong version control habits, effective code review) matter even more now. But the bar for what “good enough” looks like is rising in parallel.

    Antirez captures the shift plainly:

    Writing code is no longer needed for the most part. It is now a lot more interesting to understand what to do, and how to do it.

    The mental work hasn’t disappeared. It’s concentrated in the parts machines can’t yet replace: architecture decisions, user needs, system design trade-offs.

    The gaps are real today. But they’re the wrong thing to optimize around. Optimize around what becomes possible when they close—because that’s happening faster than the pace of traditional software planning cycles.

  • From Productivity to Progress: What the New MIT-Stanford AI Study Really Tells Us About the Future of Work


    A new study from MIT and Stanford just rewrote the AI-in-the-workplace narrative.

Published in Fortune this week, the research shows that generative AI tools — specifically chatbots — are not only boosting productivity by up to 14% but also raising earnings without reducing work hours.

    “Rather than displacing workers, AI adoption led to higher earnings, especially for lower-performing employees.”

    Let that sink in.


    🧠 AI as a Floor-Raiser, Not a Ceiling-Breaker

    The most surprising finding?
    AI’s greatest impact was seen not among the top performers, but among lower-skilled or newer workers.

    In customer service teams, the AI tools essentially became real-time coaches — suggesting responses, guiding tone, and summarizing queries. The result: a productivity uplift and quality improvement that evened out performance levels across the team.

    This is a quiet revolution in workforce design.

    In many traditional orgs, productivity initiatives often widen the gap between high and average performers. But with AI augmentation, we’re seeing the inverse — a democratization of capability.


    💼 What This Means for Enterprise Leaders

    This research confirms a pattern I’ve observed firsthand in consulting:
    The impact of AI is not just technical, it’s organizational.

    To translate AI gains into business value, leaders need to:

    ✅ 1. Shift from Efficiency to Enablement

    Don’t chase cost-cutting alone. Use AI to empower more team members to operate at higher skill levels.

    ✅ 2. Invest in Workflow Design

    Tool adoption isn’t enough. Embed AI into daily rituals — response writing, research, meeting prep — where the marginal gains accumulate.

    ✅ 3. Reframe KPIs

    Move beyond “time saved” metrics. Start tracking value added — better resolutions, improved CSAT, faster ramp-up for new hires.


    🔄 A Playbook for Augmented Teams

    From piloting GPT agents to reimagining onboarding flows, I’ve worked with startups and enterprise teams navigating this shift. The ones who succeed typically follow this arc:

    1. Pilot AI in a high-volume, low-risk function
    2. Co-create use cases with users (not for them)
    3. Build layered systems: AI support + human escalation
    4. Train managers to interpret, not just supervise, AI-led work
    5. Feed learnings back into process improvement loops

    🔚 Not AI vs Jobs. AI Plus Better Jobs.

    The real story here isn’t about productivity stats. It’s about potential unlocked.

    AI is no longer a futuristic experiment. It’s a present-day differentiator — especially for teams willing to rethink how work gets done.

    As leaders, we now face a simple choice:

    Will we augment the talent we have, or continue to chase the talent we can’t find?

    Your answer will shape the next 3 years of your business.


    🔗 Read the original article here:

    Fortune: AI chatbots boost earnings and hours, not job loss


    Want to go deeper? I’m working on a new AI augmentation playbook — DM me or sign up for updates.

    #AI #FutureOfWork #EnterpriseStrategy #GTM #DigitalTransformation #Chatbots #Productivity #ConsultingInsights

  • From Data to Decision: AI Assistance in the Agile Workplace


    I recently had the privilege of presenting online at the Business Analytics and Decision Sciences Conclave to a group of enthusiastic MBA students. The session, titled “From Data to Decision: AI Assistance in the Agile Workplace,” focused on how AI and analytics are revolutionizing the workplace and how students can prepare for these changes.

    Key Takeaways from the Session

    Data Literacy

    One of the core ideas we discussed was the importance of data literacy. In today’s data-rich world, it’s not enough to simply collect data; we must understand and interpret it effectively. I used the analogy of looking for lost keys under a streetlight to illustrate how we often focus on easily accessible data, even though the true insights might lie in harder-to-reach places. This highlights the need to measure what truly matters, rather than what is easy to quantify.

    Deep Analytics

    We also explored the concept of deep analytics. It’s crucial to go beyond surface-level data and understand the context and intricacies behind the numbers. For example, understanding the difference between correlation and causation can prevent misleading conclusions. I emphasized the importance of domain expertise in providing context to data and avoiding biases in AI-based decision making.

    Practical Examples

    To make these ideas more tangible, I shared practical examples from the pharmaceutical industry:

    • Follow-up Email Campaigns: We discussed why data literacy is important for new channel activations and how AI can help launch and optimize follow-up email campaigns by incentivizing the right behavior, monitoring customer satisfaction, and adjusting campaign content based on performance. The Rule of 80 – 80 – 40 was highlighted as a guideline to ensure campaign effectiveness.
    • Next Best Action (NBA) Solutions: I showcased how AI can determine the next best actions for the field force by analyzing customer preferences, transaction history, and available content. This approach helps in personalizing interactions and driving better outcomes.

    Agility

    The session also covered the importance of agility in today’s fast-paced business environment. AI plays a crucial role in speeding up decision-making processes by providing actionable insights, enabling rapid hypothesis testing, and offering predictive analytics. Embracing agility allows businesses to adapt quickly to market changes and stay competitive.

Preparing for the Future

To conclude the session, I offered practical tips for students on how to prepare for the future workplace. I also recommended three impactful books for those interested in diving deeper into these topics:

    • “How to Lie with Statistics” by Darrell Huff
    • “Weapons of Math Destruction” by Cathy O’Neil
    • “Data Science for Business” by Foster Provost and Tom Fawcett

    The session was an enriching experience, and I’m excited to continue the conversation on how we can better leverage AI and analytics to drive operational resilience and innovation.

    Feel free to check out the attached presentation slides for a more detailed look at the session.

  • Reality catches up with Uber – Mumbai taxi fares re-revisited

    Uber Mumbai has just announced a big hike on the Black and SUV services, pretty much bringing them on par with the Ola Prime SUV service. So here’s the latest fare chart (older versions here – v1, v2):

    Approx. taxi fares in Mumbai as on 13 July 2015

    Note on the calculation methodology:

    • Travel time calculated assuming 3 min per km (Uber, Ola, TFS)
    • Waiting time taken as 1/2 min per km (kaali peeli & Meru/TabCab)
  • Revisiting the taxi fares in Mumbai

Over the last few days, Ola announced a series of price cuts to their Mini and Sedan services to better compete with Uber and also added the Taxi for Sure hatchbacks to their app. This calls for an update to the fare chart that I had made for the various taxi services in Mumbai, ranging from the traditional kaali peelis and Meru/Tab Cab to the new entrants like Ola and Uber. So here it is:

    Taxi fares in Mumbai

The equation hasn’t changed drastically, but the Ola Mini service is now pretty much comparable to UberX, while UberGo remains unchallenged. Ola Sedan also becomes significantly cheaper than the Merus and Tab Cabs, while the newly added Taxi for Sure service (on the Ola app) slots in between these two. TFS seems ripe for a round of price revisions as its cars are effectively equivalent to the Minis, i.e., hatchbacks.

    The recommendations are quite simple:

    • For short distances (<10 km), kaali peelis are the most economical
    • Beyond 10 km, UberGo reigns supreme. In fact, unless you are doing very short distances (sub 5 km), they are the best option. The rates are definitely not sustainable for Uber, which possibly explains the relatively limited availability. However, for taxi commuters like me, they’re the perfect kaali peeli replacement.
    • Since you are unlikely to get an UberGo, your next best bet is to settle for an UberX or an Ola Mini. For that matter you could go with any of the other options barring the SUVs or Uber Black for distances around 10-15 km without too much fare difference.
    • For distances longer than 15 km, the newer lot comprising UberX and Ola Mini & Sedan pulls away from the Rs 20/km crowd of Meru, Mega, Tab Cab etc.

    Either way, this is a good time for the commuter though the rates are unlikely to be sustainable in the long run. So, enjoy for the time being and hope that the day of pleading with taxi drivers and autos never returns.

  • How Uber’s shaken up the pricing structure in India

    I’ve been using Uber quite frequently over the last couple of months and today’s Mumbai taxi strike to protest such services ironically forced me to opt for Uber at a 1.8x surge price. While I’ve had my share of ups & downs with Uber, the flexible pricing model has been one aspect that I’ve been impressed with compared to the competition like Ola.

Uber managed to create quite a buzz offering single-digit per-km rates, almost half what others were offering at the time, but the pricing model, which included a per-minute charge on the trip, ensured that the overall fare was not unsustainably low. This has also allowed them to go after the local taxi & auto services in the different cities, and they also end up being cheaper for medium to long distances.

    The Uber pricing in India is typically a low per km rate coupled with another per trip minute rate on top of a fixed base fare, with the overall fare subject to a minimum amount and of course the surge factor. Putting it simply:

Fare = Surge factor x (Base fare + Distance x Rate per km + Trip time in minutes x Rate per minute), subject to the minimum fare
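As a quick sketch, the model can be expressed in a few lines of Python. All the rates below are illustrative placeholders, not actual Uber rates:

```python
def uber_fare(distance_km, trip_minutes, surge=1.0,
              base_fare=40.0, rate_per_km=9.0, rate_per_min=1.0,
              minimum_fare=60.0):
    """Uber-style fare: base + distance + time, multiplied by the surge
    factor, subject to a minimum. All rates are illustrative placeholders."""
    fare = surge * (base_fare
                    + distance_km * rate_per_km
                    + trip_minutes * rate_per_min)
    return max(fare, minimum_fare)

# Example: a 10 km, 30 minute trip with no surge works out to
# 40 + 10*9 + 30*1 = 160, while a very short hop hits the minimum fare.
```

The per-minute component is what keeps the low per-km rate from making long, slow trips unsustainably cheap.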

Ola, which had started off in India with a conventional pricing model of a per-km rate and a waiting-time rate, has pretty much overhauled their pricing to mimic the Uber model. They have in fact abandoned their initial method of applying a fixed peak-time price during two slots on weekdays in favour of a surge factor. The other taxi services like Meru, Tab Cab, Easy Cab etc. have thus far stuck to the traditional model, though they’re trying to stay relevant through special offers.

    I also did a simplistic analysis of how the different services compare in terms of the trip fare in a city like Mumbai (Google Sheet here). I’ve assumed a trip time of 3 minutes per km and waiting time of 1 minute for every 4 km, so the results are going to be quite different in heavy traffic.

    Approx fare comparison (corrected)

    For short distances, the local kaali peelis are of course the cheapest, but for distances above 10 km, UberGO ends up being a better deal. The next cheapest is the Ola mini which starts getting pretty competitive with kaali peelis after the 20 km mark. This is of course disregarding the non-AC nature of the kaali peelis. [Update] Ola Mini and UberX are pretty competitive till the 10 km range, but separate pretty quickly after that as the near 30% higher charge per km for Ola starts making a mark.

The older generation of Meru, Tab Cab etc. manage to remain competitive with the newer lot, matching the next best UberX and Ola Sedan up to the 10 km mark, but the higher cost per km quickly multiplies beyond that point. And then we have UberBLACK and UberSUV, which have the same rates but different capacities. They can actually offer a better deal than Meru and the likes for long distances over 25 km. Of course, if you have 5-6 people travelling, then these 6-seaters are the way to go. Lastly, we have Ola’s version of the SUV with its Prime service, which is the costliest of the lot. Again, if you are in a group of 5-6 people, this can actually be cheaper than taking two 4-seater vehicles, unless of course you manage to get a couple of UberGOs.

    I haven’t considered the surge pricing in the above comparison, and that is a scenario where the older lot turns out to be cheaper. However, such scenarios are rare as Merus and the likes can be pretty hard to find for immediate travel. The interesting thing to see now will be the role that regulators play in toying around with these pricing models.

    Update (16 Jun 2015): Found a major miscalculation in the trip time. I have corrected the graph and updated the text accordingly.