Tag: vibe coding

  • The Dark Factory: Engineering Teams That Run With the Lights Off

    A few engineering organisations are already operating a model most companies haven’t begun to consider. While the typical software team debates whether to adopt AI coding assistants, companies like StrongDM are running fully automated development pipelines where agents handle implementation, testing, review, and deployment. Humans set direction and define constraints. The mechanical work happens without them.

    This isn’t speculative. It’s operational. And the gap between companies working this way and those that aren’t is widening fast.

    What “lights off” actually means

    The term comes from manufacturing — factories that run autonomously, with minimal human presence. In software, it describes engineering organisations where AI agents do the bulk of execution work while humans focus on architecture, constraints, and outcomes.

    StrongDM’s approach is instructive: their benchmark is that if you haven’t spent at least $1,000 on tokens per human engineer per day, your software factory has room for improvement. Agents work in parallel on isolated tasks. Code is written, tested, and reviewed without manual intervention. Tasks assigned Friday evening return results Monday morning.

    The ratio of agents to humans is high and growing. But this isn’t about replacing engineers — it’s about fundamentally changing what engineers do.

    The guardrails are the system

    Dark factories aren’t ungoverned. They’re heavily governed in a different way.

    Linters, formatters, comprehensive test suites, design pattern enforcement — these become pre-conditions rather than suggestions. Agents are configured to seek completion only when all guardrails pass. Code review shifts from line-by-line human inspection to AI review with human spot-checks on critical paths.
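
    The "completion only when all guardrails pass" rule can be sketched as a small loop. This is a minimal illustration, not any vendor's implementation: `Guardrail`, `run_until_green`, and `toy_agent` are hypothetical names, and the two checks stand in for real linters and test suites.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Guardrail:
    """A named check that returns True when the candidate passes."""
    name: str
    check: Callable[[str], bool]


def failing_guardrails(candidate: str, guardrails: List[Guardrail]) -> List[str]:
    """Return the names of every guardrail the candidate fails."""
    return [g.name for g in guardrails if not g.check(candidate)]


def run_until_green(
    propose: Callable[[List[str]], str],
    guardrails: List[Guardrail],
    max_attempts: int = 5,
) -> Tuple[str, int]:
    """Ask `propose` for candidates until every guardrail passes.

    `propose` is a stand-in for an agent: it receives the failures from the
    previous attempt and returns a new candidate.
    """
    failures: List[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = propose(failures)
        failures = failing_guardrails(candidate, guardrails)
        if not failures:
            return candidate, attempt
    raise RuntimeError(f"still failing after {max_attempts} attempts: {failures}")


# Toy guardrails standing in for a formatter and a linter (assumptions).
GUARDRAILS = [
    Guardrail("no_tabs", lambda src: "\t" not in src),
    Guardrail("ends_with_newline", lambda src: src.endswith("\n")),
]


def toy_agent(failures: List[str]) -> str:
    """Hypothetical agent that fixes only what the guardrails reported."""
    source = "def f():\n\treturn 1"
    if "no_tabs" in failures:
        source = source.replace("\t", "    ")
    if "ends_with_newline" in failures:
        source += "\n"
    return source


candidate, attempts = run_until_green(toy_agent, GUARDRAILS)
```

    The point of the design is that the agent never decides for itself that it is done. The checks decide, and their failures are the only feedback the agent gets.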

    The discipline moves from “write good code” to “design good systems for code to be written in.” That’s a different skill. It requires thinking about constraints, validation, and feedback loops rather than syntax and implementation details.

    Anthropic’s experiment building a C compiler with parallel Claude instances demonstrates this principle. Sixteen agents worked simultaneously on a shared codebase, coordinating through git locks and comprehensive test harnesses. The result: a 100,000-line compiler capable of building the Linux kernel, produced over nearly 2,000 sessions across two weeks for just under $20,000. The project worked because the test infrastructure was rigorous enough to guide autonomous agents toward correctness without human review of every change.
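
    The write-up above says only that agents coordinated through git locks. As a rough illustration of the underlying idea, the sketch below swaps in a local lock file created atomically, so two agents cannot claim the same task. The directory layout, function names, and agent IDs are all assumptions.

```python
import os
from pathlib import Path


def try_claim(lock_dir: Path, task_id: str, agent_id: str) -> bool:
    """Atomically claim a task; return False if another agent holds it."""
    lock_path = lock_dir / f"{task_id}.lock"
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(agent_id)
    return True


def release(lock_dir: Path, task_id: str, agent_id: str) -> bool:
    """Release a claim, but only if this agent owns it."""
    lock_path = lock_dir / f"{task_id}.lock"
    try:
        owner = lock_path.read_text()
    except FileNotFoundError:
        return False
    if owner != agent_id:
        return False
    lock_path.unlink()
    return True
```

    A second agent calling `try_claim` for the same `task_id` gets `False` and moves on to other work until the owner calls `release`.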

    Cursor’s experiments with scaling agents ran into a different problem. They tried flat coordination first — agents self-organising through a shared file, claiming tasks, updating status. It broke down. Agents held locks too long, became risk-averse, made small safe changes, and nobody took responsibility for hard problems. The fix was introducing hierarchy: planners that explore the codebase and create tasks, workers that grind on assigned work until it’s done. No single agent tries to do everything. The system ran for weeks, writing over a million lines of code. One project improved video rendering performance by 25x and shipped to production. Their takeaway: many of the gains came from removing complexity rather than adding it.
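
    The planner/worker split can be sketched in a few lines. This is a single-threaded toy under stated assumptions, not Cursor's system: `plan`, `run_workers`, and the round-robin assignment are hypothetical, and `do_work` stands in for an agent grinding on one task.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass
class Task:
    task_id: int
    description: str


def plan(goal: str, components: List[str]) -> Deque[Task]:
    """Planner role: break a goal into one task per component."""
    return deque(Task(i, f"{goal}: {name}") for i, name in enumerate(components))


def run_workers(
    tasks: Deque[Task],
    worker_names: List[str],
    do_work: Callable[[str, Task], str],
) -> Dict[str, List[str]]:
    """Worker role: hand out tasks round-robin until the queue is empty.

    Workers never create or re-plan tasks; they only finish what they are given.
    """
    results: Dict[str, List[str]] = {name: [] for name in worker_names}
    turn = 0
    while tasks:
        worker = worker_names[turn % len(worker_names)]
        task = tasks.popleft()
        results[worker].append(do_work(worker, task))
        turn += 1
    return results


queue = plan("speed up rendering", ["decoder", "cache", "encoder"])
results = run_workers(queue, ["w1", "w2"], lambda worker, task: task.description)
```

    Because only the planner writes to the queue, no worker has to hold a lock while deciding what to do next, which is the failure mode the flat design ran into.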

    Digital twins as the enabler

    The biggest blocker to agent autonomy has been the fear of breaking production. Digital twins remove that constraint.

    StrongDM built behavioural replicas of third-party services their software depends on — Okta, Jira, Slack, Google Docs, Google Drive, and Google Sheets. These twins replicate APIs, edge cases, and observable behaviours with sufficient fidelity that agents can test against realistic conditions at volume, without rate limits or production risk.
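
    To make the idea concrete, here is a minimal sketch of a behavioural twin for a hypothetical chat service. It is not StrongDM's code and does not mirror any real vendor API: the class, method names, and length limit are assumptions chosen to show how a twin reproduces both the happy path and the edge cases.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class ChannelNotFound(Exception):
    """Raised when posting to a channel the twin does not know."""


@dataclass
class ChatServiceTwin:
    """In-memory stand-in for a hypothetical chat service."""
    max_length: int = 100
    channels: Dict[str, List[str]] = field(default_factory=dict)

    def create_channel(self, name: str) -> None:
        """Create the channel if it does not exist yet."""
        self.channels.setdefault(name, [])

    def post_message(self, channel: str, text: str) -> int:
        """Store a message and return its index within the channel."""
        if channel not in self.channels:
            raise ChannelNotFound(channel)
        if len(text) > self.max_length:
            # Edge case the real service is assumed to enforce.
            raise ValueError("message too long")
        self.channels[channel].append(text)
        return len(self.channels[channel]) - 1


twin = ChatServiceTwin(max_length=20)
twin.create_channel("deploys")
first_id = twin.post_message("deploys", "build 41 passed")
```

    Everything lives in memory, so an agent can run this thousands of times with no network calls, rate limits, or shared state to clean up.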

    Simon Willison’s write-up of StrongDM’s approach highlights how this changed what was economically feasible: “Creating a high fidelity clone of a significant SaaS application was always possible, but never economically feasible. Generations of engineers may have wanted a full in-memory replica of their CRM to test against, but self-censored the proposal to build it.”

    What makes this rigorous rather than just better staging is how they handle validation. Test scenarios are stored outside the codebase — separate from where the coding agents can see them — functioning like holdout sets in machine learning. Agents can’t overfit to the tests because they don’t have access to them. The QA team is also agents, running thousands of scenarios per hour without hitting rate limits or accumulating API costs.
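
    The holdout idea can be sketched as an evaluator that keeps its scenarios private and reports only an aggregate score. The names here are hypothetical, and a Python attribute is not real isolation. In practice the scenarios would live in a separate repository or service that the coding agents cannot read.

```python
from typing import Callable, List, Tuple

Scenario = Tuple[str, str]  # (input, expected output)


class HoldoutEvaluator:
    """Holds scenarios privately and exposes only a pass rate."""

    def __init__(self, scenarios: List[Scenario]) -> None:
        if not scenarios:
            raise ValueError("at least one scenario is required")
        self._scenarios = list(scenarios)

    def pass_rate(self, implementation: Callable[[str], str]) -> float:
        """Run every scenario and return the fraction that passed."""
        passed = 0
        for given, expected in self._scenarios:
            try:
                if implementation(given) == expected:
                    passed += 1
            except Exception:
                # A crash counts as a failed scenario.
                pass
        return passed / len(self._scenarios)


evaluator = HoldoutEvaluator([("a", "A"), ("b", "B"), ("c", "C")])
honest_score = evaluator.pass_rate(str.upper)
overfit_score = evaluator.pass_rate(lambda text: "A")
```

    An implementation that memorised one visible example scores poorly here, which is the same signal a holdout set gives in machine learning.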

    The structural advantage of starting fresh

    Startups and SMBs have a material advantage here. No legacy organisational structure to dismantle. No 500-person engineering floor with stakeholders defending headcount. No 18-month procurement cycles.

    Capital efficiency becomes native. A three-person team with agents can produce output that previously required twenty people. The cost of compute is a fraction of equivalent human labour and falling rapidly.

    This creates an asymmetric advantage. If your competitor ships in days what takes you months, no amount of talent closes that gap. And the competitive pressure isn’t just on speed — it’s on the ability to attract talent that wants to work this way. Senior engineers who’ve experienced agent-driven development don’t want to go back to manual workflows.

    The gap between adopters and laggards

    Companies operating this way are shipping at a fundamentally different pace. The difference isn’t incremental — it’s orders of magnitude in output per person.

    Block’s recent announcement of a headcount reduction of more than 40% offers a data point. The company is shrinking its organization from over 10,000 people to just under 6,000. Jack Dorsey stated, “we’re not making this decision because we’re in trouble. our business is strong,” but noted that “the intelligence tools we’re creating and using, paired with smaller and flatter teams, are enabling a new way of working which fundamentally changes what it means to build and run a company.”

    Cursor’s data shows the same pattern. 35% of pull requests merged internally at Cursor are now created by agents operating autonomously in cloud VMs. The developers adopting this approach write almost no code themselves. They spend their time breaking down problems, reviewing artifacts, and giving feedback. They spin up multiple agents simultaneously instead of guiding one to completion.

    The laggards aren’t just slower. They’re increasingly unable to compete for talent, capital, or market position against organisations that have made this transition.

    You don’t need a corporate budget to start

    The dark factory model scales down. A single developer with a Claude Code subscription and well-structured GitHub workflows can run a lightweight version of the same approach.

    Start with one workflow. Pick a repetitive part of your development or business process, establish the guardrails, and let agents handle it. The key investment isn’t in compute — it’s in guardrails and context. Linters, test suites, good documentation, and clear specifications matter more than token budget.

    For SMBs and founders, this is the most asymmetric advantage available. You can operate at a scale that was previously only accessible with significant headcount. The learning curve is steep but short. Within 30 days of serious experimentation, most people develop the intuition for what agents can and can’t handle.

    Projects like OpenClaw — an open-source autonomous agent that executes tasks across messaging platforms and services — demonstrate that the tooling for this approach is increasingly accessible. The software runs locally, integrates with multiple LLM providers, and requires no enterprise licensing. The barrier isn’t access to technology. It’s willingness to change how work gets done.

    What this means beyond software

    Software is where this pattern is playing out first, but the model applies wherever knowledge work is structured and repeatable.

    Audit processes. Compliance checks. Report generation. Data analysis. Document review. These are all candidates for the same approach: clear specifications, comprehensive validation, and autonomous execution within defined guardrails.

    Most traditional industries haven’t started thinking about this. They’re still debating whether to use ChatGPT for email drafts. The firms that figure out how to apply dark factory principles to their domain will have an enormous advantage over those still operating with manual workflows.

    The lights are already off in some factories. The question isn’t whether this approach will spread. It’s how quickly your organisation recognises that the game has changed.

  • GitHub’s SpecKit: The Structure Vibe Coding Was Missing

    When I first started experimenting with “vibe coding,” building apps with AI agents felt like a superpower. The ability to spin up prototypes in hours was exhilarating. But as I soon discovered, the initial thrill came with an illusion. It was like managing a team of developers with an attrition rate measured in minutes—every new prompt felt like onboarding a fresh hire with no idea what the last one had been working on.

    The productivity boost was real, but the progress was fragile. The core problem was context—a classic case of the law of leaky abstractions applied to AI. Models would forget why they made certain choices or break something they had just built. To cope, I invented makeshift practices: keeping detailed dev context files, enforcing strict version control with frequent commits, and even asking the model to generate “reset prompts” to re-establish continuity. Messy, ad hoc, but necessary.

    That’s why GitHub’s announcement of SpecKit immediately caught my attention. SpecKit is an open-source toolkit for what they call “spec-driven development.” Instead of treating prompts and chat logs as disposable artifacts, it elevates specifications to first-class citizens of the development lifecycle.

    In practice, this means:

    • Specs as Durable Artifacts: Specifications live in Git alongside your code—permanent, version-controlled, and not just throwaway notes.
    • Capturing Intent: They document the why—the constraints, purpose, and expected behavior—so both humans and AI stay aligned.
    • Ensuring Continuity: They serve as the source of truth, keeping projects coherent across sessions and contributors.
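
    To show what a spec-as-artifact might look like, here is a minimal sketch of a spec record rendered to plain text that could be committed next to the code. It is not SpecKit's file format or API. The `Spec` class and its fields are hypothetical and simply mirror the three points above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Spec:
    """Hypothetical spec record: intent first, implementation nowhere."""
    title: str
    purpose: str
    constraints: List[str] = field(default_factory=list)
    expected_behavior: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the spec as plain text suitable for version control."""
        lines = [f"Spec: {self.title}", "", f"Purpose: {self.purpose}", "", "Constraints:"]
        lines += [f"- {item}" for item in self.constraints]
        lines += ["", "Expected behavior:"]
        lines += [f"- {item}" for item in self.expected_behavior]
        return "\n".join(lines) + "\n"


login_spec = Spec(
    title="Login",
    purpose="Let users sign in.",
    constraints=["Passwords are never logged"],
    expected_behavior=["Three failed attempts lock the account"],
)
rendered = login_spec.render()
```

    Because the rendered text is deterministic, a change in intent shows up as a readable diff in Git rather than being buried in a chat log.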

    For anyone who has tried scaling vibe coding beyond a demo, this feels like the missing bridge. It brings just enough structure to carry a proof-of-concept into maintainable software.

    And it fits into a larger story. Software engineering has always evolved in waves—structured programming, agile, test-driven development. Each wave added discipline to creativity, redefining roles to reflect new economic realities—a pattern we’re seeing again with agentic coding. Spec-driven development could be the next step:

    • Redefining the Developer’s Role: Less about writing boilerplate, more about designing robust specs that guide AI agents.
    • Harnessing Improvisation: Keeping the creative energy of vibe coding, but channeling it within a coherent framework.
    • Flexible Guardrails: Not rigid top-down rules, but guardrails that allow both creativity and scalability.

    Looking back, my dev context files and commit hygiene were crude precursors to this very idea. GitHub’s SpecKit makes clear that those instincts weren’t just survival hacks—they pointed to where the field is heading.

    The real question now isn’t whether AI can write code—we know it can. The question is: how do we design the frameworks that let humans and AI build together, reliably and at scale?

    Because as powerful as vibe coding feels, it’s only when we bring structure to the improvisation that the music really starts.


    👉 What do you think—will specs become the new lingua franca between humans and AI?

  • The Economic Reality and the Optimistic Future of Agentic Coding

    After a couple of months deep in the trenches of vibe coding with AI agents, I’ve learned this much: scaling from a fun, magical PoC to an enterprise-grade MVP is a completely different game.

    Why Scaling Remains Hard—And Costly

    Getting a prototype out the door? No problem.

    But taking it to something robust, secure, and maintainable? Here’s where today’s AI tools reveal their limits:

    • Maintenance becomes a slog. Once you start patching AI-generated code, hidden dependencies and context loss pile up. Keeping everything working as requirements change feels like chasing gremlins through a maze.
    • Context loss multiplies with scale. As your codebase grows, so do the risks of agents forgetting crucial design choices or breaking things when asked to “improve” features.

    And then there’s the other elephant in the room: costs.

    • The cost scaling isn’t marginal—not like the old days of cloud or Web 2.0. Powerful models chew through tokens and API credits at a rate that surprises even seasoned devs.
    • That $20/month Cursor plan with unlimited auto mode? For hobby projects, it’s a steal. For real business needs, a single query can rack up millions of tokens, and usage at that level would quickly outgrow even the $200 Ultra plan.
    • This is why we’re seeing big tech layoffs and restructuring: AI-driven productivity gains aren’t evenly distributed, and the cost curve for the biggest players keeps climbing.
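
    A quick back-of-envelope sketch shows how fast this adds up. It uses simple pay-per-token pricing, and every number in it is an illustrative assumption, not Cursor's or any provider's actual rate.

```python
def monthly_cost_usd(
    tokens_per_request: int,
    requests_per_day: int,
    usd_per_million_tokens: float,
    days: int = 30,
) -> float:
    """Estimate monthly spend under flat pay-per-token pricing."""
    total_tokens = tokens_per_request * requests_per_day * days
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Assumed workload: 2M tokens per request, 20 requests a day, $3 per 1M tokens.
estimate = monthly_cost_usd(2_000_000, 20, 3.0)
plans_needed = estimate / 200  # compared against a $200/month plan
```

    Under these assumptions the estimate comes to $3,600 a month, or 18 times a $200 plan. The useful habit is less the exact figure than running this arithmetic before committing to a workflow.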

    What the Data Tells Us

    The research paper “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity” reached a surprising conclusion:

    Not only did experienced developers see no time savings on real-world coding tasks with AI, they took longer overall, because they spent more time reviewing, correcting, and adapting agent output.

    The lesson:

    AI shifts where the work happens—it doesn’t always reduce it. For now, scaling with agents is only as good as your processes for context, review, and cost control.

    Why I Remain Optimistic

    Despite the challenges, I’m genuinely excited for what’s coming next.

    • The platforms and models are evolving at warp speed. Many of the headaches I face today—context loss, doc gaps, cost blind spots—will get solved just as software engineering best practices eventually became codified in our tools and frameworks.
    • Agentic coding will find its place. It might not fully automate developer roles, but it will reshape teams: more focus on high-leverage decisions, design, and creative problem-solving, less on boilerplate and “busy work.”

    And if you care about the craft, the opportunity is real:

    • Devs who learn to manage, review, and direct agents will be in demand.
    • Organizations that figure out how to blend agentic workflows with human expertise and robust process will win big.

    Open Questions for the Future

    • Will AI agentic coding mean smaller, nimbler teams—or simply more ambitious projects for the same headcount?
    • How will the developer role evolve when so much code is “synthesized,” not hand-crafted?
    • What new best practices, cost controls, and team rituals will we invent as agentic coding matures?

    Final thought:

    The future won’t be a return to “pure code” or a total AI handoff. It’ll be a blend—one that rewards curiosity, resilience, and the willingness to keep learning.

    Where do you see your work—and your team—in this new landscape?

  • The Law of Leaky Abstractions & the Unexpected Slowdown

    If the first rush of agentic/vibe coding feels like having a team of superhuman developers, the second phase is a reality check—one that every software builder and AI enthusiast needs to understand.

    Why “Vibe Coding” Alone Can’t Scale

    The further I got into building real-world prototypes with AI agents, the clearer it became: Joel Spolsky’s law of leaky abstractions is alive and well.

    You can’t just vibe code your way to a robust app—because underneath the magic, the cracks start to show fast. AI-generated coding is an abstraction, and like all abstractions, it leaks. When it leaks, you need to know what’s really happening underneath.

    My Experience: Hallucinations, Context Loss, and Broken Promises

    I lost count of the times an agent “forgot” what I was trying to do, changed underlying logic mid-stream, or hallucinated code that simply didn’t run. Sometimes it wrote beautiful test suites and then… broke the underlying logic with a “fix” I never asked for. It was like having a junior developer who could code at blazing speed—but with almost no institutional memory or sense for what mattered.

    The “context elephant” is real. As sessions get longer, agents lose track of goals and start generating output that’s more confusing than helpful. That’s why my own best practices quickly became non-negotiable:

    • Frequent commits and clear commit messages
    • Dev context files to anchor each session
    • Separate dev/QA/prod environments to avoid catastrophic rollbacks (especially with database changes)
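
    Here is roughly how I think about anchoring a session, as a minimal sketch. The file name, function name, and preamble layout are my own assumptions. The commit messages would come from something like `git log --oneline`, passed in as plain strings.

```python
from pathlib import Path
from typing import List


def build_session_preamble(
    context_file: Path,
    recent_commits: List[str],
    max_commits: int = 5,
) -> str:
    """Combine the dev context file and recent commits into one opening prompt."""
    context = context_file.read_text(encoding="utf-8").strip()
    lines = ["Project context:", context, "", "Most recent commits:"]
    # Keep only the newest few commits so the preamble stays short.
    lines += [f"- {message}" for message in recent_commits[:max_commits]]
    return "\n".join(lines)
```

    Pasting the result at the start of every new session gives the agent the goals and the latest state of the repo before it writes anything.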

    What the Research Shows: AI Can Actually Slow Down Experienced Devs

    Here’s the kicker—my frustration isn’t unique.

    A recent research paper, “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity,” found that experienced developers actually worked slower with AI on real-world tasks. AI tools didn’t just fail to deliver the expected productivity boost; they created friction.

    Why?

    • Developers accepted fewer than 44% of AI generations
    • Developers lost time reviewing, debugging, and correcting “bad” generations
    • Context loss and reliability issues forced more manual intervention, not less

    This matches my experience exactly. For all the hype, these tools introduce new bottlenecks—especially if you’re expecting them to “just work” out of the box.

    Lessons from the Frontlines (and from Agent Week)

    I’m not alone. In the article “What I Learned Trying Seven Coding Agents,” Timothy B. Lee finds similar headaches:

    • Agents get stuck
    • Complex tasks routinely stump even the best models
    • Human-in-the-loop review isn’t going anywhere

    But the tools are still useful—they’re not a dead end. You just need to treat them like a constantly rotating team of interns, not fully autonomous engineers.

    Best Practices: How to Keep AI Agents Under Control

    So how do you avoid the worst pitfalls?

    The answer is surprisingly old-school:

    • Human supervision for every critical change
    • Sandboxing and least privilege for agent actions
    • Version control and regular context refreshers
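
    The least-privilege point can be sketched as a command allowlist. This is a minimal illustration with hypothetical names and an assumed set of permitted programs. It covers only the allowlist check, and real sandboxing still needs OS-level isolation on top.

```python
import shlex
from typing import FrozenSet

# Assumed allowlist: the only programs the agent may invoke.
ALLOWED_PROGRAMS: FrozenSet[str] = frozenset({"git", "pytest", "ls"})

# Characters that could chain or redirect commands if a shell saw them.
SHELL_METACHARACTERS = frozenset(";|&<>`$\n")


def is_allowed(command: str, allowed: FrozenSet[str] = ALLOWED_PROGRAMS) -> bool:
    """Return True only for a single, plain invocation of an allowed program."""
    if any(char in SHELL_METACHARACTERS for char in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: refuse rather than guess.
        return False
    if not tokens:
        return False
    return tokens[0] in allowed
```

    The check is deliberately conservative: anything that looks like chaining or redirection is refused outright, and an accepted command is meant to be run as an argument list without a shell.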

    Again, Lee’s article “Keeping AI agents under control doesn’t seem very hard” nails it:

    Classic engineering controls—proven in decades of team-based software—work just as well for AI. “Doomer” fears are overblown, but so is the hype about autonomy.

    Conclusion: The Hidden Cost of Abstraction

    Vibe coding with agents is like riding a rocket with no seatbelt—exhilarating, but you’ll need to learn to steer, brake, and fix things mid-flight.

    If you ignore the leaky abstractions, you’ll pay the price in lost time, broken prototypes, and hidden tech debt.

    But with the right mix of skepticism and software discipline, you can harness the magic and avoid the mess.

    In my next post, I’ll zoom out to the economics—where cost, scaling, and the future of developer work come into play.

    To be continued…

  • The Thrill and the Illusion of AI Agentic Coding

    A few months ago, I stumbled into what felt like a superpower: building fully functional enterprise prototypes using nothing but vibe coding and AI agent tools like Cursor and Claude. The pace was intoxicating—I could spin up a PoC in days instead of weeks, crank out documentation and test suites, and automate all the boring stuff I used to dread.

    But here’s the secret I discovered: working with these AI agents isn’t like managing a team of brilliant, reliable developers. It’s more like leading a software team with a sky-high attrition rate and non-existent knowledge transfer practices. Imagine onboarding a fresh dev every couple of hours, only to have them forget what happened yesterday and misinterpret your requirements—over and over again. That’s vibe coding with agents.

    The Early Magic

    When it works, it really works. I’ve built multiple PoCs this way—each one a small experiment, delivered at a speed I never thought possible. The agents are fantastic for “greenfield” tasks: setting up skeleton apps, generating sample datasets, and creating exhaustive test suites with a few prompts. They can even whip up pages of API docs and help document internal workflows with impressive speed.

    It’s not just me. Thomas Ptacek’s piece “My AI Skeptic Friends Are All Nuts” hits the nail on the head: AI is raising the floor for software development. The boring, repetitive coding work (the scaffolding, the CRUD operations, the endless boilerplate) gets handled in minutes, letting me focus on the interesting edge cases or higher-level product thinking. As he puts it, “AI is a game-changer for the drudge work,” and I’ve found this to be 100% true.

    The Fragility Behind the Hype

    But here’s where the illusion comes in. Even with this boost, the experience is a long way from plug-and-play engineering. These AI coding agents don’t retain context well; they can hallucinate requirements, generate code that fails silently, or simply ignore crucial business logic because the conversation moved too fast. The “high-attrition, low-knowledge-transfer team” analogy isn’t just a joke—it’s my daily reality. I’m often forced to stop and rebuild context from scratch, re-explain core concepts, and review every change with a skeptical eye.

    Version control quickly became my lifeline. Frequent commits, detailed commit messages, and an obsessive approach to saving state are my insurance policy against the chaos that sometimes erupts. The magic is real, but it’s brittle: a PoC can go from “looks good” to “completely broken” in a couple of prompts if you’re not careful.

    Superpowers—With Limits

    If you’re a founder, product manager, or even an experienced developer, these tools can absolutely supercharge your output. But don’t believe the hype about “no-code” or “auto-code” replacing foundational knowledge. If you don’t understand software basics—version control, debugging, the structure of a modern web app—you’ll quickly hit walls that feel like magic turning to madness.

    Still, I’m optimistic. The productivity gains are real, and the thrill of seeing a new prototype come to life in a weekend is hard to beat. But the more I use these tools, the more I appreciate the fundamentals that have always mattered in software—and why, in the next post, I’ll talk about the unavoidable reality check that comes when abstractions leak and AI doesn’t quite deliver on its promise.

    To be continued…