The Dark Factory: Engineering Teams That Run With the Lights Off

A few engineering organisations are already operating a model most companies haven’t begun to consider. While the typical software team debates whether to adopt AI coding assistants, companies like StrongDM are running fully automated development pipelines where agents handle implementation, testing, review, and deployment. Humans set direction and define constraints. The mechanical work happens without them.

This isn’t speculative. It’s operational. And the gap between companies working this way and those that aren’t is widening fast.

What “lights off” actually means

The term comes from manufacturing — factories that run autonomously, with minimal human presence. In software, it describes engineering organisations where AI agents do the bulk of execution work while humans focus on architecture, constraints, and outcomes.

StrongDM’s approach is instructive: their benchmark is that if you haven’t spent at least $1,000 on tokens per human engineer per day, your software factory has room for improvement. Agents work in parallel on isolated tasks. Code is written, tested, and reviewed without manual intervention. Tasks assigned Friday evening return results Monday morning.

The ratio of agents to humans is high and growing. But this isn’t about replacing engineers — it’s about fundamentally changing what engineers do.

The guardrails are the system

Dark factories aren’t ungoverned. They’re heavily governed in a different way.

Linters, formatters, comprehensive test suites, design pattern enforcement — these become pre-conditions rather than suggestions. Agents are configured to seek completion only when all guardrails pass. Code review shifts from line-by-line human inspection to AI review with human spot-checks on critical paths.

The discipline moves from “write good code” to “design good systems for code to be written in.” That’s a different skill. It requires thinking about constraints, validation, and feedback loops rather than syntax and implementation details.

Anthropic’s experiment building a C compiler with parallel Claude instances demonstrates this principle. Sixteen agents worked simultaneously on a shared codebase, coordinating through git locks and comprehensive test harnesses. The result: a 100,000-line compiler capable of building the Linux kernel, produced over nearly 2,000 sessions across two weeks for just under $20,000. The project worked because the test infrastructure was rigorous enough to guide autonomous agents toward correctness without human review of every change.

Cursor’s experiments with scaling agents ran into a different problem. They tried flat coordination first — agents self-organising through a shared file, claiming tasks, updating status. It broke down. Agents held locks too long, became risk-averse, made small safe changes, and nobody took responsibility for hard problems. The fix was introducing hierarchy: planners that explore the codebase and create tasks, workers that grind on assigned work until it’s done. No single agent tries to do everything. The system ran for weeks, writing over a million lines of code. One project improved video rendering performance by 25x and shipped to production. Their takeaway: many of the gains came from removing complexity rather than adding it.

Digital twins as the enabler

The biggest blocker to agent autonomy has been the fear of breaking production. Digital twins remove that constraint.

StrongDM built behavioural replicas of third-party services their software depends on — Okta, Jira, Slack, Google Docs, Google Drive, and Google Sheets. These twins replicate APIs, edge cases, and observable behaviours with sufficient fidelity that agents can test against realistic conditions at volume, without rate limits or production risk.

Simon Willison’s write-up of StrongDM’s approach highlights how this changed what was economically feasible: “Creating a high fidelity clone of a significant SaaS application was always possible, but never economically feasible. Generations of engineers may have wanted a full in-memory replica of their CRM to test against, but self-censored the proposal to build it.”

What makes this rigorous rather than just better staging is how they handle validation. Test scenarios are stored outside the codebase — separate from where the coding agents can see them — functioning like holdout sets in machine learning. Agents can’t overfit to the tests because they don’t have access to them. The QA team is also agents, running thousands of scenarios per hour without hitting rate limits or accumulating API costs.

The structural advantage of starting fresh

Startups and SMBs have a material advantage here. No legacy organisational structure to dismantle. No 500-person engineering floor with stakeholders defending headcount. No 18-month procurement cycles.

Capital efficiency becomes native. A three-person team with agents can produce output that previously required twenty people. The cost of compute is a fraction of equivalent human labour and falling rapidly.

This creates an asymmetric advantage. If your competitor ships in days what takes you months, no amount of talent closes that gap. And the competitive pressure isn’t just on speed — it’s on the ability to attract talent that wants to work this way. Senior engineers who’ve experienced agent-driven development don’t want to go back to manual workflows.

The gap between adopters and laggards

Companies operating this way are shipping at a fundamentally different pace. The difference isn’t incremental — it’s orders of magnitude in output per person.

Block’s recent announcement of a near-50% reduction in headcount offers a data point. The company is reducing its organization from over 10,000 people to just under 6,000. Jack Dorsey stated “we’re not making this decision because we’re in trouble. our business is strong” but noted that “the intelligence tools we’re creating and using, paired with smaller and flatter teams, are enabling a new way of working which fundamentally changes what it means to build and run a company.”

Cursor’s data shows the same pattern. 35% of pull requests merged internally at Cursor are now created by agents operating autonomously in cloud VMs. The developers adopting this approach write almost no code themselves. They spend their time breaking down problems, reviewing artifacts, and giving feedback. They spin up multiple agents simultaneously instead of guiding one to completion.

The laggards aren’t just slower. They’re increasingly unable to compete for talent, capital, or market position against organisations that have made this transition.

You don’t need a corporate budget to start

The dark factory model scales down. A single developer with a Claude Code subscription and well-structured GitHub workflows can run a lightweight version of the same approach.

Start with one workflow. Pick a repetitive part of your development or business process, establish the guardrails, and let agents handle it. The key investment isn’t in compute — it’s in guardrails and context. Linters, test suites, good documentation, and clear specifications matter more than token budget.

For SMBs and founders, this is the most asymmetric advantage available. You can operate at a scale that was previously only accessible with significant headcount. The learning curve is steep but short. Within 30 days of serious experimentation, most people develop the intuition for what agents can and can’t handle.

Projects like OpenClaw — an open-source autonomous agent that executes tasks across messaging platforms and services — demonstrate that the tooling for this approach is increasingly accessible. The software runs locally, integrates with multiple LLM providers, and requires no enterprise licensing. The barrier isn’t access to technology. It’s willingness to change how work gets done.

What this means beyond software

Software is where this pattern is playing out first, but the model applies wherever knowledge work is structured and repeatable.

Audit processes. Compliance checks. Report generation. Data analysis. Document review. These are all candidates for the same approach: clear specifications, comprehensive validation, and autonomous execution within defined guardrails.

Most traditional industries haven’t started thinking about this. They’re still debating whether to use ChatGPT for email drafts. The firms that figure out how to apply dark factory principles to their domain will have an enormous advantage over those still operating with manual workflows.

The lights are already off in some factories. The question isn’t whether this approach will spread. It’s how quickly your organisation recognises that the game has changed.


Discover more from AB's Reflections

Subscribe to get the latest posts sent to your email.

Discover more from AB's Reflections

Subscribe now to keep reading and get access to the full archive.

Continue reading