organization – AB's Reflections

The Task Changed, The Job Didn’t — But Your Org Hasn’t Noticed Yet

There’s a conversation happening quietly in engineering teams, product orgs, and design studios. It surfaces in Slack DMs and break-room whispers. The question underneath is always the same: If AI can do what I do, what am I for?

That fear makes sense. Engineers who built their identity around writing clean code watch AI generate entire modules in seconds. Product managers who prided themselves on writing crisp specs see AI agents do the same work overnight. Designers watch their Figma files get autocompleted before they’ve finished thinking through the problem.

But here’s what’s being missed: the task is changing, the job isn’t.

Writing code was always a means to an end. The job was shipping features that solve problems. Writing specs was always a means to an end. The job was understanding user needs and deciding what to build. AI automates the means, not the end. The bottleneck was never typing speed — it was clarity of thinking, problem definition, and judgment about what to build.

Those bottlenecks are still ours.

The Identity Trap

Most people in technology define themselves by the task they perform, not the outcome they produce. “I’m a backend engineer” means I write backend code. “I’m a PM” means I write specs and manage tickets. When AI starts doing those tasks faster and arguably better, the identity feels threatened.

The first response is usually denial: “AI can’t really do what I do — it doesn’t understand context, it makes mistakes, it needs constant supervision.” The second is panic: “I’m about to be replaced by a model that costs pennies per thousand tokens.”

But the real shift isn’t about automation replacing roles. It’s about what happens when execution becomes nearly free and the entire competitive advantage moves to knowing what to build in the first place.

From Tasks to Judgment

When people ask what humans will do in this new world, the answer is usually “taste and judgment.” But that’s abstract. What does judgment actually mean?

It means knowing what to build, when to say no, and how to spot when AI is heading in the wrong direction. It’s defining the guardrails before you let agents run: test suites, design patterns, architectural constraints. It’s understanding that every line of code is a future maintenance burden, which makes the discipline not to build more valuable than the ability to build fast.

In 2014, Melissa Perri warned about “The Build Trap” — companies stuck measuring success by what they shipped rather than what they learned. “Building is the easy part,” she wrote. “Figuring out what to build and how we are going to build it is the hard part.”

Most companies ignored that. Now AI makes building trivially easy, and those companies are about to drown in features that solve nothing. The agents don’t get tired. They don’t push back. They’ll happily build everything you point them at, whether or not it should exist.

The Multi-Hat Convergence

The expectation is shifting: one person who can think about the problem, design the solution, and use AI to build it. This doesn’t mean everyone becomes a shallow generalist. It means the boundaries between roles blur significantly.

PMs without a hard skill — design or code — and engineers without product sense are both increasingly vulnerable. The trifecta of product thinking, design sense, and technical execution is becoming the baseline, not the exception.

For experienced professionals considering independence, this convergence changes the economics dramatically. A single person with AI tools can now deliver what used to require a small team.

The Org Structure Problem

Most organizations are still structured around tasks, not outcomes. Teams are organized by function — frontend team, backend team, QA team, design team. Performance is measured by task completion: PRs merged, tickets closed, specs written.

AI makes task completion trivially fast, which breaks these measurement systems completely. The real metric should be business outcomes, but most orgs aren’t wired to measure or incentivize that way.

Companies are starting to notice. Last year, the Shopify CEO asked employees to prove why they “cannot get what they want done using AI” before asking for more headcount. Last week, Block laid off 40% of its workforce — more than 4,000 people. Co-founder Jack Dorsey was direct: “A significantly smaller team, using the tools we’re building, can do more and do it better.”

A startup with great direction and AI agents beats a startup with mediocre direction and the same agents. A company with 10 people who know exactly what to build beats one with 100 people building everything they can think of.

The companies still hiring for “more hands” are optimizing for the wrong bottleneck.

What This Means for You

If you’re an engineer, invest in product sense and domain expertise. Understand why you’re building, not just how. Study the business side of your domain — unit economics, customer behavior, market dynamics.

If you’re a PM, get your hands dirty with at least one hard skill. Design or code, even at a basic level. The ability to prototype your own ideas or understand technical tradeoffs without waiting for a meeting makes you more effective than you’d expect.

If you’re a leader, start restructuring teams around outcomes, not functions. Measure business impact, not tickets closed. Reward people for solving problems and learning, not for producing code.

Stop identifying with your task. Start identifying with the outcomes you produce.

The people making this shift now are building a compounding advantage. The gap widens every month. Domain expertise becomes your moat. The deeper you understand a specific business problem space, the better you can direct agents toward solving it.

The execution bottleneck is being solved. The judgment bottleneck requires human capacity, and it’s where the real value lives now.

March 10, 2026

The Dark Factory: Engineering Teams That Run With the Lights Off

A few engineering organisations are already operating a model most companies haven’t begun to consider. While the typical software team debates whether to adopt AI coding assistants, companies like StrongDM are running fully automated development pipelines where agents handle implementation, testing, review, and deployment. Humans set direction and define constraints. The mechanical work happens without them.

This isn’t speculative. It’s operational. And the gap between companies working this way and those that aren’t is widening fast.

What “lights off” actually means

The term comes from manufacturing — factories that run autonomously, with minimal human presence. In software, it describes engineering organisations where AI agents do the bulk of execution work while humans focus on architecture, constraints, and outcomes.

StrongDM’s approach is instructive: their benchmark is that if you haven’t spent at least $1,000 on tokens per human engineer per day, your software factory has room for improvement. Agents work in parallel on isolated tasks. Code is written, tested, and reviewed without manual intervention. Tasks assigned Friday evening return results Monday morning.

The ratio of agents to humans is high and growing. But this isn’t about replacing engineers — it’s about fundamentally changing what engineers do.

The guardrails are the system

Dark factories aren’t ungoverned. They’re heavily governed in a different way.

Linters, formatters, comprehensive test suites, and design pattern enforcement become preconditions rather than suggestions. Agents are configured to treat a task as complete only when all guardrails pass. Code review shifts from line-by-line human inspection to AI review with human spot-checks on critical paths.
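As a rough illustration, the rule that work only counts as complete once every guardrail passes can be sketched as a small loop. Everything named here (`Guardrail`, `run_until_green`, and the three toy checks) is hypothetical and stands in for a real linter, formatter, and test suite. It is a sketch under those assumptions, not a description of how any particular company wires this up.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Guardrail:
    """A named check that must pass before work counts as complete."""
    name: str
    check: Callable[[str], bool]  # receives the candidate source text


# Hypothetical stand-ins for a real linter, docs check, and formatter.
GUARDRAILS = [
    Guardrail("no-tabs", lambda src: "\t" not in src),
    Guardrail("has-docstring", lambda src: '"""' in src),
    Guardrail(
        "line-length",
        lambda src: all(len(line) <= 88 for line in src.splitlines()),
    ),
]


def failed_guardrails(source: str, guardrails: List[Guardrail]) -> List[str]:
    """Return the names of guardrails the candidate fails (empty means done)."""
    return [g.name for g in guardrails if not g.check(source)]


def run_until_green(
    propose: Callable[[List[str]], str], max_attempts: int = 5
) -> str:
    """Ask the agent for candidates until every guardrail passes."""
    failures: List[str] = []
    for _ in range(max_attempts):
        candidate = propose(failures)  # the agent sees the previous failures
        failures = failed_guardrails(candidate, GUARDRAILS)
        if not failures:
            return candidate
    raise RuntimeError(f"guardrails still failing: {failures}")
```

Here `propose` is whatever calls the agent. The point of the shape is that the agent never decides for itself that it is finished; the checks do.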

The discipline moves from “write good code” to “design good systems for code to be written in.” That’s a different skill. It requires thinking about constraints, validation, and feedback loops rather than syntax and implementation details.

Anthropic’s experiment building a C compiler with parallel Claude instances demonstrates this principle. Sixteen agents worked simultaneously on a shared codebase, coordinating through git locks and comprehensive test harnesses. The result: a 100,000-line compiler capable of building the Linux kernel, produced over nearly 2,000 sessions across two weeks for just under $20,000. The project worked because the test infrastructure was rigorous enough to guide autonomous agents toward correctness without human review of every change.

Cursor’s experiments with scaling agents ran into a different problem. They tried flat coordination first — agents self-organising through a shared file, claiming tasks, updating status. It broke down. Agents held locks too long, became risk-averse, made small safe changes, and nobody took responsibility for hard problems. The fix was introducing hierarchy: planners that explore the codebase and create tasks, workers that grind on assigned work until it’s done. No single agent tries to do everything. The system ran for weeks, writing over a million lines of code. One project improved video rendering performance by 25x and shipped to production. Their takeaway: many of the gains came from removing complexity rather than adding it.
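The planner and worker split described above can be sketched in a few lines. This is only an illustration of the shape, not Cursor’s implementation: `Task`, `plan`, `work`, and `run` are hypothetical names, and the worker body is a placeholder where a real system would call a coding agent.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    task_id: int
    description: str


def plan(goal: str, areas: List[str]) -> List[Task]:
    """Planner role: split the goal into independent, clearly owned tasks."""
    return [Task(i, f"{goal}: {area}") for i, area in enumerate(areas)]


def work(task: Task) -> str:
    """Worker role: own one assigned task until it is done."""
    # Placeholder: a real worker would drive a coding agent here.
    return f"done[{task.task_id}] {task.description}"


def run(goal: str, areas: List[str], workers: int = 4) -> List[str]:
    """Plan once, then let workers process the tasks in parallel."""
    tasks = plan(goal, areas)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as the tasks.
        return list(pool.map(work, tasks))
```

Because each task has exactly one owner, there is no shared file to lock and no ambiguity about who is responsible for the hard problems.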

Digital twins as the enabler

The biggest blocker to agent autonomy has been the fear of breaking production. Digital twins remove that constraint.

StrongDM built behavioural replicas of third-party services their software depends on — Okta, Jira, Slack, Google Docs, Google Drive, and Google Sheets. These twins replicate APIs, edge cases, and observable behaviours with sufficient fidelity that agents can test against realistic conditions at volume, without rate limits or production risk.
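To make the idea concrete, here is a minimal sketch of what a behavioural twin might look like at toy scale. `IssueTrackerTwin` and `TwinError` are hypothetical names, and the methods do not mirror any real vendor API. The assumption is simply that a twin keeps state in memory and reproduces the error cases an agent would hit against the real service.

```python
from dataclasses import dataclass, field
from typing import Dict


class TwinError(Exception):
    """Error raised by the twin, carrying an HTTP-like status code."""

    def __init__(self, status: int, message: str):
        super().__init__(message)
        self.status = status


@dataclass
class IssueTrackerTwin:
    """Hypothetical in-memory twin of an issue tracker (not a real API)."""

    issues: Dict[int, Dict[str, str]] = field(default_factory=dict)
    next_id: int = 1

    def create_issue(self, title: str) -> int:
        """Create an open issue and return its id; reject empty titles."""
        if not title.strip():
            raise TwinError(400, "title must not be empty")
        issue_id = self.next_id
        self.issues[issue_id] = {"title": title, "status": "open"}
        self.next_id += 1
        return issue_id

    def close_issue(self, issue_id: int) -> None:
        """Close an issue; unknown ids and double closes are errors."""
        if issue_id not in self.issues:
            raise TwinError(404, f"issue {issue_id} not found")
        if self.issues[issue_id]["status"] == "closed":
            raise TwinError(409, f"issue {issue_id} already closed")
        self.issues[issue_id]["status"] = "closed"
```

Agents can create a fresh twin per test run, exercise the edge cases as often as they like, and never touch a rate limit or a production account.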

Simon Willison’s write-up of StrongDM’s approach highlights how this changed what was economically feasible: “Creating a high fidelity clone of a significant SaaS application was always possible, but never economically feasible. Generations of engineers may have wanted a full in-memory replica of their CRM to test against, but self-censored the proposal to build it.”

What makes this rigorous rather than just better staging is how they handle validation. Test scenarios are stored outside the codebase, where the coding agents cannot see them, and function like holdout sets in machine learning. Agents can’t overfit to the tests because they don’t have access to them. The QA team is also made up of agents, which run thousands of scenarios per hour without hitting rate limits or accumulating API costs.
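The holdout idea can be sketched as an evaluator that keeps its scenarios private and hands back only a score. The names (`_HOLDOUT`, `score`) and the toy string-normalising scenarios are assumptions for illustration. In a real setup the scenarios would live in a separate store that the coding agents have no access to.

```python
from typing import Callable, List, Tuple

# Each scenario is (input, expected output).
# Assumption: this list is kept outside the coding agent's workspace.
Scenario = Tuple[str, str]

_HOLDOUT: List[Scenario] = [
    ("  Alice ", "alice"),
    ("BOB", "bob"),
    ("", ""),
]


def score(candidate: Callable[[str], str]) -> float:
    """Run the hidden scenarios and report only the fraction that pass."""
    passed = 0
    for given, expected in _HOLDOUT:
        try:
            if candidate(given) == expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed scenario
    return passed / len(_HOLDOUT)
```

Returning a single number rather than the failing cases is the design choice that matters: the agent learns whether it is getting closer, but not what the tests are.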

The structural advantage of starting fresh

Startups and SMBs have a material advantage here. No legacy organisational structure to dismantle. No 500-person engineering floor with stakeholders defending headcount. No 18-month procurement cycles.

Capital efficiency becomes native. A three-person team with agents can produce output that previously required twenty people. The cost of compute is a fraction of equivalent human labour and falling rapidly.

This creates an asymmetric advantage. If your competitor ships in days what takes you months, no amount of talent closes that gap. And the competitive pressure isn’t just on speed — it’s on the ability to attract talent that wants to work this way. Senior engineers who’ve experienced agent-driven development don’t want to go back to manual workflows.

The gap between adopters and laggards

Companies operating this way are shipping at a fundamentally different pace. The difference isn’t incremental — it’s orders of magnitude in output per person.

Block’s recent announcement of a roughly 40% reduction in headcount offers a data point. The company is reducing its organisation from over 10,000 people to just under 6,000. Jack Dorsey stated, “we’re not making this decision because we’re in trouble. our business is strong,” but noted that “the intelligence tools we’re creating and using, paired with smaller and flatter teams, are enabling a new way of working which fundamentally changes what it means to build and run a company.”

Cursor’s data shows the same pattern: 35% of pull requests merged internally at Cursor are now created by agents operating autonomously in cloud VMs. The developers adopting this approach write almost no code themselves. They spend their time breaking down problems, reviewing artifacts, and giving feedback. They spin up multiple agents simultaneously instead of guiding one to completion.

The laggards aren’t just slower. They’re increasingly unable to compete for talent, capital, or market position against organisations that have made this transition.

You don’t need a corporate budget to start

The dark factory model scales down. A single developer with a Claude Code subscription and well-structured GitHub workflows can run a lightweight version of the same approach.

Start with one workflow. Pick a repetitive part of your development or business process, establish the guardrails, and let agents handle it. The key investment isn’t in compute — it’s in guardrails and context. Linters, test suites, good documentation, and clear specifications matter more than token budget.

For SMBs and founders, this is the most asymmetric advantage available. You can operate at a scale that was previously only accessible with significant headcount. The learning curve is steep but short. Within 30 days of serious experimentation, most people develop the intuition for what agents can and can’t handle.

Projects like OpenClaw — an open-source autonomous agent that executes tasks across messaging platforms and services — demonstrate that the tooling for this approach is increasingly accessible. The software runs locally, integrates with multiple LLM providers, and requires no enterprise licensing. The barrier isn’t access to technology. It’s willingness to change how work gets done.

What this means beyond software

Software is where this pattern is playing out first, but the model applies wherever knowledge work is structured and repeatable.

Audit processes. Compliance checks. Report generation. Data analysis. Document review. These are all candidates for the same approach: clear specifications, comprehensive validation, and autonomous execution within defined guardrails.

Most traditional industries haven’t started thinking about this. They’re still debating whether to use ChatGPT for email drafts. The firms that figure out how to apply dark factory principles to their domain will have an enormous advantage over those still operating with manual workflows.

The lights are already off in some factories. The question isn’t whether this approach will spread. It’s how quickly your organisation recognises that the game has changed.

March 5, 2026

[LinkBlog] The Uncanny Valley of a Functional Organization – Stratechery by Ben Thompson

How organizations can falter on the road to transformation and scaling up

April 20, 2016
