Tag: agenticAI

  • What it takes to actually run NanoClaw

    NanoClaw is the structurally right framework for pipeline-shaped agent workloads. It’s also genuinely more technical to set up than most personal-assistant frameworks people compare it to. If you’re evaluating it after reading the comparison piece and wondering what you’re signing up for, this is the honest answer.

    One thing worth setting up front: NanoClaw is not built for non-technical users, and neither is OpenClaw despite its more polished onboarding. The marketing on both sites pitches “personal AI assistant for everyone.” The reality is different. NanoClaw expects comfort with git, the command line, Docker, and at least basic Linux administration. The trade you make in exchange is access to Claude Code as the authoring layer for your fork, which is arguably the most capable AI coding tool available right now, and meaningfully more capable than the typical models you’d be running underneath OpenClaw. The framework is built around that capability difference rather than trying to abstract it away.

    The architecture is right. The setup curve is real. Below is what actually bites.

    You need a Claude Code subscription

    This isn’t a soft dependency. NanoClaw is built around Claude Code as the authoring layer — the slash commands that install channels and providers (/add-telegram, /add-opencode, /add-codex and so on) run inside Claude Code and copy source files into your fork from long-lived branches. You can technically edit the same files by hand, but you’d be reverse-engineering what those slash commands do every time you customise.

    Practically: a Claude Code Pro or Max subscription is the working assumption. Without it, you’re not really running NanoClaw the way it’s designed to run. With it, the authoring experience is the best part of the framework — the codebase is small enough that Claude Code can confidently make changes across it, and the fork-as-install model means every customization is a code change you can read and revert.

    This also constrains who NanoClaw is for. If you’re allergic to Claude Code (philosophically, financially, or because you prefer Codex or another harness as your primary), you’ll fight the framework. If you’re already deep in Claude Code, the integration is genuinely tight.

    Codex works as a fallback authoring layer for individual tasks, and the /add-codex skill makes Codex available as an agent provider (separate from authoring). But the slash-command-based setup expects Claude Code as the primary harness. Plan around that.

    OneCLI is part of the deal

    NanoClaw doesn’t manage your API keys directly. That job is delegated to OneCLI, the companion credential proxy that ships alongside it. Agents inside containers never see raw API keys; they make outbound HTTPS requests through OneCLI, which injects credentials at the proxy layer based on per-agent policies.

    This matters in practice for two reasons. First, agents inside NanoClaw containers have bash access — anything that put an API key directly in the container would be reachable by any code the agent runs. OneCLI keeps that surface clean. Second, you’ll spend real time during setup configuring OneCLI: registering your Anthropic credential, creating per-agent secret assignments, deciding whether each agent gets all secrets or a specific subset. The nanoclaw.sh install script handles the basics, but ongoing changes (adding a new provider, rotating keys, scoping a credential to one agent) involve OneCLI commands rather than editing config files.
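
    To make the per-agent scoping concrete, here is a minimal sketch of the decision a credential proxy makes on each outbound request. It is not OneCLI's actual schema or API: the AgentPolicy and SecretBinding shapes, the injectCredentials function, and the header names are all assumptions for illustration.

    ```typescript
    // Hypothetical types: these do not mirror OneCLI's real configuration.
    type SecretName = string;

    interface AgentPolicy {
      agentId: string;
      // "all" grants every registered secret; otherwise an explicit subset.
      secrets: "all" | SecretName[];
    }

    interface SecretBinding {
      name: SecretName;
      host: string;   // upstream host the secret applies to
      header: string; // header the proxy sets on the outbound request
      value: string;  // raw credential, held only by the proxy
    }

    // Return the headers the proxy would add for this agent and upstream host.
    // The request as written by the agent never contains the credential.
    function injectCredentials(
      policy: AgentPolicy,
      bindings: SecretBinding[],
      host: string,
    ): Record<string, string> {
      const headers: Record<string, string> = {};
      for (const b of bindings) {
        const allowed =
          policy.secrets === "all" || policy.secrets.indexOf(b.name) !== -1;
        if (allowed && b.host === host) headers[b.header] = b.value;
      }
      return headers;
    }
    ```

    An agent scoped to a single secret gets nothing injected for any other host, which is the property that matters when the agent can run arbitrary bash.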

    It’s worth understanding before you start. Treat OneCLI as a meaningful piece of the system, not a one-time setup chore that disappears after install.

    There’s no web UI out of the box

    NanoClaw ships the channel and agent runtime. It doesn’t ship an operator console. There’s no dashboard for browsing agent activity, no log viewer, no chat history UI, no admin panel, no menubar app. The framework’s stance is that you talk to your agent through a messaging channel — Telegram, Slack, Discord, WhatsApp, whatever you’ve installed — and that’s the interface.

    OpenClaw, by comparison, has a guided onboarding CLI (openclaw onboard) for setup and a Companion App (Beta) on macOS that adds a menubar interface. So if you’re coming from OpenClaw expecting some kind of UI affordance out of the box, NanoClaw will feel deliberately bare.

    For an assistant, the chat-channel-only approach is fine. The channel is the interface.

    For a pipeline, it’s not enough. Pipelines need state-of-everything views: which prospects are in which stages, which agents are working on what, what’s pending operator review, what’s been dead-lettered. None of that is conversational. You need a UI.

    The options are real but each has a cost:

    Build a custom web UI as a NanoClaw skill. A small Express or similar server inside a skill that exposes a chat-plus-dashboard interface, talks to the agent through the same task contract NanoClaw uses elsewhere, and serves over tailscale serve so it’s only reachable on your tailnet. Takes a day to build. You control the UX completely. You can mount per-agent dashboards next to the chat thread. No third party between you and your operator interface. This is the version I keep coming back to.

    Use a messaging channel as the operator interface. Telegram is fastest to bring up — bot via BotFather, token in five minutes. Discord and Slack work too. The trade is that pipeline state is awkward to display in a chat thread, and you end up either composing structured messages (clunky) or building dashboards anyway (defeats the purpose).

    Lean on the underlying systems for state visibility. SQLite for the artifact and journal storage means you can run ad-hoc queries against it. docker logs for container-level activity. journalctl --user for systemd-level service logs. This works for debugging and post-hoc analysis. It doesn’t work as a real-time operator surface.

    In practice, you’ll mix all three. The custom web UI is the primary operator console, channels handle quick-access from your phone, and you use the underlying tooling when something goes weird and you need to dig.
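
    Whichever surface you pick, the state-of-everything view reduces to one aggregation over pipeline records. A minimal sketch of that aggregation follows; the ProspectRow shape and the summarize function are assumptions for illustration, since the real schema is whatever your orchestrator stores.

    ```typescript
    // Hypothetical record shape, not a NanoClaw type.
    interface ProspectRow {
      id: string;
      stage: string;          // e.g. "awaiting-reply"
      assignedAgent?: string; // agent group currently working on it, if any
      pendingReview: boolean; // waiting on the operator
      deadLettered: boolean;
    }

    interface DashboardSummary {
      byStage: Record<string, number>;
      byAgent: Record<string, number>;
      pendingReview: number;
      deadLettered: number;
    }

    // One pass over the rows produces every count the dashboard needs.
    function summarize(rows: ProspectRow[]): DashboardSummary {
      const summary: DashboardSummary = {
        byStage: {},
        byAgent: {},
        pendingReview: 0,
        deadLettered: 0,
      };
      for (const row of rows) {
        summary.byStage[row.stage] = (summary.byStage[row.stage] || 0) + 1;
        if (row.assignedAgent) {
          summary.byAgent[row.assignedAgent] =
            (summary.byAgent[row.assignedAgent] || 0) + 1;
        }
        if (row.pendingReview) summary.pendingReview += 1;
        if (row.deadLettered) summary.deadLettered += 1;
      }
      return summary;
    }
    ```

    The same summary object can back a web dashboard endpoint or be formatted into a structured channel message, so the choice of surface does not change the query.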

    Setup gotchas on a small VPS

    NanoClaw runs comfortably on a 2GB DigitalOcean droplet (or equivalent). The hosting cost is a few dollars a month. The friction comes from minimal cloud images being stripped down enough that several setup steps fail in non-obvious ways.

    The base image doesn’t ship with a C compiler. Several modules in the dependency tree build native bindings during pnpm install and fail with generic “command failed” errors that don’t tell you the compiler is missing. Install build tools before the first install:

    sudo apt update
    sudo apt install -y build-essential acl

    The acl package is also missing from minimal images and you’ll need it for the Docker socket fix below.

    The Docker socket ACL doesn’t survive reboot. NanoClaw runs agent containers via Docker. By default, only root can talk to the Docker socket. Adding your operator user to the docker group works, but group membership is broadly equivalent to root and extends to every account in that group, which is not what you want.

    The narrower approach is an ACL grant on /var/run/docker.sock for the one operator account. Write access to the socket is still root-equivalent for that account, so the gain is scoping the grant to a single named user, not sandboxing it. The catch: /var/run (a symlink to /run on current distributions) lives on tmpfs and is recreated on every boot. Anything you setfacl once is wiped on reboot. The fix is a tmpfiles.d rule that recreates the ACL automatically. Create /etc/tmpfiles.d/docker.conf with:

    a+ /var/run/docker.sock - - - - u:youruser:rw

    Replace youruser with the actual operator username. Test with sudo systemd-tmpfiles --create and verify with getfacl /var/run/docker.sock. Reboots no longer break Docker access for the operator account.

    Two systemd services, not one. Run NanoClaw and your custom orchestrator as separate systemd user services. When you’re iterating on the orchestrator (which you will, often, especially in early development), restarting it shouldn’t take the channel adapters down. Channel reconnects are slow and annoying; orchestrator restarts should be near-instant.

    A reasonable layout:

    ~/.config/systemd/user/nanoclaw.service
    ~/.config/systemd/user/orchestrator.service

    If you want either service to start on boot before you log in, enable lingering for the user with sudo loginctl enable-linger youruser. Easy to forget; non-obvious failure mode (services don’t start, you don’t know why, you log in, they magically work).

    Add swap. A 2GB droplet doesn’t ship with swap configured. Under heavy LLM-context loads — long-context windows plus large augmentation tasks — you can OOM unexpectedly. A 2GB swap file is cheap insurance:

    sudo fallocate -l 2G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile
    echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

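    Set vm.swappiness=10 in /etc/sysctl.conf so the kernel prefers RAM and only swaps under genuine pressure. Apply it with sudo sysctl -p, then reboot and confirm with swapon --show and cat /proc/sys/vm/swappiness that both the swap file and the setting survived.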

    What stays on the laptop, what goes on the VPS

    The local-versus-VPS question resolves cleanly:

    • A laptop is fine for the install rehearsal, fork setup, and a couple of agents you only use while at the keyboard.
    • Anything that needs to be reachable, scheduled, or running while you’re not at the keyboard belongs on the VPS.

    The cost difference between 1GB and 2GB on DigitalOcean is a few dollars a month, and the difference in headroom is between fighting the host and forgetting about it. Take the 2GB. The marginal saving on a 1GB droplet is not worth the time you’ll spend wondering why builds are failing or why the agent container is OOM’ing.

    Honest scope of “easy”

    NanoClaw is technically simpler than OpenClaw: fewer lines of code, fewer abstractions, fewer hidden behaviours. It’s not operationally simpler. The framework expects you to:

    • Have a Claude Code subscription and use it as the authoring layer
    • Be comfortable with the Linux command line, systemd, Docker, git
    • Build your own operator UI if you want one
    • Write your own orchestrator if you’re doing pipeline-shaped work

    For someone who already operates in this stack, NanoClaw feels light and clean — and the Claude Code authoring layer is genuinely the best part. The codebase is small enough that asking Claude Code to make changes across it works reliably, which is a meaningfully better experience than the typical “edit config files, hope you got it right, debug when you didn’t” pattern.

    For someone hoping for a one-click personal assistant, the curve is meaningfully steeper than OpenClaw’s onboarding. OpenClaw has a guided CLI (openclaw onboard) and a macOS Companion App that gives you a menubar interface; NanoClaw deliberately ships none of that. Both still expect a technical user underneath, but OpenClaw lowers the floor more.

    The trade is real and the trade is good if your use case justifies it. You end up with a system you understand end to end, that runs in resources you control, that doesn’t depend on a SaaS gateway, and that you can reason about when something breaks. Worth the lift if you’re building something pipeline-shaped. Not worth the lift if you just want a chatbot.

    A useful concrete reference point: Singapore’s Foreign Minister, Vivian Balakrishnan, published the architecture for his own NanoClaw-based “second brain” setup, with an accompanying X post walking through the composition. He’s technically literate — coding is a known hobby of his — but not a software engineer by trade. His setup composes NanoClaw with a few other open-source pieces (a memory layer, OneCLI for credentials, the LLM Wiki pattern for knowledge synthesis) and runs on a Raspberry Pi. It’s a useful existence proof of “technical-but-not-developer” being the floor for NanoClaw, and equally a useful caution: Vivian could compose those pieces because of fluency he already had. Anyone reading this without that fluency yet would need to pick it up first. The reward is real, and so is the prerequisite.

    The full GTM system this deployment serves is in Building a GTM dark factory with Nemotron 3 and NanoClaw. The framework comparison that motivates picking NanoClaw in the first place is in Why I picked NanoClaw over OpenClaw for a GTM pipeline.

  • Why I picked NanoClaw over OpenClaw for a GTM pipeline

    Before getting into the comparison itself, one piece of context worth setting straight: neither OpenClaw nor NanoClaw is built for a non-technical audience. Both expect comfort with the command line, git, and at least one model provider’s API setup. Both reward fluency with the underlying stack. The marketing copy on both sites pitches “personal AI assistant for everyone,” which is aspirational. The reality today is that you need to know what pnpm install does and roughly what a Docker container is to get either one running smoothly.

    That said, the two frameworks make different trade-offs within that technical-user space, and which trade-off is right for you depends on what you’re actually building.

    OpenClaw is the more monolithic, more featureful option. It ships a guided CLI onboarding (openclaw onboard), supports multiple LLM providers natively (Anthropic, OpenAI, local), has a Companion App for macOS that gives you a menubar interface, and includes browser control, persistent memory, and dozens of community-built skills out of the box. The trade is operational complexity (~434,000 lines of code, 70+ dependencies, a single Node process with shared memory) and a security model that relies on application-level checks rather than OS isolation. Recent CVEs and security writeups in this space have mostly been OpenClaw-shaped.

    NanoClaw is the lighter, more opinionated alternative. ~3,900 lines of code, fewer than 10 dependencies, agents in isolated Linux containers with explicit mounts, single host process orchestrating per-session containers. Credential handling is delegated to OneCLI (NanoClaw’s companion credential proxy), which injects API keys at request time so agents never hold raw secrets — meaningful when an agent has bash access inside its container. The trade is that NanoClaw is built natively around the Claude Agent SDK — Claude Code is the primary harness, the slash commands that install channels and providers run inside Claude Code, and other providers (Codex, OpenCode, Ollama) are drop-in alternatives rather than peers. There’s no menubar app, no built-in dashboard, no UI beyond the chat channels you’ve installed. The codebase is small enough that “ask Claude Code to walk you through it” is a realistic onboarding strategy.

    For a personal-assistant use case, the OpenClaw trade-off probably wins. More features out of the box, more flexibility on providers, easier to bring up if you’re not already deep in the Claude Code ecosystem. For a pipeline-shaped workload (GTM, document processing, anything where the workflow exists independent of conversation), NanoClaw is structurally a better fit, and Claude Code being the assumed harness is actually an advantage rather than a constraint, because Claude Code is arguably the most capable AI authoring tool available right now and the framework is built around it.

    I went through both before settling. Here’s the rest of the comparison through the pipeline-shape lens.

    The shape of the workload matters

    A personal assistant is reactive. You send it something. It figures out what you meant, picks a tool, runs the tool, replies. The workflow is whatever the conversation is.

    A pipeline is the opposite. There’s a state machine. There are stages. Each prospect, ticket, document, or whatever the unit is moves through stages on its own clock. Some get stuck. Some get rerouted. Some need to be remembered six months later when a specific signal lights up. The workflow exists independent of any conversation.

    These two workloads want different things from a framework. The assistant wants flexibility, channels, plug-and-play tools, an LLM that figures out what to do. The pipeline wants deterministic routing between stages, dry-run capability, an LLM that does bounded judgment work inside a stage.

    This is the lens that matters. Most framework comparisons are feature bake-offs. The actual question is which workload shape you’re building.

    Three things that didn’t survive OpenClaw for me

    Routing. OpenClaw’s agent picks what to do based on the inbound and its own reasoning. That’s the right model for “summarise my inbox” and the wrong model for “transition prospect ABC from awaiting-reply to unresponsive after 14 days.” The second decision has to be deterministic, replayable, dry-runnable, and outside the LLM. Tool-call routing is fine when the cost of a wrong decision is small. In a GTM pipeline a wrong routing decision is a duplicate touch, a wrong segment, a compliance breach.

    You can wire OpenClaw to do deterministic routing — through skill conditions, scheduled triggers, scripted control flow — but you’re working against the framework’s grain. Every hour spent there is an hour reinventing what a state machine engine gives you for free.
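
    For contrast, the 14-day rule from the example above is only a few lines when it lives in plain code. The sketch below is illustrative rather than the author's orchestrator: the Prospect and Transition shapes and the planTransitions and run functions are assumed names. The clock is passed in as an argument, which is what makes the rule replayable and dry-runnable.

    ```typescript
    type Stage = "awaiting-reply" | "unresponsive";

    interface Prospect {
      id: string;
      stage: Stage;
      lastTouchAt: number; // epoch milliseconds of the last outbound touch
    }

    interface Transition {
      prospectId: string;
      from: Stage;
      to: Stage;
      reason: string;
    }

    const DAY_MS = 24 * 60 * 60 * 1000;
    const UNRESPONSIVE_AFTER_DAYS = 14; // the threshold used in the text

    // Pure function: the same inputs always give the same plan, so it can be replayed.
    function planTransitions(prospects: Prospect[], now: number): Transition[] {
      const plan: Transition[] = [];
      for (const p of prospects) {
        const idleDays = Math.floor((now - p.lastTouchAt) / DAY_MS);
        if (p.stage === "awaiting-reply" && idleDays >= UNRESPONSIVE_AFTER_DAYS) {
          plan.push({
            prospectId: p.id,
            from: p.stage,
            to: "unresponsive",
            reason: `no reply after ${idleDays} days`,
          });
        }
      }
      return plan;
    }

    // A dry run returns the plan without touching state; a real run applies it.
    function run(prospects: Prospect[], now: number, dryRun: boolean): Transition[] {
      const plan = planTransitions(prospects, now);
      if (!dryRun) {
        for (const t of plan) {
          for (const p of prospects) {
            if (p.id === t.prospectId) p.stage = t.to;
          }
        }
      }
      return plan;
    }
    ```

    No model call appears anywhere in this path. The LLM's work happens inside a stage; moving between stages is arithmetic on timestamps.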

    Per-skill model preference. Pipelines benefit from heterogeneity. Small fast models for bulk discovery and augmentation. Larger models for content polish. Different providers for redundancy. OpenClaw supports multiple LLM backends as a first-class feature — you can configure Anthropic, OpenAI, or local models — but the routing decisions are made within the agent’s own reasoning rather than at the framework level. For a pipeline you want the framework to route deterministically based on skill family, not let the agent pick its own provider per call.

    NanoClaw’s approach is the opposite: provider is configured per agent group, one provider per group, multiple groups in parallel. That maps directly to “discovery and augmentation in one group on Nemotron, polish in another group on Claude.” Per-task provider hints would be cleaner, but group-level routing is what works today, and for most pipelines it’s enough because the natural skill boundaries align with provider preferences anyway.
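
    A sketch of what group-level routing looks like from the orchestrator's side. The AgentGroup shape, the group names, and the groupFor function are hypothetical, and NanoClaw's real group configuration may look different. The point is only that the provider follows from a table lookup on skill family.

    ```typescript
    // Hypothetical names throughout; not NanoClaw's configuration format.
    type SkillFamily = "discovery" | "augmentation" | "polish";

    interface AgentGroup {
      name: string;
      provider: string; // exactly one provider per group
      skills: SkillFamily[];
    }

    const groups: AgentGroup[] = [
      { name: "bulk", provider: "nemotron", skills: ["discovery", "augmentation"] },
      { name: "polish", provider: "claude", skills: ["polish"] },
    ];

    // Routing is a table lookup, decided before any model is called.
    function groupFor(skill: SkillFamily, table: AgentGroup[]): AgentGroup {
      const matches = table.filter((g) => g.skills.indexOf(skill) !== -1);
      const first = matches[0];
      if (matches.length !== 1 || first === undefined) {
        throw new Error(
          `expected exactly one group for "${skill}", found ${matches.length}`,
        );
      }
      return first;
    }
    ```

    Failing loudly on zero or multiple matches is a deliberate choice in this sketch: an ambiguous table is a configuration bug, not something to resolve at dispatch time.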

    Operating cost. OpenClaw runs a websocket gateway with constant background activity. mDNS service discovery, periodic health probes, channel reconnect loops. On a 1GB droplet it spent most of its capacity on its own metabolism. Bumping the VPS works, but the symptom is telling.

    NanoClaw is much quieter at idle. The host process owns message queues, agent containers spin up per task, channels are explicit and minimal. A 2GB droplet has plenty of headroom for a working pipeline plus orchestrator plus operator UI.

    What NanoClaw doesn’t do, and why that’s useful

    NanoClaw has no built-in orchestrator. No state machine engine. No artifact store. No journal writer. No skill dispatcher. No dry-run harness. No business logic of any kind.

    For an assistant, this is missing functionality. For a pipeline, it’s the right scope.

    The orchestrator is the part that’s specific to your workflow. State transitions, when to retry, when to dead-letter, what counts as completion, what triggers the next stage. Building it as plain code (in any language; mine is TypeScript) means it stays readable, testable, and replaceable. NanoClaw runs the channel adapters and the agent containers. The orchestrator runs the workflow. They talk through structured task contracts.

    The trade is real: you write more code to start. The benefit is real: you understand and own every line of the pipeline that matters.

    What both share

    The skills system. Both frameworks treat skills as SKILL.md markdown files that the agent reads and executes. The same skills can technically run on either framework with minor adjustments, though the agent configuration files differ: OpenClaw uses SOUL.md for agent personality and config, NanoClaw uses CLAUDE.md for the same purpose. So you’re not locked into a framework by your skills library. You’re picking the framework that runs them at the right architectural layer.

    Both also lean on Claude Code as a useful authoring layer, though the relationship is different. NanoClaw is explicit about it: the slash commands that install channels and providers run inside Claude Code and copy source files into your fork from long-lived branches. OpenClaw is more flexible: you can author with Claude Code, edit config files by hand, or use whatever AI coding tool you prefer, including the built-in agents. Either way, having Claude Code in the loop is the best authoring experience available right now for both. It’s just that NanoClaw treats it as the assumption while OpenClaw treats it as one option among several.

    The forking model

    NanoClaw’s other design choice worth flagging: it’s opinionated about you forking the repo and treating the fork as your install. There’s no config-as-data layer that abstracts away your customizations. If you want different behaviour, you change the code. The codebase is small enough that this is safe.

    This is a discipline. It means every customization is a code change you can read and revert. It also means setup feels heavier than OpenClaw’s onboarding. For a pipeline you’ll be running for months, that’s the right trade. For a weekend assistant project, it’s overkill.

    The decision criteria, condensed

    Pick OpenClaw if:

    • You want a personal assistant that responds to messages on channels
    • The workflow is whatever the conversation is
    • You want maximum provider flexibility (Anthropic, OpenAI, local models all first-class)
    • You want a menubar app and guided onboarding out of the box
    • You’re fine with the larger codebase and application-level security model

    Pick NanoClaw if:

    • You’re building something with a state machine — pipeline-shaped, not chat-shaped
    • The workflow exists independent of any conversation
    • You need deterministic routing, dry-runs, replay
    • You want different providers for different stages, configured per agent group
    • You’re deep enough in Claude Code to leverage it as the authoring layer
    • You want OS-level container isolation as your security model
    • You’re willing to write the orchestrator yourself (and would rather, because you want to own the workflow logic)

    Worth knowing

    NanoClaw is younger and more spartan around setup edges — both because it does less by design and because the project is moving fast. If you hit a setup gotcha, the answer is usually in the docs and a quick edit by Claude Code resolves it. Filing an issue and waiting is the slower path. The flip side: the codebase is small enough that you can read all of it, and Claude Code can confidently make changes across it.

    OpenClaw has the larger community, more channel adapters in stable shape, and a richer ecosystem of community skills (ClawHub, the OpenClaw skills marketplace, has hundreds). If you’re operating in personal-assistant territory, those network effects matter. For pipelines, they don’t.

    Worth flagging for context: OpenClaw’s creator, Peter Steinberger, joined OpenAI in February 2026, with the project continuing as open source. The project’s velocity has been impressive but the security model has also been the subject of multiple writeups — anyone evaluating it for production should read the security analyses alongside the marketing copy.

    The full GTM system this comparison feeds into is in Building a GTM dark factory with Nemotron 3 and NanoClaw. For setup specifics — what it takes to actually run NanoClaw end to end — see the companion piece.

  • Building a GTM dark factory with Nemotron 3 and NanoClaw

    Outbound has a failure mode anyone running a B2B pipeline has hit. Go wide and the response rates collapse, the domain gets filtered, the brand looks like every other vendor blasting templates. Go narrow and the volume can’t sustain a business. The middle path — per-prospect research, context-aware first touches, disciplined follow-ups — used to need an army of SDRs.

    What the system below builds toward is functionally an AI-native CRM with marketing automation, segmentation, and funnels. It’s the same business object SaaS stacks like HubSpot, Salesforce + Marketo, or Apollo + Outreach + Clay assemble from a dozen subscriptions and a small ops team. Traditionally that operation is human-fronted at every stage: defining segments, enriching records, writing sequences, reviewing replies, tuning the funnel. Tools speed each step but don’t change the shape. Humans are in every loop because the judgment work is theirs.

    The dark factory operating model changes that. GTM is unusually well-suited to it because it’s a closed-loop domain. Every action generates measurable feedback: opens, replies, meetings booked, deals closed, journal of what worked and what didn’t. That feedback is what lets skills earn autonomy on evidence rather than wishful thinking, graduating from copilot mode (operator approves each output) to dark factory mode (autonomous, with sampling and exception escalation). Volume goes up because agents work on more prospects in parallel than any human can. Consistency goes up because the contract on the wire enforces it. The operator’s role compresses from reviewing every output to reviewing what the journal flags.

    The building blocks are NanoClaw as the agent and channel runtime, Nemotron 3 Super as the bulk runtime model alongside Claude for polish, and Claude Code and Codex as the authoring layer. None of them is a CRM. Composed together, with a state machine and journal sitting above them, they become one.

    What the engine does

    The engine takes a hypothesis (e.g. “healthcare companies publicly investing in compliance automation are good prospects”) and produces a queue of prospects with structured profiles, draft first-touches in a collab-partner voice, and context packs for the channels where execution stays manual (LinkedIn, anything high-touch). The operator reviews and approves drafts. Email goes out via Resend with proper deliverability hygiene. Replies route through an inbound webhook, get classified, and trigger state transitions. The journal records every decision with rationale, confidence, alternatives considered, and source evidence.

    Two things distinguish it from the standard funnel.

    The qualifying signal is behavioural rather than firmographic. “This company’s CEO talked publicly about scaling regulatory automation last quarter” beats “this company has 80 employees in three cities.” The second tells you a company exists. The first tells you something is happening there worth a conversation.

    Disqualification states are first class: not a fit, not now, unreachable, unresponsive, do not contact, conflict. None of these are fallbacks at the edge of the state machine. They’re destinations the orchestrator routes to deliberately. A prospect that hit “not now” with a specific signal six months ago is a different lead than one that’s been silent. The state machine has to remember the difference.
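
    One way to read "first class" in code is that the disqualification states are named members of the state type, and "not now" carries the signal it is waiting for. The sketch below assumes that reading. The active stage names, the revisitSignal field, and the onSignal function are hypothetical.

    ```typescript
    // The six disqualification states named in the text.
    type Disqualified =
      | "not-a-fit"
      | "not-now"
      | "unreachable"
      | "unresponsive"
      | "do-not-contact"
      | "conflict";

    // Hypothetical active stages, for illustration only.
    type Active = "discovered" | "augmented" | "drafted" | "awaiting-reply";

    type State = Active | Disqualified;

    interface ProspectState {
      state: State;
      // For "not-now": the specific signal recorded when the prospect was parked.
      revisitSignal?: string;
    }

    // Only a "not-now" prospect with a recorded signal re-enters the pipeline,
    // and only when that same signal fires. Every other state is left alone.
    function onSignal(current: ProspectState, signal: string): ProspectState {
      if (current.state === "not-now" && current.revisitSignal === signal) {
        return { state: "discovered" };
      }
      return current;
    }
    ```

    A silent prospect and a parked one are different values here, so the distinction the text describes cannot be lost by accident.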

    Operator in the loop, then less of it

    The two-mode model deserves a closer look because it’s where the architecture earns its keep. Copilot and dark factory aren’t synonyms for “manual” and “automated.” They’re different relationships between the operator and the agent group. Copilot is the operator approving every output and using the journal to spot patterns. Dark factory is the operator sampling outputs, reading exception escalations, and trusting the rubric for the rest. Some skills move between them in weeks. Some never graduate. Drafting outbound to a high-value prospect is a copilot job forever. Augmenting an early-funnel profile from public sources isn’t.
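
    The graduation decision can itself be a small, auditable function over review history. This is a sketch under stated assumptions: the thresholds are placeholders rather than the author's rubric, and readyToGraduate and shouldSample are hypothetical names.

    ```typescript
    interface ReviewOutcome {
      approved: boolean; // did the operator approve this output as-is?
    }

    // Placeholder thresholds; tune them against your own journal.
    const MIN_REVIEWS = 50;
    const MIN_APPROVAL_RATE = 0.95;

    // A skill graduates only on evidence, and some skills are pinned to copilot.
    function readyToGraduate(
      history: ReviewOutcome[],
      neverGraduates: boolean,
    ): boolean {
      if (neverGraduates || history.length < MIN_REVIEWS) return false;
      const approved = history.filter((r) => r.approved).length;
      return approved / history.length >= MIN_APPROVAL_RATE;
    }

    // In dark factory mode, send every Nth output to the operator for sampling.
    // Assumes everyNth is a positive integer.
    function shouldSample(outputIndex: number, everyNth: number): boolean {
      return outputIndex % everyNth === 0;
    }
    ```

    The neverGraduates flag covers cases like drafting outbound to a high-value prospect, which stays a copilot job regardless of approval rate.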

    Claude Code and Codex sit on the operator side of this loop, not the agent side. They edit the orchestrator, write skills, debug runs, apply patches. The agents inside NanoClaw containers run the domain skills, not the authoring code. The operator stitches the two layers together until each carries more on its own.

    Why this architecture for a GTM pipeline

    The framework choice matters because pipelines aren’t assistants. I started on OpenClaw. It’s the more featureful framework on paper, with channels, providers, scheduled tasks, and a guided onboarding flow all in one package. The pitch is right for a personal assistant. You point it at your stuff, it runs.

    For a GTM pipeline it’s the wrong shape. OpenClaw’s agent picks what to do based on the inbound and its own reasoning. That’s the right model for “summarise my inbox” and the wrong model for “transition prospect ABC from awaiting-reply to unresponsive after 14 days.” The second decision has to be deterministic, replayable, dry-runnable, and outside the LLM. Tool-call routing is fine when the cost of a wrong decision is small. In a GTM pipeline a wrong routing decision is a duplicate touch, a wrong segment, a compliance breach.

    NanoClaw makes the opposite design choice. It does less. It runs the channel adapters, one container per agent group, and a host process that owns the message queues. Skills are markdown files mounted into containers. There’s no built-in orchestrator, no business logic, no opinion on your workflow. For an assistant that would be missing functionality. For a pipeline it’s the right scope for the bottom layer.

    The full stack: NanoClaw is the channel and agent runtime. A separate orchestrator (custom code) sits above it and owns the pipeline state machine. Claude Code or Codex sits next to all of it as the authoring layer. The operator sits on top, reviewing outputs, approving drafts, gradually handing off more as each skill earns it. (I’ve written more on the framework comparison itself for those evaluating the two.)

    The orchestrator is plain code. State machine engine, artifact store, journal writer, skill dispatcher, dry-run harness. It dispatches structured tasks to the agent’s inbound queue. The agent runs the skill in its container and writes a result back. The result has to carry, at minimum, what was found, why, how confident the agent is, the alternatives considered and rejected, and the evidence with sources. The orchestrator validates against that contract on read. Validation failure means deterministic retry or dead-letter, never a re-prompt loop. The agent is allowed to be uncertain. It’s not allowed to be silent about it.
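
    A minimal sketch of validate-on-read with a deterministic outcome. The field names, the attempt limit, and the validate and decide functions are assumptions for illustration; the contract fields themselves follow the list above.

    ```typescript
    interface Evidence {
      claim: string;
      source: string;
    }

    interface SkillResult {
      found: string;
      rationale: string;
      confidence: number;     // 0 to 1
      alternatives: string[]; // considered and rejected
      evidence: Evidence[];
    }

    type Verdict =
      | { action: "accept"; result: SkillResult }
      | { action: "retry"; errors: string[] }
      | { action: "dead-letter"; errors: string[] };

    const MAX_ATTEMPTS = 3; // hypothetical limit

    // Collect every contract violation instead of stopping at the first one.
    function validate(raw: unknown): string[] {
      if (typeof raw !== "object" || raw === null) return ["result is not an object"];
      const r = raw as Record<string, unknown>;
      const errors: string[] = [];
      const nonEmpty = (v: unknown): boolean =>
        typeof v === "string" && v.trim() !== "";
      if (!nonEmpty(r["found"])) errors.push("found: missing");
      if (!nonEmpty(r["rationale"])) errors.push("rationale: missing");
      const c = r["confidence"];
      if (typeof c !== "number" || !(c >= 0 && c <= 1)) {
        errors.push("confidence: must be a number in [0, 1]");
      }
      if (!Array.isArray(r["alternatives"])) {
        errors.push("alternatives: must be an array (may be empty)");
      }
      const ev = r["evidence"];
      if (!Array.isArray(ev) || ev.length === 0) {
        errors.push("evidence: at least one sourced item required");
      }
      return errors;
    }

    // The outcome depends only on the payload and the attempt count. No re-prompting.
    function decide(raw: unknown, attempt: number): Verdict {
      const errors = validate(raw);
      if (errors.length === 0) return { action: "accept", result: raw as SkillResult };
      return attempt < MAX_ATTEMPTS
        ? { action: "retry", errors }
        : { action: "dead-letter", errors };
    }
    ```

    A low confidence value passes validation; a missing one does not. That is the difference between being uncertain and being silent about it.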

    Operating mode lives at the agent group, not in the task. A copilot group’s outputs land in a review queue. A dark factory group’s outputs trigger state transitions automatically. Promoting a skill from copilot to dark factory is moving its mount point, not rewriting it.
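
    Sketched as data, this means a skill has no mode of its own. The Deployment shape and the destinationFor and promote functions below are hypothetical, modelling mount points as a plain skill-to-group map rather than NanoClaw's actual mount configuration.

    ```typescript
    type Mode = "copilot" | "dark-factory";

    interface Deployment {
      groupModes: Record<string, Mode>;    // mode is a property of the group
      skillMounts: Record<string, string>; // skill name -> group it is mounted in
    }

    // Where a skill's output lands depends only on the group it is mounted in.
    // Unknown skills or groups fall back to the review queue, the safer default.
    function destinationFor(
      skill: string,
      d: Deployment,
    ): "review-queue" | "state-transition" {
      const group = d.skillMounts[skill];
      const mode = group === undefined ? undefined : d.groupModes[group];
      return mode === "dark-factory" ? "state-transition" : "review-queue";
    }

    // Promotion changes the mount and nothing else; the skill itself is untouched.
    function promote(skill: string, toGroup: string, d: Deployment): Deployment {
      return {
        groupModes: d.groupModes,
        skillMounts: { ...d.skillMounts, [skill]: toGroup },
      };
    }
    ```

    Demotion is the same operation in reverse, which keeps rolling a skill back to copilot as cheap as promoting it.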

    For the model layer: Nemotron 3 Super handles the bulk runtime work. Strong instruction following, long context, throughput that holds up under volume. Augmentation skills that read four or five sources and synthesise a structured profile benefit from the long context: public LinkedIn snippets, recent posts, the company’s own site, a news mention or two. Drafting routes to Claude. The bulk-then-polish chain saves tokens on volume work and keeps the polish pass focused on prose that goes to a human. The free tier covers early-stage development; production volumes need API access. Multi-provider routing is less about feature redundancy and more about not having a single provider’s outage take out the whole pipeline. The orchestrator routes per skill family: bulk runtime to Nemotron, polish to Claude, redundancy keys for either in reserve.
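
    The outage-protection half of that routing can be sketched as an ordered preference list per skill family. The provider labels, the preferences table, and the pickProvider function are illustrative assumptions; the health check is passed in so the selection logic stays testable.

    ```typescript
    type Family = "bulk" | "polish";

    // Ordered preference per skill family; the first healthy entry wins.
    // Labels are placeholders for a primary key and a reserve key.
    const preferences: Record<Family, string[]> = {
      bulk: ["nemotron-primary", "nemotron-reserve"],
      polish: ["claude-primary", "claude-reserve"],
    };

    function pickProvider(
      family: Family,
      isHealthy: (provider: string) => boolean,
    ): string | null {
      for (const provider of preferences[family]) {
        if (isHealthy(provider)) return provider;
      }
      // Nothing healthy: the caller should park the task rather than improvise.
      return null;
    }
    ```

    Returning null instead of crossing families is a choice in this sketch: sending bulk work to the polish provider during an outage would quietly change both cost and output style.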

    For setup specifics (Claude Code as the authoring dependency, the no-UI consequence, deployment gotchas a small VPS surfaces), check out the companion piece on what it takes to actually run NanoClaw.

    DPDP Act compliance lives at the journal layer: every artifact change is logged with provenance, and deletion requests tombstone the artifact while retaining audit evidence. This is easier to build in upfront than to retrofit.
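
    A minimal sketch of a journal with tombstoning, kept in memory for brevity. The class, its method names, and the choice to log a content hash instead of the content are my assumptions; a real journal would sit on durable storage, and what may be retained after a deletion request is a decision for whoever owns compliance.

```python
import hashlib
import json
from datetime import datetime, timezone


class Journal:
    """Append-only log: entries are added, never edited or removed."""

    def __init__(self):
        self.entries = []
        self.artifacts = {}

    def _log(self, artifact_id, action, actor, source, payload):
        # Log a digest of the content, not the content itself.
        digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "artifact_id": artifact_id,
            "action": action,
            "actor": actor,    # who or which skill made the change
            "source": source,  # where the data or the request came from
            "content_sha256": digest,
        })

    def write(self, artifact_id, payload, actor, source):
        self.artifacts[artifact_id] = payload
        self._log(artifact_id, "write", actor, source, payload)

    def tombstone(self, artifact_id, actor, request_ref):
        """Erase the content but keep the record of what happened and when."""
        marker = {"tombstoned": True}
        self.artifacts[artifact_id] = marker
        self._log(artifact_id, "tombstone", actor, request_ref, marker)
```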

    What this is, when it’s working

    A GTM dark factory is a specific shape: an AI-native CRM where the determinism lives between tasks and the LLM agency lives inside them. The agent does the bounded judgment work; the orchestrator decides what comes next; the journal holds both accountable. Volume goes up. Variance stays bounded. The operator’s role compresses to where it adds the most value — picking what gets built next, reviewing what the rubric can’t decide, deciding when a skill has earned graduation.

    Outbound that holds shape between wide and narrow doesn’t need an SDR army. It needs orchestration you can trust, a contract on the wire, and the discipline to let skills earn autonomy rather than be granted it. The framework choice is secondary. The split between framework, orchestrator, and authoring layer is what makes it work.

  • Indian IT’s Arbitrage Problem: When Tokens Cost the Same Everywhere

    The Indian IT services industry was built on a straightforward premise: skilled developers in Bangalore cost significantly less than comparable talent in San Francisco. This differential created an empire — TCS, Infosys, Wipro, and hundreds of smaller firms billing clients based on headcount. The model was self-reinforcing. More engineers meant more revenue, which meant hiring even more engineers.

    AI breaks this equation in a way previous technology shifts didn’t. When an LLM API costs the same per million tokens whether you’re calling it from Mumbai or Manhattan, geography stops mattering. The cost of doing work is shifting from labour, which varies by location, to compute, which doesn’t. As AI agents get better at performing tasks that used to require human engineers, the ratio keeps tilting further away from the headcount model, resulting in a structural break.

    The arbitrage that built an industry

    India’s tech boom worked because clients could get the same capability at dramatically lower cost. A Fortune 500 company could hire multiple engineers in India for significantly less than the cost of one in the US, and the output quality was comparable. Even Global Capability Centres — the in-house versions of this model — followed the same logic, functioning as cost centres to reduce the parent company’s tech spend.

    China’s manufacturing dominance followed the same pattern: cheap labour built the industry, then automation eroded the advantage but the specialised human knowledge persisted. The difference may be speed — manufacturing automation took decades, while AI may be compressing that timeline.

    When uniform pricing changes everything

    Nandan Nilekani described recently how India moved from concept to deployed AI solution for dairy farmers in three weeks, from a January 8 meeting with the Prime Minister to a February 11 launch. That kind of velocity shows what’s possible when AI adoption isn’t constrained by procurement cycles. Large IT services companies, by contrast, operate on longer evaluation timelines. By the time a tool clears compliance and gets deployed at scale, the market has moved on.

    This isn’t a process problem that better project management can fix. It’s structural, baked into how large organisations manage risk. Smaller, leaner operations can adopt and discard tools at whatever pace the technology demands. Established players can’t.

    Scale, which used to be the competitive moat, becomes an anchor. When you have large engineering teams on payroll, each person represents fixed costs — salaries, benefits, office space, management overhead. If 10 engineers with AI agents can now produce what 50 engineers produced before, every client will eventually ask why they’re still paying for 50. The “bench” model, where firms keep engineers on payroll between projects, becomes financially unsustainable when margins compress.

    The maintenance trap

    The strongest counterargument came immediately. In February 2026, a short-seller report from Citrini — written as a fictional memo from June 2028 — wiped roughly $10 billion off Indian IT stocks by arguing that cost arbitrage was dead because AI agents run at the cost of electricity. The defence was swift and detailed: Indian IT revenue is overwhelmingly maintenance and integration on legacy enterprise systems, not greenfield coding. Enterprise systems are sprawling, non-monolithic, and require deterministic outputs. AI is probabilistic. You can’t wholesale replace systems of record with something that gives you a different answer every time you ask the same question.

    HSBC estimated 14-16% gross AI-led revenue deflation across service segments — significant but not existential. The technology stacks of the world’s largest enterprises take years to adapt. Custom application maintenance alone accounts for roughly 35% of a typical Indian IT company’s revenue: incident management, service requests, change requests, problem resolution across architectures where SAP, Salesforce, Snowflake, and ServiceNow coexist in configurations unique to each client.

    The problem with this defence: maintenance work is structured, repeatable, well-documented—exactly the kind of work agents may eventually handle well. It’s arguably easier to automate than greenfield development because the patterns are known and the test conditions are defined. Even if 14-16% deflation is accurate, that’s 14-16% less revenue through a headcount-based billing model, which means clients now have a benchmark for what’s possible. The entire pricing structure comes under pressure.

    HFS Research projects a category called Services-as-Software growing to $1.5 trillion — AI-driven autonomous delivery replacing seat-based pricing with outcome-based models. IT service companies proactive about building their own AI agents, and willing to cannibalise legacy revenues, can gain share from software companies rather than just lose it. Companies that defend the old model will likely lose share.

    What survives

    Strategic judgement still matters. Domain expertise still matters. The ability to translate messy business problems into AI-solvable workflows — that doesn’t have a token cost equivalent. Even if code generation gets solved, the compliance, security, infrastructure, and domain knowledge layers don’t collapse. Enterprise software involves SOC-2 audits, data residency, currency handling, PII management. None of that happens automatically. Someone needs to be accountable when things break.

    DevOps, support, and production reliability are further behind code generation in the automation curve. Monitoring, incident response, infrastructure management — the consequences of AI errors in these areas are immediate and expensive. The software development lifecycle may be restructuring fast, but the operational layer still needs human judgment.

    Indian IT’s deep domain knowledge in specific verticals — healthcare, banking, insurance — could be repositioned rather than eliminated. Whether companies can make that pivot before clients start asking harder questions about headcount is the open question.

    The uncomfortable transition

    Headcount-based billing becomes harder to justify every quarter. The bench model becomes financially unsustainable at current margins. GCCs will face pressure to shrink headcount and demonstrate output-per-head improvements. Indian IT may need to pivot from services to products, or reinvent the services model around outcome-based pricing.

    When 59% of hiring managers admit they emphasize AI in layoff announcements because it “plays better with stakeholders” than admitting financial constraints, the narrative gap becomes clear. Companies are restructuring for traditional budget reasons but framing it as AI transformation. That creates a trust problem, but it also reveals something about client expectations: the perception that AI should reduce headcount costs is becoming real, whether or not the technology has fully delivered on that promise yet.

    The same forces dismantling labour arbitrage are creating opportunities for lean operators. A solo developer or small team with the right domain expertise and AI tools can now deliver enterprise-grade output. Clients don’t care if the work was done by 50 engineers in a GCC or 2 people with agents — they care about the outcome. Outcome-based pricing models become viable and attractive: charge for value delivered, not hours spent.

    Indian tech talent is world-class. The individuals who decouple from the headcount model and operate independently or in small setups may be better positioned than ever. The market is shifting from “who has the most people” to “who can deliver the most value per unit of cost” — and that’s a game lean operators can win.

    The question isn’t whether Indian IT survives. The industry isn’t disappearing. The question is whether the organisational models built around labour arbitrage can adapt to value arbitrage fast enough. The talent is there. The domain expertise is there. What’s uncertain is whether companies structured around selling engineer-hours can reinvent themselves to sell outcomes instead—and whether they can do it before clients find someone else who already has.

  • The Autonomous SDLC: What’s Solved, What’s Not, and Why the Gaps Are Closing Fast

    We’re further along than most people realize. The software development lifecycle is being automated piece by piece, and the trajectory is becoming harder to ignore—not through some magical breakthrough, but through the steady elimination of bottlenecks that seemed permanent six months ago.

    This is a practitioner’s status report discussing what works in production today, what remains genuinely unsolved, and why the remaining gaps matter less than conventional wisdom suggests.

    Code Generation: Already Production-Grade

    The middle portion of the SDLC—turning specifications into working code—has crossed a threshold. Cursor CEO Michael Truell describes three eras: tab autocomplete, synchronous agents responding to prompts, and now agents tackling larger tasks independently with less human direction. At Cursor, 35% of merged PRs now come from agents running autonomously in cloud VMs. The agent PRs are “an order of magnitude more ambitious than human PRs” while maintaining higher merge rates.

    What matters isn’t the percentage—it’s that these agent-generated PRs pass the same review standards as human code. Max Woolf’s detailed experiments are instructive. Starting as a vocal skeptic who wrote about rarely using LLMs, he ended up building Rust libraries that outperformed battle-tested numpy-backed implementations by 2-30x. Not prototypes—production code passing comprehensive test suites and benchmarks.

    His conclusion after months of testing:

    I have been trying to break this damn model by giving it complex tasks that would take me months to do by myself despite my coding pedigree but Opus and Codex keep doing them correctly.

    The quality ceiling keeps rising with each model generation. This isn’t “good enough for prototypes”—it’s production-grade code that ships.

    Spec-Driven Development

    The initiation problem has largely converged. Most tools now support planning mode—the agent reads a spec, creates an implementation plan, follows it through. Woolf’s experience matters here:

    AGENTS.md is probably the main differentiator between those getting good and bad results with agents.

    These persistent instruction files function as system prompts that shape agent behaviour across sessions.

    This is just spec-driven development—the same methodology good engineering teams already use. The pattern works: write a detailed spec (GitHub issue, markdown file), point the agent at it, let it execute. The difference is that agents can now be the executor, and the pattern works across tools (Cursor, Claude Code, Codex) because it aligns with how reliable software gets built regardless of who’s typing.

    The Feedback Loop: The Primary Gap

    Basic unit tests and regression tests work well—agents can write and run them as part of their workflow. Complex feature tests, integration tests, and UAT remain the primary gap. UI/UX testing is particularly challenging since agents can’t easily evaluate visual output.

    The current workaround: human-in-the-loop for complex test evaluation, with agents handling mechanical testing. That said, the coding agents can still fix bugs when given screenshots and descriptions.

    This is an active focus area. The gap is narrowing from both sides: agents getting better at generating comprehensive tests, and tooling improving for automated visual and integration testing. Satisfactory solutions within 2026 aren’t a stretch—they’re the natural next step given where the infrastructure is heading.

    Guardrails: Actively Being Solved

    Managing task boundaries and blast radius is critical for autonomous operation. Best practices are emerging around sandboxing—isolated agent execution environments, limited file system access, branch-based workflows.

    The Anthropic C compiler experiment demonstrated the pattern at scale: 16 agents working on a shared codebase over 2,000 sessions, coordinating through git locks and comprehensive test harnesses. The test infrastructure was rigorous enough to guide autonomous agents toward correctness without human review, producing a 100,000-line compiler that can build Linux.
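
    The coordination idea is simple enough to sketch: a task is claimed by creating a lock file, and whoever creates it first owns the task. The version below uses atomic file creation on a local filesystem as a stand-in; it is not the experiment's actual mechanism, which ran the locks through git, and the function names are mine.

```python
import os
from pathlib import Path


def try_claim(lock_dir: Path, task: str, agent: str) -> bool:
    """Atomically create a lock file; only one agent can succeed per task."""
    lock_path = lock_dir / f"{task}.lock"
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another agent already holds this task
    with os.fdopen(fd, "w") as handle:
        handle.write(agent)  # record the owner for debugging
    return True


def release(lock_dir: Path, task: str) -> None:
    """Remove the lock so the task can be claimed again."""
    (lock_dir / f"{task}.lock").unlink()
```

    An agent that loses the race moves on to a different task instead of waiting, which is what keeps many agents busy on one codebase without stepping on each other.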

    StrongDM took this further with their dark factory approach. They built digital twins of production dependencies—behavioral clones of Okta, Jira, Slack—using agents to replicate APIs and edge cases. This enabled validation at volumes far exceeding production limits without risk. Their rule: “Code must not be reviewed by humans.” The safety comes from comprehensive scenario testing against holdout test cases the agents never see.

    The agent infrastructure layer is building out fast. We’re seeing microVMs that boot fast enough to feel container-like, with snapshot/restore making “reset” almost free. Agent-specific sandboxed compute, identity, and API access are emerging as distinct product categories.

    The guardrails problem is increasingly an infrastructure problem, not a model problem. This converges toward a standard pattern: spec + guardrails + sandbox + automated validation = safe autonomous execution.

    The Self-Improvement Dynamic

    Something subtle is happening. Codex optimizes code, Opus optimizes it further, Opus validates against known-good implementations. Cumulative 6x speed improvements on already-optimized code. Then you have Opus 4.6 iteratively improving its own code through benchmark-driven passes.
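
    The loop behind a benchmark-driven pass can be sketched roughly as follows: a candidate is kept only if it matches a known-good reference on every case and is measurably faster. The function names and the injectable timer are my own framing, not how any particular agent implements it.

```python
import timeit


def passes_reference(candidate, reference, cases) -> bool:
    """A candidate must agree with the known-good implementation on every case."""
    return all(candidate(*args) == reference(*args) for args in cases)


def measure(fn, cases, repeat=5) -> float:
    """Best-of-N wall time for running fn over all cases 100 times."""
    return min(timeit.repeat(lambda: [fn(*args) for args in cases], number=100, repeat=repeat))


def select_best(current, candidates, reference, cases, timer=measure):
    """Return the fastest implementation that is still correct."""
    best, best_time = current, timer(current, cases)
    for candidate in candidates:
        if not passes_reference(candidate, reference, cases):
            continue  # a faster but wrong candidate is rejected
        elapsed = timer(candidate, cases)
        if elapsed < best_time:
            best, best_time = candidate, elapsed
    return best
```

    Correctness is checked before speed, so the benchmark can never reward a broken shortcut.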

    Folks have shown agents tuning LLMs on Hugging Face: the tooling layer being built by the tools themselves. This isn’t theoretical AGI. It’s narrow but powerful self-improvement within the coding domain. The practical implication: the rate of improvement accelerates as agents get better at improving agents. For the coding stack specifically, each generation of tools makes the next generation arrive faster.

    What This Means for Planning

    Here’s the timeline as I see it:

    2025: Code generation reliable. Spec-driven development emerging. Testing and guardrails manual.

    2026: Testing automation reaches satisfactory level. Guardrails standardize. The loop becomes semi-autonomous.

    2027+: Fully autonomous for standard applications. Human involvement shifts entirely to direction and edge cases.

    The companies planning as if these gaps will persist are making the same mistake as those who planned around slow internet in 2005. AI tools amplify existing expertise—all the practices that distinguished senior engineers (comprehensive testing, good documentation, strong version control habits, effective code review) matter even more now. But the bar for what “good enough” looks like is rising in parallel.

    Antirez captures the shift plainly:

    Writing code is no longer needed for the most part. It is now a lot more interesting to understand what to do, and how to do it.

    The mental work hasn’t disappeared. It’s concentrated in the parts machines can’t yet replace: architecture decisions, user needs, system design trade-offs.

    The gaps are real today. But they’re the wrong thing to optimize around. Optimize around what becomes possible when they close—because that’s happening faster than the pace of traditional software planning cycles.

  • From Clicks to Conversations: How AI Agents Are Revolutionizing Business

    For the last decade, businesses have invested heavily in “Digital Transformation,” building powerful digital tools and processes to modernize their operations. While this era brought significant progress, it also created a persistent challenge. The tools we built—from CRMs to ERPs—were largely dependent on structured data: the neat, organized numbers and categories found in a spreadsheet or database. Computers excel at processing this kind of information.

    The problem is that the most valuable business intelligence isn’t structured. The context behind a business plan locked in a 100-slide presentation, the nuance of a customer relationship captured in a rep’s notes, or the true objective of a strategy discussed in a meeting—this is all unstructured data. This divide has created a major hurdle for business efficiency, as great ideas often get lost when people try to translate them into the rigid, structured systems that computers understand.

    The Old Way: The Limits of Traditional Digital Tools

    The first wave of digital tools, from customer relationship management (CRM) software to accounting platforms, was designed for humans to operate. Its critical limitation was a reliance on structured data, which forced people to act as human translators. A brilliant, nuanced strategy conceived in conversations and documents had to be manually broken down and entered into rigid forms and fields.

    This created a significant “gap between business strategy and execution,” where high-level vision was lost during implementation. The result was heavy “change management overheads,” not just because teams needed training on new software, but because of the cognitive friction involved. People are used to working with the unstructured information in their heads; these tools forced them to constantly translate their natural way of thinking into structured processes the software could understand.

    Information Type | Business Example
    Structured | Entries in a CRM database, financial data in an accounting platform, inventory numbers in an ERP system.
    Unstructured | A 100-slide brand plan document, a sales rep’s recorded notes describing a doctor they just met, emails discussing a new brand strategy.

    This reliance on structured systems meant that the tools, while digital, couldn’t fully grasp the human context of the work they were supposed to support. A new approach was needed—one that could understand information more like a person does.

    A Smarter Way: Introducing AI Agents

    Welcome to the era of “AI Transformation.” At the heart of this new wave are AI Agents: specialized digital team members that can augment a human workforce. Think of them as a dedicated marketing agent, a sales agent, or a data analyst agent, each designed to perform specific business functions.

    The single most important capability of AI agents is their ability to work with both structured and unstructured information. You can communicate a plan to an agent by typing a message, speaking, or providing a document—just as you would with a human colleague. This fundamental shift from clicking buttons to holding conversations unlocks three profound benefits:

    • Bridging the Strategy-to-Execution Gap: AI agents can understand the nuance of an unstructured plan—the “why” behind the “what”—and help execute it without critical information getting lost in translation.
    • Handling All Information Seamlessly: They can process natural language from documents, presentations, or conversations and transform it into the actionable, structured data that existing digital tools need to function.
    • Reducing Change Management: Because agents understand human language, the need for extensive training on rigid software interfaces is significantly reduced. People can work more naturally, supervising the agents as they handle the tedious, structured tasks.

    To see how this works in practice, let’s walk through how a team of AI agents can help plan and execute a marketing campaign from start to finish.

    AI Agents in Action: Launching a Marketing Campaign

    This step-by-step walkthrough shows how AI agents can take a high-level marketing plan from a simple idea to a fully executed campaign, seamlessly connecting unstructured strategy with structured execution.

    1. The Starting Point: The Marketing Brief – The process begins when a brand manager provides a marketing brief. This brief is pure unstructured information—it could be a presentation, a document, or even the transcript of a planning conversation. It contains the high-level goals and vision for the campaign.
    2. Deconstructing the Brief: The Brand Manager Agent – A specialized “Brand Manager” agent analyzes the unstructured brief and extracts the core business context elements. It identifies key information such as:
      • Business objectives
      • Target audience definitions
      • Key messages
      • Brands in focus
      • Timelines and milestones
       The agent then organizes this information into structured, machine-readable “context blocks,” creating a clear, logical foundation that other systems and agents can use.
    3. Understanding the Customer: The Digital Sales Agent – Next, a “Digital Sales” agent contributes by performing customer profiling. It can take unstructured, natural language descriptions of customers (for instance, from a sales rep’s recorded notes) and map them to formal, structured customer segments and personas. This builds a richer, more accurate customer profile than a simple survey could provide.
    4. Creating the Content: The Content Writer Agent – Using the structured business context from the Brand Manager agent, a “Content Writer” agent assembles personalized content. It can reuse and repurpose existing content from a library of approved modules, accelerating content creation while ensuring brand compliance.
    5. Executing the Plan: The Next Best Action (NBA) Engine – Finally, the system brings everything together to recommend the “Next Best Action.” This engine synthesizes the campaign’s business context, the customer’s profile, the available content, and their recent engagement history to suggest the perfect next step for each customer. It recommends precisely what content to send and which channel to use, turning high-level strategy into a concrete, personalized action.
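
    To make the walkthrough concrete, here is a minimal sketch of a context block and a rule-based next-best-action pick. Every field name, the sample channels, and the selection rule are assumptions for illustration; a real engine would presumably weigh far more signals than this.

```python
from dataclasses import dataclass


@dataclass
class ContextBlock:
    """Structured output of the brief analysis (hypothetical fields)."""
    objective: str
    audience: str
    key_messages: list[str]
    brands: list[str]


@dataclass
class ContentModule:
    """One approved piece of content from the library (hypothetical fields)."""
    module_id: str
    message: str
    channel: str


def next_best_action(context, segment, modules, recent_channels):
    """Pick approved content for the campaign, preferring a channel not just used."""
    if segment != context.audience:
        return None  # customer is outside the campaign's target audience
    eligible = [m for m in modules if m.message in context.key_messages]
    if not eligible:
        return None
    # False sorts before True, so channels not used recently come first.
    eligible.sort(key=lambda m: (m.channel in recent_channels, m.module_id))
    return eligible[0]
```

    The unstructured brief never reaches this function. Only the structured context block does, which is the hand-off the steps above describe.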

    This orchestrated workflow makes the entire process smoother, faster, and far more intelligent. It creates a virtuous cycle, where the system learns from every interaction to continuously improve the overall strategy and execution over time.

    The Future of Work is Collaborative

    The rise of AI agents marks a fundamental shift in how we work with technology. We are moving from a world where humans must adapt to operate digital tools to one where humans supervise intelligent AI agents that use those tools on our behalf.

    This new wave of AI transformation is not about replacing people, but about augmenting the human workforce without adding headcount. By handling the translation between unstructured human ideas and structured digital processes, AI agents help businesses reduce friction, cut down on turnaround times, and finally bridge the long-standing gap between their biggest strategies and their real-world execution.

  • The Economic Reality and the Optimistic Future of Agentic Coding

    After a couple of months deep in the trenches of vibe coding with AI agents, I’ve learned this much: scaling from a fun, magical PoC to an enterprise-grade MVP is a completely different game.

    Why Scaling Remains Hard—And Costly

    Getting a prototype out the door? No problem.

    But taking it to something robust, secure, and maintainable? Here’s where today’s AI tools reveal their limits:

    • Maintenance becomes a slog. Once you start patching AI-generated code, hidden dependencies and context loss pile up. Keeping everything working as requirements change feels like chasing gremlins through a maze.
    • Context loss multiplies with scale. As your codebase grows, so do the risks of agents forgetting crucial design choices or breaking things when asked to “improve” features.

    And then there’s the other elephant in the room: costs.

    • The cost scaling isn’t marginal—not like the old days of cloud or Web 2.0. Powerful models chew through tokens and API credits at a rate that surprises even seasoned devs.
    • That $20/month Cursor plan with unlimited auto mode? For hobby projects, it’s a steal. For real business needs, where a single query can rack up millions of tokens, I can see usage quickly outgrowing even the $200 Ultra plan.
    • This is why we’re seeing big tech layoffs and restructuring: AI-driven productivity gains aren’t evenly distributed, and the cost curve for the biggest players keeps climbing.
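
    The arithmetic behind that cost curve is worth running once. This is a back-of-the-envelope sketch: the price per million tokens and the usage figures in the example are placeholders I picked, not quoted rates, and flat-rate plans don't map one-to-one onto API pricing.

```python
def monthly_cost(queries_per_day, tokens_per_query, usd_per_million_tokens, days=30):
    """Estimate monthly API spend in USD from volume and a per-token price."""
    total_tokens = queries_per_day * tokens_per_query * days
    return total_tokens / 1_000_000 * usd_per_million_tokens


def days_until_budget_exhausted(budget_usd, queries_per_day, tokens_per_query, usd_per_million_tokens):
    """How many days a fixed budget lasts at a constant daily usage."""
    daily_cost = queries_per_day * tokens_per_query / 1_000_000 * usd_per_million_tokens
    return budget_usd / daily_cost


if __name__ == "__main__":
    # Placeholder inputs: 50 queries a day, 2M tokens each, $3 per million tokens.
    print(monthly_cost(50, 2_000_000, 3.0))
    print(days_until_budget_exhausted(200, 50, 2_000_000, 3.0))
```

    With those placeholder inputs, a $200 budget lasts less than a day. The exact numbers will differ, but the cost scales linearly with tokens, and agent workloads burn tokens quickly.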

    What the Data Tells Us

    The research paper I cited in the previous post, Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity, had a surprising conclusion:

    Not only did experienced developers see no time savings on real-world coding tasks with AI, but costs increased as they spent more time reviewing, correcting, and adapting agent output.

    The lesson:

    AI shifts where the work happens—it doesn’t always reduce it. For now, scaling with agents is only as good as your processes for context, review, and cost control.

    Why I Remain Optimistic

    Despite the challenges, I’m genuinely excited for what’s coming next.

    • The platforms and models are evolving at warp speed. Many of the headaches I face today—context loss, doc gaps, cost blind spots—will get solved just as software engineering best practices eventually became codified in our tools and frameworks.
    • Agentic coding will find its place. It might not fully automate developer roles, but it will reshape teams: more focus on high-leverage decisions, design, and creative problem-solving, less on boilerplate and “busy work.”

    And if you care about the craft, the opportunity is real:

    • Devs who learn to manage, review, and direct agents will be in demand.
    • Organizations that figure out how to blend agentic workflows with human expertise and robust process will win big.

    Open Questions for the Future

    • Will AI agentic coding mean smaller, nimbler teams—or simply more ambitious projects for the same headcount?
    • How will the developer role evolve when so much code is “synthesized,” not hand-crafted?
    • What new best practices, cost controls, and team rituals will we invent as agentic coding matures?

    Final thought:

    The future won’t be a return to “pure code” or a total AI handoff. It’ll be a blend—one that rewards curiosity, resilience, and the willingness to keep learning.

    Where do you see your work—and your team—in this new landscape?

  • The Law of Leaky Abstractions & the Unexpected Slowdown

    If the first rush of agentic/vibe coding feels like having a team of superhuman developers, the second phase is a reality check—one that every software builder and AI enthusiast needs to understand.

    Why “Vibe Coding” Alone Can’t Scale

    The further I got into building real-world prototypes with AI agents, the clearer it became: Joel Spolsky’s law of leaky abstractions is alive and well.

    You can’t just vibe code your way to a robust app—because underneath the magic, the cracks start to show fast. AI-generated coding is an abstraction, and like all abstractions, it leaks. When it leaks, you need to know what’s really happening underneath.

    My Experience: Hallucinations, Context Loss, and Broken Promises

    I lost count of the times an agent “forgot” what I was trying to do, changed underlying logic mid-stream, or hallucinated code that simply didn’t run. Sometimes it wrote beautiful test suites and then… broke the underlying logic with a “fix” I never asked for. It was like having a junior developer who could code at blazing speed—but with almost no institutional memory or sense for what mattered.

    The “context elephant” is real. As sessions get longer, agents lose track of goals and start generating output that’s more confusing than helpful. That’s why my own best practices quickly became non-negotiable:

    • Frequent commits and clear commit messages
    • Dev context files to anchor each session
    • Separate dev/QA/prod environments to avoid catastrophic rollbacks (especially with database changes)

    What the Research Shows: AI Can Actually Slow Down Experienced Devs

    Here’s the kicker—my frustration isn’t unique.

    A recent research paper, Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity, found that experienced developers actually worked slower with AI on real-world tasks. That’s right—AI tools didn’t just fail to deliver the expected productivity boost, they created friction.

    Why?

    • Only about 44% of AI-generated code was accepted
    • Developers lost time reviewing, debugging, and correcting “bad” generations
    • Context loss and reliability issues forced more manual intervention, not less

    This matches my experience exactly. For all the hype, these tools introduce new bottlenecks—especially if you’re expecting them to “just work” out of the box.

    Lessons from the Frontlines (and from Agent Week)

    I’m not alone. In the article “What I Learned Trying Seven Coding Agents,” Timothy B. Lee finds similar headaches:

    • Agents get stuck
    • Complex tasks routinely stump even the best models
    • Human-in-the-loop review isn’t going anywhere

    But the tools are still useful—they’re not a dead end. You just need to treat them like a constantly rotating team of interns, not fully autonomous engineers.

    Best Practices: How to Keep AI Agents Under Control

    So how do you avoid the worst pitfalls?

    The answer is surprisingly old-school:

    • Human supervision for every critical change
    • Sandboxing and least privilege for agent actions
    • Version control and regular context refreshers
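
    Least privilege can start as small as a path check in front of every file write an agent requests. This is a minimal sketch, assuming the agent's writes go through a wrapper you control; the function name and the default protected entries are my own choices.

```python
from pathlib import Path


def is_write_allowed(target: str, workspace: str,
                     protected: tuple[str, ...] = (".git", ".env")) -> bool:
    """Allow writes only inside the workspace and outside protected entries."""
    root = Path(workspace).resolve()
    path = (root / target).resolve()
    if not path.is_relative_to(root):
        return False  # escapes the workspace, e.g. via ".." or an absolute path
    relative = path.relative_to(root)
    # Reject anything at or under a protected file or directory name.
    return not any(part in protected for part in relative.parts)
```

    It needs Python 3.9 or later for is_relative_to. A check like this complements an OS-level sandbox; it does not replace one.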

    Again, Lee’s article “Keeping AI agents under control doesn’t seem very hard” nails it:

    Classic engineering controls—proven in decades of team-based software—work just as well for AI. “Doomer” fears are overblown, but so is the hype about autonomy.

    Conclusion: The Hidden Cost of Abstraction

    Vibe coding with agents is like riding a rocket with no seatbelt—exhilarating, but you’ll need to learn to steer, brake, and fix things mid-flight.

    If you ignore the leaky abstractions, you’ll pay the price in lost time, broken prototypes, and hidden tech debt.

    But with the right mix of skepticism and software discipline, you can harness the magic and avoid the mess.

    In my next post, I’ll zoom out to the economics—where cost, scaling, and the future of developer work come into play.

    To be continued…

  • The Thrill and the Illusion of AI Agentic Coding

    A few months ago, I stumbled into what felt like a superpower: building fully functional enterprise prototypes using nothing but vibe coding and AI agent tools like Cursor and Claude. The pace was intoxicating—I could spin up a PoC in days instead of weeks, crank out documentation and test suites, and automate all the boring stuff I used to dread.

    But here’s the secret I discovered: working with these AI agents isn’t like managing a team of brilliant, reliable developers. It’s more like leading a software team with a sky-high attrition rate and non-existent knowledge transfer practices. Imagine onboarding a fresh dev every couple of hours, only to have them forget what happened yesterday and misinterpret your requirements—over and over again. That’s vibe coding with agents.

    The Early Magic

    When it works, it really works. I’ve built multiple PoCs this way—each one a small experiment, delivered at a speed I never thought possible. The agents are fantastic for “greenfield” tasks: setting up skeleton apps, generating sample datasets, and creating exhaustive test suites with a few prompts. They can even whip up pages of API docs and help document internal workflows with impressive speed.

    It’s not just me. Thomas Ptacek’s piece “My AI Skeptic Friends Are All Nuts” hits the nail on the head: AI is raising the floor for software development. The boring, repetitive coding work (the scaffolding, the CRUD operations, the endless boilerplate) gets handled in minutes, letting me focus on the interesting edge cases or higher-level product thinking. As he puts it, “AI is a game-changer for the drudge work,” and I’ve found this to be 100% true.

    The Fragility Behind the Hype

    But here’s where the illusion comes in. Even with this boost, the experience is a long way from plug-and-play engineering. These AI coding agents don’t retain context well; they can hallucinate requirements, generate code that fails silently, or simply ignore crucial business logic because the conversation moved too fast. The “high-attrition, low-knowledge-transfer team” analogy isn’t just a joke—it’s my daily reality. I’m often forced to stop and rebuild context from scratch, re-explain core concepts, and review every change with a skeptical eye.

    Version control quickly became my lifeline. Frequent commits, detailed commit messages, and an obsessive approach to saving state are my insurance policy against the chaos that sometimes erupts. The magic is real, but it’s brittle: a PoC can go from “looks good” to “completely broken” in a couple of prompts if you’re not careful.
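    The habit of saving state can be scripted so it happens before every risky prompt. Below is a minimal Python sketch, assuming git is installed and the project is already a repository. The `checkpoint` helper, the `git` wrapper, and the "checkpoint:" message prefix are names I made up for illustration; only the git subcommands themselves (`status --porcelain`, `add --all`, `commit -m`) are standard.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# A runner takes git arguments plus a working directory and returns stdout.
Runner = Callable[[Sequence[str], Path], str]


def git(args: Sequence[str], cwd: Path) -> str:
    """Run one git command in cwd and return its stdout; raises on failure."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


def checkpoint(repo: Path, note: str, run: Runner = git) -> bool:
    """Commit every pending change as a checkpoint.

    Returns False (and commits nothing) when the working tree is clean.
    """
    # `git status --porcelain` prints one line per changed or untracked file.
    if not run(["status", "--porcelain"], repo).strip():
        return False
    run(["add", "--all"], repo)
    run(["commit", "-m", f"checkpoint: {note}"], repo)
    return True
```

    Calling `checkpoint(Path("."), "before asking the agent to refactor auth")` ahead of each prompt leaves a commit to return to if the next generation breaks the prototype. The `run` parameter exists only so the helper can be exercised without touching a real repository.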

    Superpowers—With Limits

    If you’re a founder, product manager, or even an experienced developer, these tools can absolutely supercharge your output. But don’t believe the hype about “no-code” or “auto-code” replacing foundational knowledge. If you don’t understand software basics—version control, debugging, the structure of a modern web app—you’ll quickly hit walls that feel like magic turning to madness.

    Still, I’m optimistic. The productivity gains are real, and the thrill of seeing a new prototype come to life in a weekend is hard to beat. But the more I use these tools, the more I appreciate the fundamentals that have always mattered in software. In the next post, I’ll talk about the unavoidable reality check that comes when abstractions leak and AI doesn’t quite deliver on its promise.

    To be continued…