
From git worktrees to autonomous agents: a senior engineer's upgrade

Worktrees + tmux + Claude Code is a working setup for parallel agents. It's also the ceiling of what you can do without real isolation. Here's what changes when you move to containerized autonomous runs.

11 min read

The setup is familiar. You have a worktree per feature, a tmux session per worktree, a Claude Code or Codex instance running in each. You alt-tab between them. You hit accept a lot. You write a script that creates the worktree, opens the tmux pane, and launches the agent in one command.

This works. It's also as far as this approach goes. The reasons it stops scaling are the same reasons it was always fragile; they just become unavoidable once you push it to four or five concurrent agents.

This article is about what's actually breaking, and what changes when you move the agent into an isolated runtime instead of an isolated checkout.

Why worktrees work — and where they end

A worktree gives you N independent checkouts of the same repo, each on its own branch, sharing one .git directory. It's a clean answer to the obvious problem with parallel agents: two agents shouldn't be writing to the same source tree.

What worktrees don't isolate, in the order they bite:

  • Services. Postgres, Redis, your message queue. One database, several agents writing to it. One of them runs a migration. Another runs tests against the half-migrated schema. The test passes for reasons unrelated to the code under test.
  • Ports. One dev server on 3000. Five agents each want to pnpm dev. The second one fails. The agent treats it as a bug and tries to "fix" your dev server config.
  • The filesystem outside the worktree. Shared package-manager caches (~/.cache/pip, ~/.npm, the pnpm store) and /tmp. Two agents installing dependencies at the same time can race on the global cache and produce subtly broken installs.
  • Host credentials. Every agent has your real GitHub token, your AWS profile, your .env files. One bad rm, one over-eager push, one credential leak in a log line shipped to a third-party service — your problem, not the agent's.
  • Your attention. Five tmux panes is what you can hold in your head. Eight is not. The bottleneck stops being the agent and starts being the screen real estate.

You can paper over each of these. Run a Postgres-per-worktree on a port-per-worktree. Set a random PORT for the dev server, e.g. PORT=$((3000 + RANDOM % 1000)) in bash. Give each worktree its own package cache and its own env files. Now your "simple" worktree script is 200 lines of bash that nobody else on the team can read.
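
To make the port workaround concrete, here is a minimal sketch of per-worktree port allocation. It asks the OS for unused ports instead of guessing. The function names and the variable set (PORT, PGPORT, WORKTREE) are illustrative assumptions, not part of any existing script.

```python
import socket


def free_ports(count: int, host: str = "127.0.0.1") -> list[int]:
    """Reserve `count` distinct free TCP ports by binding to port 0.

    All sockets stay open until every port is known, so the OS cannot
    hand out the same port twice within one call.
    """
    sockets = []
    try:
        for _ in range(count):
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sockets.append(sock)
            sock.bind((host, 0))  # port 0 means "pick any unused port"
        return [sock.getsockname()[1] for sock in sockets]
    finally:
        for sock in sockets:
            sock.close()


def worktree_env(name: str) -> dict[str, str]:
    """Per-worktree environment; the variable set is illustrative."""
    app_port, pg_port = free_ports(2)
    return {
        "WORKTREE": name,
        "PORT": str(app_port),   # dev server
        "PGPORT": str(pg_port),  # per-worktree Postgres
    }
```

Note the gap this leaves: the ports are released before the dev server binds them, so another agent can grab one in between. That race is exactly the kind of thing the bash script keeps accumulating fixes for.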

The actual upgrade: container per run, not checkout per run

The shift is from "isolated working copy on a shared host" to "isolated runtime, full stop."

Each parallel agent runs inside its own container with:

  • Its own clone of the repo on its own branch
  • Its own Postgres, its own Redis, its own whatever — running in sidecar containers on the run's private network
  • Its own ports, its own hostnames, its own filesystem caches
  • A scoped GitHub token, not your personal one
  • No view of your ~/.ssh, your .env files, your other projects, or your host's network

That last point is the security upgrade you also get for free. In the worktree model, every agent runs as your user, with your credentials, in a directory next to the rest of your work. In the container model, the worst-case blast radius of a bad agent action is the container itself, plus whatever its scoped token can reach.
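
As a rough sketch of what "isolated runtime" means at the command level, the snippet below builds (but does not execute) the docker commands for one run. The docker flags are standard; the image name, the run-<id> naming scheme, and the GITHUB_TOKEN and RUN_BRANCH variable names are assumptions for illustration, not how any particular orchestrator does it.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSpec:
    run_id: str
    image: str   # hypothetical dev image, e.g. "myapp-dev:latest"
    branch: str
    memory: str = "4g"


def network_create_argv(spec: RunSpec) -> list[str]:
    """Command that creates the run's private network."""
    return ["docker", "network", "create", f"run-{spec.run_id}"]


def agent_run_argv(spec: RunSpec) -> list[str]:
    """Command that starts the agent container for one run.

    Deliberately absent: --volume mounts of the host home directory and
    --network host. The container sees only what is listed here.
    """
    return [
        "docker", "run", "--rm", "--detach",
        "--name", f"agent-{spec.run_id}",
        "--network", f"run-{spec.run_id}",
        "--memory", spec.memory,
        # Name only, no value: docker copies it from the launching
        # process's environment, so the token stays out of the argv.
        "--env", "GITHUB_TOKEN",
        "--env", f"RUN_BRANCH={spec.branch}",
        spec.image,
    ]


def launch_env(scoped_token: str) -> dict[str, str]:
    """Environment for the docker CLI process that launches the run."""
    return {**os.environ, "GITHUB_TOKEN": scoped_token}
```

The interesting part is what's missing. Nothing from ~/.ssh, no .env files, no host network: the agent gets a branch name, a scoped token, and a private network, and that is all.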

What this looks like for the dev environment

The pattern that scales: a dev image that bundles your app's runtime plus its services. When a run starts, the orchestrator launches a container from that image. The container starts Postgres, runs migrations, starts the app, then starts the agent. When the run ends, everything is thrown away. The next run starts from a clean slate.
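
The lifecycle above is easy to state and easy to get wrong, so here is a minimal sketch of its shape: ordered startup steps, and a teardown that runs no matter what. The step names mirror the paragraph; the callables are stand-ins, and nothing here is an actual orchestrator API.

```python
from typing import Callable

Step = tuple[str, Callable[[], None]]


def execute_run(steps: list[Step], teardown: Callable[[], None]) -> list[str]:
    """Run startup steps in order; always tear down, even on failure."""
    completed: list[str] = []
    try:
        for name, action in steps:
            action()
            completed.append(name)
        return completed
    finally:
        teardown()  # the container is thrown away either way


# Stand-in actions that only record what happened.
log: list[str] = []
finished = execute_run(
    [
        ("start postgres", lambda: log.append("postgres up")),
        ("run migrations", lambda: log.append("schema migrated")),
        ("start app", lambda: log.append("app up")),
        ("start agent", lambda: log.append("agent running")),
    ],
    teardown=lambda: log.append("container removed"),
)
```

If a migration throws, the exception still propagates, but only after teardown has run. That ordering is what makes "the next run starts from a clean slate" true rather than aspirational.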

Trimo expresses this through a services.json sidecar definition: declare your databases, caches, queues, and search indexes as named sidecars; they come up automatically on every run, on a private internal network, with a DNS alias matching the service name. There's no shared state between runs. There's no cleanup script.

(The same model works for headless browsers and build tooling — anything that needs to be a long-lived sidecar rather than a step in your test script.)
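
This article doesn't reproduce the services.json schema, so treat the following as a guess at its shape rather than the real format. It uses the four fields the article names (name, image, memory, healthcheck); the list-of-objects layout, the memory strings, and the validation rules are assumptions. The one check worth keeping in any version: a service name doubles as a DNS alias, so it should be a valid DNS label.

```python
import json
import re

# Field names come from the article; the overall layout is assumed.
REQUIRED_FIELDS = ("name", "image", "memory", "healthcheck")

# Lowercase letters, digits, and hyphens; at most 63 characters.
DNS_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")

SERVICES = [
    {"name": "postgres", "image": "postgres:16", "memory": "512m",
     "healthcheck": "pg_isready -U postgres"},
    {"name": "redis", "image": "redis:7", "memory": "128m",
     "healthcheck": "redis-cli ping"},
]


def validate_services(services: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means it looks sane."""
    problems = []
    seen = set()
    for index, service in enumerate(services):
        for field in REQUIRED_FIELDS:
            if not service.get(field):
                problems.append(f"service #{index}: missing '{field}'")
        name = service.get("name", "")
        if name and not DNS_LABEL.match(name):
            problems.append(f"service #{index}: '{name}' is not a valid DNS label")
        if name and name in seen:
            problems.append(f"service #{index}: duplicate name '{name}'")
        seen.add(name)
    return problems


def render(services: list[dict]) -> str:
    """Serialize the definition the way it might sit in a JSON file."""
    return json.dumps(services, indent=2)
```

With a definition like this, the app inside the run would reach its database at the hostname postgres and its cache at redis, with no port juggling.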

What you actually have to change in your workflow

If you're already running parallel agents in worktrees, the conceptual model survives intact. The mechanics shift.

Dispatch: from "spawn pane" to "create run"

Old: ./new-agent.sh feature-x "implement X" creates a worktree, opens a tmux pane, launches the agent.

New: a CLI call (or an API call, or a click in a dashboard) creates a pipeline, which spins up a container, clones the repo, creates the branch, and launches the agent. You're not opening anything on your host. There's nothing to alt-tab to.

Visibility: from tmux panes to a dashboard

Old: you scan five terminal windows for the one that's stalled on a permission prompt.

New: a single view shows every pipeline's status — running, idle, needs attention, complete — with the latest tool call and the latest commit. You glance at it when you're between things. When something finishes, you open a terminal in the container and verify the result.

The point isn't that the dashboard is prettier. It's that the unit of attention is "all my agents," not "one agent." You stop optimizing your terminal layout and start optimizing your review queue.
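
One way to picture "optimizing your review queue": treat each pipeline as a record and sort by who needs you first. The four statuses come from the paragraph above; the priority order and the field names are assumptions, and this is a sketch of the idea, not a dashboard implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Lower value = look at it sooner (an assumed priority order).
    NEEDS_ATTENTION = 0
    COMPLETE = 1
    IDLE = 2
    RUNNING = 3


@dataclass(frozen=True)
class Pipeline:
    name: str
    status: Status
    latest_tool_call: str
    latest_commit: str


def review_queue(pipelines: list[Pipeline]) -> list[Pipeline]:
    """Order pipelines so blocked and finished work comes first."""
    return sorted(pipelines, key=lambda p: (p.status.value, p.name))


pipelines = [
    Pipeline("billing-refactor", Status.RUNNING, "pnpm test", "a1b2c3d"),
    Pipeline("auth-fix", Status.NEEDS_ATTENTION, "git push", "d4e5f6a"),
    Pipeline("search-index", Status.COMPLETE, "pnpm build", "0f9e8d7"),
]
queue = [p.name for p in review_queue(pipelines)]
```

Running agents sort last on purpose. They don't need you yet.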

Iteration: from re-prompting in a chat to follow-up runs

Old: agent finishes a task half-done; you type a correction in the same chat; agent picks up the context and tries again.

New: agent finishes a task half-done; you create a follow-up run on the same pipeline. The branch is preserved, the prior commits are there, the context files are there. The next run picks up with full awareness of what happened in the previous one. Nothing is in your scrollback — it's all in the pipeline's history.

This is better in two ways: the iteration record is durable (you can come back in two weeks and trace what happened), and the conversation is shorter (each run starts with structured context rather than a long chat transcript).
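
A sketch of what "structured context rather than a long chat transcript" could look like as data. The class and field names are hypothetical and this is not Trimo's data model; it only shows why a follow-up run can start from a few lines instead of a scrollback.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    prompt: str
    commits: list[str] = field(default_factory=list)
    outcome: str = "in progress"


@dataclass
class PipelineHistory:
    branch: str
    runs: list[Run] = field(default_factory=list)

    def start_run(self, prompt: str) -> Run:
        """Record a new run on the same branch and return it."""
        run = Run(prompt=prompt)
        self.runs.append(run)
        return run

    def context_for_next_run(self) -> str:
        """Summarize every recorded run as one line each."""
        lines = [f"branch: {self.branch}"]
        for number, run in enumerate(self.runs, start=1):
            commits = ", ".join(run.commits) or "none"
            lines.append(
                f"run {number}: {run.prompt} [{run.outcome}; commits: {commits}]"
            )
        return "\n".join(lines)


history = PipelineHistory(branch="feature-x")
first = history.start_run("implement X")
first.commits.append("a1b2c3d")
first.outcome = "half done"
context = history.context_for_next_run()
```

The same record is what you read two weeks later when you want to know why the branch looks the way it does.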

Permission prompts: gone

The reason most worktree setups still have permission prompts is that the agent is running on your host. Even with auto-accept lists, every novel operation blocks. Multiply by five panes; see "Why coding agents freeze on permission prompts" for the math.

In a container, every operation is contained by default. The agent doesn't ask. It acts. If it does something wrong, the container is disposable. If it tries to do something dangerous, Trimo blocks it. Agents can't mess up your repo. Safety stops being a per-prompt human review and starts being a property of the runtime.

What you give up

Worth being honest about the tradeoffs.

There's a first-time setup

A basic worktree script is about 50 lines. A container-based workflow needs a dev image and a services definition, but with Trimo that's smaller than it sounds. The base image already bundles the agent runtime with safe git handling built in; you add your project's dependencies on top of it. Services go into a single services.json schema (name, image, memory, healthcheck) and come up automatically on every run. For a straightforward app this is an afternoon of setup at most. The payoff starts on the first parallel run and compounds from there: every subsequent run, every new hire, every CI job gets the same environment.

Inspection happens inside the container

Worktree workflow: you can cd into the worktree and run psql against the database the agent just modified. Container workflow: the database is inside the container's network.

Trimo's QA mode handles this. When a run finishes, you open a terminal session inside the container — start a dev server, query the database, run tests, test endpoints in your browser. The full dev environment is still running. You're verifying the working product, not just reading code.

Your laptop is now the bottleneck for run count

Each container with its own Postgres uses real RAM. A 32 GB MacBook will comfortably run 6–8 concurrent dev environments; a 16 GB machine sits at 3–4. Cloud sandboxes don't have this limit, but they have different costs: your code leaves your machine, and you pay for compute on top of LLM tokens.
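
The arithmetic behind those numbers is simple enough to sketch. The two defaults below (4 GB held back for the host, 4 GB per dev environment) are assumptions picked to land inside the ranges quoted above; measure your own stack before trusting them.

```python
def max_concurrent_runs(
    total_gb: float,
    host_reserved_gb: float = 4.0,  # assumed: OS, editor, browser
    per_run_gb: float = 4.0,        # assumed: app + Postgres + agent
) -> int:
    """Estimate how many dev environments fit in RAM at once."""
    usable = total_gb - host_reserved_gb
    if usable <= 0 or per_run_gb <= 0:
        return 0
    return int(usable // per_run_gb)


laptop_32 = max_concurrent_runs(32)  # (32 - 4) // 4 = 7
laptop_16 = max_concurrent_runs(16)  # (16 - 4) // 4 = 3
```

A heavier stack moves the numbers fast: at 6 GB per run, the same 32 GB machine fits four.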

For most senior engineers running review-heavy workflows, the limit is how fast you can review, long before it's your hardware.

A migration path that doesn't require rebuilding everything

Don't try to move every workflow at once. The pattern that works:

  1. Pick one project. The one where you most often run parallel agents. The one where service collisions are biting you most.
  2. Define its dev environment as a container image. Start from a base image that includes your language runtime and the agent. Add the project's dependencies. Define the supporting services as sidecars.
  3. Run one container by hand. Verify the agent can clone, install, run tests, and push a branch. Match it to what your worktree setup does today.
  4. Move dispatch into a tool. CLI, dashboard, whatever — anything that beats "open a new tmux pane." This is the moment the workflow actually changes for you.
  5. Tear down the worktree script. Once parallel runs in containers are reliable, the bash script becomes a liability you don't need.
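
Step 3 is worth scripting even when you run it by hand. The sketch below runs a list of checks in order and reports the first one that fails. The runner is generic; the specific commands in CONTAINER_CHECKS (the repo URL placeholder, the pnpm invocations, the dry-run push) are assumptions about a typical Node project and should be swapped for whatever your worktree setup does today.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Check:
    name: str
    argv: list[str]
    cwd: Optional[str] = None


def first_failure(checks: list[Check]) -> Optional[str]:
    """Run checks in order; return the first failing name, else None."""
    for check in checks:
        try:
            result = subprocess.run(check.argv, cwd=check.cwd, capture_output=True)
        except OSError:
            return check.name  # command missing or cwd does not exist
        if result.returncode != 0:
            return check.name
    return None


# What you might run inside the container (placeholders, not executed here).
REPO_URL = "<your-repo-url>"
CONTAINER_CHECKS = [
    Check("clone", ["git", "clone", REPO_URL, "app"]),
    Check("install", ["pnpm", "install", "--frozen-lockfile"], cwd="app"),
    Check("test", ["pnpm", "test"], cwd="app"),
    Check("push", ["git", "push", "--dry-run", "origin", "HEAD"], cwd="app"),
]

# Local dry run of the runner itself, using the current interpreter.
DEMO = [
    Check("passes", [sys.executable, "-c", "pass"]),
    Check("fails", [sys.executable, "-c", "raise SystemExit(1)"]),
    Check("never reached", [sys.executable, "-c", "pass"]),
]
```

Stopping at the first failure is deliberate: if the clone fails, the install result tells you nothing.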

The whole migration is a half day if your dev environment is straightforward, a couple of days if it isn't. After that, the daily workflow is meaningfully better — and so is everyone else's, because the container is reproducible.

The shape of the work after the upgrade

The most visible change isn't the tooling. It's where your time goes.

Before: dispatching takes seconds, supervising takes most of the day. You're attached to the terminals.

After: dispatching takes a couple of minutes (writing a good prompt is the work). Reviewing takes most of the day. You're verifying working products and writing follow-ups, not babysitting terminals.

The actual senior-engineer skills — knowing what good looks like, knowing where the codebase is fragile, knowing which corners not to cut — get more leverage, not less. You're applying them across five branches per day instead of one.

