Sub-agents — Transformer Math

Module 48 · AI Engineering

🤖 Sub-agents

Each sub-agent gets a fresh 200K context window — the parent keeps working

When a task is too complex for one agent, it spawns sub-agents — fresh instances with clean context that work on subtasks independently. This is how Claude Code handles parallel file exploration, background research, and isolated code changes without polluting the main conversation.

Each sub-agent gets a fresh QueryEngine with empty message history
Sub-agents inherit all tools by default — harnesses typically restrict the Agent tool to prevent recursive fork bombs
Worktree isolation prevents file conflicts between parallel agents
Results return as a single text summary, not the full conversation

🎮

Sub-Agent Lifecycle

What you are seeing

The complete lifecycle of a sub-agent: the parent spawns it with a task description, the sub-agent works independently with its own tools and context, and returns a summary when done.

What to try

Compare foreground (blocking) vs background (non-blocking) execution. Notice how the parent's context stays clean regardless of how many tool calls the sub-agent makes.

// Parent spawns sub-agent

Parent context: 45K tokens (100+ messages)

→ spawn_sub_agent("Find all TODO comments")

// Sub-agent starts fresh

Sub-agent context: 0 tokens (empty history)

Tools: [Grep, Glob, Read] (no Agent, no Bash)

1. Grep "TODO" ./src → 23 matches

2. Read 5 files for context

3. Summarize findings

// Result back to parent

→ "Found 23 TODOs across 12 files. Critical: ..."

Parent context: 45K + ~200 tokens (just the summary)

💡

The Intuition

What you’re seeing: a parent agent fanning out to 4 sub-agents, each with its own isolated context window; only the summary returns up. What to try: follow why the recursion guard prevents sub-agents from spawning more sub-agents.

Why Fresh Context?

The parent has 80K tokens of history about login bugs. The sub-agent needs to search for CSS files — that history is noise. Fresh QueryEngine = full context window for the actual task. Inheriting the parent's context would waste tokens and risk the sub-agent losing focus or hitting the context limit before finishing.

Context Isolation

The parent agent might have 100K+ tokens of conversation history — file contents, tool results, reasoning. If a sub-agent inherited all of that, it would waste context on irrelevant information and risk hitting the context limit before completing its task. Instead, each sub-agent starts with messages=[] — a clean slate. It gets only the task description, typically 50-200 tokens.

Tool Subset

Sub-agents inherit all tools by default, but harnesses typically apply an allowlist or denylist. The most common restriction is removing the Agent tool so that sub-agents cannot recursively spawn more sub-agents. Unchecked recursion of that kind acts as a fork bomb that consumes all available resources.

💡 Tip · Different sub-agent types get different tool sets. An Explore agent gets only read tools (Grep, Glob, Read) for fast search. A Plan agent is typically read-only — it explores and designs but does not modify files. A general-purpose agent gets everything except Agent.

Foreground vs Background Execution

Foreground: the parent waits for the sub-agent to finish. Simple, the result is immediately available. Background: the parent continues working while the sub-agent runs asynchronously. Higher throughput but the parent must handle the result arriving later — and the sub-agent's changes may conflict with the parent's concurrent edits.
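The two modes can be sketched as follows. This is a minimal illustration, not any harness's actual API: `SubAgentRun`, `runForeground`, `runBackground`, and `BackgroundHandle` are hypothetical names, and the sub-agent itself is replaced by a fake function that resolves with a summary.

```typescript
// Hypothetical stand-in for a real sub-agent run: resolves with a summary string.
type SubAgentRun = (task: string) => Promise<string>;

// Foreground: the parent blocks until the summary is available.
async function runForeground(run: SubAgentRun, task: string): Promise<string> {
  return await run(task);
}

// Background: the parent gets a handle immediately and collects the result later.
interface BackgroundHandle {
  task: string;
  result: Promise<string>;
  isDone: () => boolean;
}

function runBackground(run: SubAgentRun, task: string): BackgroundHandle {
  let done = false;
  const result = run(task)
    // A failure becomes an error string so the parent can decide what to do next.
    .catch((err: unknown) => `Sub-agent failed: ${String(err)}`)
    .then((text) => {
      done = true;
      return text;
    });
  return { task, result, isDone: () => done };
}

// Usage with a fake sub-agent (an assumption made for the demo).
async function demo(): Promise<string[]> {
  const fake: SubAgentRun = async (task) => `summary of: ${task}`;
  const fg = await runForeground(fake, "find TODOs"); // parent waits here
  const bg = runBackground(fake, "audit CSS");        // parent continues immediately
  // ... the parent can keep working here while the background run proceeds ...
  return [fg, await bg.result];
}
```

The background handle keeps the promise rather than discarding it, so the parent can poll `isDone()` or await `result` once it is ready to fold the summary into its own context.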

Worktree Isolation

Worktree isolation is an optional mechanism for filesystem safety. Setting isolation: 'worktree' creates a git worktree, a separate checkout of the same repo at a different path. The sub-agent edits its worktree without affecting the parent's working directory. When done, changes are merged back. Many sub-agents run without this; it is opt-in for cases where parallel edits would conflict.
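As a rough sketch of what the harness has to arrange, the function below builds the `git` argument lists for creating, merging, and removing one sub-agent's worktree. It only constructs the commands and does not run them. The `planWorktree` name, the directory layout, the `subagent/<id>` branch naming, and the choice of `git merge --no-ff` for merging back are all assumptions made for illustration; a real harness may merge differently.

```typescript
// Hypothetical plan for one sub-agent's isolated checkout.
interface WorktreePlan {
  path: string;
  branch: string;
  create: string[];  // argv for `git`, run before the sub-agent starts
  merge: string[];   // argv for `git`, run in the parent's checkout afterwards
  cleanup: string[]; // argv for `git`, run once the merge is done
}

// baseDir is assumed to be a directory outside the repository's working tree.
function planWorktree(baseDir: string, subAgentId: string): WorktreePlan {
  const path = `${baseDir}/${subAgentId}`;
  const branch = `subagent/${subAgentId}`;
  return {
    path,
    branch,
    // `git worktree add -b <branch> <path>` creates a new branch checked out at <path>.
    create: ["worktree", "add", "-b", branch, path],
    // One possible way to bring the sub-agent's commits back into the parent's branch.
    merge: ["merge", "--no-ff", branch],
    // `git worktree remove <path>` deletes the extra checkout; the branch remains.
    cleanup: ["worktree", "remove", path],
  };
}

const plan = planWorktree("/tmp/worktrees", "agent-1");
```

The merge step is where conflicts can still surface: isolation keeps parallel edits from colliding while the sub-agents run, but it does not remove the need to reconcile overlapping changes afterwards.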

✨ Insight · Sub-agents can potentially be resumed or continued via messaging — the parent can send follow-up instructions to a running sub-agent. In practice, many harnesses treat sub-agents as disposable: if one fails, the parent retries or takes a different approach. Success returns a summary, failure returns an error message, and the parent decides what to do next.

When NOT to Use Sub-Agents

Sub-agents add overhead: spawning a new QueryEngine, assembling a fresh system prompt, and making at least one extra API call. For tasks under 5 tool calls, the overhead is not worth it — just do the work in the parent. The right signal for sub-agents is independent subtasks that each need 10+ tool calls. Over-spawning creates a different problem: if the parent spawns 10 sub-agents that each read overlapping sets of files, you pay 10x the token cost for redundant reads with no parallelism benefit on shared files. A useful heuristic: spawn a sub-agent when (1) the subtask is clearly scoped, (2) it does not need to share mutable state with the parent in real time, and (3) its result can be expressed as a single text summary. If any of these fail, keep the work in the parent loop.
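The heuristic above can be written down as a small predicate. This is only a sketch: `SubtaskProfile` and `shouldSpawnSubAgent` are hypothetical names, the tool-call estimate is assumed to come from the parent's own planning, and the default threshold of 10 is taken from the guidance above. Subtasks estimated at 5 to 9 tool calls fall in a gray zone that this sketch conservatively keeps in the parent.

```typescript
// Hypothetical description of a candidate subtask, filled in by the parent.
interface SubtaskProfile {
  estimatedToolCalls: number;
  clearlyScoped: boolean;
  needsLiveSharedState: boolean; // must share mutable state with the parent in real time
  resultFitsInSummary: boolean;  // outcome can be expressed as a single text summary
}

// Returns true only when every condition of the heuristic holds.
function shouldSpawnSubAgent(p: SubtaskProfile, minToolCalls = 10): boolean {
  return (
    p.estimatedToolCalls >= minToolCalls &&
    p.clearlyScoped &&
    !p.needsLiveSharedState &&
    p.resultFitsInSummary
  );
}

// A broad, self-contained search is a good candidate.
const searchTask: SubtaskProfile = {
  estimatedToolCalls: 15,
  clearlyScoped: true,
  needsLiveSharedState: false,
  resultFitsInSummary: true,
};

// A three-call lookup is cheaper to do in the parent loop.
const quickLookup: SubtaskProfile = { ...searchTask, estimatedToolCalls: 3 };
```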

Result Aggregation Strategies

When multiple sub-agents finish, the parent must combine their outputs into a coherent view. Three common patterns:

Concatenation — simplest. Append all summaries and let the LLM reconcile. Works when subtasks are truly independent (e.g., "analyze module A" + "analyze module B").
Structured return — sub-agents return JSON instead of prose. The parent aggregates fields programmatically before presenting to the LLM. Avoids the LLM having to parse free-form summaries from 5 agents.
Hierarchical synthesis — after sub-agents finish, a dedicated synthesis sub-agent reads all summaries and produces a single merged report. The parent only ever sees one final summary. Higher cost but better coherence for 10+ sub-agents.

The key constraint: each sub-agent result appended to the parent adds ~200–500 tokens. Spawning 20 sub-agents adds 4–10K tokens to the parent's context. At scale, structured returns and hierarchical synthesis are essential to keep the parent's context from bloating.
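A minimal sketch of the structured-return pattern, assuming each sub-agent returns a JSON string. The `Finding` and `SubAgentReport` shapes and the `aggregateReports` function are hypothetical and exist only to show the idea: the parent merges fields in code and hands the LLM one compact aggregate instead of several free-form summaries.

```typescript
interface Finding {
  file: string;
  severity: "low" | "high";
  note: string;
}

interface SubAgentReport {
  agent: string;
  findings: Finding[];
}

interface Aggregate {
  totalFindings: number;
  highSeverity: Finding[];
  failedAgents: number; // reports that were not valid JSON of the expected shape
}

// Parse one raw report; return null instead of throwing on malformed output.
function parseReport(raw: string): SubAgentReport | null {
  try {
    const parsed = JSON.parse(raw);
    return parsed !== null && Array.isArray(parsed.findings)
      ? (parsed as SubAgentReport)
      : null;
  } catch {
    return null;
  }
}

// Merge all reports programmatically before anything reaches the LLM.
function aggregateReports(rawReports: string[]): Aggregate {
  const out: Aggregate = { totalFindings: 0, highSeverity: [], failedAgents: 0 };
  for (const raw of rawReports) {
    const report = parseReport(raw);
    if (report === null) {
      out.failedAgents += 1;
      continue;
    }
    out.totalFindings += report.findings.length;
    out.highSeverity.push(...report.findings.filter((f) => f.severity === "high"));
  }
  return out;
}
```

Counting malformed reports rather than throwing lets the parent decide whether to retry the failed sub-agents or proceed with partial results.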

Quick Check

Why do sub-agents start with empty message history?

📐

Key Code Patterns

Sub-Agent Spawning (TypeScript pseudocode)

typescript

async function spawnSubAgent(
  task: string,
  tools: Tool[],
  background = false,
  isolation?: "worktree"
): Promise<string> {
  // 1. Create fresh QueryEngine (clean context)
  const engine = new QueryEngine({
    tools: filterTools(tools), // no Agent tool
    messages: [],              // empty history
    abortController: new AbortController(),
  });

  // 2. Optional worktree isolation
  if (isolation === "worktree") {
    const worktreePath = createGitWorktree();
    engine.cwd = worktreePath;
  }

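  // 3. Run the task
  if (background) {
    // Fire-and-forget: the summary is discarded in this sketch, so a real
    // harness would keep the promise and deliver the result to the parent later.
    engine.submit(task).catch((err) => console.error("Sub-agent failed:", err));
    return "Agent running in background";
  }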

  const result = await engine.submit(task);
  return result.finalText; // single string back to parent
}

Tool Filtering for Sub-Agents

typescript

type AgentType = "explore" | "plan" | "general";

function filterTools(tools: Tool[], agentType: AgentType = "general"): Tool[] {
  // Different agent types get different tool sets
  const EXCLUDED_ALWAYS = new Set(["Agent"]); // prevent fork bombs

  const TYPE_ALLOWED: Record<AgentType, Set<string> | null> = {
    explore: new Set(["Grep", "Glob", "Read"]),
    plan:    new Set(["Grep", "Glob", "Read"]),  // read-only: explores and designs, doesn't modify files
    general: null, // all except EXCLUDED_ALWAYS
  };

  const allowed = TYPE_ALLOWED[agentType];
  return tools.filter(
    (t) =>
      !EXCLUDED_ALWAYS.has(t.name) &&
      (allowed === null || allowed.has(t.name))
  );
}

Result Aggregation

typescript

async function runWithSubAgents(
  parent: QueryEngine,
  tasks: Task[]
): Promise<string[]> {
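  // Parent dispatches independent tasks to sub-agents.
  // Each call uses foreground mode (background = false) so its promise
  // resolves with the summary rather than a placeholder string. Starting
  // every call before awaiting any of them is what makes them run concurrently.
  const promises = tasks.map((task) =>
    spawnSubAgent(
      task.description,
      parent.tools,
      /* background= */ false,
      task.editsFiles ? "worktree" : undefined
    )
  );

  // Wait for all sub-agents (rejects as soon as any one of them fails)
  const results = await Promise.all(promises);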

  // Each result is a short summary (not the full conversation)
  // Parent's context grows by ~200 tokens per sub-agent
  return results;
}

🔧

Break It — See What Happens

Shared context (no isolation)

No worktree isolation (shared filesystem)

📊

Real-World Numbers

Metric	Value
Parent context at spawn	50-150K tokens typical
Sub-agent initial context	50-200 tokens (task description only)
Result size back to parent	~200-500 tokens (summary)
Agent types	General, Explore (fast search), Plan (architecture)
Worktree creation	git worktree add (shared object store, separate working tree)
Excluded tools	Agent tool (prevents recursive fork bombs)

✨ Insight · The context savings are dramatic: a sub-agent that makes 10 tool calls generates ~5K tokens of internal conversation. Without isolation, the parent would inherit all 5K tokens. With isolation, the parent receives only a ~200-token summary — a 25x reduction in context growth.
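The arithmetic behind that figure, as a small sketch. The `contextGrowth` helper is hypothetical, and the ~500 tokens per tool call is an assumption implied by "10 tool calls, ~5K tokens" rather than a measured value.

```typescript
// Compare how much the parent's context grows with and without isolation.
function contextGrowth(
  toolCalls: number,
  tokensPerToolCall: number,
  summaryTokens: number
): { inherited: number; isolated: number; reduction: number } {
  const inherited = toolCalls * tokensPerToolCall; // full transcript lands in the parent
  const isolated = summaryTokens;                  // only the summary lands in the parent
  return { inherited, isolated, reduction: inherited / isolated };
}

const single = contextGrowth(10, 500, 200);
// single.inherited is 5000, single.isolated is 200, single.reduction is 25
```

The ratio scales with how chatty the sub-agent is: a run with more or larger tool results yields a bigger reduction for the same summary size.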

🧠

Key Takeaways

What to remember for interviews

1. Sub-agents start with empty message history, never inheriting the parent's 100K+ token conversation, which maximizes the context window for the actual subtask.
2. Sub-agents inherit all tools by default; harnesses typically denylist the Agent tool to prevent recursive fork bombs.
3. Git worktree isolation gives each parallel sub-agent its own filesystem checkout, so concurrent edits do not collide in a shared working directory.
4. Sub-agents return a single text summary (~200–500 tokens) to the parent, not their full conversation. In the 10-tool-call example, that is roughly 25x less context growth than inheriting the full transcript.
5. Spawn a sub-agent only when the subtask needs 10+ tool calls, is clearly scoped, and its result fits in a single summary; otherwise keep work in the parent loop.

📚

Interview Questions

Design a sub-agent system with context isolation. How do you prevent context blowup?

★★★

Anthropic, OpenAI

How would you implement parallel sub-agents that edit the same codebase safely?

★★★

Google, Meta

What are the tradeoffs of foreground vs background sub-agent execution?

★★☆

Anthropic, Databricks

How would you implement a fan-out/fan-in pattern where 5 sub-agents research in parallel and a coordinator synthesizes results?

★★★

Anthropic
