
Transformer Math

Module 48 · AI Engineering

🤖 Sub-agents

Each sub-agent gets a fresh 200K context window — the parent keeps working


When a task is too complex for one agent, the agent spawns sub-agents: fresh instances with clean context that work on subtasks independently. This is how Claude Code handles parallel file exploration, background research, and isolated code changes without polluting the main conversation.

  • Each sub-agent gets a fresh QueryEngine with empty message history
  • Sub-agents inherit all tools by default — harnesses typically restrict the Agent tool to prevent recursive fork bombs
  • Worktree isolation prevents file conflicts between parallel agents
  • Results return as a single text summary, not the full conversation
🎮

Sub-Agent Lifecycle

What you are seeing

The complete lifecycle of a sub-agent: the parent spawns it with a task description, the sub-agent works independently with its own tools and context, and returns a summary when done.

What to try

Compare foreground (blocking) vs background (non-blocking) execution. Notice how the parent's context stays clean regardless of how many tool calls the sub-agent makes.

// Parent spawns sub-agent

Parent context: 45K tokens (100+ messages)

→ spawn_sub_agent("Find all TODO comments")

// Sub-agent starts fresh

Sub-agent context: 0 tokens (empty history)

Tools: [Grep, Glob, Read] (no Agent, no Bash)

1. Grep "TODO" ./src → 23 matches

2. Read 5 files for context

3. Summarize findings

// Result back to parent

→ "Found 23 TODOs across 12 files. Critical: ..."

Parent context: 45K + ~200 tokens (just the summary)

💡

The Intuition

What you’re seeing: a parent agent fanning out to 4 sub-agents, each with its own isolated context window; only their summaries flow back to the parent. What to try: trace why the recursion guard (no Agent tool for sub-agents) prevents sub-agents from spawning more sub-agents.

[Figure: Sub-Agent Fan-Out Topology. The parent agent (full context) dispatches task descriptions in parallel to four sub-agents, each with an isolated context: Sub-A (search files), Sub-B (run tests), Sub-C (lint code), Sub-D (check types). Each sub-agent returns a summary only.]

Why Fresh Context?

Suppose the parent has 80K tokens of history about login bugs and the sub-agent needs to search for CSS files. To the sub-agent, that history is noise. A fresh QueryEngine gives it the full context window for the actual task; inheriting the parent's context would waste tokens and risk the sub-agent losing focus or hitting the context limit before finishing.

Context Isolation

The parent agent might have 100K+ tokens of conversation history — file contents, tool results, reasoning. If a sub-agent inherited all of that, it would waste context on irrelevant information and risk hitting the context limit before completing its task. Instead, each sub-agent starts with messages=[] — a clean slate. It gets only the task description, typically 50-200 tokens.
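A minimal sketch of the difference between a fresh and an inherited history. The `{ role, content }` message shape and the 4-characters-per-token estimate are assumptions for illustration, not any particular harness's API or tokenizer:

```typescript
// Hypothetical message shape; real harnesses carry more fields (tool calls, ids, ...).
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Rough heuristic (assumption): ~4 characters per token for English text.
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

// Isolated start: the sub-agent's only message is the task description.
function freshHistory(task: string): Message[] {
  return [{ role: "user", content: task }];
}

// Anti-pattern, for contrast: the sub-agent inherits the parent's whole history.
function inheritedHistory(parent: Message[], task: string): Message[] {
  return [...parent, { role: "user", content: task }];
}

const task = "Find all TODO comments under ./src and summarize them by file.";
// Stand-in for a long parent conversation: 100 messages of 2,000 characters each.
const parentHistory: Message[] = Array.from({ length: 100 }, () => ({
  role: "assistant" as const,
  content: "x".repeat(2000),
}));

console.log(estimateTokens(freshHistory(task)));                    // tens of tokens
console.log(estimateTokens(inheritedHistory(parentHistory, task))); // ~50K tokens
```

The sub-agent still gets its own system prompt and tool definitions; the point is that none of the parent's conversation is copied in.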

Tool Subset

Sub-agents inherit all tools by default, but harnesses typically apply an allowlist or denylist. The most common restriction is removing the Agent tool so sub-agents cannot recursively spawn more sub-agents; unchecked recursion turns into a fork bomb that consumes all available resources.
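The danger of unchecked recursion is plain arithmetic: if every agent may spawn k sub-agents, the agent count grows geometrically with depth. A small sketch (the fan-out and depth values are illustrative, not measured):

```typescript
// Total agents in a tree where every agent spawns `fanOut` children,
// `depth` levels deep (depth 0 = just the parent): sum of fanOut^i for i = 0..depth.
function totalAgents(fanOut: number, depth: number): number {
  let total = 0;
  let levelSize = 1; // level 0: the parent alone
  for (let level = 0; level <= depth; level++) {
    total += levelSize;
    levelSize *= fanOut;
  }
  return total;
}

// Agent tool removed from sub-agents: recursion stops at depth 1.
console.log(totalAgents(4, 1)); // 5  (1 parent + 4 sub-agents)
// Sub-agents allowed to spawn, three levels of 4-way fan-out:
console.log(totalAgents(4, 3)); // 85 (1 + 4 + 16 + 64)
```

Denylisting the Agent tool caps depth at 1, so the worst case is linear in the parent's fan-out rather than exponential in depth.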

💡 Tip · Different sub-agent types get different tool sets. An Explore agent gets only read tools (Grep, Glob, Read) for fast search. A Plan agent is typically read-only — it explores and designs but does not modify files. A general-purpose agent gets everything except Agent.

Foreground vs Background Execution

Foreground: the parent waits for the sub-agent to finish. This is simple, and the result is available as soon as the call returns. Background: the parent keeps working while the sub-agent runs asynchronously. Throughput is higher, but the parent must handle the result arriving later, and the sub-agent's changes may conflict with the parent's concurrent edits.
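A minimal sketch of the two modes, assuming a sub-agent run is modeled as a function returning `Promise<string>`. The `SubAgentRun` type and the `BackgroundTasks` registry are hypothetical, not a harness API. Foreground awaits immediately; background stores the promise and lets the parent collect the result later:

```typescript
type SubAgentRun = (task: string) => Promise<string>;

// Foreground: the parent blocks until the summary is available.
async function runForeground(run: SubAgentRun, task: string): Promise<string> {
  return await run(task);
}

// Background: the parent gets an id back immediately and keeps working.
class BackgroundTasks {
  private pending = new Map<number, Promise<string>>();
  private nextId = 0;

  start(run: SubAgentRun, task: string): number {
    const id = this.nextId++;
    // Turn failures into a text result so a rejected run is never left unhandled.
    const p = run(task).catch((err) => `Sub-agent failed: ${String(err)}`);
    this.pending.set(id, p);
    return id;
  }

  // Called later by the parent loop, e.g. between its own tool calls.
  // Returns undefined for unknown or already-collected ids.
  async collect(id: number): Promise<string | undefined> {
    const p = this.pending.get(id);
    if (p === undefined) return undefined;
    this.pending.delete(id);
    return await p;
  }
}
```

The registry is where the "result arriving later" cost shows up: the parent has to remember to call `collect`, and by then its own edits may have moved on.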

Worktree Isolation

Worktree isolation is an optional mechanism for filesystem safety. Setting isolation: 'worktree' creates a git worktree, a separate checkout of the same repo at a different path. The sub-agent edits its worktree without affecting the parent's working directory, and when it is done, its changes are merged back. Many sub-agents run without this; it is opt-in for cases where parallel edits would conflict.
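A sketch of the git commands a harness might run for one isolated sub-agent. `git worktree add -b`, `git merge`, and `git worktree remove` are standard git subcommands; the branch naming and the `../agent-worktrees` directory layout are assumptions for illustration. The function only builds argument arrays, which you would pass to your process-spawning API:

```typescript
interface WorktreePlan {
  create: string[];    // run from the parent's checkout
  mergeBack: string[]; // run from the parent's checkout after the sub-agent commits
  cleanup: string[];   // remove the sub-agent's working tree
}

// Hypothetical naming scheme: one branch and one directory per sub-agent,
// placed outside the parent's working tree so it never appears as untracked files.
function planWorktree(
  agentId: string,
  rootDir = "../agent-worktrees",
  baseRef = "HEAD"
): WorktreePlan {
  const branch = `subagent/${agentId}`;
  const path = `${rootDir}/${agentId}`;
  return {
    // New working tree on a new branch; it shares the repo's object store.
    create: ["git", "worktree", "add", "-b", branch, path, baseRef],
    // Assumes the sub-agent committed its edits on `branch`; this step can still hit merge conflicts.
    mergeBack: ["git", "merge", "--no-ff", branch],
    // git refuses to remove a worktree with uncommitted changes unless --force is given.
    // The branch itself is kept until it is deleted separately.
    cleanup: ["git", "worktree", "remove", path],
  };
}

const plan = planWorktree("todo-scan");
console.log(plan.create.join(" "));
// git worktree add -b subagent/todo-scan ../agent-worktrees/todo-scan HEAD
```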

✨ Insight · Sub-agents can potentially be resumed or continued via messaging — the parent can send follow-up instructions to a running sub-agent. In practice, many harnesses treat sub-agents as disposable: if one fails, the parent retries or takes a different approach. Success returns a summary, failure returns an error message, and the parent decides what to do next.
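A minimal sketch of the "disposable sub-agent" pattern, assuming a run either resolves with a summary or rejects with an error. The `SubAgentOutcome` type and the default of two attempts are illustrative choices, not a specific harness's behavior:

```typescript
type SubAgentOutcome =
  | { ok: true; summary: string; attempts: number }
  | { ok: false; error: string; attempts: number };

// Retry a failed sub-agent a bounded number of times (each attempt starts fresh);
// after that, hand the error back so the parent can choose a different approach.
async function runDisposable(
  run: (task: string) => Promise<string>,
  task: string,
  maxAttempts = 2
): Promise<SubAgentOutcome> {
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const summary = await run(task);
      return { ok: true, summary, attempts: attempt };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  return { ok: false, error: lastError, attempts: maxAttempts };
}
```

Returning a tagged result instead of throwing keeps the decision ("retry, reroute, or give up") in the parent's loop, where the rest of the task context lives.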

When NOT to Use Sub-Agents

Sub-agents add overhead: spawning a new QueryEngine, assembling a fresh system prompt, and making at least one extra API call. For tasks under 5 tool calls, the overhead is not worth it; just do the work in the parent. The right signal for sub-agents is independent subtasks that each need 10+ tool calls.

Over-spawning creates a different problem: if the parent spawns 10 sub-agents that each read overlapping sets of files, you pay 10x the token cost for redundant reads with no parallelism benefit on the shared files.

A useful heuristic is to spawn a sub-agent only when all three hold:

  1. The subtask is clearly scoped.
  2. It does not need to share mutable state with the parent in real time.
  3. Its result can be expressed as a single text summary.

If any of these fail, keep the work in the parent loop.
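The spawn heuristic can be written as a small predicate. The 5 and 10 tool-call thresholds come from the text; the `SubtaskProfile` fields are a hypothetical encoding of the three conditions, and in practice the tool-call count is the parent's estimate rather than a measurement:

```typescript
// Hypothetical encoding of the spawn heuristic's inputs.
interface SubtaskProfile {
  estimatedToolCalls: number;       // the parent's guess, not a measurement
  clearlyScoped: boolean;
  needsSharedMutableState: boolean; // must it see the parent's edits in real time?
  summarizable: boolean;            // does the result fit in one text summary?
}

function shouldSpawnSubAgent(p: SubtaskProfile): boolean {
  // Below ~5 tool calls the spawn overhead clearly dominates; 5-9 is a gray
  // zone that this sketch conservatively keeps in the parent.
  if (p.estimatedToolCalls < 10) return false;
  return p.clearlyScoped && !p.needsSharedMutableState && p.summarizable;
}

// A large, independent search task qualifies:
console.log(shouldSpawnSubAgent({
  estimatedToolCalls: 15, clearlyScoped: true,
  needsSharedMutableState: false, summarizable: true,
})); // true
// A three-call lookup does not, however well scoped:
console.log(shouldSpawnSubAgent({
  estimatedToolCalls: 3, clearlyScoped: true,
  needsSharedMutableState: false, summarizable: true,
})); // false
```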

Result Aggregation Strategies

When multiple sub-agents finish, the parent must combine their outputs into a coherent view. Three common patterns:

  • Concatenation: the simplest option. Append all summaries and let the LLM reconcile them. Works when subtasks are truly independent (e.g., "analyze module A" + "analyze module B").
  • Structured return: sub-agents return JSON instead of prose. The parent aggregates fields programmatically before presenting them to the LLM, which spares the LLM from parsing free-form summaries from 5 agents.
  • Hierarchical synthesis: after the sub-agents finish, a dedicated synthesis sub-agent reads all summaries and produces a single merged report, so the parent only ever sees one final summary. This costs more but gives better coherence for 10+ sub-agents.
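A sketch of the structured-return pattern, assuming each sub-agent was instructed to reply with a JSON object of a hypothetical `{ module, todoCount, critical }` shape. The parent validates and merges the fields in code, so the LLM sees one compact digest instead of N free-form summaries:

```typescript
// Hypothetical schema the parent asked every sub-agent to return.
interface ModuleReport {
  module: string;
  todoCount: number;
  critical: string[];
}

function parseReport(raw: string): ModuleReport | null {
  try {
    const v = JSON.parse(raw);
    if (
      typeof v?.module === "string" &&
      typeof v?.todoCount === "number" &&
      Array.isArray(v?.critical)
    ) {
      return { module: v.module, todoCount: v.todoCount, critical: v.critical.map(String) };
    }
  } catch {
    // Malformed JSON is treated the same as a report with the wrong shape.
  }
  return null;
}

// Merge reports programmatically; unparseable outputs are counted, not silently dropped.
function aggregate(rawResults: string[]): string {
  const reports = rawResults.map(parseReport);
  const valid = reports.filter((r): r is ModuleReport => r !== null);
  const total = valid.reduce((sum, r) => sum + r.todoCount, 0);
  const critical: string[] = [];
  for (const r of valid) {
    for (const c of r.critical) critical.push(`- ${r.module}: ${c}`);
  }
  const failed = reports.length - valid.length;
  const header =
    `${total} TODOs across ${valid.length} modules` + (failed > 0 ? ` (${failed} unparseable)` : "");
  return [header, ...critical].join("\n");
}

console.log(aggregate([
  '{"module":"auth","todoCount":3,"critical":["token refresh race"]}',
  '{"module":"ui","todoCount":2,"critical":[]}',
  "Sorry, I could not finish the scan.",
]));
// 5 TODOs across 2 modules (1 unparseable)
// - auth: token refresh race
```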

The key constraint: each sub-agent result appended to the parent adds ~200–500 tokens. Spawning 20 sub-agents adds 4–10K tokens to the parent's context. At scale, structured returns and hierarchical synthesis are essential to keep the parent's context from bloating.
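The arithmetic behind that constraint, as a sketch. With concatenation, the parent's growth scales with the number of sub-agents; with hierarchical synthesis, the parent receives one merged summary, and the per-agent cost moves into the synthesis sub-agent's context instead of disappearing. The 200-500 token range is the estimate above; the 500-token merged report is an assumed figure:

```typescript
interface GrowthEstimate {
  parentTokens: number;      // added to the parent's context
  synthesizerTokens: number; // read by the synthesis sub-agent (0 if there is none)
}

function estimateGrowth(
  numSubAgents: number,
  tokensPerSummary: number,
  strategy: "concatenate" | "synthesize",
  synthesisSummaryTokens = 500 // assumed size of the merged report
): GrowthEstimate {
  const allSummaries = numSubAgents * tokensPerSummary;
  return strategy === "concatenate"
    ? { parentTokens: allSummaries, synthesizerTokens: 0 }
    : { parentTokens: synthesisSummaryTokens, synthesizerTokens: allSummaries };
}

console.log(estimateGrowth(20, 200, "concatenate")); // { parentTokens: 4000, synthesizerTokens: 0 }
console.log(estimateGrowth(20, 500, "concatenate")); // { parentTokens: 10000, synthesizerTokens: 0 }
console.log(estimateGrowth(20, 500, "synthesize"));  // { parentTokens: 500, synthesizerTokens: 10000 }
```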

Quick Check

Why do sub-agents start with empty message history?

📐

Key Code Patterns

Sub-Agent Spawning (TypeScript pseudocode)

typescript
async function spawnSubAgent(
  task: string,
  tools: Tool[],
  background = false,
  isolation?: "worktree"
): Promise<string> {
  // 1. Create fresh QueryEngine (clean context)
  const engine = new QueryEngine({
    tools: filterTools(tools), // no Agent tool
    messages: [],              // empty history
    abortController: new AbortController(),
  });

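  // 2. Optional worktree isolation
  //    (createGitWorktree is a placeholder for running `git worktree add ...`)
  if (isolation === "worktree") {
    const worktreePath = createGitWorktree();
    engine.cwd = worktreePath;
  }

  // 3. Run the task
  if (background) {
    // Fire-and-forget: nothing awaits this promise, so attach a handler
    // that reports failures instead of leaving an unhandled rejection.
    engine.submit(task).catch((err) => console.error("Sub-agent failed:", err));
    return "Agent running in background";
  }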

  const result = await engine.submit(task);
  return result.finalText; // single string back to parent
}

Tool Filtering for Sub-Agents

typescript
type AgentType = "explore" | "plan" | "general";

function filterTools(tools: Tool[], agentType: AgentType = "general"): Tool[] {
  // Different agent types get different tool sets
  const EXCLUDED_ALWAYS = new Set(["Agent"]); // prevent fork bombs

  const TYPE_ALLOWED: Record<AgentType, Set<string> | null> = {
    explore: new Set(["Grep", "Glob", "Read"]),
    plan:    new Set(["Grep", "Glob", "Read"]),  // read-only: explores and designs, doesn't modify files
    general: null, // all except EXCLUDED_ALWAYS
  };

  const allowed = TYPE_ALLOWED[agentType];
  return tools.filter(
    (t) =>
      !EXCLUDED_ALWAYS.has(t.name) &&
      (allowed === null || allowed.has(t.name))
  );
}

Result Aggregation

typescript
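interface Task {
  description: string;
  editsFiles: boolean;
}

async function runWithSubAgents(
  parent: QueryEngine,
  tasks: Task[]
): Promise<string[]> {
  // Parent dispatches independent tasks to sub-agents.
  // Each call runs in foreground mode so its promise resolves with the real
  // summary; Promise.all below still runs all of them concurrently.
  // (With background = true, each call would resolve immediately with the
  // placeholder "Agent running in background" instead of a summary.)
  const promises = tasks.map((task) =>
    spawnSubAgent(
      task.description,
      parent.tools,
      /* background= */ false,
      task.editsFiles ? "worktree" : undefined // undefined = shared checkout
    )
  );

  // Wait for all sub-agents
  const results = await Promise.all(promises);

  // Each result is a short summary (not the full conversation)
  // Parent's context grows by ~200-500 tokens per sub-agent
  return results;
}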
🔧

Break It — See What Happens

  • Shared context (no isolation)
  • No worktree isolation (shared filesystem)
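A toy simulation of the shared-filesystem failure mode, using an in-memory `Map` as a stand-in for the working directory (everything here is hypothetical and illustrative). Two sub-agents each read a file, pause (standing in for a model call), then write back their edit. Both read the empty file before either writes, so Sub-A's edit is silently overwritten:

```typescript
// Toy in-memory "filesystem" standing in for a shared working directory.
type FakeFS = Map<string, string>;

// Yield to the event loop, standing in for a model call between read and write.
const yieldToEventLoop = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

// Read-modify-write, the way an agent edits a file it read earlier.
async function appendLine(files: FakeFS, path: string, line: string): Promise<void> {
  const before = files.get(path) ?? "";
  await yieldToEventLoop();
  files.set(path, before + line + "\n"); // overwrites whatever landed in the meantime
}

async function sharedFilesystemDemo(): Promise<string> {
  const files: FakeFS = new Map([["config.ts", ""]]);
  // Two sub-agents edit the same file concurrently, with no worktree isolation.
  await Promise.all([
    appendLine(files, "config.ts", "// edit from Sub-A"),
    appendLine(files, "config.ts", "// edit from Sub-B"),
  ]);
  return files.get("config.ts") ?? "";
}

sharedFilesystemDemo().then((content) => console.log(JSON.stringify(content)));
// "// edit from Sub-B\n"  (Sub-A's edit was lost)
```

With one worktree per agent, each edit lands in its own checkout, and the overlap surfaces later as a merge conflict instead of a silent lost update.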
📊

Real-World Numbers

  • Parent context at spawn: 50-150K tokens (typical)
  • Sub-agent initial context: 50-200 tokens (task description only)
  • Result size back to parent: ~200-500 tokens (summary)
  • Agent types: General, Explore (fast search), Plan (architecture)
  • Worktree creation: git worktree add (shared object store, separate working tree)
  • Excluded tools: Agent tool (prevents recursive fork bombs)
✨ Insight · The context savings are dramatic: a sub-agent that makes 10 tool calls generates ~5K tokens of internal conversation. Without isolation, the parent would inherit all 5K tokens. With isolation, the parent receives only a ~200-token summary — a 25x reduction in context growth.
🧠

Key Takeaways

What to remember for interviews

  1. Sub-agents start with an empty message history instead of inheriting the parent's 100K+ token conversation, which leaves the full context window for the actual subtask.
  2. Sub-agents inherit all tools by default; harnesses typically denylist the Agent tool to prevent recursive fork bombs.
  3. Git worktree isolation gives each parallel sub-agent its own filesystem checkout, so concurrent edits cannot overwrite each other's files; overlapping changes still surface as conflicts when the work is merged back.
  4. Sub-agents return a single text summary (~200–500 tokens) to the parent, not their full conversation; in the 10-tool-call example above, that cuts the parent's context growth about 25x (~5K tokens down to ~200).
  5. Spawn a sub-agent only when the subtask needs 10+ tool calls, is clearly scoped, and its result fits in a single summary; otherwise keep the work in the parent loop.

Recall

When does spawning a sub-agent reduce parent context usage?

Recall

Why do harnesses like Claude Code restrict sub-agents from using the Agent tool?

Trade-off

Two parallel sub-agents both need to edit files in the same repository. What is the correct mechanism to prevent file conflicts?

Trade-off

In a fan-out pattern, why should each sub-agent write its findings to a dedicated output file rather than returning them inline as text?

🎯

Interview Questions


Design a sub-agent system with context isolation. How do you prevent context blowup?

★★★
Anthropic, OpenAI

How would you implement parallel sub-agents that edit the same codebase safely?

★★★
Google, Meta

What are the tradeoffs of foreground vs background sub-agent execution?

★★☆
Anthropic, Databricks

How would you implement a fan-out/fan-in pattern where 5 sub-agents research in parallel and a coordinator synthesizes results?

★★★
Anthropic