

Module 47 · AI Engineering

🔧 Tool System

5 Grep calls run in parallel, but Bash always waits its turn — why?


When Claude Code reads your files, it calls a Read tool. When it runs a command, it calls Bash. Behind these ~30 tools is a tool system — the registry, validator, orchestrator, and execution pipeline that makes tool use reliable at scale.

  • Every tool has a name, a prompt (what the LLM sees), and an inputSchema (Zod validation)
  • The registry sorts tools deterministically for prompt cache stability
  • Read-only tools run in parallel; write tools run serially — maximizing throughput without race conditions
🎮

Tool Execution Pipeline

What you are seeing

The complete lifecycle of a tool call: the model emits one or more tool_use blocks in a single turn, the harness finds each tool in the registry, validates input against the Zod schema, resolves permissions (deny/allow rules → PreToolUse hook → user prompt if needed), executes the tool, runs post-hooks, and returns the result.

What to try

Trace the flow from LLM output through each stage. Notice how validation catches bad input before execution, and how the partitioner groups read-only tools for parallel execution.

The interactive diagram walks through eight steps (click any step to learn more):

  1. Tool Call: e.g. Bash("npm test")
  2. Find Tool: registry lookup by name
  3. Validate Input: Zod schema parse
  4. Pre-Tool Hooks: shell scripts that can modify or block the call
  5. Permission Check: 5-layer hierarchy
  6. Execute: tool.call(input, context)
  7. Post-Tool Hooks: validate output, run formatters
  8. Return Result: tool_result fed back into the query loop

Deny / fail paths:

  • Hook blocks: exit code 2 plus a reason
  • Deny rule matched: pattern found in the denyList
  • User rejects: answers "No" at the interactive prompt

Tool partitioning (read-only runs in parallel, write runs serially). The input queue [Grep, Glob, Read, Bash, Grep] yields three batches:

  • Batch 1, parallel (up to 10): Grep, Glob, Read
  • Batch 2, serial (one at a time): Bash
  • Batch 3, parallel: Grep

Try It: Tool Partitioner


Build a tool queue to see how the scheduler batches read-only calls in parallel and isolates each write call serially.

💡

The Intuition

How Batching Works in Practice

Claude groups tool calls by read/write safety. Read-only calls run in parallel (Batch 1), then writes run serially (Batch 2), then follow-up reads run in parallel again (Batch 3). This maximises throughput while keeping side-effects ordered.

  • Grep("TODO"): read-only, Batch 1 (parallel), runs with the other reads
  • Glob("*.ts"): read-only, Batch 1 (parallel), runs with Grep
  • Read("main.ts"): read-only, Batch 1 (parallel), runs with both
  • Bash("npm test"): write, Batch 2 (serial), waits for Batch 1
  • Grep("error"): read-only, Batch 3 (parallel), runs after Bash completes

The Tool Interface

Every tool implements the same interface: a name for identification, a prompt that the LLM sees as the tool description, and an inputSchema (a Zod schema) that validates the LLM's generated input before execution. Two metadata methods — isReadOnly() and isConcurrencySafe() — tell the orchestrator how to schedule the tool.

Tool Registry

The getTools() function aggregates ~30 built-in tools, filters out any denied by configuration rules, and sorts them deterministically. The sorting is critical: tool definitions are part of the system prompt, and API providers cache prompt prefixes. Changing tool order breaks the cache, costing latency and money on every subsequent request.

💡 Tip · Tool prompt descriptions total ~3K tokens. With deterministic ordering, these tokens get a cache hit on every API call within a session. Over 50+ calls per task, this saves significant cost and latency.
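The aggregate-filter-sort flow can be sketched in a few lines. This is a minimal illustration, not the actual implementation — ToolDef, Config, and the alphabetical sort key are assumptions; the real code may sort by a different stable key.

```typescript
// Sketch of deterministic registry assembly (illustrative names throughout).
interface ToolDef {
  name: string;
  prompt: string; // description the LLM sees
}

interface Config {
  deniedTools: Set<string>;
}

function getTools(builtIn: ToolDef[], config: Config): ToolDef[] {
  return builtIn
    .filter((t) => !config.deniedTools.has(t.name)) // drop tools denied by configuration
    .sort((a, b) => a.name.localeCompare(b.name));  // stable order keeps the prompt prefix cacheable
}
```

Any stable sort key works; what matters is that two sessions with the same configuration produce byte-identical tool definitions in the system prompt.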

Tool Orchestration

The model decides what runs in parallel: when it emits multiple tool_use blocks in a single assistant turn, the harness receives them already-parallel. partitionToolCalls() then enforces concurrency safety on what it receives — consecutive read-only tools (Glob, Grep, Read) execute concurrently, while write tools (Bash, Edit) each get their own serial slot to prevent race conditions. The harness can't create parallelism the model didn't request; its job is validation and safe scheduling. A practical concurrency ceiling (~10) limits file descriptor and memory pressure.
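The concurrency ceiling can be enforced with a small worker-pool pattern. This is a generic sketch, not the harness's actual scheduler — runBatch is an assumed name, and the default limit of 10 mirrors the ceiling described above.

```typescript
// Run one parallel batch of tasks under a concurrency ceiling.
// JS is single-threaded, so incrementing `next` between awaits is race-free.
async function runBatch<T>(
  tasks: Array<() => Promise<T>>,
  limit = 10
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  // Spawn up to `limit` workers; each pulls the next task off the shared queue.
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, async () => {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  });
  await Promise.all(workers);
  return results; // results stay in input order regardless of completion order
}
```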

Execution Pipeline

Each tool call passes through a fixed pipeline:

  1. Find tool — registry lookup by name
  2. Validate input — Zod schema validation, reject malformed input with descriptive errors
  3. Permission resolution — deny rules checked first, then allow rules, then PreToolUse hook (which can itself allow/deny), then user prompt only if still unresolved. PreToolUse is part of this stage, not a stage before it — a hook that runs after user approval could mutate args the user already accepted.
  4. Execute — the tool's call() method runs
  5. Post-hooks — logging, metrics, result transformation
  6. Return result — tool_result appended to conversation
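The ordering inside the permission-resolution step can be sketched as a single function. All names here are illustrative assumptions, not the real API; the point is the precedence: deny rules win outright, allow rules short-circuit, the PreToolUse hook gets a say next, and the user is prompted only if nothing else resolved the call.

```typescript
type Decision = "allow" | "deny" | "ask";

// Hypothetical shape of the permission inputs (not the actual implementation).
interface PermissionContext {
  denyRules: RegExp[];
  allowRules: RegExp[];
  preToolUseHook?: (call: string) => Decision;    // hook may allow, deny, or pass
  promptUser: (call: string) => "allow" | "deny"; // interactive last resort
}

function resolvePermission(call: string, ctx: PermissionContext): "allow" | "deny" {
  if (ctx.denyRules.some((r) => r.test(call))) return "deny";   // deny rules checked first
  if (ctx.allowRules.some((r) => r.test(call))) return "allow"; // then allow rules
  const hook = ctx.preToolUseHook?.(call);                      // then the PreToolUse hook
  if (hook === "allow" || hook === "deny") return hook;
  return ctx.promptUser(call);                                  // prompt only if still unresolved
}
```

Because the hook runs before the prompt, anything it mutates is what the user actually approves.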
✨ Insight · The buildTool() pattern standardizes construction for all ~30 tools. Every tool gets the same validation, error handling, and hook integration — no tool can bypass the pipeline.

Error Isolation in Parallel Batches

When 3 Grep calls run concurrently and one fails, the other two should still complete and return their results. The harness wraps each parallel tool call in an individual try/catch — a failure produces a ToolError result for that slot, not a thrown exception that aborts the whole batch. All results (successes and errors alike) are appended to the conversation as tool_result blocks, with is_error: true set on failures. The LLM then reads all results together and decides how to handle partial failures — it might retry the failed call with different arguments, or proceed with the successful results if the failed one was non-critical. This mirrors how Promise.allSettled works: collect all outcomes, let the caller decide what to do with failures.
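The per-slot isolation described above can be sketched as a wrapper. runIsolated and the SlotResult shape are assumptions for illustration; the real harness's result type differs, but the semantics are the same as Promise.allSettled.

```typescript
// Each slot's outcome, mirroring tool_result blocks with is_error on failures.
type SlotResult =
  | { ok: true; output: string }
  | { ok: false; error: string; is_error: true };

async function runIsolated(
  calls: Array<() => Promise<string>>
): Promise<SlotResult[]> {
  // Each call gets its own try/catch, so one failure can't abort its siblings;
  // all outcomes are collected and returned together.
  return Promise.all(
    calls.map(async (call): Promise<SlotResult> => {
      try {
        return { ok: true, output: await call() };
      } catch (e) {
        return { ok: false, error: String(e), is_error: true };
      }
    })
  );
}
```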

Tool Result Truncation

A grep across a large codebase can return tens of thousands of lines. Appending the full output verbatim to the conversation would explode token usage and trigger compaction on the very next turn. The tool system applies result truncation: if a tool result exceeds a character threshold (~10K characters), it is either truncated with a [truncated — X more lines] suffix, or routed to microcompact (which summarizes the result using the LLM before appending it). The truncation policy is tool-specific: Bash output is truncated from the end (keeping the most recent lines, since errors appear last), while Read output is truncated from the middle (keeping the file header and the section the agent asked about). This asymmetry matches how each tool's output is typically consumed.
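The tool-specific policies can be sketched as follows. This is an assumed simplification — truncateResult, the 10K threshold, and the fixed line counts are illustrative, not the actual values or API.

```typescript
const LIMIT = 10_000; // assumed ~10K character threshold

function truncateResult(tool: string, text: string): string {
  if (text.length <= LIMIT) return text; // under threshold: pass through untouched
  const lines = text.split("\n");
  if (tool === "Bash") {
    // Truncate from the front, keeping the tail: errors usually appear last.
    const kept = lines.slice(-200);
    return `[truncated — ${lines.length - kept.length} more lines]\n` + kept.join("\n");
  }
  // Read-style default: keep head and tail, drop the middle.
  const head = lines.slice(0, 100);
  const tail = lines.slice(-100);
  const dropped = lines.length - head.length - tail.length;
  if (dropped <= 0) return text.slice(0, LIMIT); // very long lines: fall back to a hard cut
  return [...head, `[truncated — ${dropped} more lines]`, ...tail].join("\n");
}
```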

Quick Check

Why does the tool registry sort tools in a deterministic order?

📐

Key Code Patterns

Tool Interface (TypeScript pseudocode)

typescript
interface Tool {
  name: string;
  prompt: string;       // LLM sees this description
  inputSchema: ZodSchema; // validates LLM's input
  call(input: unknown, context: Context): Promise<ToolResult>;
  isReadOnly(): boolean;
  isConcurrencySafe(): boolean;
}

function buildTool(config: {
  name: string;
  prompt: string;
  schema: ZodSchema;
  callFn: (input: unknown, ctx: Context) => Promise<ToolResult>;
  readOnly?: boolean;
  concurrencySafe?: boolean; // independent knob — a tool can be concurrency-safe without being read-only
}): Tool {
  // Standardized constructor — all 30 tools use this
  return {
    name: config.name,
    prompt: config.prompt,
    inputSchema: config.schema,
    async call(input, ctx) {
      const validated = config.schema.parse(input); // validate first
      return config.callFn(validated, ctx);          // then execute
    },
    isReadOnly: () => config.readOnly ?? false,
    // isConcurrencySafe can be independently configured; defaults to readOnly
    // as a conservative baseline but tools can override for finer control
    isConcurrencySafe: () => config.concurrencySafe ?? config.readOnly ?? false,
  };
}

Tool Partitioning

typescript
// Group read-only together (parallel), write tools separate (serial)
function partitionToolCalls(calls: ToolCall[]): ToolCall[][] {
  const batches: ToolCall[][] = [];
  let current: ToolCall[] = [];

  for (const call of calls) {
    const isReadOnly = call.tool.isReadOnly();
    const currentIsReadOnly = current[0]?.tool.isReadOnly() ?? isReadOnly;

    if (isReadOnly && currentIsReadOnly) {
      current.push(call);
    } else {
      if (current.length > 0) batches.push(current);
      current = [call];
    }
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Example:
// Input:  [Grep, Glob, Read, Bash, Grep]
// Output: [[Grep, Glob, Read],  // parallel batch
//          [Bash],               // serial batch
//          [Grep]]               // parallel batch

Tool Execution Pipeline

typescript
async function executeToolCall(
  call: ToolCall,
  context: Context
): Promise<ToolResult | ToolError> {
  // 1. Find tool
  const tool = registry.get(call.name);
  if (!tool) return new ToolError(`Unknown tool: ${call.name}`);

  // 2. Validate input
  const parsed = tool.inputSchema.safeParse(call.input);
  if (!parsed.success) {
    return new ToolError(`Invalid input: ${parsed.error}`);
  }

  // 3. Permission resolution: deny rules → allow rules → PreToolUse hook →
  //    user prompt, only if still unresolved. The hook runs inside this stage
  //    so it can never mutate args the user has already approved.
  const decision = await resolvePermission(tool, parsed.data, context);
  if (decision === "deny") return new ToolError("Permission denied");

  // 4. Execute
  const result = await tool.call(parsed.data, context);

  // 5. Post-hooks: logging, metrics, result transformation
  await runPostHooks(tool, parsed.data, result);

  // 6. Return result
  return new ToolResult(result);
}
}
🔧

Break It — See What Happens

  • No input validation
  • All tools serial (no parallel batching)
📊

Real-World Numbers

  • Built-in tools: ~30
  • Max concurrent tools: 10 parallel per batch
  • Tool prompt token budget: ~3K tokens total
  • Input validation: Zod schema on every call
  • Throughput improvement: 3-5x with parallel read batching
  • Cache savings: ~90% on cached prefix tokens (via deterministic sort)
✨ Insight · The 10-tool concurrency limit is a practical balance: higher limits increase memory pressure and file descriptor usage, while the diminishing returns of parallelism beyond 10 rarely justify the resource cost. Most exploration turns use 3-5 concurrent reads.
🧠

Key Takeaways

What to remember for interviews

  1. Every tool shares one interface — name, prompt (LLM description), inputSchema (Zod), call(), isReadOnly(), isConcurrencySafe() — enforced by the buildTool() factory for all ~30 built-in tools.
  2. The registry sorts tools deterministically so tool definitions in the system prompt are identical across requests, enabling API prompt prefix caching and saving ~90% on those ~3K tokens per call.
  3. The model controls parallelism by emitting multiple tool_use blocks in one turn; the harness enforces safety — read-only tools (Grep, Glob, Read) run concurrently (up to 10), write tools (Bash, Edit) serialize. This gives 3–5x throughput for exploration without race conditions.
  4. Zod schema validation runs before every tool call — it catches type errors and missing fields before execution, returning descriptive errors so the LLM can self-correct. Path traversal prevention requires explicit constraints in the schema or permission layer, not schema validation alone.
  5. Parallel batch failures are isolated per-slot using try/catch: one failed Grep doesn't abort the other two — all results (successes and errors) are returned together via Promise.allSettled semantics.
📚

Further Reading

🎯

Interview Questions


Design a tool execution system for an AI agent that maximizes throughput while preventing race conditions.

★★★
Google · Anthropic

How would you validate untrusted input from an LLM before executing a tool?

★★☆
Anthropic · OpenAI

Why sort tools deterministically in the prompt? What's the cost impact?

★★☆
Google · Databricks

How would you design a buildTool() pattern that standardizes tool construction across 30+ tools?

★★☆
Anthropic · Google