🔧 Tool System
5 Grep calls run in parallel, but Bash always waits its turn — why?
When Claude Code reads your files, it calls a Read tool. When it runs a command, it calls Bash. Behind these ~30 tools is a tool system — the registry, validator, orchestrator, and execution pipeline that makes tool use reliable at scale.
- Every tool has a name, a prompt (what the LLM sees), and an inputSchema (Zod validation)
- The registry sorts tools deterministically for prompt cache stability
- Read-only tools run in parallel; write tools run serially — maximizing throughput without race conditions
Tool Execution Pipeline
What you are seeing
The complete lifecycle of a tool call: the model emits one or more tool_use blocks in a single turn, the harness finds each tool in the registry, validates input against the Zod schema, resolves permissions (deny/allow rules → PreToolUse hook → user prompt if needed), executes the tool, runs post-hooks, and returns the result.
What to try
Trace the flow from LLM output through each stage. Notice how validation catches bad input before execution, and how the partitioner groups read-only tools for parallel execution.
Try It: Tool Partitioner
Build a tool queue to see how the scheduler batches read-only calls in parallel and isolates each write call serially.
The Intuition
How Batching Works in Practice
Claude groups tool calls by read/write safety. Read-only calls run in parallel (Batch 1), then writes run serially (Batch 2), then follow-up reads run in parallel again (Batch 3). This maximises throughput while keeping side-effects ordered.
| Tool call | Read-only? | Batch | Execution |
|---|---|---|---|
| Grep("TODO") | Yes | Batch 1 (parallel) | Runs with other reads |
| Glob("*.ts") | Yes | Batch 1 (parallel) | Runs with Grep |
| Read("main.ts") | Yes | Batch 1 (parallel) | Runs with both |
| Bash("npm test") | No | Batch 2 (serial) | Waits for Batch 1 |
| Grep("error") | Yes | Batch 3 (parallel) | After Bash completes |
The Tool Interface
Every tool implements the same interface: a name for identification, a prompt that the LLM sees as the tool description, and an inputSchema (Zod) that validates the LLM's generated input before execution. Two metadata methods — isReadOnly() and isConcurrencySafe() — tell the orchestrator how to schedule the tool.
Tool Registry
The getTools() function aggregates ~30 built-in tools, filters out any denied by configuration rules, and sorts them deterministically. The sorting is critical: tool definitions are part of the system prompt, and API providers cache prompt prefixes. Changing tool order breaks the cache, costing latency and money on every subsequent request.
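The registry behavior can be sketched as follows. This is a minimal illustration, not the real implementation: the `ToolDef` shape and the `denied` set standing in for configuration rules are assumptions.

```typescript
interface ToolDef {
  name: string;
  prompt: string;
}

// Sketch of getTools(): filter denied tools, then sort deterministically
// so the system-prompt prefix is byte-identical across requests.
function getTools(all: ToolDef[], denied: Set<string>): ToolDef[] {
  return all
    .filter((t) => !denied.has(t.name)) // drop tools denied by config
    .sort((a, b) => a.name.localeCompare(b.name)); // stable order => stable cache prefix
}

const tools = getTools(
  [
    { name: "Grep", prompt: "search file contents" },
    { name: "Bash", prompt: "run a shell command" },
    { name: "Read", prompt: "read a file" },
  ],
  new Set(["Bash"]),
);
console.log(tools.map((t) => t.name)); // ["Grep", "Read"] — same order every run
```

Any deterministic comparator works; the point is that the same configuration always produces the same serialized tool list, so the provider's prompt cache keeps hitting.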
Tool Orchestration
The model decides what runs in parallel: when it emits multiple tool_use blocks in a single assistant turn, the harness receives them already-parallel. partitionToolCalls() then enforces concurrency safety on what it receives — consecutive read-only tools (Glob, Grep, Read) execute concurrently, while write tools (Bash, Edit) each get their own serial slot to prevent race conditions. The harness can't create parallelism the model didn't request; its job is validation and safe scheduling. A practical concurrency ceiling (~10) limits file descriptor and memory pressure.
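The scheduling half of this can be sketched as a batch runner. The shape below is an assumption (calls modeled as thunks, a default cap of 10); it shows the key property: batches run strictly in order, while calls inside a batch run concurrently up to the ceiling.

```typescript
// Sketch: run batches in order; within a batch, run calls concurrently,
// chunked so no more than `maxConcurrent` are in flight at once.
async function runBatches<T>(
  batches: (() => Promise<T>)[][],
  maxConcurrent = 10,
): Promise<T[]> {
  const results: T[] = [];
  for (const batch of batches) {
    // a serial (write) batch is just a batch of length 1
    for (let i = 0; i < batch.length; i += maxConcurrent) {
      const chunk = batch.slice(i, i + maxConcurrent);
      results.push(...(await Promise.all(chunk.map((fn) => fn()))));
    }
  }
  return results;
}
```

Because a write tool always sits alone in its batch, it cannot start until every read in the previous batch has settled, and nothing starts until it finishes.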
Execution Pipeline
Each tool call passes through a fixed pipeline:
- Find tool — registry lookup by name
- Validate input — Zod schema validation, reject malformed input with descriptive errors
- Permission resolution — deny rules checked first, then allow rules, then PreToolUse hook (which can itself allow/deny), then user prompt only if still unresolved. PreToolUse is part of this stage, not a stage before it — a hook that runs after user approval could mutate args the user already accepted.
- Execute — the tool's call() method runs
- Post-hooks — logging, metrics, result transformation
- Return result — tool_result appended to conversation
Error Isolation in Parallel Batches
When 3 Grep calls run concurrently and one fails, the other two should still complete and return their results. The harness wraps each parallel tool call in an individual try/catch — a failure produces a ToolError result for that slot, not a thrown exception that aborts the whole batch. All results (successes and errors alike) are appended to the conversation as tool_result blocks, with is_error: true set on failures. The LLM then reads all results together and decides how to handle partial failures — it might retry the failed call with different arguments, or proceed with the successful results if the failed one was non-critical. This mirrors how Promise.allSettled works: collect all outcomes, let the caller decide what to do with failures.
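A minimal sketch of this per-slot isolation, with a hypothetical `SlotResult` type standing in for the real tool_result representation:

```typescript
type SlotResult =
  | { ok: true; output: string }
  | { ok: false; is_error: true; message: string };

// Each call gets its own try/catch: a failure becomes a result for that
// slot instead of an exception that aborts the whole batch.
async function runParallelBatch(
  calls: (() => Promise<string>)[],
): Promise<SlotResult[]> {
  return Promise.all(
    calls.map(async (call): Promise<SlotResult> => {
      try {
        return { ok: true, output: await call() };
      } catch (err) {
        // captured as a tool_result with is_error, not rethrown
        return { ok: false, is_error: true, message: String(err) };
      }
    }),
  );
}
```

Note that `Promise.all` is safe here precisely because no mapped promise can reject; the try/catch converts every outcome into a fulfilled value, which is the allSettled behavior described above.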
Tool Result Truncation
A grep across a large codebase can return tens of thousands of lines. Appending the full output verbatim to the conversation would explode token usage and trigger compaction on the very next turn. The tool system applies result truncation: if a tool result exceeds a character threshold (~10K characters), it is either truncated with a [truncated — X more lines] suffix, or routed to microcompact (which summarizes the result using the LLM before appending it). The truncation policy is tool-specific: Bash output is truncated from the end (keeping the most recent lines, since errors appear last), while Read output is truncated from the middle (keeping the file header and the section the agent asked about). This asymmetry matches how each tool's output is typically consumed.
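A sketch of tool-specific truncation. This is illustrative only: it counts characters where the real suffix reports lines, and the `truncateResult` helper and its policy switch are assumptions.

```typescript
// Sketch: tail-truncation for Bash (errors appear last), middle-truncation
// for Read (keep the header and the end of the requested region).
function truncateResult(tool: string, text: string, limit = 10_000): string {
  if (text.length <= limit) return text;
  const dropped = text.length - limit;
  if (tool === "Bash") {
    // keep the most recent output
    return `[truncated — ${dropped} chars omitted]\n` + text.slice(-limit);
  }
  // middle truncation: keep the start and the end, cut the middle
  const half = Math.floor(limit / 2);
  return (
    text.slice(0, half) +
    `\n[truncated — ${dropped} chars omitted]\n` +
    text.slice(-half)
  );
}
```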
Why does the tool registry sort tools in a deterministic order?
Key Code Patterns
Tool Interface (TypeScript pseudocode)
interface Tool {
  name: string;           // identification
  prompt: string;         // LLM sees this description
  inputSchema: ZodSchema; // validates LLM's input
  call(input: unknown, context: Context): Promise<ToolResult>;
  isReadOnly(): boolean;
  isConcurrencySafe(): boolean;
}

function buildTool(config: {
  name: string;
  prompt: string;
  schema: ZodSchema;
  callFn: (input: unknown, ctx: Context) => Promise<ToolResult>;
  readOnly?: boolean;
  concurrencySafe?: boolean; // independent knob — a tool can be concurrency-safe without being read-only
}): Tool {
  // Standardized constructor — all ~30 tools use this
  return {
    name: config.name,
    prompt: config.prompt,
    inputSchema: config.schema,
    async call(input, ctx) {
      const validated = config.schema.parse(input); // validate first
      return config.callFn(validated, ctx);         // then execute
    },
    isReadOnly: () => config.readOnly ?? false,
    // isConcurrencySafe can be independently configured; defaults to readOnly
    // as a conservative baseline but tools can override for finer control
    isConcurrencySafe: () => config.concurrencySafe ?? config.readOnly ?? false,
  };
}

Tool Partitioning
// Group read-only calls together (parallel); each write tool gets its own batch (serial)
function partitionToolCalls(calls: ToolCall[]): ToolCall[][] {
  const batches: ToolCall[][] = [];
  let current: ToolCall[] = [];
  for (const call of calls) {
    const isReadOnly = call.tool.isReadOnly();
    const currentIsReadOnly = current[0]?.tool.isReadOnly() ?? isReadOnly;
    if (isReadOnly && currentIsReadOnly) {
      current.push(call);
    } else {
      if (current.length > 0) batches.push(current);
      current = [call];
    }
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Example:
// Input:  [Grep, Glob, Read, Bash, Grep]
// Output: [[Grep, Glob, Read], // parallel batch
//          [Bash],             // serial batch
//          [Grep]]             // parallel batch

Tool Execution Pipeline
async function executeToolCall(
  call: ToolCall,
  context: Context
): Promise<ToolResult | ToolError> {
  // 1. Find tool
  const tool = registry.get(call.name);
  if (!tool) return new ToolError(`Unknown tool: ${call.name}`);

  // 2. Validate input
  const parsed = tool.inputSchema.safeParse(call.input);
  if (!parsed.success) {
    return new ToolError(`Invalid input: ${parsed.error}`);
  }

  // 3. Permission resolution — deny rules first, then allow rules
  if (!checkPermission(tool, parsed.data, context)) {
    return new ToolError("Permission denied");
  }

  // 4. PreToolUse hook — part of permission resolution; can allow or deny
  //    before the user is ever prompted
  const hookResult = await runPreHooks(tool, parsed.data);
  if (hookResult === DENY) return new ToolError("Denied by policy");

  // 5. Execute
  const result = await tool.call(parsed.data, context);

  // 6. Post-hooks — logging, metrics, result transformation
  await runPostHooks(tool, parsed.data, result);

  // 7. Return
  return new ToolResult(result);
}
Real-World Numbers
| Metric | Value |
|---|---|
| Built-in tools | ~30 tools |
| Max concurrent tools | 10 parallel per batch |
| Tool prompt token budget | ~3K tokens total |
| Input validation | Zod schema on every call |
| Throughput improvement | 3-5x with parallel read batching |
| Cache savings | ~90% on cached prefix tokens (deterministic sort) |
Key Takeaways
What to remember for interviews
1. Every tool shares one interface — name, prompt (LLM description), inputSchema (Zod), call(), isReadOnly(), isConcurrencySafe() — enforced by the buildTool() factory for all ~30 built-in tools.
2. The registry sorts tools deterministically so tool definitions in the system prompt are identical across requests, enabling API prompt prefix caching and saving ~90% on those ~3K tokens per call.
3. The model controls parallelism by emitting multiple tool_use blocks in one turn; the harness enforces safety — read-only tools (Grep, Glob, Read) run concurrently (up to 10), write tools (Bash, Edit) serialize. This gives 3–5x throughput for exploration without race conditions.
4. Zod schema validation runs before every tool call — it catches type errors and missing fields before execution, returning descriptive errors so the LLM can self-correct. Path traversal prevention requires explicit constraints in the schema or permission layer, not schema validation alone.
5. Parallel batch failures are isolated per-slot using try/catch: one failed Grep doesn't abort the other two — all results (successes and errors) are returned together via Promise.allSettled semantics.
Further Reading
- Toolformer: Language Models Can Teach Themselves to Use Tools — Schick et al., 2023 — training LLMs to decide when and how to call external tools.
- Gorilla: Large Language Model Connected with Massive APIs — Patil et al., 2023 — improving LLM accuracy in API call generation via retrieval-augmented training.
- Claude Code (source) — Open-source reference for a production agent tool system — the architecture this module describes.
- JSON Schema Specification — The standard behind tool input validation — understanding this is key to designing tool interfaces.
- Berkeley Function-Calling Leaderboard — Live benchmark for LLM tool-calling accuracy — shows which models handle nested schemas, parallel calls, and error recovery best.
- OpenAI Function Calling Guide — The reference design for LLM tool interfaces — parallel function calling, strict mode, and tool_choice options.
- Zod: TypeScript-first Schema Validation — The runtime validation library used in production agent tool systems — bridges TypeScript types and runtime input validation.
Interview Questions
- ★★★ Design a tool execution system for an AI agent that maximizes throughput while preventing race conditions.
- ★★☆ How would you validate untrusted input from an LLM before executing a tool?
- ★★☆ Why sort tools deterministically in the prompt? What's the cost impact?
- ★★☆ How would you design a buildTool() pattern that standardizes tool construction across 30+ tools?