🛟 Error Recovery
The API says 'prompt too long' — the agent silently compacts and retries before you notice
An AI agent running a 30-turn task will hit errors: the prompt grows too long, responses get truncated, rate limits kick in, the user presses Ctrl+C. The agent must recover from all of these without user intervention. Behind this is a transition system — each loop iteration classifies the outcome as either a Continue (retry) or Terminal (stop) transition.
- Continue: tool_use, reactive_compact_retry, max_output_tokens_recovery
- Terminal: completed, model_error, max_turns, aborted_streaming
- Each error type maps to a specific recovery strategy — or a clean exit
Error Recovery State Machine
What you are seeing
The query loop's transition system. Each iteration produces one of seven transitions — three that continue the loop and four that terminate it. The recovery strategies are encoded directly in the loop logic.
What to try
Trace each error scenario: what triggers it, how the agent recovers, and what prevents infinite retry loops.
# Query loop transition system
API call → success         → tool_use?          → YES → execute tools → LOOP
                                                → NO  → COMPLETED (terminal)
API call → PromptTooLong   → already compacted? → YES → MODEL_ERROR
                                                → NO  → compact → LOOP
API call → MaxOutputTokens → retries >= 3?      → YES → MODEL_ERROR
                                                → NO  → double limit → LOOP
API call → RateLimit       → sleep(retry_after) → LOOP
API call → AbortError      → ABORTED (terminal)
The Intuition
What you are seeing
The error decision tree — transient errors fork to retry-with-backoff, permanent errors fork to surface-or-abort.
What to try
Follow what happens when an HTTP 429 arrives mid-stream.
The Stakes
The agent is fixing a complex bug — 15 tool calls deep, 150K tokens of context. The API returns "prompt too long."
Without recovery
Task fails. User restarts from scratch. Loses 15 minutes of work and all accumulated context.
With reactive compaction
Agent silently summarizes old turns, frees ~80K tokens, retries. User never notices.
Reactive Compaction
When the API returns "prompt too long," the agent doesn't fail — it compacts old messages by summarizing them, then retries. A one-shot flag (hasAttemptedReactiveCompact) prevents infinite loops: if compaction doesn't free enough space, the error becomes terminal.
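A minimal sketch of the compaction step, assuming a hypothetical summarize() helper that asks the model to condense the dropped turns; the keep-10-turns cutoff is illustrative:
async function compact(messages: Message[]): Promise<Message[]> {
  const KEEP_RECENT = 10; // keep the most recent turns verbatim (illustrative cutoff)
  if (messages.length <= KEEP_RECENT) return messages;

  const old = messages.slice(0, -KEEP_RECENT);
  const recent = messages.slice(-KEEP_RECENT);
  const summary = await summarize(old); // hypothetical helper: one model call, returns a string

  return [
    { role: "user", content: `[Summary of earlier conversation]\n${summary}` },
    ...recent,
  ];
}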
Max Output Tokens Recovery
When the model's response is truncated (hit the output token limit), the agent retries with a doubled token limit — up to 3 attempts with escalating limits. This handles the common case where the model generates a long code block that gets cut off mid-function.
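A sketch of the escalation loop, assuming a 4,096-token starting budget and a stopReason field modeled on the API's stop_reason; ApiResponse is a placeholder type:
async function callWithEscalation(messages: Message[], tools: Tool[]): Promise<ApiResponse> {
  let maxTokens = 4096; // 4096 → 8192 → 16384 across three attempts
  for (let attempt = 1; attempt <= 3; attempt++) {
    const response = await callApi(messages, tools, { maxTokens });
    if (response.stopReason !== "max_tokens") return response; // response fit, done
    maxTokens *= 2; // truncated: double and retry
  }
  throw new MaxOutputTokensError(); // still truncated after 3 attempts
}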
Abort Propagation
When the user presses Ctrl+C, an AbortController.abort() propagates through the entire pipeline — cancelling the API request, stopping tool execution, and terminating the stream. This happens in under 100ms, ensuring no wasted tokens or dangling operations.
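The wiring is a single AbortController whose signal every layer shares. A minimal sketch for Node.js, assuming an Anthropic-style client object:
const controller = new AbortController();
process.on("SIGINT", () => controller.abort()); // Ctrl+C fires one abort() call

// The same signal threads through the API call, tool execution, and streaming:
const stream = await client.messages.stream(params, { signal: controller.signal });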
Error Classification
Every error is classified as transient (retry: rate limits, network timeouts, prompt too long) or permanent (surface: invalid key, content policy violation, budget exhaustion). Getting this wrong is costly: retrying permanent errors wastes tokens, while surfacing transient errors unnecessarily interrupts the user.
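One way to encode the classification; RateLimitError and PromptTooLongError match the pseudocode below, while NetworkTimeoutError, AuthenticationError, and ContentPolicyError are placeholder classes:
function isTransient(err: unknown): boolean {
  if (err instanceof RateLimitError) return true;       // 429: wait and retry
  if (err instanceof NetworkTimeoutError) return true;  // retry with backoff
  if (err instanceof PromptTooLongError) return true;   // recoverable via compaction
  if (err instanceof AuthenticationError) return false; // invalid key: surface
  if (err instanceof ContentPolicyError) return false;  // never retriable
  return false; // unknown errors default to permanent: surface, don't burn tokens
}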
AbortSignal Propagation Through Async Generators
When Ctrl+C fires, AbortController.abort() is called once — but the cancellation must propagate through the entire generator chain. Each async generator must pass the signal down to the next level and check signal.throwIfAborted() at yield points. The SDK's messages.stream() accepts an AbortSignal and propagates it to the underlying fetch() call, which aborts the in-flight HTTP request — no more tokens stream, the connection closes, and generation stops immediately (per MDN AbortController). Tools executing mid-stream are a harder problem: read-only tools (Read, Grep, Glob) are safe to cancel at any point, but Bash commands may have already written files. The recovery strategy is to let in-flight tool executions complete, then surface the abort transition rather than killing them mid-write.
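A sketch of the propagation pattern: client, Params, and Event are placeholders, but signal.throwIfAborted() and passing the signal down to the request are the real mechanisms described above:
async function* queryStream(params: Params, signal: AbortSignal): AsyncGenerator<Event> {
  // Passing the signal down lets an abort() cancel the underlying fetch() mid-request.
  const stream = await client.messages.stream(params, { signal });
  for await (const event of stream) {
    signal.throwIfAborted(); // cooperative check at each yield point
    yield event;             // the consumer may be another generator doing the same
  }
}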
Circuit Breaker for Multi-Agent Failures
In a coordinator/worker system, a failing downstream service (e.g., a test runner that always times out) can cascade: every worker retries, saturating the API rate limit and burning budget. The circuit breaker pattern (as described by Martin Fowler) adds a third state beyond transient/permanent: a tripped state where the agent stops attempting the failing operation entirely and surfaces the circuit-open error. After a cooldown period, it moves to half-open (try once) and resets to closed on success. For agent systems, the circuit breaker sits at the tool boundary: if Bash fails 3 times in a row with the same error pattern, trip the breaker and report rather than retrying indefinitely.
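A minimal breaker at the tool boundary might look like the sketch below; the threshold and cooldown are illustrative, and comparing stringified errors is the crudest possible "same error pattern" check:
type BreakerState = "closed" | "open" | "half-open";

class ToolCircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private lastError = "";
  private openedAt = 0;

  constructor(private threshold = 3, private cooldownMs = 30_000) {}

  async run<T>(op: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error(`circuit open: ${this.lastError}`); // fail fast, surface to user
      }
      this.state = "half-open"; // cooldown elapsed: allow one probe
    }
    try {
      const result = await op();
      this.state = "closed"; // success resets the breaker
      this.failures = 0;
      return result;
    } catch (err) {
      const message = String(err);
      // Count consecutive failures with the same error pattern.
      this.failures = message === this.lastError ? this.failures + 1 : 1;
      this.lastError = message;
      if (this.failures >= this.threshold || this.state === "half-open") {
        this.state = "open";
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}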
Key Code Patterns
Query Loop with Error Recovery (TypeScript pseudocode)
const Transition = {
  COMPLETED: "completed",                     // success — no more tool calls
  TOOL_USE: "tool_use",                       // normal — execute tools, loop back
  REACTIVE_COMPACT: "reactive_compact_retry", // prompt too long
  MAX_TOKENS: "max_output_tokens_recovery",   // output truncated
  MODEL_ERROR: "model_error",                 // permanent API error
  MAX_TURNS: "max_turns",                     // safety limit hit
  ABORTED: "aborted_streaming",               // user cancelled
} as const;
async function queryLoop(messages: Message[], tools: Tool[], config: Config): Promise<string> {
  let hasAttemptedReactiveCompact = false; // one-shot compaction flag
  let maxTokensRetries = 0;
  while (true) {
    try {
      const response = await callApi(messages, tools, config); // config carries maxTokens
      const toolBlocks = extractToolUse(response);
      if (toolBlocks.length === 0) return Transition.COMPLETED;
      const results = await runTools(toolBlocks);
      messages.push(...results);
      // loop back
    } catch (err) {
      if (err instanceof PromptTooLongError) {
        if (hasAttemptedReactiveCompact) return Transition.MODEL_ERROR; // already tried, give up
        messages = await compact(messages);
        hasAttemptedReactiveCompact = true;
        continue; // retry with compacted messages
      } else if (err instanceof MaxOutputTokensError) {
        if (maxTokensRetries >= 3) return Transition.MODEL_ERROR;
        maxTokensRetries++;
        config.maxTokens *= 2; // escalate limit
        continue;
      } else if (err instanceof RateLimitError) {
        await sleep(err.retryAfter); // backoff
        continue;
      } else if (err instanceof AbortError) {
        return Transition.ABORTED;
      }
      throw err;
    }
  }
}
Rate Limit Handling with Early Warning
// Pseudocode — warn(), addDelay(), and calculateBackoff() are assumed helpers.
function handleRateLimit(responseHeaders: Headers): void {
  const limit = parseInt(responseHeaders.get("x-ratelimit-limit") ?? "1", 10);
  const remaining = parseInt(responseHeaders.get("x-ratelimit-remaining") ?? "0", 10);
  const resetAt = Date.parse(responseHeaders.get("x-ratelimit-reset") ?? ""); // ISO timestamp
  const windowSizeMs = 60_000; // assumed 60-second rate-limit window

  const utilization = 1.0 - remaining / limit;                    // fraction of budget spent
  const elapsedPct = 1.0 - (resetAt - Date.now()) / windowSizeMs; // fraction of window elapsed

  // Budget is draining faster than the window is elapsing: slow down
  // now, before the API starts returning 429s.
  if (utilization > elapsedPct) {
    warn("Approaching rate limit — slowing down");
    addDelay(calculateBackoff(utilization));
  }
}
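calculateBackoff() above is left abstract; for retry delays, the AWS post in Further Reading recommends adding jitter. A full-jitter sketch with illustrative base and cap values:
// Full jitter: pick a uniformly random delay in [0, min(cap, base * 2^attempt)].
function fullJitterBackoff(attempt: number, baseMs = 1_000, capMs = 60_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * ceiling;
}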
Real-World Numbers
| Metric | Value |
|---|---|
| Max truncation retries | 3 attempts with escalating limits |
| Reactive compact attempts | 1 per loop iteration (one-shot) |
| Rate limit source | Parsed from response headers |
| Abort propagation | <100ms via AbortController |
| Error classification | Transient (retry) vs permanent (surface) |
Key Takeaways
What to remember for interviews
1. Every loop iteration is classified as a Continue (retry) or Terminal (stop) transition — the agent never crashes; it classifies and routes every error.
2. Reactive compaction is one-shot: the agent summarizes old messages to free space, but a hasAttemptedReactiveCompact flag prevents infinite compact-retry loops.
3. Output truncation triggers up to 3 retries with escalating token limits; read-only tools are safe to cancel mid-stream, but Bash commands are allowed to finish to avoid partial writes.
4. Rate limits are handled proactively: the agent parses x-ratelimit-remaining headers and slows down before hitting the limit, not just after a 429 response.
5. Circuit breakers extend the transient/permanent classification: after 3 identical Bash failures, the breaker trips and the agent reports rather than retrying indefinitely.
Further Reading
- Anthropic API Error Codes — Official reference for API error types, status codes, and recommended handling strategies.
- Exponential Backoff and Jitter (AWS) — AWS architecture blog on backoff strategies; adding jitter (full or decorrelated) to exponential backoff substantially reduces contention compared with no jitter.
- Circuit Breaker Pattern (Martin Fowler) — The pattern for preventing cascading failures when a downstream service is unhealthy.
- Claude Code (source) — Open-source reference for the error recovery and transition system described in this module.
- Release It! — Production-Ready Software (Nygard) — The book that codified circuit breakers, bulkheads, and timeouts — the stability patterns directly applied in agent error recovery.
- AbortController and AbortSignal (MDN) — The browser/Node.js API for cooperative cancellation — the mechanism behind Ctrl+C propagation through the streaming pipeline.
- Google SRE Book: Handling Overload — Google SRE's guidance on rate limiting, load shedding, and retry budgets — directly applicable to API rate-limit handling and backoff strategies.
Check Your Understanding
- An agent receives HTTP 429 (rate limit) from the API. Which transition should the loop take?
- A Bash tool exits with code 1 (command not found). Should the agent treat this as a tool error or an API error?
- Why does reactive compaction use a one-shot flag (`hasAttemptedReactiveCompact`) rather than retrying compaction until context fits?
- A user presses Ctrl+C while the agent is mid-streaming a response. What should happen to in-flight tool calls?
Interview Questions
- ★★★ Design error recovery for an AI agent that handles context overflow mid-task.
- ★★☆ How would you implement graceful degradation when an LLM API rate-limits you?
- ★★☆ What's the difference between transient and permanent errors in an agentic system?
- ★★★ Design an error recovery system that distinguishes between transient failures (retry) and permanent failures (escalate) for LLM tool calls.