🔮 Speculative Execution
While you're still typing, a speculative agent already searched the codebase for you
While you're typing your next message, the agent is already working. A speculative agent runs in the background, predicting what you'll ask next and pre-computing the answer. If the prediction is right, the result appears instantly. If wrong, the work is silently discarded, with no harm done.
- Writes go to an overlay filesystem — never touches real files until accepted
- Only safe tools allowed: Read, Glob, Grep, TaskGet, TaskList (no writes, no Bash)
- State machine: idle → running → accepted / rejected
Speculative Execution Flow
What you are seeing
The lifecycle of a speculative execution: the agent predicts the user's next action, runs it with an overlay filesystem and restricted tools, then either merges the result or discards it based on the user's actual message.
What to try
Follow the two paths: what happens when speculation aligns with user intent (accept + merge) vs when it diverges (reject + discard). Notice how the overlay FS makes both paths safe.
```text
# Speculative execution lifecycle

User finishes turn → agent idle
Suppression check: was the last turn expensive? → YES → speculate
  1. Predict next action from conversation context
  2. Create overlay FS (copy-on-write layer)
  3. Filter tools → [Read, Glob, Grep, TaskGet, TaskList]
  4. Run speculative agent with overlay + safe tools

# User sends next message

Aligns with speculation? → YES → ACCEPT
  → merge overlay into real FS
  → skip redundant work, show cached result
Diverges from speculation? → NO → REJECT
  → discard overlay (no harm done)
  → run the user's actual request normally
```
The Intuition
How It Works in Practice
While you are thinking about what to type next, the agent is already working. It predicts your next request and runs a speculative search using an overlay filesystem — writes go to a temp layer, not real files.
- Prediction matches what you actually ask → instant results, the overlay merges into the real FS
- Prediction diverges → discard the overlay, no harm done, run normally
Like CPU branch prediction, but for coding tasks. The overlay filesystem makes the bet fully reversible.
The CPU Analogy
This is the same pattern CPUs use for branch prediction: predict which branch the code will take, execute speculatively, then commit the result if the prediction was right or flush the pipeline if wrong. The agent predicts the user's "branch" (their next request), executes speculatively with the overlay FS as its pipeline, and either commits (merge) or flushes (discard).
Tool Filtering
The speculative agent only gets 5 tools: Read, Glob, Grep, TaskGet, and TaskList. All read-only, all side-effect-free. Even if the speculative agent hallucinates a dangerous action, it literally cannot execute it — the tool isn't available. This is defense in depth: the overlay FS protects against bad writes, and tool filtering prevents writes from being attempted at all.
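A minimal sketch of how such an allowlist filter might look. The `Tool` shape and this `filterTools` signature are assumptions for illustration, not the real API:

```typescript
// Allowlist-based tool filtering — a sketch, not the actual implementation.
interface Tool {
  name: string;
  description: string;
}

// Read-only tools the speculative agent is allowed to use
const SAFE_TOOL_NAMES = new Set(["Read", "Glob", "Grep", "TaskGet", "TaskList"]);

// Everything outside the allowlist (Write, Edit, Bash, ...) is simply
// absent from the agent's tool list, so a write can never be attempted.
function filterTools(allTools: Tool[]): Tool[] {
  return allTools.filter((tool) => SAFE_TOOL_NAMES.has(tool.name));
}
```

Because the filter runs before the agent ever sees its tool list, the restriction holds even against a hallucinated tool call: the runtime has nothing to dispatch it to.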
Suppression Heuristics
Speculation isn't free — it costs an API call. The executor suppresses speculation when the last turn was cheap (a simple question, no tools), when the last tool use was read-only (nothing to follow up on), or when the predicted cost would exceed a threshold. In practice, speculation is suppressed on ~60% of turns.
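These heuristics can be framed as an expected-value check. A sketch, where the thresholds, dollar figures, and field names are all illustrative assumptions:

```typescript
// Expected-value gate for speculation — illustrative numbers, not real ones.
interface TurnStats {
  lastTurnCostUsd: number;      // what the previous turn cost to run
  lastToolWasReadOnly: boolean; // was the last tool use side-effect-free?
  acceptProbability: number;    // estimated chance the prediction is used
}

const CHEAP_TURN_USD = 0.01;       // below this, nothing worth pre-computing
const SPECULATION_COST_USD = 0.02; // price of the extra speculative call

function shouldSpeculate(stats: TurnStats): boolean {
  if (stats.lastTurnCostUsd < CHEAP_TURN_USD) return false; // cheap turn
  if (stats.lastToolWasReadOnly) return false;              // nothing to follow up on
  // Speculate only when expected savings beat the cost of the extra call.
  const expectedSavings = stats.acceptProbability * stats.lastTurnCostUsd;
  return expectedSavings > SPECULATION_COST_USD;
}
```

With these numbers, a $0.10 turn with a 50% acceptance estimate speculates (expected savings $0.05 > $0.02 cost), while the same turn at 10% acceptance does not.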
Prompt Suggestions
Beyond full speculative execution, the system can generate prompt suggestions — predictions of what the user will ask next, shown as clickable options. These are cheaper than full speculation (just a prediction, no execution) and help the user articulate their intent faster.
How Semantic Alignment Works
The accept/reject decision cannot use exact string matching, because users rephrase the same intent in many ways. Instead, the speculative executor records a predicted intent label (e.g., "investigate the TypeError in auth/login.ts") alongside the result. When the user's message arrives, the system asks the model to classify whether the message aligns with that label: a cheap, single-turn call with no tool access. The classification prompt is short and templated: "Does '{userMessage}' ask for '{predictedIntent}'? Answer YES or NO." Because the model answers a binary question with forced-choice output, latency is minimal (~100ms). A YES triggers the overlay merge; a NO triggers discard and normal execution. This mirrors the self-consistency pattern used in chain-of-thought prompting to validate reasoning steps: a second, cheaper model call verifies the first model's prediction.
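A sketch of that binary check, shown synchronously for brevity. `callModel` is a placeholder for the single-turn, no-tools classification call; the prompt mirrors the template quoted above:

```typescript
// Binary alignment classifier — a sketch; callModel stands in for a
// cheap single-turn model call with no tool access.
type ModelCall = (prompt: string) => string;

function alignsWithIntent(
  userMessage: string,
  predictedIntent: string,
  callModel: ModelCall,
): boolean {
  const prompt =
    `Does '${userMessage}' ask for '${predictedIntent}'? Answer YES or NO.`;
  const answer = callModel(prompt);
  // Forced-choice output: anything other than YES is treated as a reject,
  // so an ambiguous classification safely falls back to normal execution.
  return answer.trim().toUpperCase().startsWith("YES");
}
```

Defaulting ambiguity to NO is the safe direction here: a false reject only wastes the speculative work, while a false accept would merge an overlay the user never asked for.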
Partial Speculation — Pre-Reading Without Pre-Writing
Even when full speculation is suppressed (the previous turn was cheap, or the task is unpredictable), the system can do partial speculation: pre-read files the user is likely to ask about next. If the user just edited src/auth/login.ts, the system pre-reads src/auth/middleware.ts and the test file. These reads are cheap (no API call, just disk I/O) and populate the tool result cache. When the user does ask about those files, the agent can skip the Read tool call entirely — the content is already in memory. This is a lower-risk form of speculation: no writes, no overlay, no alignment check needed — just prefetching.
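A sketch of that prefetch path, under two stated assumptions: a hypothetical neighbor heuristic (here, just the matching test file) and a simple in-memory cache standing in for the tool result cache:

```typescript
// Partial speculation: prefetch likely-next files into an in-memory cache.
// The neighbor heuristic and ReadFile signature are illustrative assumptions.
type ReadFile = (path: string) => string | null;

const readCache = new Map<string, string>();

// Guess which files the user will ask about after editing `editedPath`.
// Real heuristics might also follow imports, sibling modules, etc.
function likelyNextReads(editedPath: string): string[] {
  return [editedPath.replace(/\.ts$/, ".test.ts")];
}

// Disk I/O only — no API call, no overlay, no alignment check needed.
function prefetch(editedPath: string, readFile: ReadFile): void {
  for (const path of likelyNextReads(editedPath)) {
    const content = readFile(path);
    if (content !== null) readCache.set(path, content);
  }
}

// Later, the Read tool checks the cache first and skips disk when warm.
function cachedRead(path: string, readFile: ReadFile): string | null {
  return readCache.get(path) ?? readFile(path);
}
```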
Key Code Patterns
Speculative Executor (TypeScript pseudocode)
```typescript
const SpeculationState = {
  IDLE: "idle",
  RUNNING: "running",
  ACCEPTED: "accepted",
  REJECTED: "rejected",
} as const;

type SpeculationStateValue = typeof SpeculationState[keyof typeof SpeculationState];

const SUPPRESSION_COST_THRESHOLD = 0.01; // tune per deployment

class SpeculativeExecutor {
  private state: SpeculationStateValue = SpeculationState.IDLE;
  private overlayFs: OverlayFileSystem = new OverlayFileSystem();
  private safeTools: string[] = ["Read", "Glob", "Grep", "TaskGet", "TaskList"];
  private result: unknown = null;

  // Run speculative work in the background
  async speculate(conversation: Conversation): Promise<void> {
    if (this.shouldSuppress(conversation)) return;
    this.state = SpeculationState.RUNNING;

    // Predict the user's next action
    const prediction = await predictNextAction(conversation);

    // Run with overlay FS and restricted tools
    const engine = new QueryEngine({
      tools: filterTools(this.safeTools),
      filesystem: this.overlayFs, // writes go to the overlay
    });
    this.result = await engine.submit(prediction);
  }

  // Check whether the speculation matches user intent
  onUserMessage(message: string): void {
    if (this.state !== SpeculationState.RUNNING) return;
    if (alignsWithSpeculation(message, this.result)) {
      this.state = SpeculationState.ACCEPTED;
      this.overlayFs.mergeToReal(); // apply cached work
    } else {
      this.state = SpeculationState.REJECTED;
      this.overlayFs.discard(); // throw away, no harm done
    }
  }

  // Don't speculate when it isn't worth the cost
  private shouldSuppress(conversation: Conversation): boolean {
    if (conversation.lastTurnCost < SUPPRESSION_COST_THRESHOLD) return true; // cheap turn
    if (conversation.lastToolWasReadOnly) return true; // nothing to speculate on
    return false;
  }
}
```
Overlay Filesystem (Copy-on-Write)
```typescript
// Copy-on-write filesystem — reads fall through to the real FS,
// writes land in a temporary overlay layer
class OverlayFileSystem {
  private overlay: Map<string, string> = new Map(); // path -> content

  read(path: string): string {
    if (this.overlay.has(path)) {
      return this.overlay.get(path)!;
    }
    return realFs.read(path);
  }

  write(path: string, content: string): void {
    this.overlay.set(path, content); // never touches the real FS
  }

  mergeToReal(): void {
    for (const [path, content] of this.overlay) {
      realFs.write(path, content);
    }
  }

  discard(): void {
    this.overlay.clear();
  }
}
```
Real-World Numbers
| Metric | Value |
|---|---|
| Safe tools allowed | 5 (Read, Glob, Grep, TaskGet, TaskList) |
| Suppression rate | ~60% of turns (cost/relevance thresholds) |
| Overlay FS read latency | <1ms overhead per read |
| Merge/discard cost | Instant (file copy or dir delete) |
| Acceptance rate | Varies by task type |
Key Takeaways
What to remember for interviews
1. Speculative execution predicts the user's next request and pre-computes the answer while they type: if correct, results appear instantly; if wrong, the work is silently discarded.
2. An overlay filesystem (copy-on-write) makes speculation safe: reads fall through to the real FS, writes go to a temp layer that is either merged (accept) or deleted (reject).
3. Tool filtering is defense in depth: the speculative agent only gets 5 read-only tools (Read, Glob, Grep, TaskGet, TaskList), so it cannot execute writes even if it tries.
4. Speculation is suppressed on ~60% of turns via heuristics: skip if the last turn was cheap, read-only, or prediction confidence is low.
5. Accept/reject uses semantic alignment, not string matching: a cheap binary classification call checks whether the user's message aligns with the predicted intent label.
Further Reading
- OverlayFS Documentation (Linux Kernel) — The kernel filesystem that inspired the copy-on-write pattern used in speculative execution.
- Speculative Execution in CPUs (Hennessy & Patterson) — The CPU architecture concept — predict the branch, execute speculatively, commit or rollback.
- Branch Prediction (Wikipedia) — How CPUs predict which branch to take — the same predict-execute-verify pattern applies to agent speculation.
- Claude Code (source) — Open-source agentic coding tool that may contain related implementation ideas for speculative execution and overlay-based isolation.
- Spectre and Meltdown: Lessons for Software Design — The real-world consequences of CPU speculative execution gone wrong — illustrates why side-effect isolation (overlay FS) is non-negotiable before committing speculative work.
- Copy-on-Write Semantics (Linux man page: fork) — The OS-level COW primitive that makes fork cheap — the same copy-on-write principle applied to filesystem overlays in agent speculative execution.
- Git Stash and Worktrees — The git primitives for saving and restoring working state — the lightweight alternative to overlay FS for speculative edits confined to git-tracked files.
Interview Questions
- Design a speculative execution system for an AI agent. How do you ensure safety? (★★★)
- What's the difference between an overlay filesystem and a git worktree for isolation? (★★☆)
- How would you decide when speculation is worth the compute cost? (★★☆)
- What verification strategy prevents speculative execution from committing side effects that the user hasn't approved? (★★★)