⚙️ Part 8
AI Engineering: Inside the Agent Harness
A deep dive into how production AI coding agents are built — based on Claude Code's architecture.
This section reverse-engineers the techniques that make AI coding agents work: the agentic loop, tool systems, permission gates, context management, and more. Each module covers one subsystem with interactive diagrams, real code patterns, and interview questions.
Architecture Overview
⚙️ AI Engineering (19 modules)
Agent Harness Architecture
Agentic loops, tool orchestration, permission systems, and context management
Claude Code runs a while(true) loop — here's what's inside
Tool System
Tool interface, Zod schemas, registry, orchestration, and parallel execution
5 Grep calls run in parallel, but Bash always waits its turn — why?
Sub-agents
Context isolation, worktrees, background execution, and result aggregation
Each sub-agent gets a fresh 200K context window — the parent keeps working
Commands & Skills
Slash commands, skill markdown files, prompt injection, and the command registry
Typing /compact is routed straight to the command registry, but typing 'compact this' is sent to the model as an ordinary prompt. Only one of them is parsed locally.
Plugins & MCP
Model Context Protocol, external tool servers, plugin lifecycle, and transport layers
Claude doesn't know if a tool is built-in or from an MCP server — by design
State Management
Dual state systems: React context for UI, module state for services
Two state systems coexist — one triggers re-renders, one doesn't. Mix them up and the terminal freezes.
Context Compaction
Auto-compact, reactive compact, microcompact, context collapse, and token budgets
At 80% context usage, the agent silently summarizes its own history to keep going
Terminal UI (Ink)
React reconciler for terminals, Yoga flexbox, ANSI rendering, and keyboard focus
It's React — but instead of DOM nodes, it writes ANSI escape codes to stdout
Memory System
File-based persistent memory, memory types, auto-save triggers, and cross-session recall
Claude remembers you're a senior engineer — across sessions, without a database
Hooks & Permissions
PreToolUse/PostToolUse hooks, 5-layer permission hierarchy, and safety gates
A shell script you wrote can veto any tool call before the tool even runs
Prompt Engineering (System)
System prompt assembly, cache boundary optimization, dynamic sections, and prompt variants
The system prompt has a secret boundary — everything before it is cached, everything after is fresh
Configuration & Schemas
Settings.json, Zod validation, feature flags, MDM policies, and config hierarchy
Zod validates every key at startup — one typo in settings.json blocks the entire CLI from booting.
Bridges & IDE Integration
WebSocket bridge, VS Code/JetBrains extensions, permission callbacks, and message routing
A WebSocket reconnect feels instant to the user, yet it rebuilds the entire IDE state in 3 round trips. Here's why that's a design constraint, not a bug.
Streaming & API Layer
Async generators, queryModelWithStreaming, SSE parsing, and backpressure
Tokens appear one by one because five async generators pipe data like Unix pipes
Error Recovery
Reactive compact retry, max output tokens escalation, abort handling, and graceful degradation
The API says 'prompt too long' — the agent silently compacts and retries before you notice
Speculative Execution
Parallel speculation, overlay filesystems, safe tool subsets, and acceptance criteria
While you're still typing, a speculative agent already searched the codebase for you
Coordinator/Worker Pattern
Multi-agent coordination, restricted tool sets, environment gating, and task distribution
The coordinator writes prompts, not code — it manages a team of worker agents
Session Persistence
Session JSON, /resume reconstruction, message history, file snapshots, and attribution
Close the terminal, reopen it, and run claude --resume: the conversation continues exactly where you left off
Cost Tracking & Budgets
Token counting, budget limits, per-model pricing, rate limit handling, and spend alerts
Claude Code emits cost events on every API response. Miss one and a runaway agent burns $200 before the budget gate fires.
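The Tool System entry above asks why several Grep calls run in parallel while Bash waits its turn. One plausible mechanism is partitioning a turn's tool calls into batches. The sketch below is a minimal illustration under stated assumptions, not Claude Code's actual code: ToolCall, isConcurrencySafe, partitionToolCalls, and runToolCalls are hypothetical names, and it assumes read-only tools such as Grep are flagged as safe while Bash is not. Consecutive safe calls are grouped and awaited together with Promise.all; every other call forms a batch of one, so side effects keep their original order.

```typescript
// Hypothetical tool-call shape for this sketch; not Claude Code's real types.
interface ToolCall {
  id: string;
  name: string;
  // Assumption: read-only tools (e.g. Grep) are marked concurrency-safe.
  isConcurrencySafe: boolean;
  run: () => Promise<string>;
}

// Group consecutive concurrency-safe calls into one batch.
// Every unsafe call gets a batch of its own, preserving call order.
function partitionToolCalls(calls: ToolCall[]): ToolCall[][] {
  const batches: ToolCall[][] = [];
  for (const call of calls) {
    const last = batches[batches.length - 1];
    if (call.isConcurrencySafe && last !== undefined && last[0].isConcurrencySafe) {
      last.push(call); // extend the current parallel batch
    } else {
      batches.push([call]); // start a new batch
    }
  }
  return batches;
}

// Run batches one after another; calls inside a safe batch run concurrently.
// Promise.all keeps results in the same order as the calls.
async function runToolCalls(calls: ToolCall[]): Promise<string[]> {
  const results: string[] = [];
  for (const batch of partitionToolCalls(calls)) {
    const outputs = await Promise.all(batch.map((c) => c.run()));
    results.push(...outputs);
  }
  return results;
}

// Demo: Bash is marked unsafe here purely as an assumption of this sketch.
const fake = (name: string, safe: boolean): ToolCall => ({
  id: name,
  name,
  isConcurrencySafe: safe,
  run: async () => `${name} done`,
});
const demoCalls = [fake("Grep1", true), fake("Grep2", true), fake("Bash", false), fake("Grep3", true)];
// Three batches: [Grep1, Grep2], then [Bash], then [Grep3].
console.log(partitionToolCalls(demoCalls).map((batch) => batch.map((c) => c.name)));
```

Keeping batches strictly ordered is the point of the design: if a write-capable call ran concurrently with the reads around it, what those reads returned would depend on scheduling.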
Suggested Learning Path
- The Big Picture: Agent Harness
- How Tools Work: Tool System + Hooks & Permissions
- Context Management: Context Compaction + Prompt Cache
- Extensions: Sub-agents + Commands & Skills
- Persistence: State Management + Memory System
- Infrastructure: Terminal UI + Bridges + Plugins & MCP + Config & Schemas
What You'll Learn
- The agentic loop pattern (REPL -> LLM -> tool_use -> execute -> loop)
- Tool orchestration with parallel/serial partitioning
- 5-layer permission hierarchy
- Prompt cache optimization with static/dynamic boundary
- 4-strategy context compaction
- Sub-agent spawning with context isolation
- File-based persistent memory
- React terminal rendering (Ink)
- Bridge pattern for multi-frontend support
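The agentic loop in the first bullet (REPL -> LLM -> tool_use -> execute -> loop) fits in a few dozen lines. This is a minimal sketch, not Claude Code's implementation: callModel is a hypothetical stub standing in for a streaming API call, tools is a hypothetical one-entry registry, and the message shapes only loosely follow the tool_use and tool_result content blocks of the Anthropic Messages API. The loop calls the model, executes every requested tool, appends the results as a user message, and stops when a reply contains no tool_use blocks.

```typescript
// Simplified content blocks, loosely modeled on the Messages API.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> }
  | { type: "tool_result"; tool_use_id: string; content: string };

interface Message {
  role: "user" | "assistant";
  content: Block[];
}

// Hypothetical model stub: requests one tool, then answers once it sees a result.
async function callModel(history: Message[]): Promise<Message> {
  const last = history[history.length - 1];
  const sawResult = last.content.some((b) => b.type === "tool_result");
  if (!sawResult) {
    return {
      role: "assistant",
      content: [{ type: "tool_use", id: "t1", name: "Echo", input: { text: "hi" } }],
    };
  }
  return { role: "assistant", content: [{ type: "text", text: "done" }] };
}

// Hypothetical tool registry: tool name -> async implementation.
const tools: Record<string, (input: Record<string, unknown>) => Promise<string>> = {
  Echo: async (input) => String(input.text),
};

async function agentLoop(prompt: string, maxTurns = 10): Promise<Message[]> {
  const history: Message[] = [{ role: "user", content: [{ type: "text", text: prompt }] }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await callModel(history);
    history.push(reply);
    const toolUses = reply.content.filter(
      (b): b is Extract<Block, { type: "tool_use" }> => b.type === "tool_use",
    );
    // No tool requests means the model has produced its final answer.
    if (toolUses.length === 0) return history;
    const results: Block[] = [];
    for (const use of toolUses) {
      const tool = tools[use.name];
      const content = tool ? await tool(use.input) : `Unknown tool: ${use.name}`;
      results.push({ type: "tool_result", tool_use_id: use.id, content });
    }
    // Feed tool results back to the model and go around the loop again.
    history.push({ role: "user", content: results });
  }
  throw new Error("maxTurns exceeded");
}

// Logs 4: user prompt, tool_use reply, tool_result message, final text reply.
agentLoop("say hi").then((h) => console.log(h.length));
```

The sketch bounds the loop with maxTurns instead of a bare while(true); that cap is an assumption added here so a misbehaving stub cannot spin forever. Tools run one at a time for brevity, whereas a production harness would also apply parallel/serial partitioning and permission checks before executing anything.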