AI Engineering: Inside the Agent Harness

⚙️ Part 8

A deep dive into how production AI coding agents are built — based on Claude Code's architecture.

This section reverse-engineers the techniques that make AI coding agents work: the agentic loop, tool systems, permission gates, context management, and more. Each module covers one subsystem with interactive diagrams, real code patterns, and interview questions.

Architecture Overview

⚙️ AI Engineering (19 modules)

⚙️#46

Agent Harness Architecture

Agentic loops, tool orchestration, permission systems, and context management

Claude Code runs a while(true) loop — here's what's inside
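
As a rough illustration of that loop, here is a minimal sketch. The message shapes, the `runAgentLoop` name, and the stub model are all hypothetical, and the loop is capped at `maxTurns` rather than running a literal while(true). A real harness would await a streaming API call where the stub is invoked.

```typescript
// Hypothetical message and tool shapes, for illustration only.
type ToolCall = { name: string; input: string };
type ModelTurn =
  | { kind: "text"; text: string }
  | { kind: "tool_use"; call: ToolCall };
type Message = { role: "user" | "assistant" | "tool"; content: string };

type Model = (history: Message[]) => ModelTurn;
type Tools = Record<string, (input: string) => string>;

function runAgentLoop(model: Model, tools: Tools, prompt: string, maxTurns = 10): Message[] {
  const history: Message[] = [{ role: "user", content: prompt }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = model(history);
    if (reply.kind === "text") {
      // No tool requested: record the answer and stop looping.
      history.push({ role: "assistant", content: reply.text });
      return history;
    }
    // Tool requested: execute it and feed the result back to the model.
    const tool = tools[reply.call.name];
    const result = tool ? tool(reply.call.input) : `unknown tool: ${reply.call.name}`;
    history.push({ role: "assistant", content: `tool_use:${reply.call.name}` });
    history.push({ role: "tool", content: result });
  }
  return history; // turn cap reached without a final answer
}

// Stub model: asks for one tool call, then answers using the tool result.
const stubModel: Model = (history) => {
  const last = history[history.length - 1];
  return last !== undefined && last.role === "tool"
    ? { kind: "text", text: `done: ${last.content}` }
    : { kind: "tool_use", call: { name: "echo", input: "hi" } };
};

const transcript = runAgentLoop(stubModel, { echo: (s) => s.toUpperCase() }, "say hi");
```

The transcript ends as soon as the model replies with plain text, which is the exit condition the loop depends on.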

🔧#47

Tool System

Tool interface, Zod schemas, registry, orchestration, and parallel execution

5 Grep calls run in parallel, but Bash always waits its turn — why?
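
One plausible way to get that behavior is to partition the requested calls into batches. The sketch below assumes a hypothetical `READ_ONLY` set and `partition` function: consecutive read-only calls share a parallel batch, while anything with side effects gets a batch of its own.

```typescript
type Call = { tool: string; input: string };
type Batch = { parallel: boolean; calls: Call[] };

// Assumption: which tools are treated as side-effect free.
const READ_ONLY = new Set<string>(["Grep", "Glob", "Read"]);

function partition(calls: Call[]): Batch[] {
  const batches: Batch[] = [];
  for (const call of calls) {
    const safe = READ_ONLY.has(call.tool);
    const last = batches[batches.length - 1];
    if (safe && last !== undefined && last.parallel) {
      // Extend the current run of read-only calls.
      last.calls.push(call);
    } else {
      // Side-effecting calls always start (and fill) their own batch.
      batches.push({ parallel: safe, calls: [call] });
    }
  }
  return batches;
}

const requested: Call[] = [
  ...["a", "b", "c", "d", "e"].map((p) => ({ tool: "Grep", input: p })),
  { tool: "Bash", input: "npm test" },
  { tool: "Grep", input: "f" },
];
const plan = partition(requested);
```

Here `plan` holds three batches: five Grep calls that can run concurrently, one serial Bash call, and a trailing Grep batch that waits for Bash to finish.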

🤖#48

Sub-agents

Context isolation, worktrees, background execution, and result aggregation

Each sub-agent gets a fresh 200K context window — the parent keeps working

📝#49

Commands & Skills

Slash commands, skill markdown files, prompt injection, and the command registry

/compact is instant but 'compact this' takes 3 seconds — one never hits the API

🔌#50

Plugins & MCP

Model Context Protocol, external tool servers, plugin lifecycle, and transport layers

Claude doesn't know if a tool is built-in or from an MCP server — by design

🗄️#51

State Management

Dual state systems: React context for UI, module state for services

Two state systems coexist — one triggers re-renders, one doesn't. Mix them up and the terminal freezes.

🗜️#52

Context Compaction

Auto-compact, reactive compact, microcompact, context collapse, and token budgets

At 80% context usage, the agent silently summarizes its own history to keep going
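
A minimal sketch of a threshold-triggered compaction, assuming a 200K-token window and the 80% trigger mentioned above. The `Msg` shape, the function names, and the stub summarizer are hypothetical; a real summarizer would be another model call.

```typescript
type Msg = { role: string; text: string; tokens: number };

const CONTEXT_WINDOW = 200_000; // assumed window size
const COMPACT_THRESHOLD = 0.8; // the 80% trigger

function shouldCompact(history: Msg[], window = CONTEXT_WINDOW): boolean {
  const used = history.reduce((sum, m) => sum + m.tokens, 0);
  return used / window >= COMPACT_THRESHOLD;
}

// Replace everything except the most recent messages with one summary message.
function compact(history: Msg[], keepRecent: number, summarize: (old: Msg[]) => Msg): Msg[] {
  if (history.length <= keepRecent) return history;
  const cut = history.length - keepRecent;
  return [summarize(history.slice(0, cut)), ...history.slice(cut)];
}

// Stub summarizer standing in for a model call.
const stubSummarize = (old: Msg[]): Msg => ({
  role: "system",
  text: `summary of ${old.length} messages`,
  tokens: 500,
});

const longHistory: Msg[] = Array.from({ length: 10 }, (_, i) => ({
  role: "user",
  text: `message ${i}`,
  tokens: 17_000,
}));
const compacted = shouldCompact(longHistory) ? compact(longHistory, 2, stubSummarize) : longHistory;
```

Keeping the newest messages verbatim preserves the immediate working context, while the summary stands in for the older turns.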

🖥️#53

Terminal UI (Ink)

React reconciler for terminals, Yoga flexbox, ANSI rendering, and keyboard focus

It's React — but instead of DOM nodes, it writes ANSI escape codes to stdout

🧠#54

Memory System

File-based persistent memory, memory types, auto-save triggers, and cross-session recall

Claude remembers you're a senior engineer — across sessions, without a database

🔒#55

Hooks & Permissions

PreToolUse/PostToolUse hooks, 5-layer permission hierarchy, and safety gates

A shell script you wrote can veto any tool call before Claude even sees the result
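
The veto idea can be sketched with in-process functions standing in for the shell scripts. Every name here (`PreToolHook`, `runWithHooks`, `denyRm`) is hypothetical, and this is not the actual hook protocol: it only shows that the first denying hook stops the call before the tool executes.

```typescript
type ToolRequest = { tool: string; input: string };
type HookDecision = { allow: true } | { allow: false; reason: string };
type PreToolHook = (req: ToolRequest) => HookDecision;

function runWithHooks(
  req: ToolRequest,
  hooks: PreToolHook[],
  execute: (req: ToolRequest) => string,
): { executed: boolean; output: string } {
  for (const hook of hooks) {
    const decision = hook(req);
    if (decision.allow === false) {
      // A single deny vetoes the call; the tool never runs.
      return { executed: false, output: `blocked: ${decision.reason}` };
    }
  }
  return { executed: true, output: execute(req) };
}

// Example hook: refuse obviously destructive shell commands.
const denyRm: PreToolHook = (req) =>
  req.tool === "Bash" && req.input.includes("rm -rf")
    ? { allow: false, reason: "destructive command" }
    : { allow: true };

let executions = 0;
const fakeExecute = (req: ToolRequest): string => {
  executions++;
  return `ran ${req.tool}`;
};

const blocked = runWithHooks({ tool: "Bash", input: "rm -rf /tmp/x" }, [denyRm], fakeExecute);
const allowed = runWithHooks({ tool: "Bash", input: "ls" }, [denyRm], fakeExecute);
```

Only the second request reaches `fakeExecute`, so `executions` ends at 1.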

📋#56

Prompt Engineering (System)

System prompt assembly, cache boundary optimization, dynamic sections, and prompt variants

The system prompt has a secret boundary — everything before it is cached, everything after is fresh
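
One way to picture the boundary: keep stable sections in a prefix that is byte-identical from turn to turn, and push anything that changes into a suffix. The `Section` shape and `assemblePrompt` below are hypothetical, and how the prefix is actually marked as cacheable is left out.

```typescript
type Section = { name: string; text: string; dynamic: boolean };

function assemblePrompt(sections: Section[]): { cachedPrefix: string; freshSuffix: string } {
  const join = (parts: Section[]): string => parts.map((s) => s.text).join("\n\n");
  return {
    // Static sections come first so the prefix stays identical across turns.
    cachedPrefix: join(sections.filter((s) => !s.dynamic)),
    // Dynamic sections sit after the boundary and are rebuilt every turn.
    freshSuffix: join(sections.filter((s) => s.dynamic)),
  };
}

const staticSections: Section[] = [
  { name: "identity", text: "You are a coding agent.", dynamic: false },
  { name: "tools", text: "Available tools: Grep, Bash.", dynamic: false },
];

const turn1 = assemblePrompt([
  ...staticSections,
  { name: "env", text: "Working directory: /repo/a", dynamic: true },
]);
const turn2 = assemblePrompt([
  ...staticSections,
  { name: "env", text: "Working directory: /repo/b", dynamic: true },
]);
```

Because `turn1.cachedPrefix` and `turn2.cachedPrefix` are equal, a prefix cache can be reused even though the environment section changed.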

⚡#57

Configuration & Schemas

Settings.json, Zod validation, feature flags, MDM policies, and config hierarchy

Zod validates every key at startup — one typo in settings.json blocks the entire CLI from booting.

🌉#58

Bridges & IDE Integration

WebSocket bridge, VS Code/JetBrains extensions, permission callbacks, and message routing

A WebSocket reconnect feels instant to the user, but it rebuilds the entire IDE state in 3 round trips. Here's why that's a design constraint, not a bug.

🌊#59

Streaming & API Layer

Async generators, queryModelWithStreaming, SSE parsing, and backpressure

Tokens appear one by one because five async generators pipe data like Unix pipes

🛟#60

Error Recovery

Reactive compact retry, max output tokens escalation, abort handling, and graceful degradation

The API says 'prompt too long' — the agent silently compacts and retries before you notice
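
The compact-and-retry idea might look like the sketch below. The `ApiResult` shape, the `prompt_too_long` error tag, and the retry cap are assumptions made for illustration, not the real API's error format.

```typescript
type ApiResult =
  | { ok: true; text: string }
  | { ok: false; error: "prompt_too_long" | "other" };

function callWithRecovery(
  history: string[],
  send: (h: string[]) => ApiResult,
  compact: (h: string[]) => string[],
  maxRetries = 2,
): ApiResult {
  let current = history;
  let result = send(current);
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    // Only a "prompt too long" failure is recoverable by compacting.
    if (result.ok || result.error !== "prompt_too_long") break;
    current = compact(current);
    result = send(current);
  }
  return result;
}

// Stub API: rejects any history longer than 3 entries.
let sendCount = 0;
const stubSend = (h: string[]): ApiResult => {
  sendCount++;
  return h.length > 3 ? { ok: false, error: "prompt_too_long" } : { ok: true, text: `answered with ${h.length} messages` };
};
// Stub compaction: keep the newer half of the history.
const dropOlderHalf = (h: string[]): string[] => h.slice(Math.floor(h.length / 2));

const recovered = callWithRecovery(["1", "2", "3", "4", "5", "6", "7", "8"], stubSend, dropOlderHalf);
```

The history shrinks from 8 entries to 4 and then to 2 before the stub accepts it, so the caller sees only the final successful result.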

🔮#61

Speculative Execution

Parallel speculation, overlay filesystems, safe tool subsets, and acceptance criteria

While you're still typing, a speculative agent already searched the codebase for you

👔#62

Coordinator/Worker Pattern

Multi-agent coordination, restricted tool sets, environment gating, and task distribution

The coordinator writes prompts, not code — it manages a team of worker agents

💾#63

Session Persistence

Session JSON, /resume reconstruction, message history, file snapshots, and attribution

Close the terminal, reopen it, run claude --resume, and the conversation continues exactly where you left off

💰#64

Cost Tracking & Budgets

Token counting, budget limits, per-model pricing, rate limit handling, and spend alerts

Claude Code emits cost events on every API response. Miss one and a runaway agent burns $200 before the budget gate fires.
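
A budget gate along those lines could be sketched as follows. The `Usage` shape, the `createBudget` helper, the model name, and the prices are all made up for illustration; real per-model pricing differs.

```typescript
type Usage = { model: string; inputTokens: number; outputTokens: number };

// Hypothetical prices in USD per million tokens.
const PRICE_PER_MTOK: Record<string, { input: number; output: number }> = {
  "example-model": { input: 3, output: 15 },
};

function createBudget(limitUsd: number) {
  let spentUsd = 0;
  return {
    // Record one API response; returns true while spend is still under the limit.
    record(usage: Usage): boolean {
      const price = PRICE_PER_MTOK[usage.model];
      if (price === undefined) throw new Error(`no price for model ${usage.model}`);
      spentUsd += (usage.inputTokens * price.input + usage.outputTokens * price.output) / 1_000_000;
      return spentUsd < limitUsd;
    },
    spent: (): number => spentUsd,
  };
}

const budget = createBudget(1.0);
const response: Usage = { model: "example-model", inputTokens: 100_000, outputTokens: 20_000 };
const withinAfterFirst = budget.record(response); // about $0.60 spent
const withinAfterSecond = budget.record(response); // about $1.20 spent, over the $1.00 limit
```

The gate can only fire on responses that are actually recorded, which is why a dropped cost event lets spend run past the limit unnoticed.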

Suggested Learning Path

1. The Big Picture: Agent Harness
2. How Tools Work: Tool System + Hooks & Permissions
3. Context Management: Context Compaction + Prompt Cache
4. Extensions: Sub-agents + Commands & Skills
5. Persistence: State Management + Memory System
6. Infrastructure: Terminal UI + Bridges + Plugins & MCP + Config & Schemas

What You'll Learn

The agentic loop pattern (REPL -> LLM -> tool_use -> execute -> loop)
Tool orchestration with parallel/serial partitioning
5-layer permission hierarchy
Prompt cache optimization with static/dynamic boundary
4-strategy context compaction
Sub-agent spawning with context isolation
File-based persistent memory
React terminal rendering (Ink)
Bridge pattern for multi-frontend support