Transformer Math

⚙️ Part 8

AI Engineering: Inside the Agent Harness

A deep dive into how production AI coding agents are built — based on Claude Code's architecture.

This section reverse-engineers the techniques that make AI coding agents work: the agentic loop, tool systems, permission gates, context management, and more. Each module covers one subsystem with interactive diagrams, real code patterns, and interview questions.

⚙️ AI Engineering (19 modules)

⚙️#46

Agent Harness Architecture

Agentic loops, tool orchestration, permission systems, and context management

Claude Code runs a while(true) loop — here's what's inside

🔧#47

Tool System

Tool interface, Zod schemas, registry, orchestration, and parallel execution

5 Grep calls run in parallel, but Bash always waits its turn — why?
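
One way to picture the parallel/serial split is a partitioning pass over the tool calls from a single model turn. This is a minimal sketch, not Claude Code's implementation: the `ToolCall` shape, the `READ_ONLY_TOOLS` set, and `partitionToolCalls` are hypothetical names. It assumes read-only tools can safely share a batch, while anything that may mutate state runs alone and in order.

```typescript
// Hypothetical shape of a tool call; field names are illustrative only.
interface ToolCall {
  tool: string;
  input: string;
}

// Assumption: these tools only read state, so they are safe to run concurrently.
const READ_ONLY_TOOLS = new Set<string>(["Grep", "Glob", "Read"]);

// Group consecutive read-only calls into one parallel batch.
// Every other call gets a batch of its own, so it runs alone and in order.
function partitionToolCalls(calls: ToolCall[]): ToolCall[][] {
  const batches: ToolCall[][] = [];
  let parallel: ToolCall[] = [];
  for (const call of calls) {
    if (READ_ONLY_TOOLS.has(call.tool)) {
      parallel.push(call);
    } else {
      if (parallel.length > 0) {
        batches.push(parallel);
        parallel = [];
      }
      batches.push([call]);
    }
  }
  if (parallel.length > 0) batches.push(parallel);
  return batches;
}
```

A scheduler could then run each batch with `Promise.all` and await it before starting the next, which keeps a Bash call from racing the searches around it.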

🤖#48

Sub-agents

Context isolation, worktrees, background execution, and result aggregation

Each sub-agent gets a fresh 200K context window — the parent keeps working

📝#49

Commands & Skills

Slash commands, skill markdown files, prompt injection, and the command registry

/compact is instant but 'compact this' takes 3 seconds — one never hits the API

🔌#50

Plugins & MCP

Model Context Protocol, external tool servers, plugin lifecycle, and transport layers

Claude doesn't know if a tool is built-in or from an MCP server — by design

🗄️#51

State Management

Dual state systems: React context for UI, module state for services

Two state systems coexist — one triggers re-renders, one doesn't. Mix them up and the terminal freezes.

🗜️#52

Context Compaction

Auto-compact, reactive compact, microcompact, context collapse, and token budgets

At 80% context usage, the agent silently summarizes its own history to keep going
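
The threshold check behind auto-compact can be sketched in a few lines. Everything here is an assumption made for illustration: the `Message` shape, the `summarize` callback (a real harness would presumably ask the model for the summary), and the choice to keep the most recent messages verbatim. Only the 80% figure comes from the teaser above.

```typescript
// Hypothetical message record with a precomputed token count.
interface Message {
  role: "user" | "assistant" | "summary";
  text: string;
  tokens: number;
}

// The 80% figure quoted above, expressed as a fraction of the context window.
const AUTO_COMPACT_THRESHOLD = 0.8;

// True once the history fills at least 80% of the window.
function shouldCompact(messages: Message[], contextWindow: number): boolean {
  const used = messages.reduce((sum, m) => sum + m.tokens, 0);
  return used / contextWindow >= AUTO_COMPACT_THRESHOLD;
}

// Replace everything except the last `keepRecent` messages with one summary.
// `summarize` is a stand-in for a model call.
function compact(
  messages: Message[],
  keepRecent: number,
  summarize: (older: Message[]) => Message,
): Message[] {
  if (messages.length <= keepRecent) return messages;
  const cut = messages.length - keepRecent;
  return [summarize(messages.slice(0, cut)), ...messages.slice(cut)];
}
```

Keeping the newest messages untouched is one plausible design choice: the summary carries the older history, while the latest tool results stay exact.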

🖥️#53

Terminal UI (Ink)

React reconciler for terminals, Yoga flexbox, ANSI rendering, and keyboard focus

It's React — but instead of DOM nodes, it writes ANSI escape codes to stdout

🧠#54

Memory System

File-based persistent memory, memory types, auto-save triggers, and cross-session recall

Claude remembers you're a senior engineer — across sessions, without a database

🔒#55

Hooks & Permissions

PreToolUse/PostToolUse hooks, 5-layer permission hierarchy, and safety gates

A shell script you wrote can veto any tool call before Claude even sees the result
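
The veto idea can be sketched as a chain of checks that runs before the tool does. This is a simplified in-process model, assuming hooks are plain functions; the teaser describes shell scripts, and the names `PreToolUseHook`, `HookDecision`, and `runWithHooks` are hypothetical.

```typescript
// A hook either allows the call or denies it with a reason.
type HookDecision = { allow: true } | { allow: false; reason: string };
type PreToolUseHook = (tool: string, input: string) => HookDecision;

// Run every hook before the tool. The first denial short-circuits:
// the tool never executes, and the model gets the denial instead of a result.
function runWithHooks(
  tool: string,
  input: string,
  hooks: PreToolUseHook[],
  execute: (input: string) => string,
): string {
  for (const hook of hooks) {
    const decision = hook(tool, input);
    if (decision.allow === false) {
      return `Blocked: ${decision.reason}`;
    }
  }
  return execute(input);
}
```

Because the check sits in front of `execute`, a denied call produces no side effects at all, which is what makes the hook a safety gate rather than an audit log.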

📋#56

Prompt Engineering (System)

System prompt assembly, cache boundary optimization, dynamic sections, and prompt variants

The system prompt has a secret boundary — everything before it is cached, everything after is fresh
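
The boundary can be sketched as two system blocks: a static prefix marked as cacheable and a dynamic suffix that changes from turn to turn. The `cache_control: { type: "ephemeral" }` marker follows the Anthropic Messages API prompt-caching format. The `assembleSystemPrompt` helper and the split into exactly two blocks are assumptions for illustration, not a description of how Claude Code assembles its prompt.

```typescript
// A system-prompt text block with an optional cache marker.
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Static sections go before the boundary and carry the cache marker.
// Dynamic sections go after it and are sent fresh on every request.
function assembleSystemPrompt(
  staticSections: string[],
  dynamicSections: string[],
): [SystemBlock, SystemBlock] {
  return [
    {
      type: "text",
      text: staticSections.join("\n\n"),
      cache_control: { type: "ephemeral" },
    },
    { type: "text", text: dynamicSections.join("\n\n") },
  ];
}
```

The prefix only stays cacheable if it is identical on every request, so anything volatile (dates, working directory, git status) belongs after the boundary.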

#57

Configuration & Schemas

Settings.json, Zod validation, feature flags, MDM policies, and config hierarchy

Zod validates every key at startup — one typo in settings.json blocks the entire CLI from booting.

🌉#58

Bridges & IDE Integration

WebSocket bridge, VS Code/JetBrains extensions, permission callbacks, and message routing

After a WebSocket reconnect, the user perceives no added latency, yet the bridge rebuilds the entire IDE state in 3 round trips. Here’s why that’s a design constraint, not a bug.

🌊#59

Streaming & API Layer

Async generators, queryModelWithStreaming, SSE parsing, and backpressure

Tokens appear one by one because five async generators pipe data like Unix pipes

🛟#60

Error Recovery

Reactive compact retry, max output tokens escalation, abort handling, and graceful degradation

The API says 'prompt too long' — the agent silently compacts and retries before you notice

🔮#61

Speculative Execution

Parallel speculation, overlay filesystems, safe tool subsets, and acceptance criteria

While you're still typing, a speculative agent already searched the codebase for you

👔#62

Coordinator/Worker Pattern

Multi-agent coordination, restricted tool sets, environment gating, and task distribution

The coordinator writes prompts, not code — it manages a team of worker agents

💾#63

Session Persistence

Session JSON, /resume reconstruction, message history, file snapshots, and attribution

Close the terminal, reopen it, relaunch with --resume, and the conversation continues exactly where you left off

💰#64

Cost Tracking & Budgets

Token counting, budget limits, per-model pricing, rate limit handling, and spend alerts

Claude Code emits cost events on every API response. Miss one and a runaway agent burns $200 before the budget gate fires.

Suggested Learning Path

  1. The Big Picture: Agent Harness
  2. How Tools Work: Tool System + Hooks & Permissions
  3. Context Management: Context Compaction + Prompt Cache
  4. Extensions: Sub-agents + Commands & Skills
  5. Persistence: State Management + Memory System
  6. Infrastructure: Terminal UI + Bridges + Plugins & MCP + Config & Schemas

What You'll Learn

  • The agentic loop pattern (REPL -> LLM -> tool_use -> execute -> loop)
  • Tool orchestration with parallel/serial partitioning
  • 5-layer permission hierarchy
  • Prompt cache optimization with static/dynamic boundary
  • 4-strategy context compaction
  • Sub-agent spawning with context isolation
  • File-based persistent memory
  • React terminal rendering (Ink)
  • Bridge pattern for multi-frontend support
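
The first item in the list, the agentic loop, can be sketched as follows. This is a toy version under stated assumptions: the model is a synchronous stub, the transcript is a list of strings, and `runAgentLoop`, `ModelReply`, and the `maxTurns` guard are hypothetical names. A production loop would stream from an API and route each call through permissions and hooks.

```typescript
// The model either answers in text or asks for a tool.
type ModelReply =
  | { kind: "text"; text: string }
  | { kind: "tool_use"; tool: string; input: string };

type Model = (transcript: string[]) => ModelReply;
type Tools = Map<string, (input: string) => string>;

// LLM -> tool_use -> execute -> loop, until the model replies with plain text.
function runAgentLoop(
  userInput: string,
  model: Model,
  tools: Tools,
  maxTurns: number = 10,
): string {
  const transcript: string[] = [`user: ${userInput}`];
  let turns = 0;
  while (true) {
    // Assumed safety valve so a confused model cannot loop forever.
    if (turns >= maxTurns) return "Stopped: turn limit reached";
    turns += 1;

    const reply = model(transcript);
    if (reply.kind === "text") return reply.text;

    // Execute the requested tool and feed the result back into the transcript.
    const tool = tools.get(reply.tool);
    const result = tool ? tool(reply.input) : `Unknown tool: ${reply.tool}`;
    transcript.push(`tool_use: ${reply.tool}(${reply.input})`);
    transcript.push(`tool_result: ${result}`);
  }
}
```

The only exit conditions are a text reply or the turn limit. Everything else is the same loop body repeated, which is why the harness stays small while the model does the planning.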