

Module 54 · AI Engineering

🧠 Memory System

Claude remembers you're a senior engineer — across sessions, without a database


Every new conversation starts with a blank slate — the AI has no memory of previous sessions. A persistent memory system bridges this gap: it saves key information to disk and loads it back into context at the start of each session.

  • File-based — markdown files with YAML frontmatter, no database required
  • Index file (MEMORY.md) is always loaded into context — one-line pointers to memory files
  • Auto-save triggers — saves on user corrections, confirmed approaches, and role information
🎮

Memory File Structure

What you are seeing

The directory structure and file format of a typical memory system. The index file (MEMORY.md) is always loaded into context. Each memory file has YAML frontmatter for metadata and markdown content for the actual information.

What to try

Notice how the index file is compact — just pointers. The full memory content is only loaded when relevant. This bounds the context cost regardless of how many memories exist.

# Directory structure
~/.claude/projects/<project-hash>/memory/
  MEMORY.md            # index — always in context
  user_profile.md      # role, preferences
  project_state.md     # ongoing work status
  feedback_design.md   # corrections from user
  reference_links.md   # external resources

# MEMORY.md (the index)
- [User Profile](user_profile.md) — Google L6 Staff SWE
- [Project State](project_state.md) — 19 modules shipped
- [Feedback](feedback_design.md) — English only, real data

# Individual memory file format
---
name: User Profile
description: Role and interview prep goals
type: user
---
Google L6 Staff SWE, agent AI team.
Learning ML internals for interviews.
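This frontmatter is simple enough to parse without a full YAML library. A minimal sketch, assuming exactly the flat key: value format shown above (a production system would more likely use a YAML parser such as js-yaml):

typescript
// Split a memory file into YAML frontmatter and markdown body.
// Handles only flat `key: value` pairs — not full YAML.
function parseMemoryFile(raw: string): { meta: Record<string, string>; body: string } {
  const match = raw.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!match) return { meta: {}, body: raw }; // no frontmatter block
  const meta: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: match[2].trim() };
}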

💡

The Intuition

What you're seeing

The MEMORY.md index at session start points to individual memory files; mid-session writes land on disk, but the live context keeps the boot-time snapshot.

What to try

Trace why a write made today only takes effect at the next session start.

[Diagram] Memory Layer — MEMORY.md Index + Files. MEMORY.md (index, injected into the system prompt at session start) points to feedback_alammar_style.md (visual explanation prefs), feedback_svg_agents.md (SVG quality rules), feedback_verify_speed.md (verify timing notes), project_chatgpt_audit.md (audit workflow), and reference_deploy.md (CF Pages deploy method). Mid-session, /learn writes new memory files: the store is write-only during a session and re-read at the next session start.

Before/After: Memory Index

The index makes memory 50x cheaper per turn.

| Approach | Context cost per turn | How |
|---|---|---|
| Without index | ~10K tokens | Agent loads all 50 memory files every turn |
| With MEMORY.md index | ~200 tokens | 50 one-line pointers — only relevant files loaded on demand |
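The 50x figure falls out of simple arithmetic. A back-of-envelope sketch (the per-file and per-pointer token counts are assumed averages, not measurements):

typescript
// Cost model behind the table above; all numbers are illustrative assumptions
const files = 50;
const tokensPerFile = 200;   // average full memory file
const tokensPerPointer = 4;  // average one-line index entry

const withoutIndex = files * tokensPerFile;    // 10,000 tokens per turn
const withIndex = files * tokensPerPointer;    // 200 tokens per turn
console.log(`${withoutIndex / withIndex}x`);   // "50x"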

Why Files, Not a Database?

Memory is a small-data problem — an agent might accumulate hundreds of memories over months, not millions. Markdown files are human-readable (you can inspect and edit them), git-trackable (version history for free), and require zero infrastructure (no database server, no migrations). The tradeoff: no full-text search index. But with hundreds of files, linear scan is fast enough.

Memory Types

Different information has different update frequency and staleness risk:

  • User — role, preferences, communication style. Rarely changes.
  • Feedback — corrections and confirmations. High-signal, saves repeated mistakes.
  • Project — ongoing work status. Changes frequently, highest staleness risk.
  • Reference — external links and resources. Relatively stable.
💡 Tip · The index file (MEMORY.md) is the key design choice. It's always loaded into context, but it only contains one-line summaries with pointers. This means 50 memories cost ~50 lines of context, not 50 full documents.
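One way to make these profiles concrete is a type annotation. A hypothetical schema (the names mirror this module's taxonomy, not a published Claude Code API):

typescript
// Four memory types with their rough staleness profiles
type MemoryType = "user" | "feedback" | "project" | "reference";

const stalenessRisk: Record<MemoryType, "low" | "high"> = {
  user: "low",       // role and preferences rarely change
  feedback: "low",   // corrections stay valid until contradicted
  project: "high",   // ongoing work status changes constantly
  reference: "low",  // external links are relatively stable
};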

Two Memory Layers

In Claude Code specifically, there are two distinct memory mechanisms that serve different purposes:

  • CLAUDE.md files — project instructions that persist across sessions, like a .editorconfig for AI. These are hand-authored by developers to define project conventions, coding standards, and context. They live at project root or globally at ~/.claude/CLAUDE.md.
  • Auto-memory — the file-based system this module describes: a MEMORY.md index plus individual memory files that automatically capture user preferences, feedback, and project context across conversations. Lives at ~/.claude/projects/<project>/memory/.

CLAUDE.md is for what developers want the agent to know (intentional configuration). Auto-memory is for what the agent learns about the user and project over time (emergent knowledge). This module focuses on the auto-memory system's architecture.
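Side by side, in the same comment-block style used later in this module (the paths are as described above; the summary lines are paraphrase, not official documentation):

typescript
// Two memory layers in Claude Code: different authors, different intent

// 1. CLAUDE.md — intentional configuration
//    Where: <project-root>/CLAUDE.md or ~/.claude/CLAUDE.md
//    What:  conventions, coding standards, project context
//    Who:   hand-authored by developers

// 2. Auto-memory — emergent knowledge
//    Where: ~/.claude/projects/<project>/memory/
//    What:  user role, feedback, project context
//    Who:   written automatically by the agent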

What NOT to Save

The biggest mistake is saving too much. Three categories to avoid:

  1. Code patterns — derive from the actual code. A memorized pattern becomes wrong the moment someone refactors.
  2. Git history — use git log. The commit history is authoritative and always current.
  3. Debugging solutions — the fix is in the code change. Memorizing "we fixed X by doing Y" becomes stale when the code changes again.
✨ Insight · The test: if you could figure it out by reading the repo, don't memorize it. Memory should store intent and preferences (stable), not implementation details (volatile).

Why Not Embeddings and Vector Search?

The obvious upgrade to file-based memory is a vector database: embed every memory, retrieve the top-K most semantically similar at query time. This is the RAG approach (Lewis et al., 2020). But for an agent with hundreds of memories — not millions — the complexity cost dominates. Vector search requires an embedding model (latency per save), a vector store (infrastructure), and a retrieval step (another round-trip at query time). The file-based index sidesteps all of this: the agent reads the MEMORY.md index, decides which files are relevant using the same reasoning it applies to any text, and loads them. The LLM's in-context reasoning is the retrieval function. This scales surprisingly far — up to a few hundred memories — before the index itself becomes too large to scan in-context.
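What "in-context reasoning as the retrieval function" looks like in practice: a sketch, assuming a generic llm.complete() call and this prompt shape (neither is Claude Code's actual interface):

typescript
// Index-as-retrieval: ask the model which memory files matter for this message
interface LLM { complete(prompt: string): Promise<string>; }

async function recallViaIndex(llm: LLM, indexMd: string, userMessage: string): Promise<string[]> {
  const prompt =
    `Memory index:\n${indexMd}\n\n` +
    `User message:\n${userMessage}\n\n` +
    `Reply with the relevant memory filenames, one per line, or NONE.`;
  const reply = (await llm.complete(prompt)).trim();
  // The model's reasoning over the index replaces the embedding + similarity step
  return reply === "NONE" ? [] : reply.split("\n").map(f => f.trim()).filter(Boolean);
}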

Session Start: What Gets Loaded and When

At session start, exactly one file is loaded into context unconditionally: MEMORY.md (the index). Individual memory files are loaded on demand during the conversation when the agent identifies them as relevant — either because the user's message matches a pointer in the index, or because the agent proactively fetches context before starting a complex task. This lazy loading pattern keeps the initial context cost fixed at ~200 tokens regardless of how much has been memorized. Compare this to always-loading all memories (10K+ tokens per session) or never loading memories (amnesia). The index is the middle path: O(1) startup cost, O(relevant) retrieval cost.
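A minimal sketch of that lifecycle, using Node's fs for file reads (the class shape is illustrative, not Claude Code's implementation):

typescript
import { readFileSync } from "node:fs";

class SessionMemory {
  private loaded = new Map<string, string>(); // cache of on-demand loads
  readonly indexText: string;                 // O(1) startup: index only

  constructor(private memoryDir: string) {
    // The one unconditional read at session start (~200 tokens of context)
    this.indexText = readFileSync(`${memoryDir}/MEMORY.md`, "utf8");
  }

  // O(relevant) retrieval: a file enters context only when first needed
  load(file: string): string {
    if (!this.loaded.has(file)) {
      this.loaded.set(file, readFileSync(`${this.memoryDir}/${file}`, "utf8"));
    }
    return this.loaded.get(file)!;
  }
}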

Quick Check

Why use markdown files instead of a database for agent memory?

📐

Key Code Patterns

Memory System

typescript
class MemorySystem {
  private memoryDir: string;
  private index: MemoryIndex;

  constructor(projectPath: string) {
    this.memoryDir = `~/.claude/projects/${projectPath}/memory/`;
    this.index = this.loadIndex();  // MEMORY.md
  }

  private loadIndex(): MemoryIndex {
    // Always loaded into context at session start
    return parseMarkdown(readFile(`${this.memoryDir}/MEMORY.md`));
  }

  save(name: string, content: string, type: string, description: string): void {
    // Step 1: Write memory file with YAML frontmatter
    const frontmatter = `---\nname: ${name}\ndescription: ${description}\ntype: ${type}\n---\n`;
    writeFile(`${this.memoryDir}/${name}.md`, frontmatter + content);

    // Step 2: Update index (one-line pointer)
    this.index.addEntry(`[${name}](${name}.md) — ${description}`);
  }

  recall(query: string): Memory[] {
    // Scan the index pointers for relevance, then load only the matching files
    return this.index.entries
      .filter(entry => relevant(entry, query))
      .map(entry => parseMarkdown(readFile(`${this.memoryDir}/${entry.file}`)));
  }
}

Auto-Save Trigger Rules

typescript
type SaveDecision = { save: true; type: string } | { save: false };

function shouldSave(event: AgentEvent): SaveDecision {
  // Decide when to persist information to memory

  // HIGH SIGNAL — always save
  if (event.type === "user_correction") {
    // "Don't use Jest, we use Vitest" → save to feedback
    return { save: true, type: "feedback" };
  }
  if (event.type === "user_role_info") {
    // "I'm a Staff SWE at Google" → save to user
    return { save: true, type: "user" };
  }
  if (event.type === "approach_confirmed") {
    // User confirms non-obvious approach → save to feedback
    return { save: true, type: "feedback" };
  }
  if (event.type === "external_resource") {
    // "Here's the API docs: https://..." → save to reference
    return { save: true, type: "reference" };
  }

  // LOW SIGNAL — do NOT save
  if (event.type === "code_pattern")   return { save: false };  // derive from code
  if (event.type === "debug_solution") return { save: false };  // fix is in the code
  if (event.type === "git_history")    return { save: false };  // use git log

  return { save: false };
}

Memory vs Other Persistence

typescript
// Three persistence layers — different lifetimes

// 1. MEMORY — cross-session (survives restart)
//    Where: ~/.claude/projects/<project>/memory/
//    What:  user role, preferences, corrections
//    When:  loaded at every session start

// 2. TASKS — current session only
//    Where: in-memory task list
//    What:  "implement feature X", "fix bug Y"
//    When:  cleared when session ends

// 3. PLANS — current session only
//    Where: ~/.claude/plans/<session-id>/
//    What:  step-by-step execution plans
//    When:  deleted after task completion
🔧

Break It — See What Happens

  • No index file (load all memories into context)
  • Save everything (no filtering)
📊

Real-World Numbers

| Metric | Value |
|---|---|
| Storage format | Markdown + YAML frontmatter |
| Index file size | ~20-50 lines (~200 tokens) |
| Memory types | 4 (user, feedback, project, reference) |
| Auto-save triggers | Corrections, role info, confirmed approaches |
| Typical memory count | 10-100 files per project |
| Context cost per session | ~200 tokens (index only) |
✨ Insight · The 200-token context cost for the index is fixed — whether you have 10 memories or 100. This is the key scalability property: memory grows on disk without growing the context window.
🧠

Key Takeaways

What to remember for interviews

  1. The MEMORY.md index file is always loaded into context (~200 tokens) with one-line pointers; individual memory files are loaded on demand — startup cost is O(1) regardless of how many memories exist.
  2. Markdown + YAML frontmatter beats a database for agent memory: human-readable, git-trackable, zero infrastructure, and fast enough for hundreds of entries via linear scan.
  3. Four memory types with different staleness profiles: user (stable), feedback/corrections (high-signal), project state (highest staleness risk), and reference links (stable).
  4. Never memorize what can be derived: code patterns go stale on refactor, git history is authoritative, and debugging solutions live in the code change itself — save intent and preferences, not implementation details.
  5. LLM in-context reasoning serves as the retrieval function over the index, avoiding the latency and infrastructure cost of embeddings + vector search for a small-data problem.
📚

Further Reading

Recall

Why is the memory file path effectively write-only during a running session?

Trade-off

Why does the MEMORY.md index store one-line pointers to memory files rather than including all memory content inline?

Trade-off

Which of the following should NOT be saved as an agent memory?

Trade-off

A user's memory store has 200 entries accumulated over months. When should recency-based retrieval outperform semantic similarity retrieval?
🎯

Interview Questions


Design a persistent memory system for an AI assistant that works across sessions.

★★★
Anthropic, OpenAI

How do you decide what to remember vs what to derive from the codebase?

★★☆
Google

What's the risk of stale memories? How do you handle memory that's no longer accurate?

★★☆
Anthropic

Compare semantic search vs. recency-based retrieval for agent memory — when does each strategy fail?

★★★
OpenAI