
Transformer Math

Module 37 · Applications

🔌 Tool Use & Protocols

How does Claude Code call 50 different tools with one protocol?


LLMs generate text. Tool use turns that text into action — the model outputs structured JSON, the runtime executes functions, and results flow back into context. This is how chatbots become agents. MCP standardizes tool connections. A2A standardizes agent-to-agent collaboration.

🔧

The Tool Call Loop

What you're seeing: the full lifecycle of a tool call — from user message to final response. The LLM can loop through multiple tool calls before generating its final reply. The JSON schema shows the exact format the runtime parses.

What to try: trace the arrows. Notice the loop back from “Result returned” to the LLM — this is how multi-step reasoning works. The MCP layer shows where external tool connections are standardized.

[Diagram] User message → LLM decides to call a tool → JSON emitted: {"name": "search", "arguments": {"query": "..."}} → runtime parses the JSON and executes → external tools (DB / API / FS), with the MCP server standardizing tool connections → result returned to the LLM (loop: the LLM can call multiple tools) → final response to user.
🎮

Tool Calling Flow

The complete lifecycle of a tool call: the LLM generates structured JSON, the runtime parses and executes it, and the result is injected back as a new message.

1

User Message

"What's the weather in Tokyo?"

2

LLM Generates Tool Call (JSON tokens)

{"name": "get_weather", "arguments": {"city": "Tokyo"}}

3

Runtime Parses & Executes

Validates JSON against schema, calls get_weather(city="Tokyo")

4

Tool Result Injected into Context

{"temperature": 22, "condition": "partly cloudy"}

5

LLM Generates Final Response

"It's 22°C and partly cloudy in Tokyo right now."
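The five steps above can be sketched as a minimal runtime loop. This is an illustrative sketch, not any specific provider's API: the model's output is simulated as a literal JSON string, and `get_weather` is a hypothetical stub.

```python
import json

# Hypothetical tool implementation (step 3 would call a real weather API)
def get_weather(city: str) -> dict:
    return {"temperature": 22, "condition": "partly cloudy"}

TOOLS = {"get_weather": get_weather}

# Step 2: the model emits JSON tokens (simulated here as a literal string)
model_output = '{"name": "get_weather", "arguments": {"city": "Tokyo"}}'

# Step 3: the runtime -- not the model -- parses and executes
call = json.loads(model_output)
result = TOOLS[call["name"]](**call["arguments"])

# Step 4: the result is injected back into context as a new message,
# and the model generates the final response from it (step 5)
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
messages.append({"role": "tool", "content": json.dumps(result)})
print(messages[-1])
```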

MCP (Model Context Protocol)

What: Standard protocol for model-to-tool connections

Direction: Model → Tools (vertical)

Exposes: Tools, Resources, Prompts

Analogy: USB-C — one plug, any device

A2A (Agent-to-Agent)

What: Standard protocol for agent-to-agent collaboration

Direction: Agent ↔ Agent (horizontal)

Exposes: Agent Cards, Tasks, Streaming

Analogy: HTTP — agents discover and talk to each other

💡

The Intuition

Function calling is just structured generation. The model doesn't "execute" tools — it generates tokens that happen to be valid JSON matching a schema. The runtime parses the JSON and executes the actual function. The result is injected back into context as a new message, and the model continues generating.

Tool schemas live in the system prompt. When you register tools with an API, the provider formats each tool's name, description, and parameter schema into the system message. The model was fine-tuned on examples of choosing and calling tools based on these schemas. Each schema costs roughly 200–500 tokens of context.
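Concretely, the provider serializes each registered schema into the system message, roughly like this. The template below is a sketch (the exact rendering is provider-specific), and the 4-characters-per-token figure is only a rule of thumb:

```python
import json

tools = [{
    "name": "get_weather",
    "description": "Get current weather for a city",
    "parameters": {"type": "object",
                   "properties": {"city": {"type": "string"}},
                   "required": ["city"]},
}]

# Roughly how a provider might render tools into the system prompt
system_block = "You can call these tools:\n" + "\n".join(
    json.dumps(t, indent=2) for t in tools
)

# Crude token estimate: ~4 characters per token for English/JSON text
est_tokens = len(system_block) // 4
print(est_tokens)
```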

MCP (Model Context Protocol): Before MCP, every AI app had to write custom integrations for every tool. MCP defines a standard: servers expose tools/resources, clients connect to servers. One integration, any model. Think USB-C — one plug fits all devices.
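MCP runs over JSON-RPC 2.0: a client lists a server's tools, then calls one by name. The message shapes below are a simplified sketch of the protocol (fields abbreviated; a real server would use the official MCP SDK):

```python
import json

# Client -> server: discover the tools this server exposes (JSON-RPC 2.0)
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Server -> client: tool names plus their input schemas (simplified)
list_response = {"jsonrpc": "2.0", "id": 1, "result": {"tools": [
    {"name": "search", "description": "Search internal docs",
     "inputSchema": {"type": "object",
                     "properties": {"query": {"type": "string"}},
                     "required": ["query"]}},
]}}

# Client -> server: invoke one of the discovered tools
call_request = {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
                "params": {"name": "search",
                           "arguments": {"query": "onboarding"}}}
print(json.dumps(call_request))
```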

A2A (Agent-to-Agent): As agents become specialized, they need to discover and delegate to each other. A2A defines Agent Cards (capability descriptions published at well-known URLs), a task lifecycle (submitted → working → completed), and streaming updates.
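A sketch of the two A2A primitives just named: an Agent Card (field names here are illustrative, not the exact spec) and a task moving through its lifecycle:

```python
# Illustrative Agent Card: a capability description published at a
# well-known URL so other agents can discover this one (hypothetical fields)
agent_card = {
    "name": "tax-calculator",
    "description": "Computes tax liability for US filings",
    "url": "https://agents.example.com/tax",   # hypothetical endpoint
    "skills": [{"id": "calc_tax", "description": "Calculate tax owed"}],
}

# Task lifecycle from the text: submitted -> working -> completed
LIFECYCLE = ["submitted", "working", "completed"]

def advance(state: str) -> str:
    """Move a task to the next lifecycle state (terminal state is absorbing)."""
    i = LIFECYCLE.index(state)
    return LIFECYCLE[min(i + 1, len(LIFECYCLE) - 1)]

state = "submitted"
state = advance(state)   # now "working"
state = advance(state)   # now "completed"
print(state)
```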

✨ Insight · MCP and A2A are complementary, not competing. MCP connects a model to tools (vertical integration). A2A connects agents to agents (horizontal collaboration). A production agent uses both: MCP to access databases and APIs, A2A to delegate subtasks to specialized agents.

Tool description quality is the biggest lever. Gorilla (Patil et al., 2023) fine-tuned LLaMA on 1,645 API calls from HuggingFace, TensorFlow Hub, and Torch Hub and showed that accurate, detailed tool descriptions significantly improve correct API selection. Crucially, retrieval-augmented tool lookup (fetching the relevant schema at query time) outperformed baking all schemas into the prompt, because it avoids hallucinating outdated parameter names when APIs evolve. The practical rule: write tool descriptions like documentation for a junior engineer, not like a variable name. Include what the tool does, what it does not do, and at least one example input/output. A 3-sentence description consistently outperforms a 3-word one on tool-selection accuracy.
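To make the rule concrete, compare a 3-word description with a documentation-grade one (both schemas hypothetical):

```python
# A variable-name-style description: forces the model to guess scope
bad = {"name": "search_docs", "description": "Search docs"}

# A documentation-style description: what it does, what it does NOT do,
# and one example input/output pair
good = {
    "name": "search_docs",
    "description": (
        "Search internal engineering documentation by keyword query. "
        "Does NOT search customer data or the public web. "
        "Example: query='deploy rollback' returns "
        "[{'title': 'Rollback runbook', 'url': '...'}]."
    ),
}

print(len(good["description"]) > len(bad["description"]))
```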

Quick check

Trade-off

A team builds an agent that needs to read from Postgres, call a REST API, and delegate a subtask to a specialist agent. MCP or A2A for each connection?

Quick Check

How does function calling actually work at the token level?

📐

Context Budget Math

Context Window Budget

Everything must fit within the context limit. Tool schemas compete with conversation history for space:

💡 Tip · Each tool schema costs approximately 200–500 tokens, depending on description length and parameter complexity. With 20 tools registered, roughly 6K tokens are consumed before any conversation starts.

Tool Schema Token Cost

Rule of thumb for estimating schema overhead:

schema_tokens ≈ 50 + d + n · p

where d is the description length in tokens, n is the number of parameters, and p is the average per-parameter description length in tokens. The 50-token base covers JSON structure, name, and type annotations.
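The rule of thumb as code, with a worked 20-tool budget (the per-token figures are estimates, not exact counts):

```python
def schema_tokens(desc_tokens: int, param_desc_tokens: list[int],
                  base: int = 50) -> int:
    """Estimate schema cost: 50-token base + description + per-parameter text."""
    return base + desc_tokens + sum(param_desc_tokens)

# A mid-sized tool: 100-token description, three 50-token parameter docs
one_tool = schema_tokens(100, [50, 50, 50])   # 50 + 100 + 150 = 300 tokens

# Twenty such tools consume ~6,000 tokens before any conversation starts
print(one_tool, 20 * one_tool)
```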

Tool Schema Definition + Function Calling

python
import json
import openai  # pip install openai

client = openai.OpenAI()

# Tool schema (what the model sees in system prompt)
tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search internal documentation by query",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
                "top_k": {"type": "integer", "default": 5},
            },
            "required": ["query"],
        },
    },
}]

# Function calling loop
messages = [{"role": "user", "content": "..."}]  # conversation so far
response = client.chat.completions.create(
    model="gpt-4", messages=messages, tools=tools
)
)

# Check if model wants to call a tool
if response.choices[0].message.tool_calls:
    # Must append the assistant message containing the tool call(s) first
    messages.append(response.choices[0].message)
    for call in response.choices[0].message.tool_calls:
        name = call.function.name          # "search_docs"
        args = json.loads(call.function.arguments)  # {"query": "..."}
        result = execute_tool(name, args)   # YOUR code runs here

        # Then inject the tool result back into context
        messages.append({"role": "tool", "content": json.dumps(result),
                         "tool_call_id": call.id})

Quick check

Derivation

You have a 16K context limit. System prompt = 800 tokens, conversation history = 5,200 tokens, response reserve = 2,000 tokens. Each tool schema averages 300 tokens. How many schemas fit?

🔧

Break It — See What Happens

50 tools in system prompt
No schema validation on tool outputs

Quick check

Trade-off

You disable schema validation on tool outputs to reduce latency. A compromised search tool now returns a prompt injection string. What is the most likely failure mode?

📊

Real-World Numbers

| Metric | Value |
| --- | --- |
| Tool schema overhead | ~200–500 tokens per tool |
| Optimal tool count | Keep namespaces under ~10 functions for efficiency (OpenAI recommends tool search for large catalogs) |
| Function calling latency | Environment-dependent; typically adds meaningful overhead per round-trip (generation + parsing + execution + re-injection) |
| MCP adoption | |
| A2A partners | |
| Agent Card discovery | Cards published at well-known URLs — similar to robots.txt for agents |
✨ Insight · The tool ecosystem is consolidating fast. MCP is becoming the standard for tool integration (replacing custom plugins), and A2A is emerging for multi-agent orchestration. Both use JSON-based schemas and HTTP transport — simple by design.
🆕

Structured Outputs & Computer Use Tool (2024–2025)

Two landmark advances shipped in 2024 that change the reliability and scope of tool use — one improves function-calling correctness, the other opens an entirely new tool category.

OpenAI Structured Outputs (Aug 2024)

  • strict: true in tool definition → 100% JSON Schema adherence (per OpenAI)
  • Supersedes “JSON mode” which hit ~99% but had no schema constraint
  • Supports recursive schemas, nested objects, enums
  • Works for both tool calls and response_format

Anthropic Computer Use Tool (Oct 2024)

  • Special tool type: computer_20241022
  • Actions: screenshot, left_click, type, key, scroll, drag
  • Pixel-perfect (x, y) coordinates; screenshot returned as tool_result
  • Entirely new category: perception + action, no schema required from the external app
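The action loop can be sketched as a dispatcher: the model outputs an action name plus pixel coordinates, and the runtime maps it to local input handlers. The handler stubs below are hypothetical; a real agent would drive the OS through a screenshot/input library.

```python
# Hypothetical local handlers standing in for real OS input control
def do_screenshot() -> bytes:
    return b"...png bytes..."          # would capture the screen

def do_left_click(x: int, y: int) -> str:
    return f"clicked ({x}, {y})"       # would synthesize a mouse click

def do_type(text: str) -> str:
    return f"typed {text!r}"           # would synthesize keystrokes

HANDLERS = {"screenshot": do_screenshot,
            "left_click": do_left_click,
            "type": do_type}

# The model emits an action + pixel coordinates; the runtime dispatches it
action = {"action": "left_click", "x": 412, "y": 230}
name = action.pop("action")
result = HANDLERS[name](**action)
print(result)
```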
| Feature | Structured Outputs | Computer Use Tool |
| --- | --- | --- |
| Problem solved | JSON schema conformance | Interacting with any software visually |
| Adherence rate | 100% with strict mode (vs ~99% JSON mode) | N/A — no schema to conform to |
| Input to model | Text + tool schema in prompt | Screenshot (image) each step |
| Output from model | JSON conforming to schema | Action + (x, y) pixel coordinates |
| Category | Better function-calling reliability | Entirely new tool paradigm |
✨ Insight · Structured Outputs is an incremental improvement on existing function-calling — the same paradigm, but schema guarantees go from “usually right” to “always right.” Computer Use is a paradigm shift: instead of a typed API, the agent perceives pixels and acts on coordinates. Every tool in the world becomes accessible without an integration layer.
Deep dive — Structured Outputs implementation & strict mode constraints

OpenAI Structured Outputs (Aug 6, 2024) uses constrained decoding at generation time: the token sampler is restricted to tokens that keep the output JSON valid with respect to the provided schema. This is different from post-hoc validation — the model cannot produce invalid JSON, not merely "usually doesn't."

Constraints when using strict: true: (1) All fields must be listed in required. (2) No additional properties allowed (additionalProperties: false). (3) Supported types: string, number, boolean, integer, array, object, enum, anyOf. (4) Recursive schemas supported. (5) Max schema nesting: 5 levels; max 100 total properties.
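These constraints can be checked mechanically before registering a schema. A minimal validator for points (1), (2), and the nesting limit from (5) — a sketch, not OpenAI's own validation logic:

```python
def check_strict(schema: dict, depth: int = 1) -> list[str]:
    """Flag violations of strict-mode rules on an object schema."""
    errors = []
    if schema.get("type") == "object":
        if depth > 5:
            errors.append("nesting exceeds 5 levels")
        props = schema.get("properties", {})
        # (1) every property must be listed in `required`
        if set(schema.get("required", [])) != set(props):
            errors.append("all fields must be required")
        # (2) no additional properties allowed
        if schema.get("additionalProperties") is not False:
            errors.append("additionalProperties must be false")
        for sub in props.values():
            errors.extend(check_strict(sub, depth + 1))
    return errors

ok = {"type": "object",
      "properties": {"city": {"type": "string"}},
      "required": ["city"],
      "additionalProperties": False}
print(check_strict(ok))   # []
```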

Python: Structured Outputs with strict mode

python
import json
import openai  # pip install openai

client = openai.OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "extract_event",
        "description": "Extract structured event data from text",
        "strict": True,   # <-- 100% schema adherence
        "parameters": {
            "type": "object",
            "properties": {
                "name":  {"type": "string"},
                "date":  {"type": "string", "description": "ISO 8601"},
                "venue": {"type": "string"},
                "confirmed": {"type": "boolean"},
            },
            "required": ["name", "date", "venue", "confirmed"],
            "additionalProperties": False,  # required for strict mode
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "The WWDC keynote is on June 9 at Apple Park."}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "extract_event"}},
)

call = response.choices[0].message.tool_calls[0]
event = json.loads(call.function.arguments)
# Guaranteed to match the schema — no try/except needed
print(event)
🧠

Key Takeaways

What to remember for interviews

  1. Function calling is structured token generation — the model produces JSON that matches a tool schema; the runtime (not the model) executes the actual function and injects the result back into context.
  2. Tool schemas live in the system prompt (~200–500 tokens each), so 20 tools consume ~6K tokens before any conversation starts. Accuracy degrades significantly beyond ~20 tools due to choice overload.
  3. MCP (Model Context Protocol) standardizes model-to-tool connections with a single protocol ('USB-C for AI'), replacing the fragmented custom integrations each app had to build before.
  4. A2A (Agent-to-Agent) standardizes agent-to-agent collaboration via Agent Cards (published capability descriptions), a task lifecycle, and streaming updates — designed for cross-organization interoperability.
  5. Tool description quality is the single biggest lever for correct tool selection: a 3-sentence description with what the tool does, what it does not do, and an example input/output consistently outperforms a 3-word name.
  6. OpenAI Structured Outputs (Aug 2024) with strict: true guarantees 100% JSON Schema adherence via constrained decoding — superseding JSON mode's ~99% soft guarantee.
  7. Anthropic Computer Use Tool (Oct 2024) introduces a new paradigm: agents perceive screenshots and output pixel-precise actions, making every GUI app accessible without an API integration.
🧠

Recap quiz

🧠

Tool Use & Agents recap

Derivation

Your system prompt uses 1,000 tokens, conversation history 4,000, and you reserve 2,000 for the response. With a 32K context limit, how many 400-token tool schemas can you register?

Trade-off

A product team wants to expose 80 internal tools to their agent. Accuracy is dropping and latency is high. What is the root cause and best fix?

Trade-off

An agent needs to: (1) search a knowledge base, (2) read the top result, (3) decide whether to search again. Which calling mode should it use, and why?

Trade-off

What problem did MCP solve that wasn’t solved by each AI app building its own tool integrations?

Trade-off

An orchestrator agent needs to delegate a tax-calculation subtask to a specialist agent from a third-party vendor. Which protocol is the right choice and why?

Trade-off

A search tool returns HTML instead of the expected JSON. The agent processes it anyway and gives a confident but wrong answer. Which defense would have caught this earliest in the call chain?

Derivation

Why does tool description quality matter more than model size for correct tool selection?

📚

Further Reading

🎯

Interview Questions


How does function calling actually work at the token level? What makes it different from regular generation?

★★☆
OpenAI · Anthropic

Compare MCP (Model Context Protocol) and A2A (Agent-to-Agent). When would you use each?

★★★
Anthropic · Google

What are the tradeoffs of parallel vs sequential tool calls? How do you decide?

★★☆
OpenAI · Anthropic

How do you manage context window budget when using tools? What happens when you have too many tools?

★★☆
OpenAI · Google

How should a tool-using agent handle errors and recover from failed tool calls?

★★☆
Anthropic · OpenAI

How do agents communicate in multi-agent systems? Compare direct messaging, shared state, and A2A protocol approaches.

★★★
Google · Meta

How would you evaluate tool use when tools can return adversarial or stale outputs?

★★★
Anthropic · Google