
Transformer Math

Module 37 · Applications

🔌 Tool Use & Protocols

How does Claude Code call 50 different tools with one protocol?


LLMs generate text. Tool use turns that text into action — the model outputs structured JSON, the runtime executes functions, and results flow back into context. This is how chatbots become agents. MCP standardizes tool connections. A2A standardizes agent-to-agent collaboration.

🔧

The Tool Call Loop

What you're seeing: the full lifecycle of a tool call — from user message to final response. The LLM can loop through multiple tool calls before generating its final reply. The JSON schema shows the exact format the runtime parses.

What to try: trace the arrows. Notice the loop back from “Result returned” to the LLM — this is how multi-step reasoning works. The MCP layer shows where external tool connections are standardized.

[Diagram] User message → LLM decides to call a tool → JSON emitted: {"name": "search", "arguments": {"query": "..."}} → runtime parses the JSON and executes → external tools (DB / API / FS), with the MCP server standardizing tool connections → result returned to the LLM (loop: the LLM can call multiple tools) → final response to user.
🎮

Tool Calling Flow

The complete lifecycle of a tool call: the LLM generates structured JSON, the runtime parses and executes it, and the result is injected back as a new message.

1

User Message

"What's the weather in Tokyo?"

2

LLM Generates Tool Call (JSON tokens)

{"name": "get_weather", "arguments": {"city": "Tokyo"}}

3

Runtime Parses & Executes

Validates JSON against schema, calls get_weather(city="Tokyo")

4

Tool Result Injected into Context

{"temperature": 22, "condition": "partly cloudy"}

5

LLM Generates Final Response

"It's 22°C and partly cloudy in Tokyo right now."
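The five steps above can be sketched as a minimal runtime loop. This is an illustrative sketch, not any specific provider's API: the model's output is simulated as a literal JSON string, and `get_weather` is a hypothetical stub.

```python
import json

# Hypothetical tool implementation (step 3 would call a real weather API)
def get_weather(city: str) -> dict:
    return {"temperature": 22, "condition": "partly cloudy"}

TOOLS = {"get_weather": get_weather}

# Step 2: the model emits JSON tokens (simulated here as a literal string)
model_output = '{"name": "get_weather", "arguments": {"city": "Tokyo"}}'

# Step 3: the runtime -- not the model -- parses and executes
call = json.loads(model_output)
result = TOOLS[call["name"]](**call["arguments"])

# Step 4: the result is injected back into context as a new message,
# and the model generates the final response from it (step 5)
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
messages.append({"role": "tool", "content": json.dumps(result)})
print(messages[-1])
```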

MCP (Model Context Protocol)

What: Standard protocol for model-to-tool connections

Direction: Model → Tools (vertical)

Exposes: Tools, Resources, Prompts

Analogy: USB-C — one plug, any device

A2A (Agent-to-Agent)

What: Standard protocol for agent-to-agent collaboration

Direction: Agent ↔ Agent (horizontal)

Exposes: Agent Cards, Tasks, Streaming

Analogy: HTTP — agents discover and talk to each other

💡

The Intuition

Function calling is just structured generation. The model doesn't "execute" tools — it generates tokens that happen to be valid JSON matching a schema. The runtime parses the JSON and executes the actual function. The result is injected back into context as a new message, and the model continues generating.

Tool schemas live in the system prompt. When you register tools with an API, the provider formats each tool's name, description, and parameter schema into the system message. The model was fine-tuned on examples of choosing and calling tools based on these schemas. Each schema costs roughly 200–500 tokens of context.
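Concretely, the provider serializes each registered schema into the system message, roughly like this. The template below is a sketch (the exact rendering is provider-specific), and the 4-characters-per-token figure is only a rule of thumb:

```python
import json

tools = [{
    "name": "get_weather",
    "description": "Get current weather for a city",
    "parameters": {"type": "object",
                   "properties": {"city": {"type": "string"}},
                   "required": ["city"]},
}]

# Roughly how a provider might render tools into the system prompt
system_block = "You can call these tools:\n" + "\n".join(
    json.dumps(t, indent=2) for t in tools
)

# Crude token estimate: ~4 characters per token for English/JSON text
est_tokens = len(system_block) // 4
print(est_tokens)
```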

MCP (Model Context Protocol): Before MCP, every AI app had to write custom integrations for every tool. MCP defines a standard: servers expose tools/resources, clients connect to servers. One integration, any model. Think USB-C — one plug fits all devices.
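MCP runs over JSON-RPC 2.0: a client lists a server's tools, then calls one by name. The message shapes below are a simplified sketch of the protocol (fields abbreviated; a real server would use the official MCP SDK):

```python
import json

# Client -> server: discover the tools this server exposes (JSON-RPC 2.0)
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Server -> client: tool names plus their input schemas (simplified)
list_response = {"jsonrpc": "2.0", "id": 1, "result": {"tools": [
    {"name": "search", "description": "Search internal docs",
     "inputSchema": {"type": "object",
                     "properties": {"query": {"type": "string"}},
                     "required": ["query"]}},
]}}

# Client -> server: invoke one of the discovered tools
call_request = {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
                "params": {"name": "search",
                           "arguments": {"query": "onboarding"}}}
print(json.dumps(call_request))
```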

A2A (Agent-to-Agent): As agents become specialized, they need to discover and delegate to each other. A2A defines Agent Cards (capability descriptions published at well-known URLs), a task lifecycle (submitted → working → completed), and streaming updates.
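A sketch of the two A2A primitives just named: an Agent Card (field names here are illustrative, not the exact spec) and a task moving through its lifecycle:

```python
# Illustrative Agent Card: a capability description published at a
# well-known URL so other agents can discover this one (hypothetical fields)
agent_card = {
    "name": "tax-calculator",
    "description": "Computes tax liability for US filings",
    "url": "https://agents.example.com/tax",   # hypothetical endpoint
    "skills": [{"id": "calc_tax", "description": "Calculate tax owed"}],
}

# Task lifecycle from the text: submitted -> working -> completed
LIFECYCLE = ["submitted", "working", "completed"]

def advance(state: str) -> str:
    """Move a task to the next lifecycle state (terminal state is absorbing)."""
    i = LIFECYCLE.index(state)
    return LIFECYCLE[min(i + 1, len(LIFECYCLE) - 1)]

state = "submitted"
state = advance(state)   # now "working"
state = advance(state)   # now "completed"
print(state)
```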

✨ Insight · MCP and A2A are complementary, not competing. MCP connects a model to tools (vertical integration). A2A connects agents to agents (horizontal collaboration). A production agent uses both: MCP to access databases and APIs, A2A to delegate subtasks to specialized agents.

Tool description quality is the biggest lever. Gorilla (Patil et al., 2023) fine-tuned LLaMA on 1,645 API calls from HuggingFace, TensorFlow Hub, and Torch Hub and showed that accurate, detailed tool descriptions significantly improve correct API selection. Crucially, retrieval-augmented tool lookup (fetching the relevant schema at query time) outperformed baking all schemas into the prompt, because it avoids hallucinating outdated parameter names when APIs evolve. The practical rule: write tool descriptions like documentation for a junior engineer, not like a variable name. Include what the tool does, what it does not do, and at least one example input/output. A 3-sentence description consistently outperforms a 3-word one on tool-selection accuracy.
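To make the rule concrete, compare a 3-word description with a documentation-grade one (both schemas hypothetical):

```python
# A variable-name-style description: forces the model to guess scope
bad = {"name": "search_docs", "description": "Search docs"}

# A documentation-style description: what it does, what it does NOT do,
# and one example input/output pair
good = {
    "name": "search_docs",
    "description": (
        "Search internal engineering documentation by keyword query. "
        "Does NOT search customer data or the public web. "
        "Example: query='deploy rollback' returns "
        "[{'title': 'Rollback runbook', 'url': '...'}]."
    ),
}

print(len(good["description"]) > len(bad["description"]))
```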

Quick check

Trade-off

A team builds an agent that needs to read from Postgres, call a REST API, and delegate a subtask to a specialist agent. MCP or A2A for each connection?

Quick Check

How does function calling actually work at the token level?

📐

Context Budget Math

Context Window Budget

Everything must fit within the context limit. Tool schemas compete with conversation history for space:

💡 Tip · Each tool schema costs approximately 200–500 tokens, depending on description length and parameter complexity. With 20 tools registered, roughly 6K tokens are consumed before any conversation starts.

Tool Schema Token Cost

Rule of thumb for estimating schema overhead:

schema_tokens ≈ 50 + d + n · p

where d is the description length in tokens, n is the number of parameters, and p is the average per-parameter description length in tokens. The 50-token base covers JSON structure, name, and type annotations.
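The rule of thumb as code, with a worked 20-tool budget (the per-token figures are estimates, not exact counts):

```python
def schema_tokens(desc_tokens: int, param_desc_tokens: list[int],
                  base: int = 50) -> int:
    """Estimate schema cost: 50-token base + description + per-parameter text."""
    return base + desc_tokens + sum(param_desc_tokens)

# A mid-sized tool: 100-token description, three 50-token parameter docs
one_tool = schema_tokens(100, [50, 50, 50])   # 50 + 100 + 150 = 300 tokens

# Twenty such tools consume ~6,000 tokens before any conversation starts
print(one_tool, 20 * one_tool)
```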

Tool Schema Definition + Function Calling

python
import json
import openai  # pip install openai

client = openai.OpenAI()

# Tool schema (what the model sees in system prompt)
tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search internal documentation by query",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
                "top_k": {"type": "integer", "default": 5},
            },
            "required": ["query"],
        },
    },
}]

# Function calling loop
messages = [{"role": "user", "content": "..."}]  # conversation so far
response = client.chat.completions.create(
    model="gpt-4", messages=messages, tools=tools
)
)

# Check if model wants to call a tool
if response.choices[0].message.tool_calls:
    # Must append the assistant message containing the tool call(s) first
    messages.append(response.choices[0].message)
    for call in response.choices[0].message.tool_calls:
        name = call.function.name          # "search_docs"
        args = json.loads(call.function.arguments)  # {"query": "..."}
        result = execute_tool(name, args)   # YOUR code runs here

        # Then inject the tool result back into context
        messages.append({"role": "tool", "content": json.dumps(result),
                         "tool_call_id": call.id})

Quick check

Derivation

You have a 16K context limit. System prompt = 800 tokens, conversation history = 5,200 tokens, response reserve = 2,000 tokens. Each tool schema averages 300 tokens. How many schemas fit?

🔧

Break It — See What Happens

50 tools in system prompt
No schema validation on tool outputs

Quick check

Trade-off

You disable schema validation on tool outputs to reduce latency. A compromised search tool now returns a prompt injection string. What is the most likely failure mode?

📊

Real-World Numbers

| Metric | Value |
| --- | --- |
| Tool schema overhead | ~200–500 tokens per tool |
| Optimal tool count | Keep namespaces under ~10 functions for efficiency (OpenAI recommends tool search for large catalogs) |
| Function calling latency | Environment-dependent; typically adds meaningful overhead per round-trip (generation + parsing + execution + re-injection) |
| MCP adoption | |
| A2A partners | |
| Agent Card discovery | Cards published at well-known URLs — similar to robots.txt for agents |
✨ Insight · The tool ecosystem is consolidating fast. MCP is becoming the standard for tool integration (replacing custom plugins), and A2A is emerging for multi-agent orchestration. Both use JSON-based schemas and HTTP transport — simple by design.
🆕

Structured Outputs & Computer Use Tool (2024–2025)

Two landmark advances shipped in 2024 that change the reliability and scope of tool use — one improves function-calling correctness, the other opens an entirely new tool category.

OpenAI Structured Outputs (Aug 2024)

  • strict: true in tool definition → 100% JSON Schema adherence (per OpenAI)
  • Supersedes “JSON mode” which hit ~99% but had no schema constraint
  • Supports recursive schemas, nested objects, enums
  • Works for both tool calls and response_format

Anthropic Computer Use Tool (Oct 2024)

  • Special tool type: computer_20241022
  • Actions: screenshot, left_click, type, key, scroll, drag
  • Pixel-perfect (x, y) coordinates; screenshot returned as tool_result
  • Entirely new category: perception + action, no schema required from the external app
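The action loop can be sketched as a dispatcher: the model outputs an action name plus pixel coordinates, and the runtime maps it to local input handlers. The handler stubs below are hypothetical; a real agent would drive the OS through a screenshot/input library.

```python
# Hypothetical local handlers standing in for real OS input control
def do_screenshot() -> bytes:
    return b"...png bytes..."          # would capture the screen

def do_left_click(x: int, y: int) -> str:
    return f"clicked ({x}, {y})"       # would synthesize a mouse click

def do_type(text: str) -> str:
    return f"typed {text!r}"           # would synthesize keystrokes

HANDLERS = {"screenshot": do_screenshot,
            "left_click": do_left_click,
            "type": do_type}

# The model emits an action + pixel coordinates; the runtime dispatches it
action = {"action": "left_click", "x": 412, "y": 230}
name = action.pop("action")
result = HANDLERS[name](**action)
print(result)
```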
| Feature | Structured Outputs | Computer Use Tool |
| --- | --- | --- |
| Problem solved | JSON schema conformance | Interacting with any software visually |
| Adherence rate | 100% with strict mode (vs ~99% JSON mode) | N/A — no schema to conform to |
| Input to model | Text + tool schema in prompt | Screenshot (image) each step |
| Output from model | JSON conforming to schema | Action + (x, y) pixel coordinates |
| Category | Better function-calling reliability | Entirely new tool paradigm |
✨ Insight · Structured Outputs is an incremental improvement on existing function-calling — the same paradigm, but schema guarantees go from “usually right” to “always right.” Computer Use is a paradigm shift: instead of a typed API, the agent perceives pixels and acts on coordinates. Every tool in the world becomes accessible without an integration layer.
Deep dive — Structured Outputs implementation & strict mode constraints

OpenAI Structured Outputs (Aug 6, 2024) uses constrained decoding at generation time: the token sampler is restricted to tokens that keep the output JSON valid with respect to the provided schema. This is different from post-hoc validation — the model cannot produce invalid JSON, not merely "usually doesn't."

Constraints when using strict: true: (1) All fields must be listed in required. (2) No additional properties allowed (additionalProperties: false). (3) Supported types: string, number, boolean, integer, array, object, enum, anyOf. (4) Recursive schemas supported. (5) Max schema nesting: 5 levels; max 100 total properties.
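These constraints can be checked mechanically before registering a schema. A minimal validator for points (1), (2), and the nesting limit from (5) — a sketch, not OpenAI's own validation logic:

```python
def check_strict(schema: dict, depth: int = 1) -> list[str]:
    """Flag violations of strict-mode rules on an object schema."""
    errors = []
    if schema.get("type") == "object":
        if depth > 5:
            errors.append("nesting exceeds 5 levels")
        props = schema.get("properties", {})
        # (1) every property must be listed in `required`
        if set(schema.get("required", [])) != set(props):
            errors.append("all fields must be required")
        # (2) no additional properties allowed
        if schema.get("additionalProperties") is not False:
            errors.append("additionalProperties must be false")
        for sub in props.values():
            errors.extend(check_strict(sub, depth + 1))
    return errors

ok = {"type": "object",
      "properties": {"city": {"type": "string"}},
      "required": ["city"],
      "additionalProperties": False}
print(check_strict(ok))   # []
```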

Python: Structured Outputs with strict mode

python
import json
import openai  # pip install openai

client = openai.OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "extract_event",
        "description": "Extract structured event data from text",
        "strict": True,   # <-- 100% schema adherence
        "parameters": {
            "type": "object",
            "properties": {
                "name":  {"type": "string"},
                "date":  {"type": "string", "description": "ISO 8601"},
                "venue": {"type": "string"},
                "confirmed": {"type": "boolean"},
            },
            "required": ["name", "date", "venue", "confirmed"],
            "additionalProperties": False,  # required for strict mode
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "The WWDC keynote is on June 9 at Apple Park."}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "extract_event"}},
)

call = response.choices[0].message.tool_calls[0]
event = json.loads(call.function.arguments)
# Guaranteed to match the schema — no try/except needed
print(event)
🧠

Key Takeaways

What to remember for interviews

  1. Function calling is structured token generation — the model produces JSON that matches a tool schema; the runtime (not the model) executes the actual function and injects the result back into context.
  2. Tool schemas live in the system prompt (~200–500 tokens each), so 20 tools consume ~6K tokens before any conversation starts. Accuracy degrades significantly beyond ~20 tools due to choice overload.
  3. MCP (Model Context Protocol) standardizes model-to-tool connections with a single protocol ('USB-C for AI'), replacing the fragmented custom integrations each app had to build before.
  4. A2A (Agent-to-Agent) standardizes agent-to-agent collaboration via Agent Cards (published capability descriptions), a task lifecycle, and streaming updates — designed for cross-organization interoperability.
  5. Tool description quality is the single biggest lever for correct tool selection: a 3-sentence description with what the tool does, what it does not do, and an example input/output consistently outperforms a 3-word name.
  6. OpenAI Structured Outputs (Aug 2024) with strict: true guarantees 100% JSON Schema adherence via constrained decoding — superseding JSON mode's ~99% soft guarantee.
  7. Anthropic Computer Use Tool (Oct 2024) introduces a new paradigm: agents perceive screenshots and output pixel-precise actions, making every GUI app accessible without an API integration.
🧠

Recap quiz

🧠

Tool Use & Agents recap

Derivation

Your system prompt uses 1,000 tokens, conversation history 4,000, and you reserve 2,000 for the response. With a 32K context limit, how many 400-token tool schemas can you register?

Trade-off

A product team wants to expose 80 internal tools to their agent. Accuracy is dropping and latency is high. What is the root cause and best fix?

Trade-off

An agent needs to: (1) search a knowledge base, (2) read the top result, (3) decide whether to search again. Which calling mode should it use, and why?

Trade-off

What problem did MCP solve that wasn’t solved by each AI app building its own tool integrations?

Trade-off

An orchestrator agent needs to delegate a tax-calculation subtask to a specialist agent from a third-party vendor. Which protocol is the right choice and why?

Trade-off

A search tool returns HTML instead of the expected JSON. The agent processes it anyway and gives a confident but wrong answer. Which defense would have caught this earliest in the call chain?

Derivation

Why does tool description quality matter more than model size for correct tool selection?

📚

Further Reading

🎯

Interview Questions


How does function calling actually work at the token level? What makes it different from regular generation?

★★☆
OpenAI · Anthropic

Compare MCP (Model Context Protocol) and A2A (Agent-to-Agent). When would you use each?

★★★
Anthropic · Google

What are the tradeoffs of parallel vs sequential tool calls? How do you decide?

★★☆
OpenAI · Anthropic

How do you manage context window budget when using tools? What happens when you have too many tools?

★★☆
OpenAI · Google

How should a tool-using agent handle errors and recover from failed tool calls?

★★☆
Anthropic · OpenAI

How do agents communicate in multi-agent systems? Compare direct messaging, shared state, and A2A protocol approaches.

★★★
Google · Meta

How would you evaluate tool use when tools can return adversarial or stale outputs?

★★★
Anthropic · Google