
Tool use

Tool execution and result injection

Feeding tool results back into the conversation.


From tool call to observation

The model has produced a tool call. Now your code has to do four things:

  1. Look up the function the model is asking for
  2. Validate and run it safely
  3. Format the result so the model can read it
  4. Inject the formatted result back into the message list as an observation

This lesson walks through each step with code you can use in production. We covered some of this in the orchestration loop module, but this is the focused, end-to-end version.

The tool registry

Treat your tools as a dictionary mapping name to function. Keep this separate from the schema list you send to the model.

def get_weather(city: str) -> dict:
    # ... call your real API ...
    return {"city": city, "temperature_c": 22, "condition": "sunny"}
 
def search_web(query: str, max_results: int = 5) -> list[dict]:
    # ... call your real search API ...
    return [{"title": "...", "url": "...", "snippet": "..."}]
 
tool_registry = {
    "get_weather": get_weather,
    "search_web": search_web,
}

The schema list and the registry must agree on names. A common bug is changing one and forgetting the other. A small helper closes that gap:

def make_tools_and_registry(specs):
    """Build both the OpenAI-style tools list and the registry from one source."""
    tools = []
    registry = {}
    for spec in specs:
        tools.append({"type": "function", "function": spec["schema"]})
        registry[spec["schema"]["name"]] = spec["fn"]
    return tools, registry

Now you write each tool once and both artifacts stay in sync.
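
As an illustration, a spec entry could look like this. The field names follow the make_tools_and_registry helper above; the schema uses the OpenAI function shape, and the values are only examples:

weather_spec = {
    "fn": get_weather,
    "schema": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}
 
tools, tool_registry = make_tools_and_registry([weather_spec])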

Calling the tool

A safe call_tool function does five things:

import json
import jsonschema
 
def call_tool(call, registry, schemas):
    name = call.function.name
 
    # 1. Does the tool exist?
    if name not in registry:
        return f"ERROR: Tool '{name}' not found. Available: {list(registry)}"
 
    # 2. Parse the arguments (the API delivers them as a JSON string)
    try:
        args = json.loads(call.function.arguments or "{}")
    except json.JSONDecodeError as e:
        return f"ERROR: Arguments for {name} are not valid JSON. {e}"
 
    # 3. Are the arguments valid against the schema?
    try:
        jsonschema.validate(args, schemas[name]["parameters"])
    except jsonschema.ValidationError as e:
        return f"ERROR: Invalid arguments for {name}. {e.message}"
 
    # 4. Run the function with a timeout
    try:
        result = run_with_timeout(registry[name], args, seconds=30)
    except TimeoutError:
        return f"ERROR: {name} timed out after 30s"
    except Exception as e:
        return f"ERROR: {type(e).__name__} in {name}: {e}"
 
    # 5. Format the result for the model
    return format_result(result)

Each of these has its own pitfalls.

Timeouts

Tools should always run with a timeout. A web request that hangs for five minutes will silently kill your agent's responsiveness. The simplest cross-platform approach is to run the tool in a worker thread and wait on the result with a timeout:

import concurrent.futures
 
def run_with_timeout(fn, args, seconds=30):
    ex = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = ex.submit(fn, **args)
        return future.result(timeout=seconds)
    except concurrent.futures.TimeoutError:
        # Re-raise as the builtin TimeoutError that call_tool catches
        raise TimeoutError(f"{fn.__name__} exceeded {seconds}s") from None
    finally:
        # Don't block on a hung worker; the thread keeps running in the background
        ex.shutdown(wait=False)

For tools that hit external APIs, set timeouts at the request level too. Layered timeouts protect you when one layer misbehaves.
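
As a sketch of the request-level layer, a search_web built on the requests library can pass its own connect/read timeout. The endpoint and response shape here are placeholders, not a real API:

import requests
 
def search_web(query: str, max_results: int = 5) -> list[dict]:
    resp = requests.get(
        "https://api.example.com/search",        # placeholder endpoint
        params={"q": query, "limit": max_results},
        timeout=(5, 20),                         # (connect, read) seconds
    )
    resp.raise_for_status()
    return resp.json()["results"][:max_results]  # assumed response shape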

Formatting results for the model

The model has to read your tool's output. That means the output has to fit in the context window and arrive as text the model can actually parse. Three rules:

Use string-friendly formats

Return JSON-serialized strings, plain text, or markdown. Don't return Python objects, NumPy arrays, or dataframes directly. The serializer matters: passing default=str keeps non-JSON types like datetimes from raising during the dump.

import json
 
def format_result(result):
    if isinstance(result, str):
        return result
    try:
        return json.dumps(result, indent=2, default=str)
    except (TypeError, ValueError):
        return str(result)

Keep results small

A 2MB API response will eat your context window. Truncate or summarize before returning:

MAX_TOOL_OUTPUT_CHARS = 4000
 
def format_result(result):
    text = json.dumps(result, default=str) if not isinstance(result, str) else result
    if len(text) > MAX_TOOL_OUTPUT_CHARS:
        return text[:MAX_TOOL_OUTPUT_CHARS] + f"\n... [truncated, {len(text) - MAX_TOOL_OUTPUT_CHARS} chars omitted]"
    return text

For tools that return inherently large data (file contents, database query results), let the tool itself paginate. Return a page plus a hint about how to get the next page.
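
One hypothetical shape for that: a file-reading tool that returns a window of text plus the offset to pass on the next call.

def read_file(path: str, offset: int = 0, limit: int = 2000) -> dict:
    # Return one window of the file plus a cursor the model can use to page forward
    with open(path, encoding="utf-8") as f:
        text = f.read()
    chunk = text[offset:offset + limit]
    next_offset = offset + limit if offset + limit < len(text) else None
    return {"content": chunk, "next_offset": next_offset, "total_chars": len(text)}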

Strip noise

Real APIs return a lot of fields the model doesn't need: timestamps, internal IDs, debug info. Project to the fields the model will actually use.

def get_pull_request(number: int) -> dict:
    raw = github_api.get_pr(number)
    return {
        "title": raw["title"],
        "author": raw["user"]["login"],
        "state": raw["state"],
        "body": raw["body"][:500],  # truncate long descriptions
        "labels": [l["name"] for l in raw["labels"]],
    }

Less context noise means clearer reasoning.

Injecting the observation

Once the result is formatted, append it to the message list with the tool role:

messages.append({
    "role": "tool",
    "content": formatted_result,
    "tool_call_id": call.id,
})

The tool_call_id is required. It tells the model which of its tool calls this result corresponds to. Mismatched or missing IDs produce confused behavior, especially when the model issued multiple parallel tool calls.
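
Here's a minimal sketch of the message list after the model issues two parallel calls, in the OpenAI-style shape used above; IDs and values are illustrative. Each tool message points back at exactly one call:

messages = [
    {"role": "user", "content": "Weather in Paris and Tokyo?"},
    # Assistant turn with two parallel tool calls
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}},
        {"id": "call_2", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}},
    ]},
    # One tool message per call, matched by tool_call_id
    {"role": "tool", "tool_call_id": "call_1",
     "content": '{"city": "Paris", "temperature_c": 18, "condition": "cloudy"}'},
    {"role": "tool", "tool_call_id": "call_2",
     "content": '{"city": "Tokyo", "temperature_c": 25, "condition": "sunny"}'},
]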

Putting it all together

def execute_tool_calls(tool_calls, registry, schemas):
    """Run a list of tool calls and return a list of tool messages."""
    observations = []
    for call in tool_calls:
        content = call_tool(call, registry, schemas)
        observations.append({
            "role": "tool",
            "content": content,
            "tool_call_id": call.id,
        })
    return observations
 
# Inside the loop:
if message.tool_calls:
    messages.extend(execute_tool_calls(message.tool_calls, registry, schemas))

The loop body becomes very small. All the messy work lives inside call_tool and format_result, where it's easy to test in isolation.
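
For example, a quick unit test can exercise the unknown-tool path without any model in the loop; the SimpleNamespace objects below are stand-ins for the SDK's call object, and the tool name is made up:

from types import SimpleNamespace
 
def test_unknown_tool_returns_error():
    call = SimpleNamespace(
        id="call_test",
        function=SimpleNamespace(name="delete_db", arguments="{}"),
    )
    result = call_tool(call, registry={}, schemas={})
    assert result.startswith("ERROR: Tool 'delete_db' not found")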

Side effects and idempotency

Tools that read are easy. Tools that write are dangerous. A model that re-runs a send_email tool because of a retry will send the email twice. Three defenses:

  - Idempotency keys: the tool itself dedupes calls with the same key (see the sketch after this list)
  - Confirmation gates: ask the user before destructive tools run (we cover this in Track 2)
  - Read/write split: group tools by safety; read-only tools can run freely, write tools require explicit approval
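
A minimal sketch of the first pattern, assuming a hypothetical email_client: the tool takes an idempotency key and refuses to repeat itself.

_sent_keys: set[str] = set()
 
def send_email(to: str, subject: str, body: str, idempotency_key: str) -> str:
    # A repeated call with the same key becomes a no-op instead of a duplicate email
    if idempotency_key in _sent_keys:
        return f"SKIPPED: an email with key '{idempotency_key}' was already sent"
    _sent_keys.add(idempotency_key)
    email_client.send(to=to, subject=subject, body=body)  # hypothetical client
    return f"SENT: '{subject}' to {to}"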

For Track 1, the rule of thumb is: if a tool has side effects you can't undo, gate it. Don't trust the model to be careful.

Tool execution is your security boundary

The model is untrusted input. If your run_python_code tool runs in the same process as your agent, the model can do anything you can do. Sandbox tools that take open-ended input. We'll go deep on this in Track 4 ("Reliability") and Track 2 ("Safety and control"); for now, just be aware that tool execution is the place where most agent security incidents happen.

Key takeaway

Tool execution is where the model's reasoning meets the real world. Five practices make it reliable: a single source of truth for tools, schema validation before calling, timeouts, output formatting that fits the model's context, and care with side effects. Get these right and the rest of the agent stack feels easy.

The next module is the first end-to-end agent design pattern: ReAct. We'll combine everything from the last three modules into a working reasoning agent.

>_tool-execution.py
Loading editor...
Output will appear here.

Done with this lesson?