Tool use
Tool execution and result injection
Feeding tool results back into the conversation.
From tool call to observation
The model has produced a tool call. Now your code has to do four things:
- Look up the function the model is asking for
- Validate and run it safely
- Format the result so the model can read it
- Inject the formatted result back into the message list as an
observation
This lesson walks through each step with code you can use in production. We covered some of this in the orchestration loop module, but this is the focused, end-to-end version.
The tool registry
Treat your tools as a dictionary mapping name to function. Keep this separate from the schema list you send to the model.
```python
def get_weather(city: str) -> dict:
    # ... call your real API ...
    return {"city": city, "temperature_c": 22, "condition": "sunny"}

def search_web(query: str, max_results: int = 5) -> list[dict]:
    # ... call your real search API ...
    return [{"title": "...", "url": "...", "snippet": "..."}]

tool_registry = {
    "get_weather": get_weather,
    "search_web": search_web,
}
```

The schema list and the registry must agree on names. A common bug is changing one and forgetting the other. A small helper closes that gap:
```python
def make_tools_and_registry(specs):
    """Build both the OpenAI-style tools list and the registry from one source."""
    tools = []
    registry = {}
    for spec in specs:
        tools.append({"type": "function", "function": spec["schema"]})
        registry[spec["schema"]["name"]] = spec["fn"]
    return tools, registry
```

Now you write each tool once and both artifacts stay in sync.
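What does a spec look like? One possible shape, with `schema` holding the JSON schema you send to the model and `fn` the callable — the field names here are one convention, not a fixed API. (The helper is repeated so the snippet runs standalone.)

```python
def make_tools_and_registry(specs):
    tools, registry = [], {}
    for spec in specs:
        tools.append({"type": "function", "function": spec["schema"]})
        registry[spec["schema"]["name"]] = spec["fn"]
    return tools, registry

def get_weather(city: str) -> dict:
    return {"city": city, "temperature_c": 22, "condition": "sunny"}

# Single source of truth: each spec pairs a JSON schema with its function.
tool_specs = [
    {
        "fn": get_weather,
        "schema": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
]

tools, registry = make_tools_and_registry(tool_specs)
```

Adding a tool is now one entry in `tool_specs`; the schema list and the registry can't drift apart.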
Calling the tool
A safe call_tool function does five things: check the tool exists, parse the arguments, validate them against the schema, run the function with a timeout, and format the result. It returns an error string instead of raising, so the model gets to see what went wrong:

```python
import json
import jsonschema

def call_tool(call, registry, schemas):
    name = call.function.name
    # 1. Does the tool exist?
    if name not in registry:
        return f"ERROR: Tool '{name}' not found. Available: {list(registry)}"
    # 2. Do the arguments parse? (The API delivers them as a JSON string.)
    try:
        args = json.loads(call.function.arguments)
    except json.JSONDecodeError as e:
        return f"ERROR: Arguments for {name} are not valid JSON. {e.msg}"
    # 3. Are the arguments valid against the schema?
    try:
        jsonschema.validate(args, schemas[name]["parameters"])
    except jsonschema.ValidationError as e:
        return f"ERROR: Invalid arguments for {name}. {e.message}"
    # 4. Run the function with a timeout
    try:
        result = run_with_timeout(registry[name], args, seconds=30)
    except TimeoutError:
        return f"ERROR: {name} timed out after 30s"
    except Exception as e:
        return f"ERROR: {type(e).__name__} in {name}: {e}"
    # 5. Format the result for the model
    return format_result(result)
```

Each of these steps has its own pitfalls.
Timeouts
Tools should always run with a timeout. A web request that hangs for 5 minutes will silently kill your agent's responsiveness. The simplest cross-platform approach is a worker thread plus a timed wait on its result:

```python
import concurrent.futures

def run_with_timeout(fn, args, seconds=30):
    ex = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        return ex.submit(fn, **args).result(timeout=seconds)
    except concurrent.futures.TimeoutError:
        # Re-raise as the builtin so callers can catch plain TimeoutError
        # on every Python version (the two are aliases from 3.11 on).
        raise TimeoutError(f"timed out after {seconds}s")
    finally:
        # wait=False: don't block shutdown on a worker that is still running
        ex.shutdown(wait=False)
```

Don't wrap the executor in a `with` block here: its exit handler waits for the worker to finish, which would reintroduce exactly the hang the timeout is meant to prevent. And note that Python can't kill the thread — after a timeout the worker keeps running in the background until it finishes on its own. For tools that hit external APIs, set timeouts at the request level too. Layered timeouts protect you when one layer misbehaves.
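A request-level timeout, sketched with the stdlib's urllib (real tools would more likely use requests or httpx, which take a similar `timeout` parameter; `fetch_url` is a hypothetical tool, not part of the code above):

```python
import urllib.request

def fetch_url(url: str, timeout_s: float = 10.0) -> str:
    # Socket-level timeout: applies to the connect and each read,
    # independent of any thread-level timeout wrapping the whole tool.
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.read(4096).decode("utf-8", errors="replace")
```

If the server stalls mid-response, this raises after `timeout_s` seconds instead of tying up the worker thread for the full 30.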
Formatting results for the model
The model has to read your tool's output. That means it has to fit in the context window and be parseable from the model's perspective. Three rules:
Use string-friendly formats
Return JSON-serialized strings, plain text, or markdown. Don't return Python objects, NumPy arrays, or dataframes directly. The serializer matters.
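As a quick illustration of why the serializer matters: a `default=str` fallback turns values JSON can't encode natively, such as datetimes, into readable strings instead of raising.

```python
import json
from datetime import datetime

payload = {"fetched_at": datetime(2024, 1, 1, 12, 0), "count": 3}
# Without default=str this raises TypeError; with it, the datetime
# is rendered via str().
print(json.dumps(payload, default=str))
# → {"fetched_at": "2024-01-01 12:00:00", "count": 3}
```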
```python
import json

def format_result(result):
    if isinstance(result, str):
        return result
    try:
        return json.dumps(result, indent=2, default=str)
    except (TypeError, ValueError):
        return str(result)
```

Keep results small
A 2MB API response will eat your context window. Truncate or summarize before returning:
```python
MAX_TOOL_OUTPUT_CHARS = 4000

def format_result(result):
    text = result if isinstance(result, str) else json.dumps(result, default=str)
    if len(text) > MAX_TOOL_OUTPUT_CHARS:
        omitted = len(text) - MAX_TOOL_OUTPUT_CHARS
        return text[:MAX_TOOL_OUTPUT_CHARS] + f"\n... [truncated, {omitted} chars omitted]"
    return text
```

For tools that return inherently large data (file contents, database query results), let the tool itself paginate. Return a page plus a hint about how to get the next page.
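One way a tool can paginate itself — a sketch, where `read_file_page` and its page size are illustrative choices, not a fixed API:

```python
PAGE_CHARS = 2000

def read_file_page(path: str, page: int = 0) -> dict:
    # Hypothetical paginated reader: one page of content plus a hint
    # the model can follow to request the next page.
    with open(path, encoding="utf-8") as f:
        text = f.read()
    start = page * PAGE_CHARS
    chunk = text[start:start + PAGE_CHARS]
    has_more = start + PAGE_CHARS < len(text)
    return {
        "page": page,
        "content": chunk,
        "next_page": page + 1 if has_more else None,
        "hint": "call again with page=next_page" if has_more else "end of file",
    }
```

Re-reading the whole file each call keeps the sketch simple; a real tool would seek. The point is the shape of the return value: the model sees one bounded page and an explicit way to ask for more.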
Strip noise
Real APIs return a lot of fields the model doesn't need: timestamps, internal IDs, debug info. Project to the fields the model will actually use.
```python
def get_pull_request(number: int) -> dict:
    raw = github_api.get_pr(number)
    return {
        "title": raw["title"],
        "author": raw["user"]["login"],
        "state": raw["state"],
        "body": (raw["body"] or "")[:500],  # body can be null; truncate long descriptions
        "labels": [label["name"] for label in raw["labels"]],
    }
```

Less context noise means clearer reasoning.
Injecting the observation
Once the result is formatted, append it to the message list with the tool role:
```python
messages.append({
    "role": "tool",
    "content": formatted_result,
    "tool_call_id": call.id,
})
```

The tool_call_id is required. It tells the model which of its tool calls this result corresponds to. Mismatched or missing IDs produce confused behavior, especially when the model issued multiple parallel tool calls.
Putting it all together
```python
def execute_tool_calls(tool_calls, registry, schemas):
    """Run a list of tool calls and return a list of tool messages."""
    observations = []
    for call in tool_calls:
        content = call_tool(call, registry, schemas)
        observations.append({
            "role": "tool",
            "content": content,
            "tool_call_id": call.id,
        })
    return observations

# Inside the loop:
if message.tool_calls:
    messages.extend(execute_tool_calls(message.tool_calls, registry, schemas))
```

The loop body becomes very small. All the messy work lives inside call_tool and format_result, where it's easy to test in isolation.
Side effects and idempotency
Tools that read are easy. Tools that write are dangerous. A model that re-runs a send_email tool because of a retry will send the email twice. Three defenses:
| Pattern | When to use it |
|---|---|
| Idempotency keys | The tool itself dedupes calls with the same key |
| Confirmation gates | Ask the user before destructive tools run (we cover this in Track 2) |
| Read/write split | Group tools by safety. Read-only tools can run freely; write tools require explicit approval |
For Track 1, the rule of thumb is: if a tool has side effects you can't undo, gate it. Don't trust the model to be careful.
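An idempotency key can be as simple as a cache of keys the tool has already seen. A sketch — `send_email` here is a stub, and a real dedupe store would live in a database, not process memory:

```python
_seen_keys: set[str] = set()

def send_email(to: str, subject: str, body: str, idempotency_key: str) -> str:
    # Dedupe on the key so a retried tool call doesn't send twice.
    if idempotency_key in _seen_keys:
        return f"SKIPPED: already sent (key={idempotency_key})"
    _seen_keys.add(idempotency_key)
    # ... real email send would happen here ...
    return f"SENT: '{subject}' to {to}"
```

Either the model supplies the key as a schema parameter, or your orchestration layer derives one from the tool call ID — the second option is safer, since it doesn't rely on the model remembering to reuse the key.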
Tool execution is your security boundary
The model is untrusted input. If your run_python_code tool runs in the same process as your agent, the model can do anything you can do. Sandbox tools that take open-ended input. We'll go deep on this in Track 4 ("Reliability") and Track 2 ("Safety and control"); for now, just be aware that tool execution is the place where most agent security incidents happen.
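A minimal first step is to at least run open-ended code in a separate process. This is a sketch of process isolation, not real sandboxing: it limits the blast radius of a crash or infinite loop, but the child still has your filesystem and network.

```python
import subprocess
import sys

def run_python_code(code: str, timeout_s: int = 10) -> str:
    # Separate process: a crash or infinite loop can't take down the agent.
    # NOT a security sandbox -- the child still sees your files and network.
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return f"ERROR: code timed out after {timeout_s}s"
    return proc.stdout if proc.returncode == 0 else f"ERROR: {proc.stderr.strip()}"
```

Real sandboxing adds containers, seccomp filters, or a dedicated execution service on top of this.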
Key takeaway
Tool execution is where the model's reasoning meets the real world. Five practices make it reliable: a single source of truth for tools, schema validation before calling, timeouts, output formatting that fits the model's context, and care with side effects. Get these right and the rest of the agent stack feels easy.
The next module is the first end-to-end agent design pattern: ReAct. We'll combine everything from the last three modules into a working reasoning agent.