Tool use
Function calling: how models select tools
The mechanics of tool selection and invocation.
What "function calling" actually is
The phrase "function calling" sounds like the model executes code. It doesn't. The model never executes anything. What it does is produce a structured request that asks you to call a function, and your code decides whether to honor the request.
A function call from the model is, fundamentally, JSON:
```json
{
  "name": "get_weather",
  "arguments": { "city": "Tokyo" }
}
```

That's it. The model returns this object instead of free-form text, and your loop is responsible for matching the name to a real Python function and invoking it. The "calling" happens entirely on your side.
How the model picks a tool
You hand the model a list of tool definitions. Each tool has a name, a description, and a JSON schema for its arguments:
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the public web and return result snippets.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                },
                "required": ["query"],
            },
        },
    },
]
```

The model sees those definitions in its context. When you ask "what's the weather in Tokyo?", the model:
- Reads the user's message
- Reads the tool list
- Decides whether any tool is relevant
- If yes, returns a tool call request. If no, returns a normal text response.
The decision is made the same way the model decides anything: it's a next-token prediction conditioned on the prompt. Tool descriptions are part of the prompt, so good descriptions matter as much as a good system prompt.
What the model is actually trained on
Modern function-calling models (Llama 3, GPT-4, Claude, Mistral) are fine-tuned on examples of (tool list + user message + correct tool call). The training teaches the model two things:
- When to call a tool. Given a question that needs external information, output a tool call instead of guessing.
- How to format the call. Output valid JSON that matches the schema you provided.
Older or weaker models either skip step 1 (they hallucinate answers instead of calling tools) or fail at step 2 (they invent fields, mismatch types, or wrap JSON in prose). One of the practical reasons to use a recent model is that function calling becomes much more reliable.
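Because even a good model can occasionally invent fields or mismatch types, it is worth checking arguments against your schema before invoking anything. A minimal stdlib-only sketch (real projects often reach for a dedicated JSON Schema library instead; this only covers required fields, unexpected fields, and string types):

```python
def validate_args(schema: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    errors = []
    props = schema.get("properties", {})
    # Required fields must be present.
    for field in schema.get("required", []):
        if field not in args:
            errors.append(f"missing required field: {field}")
    # No invented fields, and declared strings must actually be strings.
    for field, value in args.items():
        if field not in props:
            errors.append(f"unexpected field: {field}")
        elif props[field].get("type") == "string" and not isinstance(value, str):
            errors.append(f"{field} should be a string")
    return errors

schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}
print(validate_args(schema, {"city": "Tokyo"}))  # []
print(validate_args(schema, {"town": "Tokyo"}))  # missing 'city', unexpected 'town'
```

Rejecting a bad call early lets you send the error back to the model as a tool result, giving it a chance to retry with corrected arguments.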
A trace of one tool call
Say your messages list contains:
```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Tokyo?"},
]
```

You call:

```python
response = ollama.chat(model="llama3", messages=messages, tools=tools)
```

The response comes back as a structured object. Pseudocode of what's inside:
```python
response.message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {
            "id": "call_abc123",
            "function": {
                "name": "get_weather",
                "arguments": {"city": "Tokyo"},
            },
        },
    ],
}
```

content is empty because the model has nothing to say yet. It's making a request. Your loop then runs get_weather(city="Tokyo"), captures "22C, sunny", and appends:
```python
messages.append(response.message)  # the assistant's tool call
messages.append({
    "role": "tool",
    "content": "22C, sunny",
    "tool_call_id": "call_abc123",
})
```

The next call to the model sees all four messages. Now it has the data and can produce a final text response.
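Put together, the whole exchange is one loop: send the messages, execute any tool calls, append the results, and repeat until the model answers in text. A sketch of that loop, assuming dict-shaped responses like the pseudocode above (chat_fn stands in for the real API call, e.g. ollama.chat, and run_agent is a hypothetical helper, not a library function):

```python
def run_agent(chat_fn, messages, tools, registry, max_turns=5):
    """Loop until the model answers in text rather than with a tool call."""
    for _ in range(max_turns):
        response = chat_fn(messages=messages, tools=tools)
        msg = response["message"]
        messages.append(msg)
        # No tool calls means the model produced its final text answer.
        if not msg.get("tool_calls"):
            return msg["content"]
        # Otherwise, honor each request and feed the result back.
        for call in msg["tool_calls"]:
            fn = call["function"]
            result = registry[fn["name"]](**fn["arguments"])
            messages.append({
                "role": "tool",
                "content": str(result),
                "tool_call_id": call["id"],
            })
    raise RuntimeError("no final answer within max_turns")
```

The max_turns cap matters: without it, a model that keeps requesting tools would loop forever.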
Tool calls vs prompting
A common alternative to function calling is to ask the model to output a special token format ("Action: search\nArgs: weather Tokyo") and parse it yourself. This was the standard approach pre-2023 and is still how some research papers describe ReAct. It works, but:
| Approach | Reliability | Effort | Notes |
|---|---|---|---|
| Native function calling | High | Low | The model is fine-tuned for this |
| Custom prompt format | Medium | Medium | Works but you parse and validate |
| Plain text "I'll call X" | Low | High | The model often skips it or makes things up |
For new agents, use native function calling whenever your model supports it. We'll build a manual ReAct parser in Module 4 because it's instructive, but in production you should default to the native API.
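For comparison, the custom-prompt approach means parsing that "Action:" format yourself. A sketch of such a parser (the format is illustrative, not a standard, and real ReAct outputs are messier than this regex assumes):

```python
import re

def parse_action(text: str):
    """Extract (action, args) from an 'Action:/Args:' response, or None."""
    match = re.search(r"Action:\s*(\w+)\s*\nArgs:\s*(.+)", text)
    if match is None:
        return None  # the model answered in plain text
    return match.group(1), match.group(2).strip()

print(parse_action("Action: search\nArgs: weather Tokyo"))
# ('search', 'weather Tokyo')
print(parse_action("The weather is sunny."))
# None
```

Every failure mode the table lists, you now own: missing labels, extra prose around the action, malformed arguments. Native function calling pushes that parsing and validation into the model provider's stack.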
Multiple tool calls in one response
The model can request multiple tools in a single response. Most APIs surface this as a list:
```python
for call in response.message.tool_calls:
    result = registry[call.function.name](**call.function.arguments)
    messages.append({"role": "tool", "content": str(result), "tool_call_id": call.id})
```

This matters for performance. If a model needs to look up three different cities, it can request all three in one turn instead of three sequential turns. You can run them in parallel:
```python
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor() as ex:
    futures = {
        call.id: ex.submit(registry[call.function.name], **call.function.arguments)
        for call in response.message.tool_calls
    }
    for call_id, future in futures.items():
        messages.append({"role": "tool", "content": str(future.result()), "tool_call_id": call_id})
```

Parallel tool execution is one of the easiest performance wins in an agent system.
What 'tool' and 'function' really are
The OpenAI, Anthropic, and Ollama APIs all use slightly different field names: tools vs functions, tool_calls vs function_call. The shape is the same: a name, a JSON schema for the arguments, a structured response object. If you understand one, you can read all of them.
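A small normalizer makes that concrete: whatever the provider calls the fields, each call reduces to a (name, arguments) pair. The dict shapes below approximate two common styles and are not exact copies of any API's response:

```python
import json

def normalize_call(call: dict) -> tuple:
    """Reduce a provider-specific tool call to (name, arguments)."""
    fn = call["function"]
    args = fn["arguments"]
    # Some APIs return arguments as a JSON string, others as a parsed dict.
    if isinstance(args, str):
        args = json.loads(args)
    return fn["name"], args

# Arguments arrive as a JSON string (OpenAI-style, approximately).
string_style = {"id": "call_1", "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}}
# Arguments arrive as a parsed dict (Ollama-style, approximately).
dict_style = {"function": {"name": "get_weather", "arguments": {"city": "Tokyo"}}}

print(normalize_call(string_style) == normalize_call(dict_style))  # True
```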
Key takeaway
Function calling is not "the model running your code." It's the model returning a structured JSON request that your code decides whether to honor. The model picks a tool by reading its description, so the quality of your tool descriptions controls how often the model picks the right one. The next lesson is entirely about that: how to write tool descriptions and schemas the model will actually use correctly.