Tool use
Function calling: how models select tools
The mechanics of tool selection and invocation.
What "function calling" actually is
The phrase "function calling" sounds like the model executes code. It doesn't. The model never executes anything. What it does is produce a structured request that asks you to call a function, and your code decides whether to honor the request.
A function call from the model is, fundamentally, JSON:
```json
{
  "name": "get_weather",
  "arguments": { "city": "Tokyo" }
}
```

That's it. The model returns this object instead of free-form text, and your loop is responsible for matching the name to a real Python function and invoking it. The "calling" happens entirely on your side.
How the model picks a tool
You hand the model a list of tool definitions. Each tool has a name, a description, and a JSON schema for its arguments:
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the public web and return result snippets.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                },
                "required": ["query"],
            },
        },
    },
]
```

The model sees those definitions in its context. When you ask "what's the weather in Tokyo?", the model:
- Reads the user's message
- Reads the tool list
- Decides whether any tool is relevant
- If yes, returns a tool call request. If no, returns a normal text response.
The decision is made the same way the model decides anything: it's a next-token prediction conditioned on the prompt. Tool descriptions are part of the prompt, so good descriptions matter as much as a good system prompt.
What the model is actually trained on
Modern function-calling models (Llama 3, GPT-4, Claude, Mistral) are fine-tuned on examples of (tool list + user message + correct tool call). The training teaches the model two things:
- When to call a tool. Given a question that needs external information, output a tool call instead of guessing.
- How to format the call. Output valid JSON that matches the schema you provided.
Older or weaker models either skip step 1 (they hallucinate answers instead of calling tools) or fail at step 2 (they invent fields, mismatch types, or wrap JSON in prose). One of the practical reasons to use a recent model is that function calling becomes much more reliable.
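Because even a good model can occasionally invent fields or mismatch types, it is worth checking arguments against your schema before invoking anything. A minimal stdlib-only sketch (real projects often reach for a dedicated JSON Schema library instead; this only covers required fields, unexpected fields, and string types):

```python
def validate_args(schema: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    errors = []
    props = schema.get("properties", {})
    # Required fields must be present.
    for field in schema.get("required", []):
        if field not in args:
            errors.append(f"missing required field: {field}")
    # No invented fields, and declared strings must actually be strings.
    for field, value in args.items():
        if field not in props:
            errors.append(f"unexpected field: {field}")
        elif props[field].get("type") == "string" and not isinstance(value, str):
            errors.append(f"{field} should be a string")
    return errors

schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}
print(validate_args(schema, {"city": "Tokyo"}))  # []
print(validate_args(schema, {"town": "Tokyo"}))  # missing 'city', unexpected 'town'
```

Rejecting a bad call early lets you send the error back to the model as a tool result, giving it a chance to retry with corrected arguments.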
A trace of one tool call
Say your messages list contains:
```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Tokyo?"},
]
```

You call:

```python
response = ollama.chat(model="llama3", messages=messages, tools=tools)
```

The response comes back as a structured object. Pseudocode of what's inside:
```python
response.message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {
            "id": "call_abc123",
            "function": {
                "name": "get_weather",
                "arguments": {"city": "Tokyo"},
            },
        },
    ],
}
```

content is empty because the model has nothing to say yet. It's making a request. Your loop then runs get_weather(city="Tokyo"), captures "22C, sunny", and appends:
```python
messages.append(response.message)  # the assistant's tool call
messages.append({
    "role": "tool",
    "content": "22C, sunny",
    "tool_call_id": "call_abc123",
})
```

The next call to the model sees all four messages. Now it has the data and can produce a final text response.
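Put together, the whole exchange is one loop: send the messages, execute any tool calls, append the results, and repeat until the model answers in text. A sketch of that loop, assuming dict-shaped responses like the pseudocode above (chat_fn stands in for the real API call, e.g. ollama.chat, and run_agent is a hypothetical helper, not a library function):

```python
def run_agent(chat_fn, messages, tools, registry, max_turns=5):
    """Loop until the model answers in text rather than with a tool call."""
    for _ in range(max_turns):
        response = chat_fn(messages=messages, tools=tools)
        msg = response["message"]
        messages.append(msg)
        # No tool calls means the model produced its final text answer.
        if not msg.get("tool_calls"):
            return msg["content"]
        # Otherwise, honor each request and feed the result back.
        for call in msg["tool_calls"]:
            fn = call["function"]
            result = registry[fn["name"]](**fn["arguments"])
            messages.append({
                "role": "tool",
                "content": str(result),
                "tool_call_id": call["id"],
            })
    raise RuntimeError("no final answer within max_turns")
```

The max_turns cap matters: without it, a model that keeps requesting tools would loop forever.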
Tool calls vs prompting
A common alternative to function calling is to ask the model to output a special token format ("Action: search\nArgs: weather Tokyo") and parse it yourself. This was the standard approach pre-2023 and is still how some research papers describe ReAct. It works, but:
| Approach | Reliability | Effort | Notes |
|---|---|---|---|
| Native function calling | High | Low | The model is fine-tuned for this |
| Custom prompt format | Medium | Medium | Works but you parse and validate |
| Plain text "I'll call X" | Low | High | The model often skips it or makes things up |
For new agents, use native function calling whenever your model supports it. We'll build a manual ReAct parser in Module 4 because it's instructive, but in production you should default to the native API.
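For comparison, the custom-prompt approach means parsing that "Action:" format yourself. A sketch of such a parser (the format is illustrative, not a standard, and real ReAct outputs are messier than this regex assumes):

```python
import re

def parse_action(text: str):
    """Extract (action, args) from an 'Action:/Args:' response, or None."""
    match = re.search(r"Action:\s*(\w+)\s*\nArgs:\s*(.+)", text)
    if match is None:
        return None  # the model answered in plain text
    return match.group(1), match.group(2).strip()

print(parse_action("Action: search\nArgs: weather Tokyo"))
# ('search', 'weather Tokyo')
print(parse_action("The weather is sunny."))
# None
```

Every failure mode the table lists, you now own: missing labels, extra prose around the action, malformed arguments. Native function calling pushes that parsing and validation into the model provider's stack.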
Multiple tool calls in one response
The model can request multiple tools in a single response. Most APIs surface this as a list:
```python
for call in response.message.tool_calls:
    result = registry[call.function.name](**call.function.arguments)
    messages.append({"role": "tool", "content": str(result), "tool_call_id": call.id})
```

This matters for performance. If a model needs to look up three different cities, it can request all three in one turn instead of three sequential turns. You can run them in parallel:
```python
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor() as ex:
    futures = {
        call.id: ex.submit(registry[call.function.name], **call.function.arguments)
        for call in response.message.tool_calls
    }
    for call_id, future in futures.items():
        messages.append({"role": "tool", "content": str(future.result()), "tool_call_id": call_id})
```

Parallel tool execution is one of the easiest performance wins in an agent system.
What 'tool' and 'function' really are
The OpenAI, Anthropic, and Ollama APIs all use slightly different field names: tools vs functions, tool_calls vs function_call. The shape is the same: a name, a JSON schema for the arguments, a structured response object. If you understand one, you can read all of them.
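A small normalizer makes that concrete: whatever the provider calls the fields, each call reduces to a (name, arguments) pair. The dict shapes below approximate two common styles and are not exact copies of any API's response:

```python
import json

def normalize_call(call: dict) -> tuple:
    """Reduce a provider-specific tool call to (name, arguments)."""
    fn = call["function"]
    args = fn["arguments"]
    # Some APIs return arguments as a JSON string, others as a parsed dict.
    if isinstance(args, str):
        args = json.loads(args)
    return fn["name"], args

# Arguments arrive as a JSON string (OpenAI-style, approximately).
string_style = {"id": "call_1", "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}}
# Arguments arrive as a parsed dict (Ollama-style, approximately).
dict_style = {"function": {"name": "get_weather", "arguments": {"city": "Tokyo"}}}

print(normalize_call(string_style) == normalize_call(dict_style))  # True
```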
Key takeaway
Function calling is not "the model running your code." It's the model returning a structured JSON request that your code decides whether to honor. The model picks a tool by reading its description, so the quality of your tool descriptions controls how often the model picks the right one. The next lesson is entirely about that: how to write tool descriptions and schemas the model will actually use correctly.