Lesson 1 of 21 · Track 2

Single agent architectures

The monolithic agent

One LLM, many tools.

Video lesson ~10 min


One brain, all the tools

The simplest agent architecture is also the one most engineers reach for first: a single LLM, a single system prompt, a single message list, and a tool registry that contains every capability the agent might need. We call this the monolithic agent.

It's the architecture every Track 1 lesson built up to. It's also where most production agents live, even at large companies. Multi-agent systems get the press, but for a huge swath of real workloads, a well-tuned monolith is faster, cheaper, and more predictable than anything fancier.

This module starts here because you have to understand the shape of the monolith before you can articulate why and when to break it up.

What a monolith looks like in code

import ollama

# The tool schemas (read_file_schema, ...), the tool implementations
# (read_file, ...), and run_loop are the pieces built in Track 1.
 
SYSTEM_PROMPT = """You are an engineering assistant for the Acme team.
You can read files, search the codebase, query the staging database,
fetch GitHub issues, send Slack messages, and run tests.
 
Use tools to ground your answers in real data. Stop and answer when
you have enough information.
"""
 
tools = [
    {"type": "function", "function": read_file_schema},
    {"type": "function", "function": search_code_schema},
    {"type": "function", "function": query_db_schema},
    {"type": "function", "function": fetch_issue_schema},
    {"type": "function", "function": post_slack_schema},
    {"type": "function", "function": run_tests_schema},
]
 
registry = {
    "read_file": read_file,
    "search_code": search_code,
    "query_db": query_db,
    "fetch_issue": fetch_issue,
    "post_slack": post_slack,
    "run_tests": run_tests,
}
 
def monolith(user_input):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]
    return run_loop(messages, tools, registry)

One model. One prompt. One loop. The model sees every tool on every turn, decides which to call, and works the request to completion.
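The lesson doesn't spell out `run_loop`, so here is a minimal sketch of the shape it implies. The `chat_fn` parameter, `max_turns` cap, and the exact message-dict fields are assumptions for illustration: in practice `chat_fn` would be a thin wrapper around your model client (e.g. `ollama.chat`), and injecting it keeps the loop testable.

```python
def run_loop(messages, tools, registry, chat_fn, max_turns=10):
    """Minimal tool-calling loop (a sketch, not Track 1's exact code).

    chat_fn(messages=..., tools=...) returns an assistant message dict;
    max_turns is a safety cap so a confused model can't loop forever.
    """
    for _ in range(max_turns):
        reply = chat_fn(messages=messages, tools=tools)
        messages.append(reply)
        if not reply.get("tool_calls"):
            return reply["content"]           # no tool call: final answer
        for call in reply["tool_calls"]:
            fn = registry[call["name"]]        # dispatch by tool name
            result = fn(**call["arguments"])   # run the real implementation
            messages.append({"role": "tool", "name": call["name"],
                             "content": str(result)})
    return messages[-1].get("content", "")     # cap hit: best effort
```

Note how the loop never branches on *which* tool was called: the registry lookup is the whole dispatch mechanism, which is what keeps adding a seventh tool a one-line change.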

Why monoliths work better than you'd expect

If you read agent literature you'd think this design is obviously bad. Six unrelated tools in one prompt? An LLM expected to be a code reader, a database admin, a Slack writer, and a test runner all at once? Surely that's worse than specialized agents?

Often it isn't. Three reasons:

1. The model handles role switching surprisingly well

Modern instruction-tuned models are trained on enormous mixed-domain data. Switching between "read this Python file" and "post a Slack message" doesn't require a different model. It requires the same model with different inputs.

2. State is shared for free

In a multi-agent system, passing context between agents is expensive: you have to serialize, summarize, decide what each agent needs to know. In a monolith, the conversation history is the shared state. The model that just read a file already knows what's in it when it decides to post a Slack message about it.

3. Latency is bounded by one model

A multi-agent system adds at least one extra LLM call per agent boundary (the orchestrator deciding who to call next, plus each worker's own reasoning). A monolith adds zero. For latency-sensitive workloads, that's a real difference.
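The call-count arithmetic can be made concrete with a back-of-envelope model. The functions and counting rules below are illustrative assumptions, not measurements from this lesson: one LLM call per tool-deciding step, one orchestrator call per worker boundary, one final answer or synthesis call.

```python
def monolith_calls(tool_steps):
    # one LLM call to decide each tool step, plus one to write the final answer
    return tool_steps + 1

def multi_agent_calls(workers, steps_per_worker=1):
    # one orchestrator call per worker boundary, each worker's own call(s),
    # plus one final synthesis call by the orchestrator
    return workers * (1 + steps_per_worker) + 1

# Three tool steps done by one monolith vs. three single-step workers:
# monolith_calls(3) -> 4 LLM calls; multi_agent_calls(3) -> 7 LLM calls
```

Under these assumptions the same three-step task costs nearly twice as many sequential LLM calls in the multi-agent shape, which is where the latency gap comes from.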

When monoliths are the right call

Workload | Monolith fit
--- | ---
Domain-focused agent (just code, just sales, just ops) | Excellent
Tool count under ~15 | Excellent
Single user-facing conversation | Excellent
Mixed domains where state must flow between them | Often good
Long-horizon tasks requiring planning | Mediocre
Tasks needing parallel work across independent subtasks | Bad

The first three rows are the home court. Most early-stage products fit here. Don't talk yourself out of a monolith because it sounds unsophisticated.

The system prompt does a lot of work

In a monolith, the system prompt is the single most important asset in the codebase. It establishes:

  • The domain. What the agent is for and what it isn't.
  • The persona. How the agent talks, how confident it is, when it asks for clarification.
  • The tool conventions. When to reach for tools versus answer from memory.
  • The exit conditions. When to stop tool-calling and write a final answer.

Spend real time on it. A 600-token system prompt that teaches the model how to use your tools well will outperform a 200-token prompt with a smarter model. We covered the schema-as-prompt idea in Track 1 Module 3; the system prompt is the layer above that.

A good monolith system prompt has structure:

ROLE: [one paragraph]
DOMAIN: [one paragraph on what's in scope, what isn't]
TOOLS: [one line per tool: when to use it, when not to]
STYLE: [terse, formal, casual, etc]
TERMINATION: [when to stop calling tools and finalize]

This template scales. You can grow each section as the agent's surface area grows, without rewriting from scratch.
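One way to keep the template maintainable is to assemble the prompt from named sections in code, so each section can grow independently. This is a sketch, not a prescribed pattern; the section bodies below are invented examples in the spirit of the Acme prompt earlier in the lesson.

```python
# Section names follow the ROLE/DOMAIN/TOOLS/STYLE/TERMINATION template above.
SECTIONS = {
    "ROLE": "You are an engineering assistant for the Acme team.",
    "DOMAIN": "Codebase, staging DB, GitHub issues, Slack, tests. "
              "No production access.",
    "TOOLS": "\n".join([
        "read_file: read one file; use search_code first to locate it.",
        "run_tests: only after a change, never speculatively.",
    ]),
    "STYLE": "Terse. Ask before any destructive action.",
    "TERMINATION": "Stop calling tools once you can answer from gathered data.",
}

def build_system_prompt(sections=SECTIONS):
    # Each section renders as "NAME:\n<body>", separated by blank lines.
    return "\n\n".join(f"{name}:\n{body}" for name, body in sections.items())
```

Growing the agent's surface area then means appending a line to `TOOLS` or a sentence to `DOMAIN`, not rewriting a monolithic string.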

Cost and token shape

Per turn, a monolith pays for:

  • The system prompt
  • The full tool definitions
  • The conversation history
  • The current user message
  • The reasoning output

The first two are constant per turn; the conversation history grows over a session. We covered context budgeting in Track 1 Module 5, and the same levers apply here: trim the prompt, summarize old turns, retrieve only relevant facts. With those in place, a monolith handles dozens of turns without the budget exploding.
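The cost shape is easy to see with placeholder numbers. Every figure below is an assumption for illustration (the 600-token prompt echoes the earlier paragraph; the rest are made-up sizes), not a measurement.

```python
def turn_tokens(system=600, tool_defs=900, history=0, user=150, output=400):
    # One turn's bill: fixed overhead (prompt + tool defs) plus whatever
    # history has accumulated, plus this turn's input and output.
    return system + tool_defs + history + user + output

def session_tokens(turns):
    total, history = 0, 0
    for _ in range(turns):
        total += turn_tokens(history=history)
        history += 150 + 400   # this turn's user message + output join history
    return total

# turn_tokens() -> 2050 for turn one; session_tokens(2) -> 4650,
# because turn two re-pays the fixed 1500 plus 550 tokens of new history.
```

The fixed 1500-token overhead dominates early turns, which is why trimming the prompt and tool definitions pays off on every single turn, while summarizing old turns only starts to matter as history grows.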

Resist the urge to multi-agent prematurely

The most common mistake at this stage is to read about supervisor/worker patterns and immediately reach for them on a workload that a monolith would handle fine. Multi-agent adds real complexity: orchestration code, state passing, debugging across boundaries. Build the monolith first. Push it until it breaks. Then split it. The breakage points are what teach you where to split.

What this track is going to do

The next two lessons stay with the single-agent design. We'll add prompt routing (one agent, multiple personas) and then look at exactly where monoliths break down. From Module 2 onward we move into actual multi-agent territory: communication, topologies, state management, safety, metacognition.

By the end of Track 2 you'll know when each pattern is right, what each one costs, and how to evolve a monolith into a multi-agent system without rewriting from scratch.

Key takeaway

A monolith is one LLM with all the tools and one big system prompt. It's the boring answer, and it's the right answer more often than agent literature suggests. Track 1 already built one; this module's job is to make you confident about when to keep it monolithic and when to break it apart. The next lesson takes the smallest step beyond a pure monolith: prompt routing.
