Lesson 4 of 21 · Track 2

Multi-agent communication

Why multiple agents?

Separation of concerns for LLMs.

Video lesson ~10 min


Specialization, not population

The previous module ended with three failure modes that single agents hit: tool overload, context pollution, and confused reasoning across domains. Each one points to the same fix: stop asking one agent to do everything.

But "use more agents" is not actually a useful recommendation. The interesting question is what each agent owns. Multi-agent systems work when each agent has a clear domain, a focused tool set, and a clean context. They fail when you've just split a monolith into two monoliths that now have to talk.

This lesson is about why multi-agent earns its complexity, and what the unit of decomposition should be.

The honest argument for multi-agent

It is not "more agents are smarter." That framing leads people to spawn five-agent committees that produce worse outputs than a single well-prompted agent.

The honest argument is separation of concerns for LLMs. Same idea as separation of concerns in software: each module gets a smaller responsibility, a smaller surface area, and a smaller context. The benefits are the same too:

  1. Smaller prompts. Each agent's system prompt covers one domain in depth, not many domains shallowly.
  2. Smaller tool registries. Each agent sees only the tools it owns. Tool selection accuracy goes up.
  3. Cleaner context. Each agent's message history is about its own work, not everyone else's.
  4. Independent failure. One agent stuck in a loop does not poison the others' state.

These map almost one-for-one onto the failure modes from the previous module. That is the whole point: the multi-agent design exists to fix specific failures, not to be impressive.
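The four benefits above can be made concrete as configuration. A minimal sketch, where the `AgentConfig` class, agent names, and tool names are all illustrative rather than drawn from any specific framework:

```python
from dataclasses import dataclass, field

@dataclass
class AgentConfig:
    """One specialist: a focused prompt plus only the tools it owns."""
    name: str
    system_prompt: str
    tools: list[str] = field(default_factory=list)

# Each prompt covers one domain in depth; each tool registry stays small.
code_agent = AgentConfig(
    name="code",
    system_prompt="You are a code specialist. You read, edit, and test source files.",
    tools=["read_file", "write_file", "run_tests", "git_diff"],
)
data_agent = AgentConfig(
    name="data",
    system_prompt="You are a data specialist. You query SQL and maintain dashboards.",
    tools=["run_sql", "list_tables", "update_dashboard"],
)

# Disjoint registries: neither agent can reach the other's tools, so a
# loop or bad state in one cannot poison the other.
assert not set(code_agent.tools) & set(data_agent.tools)
```

Note that the independence here is structural, not behavioral: the registries being disjoint is what guarantees one agent cannot corrupt the other's domain.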

Decomposing by domain, not by step

When teams first try multi-agent, the most common mistake is decomposing by step: "the planner agent figures out steps, the executor agent runs them, the writer agent writes the final answer." This sounds clean, but it usually performs worse than a monolith because:

  • Every step has to serialize state to the next agent.
  • The planner has no way to react to what the executor finds.
  • The writer is starved of context the executor saw.

A better decomposition is by domain. The agents own knowledge areas, not pipeline stages.

Bad split (by step):
[planner] -> [executor] -> [writer]
 
Good split (by domain):
[orchestrator]
   ├─ [code agent]   (owns file/test/git tools)
   ├─ [data agent]   (owns sql/dashboard tools)
   └─ [comms agent]  (owns slack/email tools)

In the good split, each worker has expertise the others lack. The orchestrator routes work to whichever specialist is right for the question. Domain decomposition matches how human teams actually work.
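The orchestrator's job in the good split is routing. A toy sketch, with keyword matching standing in for what would be an LLM routing call in a real system; the agent names and keyword lists are hypothetical:

```python
# Hypothetical domain keywords per specialist. In a real orchestrator this
# decision is itself an LLM call; keywords just make the shape concrete.
AGENTS = {
    "code": ["file", "test", "bug", "refactor"],
    "data": ["sql", "dashboard", "metric", "query"],
    "comms": ["slack", "email", "announce"],
}

def route(task: str) -> str:
    """Send the task to the specialist whose domain it matches best."""
    scores = {
        name: sum(kw in task.lower() for kw in keywords)
        for name, keywords in AGENTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "code"  # arbitrary fallback

print(route("fix the failing test in auth.py"))  # -> code
print(route("update the revenue dashboard"))     # -> data
```

The important property is that routing selects one specialist per task; the workers never see each other's context.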

What an agent boundary should give you

When deciding whether two responsibilities belong in one agent or two, ask three questions:

Do the system prompts compete?

If you would write two materially different system prompts (different role, different style, different conventions) for the two responsibilities, they probably want to be two agents. Stuffing both prompts into one is what causes the model to flip-flop between personas.

Do the tool sets overlap?

If the two responsibilities use almost the same tools, you do not have a domain split; you have a workflow split. Keep them in one agent and add structure with prompts or routing.

If the tool sets are mostly disjoint (the code agent never deploys, the ops agent never reads source files), the split is real. The disjoint tool sets are a strong signal you have actual domains.
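The disjoint-tool-set signal can be quantified. A small sketch using Jaccard overlap; the function and the interpretation thresholds are illustrative assumptions, not an established metric:

```python
def tool_overlap(tools_a: set[str], tools_b: set[str]) -> float:
    """Jaccard overlap between two candidate agents' tool sets.
    Near 0 suggests a real domain split; near 1 suggests a workflow
    split that should stay inside one agent."""
    if not tools_a and not tools_b:
        return 0.0
    return len(tools_a & tools_b) / len(tools_a | tools_b)

code_tools = {"read_file", "write_file", "run_tests"}
ops_tools = {"deploy", "rollback", "tail_logs"}
workflow_tools = {"read_file", "write_file", "search"}

print(tool_overlap(code_tools, ops_tools))       # 0.0 -> split is real
print(tool_overlap(code_tools, workflow_tools))  # 0.5 -> keep in one agent
```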

Does state need to flow continuously?

If responsibility A's working memory is responsibility B's working memory (one looks at a file, the other writes about it), keep them together. The cost of serializing context across an agent boundary is real. Multi-agent works best when each agent operates on its own data and only the summary needs to flow.

The cost side

Multi-agent is not free. A two-agent system pays:

  • At least one extra LLM call per handoff. The orchestrator picks who to dispatch to. That is a turn that did not exist in a monolith.
  • Serialization overhead. Whatever the worker found has to be summarized for the orchestrator to route the next step.
  • Coordination bugs. Race conditions, cycles, agents calling each other in circles, deadlock when both wait on each other. None of these exist in a monolith.
  • Worse debuggability. A bug now spans two trace logs and two prompts. Track 4 Module 3 covers tracing for this exact reason.

If your workload comfortably fits a monolith, all of those costs are dead weight. The split has to pay for itself.
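The serialization overhead has a concrete shape: whatever crosses the boundary must be flattened into a message. A sketch of a structured handoff, with hypothetical field names not tied to any specific framework:

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class Handoff:
    """What crosses the agent boundary: a summary, not the raw transcript.
    Field names here are illustrative."""
    from_agent: str
    to_agent: str
    task: str
    summary: str  # compressed result of the worker's turns
    artifacts: list[str] = field(default_factory=list)  # fetch-on-demand refs

msg = Handoff(
    from_agent="code",
    to_agent="orchestrator",
    task="fix failing auth test",
    summary="Patched token expiry check in auth.py; test suite passes.",
    artifacts=["auth.py", "test_report.json"],
)

# The serialization itself is the tax: every field costs tokens to write
# on one side and to read back into context on the other.
print(json.dumps(asdict(msg), indent=2))
```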

A small heuristic

Before splitting, check the diff:

| Property | Monolith | Two-agent split |
| --- | --- | --- |
| Tools per prompt | 12 | 6 + 6 |
| System prompt length | 1500 | 800 + 800 |
| Avg latency per request | 1 LLM call × N turns | 1 LLM call × N turns + handoff calls |
| Quality on cross-domain task | Confused | Better |
| Quality on single-domain task | Fine | Same or worse (handoff overhead) |
| Debugging difficulty | One trace | Two traces, plus handoff log |

If your real workload is mostly cross-domain, the split is a win. If it is mostly single-domain with a few cross-domain edge cases, the monolith plus prompt routing usually wins.
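That workload-mix argument can be written down as a toy cost model. Every number and threshold here is an illustrative assumption, not a measured result:

```python
def handoff_overhead(turns: int, handoff_calls: int) -> float:
    """Fraction of extra LLM calls the split adds per request.
    Toy model; real costs also include serialization tokens."""
    return handoff_calls / turns

def split_wins(cross_domain_frac: float, turns: int = 5,
               handoff_calls: int = 2) -> bool:
    """Heuristic: split when the share of cross-domain requests (the
    ones that actually benefit) exceeds the per-request overhead."""
    return cross_domain_frac > handoff_overhead(turns, handoff_calls)

print(split_wins(0.7))  # mostly cross-domain -> True, split pays off
print(split_wins(0.1))  # mostly single-domain -> False, keep the monolith
```

The crossover point moves with the defaults: cheaper handoffs or longer sessions lower the bar for splitting.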

The 'two pizzas' rule for agents

A useful rule of thumb borrowed from team design: an agent should own roughly the work that one focused engineer could hold in their head for a session. If your agent's responsibilities would not fit on a junior dev's onboarding doc, the agent is probably doing too much. Split it.

Communication is the hard part

Once you decide to split, the design problem moves from "what does my agent do" to "how do my agents talk." The next two lessons are about the communication side: which message-passing pattern to use, and how to structure the conversation between agents so the system stays coherent.

The mechanics are simpler than the intuition behind them. Patterns like supervisor/worker or pub/sub are not hard to implement. The hard part is keeping each agent's context clean as messages flow through. We will see that done badly (everyone broadcasting to everyone) and done well (structured handoffs with summaries) in the next two lessons.

Key takeaway

Multi-agent earns its complexity when single-agent hits one of three failure modes (tool overload, context pollution, confused reasoning across domains) and the within-session fixes are not enough. Decompose by domain, not by pipeline step. Specialization beats generalism, but specialization with a heavy handoff tax can lose to a clean monolith. The next lesson covers how the agents actually pass messages, where the costs live, and how to keep them small.
