Lesson 2 of 21 · Track 2

Single agent architectures

Prompt routing

One agent, multiple personas.

Interactive exercise ~10 min

One agent, many personas

A pure monolith uses one system prompt for everything. That works until your workload genuinely covers different modes that need different instructions: a "research" mode, a "code" mode, an "ops" mode. You don't need separate agents to handle this. You can keep one agent and swap the system prompt at runtime based on the user's intent.

This is prompt routing. It's the smallest possible step beyond a monolith and it covers a surprising amount of ground.

The mechanism

Two stages:

  1. Classify the incoming request into one of N known categories.
  2. Run the agent loop with a system prompt that matches the category.

Both stages can use the same model. The classifier is a tiny LLM call (often one token of output). The runtime swap is a string substitution.

import ollama  # client for the local classifier call

# One specialist system prompt per route.
ROUTES = {
    "research": SYSTEM_PROMPT_RESEARCH,
    "code": SYSTEM_PROMPT_CODE,
    "ops": SYSTEM_PROMPT_OPS,
    "general": SYSTEM_PROMPT_GENERAL,
}
 
CLASSIFIER_PROMPT = """Classify the user's request into one of:
- research: questions that need information gathering
- code: requests to read, write, or fix code
- ops: requests about deployments, infrastructure, or monitoring
- general: anything else
 
Reply with ONE WORD only.
"""
 
def classify(user_input, model="llama3"):
    response = ollama.chat(model=model, messages=[
        {"role": "system", "content": CLASSIFIER_PROMPT},
        {"role": "user", "content": user_input},
    ], options={"temperature": 0})
    label = response.message.content.strip().lower()
    return label if label in ROUTES else "general"
 
 
def routed_agent(user_input, tools, registry):
    route = classify(user_input)
    system = ROUTES[route]
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": user_input},
    ]
    return run_loop(messages, tools, registry, route=route)

The agent itself is unchanged from Track 1. The new piece is the classifier in front of it.

Why this is meaningfully different from a monolith

You might ask: why not just put all four personas in one big system prompt and let the model figure it out? Sometimes that works. Often it doesn't, for two reasons:

Each persona gets a tighter prompt

A specialist prompt for "code" can be 800 tokens of code-specific instructions, conventions, and tool guidance. A specialist prompt for "ops" can be 800 tokens of completely different content. If you tried to fit both into one prompt, you'd have 1600 tokens of context the model has to filter through on every turn, half of which is irrelevant for any given request.

Routing lets each persona pay only for its own context.
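To make that concrete, here is roughly what two of those specialist prompts might look like, heavily abbreviated. The real SYSTEM_PROMPT_* contents aren't shown in this lesson, so treat these as illustrative sketches rather than the actual prompts:

# Illustrative, heavily abbreviated persona prompts. Real ones would carry
# far more detail: conventions, tool guidance, output format, edge cases.
SYSTEM_PROMPT_CODE = """You are a coding assistant.
Read the relevant file before editing it. Make small, verifiable changes.
Run the tests after every edit and report exactly what you changed and why.
"""
 
SYSTEM_PROMPT_OPS = """You are an operations assistant.
Check deploy status and recent metrics before proposing any change.
Never take a destructive action (rollback, restart, scale-down) without
asking for confirmation first.
"""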

Tools can be scoped per route

You can also restrict tool access by route. The "research" persona doesn't need deploy_to_production. The "ops" persona doesn't need summarize_paper. Less to choose from means more reliable choices.

TOOLS_BY_ROUTE = {
    "research": [search_web, read_url, take_notes],
    "code": [read_file, edit_file, run_tests],
    "ops": [check_deploy, query_metrics, alert_team],
    "general": [search_web, read_file],
}
 
def routed_agent(user_input, registry):
    route = classify(user_input)
    return run_loop(
        messages=[
            {"role": "system", "content": ROUTES[route]},
            {"role": "user", "content": user_input},
        ],
        tools=TOOLS_BY_ROUTE[route],
        registry=registry,
        route=route,
    )

Now each route gets a focused prompt AND a focused tool list. Same agent skeleton, four different runtime configurations.

Routing accuracy is the whole game

The classifier is the only new failure mode. If it picks the wrong route, the agent runs with the wrong prompt and the wrong tools. There are three things that move classifier accuracy:

1. Use temperature 0

The classifier should be deterministic. Set temperature=0 so the same input always produces the same label.

2. Constrain the output

The classifier should output one word from a known set. If the model returns something else, fall back to the default route. Production systems sometimes use a structured output API to guarantee the format.
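If your model runtime supports structured outputs, you can enforce the label at the API level instead of trusting the prompt. A minimal sketch, assuming a recent Ollama where the format parameter accepts a JSON schema; classify_structured is a hypothetical variant of the classify function above:

import json
import ollama
 
# Constrain the classifier to a JSON object whose "route" field must be
# one of the known labels.
LABEL_SCHEMA = {
    "type": "object",
    "properties": {"route": {"type": "string", "enum": list(ROUTES)}},
    "required": ["route"],
}
 
def classify_structured(user_input, model="llama3"):
    response = ollama.chat(
        model=model,
        messages=[
            {"role": "system", "content": CLASSIFIER_PROMPT},
            {"role": "user", "content": user_input},
        ],
        format=LABEL_SCHEMA,          # model must emit JSON matching the schema
        options={"temperature": 0},
    )
    try:
        label = json.loads(response.message.content)["route"]
    except (json.JSONDecodeError, KeyError):
        return "general"              # same fallback as before
    return label if label in ROUTES else "general"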

3. Provide examples in the classifier prompt

A few examples beat a long verbal description.

CLASSIFIER_PROMPT = """Classify the user's request into one of: research, code, ops, general.
 
Examples:
"How do other libraries handle this?" -> research
"Fix the bug in auth.py" -> code
"Why is the staging deploy failing?" -> ops
"What time is it?" -> general
 
Reply with ONE WORD only.
"""

Three or four examples per category usually get you above 95% accuracy on the categories that matter.
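The only way to know whether your examples are working is to measure. A minimal sketch of a routing accuracy check, assuming a small hand-labeled set of requests; EVAL_SET and its contents are hypothetical placeholders:

# Hypothetical hand-labeled eval set: (request, expected route).
EVAL_SET = [
    ("Compare how requests and httpx handle retries", "research"),
    ("Add a unit test for the login handler", "code"),
    ("CPU usage doubled after the last deploy, why?", "ops"),
    ("Summarize this paragraph for me", "general"),
]
 
def routing_accuracy(eval_set):
    hits = sum(1 for text, expected in eval_set if classify(text) == expected)
    return hits / len(eval_set)
 
print(f"routing accuracy: {routing_accuracy(EVAL_SET):.0%}")

A few dozen real requests pulled from logs make a far better eval set than invented ones.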

When routing is enough vs when it isn't

Prompt routing fits a specific shape of problem: requests that cleanly split into a small number of disjoint categories where each category benefits from its own prompt and tools. If your workload doesn't fit that shape, routing won't help.

Pattern -> Routing fits?
Customer support: technical / billing / general -> Yes
Coding assistant: read / write / debug / refactor -> Yes
Research agent that has to mix retrieval, writing, and analysis -> No (use a real planner)
Workflow with strict ordered steps -> No (use a pipeline)
Tasks where multiple personas need to collaborate on the same request -> No (use multi-agent)

The third row is the most common one to misclassify. If a single user request needs the combination of two personas (search the codebase AND query the database AND draft a fix), routing forces an arbitrary either-or choice. That's where multi-agent starts to earn its complexity.

Cost of the classifier

The classifier adds one LLM call per request. With temperature 0 and a small prompt this is fast and cheap, often under 100ms and a few dozen tokens. For most workloads this is invisible. For high-volume APIs you can cache classifications by similarity (treat the classifier as a key into a lookup table) or use a tiny non-LLM classifier (logistic regression, fastText) to drop the cost to microseconds.
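As an illustration of the non-LLM option, here is a minimal sketch using scikit-learn (TF-IDF plus logistic regression). The training examples and the classify_fast name are hypothetical; in practice you would train on a few hundred labeled requests from your logs:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
 
# Hypothetical labeled requests; real systems would use logged traffic.
TRAIN = [
    ("How do other libraries handle this?", "research"),
    ("Find prior art on token bucket rate limiting", "research"),
    ("Fix the bug in auth.py", "code"),
    ("Refactor the payment module into smaller functions", "code"),
    ("Why is the staging deploy failing?", "ops"),
    ("Roll back the last release", "ops"),
    ("What time is it?", "general"),
    ("Translate this sentence to French", "general"),
]
 
texts, labels = zip(*TRAIN)
router = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
router.fit(texts, labels)
 
def classify_fast(user_input):
    # Microsecond-scale inference. No fallback needed: predict() always
    # returns one of the trained labels.
    return router.predict([user_input])[0]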

Routing as a stepping stone

Prompt routing is a useful pattern in its own right, but it's also a stepping stone. The structure (classify, then dispatch) is the same structure used by orchestrator agents in Track 2 Module 3. The difference is that a real orchestrator dispatches to agents instead of prompts, and the dispatch can be recursive (agents dispatching to other agents).

If you build prompt routing first, the move to a real orchestrator is small. The classifier becomes the orchestrator. The persona prompts become the worker agents. Same wiring, more autonomy at each node.
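To make the stepping stone concrete, here is a sketch of the same wiring with worker agents in place of prompt strings. make_worker is a hypothetical helper; run_loop, ROUTES, TOOLS_BY_ROUTE, and registry come from the earlier examples in this lesson:

# Each worker closes over one persona's prompt and tool list and runs its
# own loop. The classifier plays the orchestrator role.
def make_worker(route, registry):
    def worker(user_input):
        return run_loop(
            messages=[
                {"role": "system", "content": ROUTES[route]},
                {"role": "user", "content": user_input},
            ],
            tools=TOOLS_BY_ROUTE[route],
            registry=registry,
            route=route,
        )
    return worker
 
def orchestrated_agent(user_input, registry):
    workers = {route: make_worker(route, registry) for route in ROUTES}
    route = classify(user_input)       # same classifier as before
    return workers[route](user_input)  # dispatch to an agent, not a prompt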

Key takeaway

Prompt routing keeps the single-agent architecture but adds one classification call up front so each request gets a focused prompt and tool list. It's cheaper than multi-agent, more focused than a monolith, and the right answer for a lot of "I need different modes" workloads. The next lesson is the flip side: where this whole single-agent architecture finally breaks down and why you'd want anything more.
