Lesson 15 of 21, Track 2

Agent safety and control

Tool whitelisting and blacklisting

Controlling what agents can and can't do.

Allow lists, not deny lists

The simplest safety control is also the most under-used: deciding which tools an agent is allowed to call. You already saw a version of this in Module 1 with prompt routing's tool scoping. This lesson generalizes the idea: every agent (or every step, or every session) should run with an explicit list of allowed tools, and everything else is forbidden by default.

The mental model is the principle of least privilege, the same idea behind Unix-style capabilities. An agent that does not need to send email should not have a send-email tool in its registry, period. Even if a malicious prompt convinced it to try, the call would fail because the tool is not in the allow list.

Whitelisting vs blacklisting

Approach	What it says
Whitelist	Here is the closed set of tools you are allowed to call.
Blacklist	Here is the set of tools you are not allowed to call; everything else is fine.

Whitelisting is dramatically safer because it fails closed: a tool you forgot to mention is automatically forbidden. Blacklisting fails open: a tool you forgot to deny is automatically allowed.
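The difference is easiest to see with a tool that nobody remembered to classify. A minimal sketch, where every tool name is hypothetical and delete_repo stands in for a tool added to the registry after both policies were written:

```python
# Hypothetical tool names, for illustration only.
ALLOW_LIST = {"read_file", "run_tests"}
DENY_LIST = {"send_email"}


def allowed_by_whitelist(tool_name: str) -> bool:
    # Fails closed: anything not explicitly listed is refused.
    return tool_name in ALLOW_LIST


def allowed_by_blacklist(tool_name: str) -> bool:
    # Fails open: anything not explicitly listed is permitted.
    return tool_name not in DENY_LIST


# A tool added later that neither list mentions.
new_tool = "delete_repo"
print(allowed_by_whitelist(new_tool))  # False
print(allowed_by_blacklist(new_tool))  # True
```

The same omission produces a refusal under the whitelist and a silent grant under the blacklist.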

In production, always whitelist. The blacklist mindset shows up in older systems and security incident reports for a reason.

Where the allow list lives

Three layers, from broadest to narrowest scope:

Per-agent (default)

Each agent type has a static allow list defined alongside its system prompt and registry. The code agent allows read_file, edit_file, run_tests. The ops agent allows check_deploy, query_metrics, rollback. The lists do not overlap unless the overlap is intentional.

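# Static per-agent allow lists, kept next to each agent's system prompt.
CODE_AGENT_TOOLS = {"read_file", "edit_file", "run_tests"}
OPS_AGENT_TOOLS = {"check_deploy", "query_metrics", "rollback"}


def make_agent(name, system_prompt, allowed_tools):
    # Agent and TOOL_REGISTRY (tool name -> callable) are assumed to be
    # defined elsewhere. Only allow-listed tools are copied into the agent's
    # registry; an unknown name raises KeyError here rather than at call time.
    return Agent(
        name=name,
        system_prompt=system_prompt,
        tools={t: TOOL_REGISTRY[t] for t in allowed_tools},
    )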

Per-session (override)

Some sessions need narrower lists than the agent's default. A read-only debugging session might disable all destructive tools regardless of which agent is running. The session-level allow list is the intersection of the agent's list and the session's policy.

def session_tools(agent, session):
    return agent.allowed & session.allowed
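To make the intersection concrete, here is a small sketch. The AgentPolicy and SessionPolicy classes are hypothetical stand-ins for whatever objects carry the allowed sets in a real system:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPolicy:
    # Hypothetical container for an agent's default allow list.
    allowed: frozenset


@dataclass(frozen=True)
class SessionPolicy:
    # Hypothetical container for what a session permits.
    allowed: frozenset


def session_tools(agent, session):
    # Intersection: a tool must be allowed by BOTH the agent and the session.
    return agent.allowed & session.allowed


code_agent = AgentPolicy(frozenset({"read_file", "edit_file", "run_tests"}))
read_only = SessionPolicy(frozenset({"read_file", "query_metrics"}))

effective = session_tools(code_agent, read_only)
print(sorted(effective))  # ['read_file']
```

Note that query_metrics does not appear in the result even though the session permits it. An intersection can only narrow the agent's list, never widen it.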

Per-step (rare)

For the highest-risk operations, individual steps in a plan might further restrict tools: "This step is just gathering information; no write tools allowed." Most systems do not need this granularity, but it is the right tool for any step that is purely investigative.
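One way to sketch step-level restriction is to let each plan step carry its own optional allow list and intersect it with the session's list. The plan structure and the "allowed" key below are assumptions for illustration, not part of any particular framework:

```python
def step_tools(session_allowed, step_allowed=None):
    # No step-level policy: the step inherits the session's list unchanged.
    if step_allowed is None:
        return frozenset(session_allowed)
    # Otherwise the step can only narrow what the session already allows.
    return frozenset(session_allowed) & frozenset(step_allowed)


# Hypothetical plan: the first step is purely investigative.
session_allowed = {"read_file", "edit_file", "run_tests"}
plan = [
    {"name": "investigate", "allowed": {"read_file"}},
    {"name": "apply_fix"},  # no step-level restriction
]

for step in plan:
    tools = step_tools(session_allowed, step.get("allowed"))
    print(step["name"], sorted(tools))
```

The step expresses its restriction as an allow list rather than as a list of forbidden write tools, so it fails closed like the other layers.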

How enforcement works

Two enforcement points:

Before exposing the tool to the model

Send the model only the tools that are allowed for this turn. The model never sees disallowed tools, so it has little reason to try calling them.
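A minimal sketch of this first enforcement point. The TOOL_SCHEMAS dictionary and its schema shape are hypothetical; real model APIs each define their own tool-definition format:

```python
# Hypothetical tool definitions, keyed by tool name.
TOOL_SCHEMAS = {
    "read_file": {
        "name": "read_file",
        "description": "Read a file from the workspace.",
        "parameters": {"type": "object", "properties": {"path": {"type": "string"}}},
    },
    "run_tests": {
        "name": "run_tests",
        "description": "Run the test suite.",
        "parameters": {"type": "object", "properties": {}},
    },
    "send_email": {
        "name": "send_email",
        "description": "Send an email.",
        "parameters": {"type": "object", "properties": {"to": {"type": "string"}}},
    },
}


def tools_for_model(allowed_tools):
    # Only allow-listed tools with a known schema are presented.
    # Sorting keeps the tool order stable from turn to turn.
    return [TOOL_SCHEMAS[name] for name in sorted(allowed_tools) if name in TOOL_SCHEMAS]


presented = tools_for_model({"read_file", "run_tests"})
print([tool["name"] for tool in presented])  # ['read_file', 'run_tests']
```

The list returned by tools_for_model is what would go into the request to the model. send_email exists in the codebase but is never mentioned to this agent.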

Before executing any tool call the model produced

Even if the model somehow tries to call a tool not in the list (older model, jailbreak attempt, schema confusion), the executor checks the list and refuses.

def execute_tool_call(call, allowed_tools):
    if call.name not in allowed_tools:
        return {
            "status": "denied",
            "reason": f"tool {call.name!r} not in allow list",
        }
    return TOOL_REGISTRY[call.name](**call.args)

Defense in depth: even though the model "shouldn't" call a disallowed tool because it isn't shown, you still check at execution. Skipping the second check is a real source of bugs because tool definitions can leak through prompts, prior conversation history, or other edge cases.

Returning denials gracefully

When a tool call is denied, the model should know why so it can adapt. Return a structured "denied" message that the next iteration can read:

{
    "role": "tool",
    "tool_call_id": call.id,
    "content": json.dumps({
        "error": "tool not allowed in this context",
        "tool": call.name,
        "hint": "this agent can only read; write tools are unavailable",
    }),
}

The model sees the denial and replans. This is much better than throwing an exception and crashing the loop.
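A sketch of how a tool-execution step might fold denials into the conversation instead of raising. The call objects (with id, name, and args attributes), the registry, and the helper names are assumptions for illustration:

```python
import json
from types import SimpleNamespace


def denial_message(call, hint):
    # Same message shape as the example above.
    return {
        "role": "tool",
        "tool_call_id": call.id,
        "content": json.dumps({
            "error": "tool not allowed in this context",
            "tool": call.name,
            "hint": hint,
        }),
    }


def run_tool_calls(calls, allowed_tools, registry):
    """Return one tool message per call; a denial never raises."""
    messages = []
    for call in calls:
        if call.name not in allowed_tools:
            hint = "this agent can only read; write tools are unavailable"
            messages.append(denial_message(call, hint))
            continue
        result = registry[call.name](**call.args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
    return messages


# Hypothetical registry and model-produced calls.
registry = {"read_file": lambda path: {"status": "ok", "path": path}}
calls = [
    SimpleNamespace(id="call_1", name="read_file", args={"path": "README.md"}),
    SimpleNamespace(id="call_2", name="edit_file", args={"path": "README.md"}),
]
messages = run_tool_calls(calls, {"read_file"}, registry)
```

Both calls produce a message, so the loop continues: the first carries the tool result and the second carries the denial for the model to read on its next turn.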

Allow lists for multi-step tools

Some "tools" are really wrappers around lower-level capabilities. deploy_to_prod might internally call build_image, push_to_registry, and update_kubernetes_manifest. Allow-listing deploy_to_prod implicitly allows the underlying capabilities for the duration of that call.

This composition matters because it is where allow-lists silently grant more than they appear to. If you whitelist a "high-level" tool, audit what it can do under the hood. The principle of least privilege applies to the actual capability, not just the API surface name.
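One way to make that audit mechanical is to record what each high-level tool calls internally and expand an allow list into its full set of capabilities. The UNDERLYING map below is a hypothetical, hand-maintained record, not something a framework provides:

```python
# Hypothetical map from a high-level tool to the capabilities it invokes.
UNDERLYING = {
    "deploy_to_prod": {"build_image", "push_to_registry", "update_kubernetes_manifest"},
}


def effective_capabilities(allowed_tools, underlying=UNDERLYING):
    """Expand an allow list into everything it can actually reach."""
    seen = set()
    stack = list(allowed_tools)
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already expanded; also guards against cycles
        seen.add(name)
        stack.extend(underlying.get(name, ()))
    return seen


print(sorted(effective_capabilities({"deploy_to_prod"})))
```

Reviewing the expanded set, rather than the one-line allow list, shows what the agent is really being granted.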

Auditing the list

Treat the allow list as a security artifact. Three things to do:

Code-review changes. Adding a tool to an agent's list should be a deliberate decision, not a config blip.
Log every denial. Each denied outcome is a signal: either the model is hallucinating tools (a bug) or you forgot to allow something the agent legitimately needs (also a bug).
Prune unused tools. If an agent has not called a tool in months, it probably should not have it. Remove and let users tell you if they need it back.
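The logging and pruning steps can be sketched with in-memory counters. The function names and the counter layout are hypothetical; a production system would more likely write to its existing metrics or audit store:

```python
import logging
from collections import Counter

logger = logging.getLogger("tool_audit")

# In-memory counters, keyed by (agent name, tool name).
call_counts = Counter()
denial_counts = Counter()


def record_call(agent_name, tool_name, allowed_tools):
    """Count the call and log it if denied. Returns True when allowed."""
    if tool_name not in allowed_tools:
        denial_counts[(agent_name, tool_name)] += 1
        logger.warning("denied tool call: agent=%s tool=%s", agent_name, tool_name)
        return False
    call_counts[(agent_name, tool_name)] += 1
    return True


def unused_tools(agent_name, allowed_tools):
    # Tools the agent is allowed to call but never has: pruning candidates.
    return {t for t in allowed_tools if call_counts[(agent_name, t)] == 0}


allowed = {"read_file", "edit_file", "run_tests"}
record_call("code_agent", "read_file", allowed)
record_call("code_agent", "send_email", allowed)  # denied and logged

print(sorted(unused_tools("code_agent", allowed)))  # ['edit_file', 'run_tests']
```

The denial counter shows which tools the model keeps reaching for, and unused_tools gives a starting list for the next pruning review.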

A monolith with everything is the worst case

The single most dangerous pattern is a monolithic agent with every tool in the codebase available to it. Any prompt injection or hallucination can reach any tool. Even if you cannot decompose into multiple agents yet, scope tool access by route or by step. Allow-listing on a single agent gets you most of the safety benefit of a multi-agent design without the orchestration complexity.
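Scoping by route on a single agent can be as small as a lookup table. In this sketch the route names are hypothetical and the tool sets reuse the per-agent lists from earlier in the lesson:

```python
# Hypothetical routes for a single agent, each with its own allow list.
ROUTE_TOOLS = {
    "code": frozenset({"read_file", "edit_file", "run_tests"}),
    "ops": frozenset({"check_deploy", "query_metrics", "rollback"}),
}


def tools_for_route(route):
    # An unrecognized route gets no tools at all (fails closed).
    return ROUTE_TOOLS.get(route, frozenset())


print(sorted(tools_for_route("code")))  # ['edit_file', 'read_file', 'run_tests']
print(sorted(tools_for_route("unknown")))  # []
```

A request routed to "code" can never reach rollback, even though the same agent process serves both routes.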

What this enables

Once allow-lists are the foundation, the rest of the safety story falls into place:

Permission scopes (next lesson) add per-tool argument-level constraints on top of "yes/no this tool."
Approval gates add a human-in-the-loop step before specific tool calls fire.
Guardrails sanitize what flows in and out of allowed tools.

All of those layers sit on top of "this agent can call this tool, period." Without that base layer, every other control is fighting a battle it should not have to.

Key takeaway

Allow-lists are the safest baseline tool control: an agent runs with an explicit closed set of permitted tools, scoped per-agent and optionally per-session. Whitelist, never blacklist. Enforce both at tool-presentation time and at execution time. Log denials. Audit changes to the list. Everything else in this module sits on this foundation. The next lesson goes deeper on per-tool argument constraints, the most common failure point even within an allow list.
