Lesson 17 of 21 · Track 2

Agent safety and control

Human-in-the-loop approval gates

Confirm before executing destructive tools.


When the right answer is "ask first"

Allow-lists and scopes are policies that decide automatically. Some calls deserve a human in the path. The right model for those is an approval gate: the loop pauses, surfaces the proposed action, and only proceeds when a human says yes.

This is the WAITING state from Module 4 lesson 2 doing real work. With a state machine, approval gates are a clean transition: EXECUTING -> WAITING -> EXECUTING (or CANCELLED). Without one, they're hacks bolted onto the loop body.

When to gate

Three triggers that almost always deserve gates:

Destructive or irreversible side effects

Anything that deletes, sends money, publishes content, or touches production. The cost of a wrong call is high; the cost of a 30-second human pause is low.

High-stakes external communication

Posting to a customer-facing channel, replying to a real email, sending a public Slack message. These are visible mistakes that don't go away.

Actions outside the agent's normal pattern

If the agent is about to do something it has not done in this session before, that is a signal worth pausing on. Novelty often indicates either a real new requirement or a hallucination.

You don't have to gate everything risky. The bar is "the cost of getting this wrong exceeds the cost of asking." Routine tool calls that are easily reversible should not require a gate.
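One way to encode these three triggers as a policy check is sketched below. It is a minimal sketch, not the course's `needs_gate`: the tool names, the `DESTRUCTIVE_TOOLS`, `EXTERNAL_COMMS_TOOLS`, and `READ_ONLY_TOOLS` sets, and the `AgentSession` type are hypothetical, and novelty is approximated as "a non-read-only tool not yet called this session".

```python
from dataclasses import dataclass, field

# Hypothetical tool categories; a real system would load these from config.
DESTRUCTIVE_TOOLS = {"delete_file", "transfer_funds", "publish_post", "deploy"}
EXTERNAL_COMMS_TOOLS = {"send_email", "post_to_slack", "reply_to_customer"}
READ_ONLY_TOOLS = {"search", "read_file", "read_logs"}


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class AgentSession:
    # Names of tools this agent has already called in the current session.
    seen_tools: set = field(default_factory=set)


def needs_gate(call: ToolCall, session: AgentSession) -> bool:
    """Return True if the call matches any of the three gating triggers."""
    if call.name in DESTRUCTIVE_TOOLS:
        return True  # destructive or irreversible side effects
    if call.name in EXTERNAL_COMMS_TOOLS:
        return True  # high-stakes external communication
    # Novelty: a state-changing tool this session has not used before.
    return call.name not in READ_ONLY_TOOLS and call.name not in session.seen_tools
```

Exempting read-only tools from the novelty check keeps routine, reversible calls ungated, in line with the bar above.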

The shape of a gate

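def execute_with_gate(call, agent, gate_provider):
    # needs_gate and TOOL_REGISTRY are assumed to be defined elsewhere.
    if needs_gate(call, agent):
        decision = gate_provider.ask(call)  # blocks until a human answers
        if not decision.approved:
            # Return the denial as a tool result so the agent sees why and can re-plan.
            return {
                "status": "denied",
                "reason": "human declined approval",
                "feedback": decision.feedback,
            }
    # Either no gate was needed or the human approved: run the tool.
    return TOOL_REGISTRY[call.name](**call.args)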

gate_provider is the integration with whatever channel your humans use to approve: a Slack message with buttons, a web UI, a REST endpoint, or a CLI prompt for dev work. The agent loop does not care which.
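For dev work, a gate provider can be as small as a terminal prompt. The sketch below assumes a hypothetical `ask(call) -> Decision` interface matching how `execute_with_gate` uses it; `Decision`, `ToolCall`, and `CliGateProvider` are illustrative names, and `prompt` defaults to Python's built-in `input` but can be swapped for a stub in tests.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    approved: bool
    feedback: Optional[str] = None


@dataclass
class ToolCall:
    name: str
    args: dict


class CliGateProvider:
    """Dev-time gate that asks on the terminal. `prompt` is injectable for tests."""

    def __init__(self, prompt: Callable[[str], str] = input):
        self.prompt = prompt

    def ask(self, call: ToolCall) -> Decision:
        args = ", ".join(f"{k}={v!r}" for k, v in call.args.items())
        answer = self.prompt(f"Approve {call.name}({args})? [y/N] ").strip().lower()
        if answer in ("y", "yes"):
            return Decision(approved=True)
        # Anything else is a denial; collect an optional reason for the agent.
        reason = self.prompt("Reason for denial (optional): ").strip()
        return Decision(approved=False, feedback=reason or None)
```

A Slack or web provider would implement the same `ask` method, which is what keeps the loop indifferent to the channel.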

Synchronous vs asynchronous gates

Synchronous

The loop blocks until the human responds. Simple to implement: the orchestrator transitions to WAITING, polls or waits on a callback, and transitions back when the answer arrives.

Works well for short tasks where the human is expected to be available within seconds. Breaks if the human takes minutes or hours: your loop is sitting on resources doing nothing.
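A synchronous gate needs somewhere to block and a way to give up. This sketch uses `threading.Event` from the standard library: a callback thread calls `respond`, and the loop thread calls `wait` with a timeout. The `SyncGate` class and the choice to treat a timeout as a denial (failing closed) are assumptions for illustration.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    approved: bool
    feedback: Optional[str] = None


class SyncGate:
    """Blocks the calling thread until a human responds or the timeout expires."""

    def __init__(self):
        self._event = threading.Event()
        self._decision: Optional[Decision] = None

    def respond(self, decision: Decision) -> None:
        # Called from the callback path, e.g. a webhook handler thread.
        self._decision = decision
        self._event.set()

    def wait(self, timeout_s: float) -> Decision:
        # Event.wait returns False if the timeout elapsed before respond() ran.
        if not self._event.wait(timeout=timeout_s):
            return Decision(approved=False, feedback="no response before timeout")
        return self._decision
```

The timeout bounds how long a worker can sit idle, which is exactly the weakness of synchronous gates when humans are slow.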

Asynchronous

The loop persists state when it hits a gate (using the checkpointing from the previous module), exits, and waits for an external trigger to resume. The human responds at their leisure; an event resumes the orchestrator.

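def execute_with_async_gate(call, loop, store):
    if needs_gate(call, loop.agent):
        loop.pending_gate = call
        loop.transition(State.WAITING)
        store.save(loop.snapshot())  # checkpoint so any worker can resume later
        return None  # caller's job is over; resume happens later
    return TOOL_REGISTRY[call.name](**call.args)


def on_gate_response(loop_id, decision, store):
    # Rehydrate the paused loop from its checkpoint, possibly in another process.
    loop = Loop.from_snapshot(store.load(loop_id))
    call, loop.pending_gate = loop.pending_gate, None
    if decision.approved:
        loop.transition(State.EXECUTING)
        loop.results.append(TOOL_REGISTRY[call.name](**call.args))
    else:
        # Denial maps to CANCELLED, matching the WAITING -> CANCELLED transition.
        loop.results.append({"status": "denied", "feedback": decision.feedback})
        loop.transition(State.CANCELLED)
    drive(loop, store)  # assumed to stop immediately on a terminal state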

This is the right pattern for production agents that need human approval but can't afford to block a worker process for an hour.
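To see the persist, exit, and resume cycle end to end without the course's `Loop`, `State`, and `store` objects, here is a self-contained miniature. It assumes a plain dict as the checkpoint store and a JSON-serializable loop state; `STORE`, `TOOL_REGISTRY`, `pause_for_approval`, and `resume` are hypothetical stand-ins.

```python
import json
import uuid

# Hypothetical in-memory checkpoint store; production would use a database.
STORE = {}

# Hypothetical tool registry with a single fake tool.
TOOL_REGISTRY = {"deploy": lambda env, sha: f"deployed {sha} to {env}"}


def pause_for_approval(state: dict, call: dict) -> str:
    """Checkpoint the loop with a pending gate and return its id."""
    loop_id = str(uuid.uuid4())
    state = {**state, "status": "WAITING", "pending_gate": call}
    STORE[loop_id] = json.dumps(state)
    return loop_id  # the worker can exit now


def resume(loop_id: str, approved: bool) -> dict:
    """Handle the human's answer, possibly hours later, from the checkpoint alone."""
    state = json.loads(STORE[loop_id])
    call = state.pop("pending_gate")
    if approved:
        state["results"].append(TOOL_REGISTRY[call["name"]](**call["args"]))
        state["status"] = "EXECUTING"
    else:
        state["status"] = "CANCELLED"
    STORE[loop_id] = json.dumps(state)
    return state
```

Because the state round-trips through JSON, `resume` could run in a different process from `pause_for_approval`, which is the point of the async pattern.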

What to show the human

The information you surface determines how good the decision is. Bad approval prompts produce rubber-stamped approvals.

A good approval payload includes:

What the agent wants to do (tool name + args, in human-readable form).
Why (the agent's stated reason, taken from its reflection).
Context (what step of the plan; what has happened so far).
Risk class (destructive? irreversible? customer-facing?).
Alternatives the agent considered, if any.

A reviewer who can scan a clear summary in 5 seconds makes a better decision than one who has to read a 1,000-token transcript.

APPROVAL REQUESTED
Agent:      ops-agent
Action:     deploy(env="production", sha="a3f1")
Reason:     rolls production back to commit a3f1, the last known-good build, after this morning's crash
Risk:       production deploy (irreversible)
Plan step:  3 of 4
[Approve] [Deny]

If the human can't make a decision from this, the approval prompt is missing information.
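The payload above can be generated from a structured request rather than assembled by hand, which makes it harder to forget a field. A minimal sketch; the `ApprovalRequest` dataclass and its field names are hypothetical, chosen to mirror the list of what to show.

```python
from dataclasses import dataclass, field


@dataclass
class ApprovalRequest:
    agent: str
    action: str   # human-readable tool call
    reason: str   # the agent's stated reason
    risk: str     # risk class
    step: int
    total_steps: int
    alternatives: list = field(default_factory=list)

    def render(self) -> str:
        rows = [
            ("Agent", self.agent),
            ("Action", self.action),
            ("Reason", self.reason),
            ("Risk", self.risk),
            ("Plan step", f"{self.step} of {self.total_steps}"),
        ]
        if self.alternatives:
            rows.append(("Alternatives", "; ".join(self.alternatives)))
        # Pad labels into one column so the card scans quickly.
        width = max(len(label) for label, _ in rows) + 3
        lines = ["APPROVAL REQUESTED"]
        lines += [f"{label + ':':<{width}}{value}" for label, value in rows]
        lines.append("[Approve] [Deny]")
        return "\n".join(lines)
```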

Approval as a teaching signal

Approvals are also a structured way to learn what your agent should and shouldn't do. Track approve/deny rates per tool, per agent, and per call pattern. If an action is approved 99% of the time, it probably does not need a gate. If it is denied frequently, either the agent is proposing it too eagerly or the policy is not encoded correctly.

Over time you can use approval data to refine policy: ungate calls that are always approved, and narrow scopes for calls that are often denied with similar feedback.
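A sketch of that bookkeeping follows. The `ApprovalStats` class, the minimum sample size, and the 99% and 50% thresholds are illustrative assumptions, not recommended values; key on call pattern as well as tool if your data supports it.

```python
from collections import defaultdict


class ApprovalStats:
    """Counts approvals and denials per (agent, tool) and suggests policy changes."""

    def __init__(self, min_samples: int = 50):
        self.min_samples = min_samples  # avoid judging on a handful of calls
        self.counts = defaultdict(lambda: {"approved": 0, "denied": 0})

    def record(self, agent: str, tool: str, approved: bool) -> None:
        self.counts[(agent, tool)]["approved" if approved else "denied"] += 1

    def suggestion(self, agent: str, tool: str) -> str:
        c = self.counts.get((agent, tool), {"approved": 0, "denied": 0})
        total = c["approved"] + c["denied"]
        if total < self.min_samples:
            return "keep gate (not enough data)"
        approval_rate = c["approved"] / total
        if approval_rate >= 0.99:
            return "consider ungating"
        if approval_rate <= 0.5:
            return "review policy or tighten scopes"
        return "keep gate"
```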

Failure modes

"Just hit approve"

If humans rubber-stamp approvals because they trust the agent, gates become theater. Two fixes: gate fewer things (so each gate matters), and require typed input for high-stakes approvals ("type APPROVE to confirm production deploy"). The friction is the point.
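Typed confirmation for high-stakes actions takes only a few lines. In this sketch `ask_approval` and its `high_stakes` flag are hypothetical; `prompt` is any function that shows text and returns the reviewer's input (for example `input`).

```python
from typing import Callable

Prompt = Callable[[str], str]


def ask_approval(prompt: Prompt, action: str, high_stakes: bool,
                 phrase: str = "APPROVE") -> bool:
    """Return True only if the reviewer explicitly approves the action."""
    if high_stakes:
        # Friction on purpose: a single keypress is easy to rubber-stamp.
        typed = prompt(f"Type {phrase} to confirm {action}: ")
        return typed.strip() == phrase  # exact, case-sensitive match
    answer = prompt(f"Approve {action}? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```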

Stale approvals

A human approves a deploy at 2pm; the loop resumes at 4pm. The world may have changed in those two hours. For sensitive actions, attach a TTL to the approval ("approval valid for 5 minutes"). After expiry, re-prompt.
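A TTL check can sit right before execution. This sketch assumes a hypothetical `Approval` record stamped with wall-clock time (`time.time()`), since an async resume may run in a different process where a monotonic clock reading would not be comparable; the 300-second default matches the 5-minute example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Approval:
    approved: bool
    granted_at: float     # wall-clock seconds (time.time()) when the human approved
    ttl_s: float = 300.0  # 5 minutes

    def is_valid(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return self.approved and (now - self.granted_at) <= self.ttl_s


def run_if_fresh(approval: Approval, execute: Callable[[], object],
                 reprompt: Callable[[], object]):
    """Execute only while the approval is fresh; otherwise ask the human again."""
    if approval.is_valid():
        return execute()
    return reprompt()
```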

Approval bypass

An admin approval that bypasses gates "just for now" tends to become a permanent escape hatch. Put gates on bypasses too: any approval-bypass action should itself require multi-party sign-off.
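Multi-party sign-off for bypasses can be enforced with a small record that counts distinct approvers. The `BypassRequest` class and the two-approver default are hypothetical policy choices for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BypassRequest:
    requested_by: str
    required_signoffs: int = 2  # hypothetical policy: two distinct approvers
    approvers: set = field(default_factory=set)

    def sign(self, approver: str) -> None:
        if approver == self.requested_by:
            raise ValueError("requester cannot approve their own bypass")
        # A set, so signing twice does not count twice.
        self.approvers.add(approver)

    @property
    def granted(self) -> bool:
        return len(self.approvers) >= self.required_signoffs
```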

Approval is not a substitute for scopes

Some teams treat approval gates as a way to skip writing strong scopes: "we'll just have a human check everything." That doesn't scale and produces approval fatigue. Use scopes to prevent obviously wrong calls from ever being proposed; use approvals only for the smaller set where the call is plausibly correct but the cost of being wrong is high.
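The division of labor can be made explicit by checking scopes before gates, so out-of-scope calls are rejected without ever reaching a human. A minimal sketch with hypothetical `SCOPES` and `GATED` tables and a hypothetical `route` function:

```python
from enum import Enum


class Verdict(Enum):
    REJECT = "reject"  # out of scope: never reaches a human
    GATE = "gate"      # in scope but high-cost: ask a human
    RUN = "run"        # in scope and routine: execute directly


# Hypothetical per-agent scopes and the subset of tools that need approval.
SCOPES = {"ops-agent": {"read_logs", "restart_service", "deploy"}}
GATED = {"deploy"}


def route(agent: str, tool: str) -> Verdict:
    if tool not in SCOPES.get(agent, set()):
        return Verdict.REJECT
    if tool in GATED:
        return Verdict.GATE
    return Verdict.RUN
```

Only the `GATE` branch costs human attention, which keeps the approval queue small enough that each request gets real scrutiny.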

Approval as a state-machine transition

Tying it back to Module 4: an approval is EXECUTING -> WAITING -> EXECUTING (approved) or CANCELLED (denied). With a state machine and checkpoints, this works for arbitrarily long human delays. Without them, it works only for synchronous prompts in short sessions.
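A transition table makes the approval path explicit and rejects anything else. This is a minimal standalone sketch, not the Module 4 implementation: the `State` members shown (including a `DONE` state) and the `ALLOWED` table are assumptions limited to the transitions this lesson discusses.

```python
from enum import Enum, auto


class State(Enum):
    EXECUTING = auto()
    WAITING = auto()
    CANCELLED = auto()
    DONE = auto()


# Only the transitions relevant to approval gates; a full loop has more.
ALLOWED = {
    State.EXECUTING: {State.WAITING, State.DONE},
    State.WAITING: {State.EXECUTING, State.CANCELLED},
    State.CANCELLED: set(),
    State.DONE: set(),
}


class Loop:
    def __init__(self):
        self.state = State.EXECUTING

    def transition(self, new: State) -> None:
        if new not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new.name}")
        self.state = new
```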

This is also why we built the state machine before talking about safety. Safety controls compose with the loop's transitions; without explicit transitions, safety controls live in ad-hoc places and slowly diverge from the rest of the loop.

Key takeaway

Approval gates pause the loop on high-cost actions, surface a clear payload to a human, and resume only after explicit approval. Use them sparingly: scopes prevent the obviously wrong calls; approvals belong to the gray area where the agent's choice is plausible but the consequences justify a human check. Async gates with checkpointing make this work in production. The next lesson covers the layer below scopes and gates: input/output guardrails that sanitize what flows in and out of tools.
