Agent safety and control
Human-in-the-loop approval gates
Confirm before executing destructive tools.
When the right answer is "ask first"
Allow-lists and scopes are policies that decide automatically. Some calls deserve a human in the path. The right model for those is an approval gate: the loop pauses, surfaces the proposed action, and only proceeds when a human says yes.
This is the WAITING state from Module 4 lesson 2 doing real work. With a state machine, approval gates are a clean transition: EXECUTING -> WAITING -> EXECUTING (or CANCELLED). Without one, they're hacks bolted onto the loop body.
When to gate
Three triggers that almost always deserve gates:
Destructive or irreversible side effects
Anything that deletes, sends money, publishes content, or touches production. The cost of a wrong call is high; the cost of a 30-second human pause is low.
High-stakes external communication
Posting to a customer-facing channel, replying to a real email, sending a public Slack message. These are visible mistakes that don't go away.
Actions outside the agent's normal pattern
If the agent is about to do something it has not done in this session before, that is a signal worth pausing on. Novelty often indicates either a real new requirement or a hallucination.
You don't have to gate everything risky. The bar is "the cost of getting this wrong exceeds the cost of asking." Routine tool calls that are easily reversible should not require a gate.
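The three triggers can be folded into a single needs_gate predicate. The sketch below assumes each tool carries hypothetical risk tags (TOOL_RISK) and that the agent context tracks which tools it has already run this session (AgentContext.seen_tools). It approximates "outside the normal pattern" as either an unknown tool or the first use of a mutating one. Treat the tag names and the novelty rule as placeholders for your own policy.

```python
from dataclasses import dataclass, field

# Hypothetical risk tags per tool. An empty set means routine and easily
# reversible, so the call never needs a gate.
TOOL_RISK: dict[str, set[str]] = {
    "search": set(),
    "read_file": set(),
    "write_file": {"mutating"},
    "delete_file": {"destructive"},
    "deploy": {"destructive"},
    "send_email": {"external"},
}


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class AgentContext:
    # Tools this agent has already run during the current session.
    seen_tools: set[str] = field(default_factory=set)


def needs_gate(call: ToolCall, agent: AgentContext) -> bool:
    tags = TOOL_RISK.get(call.name)
    if tags is None:
        return True  # unknown tool: outside the normal pattern by definition
    if tags & {"destructive", "external"}:
        return True  # irreversible side effects or visible external communication
    if not tags:
        return False  # routine and reversible: asking costs more than being wrong
    # Novelty: a mutating tool the agent has not used yet this session.
    return call.name not in agent.seen_tools
```

Keeping the rule in one predicate means the loop only ever asks "gate or not?", and the policy can change without touching the loop.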
The shape of a gate
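```python
def execute_with_gate(call, agent, gate_provider):
    # needs_gate is the policy check; TOOL_REGISTRY maps tool names to
    # callables. Both are defined elsewhere in the agent.
    if needs_gate(call, agent):
        # Ask a human and wait for the answer.
        decision = gate_provider.ask(call)
        if not decision.approved:
            # Return the denial as a tool result so the agent sees the
            # human's feedback and can re-plan instead of crashing.
            return {
                "status": "denied",
                "reason": "human declined approval",
                "feedback": decision.feedback,
            }
    # Either no gate was needed or the human approved: run the tool.
    return TOOL_REGISTRY[call.name](**call.args)
```

gate_provider is the integration with however your humans provide approval: a Slack message with buttons, a web UI, a REST endpoint, or a CLI prompt for dev work. The agent loop does not care which.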
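For dev work, a gate_provider can be as small as a terminal prompt. This is a minimal sketch that assumes only the interface execute_with_gate relies on: ask(call) returns an object with approved and feedback. The class names (Decision, CliGateProvider, ToolCall) are hypothetical, and input_fn is injectable so the same provider can be driven from tests.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class Decision:
    approved: bool
    feedback: str = ""


class CliGateProvider:
    """Dev-time gate: shows the proposed call and reads y/n from the terminal."""

    def __init__(self, input_fn: Callable[[str], str] = input):
        # Defaults to the built-in input(); tests can pass scripted answers.
        self._input = input_fn

    def ask(self, call: ToolCall) -> Decision:
        answer = self._input(f"Approve {call.name}({call.args})? [y/N] ")
        if answer.strip().lower() in ("y", "yes"):
            return Decision(approved=True)
        # Anything else is a denial; collect feedback for the agent.
        feedback = self._input("Reason for denying (optional): ").strip()
        return Decision(approved=False, feedback=feedback)
```

Swapping this for a Slack or web provider only changes ask(); execute_with_gate stays the same.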
Synchronous vs asynchronous gates
Synchronous
The loop blocks until the human responds. Simple to implement: the orchestrator transitions to WAITING, polls or waits on a callback, transitions back when the answer arrives.
Works well for short tasks where the human is expected to be available within seconds. Breaks if the human takes minutes or hours: your loop is sitting on resources doing nothing.
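One way to implement the "waits on a callback" part is a threading.Event that the approval callback sets. This is a sketch under assumptions: SyncGate and Decision are hypothetical names, and treating a timeout as a denial is one policy choice among several (you could also escalate, or fall back to an asynchronous gate).

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    approved: bool
    feedback: str = ""


class SyncGate:
    """Blocks the calling loop until a human responds or the timeout expires."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self._event = threading.Event()
        self._decision: Optional[Decision] = None

    def respond(self, decision: Decision) -> None:
        # Called from the approval callback, e.g. a webhook handler thread.
        self._decision = decision
        self._event.set()

    def wait(self) -> Decision:
        # The loop thread sits here in WAITING, holding its worker the whole time.
        if not self._event.wait(self.timeout_s):
            return Decision(approved=False, feedback="no response before timeout")
        return self._decision
```

The cost described above is visible in wait(): the thread does nothing useful until respond() is called or the timeout fires.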
Asynchronous
The loop persists state when it hits a gate (using the checkpointing from the previous module), exits, and waits for an external trigger to resume. The human responds at their leisure; an event resumes the orchestrator.
```python
def execute_with_async_gate(call, loop, store):
    if needs_gate(call, loop.agent):
        # Record what we are waiting on, persist, and release the worker.
        loop.pending_gate = call
        loop.transition(State.WAITING)
        store.save(loop.snapshot())
        return None  # caller's job is over; resume happens later
    return TOOL_REGISTRY[call.name](**call.args)


def on_gate_response(loop_id, decision, store):
    # Invoked by the external trigger (e.g. the approval webhook), possibly
    # hours later and in a different process.
    snapshot = store.load(loop_id)
    loop = Loop.from_snapshot(snapshot)
    call, loop.pending_gate = loop.pending_gate, None
    if decision.approved:
        loop.transition(State.EXECUTING)
        result = TOOL_REGISTRY[call.name](**call.args)
        loop.results.append(result)
    else:
        # Denied: WAITING -> CANCELLED, matching the state machine.
        loop.transition(State.CANCELLED)
    # Continue the loop from whatever state it is now in.
    drive(loop, store)
```

This is the right pattern for production agents that need human approval but can't afford to block a worker process for an hour.
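To see the persist, exit, and resume cycle end to end, here is a self-contained in-memory version. It is a sketch, not a production store: the dict-backed MemoryStore, the JSON snapshot format, and the single-tool TOOL_REGISTRY are assumptions for illustration, and state changes are plain attribute assignments rather than a full transition method.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Optional


class State(Enum):
    EXECUTING = "EXECUTING"
    WAITING = "WAITING"
    CANCELLED = "CANCELLED"


# Hypothetical tool registry and gating rule for the demo.
TOOL_REGISTRY = {"deploy": lambda env, sha: f"deployed {sha} to {env}"}


def needs_gate(call: dict) -> bool:
    return call["name"] == "deploy"


@dataclass
class Loop:
    loop_id: str
    state: State = State.EXECUTING
    pending_gate: Optional[dict] = None
    results: list = field(default_factory=list)

    def snapshot(self) -> str:
        data = asdict(self)
        data["state"] = self.state.value  # enums are not JSON-serializable
        return json.dumps(data)

    @classmethod
    def from_snapshot(cls, raw: str) -> "Loop":
        data = json.loads(raw)
        data["state"] = State(data["state"])
        return cls(**data)


class MemoryStore:
    """Stand-in for a database: loop_id -> serialized snapshot."""

    def __init__(self):
        self._data: dict = {}

    def save(self, loop_id: str, raw: str) -> None:
        self._data[loop_id] = raw

    def load(self, loop_id: str) -> str:
        return self._data[loop_id]


def execute_with_async_gate(call: dict, loop: Loop, store: MemoryStore):
    if needs_gate(call):
        loop.pending_gate = call
        loop.state = State.WAITING
        store.save(loop.loop_id, loop.snapshot())
        return None  # the worker is free; nothing stays in memory
    return TOOL_REGISTRY[call["name"]](**call["args"])


def on_gate_response(loop_id: str, approved: bool, store: MemoryStore) -> Loop:
    loop = Loop.from_snapshot(store.load(loop_id))
    call, loop.pending_gate = loop.pending_gate, None
    if approved:
        loop.state = State.EXECUTING
        loop.results.append(TOOL_REGISTRY[call["name"]](**call["args"]))
    else:
        loop.state = State.CANCELLED
    store.save(loop_id, loop.snapshot())
    return loop
```

In production, MemoryStore would be a database or durable queue, and on_gate_response would be called by whatever receives the human's answer.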
What to show the human
The information you surface determines how good the decision is. Bad approval prompts produce rubber-stamped approvals.
A good approval payload includes:
- What the agent wants to do (tool name + args, in human-readable form).
- Why (the agent's stated reason, taken from its reflection).
- Context (what step of the plan; what has happened so far).
- Risk class (destructive? irreversible? customer-facing?).
- Alternatives the agent considered, if any.
A summary the approver can scan in 5 seconds is more useful than a 1,000-token transcript they have to read.
```text
APPROVAL REQUESTED
Agent: ops-agent
Action: deploy(env="production", sha="a3f1")
Reason: fixes this morning's staging crash by rolling back to commit a3f1
Risk: production deploy (irreversible)
Plan step: 3 of 4
[Approve] [Deny]
```

If the human can't make a decision from this, the approval prompt is missing information.
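Building the payload from structured fields keeps every prompt in the same scannable shape. A minimal sketch; ApprovalRequest and format_call are hypothetical helpers, and the optional alternatives line covers the last item in the list of payload contents.

```python
from dataclasses import dataclass


def format_call(name: str, args: dict) -> str:
    # Renders a tool call the way a human reads it: deploy(env="production", ...)
    rendered = ", ".join(
        f'{k}="{v}"' if isinstance(v, str) else f"{k}={v}" for k, v in args.items()
    )
    return f"{name}({rendered})"


@dataclass
class ApprovalRequest:
    agent: str
    action: str  # human-readable tool call, e.g. from format_call
    reason: str  # the agent's own stated reason
    risk: str  # risk class, e.g. "production deploy (irreversible)"
    step: int
    total_steps: int
    alternatives: tuple[str, ...] = ()

    def render(self) -> str:
        lines = [
            "APPROVAL REQUESTED",
            f"Agent: {self.agent}",
            f"Action: {self.action}",
            f"Reason: {self.reason}",
            f"Risk: {self.risk}",
            f"Plan step: {self.step} of {self.total_steps}",
        ]
        if self.alternatives:
            lines.append("Alternatives considered: " + "; ".join(self.alternatives))
        lines.append("[Approve] [Deny]")
        return "\n".join(lines)
```

Because the fields are required, a gate cannot be raised without a reason and a risk class, which is the cheapest defense against uninformative prompts.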
Approval as a teaching signal
Approvals are also a structured way to learn what your agent should and shouldn't do. Track approve/deny rates per tool, per agent, and per call pattern. If an action is approved 99% of the time, it probably does not need a gate. If it is denied frequently, either the agent is proposing it too eagerly or the policy is not encoded correctly.
Over time you can use approval data to tighten policy: ungate calls that are always approved, and tighten the scopes for calls that are often denied with similar feedback, so the agent stops proposing them.
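Tracking these rates needs little more than a counter keyed by agent and tool. In this sketch the 99% ungate threshold comes from the text above, while the 50-sample minimum and the 50% review threshold are arbitrary assumptions to tune. ApprovalStats is a hypothetical name, and it does not attempt per-call-pattern grouping.

```python
from collections import defaultdict


class ApprovalStats:
    def __init__(self):
        # (agent, tool) -> [approved_count, denied_count]
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, agent: str, tool: str, approved: bool) -> None:
        self._counts[(agent, tool)][0 if approved else 1] += 1

    def suggestions(self, min_samples=50, ungate_at=0.99, review_below=0.5):
        out = []
        for (agent, tool), (ok, denied) in sorted(self._counts.items()):
            total = ok + denied
            if total < min_samples:
                continue  # too little data to change policy
            rate = ok / total
            if rate >= ungate_at:
                out.append((agent, tool, "consider ungating"))
            elif rate < review_below:
                out.append((agent, tool, "tighten scope or policy"))
        return out
```

The output is a list of suggestions rather than automatic policy changes, so a human still decides when a gate is removed.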
Failure modes
"Just hit approve"
If humans rubber-stamp approvals because they trust the agent, gates become theater. Two fixes: gate fewer things (so each gate matters), and require typed input for high-stakes approvals ("type APPROVE to confirm production deploy"). The friction is the point.
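The typed-input rule is easiest to trust when it is enforced on the server side, so no client can skip it. A minimal sketch, assuming a hypothetical set of high-stakes risk classes: routine approvals accept a button click, while high-stakes ones also require the exact phrase.

```python
# Hypothetical risk classes that demand typed confirmation.
HIGH_STAKES = {"production deploy", "funds transfer", "data deletion"}
CONFIRM_PHRASE = "APPROVE"


def approval_accepted(risk_class: str, clicked_approve: bool, typed: str = "") -> bool:
    if not clicked_approve:
        return False
    if risk_class in HIGH_STAKES:
        # Exact, case-sensitive match: "approve", "y", or an empty field all fail.
        return typed.strip() == CONFIRM_PHRASE
    # Routine approvals: the click is enough.
    return True
```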
Stale approvals
A human approves a deploy at 2pm; the loop resumes at 4pm. The world may have changed in those two hours. For sensitive actions, attach a TTL to the approval ("approval valid for 5 minutes"). After expiry, re-prompt.
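A TTL only helps if it is checked when the call is about to execute, not when the approval arrives. A sketch under assumptions: Approval and resume_gated_call are hypothetical, run_tool and reprompt are callbacks you supply, and wall-clock time.time() is used because an async gate may check the approval in a different process from the one that recorded it (monotonic clocks are not comparable across processes).

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Approval:
    approved: bool
    granted_at: float  # Unix timestamp from time.time()
    ttl_s: Optional[float] = None  # None means no expiry (non-sensitive actions)

    def is_valid(self, now: Optional[float] = None) -> bool:
        if not self.approved:
            return False
        if self.ttl_s is None:
            return True
        now = time.time() if now is None else now
        return now - self.granted_at <= self.ttl_s


def resume_gated_call(approval: Approval, run_tool: Callable, reprompt: Callable):
    # Check at execution time: the gap between approval and resume is the risk.
    if approval.is_valid():
        return run_tool()
    return reprompt()
```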
Approval bypass
An admin approval that bypasses gates "just for now" tends to become a permanent escape hatch. Put gates on bypasses too: any approval-bypass action should itself require multi-party sign-off.
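Multi-party sign-off can be modeled as a bypass that stays inactive until enough distinct people, other than the requester, have signed. A minimal sketch; BypassRequest and the default of two sign-offs are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class BypassRequest:
    requested_by: str
    required_signoffs: int = 2
    signoffs: set[str] = field(default_factory=set)

    def sign(self, approver: str) -> None:
        if approver == self.requested_by:
            raise PermissionError("requester cannot sign off on their own bypass")
        self.signoffs.add(approver)  # a set, so repeat signatures do not count twice

    @property
    def active(self) -> bool:
        return len(self.signoffs) >= self.required_signoffs
```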
Approval is not a substitute for scopes
Some teams treat approval gates as a way to skip writing strong scopes: "we'll just have a human check everything." That doesn't scale and produces approval fatigue. Use scopes to prevent obviously-wrong calls from ever being proposed; use approvals only for the smaller set where the call is plausibly correct but the cost of being wrong is high.
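The layering can be made explicit as a three-way classification that runs before any human is involved: scope violations are rejected outright, in-scope high-cost calls are gated, and everything else runs. A sketch; the scope predicates in SCOPES, the GATED set, and Verdict are hypothetical examples.

```python
from enum import Enum


class Verdict(Enum):
    REJECT = "reject"  # scope violation: never reaches a human
    GATE = "gate"  # in scope but high cost: ask a human
    ALLOW = "allow"  # in scope and cheap: run it


# Hypothetical scopes: tool name -> predicate over the call's arguments.
SCOPES = {
    "deploy": lambda a: a.get("env") in {"staging", "production"},
    "read_file": lambda a: a.get("path", "").startswith("/srv/app/"),
}
GATED = {"deploy"}


def classify(name: str, args: dict, scopes=SCOPES, gated=GATED) -> Verdict:
    allowed = scopes.get(name)
    if allowed is None or not allowed(args):
        return Verdict.REJECT  # obviously wrong: don't waste a human's attention
    if name in gated:
        return Verdict.GATE  # plausible but costly if wrong
    return Verdict.ALLOW
```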
Approval as a state-machine transition
Tying it back to Module 4: an approval is EXECUTING -> WAITING -> EXECUTING (approved) or CANCELLED (denied). With a state machine and checkpoints, this works for arbitrarily long human delays. Without them, it works only for synchronous prompts in short sessions.
This is also why we built the state machine before talking about safety. Safety controls compose with the loop's transitions; without explicit transitions, safety controls live in ad-hoc places and slowly diverge from the rest of the loop.
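A small transition table is one way to make the approval edges explicit, so nothing else in the loop can, for example, jump from WAITING straight to a tool call. This sketch shows only the approval-related states; the state names mirror this lesson, but the table, IllegalTransition, and apply_gate_decision are assumptions rather than the Module 4 implementation.

```python
from enum import Enum


class State(Enum):
    EXECUTING = "EXECUTING"
    WAITING = "WAITING"
    CANCELLED = "CANCELLED"


# Only the approval-related edges; a full loop has more states and edges.
ALLOWED = {
    State.EXECUTING: {State.WAITING},
    State.WAITING: {State.EXECUTING, State.CANCELLED},
    State.CANCELLED: set(),
}


class IllegalTransition(Exception):
    pass


def transition(current: State, target: State) -> State:
    if target not in ALLOWED[current]:
        raise IllegalTransition(f"{current.name} -> {target.name}")
    return target


def apply_gate_decision(current: State, approved: bool) -> State:
    # Only a loop in WAITING can accept an approval decision.
    return transition(current, State.EXECUTING if approved else State.CANCELLED)
```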
Key takeaway
Approval gates pause the loop on high-cost actions, surface a clear payload to a human, and resume only after explicit approval. Use them sparingly: scopes prevent the obviously-wrong calls; approvals belong to the gray area where the agent's choice is plausible but the consequences justify a human check. Async gates with checkpointing make this work in production. The next lesson covers the layer below scopes and gates: input/output guardrails that sanitize what flows in and out of tools.