Metacognition
Self-reflection patterns
Agents that evaluate their own outputs.
Agents that grade their own homework
Modules 1 through 5 of Track 2 have been about what the agent does: which topology, which state, which safety controls. Metacognition is about how the agent thinks about its own work. Self-reflection is the simplest form: after producing an output, the agent examines it before committing.
This sounds expensive. It is, sometimes. It is also one of the cheapest ways to lift quality on hard tasks. Small reflection passes catch a meaningful fraction of mistakes without requiring a smarter model.
The basic shape
def reflect_then_answer(question, draft):
    critique = model.run(
        system="Critique the draft. List concrete problems. If none, say 'OK'.",
        user=f"Question: {question}\n\nDraft: {draft}",
    )
    if critique.strip() == "OK":
        return draft
    revised = model.run(
        system="Revise the draft based on the critique.",
        user=f"Question: {question}\n\nDraft: {draft}\n\nCritique: {critique}",
    )
    return revised

Three calls instead of one: draft, critique, revise. Revision only happens when the critique finds something. On easy problems, the critic says "OK" and you've paid for one extra call to skip the revision.
Reflection is not a separate agent (though you can structure it that way). It is the same model with a different prompt and a fresh context, looking at the prior output as data instead of as a continuation.
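To make the call count concrete, here is a minimal usage sketch, assuming the same illustrative model.run helper as above (the question and prompts are placeholders):

# Hypothetical usage: the draft pass is call one; reflection adds up to two more.
question = "Summarize the Q3 revenue trend from the attached report."
draft = model.run(
    system="Answer the user's question using the provided context.",
    user=question,
)
final = reflect_then_answer(question, draft)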
Why it works
Two things change between the draft pass and the reflection pass:
Context is shorter
The draft pass has to keep a lot in mind: the question, the relevant tool outputs, prior history. The reflection pass has only the draft and a focused critique prompt. With less to attend to, the model can spot specific problems it would have missed during generation.
Role is different
A draft prompt asks "produce an answer." A critique prompt asks "find what's wrong." Different tasks activate different reasoning patterns. Models generate text more enthusiastically than they self-criticize from inside the same thread.
These are real effects, not just intuitions. They show up in evals: a small reflection step on top of a moderate model often beats a single pass from a bigger model on tasks where verifiability matters.
When reflection helps a lot
Tasks where reflection is worth the extra calls:
- Multi-step reasoning where mistakes compound. A single bad inference midway through poisons the rest. Reflection catches it before it propagates.
- Code generation. Re-reading code with a "find the bug" prompt finds bugs the generator missed.
- Analysis tasks. "Is the conclusion supported by the evidence?" is a question the same model often answers well even if its first pass overclaimed.
- Plans the agent will execute. Critiquing a plan before executing is much cheaper than discovering it was flawed after running it.
When it doesn't help
Reflection is overkill or actively bad on:
- Trivial tasks. For "What's the capital of France?", reflection is just cost.
- Creative or open-ended tasks where there's no right answer. Reflection drives toward conservative outputs and squeezes out what made the original good.
- Speed-critical responses. Two-or-three-pass reflection adds latency you may not be able to afford.
The decision of whether to reflect should itself be cheap. A simple heuristic ("reflect if the draft used tools, skip it if it didn't") works for many systems.
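A sketch of such a gate, assuming the agent records whether tools were used and has a latency budget to consult (the thresholds and parameter names are illustrative, not a prescription):

def should_reflect(draft, used_tools, latency_budget_ms):
    # Reflect only when the task looks hard enough to justify the extra calls
    # and the latency budget allows it.
    if latency_budget_ms < 2000:
        return False            # speed-critical path: skip reflection
    if used_tools:
        return True             # tool outputs mean factual claims worth checking
    return len(draft) > 800     # long drafts are more likely to contain errors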
The critique prompt is the whole game
A bad critique prompt produces vague critiques that don't help. A good one is specific:
You are reviewing a draft answer to the user's question. Your job:
1. Check whether the draft directly answers the question. If not, note what's missing.
2. Check each factual claim. Are any unsupported by the tool outputs shown?
3. Check the reasoning. Are there logical gaps?
4. Check tone and length. Is the draft appropriate?
If you find no problems, respond with exactly: OK
Otherwise, respond with a short numbered list of concrete issues.

The critic is being given a job that fits an instruction-tuned model: check against criteria, output structured findings. Avoid asking the critic to "evaluate the quality" in vague terms; that produces vague answers.
Reflection vs revision
Reflection and revision are two different operations:
- Reflection identifies problems in the draft.
- Revision produces a fix.
Sometimes you can fold them: "find any problems, then fix them in one pass." This works for small fixes but tends to hide whether the model actually understood the issue. Splitting them gives you a record (what was wrong, what was fixed) that is useful for evaluation and for trust.
In agent loops, the same model typically handles both passes; for batch quality work, you can use a stronger model for the critique phase and a cheaper one for revision.
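A sketch of the split version that keeps a record of what was wrong and whether a fix was attempted. The critic_model / reviser_model split is an assumption matching the batch case above; in an agent loop both can be the same model:

def critique_and_revise(question, draft, critic_model, reviser_model):
    # Separate passes so the critique is recorded, not folded into one opaque call.
    critique = critic_model.run(
        system="Critique the draft. List concrete problems. If none, say 'OK'.",
        user=f"Question: {question}\n\nDraft: {draft}",
    )
    if critique.strip() == "OK":
        return draft, {"critique": "OK", "revised": False}
    revised = reviser_model.run(
        system="Revise the draft based on the critique.",
        user=f"Question: {question}\n\nDraft: {draft}\n\nCritique: {critique}",
    )
    # The record is what you keep for evaluation and trust: what was wrong, what changed.
    return revised, {"critique": critique, "revised": True}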
Reflection inside the orchestration loop
Tying it back to Module 4: reflection lives naturally in the REFLECTING state. After a worker returns, the supervisor's reflection prompt asks "is this answer good enough to commit?" If not, the supervisor either re-dispatches with a refined intent or replans entirely.
EXECUTING -> OBSERVING -> REFLECTING (critique) -> EXECUTING (retry with feedback)
                                                -> REFLECTING -> COMPLETE

This is the cleanest place to put it. It also means reflection is a system-level concept, not just a single-prompt trick: every step's output can be reflected on before being committed.
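A sketch of where that check might sit in a supervisor loop, assuming hypothetical dispatch and refine_intent helpers (neither is defined in this lesson):

def supervise(task, max_retries=2):
    # Reflect on each worker result before committing it.
    intent = task
    for attempt in range(max_retries + 1):
        result = dispatch(intent)                  # EXECUTING -> OBSERVING
        critique = model.run(                      # REFLECTING
            system="Is this result good enough to commit? If yes, say 'OK'. "
                   "Otherwise list concrete problems.",
            user=f"Task: {task}\n\nResult: {result}",
        )
        if critique.strip() == "OK":
            return result                          # COMPLETE
        intent = refine_intent(task, result, critique)  # retry with feedback
    return result                                  # commit best effort after retries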
Avoiding reflection thrash
A failure mode: reflection finds problems, revision tries to fix them, the new draft has different problems, reflection finds those, revision tries again. The loop never converges. Two guards:
Cap the rounds
After N rounds (commonly 2 or 3), commit the best draft so far and move on. Models that can't fix something in three tries usually can't fix it in ten either.
Track which problems were named
If the same critique appears on the second draft, the model isn't actually fixing the issue; it's just hand-waving. Stop early; either escalate (to a different model or a human) or commit and surface the limitation.
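A sketch combining both guards, reusing the illustrative model.run calls from the basic shape above (the exact-match check on repeated critiques is crude; a similarity check would be more robust):

def reflect_with_guards(question, draft, max_rounds=3):
    # Guard 1: cap the number of critique/revise rounds.
    # Guard 2: stop early if the critic repeats itself, a sign revision isn't landing.
    seen_critiques = set()
    for _ in range(max_rounds):
        critique = model.run(
            system="Critique the draft. List concrete problems. If none, say 'OK'.",
            user=f"Question: {question}\n\nDraft: {draft}",
        )
        if critique.strip() == "OK":
            return draft
        if critique.strip() in seen_critiques:
            break                       # same complaint twice: escalate or commit as-is
        seen_critiques.add(critique.strip())
        draft = model.run(
            system="Revise the draft based on the critique.",
            user=f"Question: {question}\n\nDraft: {draft}\n\nCritique: {critique}",
        )
    return draft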
Reflection is most useful where verification is cheap
The economic case for reflection is strongest when verifying is cheaper than generating. Code that has tests, math that has solvers, claims that have ground truth in tools. In those cases the critic can check against an authority and produce concrete findings. Where there is no cheap verification, reflection becomes opinion-on-opinion, which is less reliable.
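For the code case, the critic can be partly mechanical: check against the authority first and only fall back to a model-based critique once that passes. A sketch, assuming a run_tests helper that returns (passed, failures); both the helper and its signature are assumptions for illustration:

def critique_code(question, draft_code):
    # Cheap verification first: test failures are concrete, authoritative findings.
    passed, failures = run_tests(draft_code)
    if not passed:
        return "Tests failing:\n" + "\n".join(failures)
    # Only after the tests pass does the model-based critic weigh in.
    return model.run(
        system="Review the code. List concrete problems. If none, say 'OK'.",
        user=f"Task: {question}\n\nCode:\n{draft_code}",
    )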
Connection to the rest
Self-reflection on its own is a mild improvement. Combined with the next two lessons (strategy adaptation, confidence estimation), it becomes a real metacognitive layer: the agent doesn't just produce answers; it adjusts its approach based on how its previous outputs went.
Key takeaway
Self-reflection is a critique-and-revise pass on the agent's own draft. It catches a meaningful fraction of mistakes in multi-step reasoning, code, and plans. The critique prompt is the whole game; vague critics produce vague critiques. Cap the rounds to avoid thrash. Reflection lives naturally in the orchestration loop's REFLECTING state and pairs with the strategy-adaptation pattern in the next lesson.