Lesson 3 of 14 · Track 4

Reliability

Output validation and self-correction

Making agents check their own work.

Interactive exercise ~10 min

Trust but verify

The retries-and-fallbacks lesson handled tools that fail visibly. This lesson handles a sneakier problem: tools that succeed but produce wrong output, and models whose outputs pass surface checks but fail on substance.

The pattern is output validation: every model or tool output passes through a verification layer before it's accepted as truth. Failed validation triggers either a retry (with feedback), a self-correction step, or escalation. This sits one layer above retries and one below evaluation (next module).

What "wrong but successful" looks like

Three flavors:

Format wrong

The model was asked to return JSON; it returned prose with backticks. The schema doesn't match. The downstream code can't parse it.

Content wrong but plausible

The model returned valid JSON with all the right fields, but the values are made up. The customer ID exists in the response but doesn't exist in the database.

Constraint violated

The output passes schema validation but violates a domain constraint. The discount is 110%. The deadline is in the past. The agent recommends scheduling a meeting in a different time zone than the user lives in.
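For instance, a value can be schema-valid and still out of bounds. A minimal sketch of a domain-constraint check (the field and function names are hypothetical, not part of this lesson's exercise):

def check_discount(order):
    # Schema says discount_pct is an int; the domain says it must be 0-100
    if not 0 <= order.discount_pct <= 100:
        return f"discount {order.discount_pct}% violates the 0-100 constraint"
    return None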

These all "succeed" from a tool-call standpoint; it's your validation layer's job to catch them.

Schema validation: the floor

Every structured output should be validated against a schema. JSON Schema, Pydantic, dataclass, whatever your stack uses.

from pydantic import BaseModel, ValidationError
 
 
class Forecast(BaseModel):
    city: str
    temp_f: int
    condition: str
 
 
def validate_forecast(raw):
    # Returns (forecast, error): one of the two is always None
    try:
        return Forecast.model_validate(raw), None
    except ValidationError as e:
        return None, str(e)

If the model returned malformed output, the validator catches it before the rest of your code runs. This is the absolute minimum.
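For example, feeding the validator a bad field (a hypothetical payload, using the tuple-returning validator above):

forecast, error = validate_forecast({"city": "Austin", "temp_f": "hot", "condition": "sunny"})
# forecast is None; error explains that temp_f is not a valid integer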

Self-correction with feedback

Schema-failed output should not be silently rejected. The model should be told what was wrong and given a chance to fix it.

async def get_forecast_with_correction(question, max_tries=3):
    history = []
    for attempt in range(max_tries):
        raw = await model(question, history=history)
        forecast, error = validate_forecast(raw)
        if forecast is not None:
            return forecast
        history.extend([
            {"role": "assistant", "content": str(raw)},
            {"role": "user", "content": f"Your last response failed validation: {error}. Return valid JSON matching the Forecast schema."},
        ])
    raise RuntimeError("could not produce valid output after retries")

The model sees the failure and tries again. This is much more reliable than rejecting outright. Cap the rounds; if the model can't produce valid output after 2 or 3 tries, escalate.

Semantic validation: the next layer

Schema validation says "the shape is right." Semantic validation says "the contents make sense." Examples:

def semantic_validate(forecast):
    issues = []
    if forecast.temp_f < -100 or forecast.temp_f > 150:
        issues.append("temp_f outside plausible range")
    if forecast.condition not in {"sunny", "cloudy", "rain", "snow", "fog"}:
        issues.append(f"unknown condition {forecast.condition!r}")
    return issues

Same self-correction loop, different validator: the model revises when its values aren't plausible.
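Chaining the two validators in one loop is the natural next step. A minimal sketch reusing validate_forecast and semantic_validate from above (the function name and wiring are illustrative; `model` is the same assumed async helper):

async def get_checked_forecast(question, max_tries=3):
    history = []
    for _ in range(max_tries):
        raw = await model(question, history=history)
        forecast, error = validate_forecast(raw)
        # A schema failure yields one issue; otherwise run the semantic checks
        issues = [error] if error else semantic_validate(forecast)
        if not issues:
            return forecast
        history.extend([
            {"role": "assistant", "content": str(raw)},
            {"role": "user", "content": f"Validation failed: {'; '.join(issues)}. Fix and return corrected JSON."},
        ])
    raise RuntimeError("no valid forecast after retries")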

Cross-checking with sources

Some validations require checking against ground truth. If the agent claims to have created a PR, you can call back to GitHub and verify. If the agent claims a fact, you can check it against a tool result.

async def cross_check_pr_creation(claim):
    if claim.get("created_pr_number"):
        actual = await github.get_pr(claim["created_pr_number"])
        if not actual:
            return False, "claimed PR does not exist"
    return True, None

This is expensive (extra calls), so reserve it for high-stakes claims. It's worth it for any state-changing operation the agent claims to have performed.

Verifier models

For free-text outputs that don't have a schema, a verifier model can check whether the output meets criteria. The pattern: the producer model writes; the verifier model judges; if rejected, the producer revises.

async def with_verifier(question, criteria):
    for _ in range(3):
        answer = await producer(question)
        verdict = await verifier(answer, criteria)
        if verdict.approved:
            return answer
        question = f"{question}\n\nPrior attempt failed because: {verdict.feedback}. Try again."
    return answer  # commit best effort after retries

This is the same critique-and-revise pattern from Track 2 Module 6 lesson 1, used as a reliability mechanism instead of just a quality lift.
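One way to implement the verifier is LLM-as-judge with a structured verdict, validated like any other output. A sketch assuming the same async `model` helper as earlier (the Verdict fields mirror the verdict.approved / verdict.feedback usage above):

class Verdict(BaseModel):
    approved: bool
    feedback: str


async def verifier(answer, criteria):
    # The judge's own output is a structured output too: schema-validate it
    raw = await model(
        f"Criteria:\n{criteria}\n\nAnswer:\n{answer}\n\n"
        "Return JSON with fields 'approved' (bool) and 'feedback' (str)."
    )
    return Verdict.model_validate(raw)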

When validation fails repeatedly

If the model can't produce valid output after several attempts, you're in failure-handling territory:

  • Escalate to a human. The output is going to a user-facing surface; better to ask than to ship something wrong.
  • Escalate to a different model. Maybe a stronger model can do it.
  • Escalate to the user. Ask them to clarify; the original request might be impossible to satisfy as stated.
  • Commit best effort with a flag. The output is marked as low-confidence; downstream knows to handle gracefully.

Pick the policy in advance. A loop that just keeps retrying until the budget is exhausted is the worst outcome.
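One way to make that choice explicit in code rather than convention (the enum values and task names are illustrative):

from enum import Enum


class FailurePolicy(Enum):
    ESCALATE_HUMAN = "human"
    ESCALATE_MODEL = "stronger_model"
    ASK_USER = "clarify"
    BEST_EFFORT = "flag_low_confidence"


# Decided per task type at design time, not improvised mid-run
POLICY_BY_TASK = {
    "customer_reply": FailurePolicy.ESCALATE_HUMAN,
    "internal_summary": FailurePolicy.BEST_EFFORT,
}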

What about model-generated tool calls?

Tool calls are a kind of structured output. Most LLM APIs check the generated call against the tool's declared schema, but you should still validate at your executor:

  • Schema (matches the tool's declared input shape).
  • Scope (matches the agent's allow list and permission scopes from Track 2 Module 5).
  • Semantic (the args make sense for the current state).

If any layer fails, return a structured error to the model so it can correct.
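A sketch of those three layers at the executor; validate_args, allowed_tools, and check_preconditions are hypothetical helpers:

def validate_tool_call(call, agent, state):
    # Layer 1: schema -- args match the tool's declared input shape
    ok, err = validate_args(call.tool, call.args)
    if not ok:
        return {"error": f"schema: {err}"}
    # Layer 2: scope -- tool is on this agent's allow list
    if call.tool not in agent.allowed_tools:
        return {"error": f"scope: {call.tool!r} not permitted"}
    # Layer 3: semantic -- args make sense given current state
    err = check_preconditions(call, state)
    if err:
        return {"error": f"semantic: {err}"}
    return None  # valid; safe to execute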

A guardrail-shaped pattern

Putting it all together, the validation pipeline for any model output looks like:

raw model output
   |
   v
[schema validate] --fail--> self-correct loop --> [escalate after N rounds]
   |
  pass
   v
[semantic validate] --fail--> self-correct loop
   |
  pass
   v
[cross-check sources] (when applicable)
   |
  pass
   v
[output guardrails] (Track 2 Module 5 lesson 4)
   |
  pass
   v
commit / return to user

Each layer catches a specific class of error. Most outputs sail through; the failures get redirected to self-correction or escalation. The user sees only outputs that passed every layer.

Validation as eval data

Every validation failure is a free training signal. Log them with enough context to reconstruct what the model produced and why it was rejected. Over time you build a dataset of model failures that drives both prompt iteration and the eval set we'll build in Module 2.

# `log` is a standard structured logger; hash_obj (a helper you'd supply)
# fingerprints the raw output so failures can be correlated without
# storing the output verbatim.
log.info("validation_failure", extra={
    "stage": "semantic",
    "raw_output_hash": hash_obj(raw),
    "issues": issues,
    "task_type": task_type,
})

The next module turns these logs into evals. The point here is that the validator is feeding the eval system whether you've thought about it that way or not.

Validation is cheap insurance

A common pushback: "validation slows things down and costs more." That's true; it also catches the bugs that hurt most. The cost of a wrong answer reaching a user is usually orders of magnitude higher than the cost of one extra LLM call to validate. Frame validation as insurance with low premiums and high payouts; the math works out.

Key takeaway

Output validation catches the failures that successful tool calls hide: format errors, semantic violations, hallucinated content. Schema validation is the floor; semantic checks and source cross-checks are the next layers. Failed validation triggers self-correction with feedback to the model; persistent failure escalates. Validation logs become eval data. The next lesson handles the orthogonal problem: changing what the agent is allowed to do at runtime.
