Track 4

Production agents

Reliability, evaluation, observability, deployment, and scaling.

4 modules, 14 lessons, 10 slideshows, 9 exercises

Module 1

Reliability

Failure modes, retries, fallbacks, validation, and runtime policy.

4 lessons

Why agents fail: taxonomy of failure modes

Understanding the ways agents break.

Slides

Retry strategies and fallbacks

Graceful degradation for agent systems.

Slides Exercise

Output validation and self-correction

Making agents check their own work.

Exercise

Runtime policy enforcement

Dynamically enabling/disabling tools by context.

Slides Exercise

Module 2

Evaluation

Testing non-deterministic systems with evals, rubrics, and judge models.

3 lessons

Testing non-deterministic systems

The unique challenge of agent evaluation.

Slides

Building eval datasets

Golden answers, rubrics, and judge models.

Slides Exercise

Automated eval pipelines

LLM-as-judge and programmatic checks.

Exercise

Module 3

Observability

Structured logging, tracing, dashboards, and alerting.

3 lessons

Structured logging for agent systems

What to log and how to structure it.

Slides

Tracing multi-agent requests

Following a request through multiple agents.

Slides Exercise

Dashboards and alerting

What to monitor, what to page on.

Exercise

Module 4

Deployment and scaling

Containerization, queues, cost management, and human-in-the-loop.

4 lessons

Containerizing agent systems

Docker for agent deployments.

Slides

Queue-based architectures

Async agent workloads with job queues.

Slides Exercise

Cost management and rate limiting

Keeping production agent costs under control.

Human-in-the-loop patterns

When and how to involve humans in agent workflows.

Slides Exercise
