Modeling the orchestration loop as a state machine

PLANNING, EXECUTING, OBSERVING, REFLECTING, WAITING, COMPLETE, FAILED.

The orchestration loop as a state machine

So far we have described the orchestration loop in plain prose: plan, dispatch, observe, maybe replan, eventually finish. Once you start writing real code, that prose translates into a state machine: a small set of named states, transitions between them, and a current-state cursor that drives behavior.

The reason to formalize this: once your loop has a state machine, every interesting feature (resumability, observability, error recovery, human-in-the-loop) becomes a transition you can hook into. Without it, those features are scattered across the loop body.

The states

A reasonable default for a supervisor/worker loop:

PLANNING       agent is producing or revising the plan
EXECUTING      a worker is running an inner loop
OBSERVING      a worker has returned; the supervisor is reading the result
REFLECTING     the supervisor is deciding whether to replan, continue, or stop
WAITING        the loop is paused (e.g. waiting for human approval, async result)
COMPLETE       synthesis is done; final answer is ready
FAILED         an unrecoverable error occurred

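Each state has a clear set of allowed next states. PLANNING goes to EXECUTING. EXECUTING goes to OBSERVING, or to WAITING when it has to pause. WAITING goes back to EXECUTING once the external input arrives. OBSERVING goes to REFLECTING. REFLECTING goes back to PLANNING (replan), or to EXECUTING (continue), or to COMPLETE, or to FAILED. Every non-terminal state can also go to FAILED.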

from enum import Enum, auto
 
 
class State(Enum):
    PLANNING = auto()
    EXECUTING = auto()
    OBSERVING = auto()
    REFLECTING = auto()
    WAITING = auto()
    COMPLETE = auto()
    FAILED = auto()
 
 
TRANSITIONS = {
    State.PLANNING:    {State.EXECUTING, State.FAILED},
    State.EXECUTING:   {State.OBSERVING, State.WAITING, State.FAILED},
    State.OBSERVING:   {State.REFLECTING, State.FAILED},
    State.REFLECTING:  {State.PLANNING, State.EXECUTING, State.COMPLETE, State.FAILED},
    State.WAITING:     {State.EXECUTING, State.FAILED},
    State.COMPLETE:    set(),
    State.FAILED:      set(),
}

The transition table is your contract. Any code that wants to change the loop's state has to go through it. Illegal transitions raise an exception.
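The lesson calls loop.transition(...) but does not show it. Here is a minimal sketch of how the table could be enforced. The Loop class, its attribute defaults, and the IllegalTransition exception are hypothetical names chosen for illustration, not part of the lesson's code.

```python
from enum import Enum, auto


class State(Enum):
    PLANNING = auto()
    EXECUTING = auto()
    OBSERVING = auto()
    REFLECTING = auto()
    WAITING = auto()
    COMPLETE = auto()
    FAILED = auto()


TRANSITIONS = {
    State.PLANNING:    {State.EXECUTING, State.FAILED},
    State.EXECUTING:   {State.OBSERVING, State.WAITING, State.FAILED},
    State.OBSERVING:   {State.REFLECTING, State.FAILED},
    State.REFLECTING:  {State.PLANNING, State.EXECUTING, State.COMPLETE, State.FAILED},
    State.WAITING:     {State.EXECUTING, State.FAILED},
    State.COMPLETE:    set(),
    State.FAILED:      set(),
}


class IllegalTransition(Exception):
    """Hypothetical exception for moves the table does not allow."""


class Loop:
    """Hypothetical holder for the state cursor and the loop's data."""

    def __init__(self):
        self.state = State.PLANNING
        self.plan = None
        self.current = None
        self.current_result = None
        self.results = []
        self.final = None

    def transition(self, new_state):
        # The table is the only place that decides what is legal.
        if new_state not in TRANSITIONS[self.state]:
            raise IllegalTransition(f"{self.state.name} -> {new_state.name}")
        self.state = new_state
```

Because the cursor only changes after the check passes, a rejected transition leaves the loop in its previous state.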

The driving loop

def drive(loop):
    # Run until the loop reaches a terminal state.
    while loop.state not in (State.COMPLETE, State.FAILED):
        if loop.state == State.PLANNING:
            loop.plan = make_plan(loop)
            loop.transition(State.EXECUTING)
        elif loop.state == State.EXECUTING:
            loop.current = loop.plan.next_step()
            loop.current_result = run_worker(loop.current)
            loop.transition(State.OBSERVING)  # only after dispatch returns
        elif loop.state == State.OBSERVING:
            loop.results.append(loop.current_result)
            loop.transition(State.REFLECTING)
        elif loop.state == State.REFLECTING:
            decision = reflect(loop)
            if decision == "continue":
                loop.transition(State.EXECUTING)
            elif decision == "replan":
                loop.transition(State.PLANNING)
            elif decision == "done":
                # Synthesize first so COMPLETE always implies a final answer.
                loop.final = synthesize(loop)
                loop.transition(State.COMPLETE)
            else:
                loop.transition(State.FAILED)
        elif loop.state == State.WAITING:
            # Paused (e.g. awaiting approval): hand control back to the
            # caller instead of spinning; call drive() again after resuming.
            return

This is the same logic you would write without the state machine, just with the state cursor made explicit. The benefit is everything you can now do around it.

Why this formalism pays off

Observability becomes free

Log every transition and you get an audit trail of what the loop was doing at each step, from a single instrumentation point rather than log calls scattered through the loop body. Track 4 Module 3 covers this in depth, but the foundation is here: structured states make logs structured.
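As a rough sketch of what logging every transition could look like, the transition method below records each move both in memory and through Python's standard logging module. The Loop class, the history attribute, the record fields, and the logger name "orchestrator" are assumptions made for this example.

```python
import json
import logging
import time
from enum import Enum, auto

logger = logging.getLogger("orchestrator")  # hypothetical logger name


class State(Enum):
    PLANNING = auto()
    EXECUTING = auto()
    OBSERVING = auto()
    REFLECTING = auto()
    WAITING = auto()
    COMPLETE = auto()
    FAILED = auto()


TRANSITIONS = {
    State.PLANNING:    {State.EXECUTING, State.FAILED},
    State.EXECUTING:   {State.OBSERVING, State.WAITING, State.FAILED},
    State.OBSERVING:   {State.REFLECTING, State.FAILED},
    State.REFLECTING:  {State.PLANNING, State.EXECUTING, State.COMPLETE, State.FAILED},
    State.WAITING:     {State.EXECUTING, State.FAILED},
    State.COMPLETE:    set(),
    State.FAILED:      set(),
}


class Loop:
    """Hypothetical loop object whose transition method is the one log point."""

    def __init__(self):
        self.state = State.PLANNING
        self.history = []  # in-memory audit trail, one dict per transition

    def transition(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        record = {"ts": time.time(), "from": self.state.name, "to": new_state.name}
        self.history.append(record)
        logger.info(json.dumps(record))  # one structured line per transition
        self.state = new_state
```

Emitting each record as one JSON line keeps the log easy to filter by state name later.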

Resumability becomes possible

Persist (state, plan, results, current) after every transition. If the process crashes, reload that tuple and resume. Without a state machine, you do not know where to restart. With one, the next action is a function of the persisted state and the data saved alongside it. The next lesson covers checkpointing in detail.
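A minimal sketch of persisting that tuple, assuming the plan, results, and current step are plain JSON-serializable values (real plans may need their own serialization). The Loop dataclass and the save_checkpoint and load_checkpoint helpers are hypothetical names for illustration.

```python
import json
import os
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    PLANNING = auto()
    EXECUTING = auto()
    OBSERVING = auto()
    REFLECTING = auto()
    WAITING = auto()
    COMPLETE = auto()
    FAILED = auto()


@dataclass
class Loop:
    """Hypothetical loop record; fields mirror the persisted tuple."""
    state: State = State.PLANNING
    plan: list = field(default_factory=list)
    results: list = field(default_factory=list)
    current: object = None


def save_checkpoint(loop, path):
    payload = {
        "state": loop.state.name,  # store the enum by name, not by value
        "plan": loop.plan,
        "results": loop.results,
        "current": loop.current,
    }
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(payload, f)
    # Rename into place so a reader does not see a partially written file.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    with open(path) as f:
        payload = json.load(f)
    return Loop(
        state=State[payload["state"]],
        plan=payload["plan"],
        results=payload["results"],
        current=payload["current"],
    )
```

Calling save_checkpoint from inside the transition method would give one checkpoint per transition, which matches the "persist after every transition" rule.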

Human-in-the-loop is a clean transition

A WAITING state is just a state where the loop pauses until external input arrives. The transition EXECUTING -> WAITING -> EXECUTING lets you insert human approval before any tool call without surgery on the loop body. Module 5 lesson 3 builds on this.
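The pause-and-resume flow might be sketched as follows. The helper names pause_for_approval and resume, the pending attribute, and the choice to map a rejection to FAILED (the only other exit the table gives WAITING) are assumptions for this example.

```python
from enum import Enum, auto


class State(Enum):
    PLANNING = auto()
    EXECUTING = auto()
    OBSERVING = auto()
    REFLECTING = auto()
    WAITING = auto()
    COMPLETE = auto()
    FAILED = auto()


TRANSITIONS = {
    State.PLANNING:    {State.EXECUTING, State.FAILED},
    State.EXECUTING:   {State.OBSERVING, State.WAITING, State.FAILED},
    State.OBSERVING:   {State.REFLECTING, State.FAILED},
    State.REFLECTING:  {State.PLANNING, State.EXECUTING, State.COMPLETE, State.FAILED},
    State.WAITING:     {State.EXECUTING, State.FAILED},
    State.COMPLETE:    set(),
    State.FAILED:      set(),
}


class Loop:
    """Hypothetical loop that starts mid-execution for this example."""

    def __init__(self):
        self.state = State.EXECUTING
        self.pending = None  # the tool call awaiting approval, if any

    def transition(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state


def pause_for_approval(loop, tool_call):
    # EXECUTING -> WAITING: remember what we were about to do, then stop.
    loop.pending = tool_call
    loop.transition(State.WAITING)


def resume(loop, approved):
    # WAITING -> EXECUTING on approval, WAITING -> FAILED on rejection.
    if approved:
        loop.transition(State.EXECUTING)
        tool_call, loop.pending = loop.pending, None
        return tool_call
    loop.transition(State.FAILED)
    return None
```

The approval itself can arrive through any channel; the loop only needs the pending call and the state cursor to pick up where it stopped.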

Reflection is a first-class concept

The REFLECTING state is what makes the loop adaptive. Without it, supervisor/worker degrades into a fancy pipeline. Naming it as a separate state forces you to write a real reflection prompt: given everything we have learned, what is the right next move?

A small but critical detail: terminal states have no exits

COMPLETE and FAILED are terminal. Once you enter them, the loop is done. There is no "FAILED to RETRYING" transition because retrying is a different kind of decision: it should be made before you fail, in REFLECTING. If you find yourself wanting transitions out of FAILED, it is usually a sign that REFLECTING did not consider all the recovery options.
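One way to keep this property from drifting is to derive the terminal set from the table instead of hard-coding it. This is a small sketch; TERMINAL and check_table are hypothetical names.

```python
from enum import Enum, auto


class State(Enum):
    PLANNING = auto()
    EXECUTING = auto()
    OBSERVING = auto()
    REFLECTING = auto()
    WAITING = auto()
    COMPLETE = auto()
    FAILED = auto()


TRANSITIONS = {
    State.PLANNING:    {State.EXECUTING, State.FAILED},
    State.EXECUTING:   {State.OBSERVING, State.WAITING, State.FAILED},
    State.OBSERVING:   {State.REFLECTING, State.FAILED},
    State.REFLECTING:  {State.PLANNING, State.EXECUTING, State.COMPLETE, State.FAILED},
    State.WAITING:     {State.EXECUTING, State.FAILED},
    State.COMPLETE:    set(),
    State.FAILED:      set(),
}

# Terminal states are exactly the ones with no outgoing transitions.
TERMINAL = {state for state, exits in TRANSITIONS.items() if not exits}


def check_table(transitions):
    """Raise ValueError if any state is missing a row in the table."""
    missing = set(State) - set(transitions)
    if missing:
        names = sorted(s.name for s in missing)
        raise ValueError(f"states without a transition row: {names}")


check_table(TRANSITIONS)
```

A driving loop can then test `loop.state not in TERMINAL`, so adding a new terminal state later only requires a new empty row in the table.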

States that are missing

Real-world systems often add states the basic seven do not cover:

PARALLEL when multiple workers are running concurrently and the supervisor is awaiting all of them.
CONFIRMING when a destructive tool requires explicit user sign-off.
CANCELLED when a user has interrupted; distinct from FAILED because the work was not broken, just stopped.

Add states only when you find yourself awkwardly representing those flows in the existing ones. Premature states are clutter.

Designing the reflection prompt

The REFLECTING state is the hardest one to do well because it is itself an LLM call that has to decide between continue, replan, done, or fail. A reasonable prompt template:

You are the orchestrator. Here is the original request, the plan, the steps
completed so far, and the most recent worker's findings.
 
Decide one of:
- continue: the plan is still valid; move to the next step
- replan: the plan needs revision based on what we just learned
- done: we have enough to answer the user's request; synthesize
- fail: something is unrecoverable
 
Respond with exactly one word.

Constrain the output to one word; fall back to "continue" on parsing failure. The reflection call is small and frequent; keep it cheap.
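The parsing rule above can be sketched as a small function. The name parse_decision and the strict single-word check are assumptions; a more lenient parser could instead accept the first recognized word in a longer reply.

```python
VALID_DECISIONS = {"continue", "replan", "done", "fail"}


def parse_decision(raw):
    """Map a raw model reply to one of the four reflection decisions."""
    words = raw.strip().lower().split()
    if len(words) == 1:
        # Tolerate trailing punctuation or quoting such as "Done." or `replan`.
        word = words[0].strip(".,:;!\"'`")
        if word in VALID_DECISIONS:
            return word
    # Anything else counts as a parsing failure; fall back as the lesson suggests.
    return "continue"
```

For example, parse_decision("  Replan.\n") returns "replan", while a multi-word reply such as "I am not sure" falls back to "continue".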

Treat the state cursor as the source of truth

A common mistake is to track state implicitly: "we're done if the last result was a synthesis." This works until it doesn't. Make the state explicit. Persist it. Validate transitions. The cost is a few lines of code; the benefit is being able to reason about the system's behavior without reading the whole loop body.

Connection to the rest of the curriculum

The state machine is the backbone for everything that follows:

Module 4 lesson 3 (checkpointing) persists the state cursor.
Module 5 lesson 3 (human approval) adds a WAITING transition for sensitive actions.
Module 6 (metacognition) is essentially "smart REFLECTING" with self-critique loops.
Track 4 Module 1 (reliability) treats retries as FAILED-prevention strategies inside REFLECTING.
Track 4 Module 3 (tracing) uses transitions as natural span boundaries.

Once you have a real state machine, every later capability snaps onto it cleanly.

Key takeaway

Modeling the orchestration loop as a state machine (PLANNING, EXECUTING, OBSERVING, REFLECTING, WAITING, COMPLETE, FAILED) makes observability, resumability, and human-in-the-loop straightforward to add. It also forces you to take REFLECTING seriously as the place where adaptive behavior lives. The next lesson uses this state machine to build checkpointing: making your agent loop survive crashes by saving and restoring the state cursor.
