When ReAct breaks
Failure modes and guardrails for ReAct agents.
ReAct breaks in predictable ways
ReAct works well when the task has a clear path: gather some information, reason about it, take an action, finish. It breaks when the path isn't clear, when tools are unreliable, or when the model loses track of what it's doing. The good news is that the failure modes are stereotyped. Once you've seen each one a few times, you can recognize and fix them quickly.
This lesson catalogs the six failure modes that show up most often in production ReAct agents, with the diagnostic and the fix for each.
Failure 1: thought-action mismatch
The model writes a sensible thought and then takes an action that has nothing to do with it.
Thought: I should look up the population of Tokyo to compare with Osaka.
Action: search[weather in Tokyo]

What happened: the model latched onto a recent token (the city name) and forgot the actual subject of the thought. This shows up most often when prompts are too long or when the system prompt is buried under conversation history.
Fix: restate the goal at the top of the user-role message on every turn, or move long context into a separate retrieval call so the system prompt stays close to the model's attention.
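A minimal sketch of the first option, using a hypothetical build_turn helper (the name and message layout are assumptions; the point is that the goal travels with every turn):

def build_turn(goal, turn_content):
    # Restate the original goal at the top of every user-role message so it
    # stays near the model's attention even as the history grows.
    return {
        "role": "user",
        "content": f"Goal: {goal}\n\n{turn_content}",
    }

messages.append(build_turn(
    "Compare the populations of Tokyo and Osaka.",
    "Observation: Tokyo has a population of about 14 million.",
))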
Failure 2: skipping the thought
The model produces an action with no thought, often because it's pattern-matched to a previous turn where it had momentum:
Action: search[Tokyo population]

This is a sign the model has decided it doesn't need to reason anymore. Sometimes that's fine. Often it's the start of a runaway loop.
Fix: make the thought required in the format and validate it in your parser. If the model produces an action without a thought, inject:
if not parsed["thought"]:
    messages.append({
        "role": "user",
        "content": "You skipped the Thought step. Please write a Thought first, then an Action."
    })
    continue

Failure 3: thought without action
The opposite problem. The model writes elaborate reasoning and never produces an action:
Thought: I need to think carefully about this. There are several factors:
the historical context, the cultural implications, the modern usage...
[3 paragraphs follow]

This is overthinking. The model has decided to ruminate instead of act. It's especially common when the question is ambiguous and the tools don't obviously help.
Fix: cap the thought length and force an action choice:
if not parsed["tool"]:
    messages.append({
        "role": "user",
        "content": "Your last response had no Action. Pick one of: search, lookup, finish."
    })

Failure 4: hallucinated tool calls
The model invents a tool that doesn't exist, or invents arguments to a real tool:
Action: lookup_database[user.id == 123]

lookup_database isn't in our tool list. Either the model is confused about what's available, or it's pattern-matching to tools it saw during training.
Fix: in the error observation, list the available tools and remind the model of the format:
if parsed["tool"] not in TOOL_REGISTRY:
    observation = (
        f"ERROR: Tool '{parsed['tool']}' does not exist. "
        f"Available tools: {list(TOOL_REGISTRY)}. "
        f"Use the format: tool[argument]"
    )

We covered this in the loop error handling lesson. The same fix applies in the ReAct context.
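For reference, the check above assumes TOOL_REGISTRY is a plain mapping from tool names to callables. A minimal sketch, with the three tools this lesson assumes (search, lookup, finish) stubbed out for illustration:

# Hypothetical registry: tool names the model may emit, mapped to the
# functions that implement them. Real implementations live elsewhere.
def search(query):
    return f"(stub) results for: {query}"

def lookup(term):
    return f"(stub) passage for: {term}"

def finish(answer):
    return answer

TOOL_REGISTRY = {"search": search, "lookup": lookup, "finish": finish}

# Dispatch is then a dictionary lookup; the membership check above catches
# any name the model invents. The "argument" key is an assumption about
# how your parser names the bracketed value.
observation = TOOL_REGISTRY[parsed["tool"]](parsed["argument"])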
Failure 5: stuck in a search loop
The model keeps searching for variations of the same query, never converging:
Step 4: search[authentication implementation]
Step 5: search[auth code in repo]
Step 6: search[login implementation]
Step 7: search[authentication implementation Python]

The agent thinks each query is a new approach. It isn't. Each one returns nearly identical results.
Fix: detect repetition (we built a detector for this in the loop control lesson) and inject a nudge that breaks the pattern.
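A minimal version of the repetition check might look like this (a sketch: it assumes actions appear in assistant messages as search[...]; the detector from the loop control lesson is more thorough):

import re
from difflib import SequenceMatcher

def is_stuck(messages, window=3, threshold=0.8):
    # Collect the search queries the agent has issued so far.
    queries = []
    for msg in messages:
        if msg["role"] != "assistant":
            continue
        match = re.search(r"search\[(.*?)\]", msg["content"])
        if match:
            queries.append(match.group(1))
    recent = queries[-window:]
    if len(recent) < window:
        return False
    # High pairwise similarity across the window means the agent is looping.
    return all(
        SequenceMatcher(None, a, b).ratio() > threshold
        for a, b in zip(recent, recent[1:])
    )

When the check fires, inject the nudge: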
if is_stuck(messages):
    messages.append({
        "role": "user",
        "content": (
            "You've been searching for similar terms repeatedly. "
            "Either pick a different tool, search for a more specific term, "
            "or finish with the best answer you have."
        )
    })

Failure 6: premature finish
The model calls finish before it actually has the answer:
Step 1: search[Apple Remote]
Observation: The Apple Remote is a remote control introduced in 2005...
Step 2: finish[The Apple Remote was introduced in 2005]

But the question was about what other devices control Front Row. The model grabbed the first plausible-looking fact and bailed. This is one of the most damaging failures because the agent looks like it succeeded but the answer is wrong.
Fix: ask the model to verify before finishing:
PROMPT_ADDENDUM = """
Before calling finish[answer], write one final Thought that explicitly verifies:
1. Does this answer all parts of the original question?
2. Have you grounded the answer in observations, or are you guessing?
"""

This adds tokens but cuts premature-finish errors significantly.
A failure-mode table
| Failure | Diagnostic | Fix |
|---|---|---|
| Thought-action mismatch | Action references a different topic than the thought | Restate the goal each turn |
| Skipped thought | Action with no preceding thought | Validate format, nudge |
| Skipped action | Long thought with no action | Cap thought length, force pick |
| Hallucinated tool | Tool name not in registry | Error with available tools |
| Search loop | Repeated similar queries | Detect repetition, nudge |
| Premature finish | finish[] called before all parts answered | Verification step before finish |
Build a small library of these checks and run them inside your loop. Most "the agent is bad" complaints in production trace back to one of these six.
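One way to organize that library (a sketch, not a prescription): a list of (check, nudge) pairs evaluated once per turn. It reuses TOOL_REGISTRY and is_stuck from above and flattens everything into user-message nudges; in practice you'd keep the hallucinated-tool case as an error observation, as shown earlier.

# Each entry pairs a predicate over (parsed, messages) with the nudge to
# inject when it fires.
CHECKS = [
    (lambda parsed, messages: not parsed["thought"],
     "You skipped the Thought step. Please write a Thought first, then an Action."),
    (lambda parsed, messages: not parsed["tool"],
     "Your last response had no Action. Pick one of: search, lookup, finish."),
    (lambda parsed, messages: parsed["tool"] and parsed["tool"] not in TOOL_REGISTRY,
     f"That tool does not exist. Available tools: {list(TOOL_REGISTRY)}."),
    (lambda parsed, messages: is_stuck(messages),
     "You've been searching for similar terms repeatedly. Try a different approach or finish."),
]

def run_checks(parsed, messages):
    # Return the first nudge that applies this turn, or None if the turn is clean.
    for applies, nudge in CHECKS:
        if applies(parsed, messages):
            return nudge
    return None

The loop appends whatever run_checks returns as a user message before the next model call, exactly like the individual fixes above.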
When ReAct is the wrong shape
Some tasks don't fit ReAct at all. If you see any of these, consider a different pattern:
- Highly parallel tasks. ReAct is sequential. If you need to fetch 50 things and synthesize them, a planning step that fans out parallel sub-tasks is faster.
- Long horizon tasks. Tasks that need 30+ steps tend to lose coherence. Break them into a hierarchy: an outer agent that plans, inner agents that execute one phase each.
- Strict workflows. If the task always follows the same steps, a chain or pipeline is more reliable than an agent (we covered this in Module 1).
We cover supervisor/worker, planning agents, and other multi-agent topologies in Track 2. Most of those patterns exist precisely to address ReAct's limits.
Don't blame the model first
When a ReAct agent misbehaves, the temptation is to swap in a bigger model. Resist it. Most failures are tooling, prompting, or loop-control bugs. A 70B model with a confused tool list still produces confused output. Fix the scaffolding first; only then ask whether the model itself is the bottleneck.
Key takeaway
ReAct fails in six characteristic ways: bad thought-action alignment, missing thoughts, missing actions, hallucinated tools, search loops, and premature finishes. Each has a known fix that lives in your loop or your prompt, not in the model. Build the habit of identifying which class a failure belongs to before reaching for a bigger model.
The next module shifts from the loop and tools to the state the agent reasons over: how to manage memory, what to keep, what to summarize, and what to drop.