When ReAct breaks
Failure modes and guardrails for ReAct agents.
ReAct breaks in predictable ways
ReAct works well when the task has a clear path: gather some information, reason about it, take an action, finish. It breaks when the path isn't clear, when tools are unreliable, or when the model loses track of what it's doing. The good news is that the failure modes are stereotyped. Once you've seen each one a few times, you can recognize and fix them quickly.
This lesson catalogs the six failure modes that show up most often in production ReAct agents, with the diagnostic and the fix for each.
Failure 1: thought-action mismatch
The model writes a sensible thought and then takes an action that has nothing to do with it.
Thought: I should look up the population of Tokyo to compare with Osaka.
Action: search[weather in Tokyo]

What happened: the model latched onto a recent token (the city name) and forgot the actual subject of the thought. This shows up most often when prompts are too long or when the system prompt is buried under conversation history.
Fix: restate the goal at the top of the user-role message on every turn, or move long context into a separate retrieval call so the system prompt stays close to the model's attention.
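A minimal sketch of the first option, using a hypothetical build_turn helper (the name and message layout are assumptions; the point is that the goal travels with every turn):

def build_turn(goal, turn_content):
    # Restate the original goal at the top of every user-role message so it
    # stays near the model's attention even as the history grows.
    return {
        "role": "user",
        "content": f"Goal: {goal}\n\n{turn_content}",
    }

messages.append(build_turn(
    "Compare the populations of Tokyo and Osaka.",
    "Observation: Tokyo has a population of about 14 million.",
))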
Failure 2: skipping the thought
The model produces an action with no thought, often because it's pattern-matched to a previous turn where it had momentum:
Action: search[Tokyo population]

This is a sign the model has decided it doesn't need to reason anymore. Sometimes that's fine. Often it's the start of a runaway loop.
Fix: make the thought required in the format and validate it in your parser. If the model produces an action without a thought, inject:
if not parsed["thought"]:
    messages.append({
        "role": "user",
        "content": "You skipped the Thought step. Please write a Thought first, then an Action."
    })
    continue

Failure 3: thought without action
The opposite problem. The model writes elaborate reasoning and never produces an action:
Thought: I need to think carefully about this. There are several factors:
the historical context, the cultural implications, the modern usage...
[3 paragraphs follow]

This is overthinking. The model has decided to ruminate instead of act. It's especially common when the question is ambiguous and the tools don't obviously help.
Fix: cap the thought length and force an action choice:
if not parsed["tool"]:
    messages.append({
        "role": "user",
        "content": "Your last response had no Action. Pick one of: search, lookup, finish."
    })

Failure 4: hallucinated tool calls
The model invents a tool that doesn't exist, or invents arguments to a real tool:
Action: lookup_database[user.id == 123]

lookup_database isn't in our tool list. Either the model is confused about what's available, or it's pattern-matching to tools it saw during training.
Fix: in the error observation, list the available tools and remind the model of the format:
if parsed["tool"] not in TOOL_REGISTRY:
    observation = (
        f"ERROR: Tool '{parsed['tool']}' does not exist. "
        f"Available tools: {list(TOOL_REGISTRY)}. "
        f"Use the format: tool[argument]"
    )

We covered this in the loop error handling lesson. The same fix applies in the ReAct context.
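For reference, the check above assumes TOOL_REGISTRY is a plain mapping from tool names to callables. A minimal sketch, with the three tools this lesson assumes (search, lookup, finish) stubbed out for illustration:

# Hypothetical registry: tool names the model may emit, mapped to the
# functions that implement them. Real implementations live elsewhere.
def search(query):
    return f"(stub) results for: {query}"

def lookup(term):
    return f"(stub) passage for: {term}"

def finish(answer):
    return answer

TOOL_REGISTRY = {"search": search, "lookup": lookup, "finish": finish}

# Dispatch is then a dictionary lookup; the membership check above catches
# any name the model invents. The "argument" key is an assumption about
# how your parser names the bracketed value.
observation = TOOL_REGISTRY[parsed["tool"]](parsed["argument"])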
Failure 5: stuck in a search loop
The model keeps searching for variations of the same query, never converging:
Step 4: search[authentication implementation]
Step 5: search[auth code in repo]
Step 6: search[login implementation]
Step 7: search[authentication implementation Python]

The agent thinks each query is a new approach. It isn't. Each one returns nearly identical results.
Fix: detect repetition (we built a detector for this in the loop control lesson) and inject a nudge that breaks the pattern.
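A minimal version of the repetition check might look like this (a sketch: it assumes actions appear in assistant messages as search[...]; the detector from the loop control lesson is more thorough):

import re
from difflib import SequenceMatcher

def is_stuck(messages, window=3, threshold=0.8):
    # Collect the search queries the agent has issued so far.
    queries = []
    for msg in messages:
        if msg["role"] != "assistant":
            continue
        match = re.search(r"search\[(.*?)\]", msg["content"])
        if match:
            queries.append(match.group(1))
    recent = queries[-window:]
    if len(recent) < window:
        return False
    # High pairwise similarity across the window means the agent is looping.
    return all(
        SequenceMatcher(None, a, b).ratio() > threshold
        for a, b in zip(recent, recent[1:])
    )

When the check fires, inject the nudge: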
if is_stuck(messages):
    messages.append({
        "role": "user",
        "content": (
            "You've been searching for similar terms repeatedly. "
            "Either pick a different tool, search for a more specific term, "
            "or finish with the best answer you have."
        )
    })

Failure 6: premature finish
The model calls finish before it actually has the answer:
Step 1: search[Apple Remote]
Observation: The Apple Remote is a remote control introduced in 2005...
Step 2: finish[The Apple Remote was introduced in 2005]

But the question was about what other devices control Front Row. The model grabbed the first plausible-looking fact and bailed. This is one of the most damaging failures because the agent looks like it succeeded but the answer is wrong.
Fix: ask the model to verify before finishing:
PROMPT_ADDENDUM = """
Before calling finish[answer], write one final Thought that explicitly verifies:
1. Does this answer all parts of the original question?
2. Have you grounded the answer in observations, or are you guessing?
"""

This adds tokens but cuts premature-finish errors significantly.
A failure-mode table
| Failure | Diagnostic | Fix |
|---|---|---|
| Thought-action mismatch | Action references a different topic than the thought | Restate the goal each turn |
| Skipped thought | Action with no preceding thought | Validate format, nudge |
| Skipped action | Long thought with no action | Cap thought length, force pick |
| Hallucinated tool | Tool name not in registry | Error with available tools |
| Search loop | Repeated similar queries | Detect repetition, nudge |
| Premature finish | finish[] called before all parts answered | Verification step before finish |
Build a small library of these checks and run them inside your loop. Most "the agent is bad" complaints in production trace back to one of these six.
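One way to organize that library (a sketch, not a prescription): a list of (check, nudge) pairs evaluated once per turn. It reuses TOOL_REGISTRY and is_stuck from above and flattens everything into user-message nudges; in practice you'd keep the hallucinated-tool case as an error observation, as shown earlier.

# Each entry pairs a predicate over (parsed, messages) with the nudge to
# inject when it fires.
CHECKS = [
    (lambda parsed, messages: not parsed["thought"],
     "You skipped the Thought step. Please write a Thought first, then an Action."),
    (lambda parsed, messages: not parsed["tool"],
     "Your last response had no Action. Pick one of: search, lookup, finish."),
    (lambda parsed, messages: parsed["tool"] and parsed["tool"] not in TOOL_REGISTRY,
     f"That tool does not exist. Available tools: {list(TOOL_REGISTRY)}."),
    (lambda parsed, messages: is_stuck(messages),
     "You've been searching for similar terms repeatedly. Try a different approach or finish."),
]

def run_checks(parsed, messages):
    # Return the first nudge that applies this turn, or None if the turn is clean.
    for applies, nudge in CHECKS:
        if applies(parsed, messages):
            return nudge
    return None

The loop appends whatever run_checks returns as a user message before the next model call, exactly like the individual fixes above.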
When ReAct is the wrong shape
Some tasks don't fit ReAct at all. If you see any of these, consider a different pattern:
- Highly parallel tasks. ReAct is sequential. If you need to fetch 50 things and synthesize them, a planning step that fans out parallel sub-tasks is faster.
- Long horizon tasks. Tasks that need 30+ steps tend to lose coherence. Break them into a hierarchy: an outer agent that plans, inner agents that execute one phase each.
- Strict workflows. If the task always follows the same steps, a chain or pipeline is more reliable than an agent (we covered this in Module 1).
We cover supervisor/worker, planning agents, and other multi-agent topologies in Track 2. Most of those patterns exist precisely to address ReAct's limits.
Don't blame the model first
When a ReAct agent misbehaves, the temptation is to swap in a bigger model. Resist it. Most failures are tooling, prompting, or loop-control bugs. A 70B model with a confused tool list still produces confused output. Fix the scaffolding first; only then ask whether the model itself is the bottleneck.
Key takeaway
ReAct fails in six characteristic ways: bad thought-action alignment, missing thoughts, missing actions, hallucinated tools, search loops, and premature finishes. Each has a known fix that lives in your loop or your prompt, not in the model. Build the habit of identifying which class a failure belongs to before reaching for a bigger model.
The next module shifts from the loop and tools to the state the agent reasons over: how to manage memory, what to keep, what to summarize, and what to drop.