The ReAct pattern
Implementing ReAct from scratch
Building the thought/action/observation loop.
Building ReAct from scratch
In the last lesson we walked through the ReAct paper. In this lesson we build a working ReAct agent without using any framework or even native function calling. We're going to ask a plain text model to emit "Thought:" and "Action:" and parse the output ourselves.
This is the most pedagogically useful agent you'll ever write. Once you've built it, the abstractions in LangChain, the OpenAI Agents SDK, and others stop feeling magical. They're just polished wrappers around what we're about to do.
The pieces
We need four things:
- A prompt that teaches the model the ReAct format
- A small set of tools the model can call
- A parser that extracts the thought, action, and arguments from the model's text
- The loop that ties it all together
We're using Ollama with llama3 because the model runs locally and works fine for this exercise.
The prompt
SYSTEM_PROMPT = """You are a research agent. Solve the user's question by reasoning step by step and using tools.
Available tools:
- search[query]: search Wikipedia and return a snippet
- lookup[term]: look up a specific term in the most recent search result
- finish[answer]: provide your final answer and end the task
Always respond using EXACTLY this format:
Thought: <your reasoning about what to do next>
Action: <one of the tools above, in the format tool[argument]>
After Action: <tool[argument]>, stop. The system will run the tool and give you an Observation. Then continue with the next Thought and Action.
"""Notice the format constraints. We tell the model what tools are available, the exact action syntax, and that it must stop after writing the action. Without those constraints, the model will helpfully predict the observation itself, which we don't want.
The tools
Two real tools and one terminator:
import wikipedia

def tool_search(query: str) -> str:
    try:
        results = wikipedia.search(query, results=3)
        if not results:
            return "No results found."
        page = wikipedia.page(results[0], auto_suggest=False)
        return page.summary[:1000]
    except Exception as e:
        return f"ERROR: {e}"
def tool_lookup(term: str, last_search: str) -> str:
    if not last_search:
        return "ERROR: No previous search to look up in."
    idx = last_search.lower().find(term.lower())
    if idx == -1:
        return f"'{term}' not found in last search result."
    start = max(0, idx - 100)
    end = min(len(last_search), idx + 200)
    return last_search[start:end]

The finish "tool" isn't really a tool. It's a sentinel that tells the loop to stop.
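Before wiring up the loop, it's worth sanity-checking the tools by hand. Something like this works (the output will vary with Wikipedia's live content):

# Quick manual test of the two real tools.
snippet = tool_search("Apple Remote")
print(snippet[:200])

# Use that search result as the context for a focused lookup.
print(tool_lookup("Front Row", snippet))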
The parser
The model is going to emit text like:
Thought: I should search for the Apple Remote.
Action: search[Apple Remote]

We need a parser that extracts the thought, the action name, and the action argument. A regex is enough:
import re

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.+?)\]", re.DOTALL)
THOUGHT_RE = re.compile(r"Thought:\s*(.+?)(?=\nAction:|\Z)", re.DOTALL)

def parse_step(text):
    thought_match = THOUGHT_RE.search(text)
    action_match = ACTION_RE.search(text)
    thought = thought_match.group(1).strip() if thought_match else None
    if not action_match:
        return {"thought": thought, "tool": None, "arg": None}
    return {
        "thought": thought,
        "tool": action_match.group(1),
        "arg": action_match.group(2).strip(),
    }

That's the entire parser, under twenty lines.
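A quick check that the parser behaves as expected:

sample = """Thought: I should search for the Apple Remote.
Action: search[Apple Remote]"""

print(parse_step(sample))
# {'thought': 'I should search for the Apple Remote.',
#  'tool': 'search', 'arg': 'Apple Remote'}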
The loop
import ollama

def run_react(question, max_steps=8):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Question: {question}"},
    ]
    last_search = ""

    for step in range(max_steps):
        # Reasoning: get the next Thought + Action
        response = ollama.chat(
            model="llama3",
            messages=messages,
            options={"stop": ["Observation:"]},  # don't let the model hallucinate observations
        )
        text = response.message.content
        messages.append({"role": "assistant", "content": text})

        parsed = parse_step(text)
        print(f"\n--- Step {step + 1} ---")
        print(f"Thought: {parsed['thought']}")
        print(f"Action: {parsed['tool']}[{parsed['arg']}]")

        # Action selection
        if parsed["tool"] == "finish":
            return parsed["arg"]
        if parsed["tool"] is None:
            return f"[parse failed at step {step + 1}]"

        # Execution
        if parsed["tool"] == "search":
            observation = tool_search(parsed["arg"])
            last_search = observation
        elif parsed["tool"] == "lookup":
            observation = tool_lookup(parsed["arg"], last_search)
        else:
            observation = f"ERROR: Unknown tool '{parsed['tool']}'"

        # Observation
        print(f"Observation: {observation[:200]}...")
        messages.append({"role": "user", "content": f"Observation: {observation}"})

    return "[max steps reached]"

That's a complete ReAct agent. It's about 80 lines including the prompt, tools, parser, and loop. Run it on any factual question and you'll see thought-action-observation traces in your terminal.
A few details that matter
The stop sequence
options={"stop": ["Observation:"]} tells the inference server to cut generation off the moment the model emits the string "Observation:". Without this, the model will write its own fake observation, which produces nonsense. Stop sequences are how you keep the model on the rails when it's emitting structured output.
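Not every backend honors stop sequences identically. If you want a belt-and-braces guard, you can also cut on the client side; replacing the loop's text extraction with something like this achieves the same effect:

# Defensively drop anything generated past "Observation:",
# in case the stop sequence isn't honored by the backend.
text = response.message.content.split("Observation:")[0].rstrip()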
Observations as user messages
We append the observation as a user-role message. The reason: the model treats user messages as authoritative input from outside itself. If you append observations as assistant messages, the model gets confused about who said what; if you splice them into the assistant's own text, they read as part of the model's reasoning rather than as ground truth from a tool.
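To make that concrete, after one full step the transcript the model sees looks something like this (contents abbreviated):

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Question: ..."},
    {"role": "assistant", "content": "Thought: ...\nAction: search[Apple Remote]"},
    {"role": "user", "content": "Observation: The Apple Remote is a remote control..."},
]

Every tool result arrives as if a user had typed it, which is exactly how the model learned to treat external input.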
Truncating outputs
page.summary[:1000] and the lookup window prevent any single observation from blowing out the context window. We'll cover smarter context management in Module 5, but truncation is the simplest defense.
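The slice is a blunt head-only cut. If you want to keep a little more signal, a hypothetical truncate helper (not part of the lesson's code) could preserve both ends of a long observation:

def truncate(text: str, limit: int = 1000) -> str:
    # Keep the start and end of an over-long observation;
    # the middle of a long summary is usually the least informative part.
    if len(text) <= limit:
        return text
    half = limit // 2
    return text[:half] + " ...[truncated]... " + text[-half:]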
The "lookup" tool
lookup is the trick that makes multi-hop questions work in this minimal setup. After a search returns a paragraph, the model can ask for a specific term inside that paragraph and get a focused snippet. It's a primitive form of retrieval. Without it the model often re-searches when it should be reading.
Trying it
print(run_react("Aside from Apple Remote, what other devices can control Front Row?"))

A typical run:
--- Step 1 ---
Thought: I should search for Apple Remote first.
Action: search[Apple Remote]
Observation: The Apple Remote is a remote control...
--- Step 2 ---
Thought: It mentions Front Row. I should search for Front Row to see what else controls it.
Action: search[Front Row Apple]
Observation: Front Row is a discontinued media center...
--- Step 3 ---
Thought: The summary mentions keyboard function keys. I have the answer.
Action: finish[Keyboard function keys can also control Front Row]

Three steps. Two tool calls. One final answer. That's a working ReAct agent.
What native function calling buys you
The agent above does the same thing as the function-calling version from Module 2, but with text parsing instead of structured tool calls. Native function calling APIs eliminate the parser, the stop sequence, and most of the prompt rules. They're worth using whenever your model supports them. But knowing how to build the manual version means you can debug the native version when it misbehaves, and you can fall back to it when you need to use a model that doesn't support tools.
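For contrast, here's roughly what one tool turn looks like with Ollama's native tool calling. This is a sketch, assuming a recent ollama Python client and a tool-capable model such as llama3.1; check your versions before relying on it:

# Native tool calling: no format prompt, no regex, no stop sequence.
response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "Search Wikipedia for the Apple Remote."}],
    tools=[tool_search],  # recent clients accept plain Python functions
)

# Tool calls come back as structured objects instead of text to parse.
for call in response.message.tool_calls or []:
    if call.function.name == "tool_search":
        observation = tool_search(**call.function.arguments)

The structured path is more reliable, but it's doing exactly the same dance: the model names a tool, your code runs it, and the result goes back into the conversation.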
Key takeaway
A ReAct agent is a prompt that teaches the model a thought/action format, a parser that extracts those parts from the response, a small set of tools, and a loop that ties them together. The whole thing fits in 100 lines. Every fancier agent pattern you learn after this is a refinement: better thoughts, better actions, better observations, or multiple ReAct loops running in concert. The next lesson is about what happens when this loop misbehaves.