
Metacognition

Strategy adaptation

Changing approach mid-task based on results.


Changing approach mid-task

Self-reflection finds problems with one output. Strategy adaptation goes further: the agent notices that its current approach is not working and switches to a different one. The first is per-message; the second is per-task.

This is the difference between "the draft has a bug, fix it" and "we've spent four turns reading code and still don't understand the bug; let's stop reading and try running the failing test instead." The second decision is metacognitive: it's a judgment about whether the strategy is paying off, not just whether a single output is correct.

Strategies, not actions

A strategy is a class of approaches: "investigate by reading source code," "investigate by running tests," "investigate by looking at logs," "ask the user a clarifying question." Each strategy implies different tools, different prompts, different success signals.

Adaptation works when the agent has multiple strategies it can switch between, not when it's stuck inside one. So this lesson presupposes a system that has named strategies, not just a registry of tools.

STRATEGIES = {
    "read_source":   {"tools": ["read_file", "search_code"], "intent_template": "..."},
    "run_tests":     {"tools": ["run_tests"], "intent_template": "..."},
    "check_logs":    {"tools": ["query_logs"], "intent_template": "..."},
    "ask_user":      {"tools": ["ask_clarification"], "intent_template": "..."},
}

A worker run is "execute strategy X for N turns." If it doesn't make progress, the supervisor picks a different strategy.
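
What run_with_strategy might look like is sketched below. run_worker_turn is a hypothetical helper (one LLM turn restricted to the strategy's tools), the per-run turn budget and the result fields are illustrative, and nothing here is prescribed by the lesson's runtime.

MAX_TURNS_PER_RUN = 5  # illustrative budget for one worker run

def run_with_strategy(strategy, task, history):
    # Execute one strategy for a bounded number of turns and report what it found.
    spec = STRATEGIES[strategy]
    findings = []
    for _ in range(MAX_TURNS_PER_RUN):
        # Hypothetical helper: one worker turn that may only call this
        # strategy's tools, framed by its intent template.
        turn = run_worker_turn(
            tools=spec["tools"],
            intent=spec["intent_template"],
            task=task,
            history=history + findings,
        )
        findings.append(turn)
        if turn.get("done"):
            break
    return {
        "strategy": strategy,
        "findings": findings,
        "tool_outputs": [t["tool_output"] for t in findings if t.get("tool_output")],
        "gave_up": any(t.get("gave_up") for t in findings),
    }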

The signal

The supervisor needs a signal that the current strategy is stalling. Three useful ones:

No new information

If three consecutive worker turns produced no new tool outputs, the agent is going in circles. Switch.

No progress on the goal

If the supervisor's reflection over the last N steps still reads "trying to understand the bug," exactly where it stood N steps ago, the strategy isn't working. Switch.

Hitting a known wall

Some strategies have explicit failure modes. Reading source code that doesn't have the answer; running a test that always passes; querying a metric that doesn't exist. When the strategy reports "I can't, this isn't my domain," switch.

The signal needs to be cheap to compute. A stall check that needs its own evaluation step before every adaptation decision will spend more on metacognition than on the work.
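
A cheap version of that check, as a sketch over the result shape from the run_with_strategy sketch above (tool outputs assumed to be strings), covers the first and third signals without any extra LLM call; the "no progress on the goal" signal is the one place a model call may be justified.

def made_progress(result, history):
    # Known wall: the strategy itself reported "this isn't my domain."
    if result.get("gave_up"):
        return False
    # No new information: every tool output from this run was already seen.
    seen = set()
    for past in history[:-1]:  # everything before the run we just appended
        seen.update(past.get("tool_outputs", []))
    return any(output not in seen for output in result.get("tool_outputs", []))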

The shape

def adaptive_outer_loop(task):
    # Helpers (pick_initial_strategy, done, made_progress, pick_next_strategy,
    # synthesize) are the pieces discussed in the rest of this lesson.
    history = []
    strategy = pick_initial_strategy(task)
    rounds_without_progress = 0
    MAX_ROUNDS_PER_STRATEGY = 3

    while not done(task, history):
        # One bounded worker run under the current strategy.
        result = run_with_strategy(strategy, task, history)
        history.append(result)

        # Cheap stall signal: reset the counter on progress, bump it otherwise.
        if made_progress(result, history):
            rounds_without_progress = 0
        else:
            rounds_without_progress += 1

        # Budget spent without progress: switch to a different strategy.
        if rounds_without_progress >= MAX_ROUNDS_PER_STRATEGY:
            strategy = pick_next_strategy(task, history, exclude=[strategy])
            rounds_without_progress = 0

    return synthesize(history)

The outer loop watches its own progress and rotates strategies when stuck. Each strategy gets a budget; spending it without progress triggers a switch.

Picking the next strategy

This is where things get interesting. The naive approach (rotate through all strategies in order) wastes effort on ones the agent should know won't help. A better approach: ask the model to choose, given everything we've tried.

def pick_next_strategy(task, history, exclude):
    candidates = [s for s in STRATEGIES if s not in exclude]
    choice = model.run(
        system="""Given the task, the history, and the strategies we have not yet
tried, pick the most promising next strategy. Reply with one strategy name.""",
        user=f"Task: {task}\nHistory summary: {summarize(history)}\nCandidates: {candidates}",
    ).strip()
    # Guard against a malformed reply: fall back to the first untried candidate.
    return choice if choice in candidates else candidates[0]

The model is now selecting strategies, not just executing them. This is a small but real shift: the agent's reasoning includes "what kind of work should I be doing right now."

Strategy adaptation vs replanning

These overlap but are not the same. Replanning (Module 4 / Module 3) is "given findings, what are the next steps?" Strategy adaptation is "given that this kind of work is not paying off, what kind of work should I switch to?"

Replanning happens within a strategy: "I learned X, so the next step in the read-source strategy should be reading file Y instead of file Z." Adaptation switches the strategy itself.

The distinction matters because they come from different signals. Replan when you learn something. Adapt when you stop learning.
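
In code, the two moves fire on different conditions inside the same outer loop. A fragment-level sketch, assuming hypothetical learned_something_new and replan helpers on top of the loop above:

# Inside the outer loop, after appending the result:
if learned_something_new(result, history):
    plan = replan(plan, result)        # replan: adjust the next step, keep the strategy
elif rounds_without_progress >= MAX_ROUNDS_PER_STRATEGY:
    strategy = pick_next_strategy(task, history, exclude=[strategy])
    rounds_without_progress = 0        # adapt: change the kind of work entirely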

Limitations

The set of strategies is fixed at design time

Most adaptive systems pick from a known set. Inventing new strategies on the fly is a much harder problem (and often a sign that you should write more strategies in code, not have the model invent them).

Adaptation can hide model limitations

If your agent flips between three strategies and never solves the task, the right answer might be "the task is outside the model's capability," not "we need a fourth strategy." Adaptation isn't infinite leverage.

It costs LLM calls

Every "should I switch?" decision is an LLM call. For tasks that succeed on the first strategy, this is overhead. Use cheap heuristics first (rounds without progress) and only ask the model when those heuristics flag a problem.

When adaptation pays off

Tasks where adaptation is worth the complexity:

  • Investigative work with multiple possible approaches (debugging, root-cause analysis, research).
  • Open-ended problems where the right path isn't obvious from the start.
  • Long-running tasks where the cost of going down a wrong path is high.

Tasks where it doesn't:

  • Single-strategy tasks where there's only one reasonable approach.
  • Short tasks where the overhead exceeds the savings.
  • Deterministic workflows where the steps are known in advance (use a pipeline instead).

Strategy as a first-class concept

Some teams keep strategies implicit, hidden inside a big system prompt. Others promote them to first-class objects with names, tool sets, and intent templates. The first works when you have one or two strategies; the second pays off as soon as you have three or more. Naming is half the battle: once "read_source" is a strategy with a name, it becomes possible to reason about not using it.
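
A sketch of the first-class version, mirroring the STRATEGIES dict from earlier; the dataclass fields are one reasonable shape, not a prescribed one.

from dataclasses import dataclass

@dataclass(frozen=True)
class Strategy:
    name: str                  # "read_source", "run_tests", ...
    tools: tuple[str, ...]     # tool names the worker may call under this strategy
    intent_template: str       # prompt framing for worker turns

# A first-class replacement for the dict-of-dicts registry above.
STRATEGIES = {
    s.name: s
    for s in [
        Strategy("read_source", ("read_file", "search_code"), "..."),
        Strategy("run_tests",   ("run_tests",),               "..."),
        Strategy("check_logs",  ("query_logs",),              "..."),
        Strategy("ask_user",    ("ask_clarification",),       "..."),
    ]
}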

Connection to the next lesson

Strategy adaptation requires a notion of "this isn't going well." The next lesson formalizes that further: confidence estimation, where the agent maintains an explicit belief about how likely its current approach is to work, and uses that to decide whether to ask for help, escalate, or stop.

Key takeaway

Strategy adaptation lets the agent switch its kind of work when its current approach stalls. It requires named strategies, a cheap stall signal, and either a heuristic or a model-driven choice of what to try next. Don't conflate it with replanning: replanning fixes the next step within a strategy; adaptation changes the strategy. Worth the complexity for investigative and open-ended tasks; overhead on simple ones.
