Agent Loop

A multi-turn agent that calls tools until it's done. The loop body is small. The hard parts are the four guard rails: stop conditions, max iterations, token budget, and tool-failure recovery.

POST https://buzzai.cc/v1/messages

The shape of the loop

Stripped down, the agent loop is seven lines:

while True:
    response = call_claude(messages)
    if response.stop_reason != "tool_use":
        return response
    tool_results = run_all_tool_uses(response)
    messages.append(assistant_turn(response))
    messages.append(user_turn(tool_results))

That much will work for happy-path demos. Production needs four extra guards.

Stop conditions

Claude tells you why it stopped via stop_reason. Treat each one explicitly:

`stop_reason`	Meaning	Loop action
`tool_use`	Wants to call one or more tools	Execute tools, append results, continue
`end_turn`	Done, returning a final answer	Exit loop, return text content
`max_tokens`	Hit the `max_tokens` limit mid-response	Keep the partial assistant message in history and append a user nudge to continue, or surface the truncation to the user
`stop_sequence`	Custom stop string matched	Exit; treat output as truncated at the marker
`pause_turn`	Server-side pause (rare; long thinking, server tools)	Send the same conversation back to resume
`refusal`	Model declined to continue	Surface to user; do not retry blindly
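One way to keep the handling explicit is a lookup table that fails loudly on anything unexpected. This is a minimal sketch built from the table above; `LoopAction`, `ACTIONS`, and `action_for` are illustrative names, not part of any SDK.

```python
from enum import Enum


class LoopAction(Enum):
    RUN_TOOLS = "run_tools"  # execute tools, append results, continue
    RETURN = "return"        # exit the loop and return the text content
    NUDGE = "nudge"          # keep the partial turn, ask the model to continue
    RESUME = "resume"        # resend the conversation unchanged
    SURFACE = "surface"      # hand the refusal to the user, no blind retry


# Mapping taken from the stop_reason table above.
ACTIONS = {
    "tool_use": LoopAction.RUN_TOOLS,
    "end_turn": LoopAction.RETURN,
    "stop_sequence": LoopAction.RETURN,
    "max_tokens": LoopAction.NUDGE,
    "pause_turn": LoopAction.RESUME,
    "refusal": LoopAction.SURFACE,
}


def action_for(stop_reason: str) -> LoopAction:
    """Return the loop action for a stop_reason; raise on unknown values."""
    try:
        return ACTIONS[stop_reason]
    except KeyError:
        raise ValueError(f"unhandled stop_reason: {stop_reason!r}") from None
```

Raising on an unknown value is a design choice: a new stop reason then shows up as an error in your logs instead of a silent extra iteration.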

Max iterations safety valve

Some tasks legitimately take 20+ turns. Some get stuck in a loop calling the same tool with slightly different arguments. Pick a hard ceiling:

Simple tasks: 10 iterations.
Coding agents over a real repo: 30-50.
Research / multi-source synthesis: 50-100.

When you hit the ceiling, don't just abort. Inject a final user message that asks Claude to summarise progress and stop:

"You've hit the iteration limit. Stop calling tools. In your final message,
summarise what you accomplished, what's left, and any blockers."
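The ceiling and the wrap-up can be separated from the model call itself. The sketch below assumes a hypothetical `step` callable that performs one model call plus any tool execution, appends to `messages`, and reports whether the agent finished; `run_capped` and `WRAP_UP` are illustrative names.

```python
from typing import Callable

WRAP_UP = (
    "You've hit the iteration limit. Stop calling tools. In your final message, "
    "summarise what you accomplished, what's left, and any blockers."
)


def run_capped(step: Callable[[list], dict], messages: list, max_iterations: int) -> dict:
    """Run `step` until it reports completion or the ceiling is reached.

    `step` is a hypothetical callable: it takes the message list, does one
    model call plus tool execution, appends to `messages`, and returns
    {"done": bool, "text": str}.
    """
    for _ in range(max_iterations):
        result = step(messages)
        if result["done"]:
            return result
    # Ceiling reached: ask for a summary instead of aborting silently.
    messages.append({"role": "user", "content": WRAP_UP})
    return step(messages)
```

The final `step` call is not counted against `max_iterations`, so the total number of model calls is at most `max_iterations + 1`.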

Token budget

Read response.usage after every turn and accumulate. When you cross a threshold, force a wrap-up:

budget_input  = 1_000_000   # tokens
budget_output =   200_000

total_in = total_out = 0
for turn in range(MAX_TURNS):
    resp = client.messages.create(...)
    # cache_read_input_tokens can be None when no cache was involved
    total_in  += resp.usage.input_tokens + (resp.usage.cache_read_input_tokens or 0)
    total_out += resp.usage.output_tokens
    if total_in > budget_input or total_out > budget_output:
        # Inject "wrap up" instruction and break after one more call
        ...

Cache reads are billed at 10% of the input price, but they are still tokens, and the snippet above counts them at full weight. Track them separately if you bill end users by token; track only the dollar-equivalent if you bill by cost.
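To support both billing styles, keep the three counters apart and derive cost on demand. A minimal sketch: `UsageTracker` is an illustrative name, the per-million-token prices are placeholders to replace with your own rates, and cache-write tokens are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    # Placeholder prices in dollars per million tokens; substitute your own.
    input_price: float = 3.00
    output_price: float = 15.00
    cache_read_discount: float = 0.10  # cache reads cost 10% of the input price

    input_tokens: int = 0
    cache_read_tokens: int = 0
    output_tokens: int = 0

    def add(self, usage) -> None:
        """Accumulate one response's usage; missing or None fields count as 0."""
        self.input_tokens += getattr(usage, "input_tokens", 0) or 0
        self.cache_read_tokens += getattr(usage, "cache_read_input_tokens", 0) or 0
        self.output_tokens += getattr(usage, "output_tokens", 0) or 0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.cache_read_tokens + self.output_tokens

    @property
    def cost_usd(self) -> float:
        per_in = self.input_price / 1_000_000
        per_out = self.output_price / 1_000_000
        return (
            self.input_tokens * per_in
            + self.cache_read_tokens * per_in * self.cache_read_discount
            + self.output_tokens * per_out
        )
```

Call `tracker.add(resp.usage)` after every turn, then compare `tracker.total_tokens` or `tracker.cost_usd` against whichever threshold matches how you bill.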

Tool-failure recovery

Tools throw. Network calls time out. The model can request invalid arguments. Don't crash the loop — return the failure as a structured tool result and let Claude react.

{
  "type": "tool_result",
  "tool_use_id": "toolu_...",
  "content": "ERROR: file not found at path 'src/foo.py'. Try listing the directory first.",
  "is_error": true
}

Three failure tiers, with different handling:

Tier	Example	Recovery
Recoverable, model's fault	Bad path, wrong arg type	Return `is_error: true` with a hint. Claude usually retries with a fix.
Recoverable, transient	HTTP 503 from a downstream API, network blip	Retry inside the tool with backoff before returning. Limit to 2-3 attempts.
Unrecoverable	Permission denied, dependency missing, invalid credentials	Return error and abort the loop. Don't burn tokens watching Claude fail repeatedly.
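The three tiers map naturally onto exception types. The sketch below is one possible arrangement, not the only one: `TransientToolError` and `run_tool` are illustrative names, and it assumes each tool is a plain Python callable that raises the matching exception.

```python
import time


class UnrecoverableToolError(Exception):
    """Tier 3: retrying cannot help (permissions, credentials, missing dependency)."""


class TransientToolError(Exception):
    """Tier 2: worth retrying inside the tool (HTTP 503, network blip)."""


def run_tool(fn, tool_use_id, args, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run one tool call and return (tool_result_block, abort_loop)."""
    def result(content, is_error=False):
        block = {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
        if is_error:
            block["is_error"] = True
        return block

    for attempt in range(attempts):
        try:
            return result(str(fn(**args))), False
        except TransientToolError as e:
            if attempt == attempts - 1:
                # Retries exhausted: report it and let the model pick another route.
                return result(f"ERROR: still failing after {attempts} attempts: {e}", True), False
            sleep(base_delay * 2 ** attempt)
        except UnrecoverableToolError as e:
            return result(f"FATAL: {e}", True), True
        except Exception as e:
            # Tier 1: most likely bad arguments; return a hint the model can act on.
            return result(f"ERROR: {e}", True), False
```

The `sleep` parameter exists only so tests can swap in a no-op; in production the default `time.sleep` applies the backoff.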

Full working example

"""
Agent loop with iteration cap, token budget, and tool failure recovery.
Requires: pip install anthropic
"""
import os, time, random
from anthropic import Anthropic
from anthropic import APIStatusError

client = Anthropic(
    base_url="https://buzzai.cc",
    api_key=os.environ["BUZZ_API_KEY"],
)

# === Configurable guards ===
MAX_ITERATIONS = 30
BUDGET_INPUT_TOKENS  = 1_000_000
BUDGET_OUTPUT_TOKENS =   200_000

# === Tools ===
TOOLS = [
    {
        "name": "search",
        "description": "Search a knowledge base. Returns top matches.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "fetch_url",
        "description": "Fetch a URL and return its text content.",
        "input_schema": {
            "type": "object",
            "properties": {"url": {"type": "string", "format": "uri"}},
            "required": ["url"],
        },
    },
]


class UnrecoverableToolError(Exception):
    pass


def execute_tool(name, args):
    if name == "search":
        # ... do real search ...
        return f"Top 3 results for {args['query']!r}: ..."
    if name == "fetch_url":
        import urllib.error
        import urllib.request
        for attempt in range(3):
            try:
                with urllib.request.urlopen(args["url"], timeout=10) as r:
                    return r.read().decode("utf-8", errors="replace")[:8000]
            except urllib.error.HTTPError as e:
                if e.code in (403, 404):
                    raise UnrecoverableToolError(f"HTTP {e.code} for {args['url']}")
            except OSError:
                pass  # URLError, timeout, or connection reset: treat as transient
            # Exponential backoff with jitter before the next attempt
            time.sleep(2 ** attempt + random.random())
        raise RuntimeError(f"fetch_url failed after retries: {args['url']}")
    raise UnrecoverableToolError(f"unknown tool: {name}")


def call_with_retry(**kwargs):
    """Outer retry for transient API errors (429/500/503/529)."""
    for attempt in range(5):
        try:
            return client.messages.create(**kwargs)
        except APIStatusError as e:
            if e.status_code in (429, 500, 503, 529):
                wait = (2 ** attempt) + random.random()
                time.sleep(min(wait, 60))
                continue
            raise
    raise RuntimeError("API failed after retries")


def run_agent(user_request: str):
    messages = [{"role": "user", "content": user_request}]
    total_in = total_out = 0
    iteration = 0
    abort = False

    while iteration < MAX_ITERATIONS:
        iteration += 1

        # Inject wrap-up if we are out of budget
        if total_in > BUDGET_INPUT_TOKENS or total_out > BUDGET_OUTPUT_TOKENS:
            messages.append({
                "role": "user",
                "content": "Token budget exhausted. Stop calling tools. "
                           "Reply now with a summary of progress and what remains.",
            })

        resp = call_with_retry(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=TOOLS,
            messages=messages,
        )
        total_in  += resp.usage.input_tokens + (resp.usage.cache_read_input_tokens or 0)
        total_out += resp.usage.output_tokens

        messages.append({"role": "assistant", "content": resp.content})

        if resp.stop_reason == "end_turn":
            return _final_text(resp), {"iters": iteration, "in": total_in, "out": total_out}
        if resp.stop_reason == "refusal":
            return "[refused]", {"iters": iteration, "in": total_in, "out": total_out}
        if resp.stop_reason == "max_tokens":
            messages.append({"role": "user", "content": "Continue from where you stopped."})
            continue
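        if resp.stop_reason == "stop_sequence":
            # Output is truncated at the marker; return what we have
            return _final_text(resp), {"iters": iteration, "in": total_in, "out": total_out}
        if resp.stop_reason != "tool_use":
            # pause_turn or unknown: resend the conversation unchanged to resume
            continue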

        tool_results = []
        for block in resp.content:
            if block.type != "tool_use":
                continue
            try:
                output = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
            except UnrecoverableToolError as e:
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"FATAL: {e}",
                    "is_error": True,
                })
                abort = True
            except Exception as e:
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"ERROR (recoverable): {e}",
                    "is_error": True,
                })

        messages.append({"role": "user", "content": tool_results})
        if abort:
            return "[aborted: unrecoverable tool error]", {
                "iters": iteration, "in": total_in, "out": total_out,
            }

    # Hit MAX_ITERATIONS
    messages.append({
        "role": "user",
        "content": "You hit the iteration limit. Stop calling tools and "
                   "summarise what you accomplished, what's left, and any blockers.",
    })
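    resp = call_with_retry(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        # History contains tool_use blocks, so tools stay defined; tool_choice
        # "none" (assumed to be supported by the endpoint) forces a text reply.
        tools=TOOLS,
        tool_choice={"type": "none"},
        messages=messages,
    )
    # Count the wrap-up call in the reported totals as well
    total_in  += resp.usage.input_tokens + (resp.usage.cache_read_input_tokens or 0)
    total_out += resp.usage.output_tokens
    return _final_text(resp), {"iters": iteration + 1, "in": total_in, "out": total_out}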


def _final_text(resp):
    return "\n".join(b.text for b in resp.content if b.type == "text")


if __name__ == "__main__":
    text, stats = run_agent("Find the three most cited 2025 papers on prompt caching and summarise their findings.")
    print(text)
    print(f"\n[stats] iterations={stats['iters']} input_tokens={stats['in']} output_tokens={stats['out']}")

JavaScript (Node.js) version of the same agent:

// Agent loop with iteration cap, token budget, and tool failure recovery.
// Requires: npm i @anthropic-ai/sdk
import Anthropic, { APIError } from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://buzzai.cc",
  apiKey: process.env.BUZZ_API_KEY,
});

const MAX_ITERATIONS = 30;
const BUDGET_INPUT_TOKENS = 1_000_000;
const BUDGET_OUTPUT_TOKENS = 200_000;

const TOOLS = [
  {
    name: "search",
    description: "Search a knowledge base. Returns top matches.",
    input_schema: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
  },
  {
    name: "fetch_url",
    description: "Fetch a URL and return its text content.",
    input_schema: {
      type: "object",
      properties: { url: { type: "string", format: "uri" } },
      required: ["url"],
    },
  },
];

class UnrecoverableToolError extends Error {}

function sleep(ms) { return new Promise((r) => setTimeout(r, ms)); }

async function executeTool(name, args) {
  if (name === "search") {
    return `Top 3 results for "${args.query}": ...`;
  }
  if (name === "fetch_url") {
    for (let attempt = 0; attempt < 3; attempt++) {
      try {
        const r = await fetch(args.url, { signal: AbortSignal.timeout(10000) });
        if (r.status === 403 || r.status === 404) {
          throw new UnrecoverableToolError(`HTTP ${r.status} for ${args.url}`);
        }
        if (!r.ok) throw new Error(`HTTP ${r.status}`);
        const text = await r.text();
        return text.slice(0, 8000);
      } catch (e) {
        if (e instanceof UnrecoverableToolError) throw e;
        await sleep(1000 * 2 ** attempt + Math.random() * 1000);
      }
    }
    throw new Error(`fetch_url failed after retries: ${args.url}`);
  }
  throw new UnrecoverableToolError(`unknown tool: ${name}`);
}

async function callWithRetry(params) {
  for (let attempt = 0; attempt < 5; attempt++) {
    try {
      return await client.messages.create(params);
    } catch (e) {
      if (e instanceof APIError && [429, 500, 503, 529].includes(e.status)) {
        await sleep(Math.min(60000, 1000 * 2 ** attempt + Math.random() * 1000));
        continue;
      }
      throw e;
    }
  }
  throw new Error("API failed after retries");
}

function finalText(resp) {
  return resp.content.filter((b) => b.type === "text").map((b) => b.text).join("\n");
}

export async function runAgent(userRequest) {
  const messages = [{ role: "user", content: userRequest }];
  let totalIn = 0, totalOut = 0, iteration = 0, abort = false;

  while (iteration < MAX_ITERATIONS) {
    iteration++;

    if (totalIn > BUDGET_INPUT_TOKENS || totalOut > BUDGET_OUTPUT_TOKENS) {
      messages.push({
        role: "user",
        content:
          "Token budget exhausted. Stop calling tools. " +
          "Reply now with a summary of progress and what remains.",
      });
    }

    const resp = await callWithRetry({
      model: "claude-sonnet-4-6",
      max_tokens: 4096,
      tools: TOOLS,
      messages,
    });
    totalIn += (resp.usage.input_tokens || 0) + (resp.usage.cache_read_input_tokens || 0);
    totalOut += resp.usage.output_tokens || 0;

    messages.push({ role: "assistant", content: resp.content });

    if (resp.stop_reason === "end_turn") {
      return { text: finalText(resp), stats: { iters: iteration, in: totalIn, out: totalOut } };
    }
    if (resp.stop_reason === "refusal") {
      return { text: "[refused]", stats: { iters: iteration, in: totalIn, out: totalOut } };
    }
    if (resp.stop_reason === "max_tokens") {
      messages.push({ role: "user", content: "Continue from where you stopped." });
      continue;
    }
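    if (resp.stop_reason === "stop_sequence") {
      // Output is truncated at the marker; return what we have
      return { text: finalText(resp), stats: { iters: iteration, in: totalIn, out: totalOut } };
    }
    // pause_turn or unknown: resend the conversation unchanged to resume
    if (resp.stop_reason !== "tool_use") continue;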

    const toolResults = [];
    for (const block of resp.content) {
      if (block.type !== "tool_use") continue;
      try {
        const output = await executeTool(block.name, block.input);
        toolResults.push({ type: "tool_result", tool_use_id: block.id, content: output });
      } catch (e) {
        const isFatal = e instanceof UnrecoverableToolError;
        toolResults.push({
          type: "tool_result",
          tool_use_id: block.id,
          content: `${isFatal ? "FATAL" : "ERROR (recoverable)"}: ${e.message}`,
          is_error: true,
        });
        if (isFatal) abort = true;
      }
    }
    messages.push({ role: "user", content: toolResults });
    if (abort) {
      return {
        text: "[aborted: unrecoverable tool error]",
        stats: { iters: iteration, in: totalIn, out: totalOut },
      };
    }
  }

  messages.push({
    role: "user",
    content:
      "You hit the iteration limit. Stop calling tools and " +
      "summarise what you accomplished, what's left, and any blockers.",
  });
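  const resp = await callWithRetry({
    model: "claude-sonnet-4-6",
    max_tokens: 2048,
    // History contains tool_use blocks, so tools stay defined; tool_choice
    // "none" (assumed to be supported by the endpoint) forces a text reply.
    tools: TOOLS,
    tool_choice: { type: "none" },
    messages,
  });
  // Count the wrap-up call in the reported totals as well
  totalIn += (resp.usage.input_tokens || 0) + (resp.usage.cache_read_input_tokens || 0);
  totalOut += resp.usage.output_tokens || 0;
  return {
    text: finalText(resp),
    stats: { iters: iteration + 1, in: totalIn, out: totalOut },
  };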
}

const { text, stats } = await runAgent(
  "Find the three most cited 2025 papers on prompt caching and summarise their findings."
);
console.log(text);
console.log(`\n[stats] iterations=${stats.iters} input=${stats.in} output=${stats.out}`);

Managing message history

Long-running agents accumulate hundreds of tool results, which inflates input cost on every turn. Three tactics:

1. Cache the prefix

Put the stable system prompt and tool definitions behind a cache breakpoint on the system block. This doesn't help with the growing message tail, but it removes most of the fixed cost of resending the prefix on every turn.
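As a sketch, the request can be built like this. It assumes the endpoint passes `cache_control` through the way the Anthropic Messages API does; `cached_request` is an illustrative helper, and prefixes shorter than the model's minimum cacheable length will not be cached at all.

```python
def cached_request(system_prompt: str, tools: list, messages: list, model: str) -> dict:
    """Build keyword arguments for client.messages.create with a cached prefix."""
    return {
        "model": model,
        "max_tokens": 4096,
        "tools": tools,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Breakpoint: everything up to and including this block
                # (tool definitions first, then system) is eligible for caching.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": messages,
    }
```

Usage is `client.messages.create(**cached_request(SYSTEM, TOOLS, messages, "claude-sonnet-4-6"))`, where `SYSTEM` is your own prompt string. Keep the system prompt and tool list byte-identical between turns, since any change invalidates the cached prefix.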

2. Trim old tool results

After N turns, replace the content of old tool results with a short summary, keeping the structure intact:

def trim_old_results(messages, keep_last=8):
    # Walk every message except the last `keep_last` (of any role)
    older = messages[:-keep_last] if keep_last > 0 else messages
    for msg in older:
        if msg["role"] != "user" or not isinstance(msg["content"], list):
            continue
        for block in msg["content"]:
            if isinstance(block, dict) and block.get("type") == "tool_result":
                content = block.get("content")
                # Only plain-string results are trimmed; block lists are left alone
                if isinstance(content, str) and len(content) > 200:
                    block["content"] = content[:180] + "... [trimmed]"

3. Summarise and restart

Periodically, ask Claude to summarise the conversation, then start a new conversation seeded with that summary. Heaviest tactic; useful for very long sessions (50+ turns).
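A rough sketch of the restart step follows. `summarise` is a hypothetical callable that sends the given messages to the model without tools and returns the reply text; `restart_with_summary` and `SUMMARY_PROMPT` are illustrative names. Call it between turns, after tool results have been appended, so no `tool_use` block is left unanswered.

```python
from typing import Callable

SUMMARY_PROMPT = (
    "Summarise the conversation so far: the original goal, what has been done, "
    "key findings, and what remains. Be concise."
)


def restart_with_summary(messages: list, summarise: Callable[[list], str],
                         original_request: str) -> list:
    """Return a fresh message list seeded with a summary of `messages`.

    `summarise` is a hypothetical callable: it sends the messages it receives
    to the model and returns the reply text.
    """
    # Build a new list so the caller's history is not modified.
    summary = summarise(messages + [{"role": "user", "content": SUMMARY_PROMPT}])
    return [{
        "role": "user",
        "content": (
            f"{original_request}\n\n"
            f"Summary of progress so far:\n{summary}\n\n"
            "Continue from here."
        ),
    }]
```

Repeating the original request in the seed message keeps the goal verbatim, so the new conversation does not depend on the summary to restate it accurately.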

Opus thinking inside the loop

For Opus 4.7 you can enable extended thinking. Two things to know:

Thinking blocks must be preserved. When you append the assistant turn back into messages, include the thinking blocks too. Stripping them breaks signature verification on the next turn.
temperature, top_p, top_k are ignored when thinking is on. Configure behaviour with the system prompt instead.

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=8192,
    thinking={"type": "enabled", "budget_tokens": 4096},
    tools=TOOLS,
    messages=messages,
)
# Append resp.content unchanged — thinking blocks ride along.
messages.append({"role": "assistant", "content": resp.content})