BUZZ AI Gateway

Agent Loop

A multi-turn agent that calls tools until it's done. The loop body is small. The hard parts are the four guard rails: stop conditions, max iterations, token budget, and tool-failure recovery.

POST https://buzzai.cc/v1/messages

The shape of the loop

Stripped down, the agent loop is seven lines:

while True:
    response = call_claude(messages)
    if response.stop_reason != "tool_use":
        return response
    tool_results = run_all_tool_uses(response)
    messages.append(assistant_turn(response))
    messages.append(user_turn(tool_results))

That much will work for happy-path demos. Production needs four extra guards.

Stop conditions

Claude tells you why it stopped via stop_reason. Treat each one explicitly:

| stop_reason | Meaning | Loop action |
|---|---|---|
| tool_use | Wants to call one or more tools | Execute tools, append results, continue |
| end_turn | Done, returning a final answer | Exit loop, return text content |
| max_tokens | Hit max_tokens mid-response | Continue with the partial assistant message in history; append a user nudge or raise to the user |
| stop_sequence | Custom stop string matched | Exit; treat output as truncated at the marker |
| pause_turn | Server-side pause (rare; long thinking, server tools) | Send the same conversation back to resume |
| refusal | Model declined to continue | Surface to user; do not retry blindly |
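
One way to make sure no stop reason falls through silently is a small dispatch table. This is a minimal sketch: the action labels (`run_tools`, `finish`, `nudge`, `resume`, `surface`) are names invented for this example, not API values.

```python
# Illustrative mapping from stop_reason to a loop action.
# The action labels are hypothetical names chosen for this sketch.
STOP_ACTIONS = {
    "tool_use": "run_tools",     # execute tools, append results, continue
    "end_turn": "finish",        # return the final text
    "max_tokens": "nudge",       # keep the partial turn, ask to continue
    "stop_sequence": "finish",   # output is truncated at the marker
    "pause_turn": "resume",      # resend the same conversation
    "refusal": "surface",        # show to the user, no blind retry
}


def next_action(stop_reason: str) -> str:
    """Return the loop action for a stop_reason, failing loudly on unknown values."""
    try:
        return STOP_ACTIONS[stop_reason]
    except KeyError:
        raise ValueError(f"unhandled stop_reason: {stop_reason!r}") from None
```

Raising on an unknown value is a design choice: a new stop reason then shows up as an error in your logs instead of an agent that quietly spins.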

Max iterations safety valve

Some tasks legitimately take 20+ turns. Others get stuck calling the same tool with slightly different arguments. Pick a hard ceiling; the full example below uses 30 iterations.

When you hit the ceiling, don't just abort. Inject a final user message that asks Claude to summarise progress and stop:

"You've hit the iteration limit. Stop calling tools. In your final message,
summarise what you accomplished, what's left, and any blockers."
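
The ceiling plus wrap-up can be sketched independently of the API call. In the sketch below, `step` is a hypothetical callable standing in for "one model call plus tool execution"; it is assumed to mutate `messages` and return True when the agent has finished.

```python
from typing import Callable

WRAP_UP = (
    "You've hit the iteration limit. Stop calling tools. In your final message, "
    "summarise what you accomplished, what's left, and any blockers."
)


def run_capped(step: Callable[[list], bool], messages: list, max_iterations: int = 30) -> bool:
    """Run `step` until it reports completion or the ceiling is reached.

    Returns True if the agent finished on its own, False if it was capped.
    When capped, the wrap-up instruction is appended to `messages`.
    """
    for _ in range(max_iterations):
        if step(messages):
            return True
    # Ceiling reached: ask for a summary instead of aborting silently.
    messages.append({"role": "user", "content": WRAP_UP})
    return False
```

When `run_capped` returns False, the caller still needs one more model call to collect the summary.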

Token budget

Read response.usage after every turn and accumulate. When you cross a threshold, force a wrap-up:

budget_input  = 1_000_000   # tokens
budget_output =   200_000

total_in = total_out = 0
for turn in range(MAX_TURNS):
    resp = client.messages.create(...)
    total_in  += resp.usage.input_tokens + (resp.usage.cache_read_input_tokens or 0)
    total_out += resp.usage.output_tokens
    if total_in > budget_input or total_out > budget_output:
        # Inject "wrap up" instruction and break after one more call
        ...

Cache reads are billed at about 10% of the normal input price, but each one still counts as a full token. If you bill end users by token, track cache reads separately from uncached input; if you bill by cost, track only the dollar-equivalent.
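
Keeping the categories apart makes both billing models cheap to compute. The tracker below is a sketch, not part of any SDK: `UsageTracker` is a hypothetical helper, prices are supplied by the caller per million tokens, and the 10% cache-read ratio is the figure quoted above.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulates token counts per category across turns."""
    uncached_in: int = 0
    cache_read: int = 0
    out: int = 0

    def add(self, input_tokens: int, cache_read_input_tokens, output_tokens: int) -> None:
        self.uncached_in += input_tokens
        self.cache_read += cache_read_input_tokens or 0  # field may be None
        self.out += output_tokens

    def total_tokens(self) -> int:
        """Raw token count, for per-token billing."""
        return self.uncached_in + self.cache_read + self.out

    def cost(self, in_price: float, out_price: float, cache_read_ratio: float = 0.10) -> float:
        """Dollar-equivalent cost; prices are per million tokens (caller-supplied)."""
        billable_in = self.uncached_in + self.cache_read * cache_read_ratio
        return (billable_in * in_price + self.out * out_price) / 1_000_000
```

Inside the loop you would call `tracker.add(resp.usage.input_tokens, resp.usage.cache_read_input_tokens, resp.usage.output_tokens)` after each response, then compare either `total_tokens()` or `cost(...)` against your threshold.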

Tool-failure recovery

Tools throw. Network calls time out. The model can request invalid arguments. Don't crash the loop — return the failure as a structured tool result and let Claude react.

{
  "type": "tool_result",
  "tool_use_id": "toolu_...",
  "content": "ERROR: file not found at path 'src/foo.py'. Try listing the directory first.",
  "is_error": true
}

Three failure tiers, with different handling:

| Tier | Example | Recovery |
|---|---|---|
| Recoverable, model's fault | Bad path, wrong arg type | Return is_error: true with a hint. Claude usually retries with a fix. |
| Recoverable, transient | HTTP 503 from a downstream API, network blip | Retry inside the tool with backoff before returning. Limit to 2-3 attempts. |
| Unrecoverable | Permission denied, dependency missing, invalid credentials | Return error and abort the loop. Don't burn tokens watching Claude fail repeatedly. |
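
For the transient tier, the retry can live in a small reusable wrapper. This is a sketch under assumptions: `TransientError` is a hypothetical marker exception your tools would raise for timeouts or 503s, and `sleep` is injectable so the wrapper can be tested without waiting.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, HTTP 503)."""


def with_backoff(fn, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(), retrying on TransientError with exponential backoff and jitter.

    Re-raises the last TransientError once `attempts` calls have failed; any
    other exception propagates immediately.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise
            # 1s, 2s, 4s, ... plus up to 1s of jitter
            sleep(base_delay * 2 ** attempt + random.random())
```

If the final attempt still fails, the exception reaches the loop, which can return it to Claude as an is_error result.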

Full working example

"""
Agent loop with iteration cap, token budget, and tool failure recovery.
Requires: pip install anthropic
"""
import os, time, random
from anthropic import Anthropic
from anthropic import APIStatusError

client = Anthropic(
    base_url="https://buzzai.cc",
    api_key=os.environ["BUZZ_API_KEY"],
)

# === Configurable guards ===
MAX_ITERATIONS = 30
BUDGET_INPUT_TOKENS  = 1_000_000
BUDGET_OUTPUT_TOKENS =   200_000

# === Tools ===
TOOLS = [
    {
        "name": "search",
        "description": "Search a knowledge base. Returns top matches.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "fetch_url",
        "description": "Fetch a URL and return its text content.",
        "input_schema": {
            "type": "object",
            "properties": {"url": {"type": "string", "format": "uri"}},
            "required": ["url"],
        },
    },
]


class UnrecoverableToolError(Exception):
    pass


def execute_tool(name, args):
    if name == "search":
        # ... do real search ...
        return f"Top 3 results for {args['query']!r}: ..."
    if name == "fetch_url":
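        import urllib.error
        import urllib.request
        for attempt in range(3):
            try:
                with urllib.request.urlopen(args["url"], timeout=10) as r:
                    return r.read().decode("utf-8", errors="replace")[:8000]
            except urllib.error.HTTPError as e:
                # 403/404 will not improve on retry; other HTTP errors are treated as transient
                if e.code in (403, 404):
                    raise UnrecoverableToolError(f"HTTP {e.code} for {args['url']}")
            except (urllib.error.URLError, TimeoutError):
                pass  # network blip or timeout: treat as transient
            # Back off before the next attempt (no sleep after the last one)
            if attempt < 2:
                time.sleep(2 ** attempt + random.random())
        raise RuntimeError(f"fetch_url failed after retries: {args['url']}")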
    raise UnrecoverableToolError(f"unknown tool: {name}")


def call_with_retry(**kwargs):
    """Outer retry for transient API errors (429/500/503/529)."""
    for attempt in range(5):
        try:
            return client.messages.create(**kwargs)
        except APIStatusError as e:
            if e.status_code in (429, 500, 503, 529):
                wait = (2 ** attempt) + random.random()
                time.sleep(min(wait, 60))
                continue
            raise
    raise RuntimeError("API failed after retries")


def run_agent(user_request: str):
    messages = [{"role": "user", "content": user_request}]
    total_in = total_out = 0
    iteration = 0
    abort = False

    while iteration < MAX_ITERATIONS:
        iteration += 1

        # Once over budget, ask for a wrap-up and forbid further tool calls
        over_budget = total_in > BUDGET_INPUT_TOKENS or total_out > BUDGET_OUTPUT_TOKENS
        if over_budget:
            messages.append({
                "role": "user",
                "content": "Token budget exhausted. Stop calling tools. "
                           "Reply now with a summary of progress and what remains.",
            })

        resp = call_with_retry(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=TOOLS,
            tool_choice={"type": "none" if over_budget else "auto"},
            messages=messages,
        )
        total_in  += resp.usage.input_tokens + (resp.usage.cache_read_input_tokens or 0)
        total_out += resp.usage.output_tokens

        messages.append({"role": "assistant", "content": resp.content})

        if resp.stop_reason == "end_turn":
            return _final_text(resp), {"iters": iteration, "in": total_in, "out": total_out}
        if resp.stop_reason == "refusal":
            return "[refused]", {"iters": iteration, "in": total_in, "out": total_out}
        if resp.stop_reason == "max_tokens":
            messages.append({"role": "user", "content": "Continue from where you stopped."})
            continue
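        if resp.stop_reason == "pause_turn":
            # Server-side pause: resend the conversation unchanged to resume
            continue
        if resp.stop_reason != "tool_use":
            # stop_sequence or an unknown reason: exit with whatever text we have
            return _final_text(resp), {"iters": iteration, "in": total_in, "out": total_out}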

        tool_results = []
        for block in resp.content:
            if block.type != "tool_use":
                continue
            try:
                output = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
            except UnrecoverableToolError as e:
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"FATAL: {e}",
                    "is_error": True,
                })
                abort = True
            except Exception as e:
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"ERROR (recoverable): {e}",
                    "is_error": True,
                })

        messages.append({"role": "user", "content": tool_results})
        if abort:
            return "[aborted: unrecoverable tool error]", {
                "iters": iteration, "in": total_in, "out": total_out,
            }

    # Hit MAX_ITERATIONS
    messages.append({
        "role": "user",
        "content": "You hit the iteration limit. Stop calling tools and "
                   "summarise what you accomplished, what's left, and any blockers.",
    })
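    resp = call_with_retry(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        tools=TOOLS,  # history contains tool blocks, so tools must stay defined
        tool_choice={"type": "none"},  # forbid further tool calls to force a text response
        messages=messages,
    )
    # Count the wrap-up call in the reported totals
    total_in  += resp.usage.input_tokens + (resp.usage.cache_read_input_tokens or 0)
    total_out += resp.usage.output_tokens
    return _final_text(resp), {"iters": iteration + 1, "in": total_in, "out": total_out}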


def _final_text(resp):
    return "\n".join(b.text for b in resp.content if b.type == "text")


if __name__ == "__main__":
    text, stats = run_agent("Find the three most cited 2025 papers on prompt caching and summarise their findings.")
    print(text)
    print(f"\n[stats] iterations={stats['iters']} input_tokens={stats['in']} output_tokens={stats['out']}")

The same loop in JavaScript (Node.js):

// Agent loop with iteration cap, token budget, and tool failure recovery.
// Requires: npm i @anthropic-ai/sdk
import Anthropic, { APIError } from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://buzzai.cc",
  apiKey: process.env.BUZZ_API_KEY,
});

const MAX_ITERATIONS = 30;
const BUDGET_INPUT_TOKENS = 1_000_000;
const BUDGET_OUTPUT_TOKENS = 200_000;

const TOOLS = [
  {
    name: "search",
    description: "Search a knowledge base. Returns top matches.",
    input_schema: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
  },
  {
    name: "fetch_url",
    description: "Fetch a URL and return its text content.",
    input_schema: {
      type: "object",
      properties: { url: { type: "string", format: "uri" } },
      required: ["url"],
    },
  },
];

class UnrecoverableToolError extends Error {}

function sleep(ms) { return new Promise((r) => setTimeout(r, ms)); }

async function executeTool(name, args) {
  if (name === "search") {
    return `Top 3 results for "${args.query}": ...`;
  }
  if (name === "fetch_url") {
    for (let attempt = 0; attempt < 3; attempt++) {
      try {
        const r = await fetch(args.url, { signal: AbortSignal.timeout(10000) });
        if (r.status === 403 || r.status === 404) {
          throw new UnrecoverableToolError(`HTTP ${r.status} for ${args.url}`);
        }
        if (!r.ok) throw new Error(`HTTP ${r.status}`);
        const text = await r.text();
        return text.slice(0, 8000);
      } catch (e) {
        if (e instanceof UnrecoverableToolError) throw e;
        await sleep(1000 * 2 ** attempt + Math.random() * 1000);
      }
    }
    throw new Error(`fetch_url failed after retries: ${args.url}`);
  }
  throw new UnrecoverableToolError(`unknown tool: ${name}`);
}

async function callWithRetry(params) {
  for (let attempt = 0; attempt < 5; attempt++) {
    try {
      return await client.messages.create(params);
    } catch (e) {
      if (e instanceof APIError && [429, 500, 503, 529].includes(e.status)) {
        await sleep(Math.min(60000, 1000 * 2 ** attempt + Math.random() * 1000));
        continue;
      }
      throw e;
    }
  }
  throw new Error("API failed after retries");
}

function finalText(resp) {
  return resp.content.filter((b) => b.type === "text").map((b) => b.text).join("\n");
}

export async function runAgent(userRequest) {
  const messages = [{ role: "user", content: userRequest }];
  let totalIn = 0, totalOut = 0, iteration = 0, abort = false;

  while (iteration < MAX_ITERATIONS) {
    iteration++;

    // Once over budget, ask for a wrap-up and forbid further tool calls
    const overBudget = totalIn > BUDGET_INPUT_TOKENS || totalOut > BUDGET_OUTPUT_TOKENS;
    if (overBudget) {
      messages.push({
        role: "user",
        content:
          "Token budget exhausted. Stop calling tools. " +
          "Reply now with a summary of progress and what remains.",
      });
    }

    const resp = await callWithRetry({
      model: "claude-sonnet-4-6",
      max_tokens: 4096,
      tools: TOOLS,
      tool_choice: { type: overBudget ? "none" : "auto" },
      messages,
    });
    totalIn += (resp.usage.input_tokens || 0) + (resp.usage.cache_read_input_tokens || 0);
    totalOut += resp.usage.output_tokens || 0;

    messages.push({ role: "assistant", content: resp.content });

    if (resp.stop_reason === "end_turn") {
      return { text: finalText(resp), stats: { iters: iteration, in: totalIn, out: totalOut } };
    }
    if (resp.stop_reason === "refusal") {
      return { text: "[refused]", stats: { iters: iteration, in: totalIn, out: totalOut } };
    }
    if (resp.stop_reason === "max_tokens") {
      messages.push({ role: "user", content: "Continue from where you stopped." });
      continue;
    }
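    // Server-side pause: resend the conversation unchanged to resume
    if (resp.stop_reason === "pause_turn") continue;
    if (resp.stop_reason !== "tool_use") {
      // stop_sequence or an unknown reason: exit with whatever text we have
      return { text: finalText(resp), stats: { iters: iteration, in: totalIn, out: totalOut } };
    }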

    const toolResults = [];
    for (const block of resp.content) {
      if (block.type !== "tool_use") continue;
      try {
        const output = await executeTool(block.name, block.input);
        toolResults.push({ type: "tool_result", tool_use_id: block.id, content: output });
      } catch (e) {
        const isFatal = e instanceof UnrecoverableToolError;
        toolResults.push({
          type: "tool_result",
          tool_use_id: block.id,
          content: `${isFatal ? "FATAL" : "ERROR (recoverable)"}: ${e.message}`,
          is_error: true,
        });
        if (isFatal) abort = true;
      }
    }
    messages.push({ role: "user", content: toolResults });
    if (abort) {
      return {
        text: "[aborted: unrecoverable tool error]",
        stats: { iters: iteration, in: totalIn, out: totalOut },
      };
    }
  }

  messages.push({
    role: "user",
    content:
      "You hit the iteration limit. Stop calling tools and " +
      "summarise what you accomplished, what's left, and any blockers.",
  });
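  const resp = await callWithRetry({
    model: "claude-sonnet-4-6",
    max_tokens: 2048,
    tools: TOOLS, // history contains tool blocks, so tools must stay defined
    tool_choice: { type: "none" }, // forbid further tool calls to force a text response
    messages,
  });
  // Count the wrap-up call in the reported totals
  totalIn += (resp.usage.input_tokens || 0) + (resp.usage.cache_read_input_tokens || 0);
  totalOut += resp.usage.output_tokens || 0;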
  return {
    text: finalText(resp),
    stats: { iters: iteration + 1, in: totalIn, out: totalOut },
  };
}

const { text, stats } = await runAgent(
  "Find the three most cited 2025 papers on prompt caching and summarise their findings."
);
console.log(text);
console.log(`\n[stats] iterations=${stats.iters} input=${stats.in} output=${stats.out}`);

Managing message history

Long-running agents accumulate hundreds of tool results, which inflates input cost on every turn. Three tactics:

1. Cache the prefix

Put the stable system prompt and tool definitions in a cached prefix. This doesn't help with the growing message tail, but it removes most of the fixed cost on every turn.
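
A minimal sketch of what that request shape can look like, assuming the gateway forwards `cache_control` unchanged. `cached_request_kwargs` is a hypothetical helper name; the breakpoint sits on the system block because tool definitions come before the system prompt in the cached prefix.

```python
def cached_request_kwargs(system_prompt: str, tools: list, messages: list) -> dict:
    """Build request kwargs with a cache breakpoint at the end of the system prompt.

    Tool definitions precede the system prompt in the cached prefix, so a
    single breakpoint on the system block is assumed to cover both.
    """
    return {
        "tools": tools,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": messages,
    }
```

Spread the result into the call, for example `client.messages.create(model=..., max_tokens=..., **cached_request_kwargs(SYSTEM, TOOLS, messages))`, where `SYSTEM` is your own prompt string. Keep the prompt and tool list byte-identical across turns, or the prefix will not match.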

2. Trim old tool results

After N turns, replace the content of old tool results with a truncated stub, keeping the structure intact:

def trim_old_results(messages, keep_last=8):
    # Walk every message except the last `keep_last`, trimming in place
    for msg in messages[:-keep_last]:
        if msg["role"] != "user" or not isinstance(msg["content"], list):
            continue
        for block in msg["content"]:
            if isinstance(block, dict) and block.get("type") == "tool_result":
                content = block.get("content")
                # Only plain-string results are trimmed; lists of blocks are left alone
                if isinstance(content, str) and len(content) > 200:
                    block["content"] = content[:180] + "... [trimmed]"

3. Summarise and restart

Periodically, ask Claude to summarise the conversation, then start a new conversation seeded with that summary. Heaviest tactic; useful for very long sessions (50+ turns).
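
The restart step itself is small. In this sketch, `summarise` is a hypothetical callable that sends the old history to the model with a "summarise progress so far" instruction and returns the text; the seed wording is illustrative only.

```python
from typing import Callable


def restart_with_summary(
    messages: list,
    summarise: Callable[[list], str],
    original_request: str,
) -> list:
    """Return a fresh message list seeded with a summary of the old one.

    The old `messages` list is not modified, so it can still be logged.
    """
    summary = summarise(messages)
    seed = (
        f"Original request: {original_request}\n\n"
        f"Summary of progress so far:\n{summary}\n\n"
        "Continue from here."
    )
    return [{"role": "user", "content": seed}]
```

Restating the original request in the seed matters: the summary alone tends to describe what was done, not what was asked.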

Opus thinking inside the loop

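For Opus 4.7 you can enable extended thinking. Two things to know, both visible in the snippet: budget_tokens must be smaller than max_tokens, and each assistant turn must go back into the history with its thinking blocks intact.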

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=8192,
    thinking={"type": "enabled", "budget_tokens": 4096},
    tools=TOOLS,
    messages=messages,
)
# Append resp.content unchanged — thinking blocks ride along.
messages.append({"role": "assistant", "content": resp.content})
