Docs · Guides · Error Handling and Retries

Error Handling and Retries

Status code first, message second. This guide covers every HTTP code BUZZ returns on the Messages API, how the BUZZ error envelope differs from Anthropic's, and exponential-backoff templates for Python, Node, and Go you can drop into production.

HTTP code reference

HTTP	error.type	Retryable?	Typical cause
400	invalid_request_error	No	Malformed JSON, missing required field, schema-rejected input.
401	buzz_error / authentication_error	No	API key missing, malformed, or revoked.
403	permission_error	No	Key lacks permission for the model or group; IP not in allow list.
413	request_too_large	No	Request body greater than 32 MB on the Messages API.
429	rate_limit_error	Yes (with backoff)	Rate limit hit on requests, input tokens, or output tokens. Respect `retry-after`.
500	api_error / buzz_error	Yes	Transient internal error. Retry with backoff.
503	buzz_error · model_not_found	Conditional	BUZZ-specific. No upstream channel currently serves this model under your group.
529	overloaded_error	Yes (long backoff)	Anthropic upstream is overloaded provider-wide. Try a different route or wait.

Two error envelope shapes

BUZZ returns two distinguishable error shapes depending on where the failure happened. Your client should handle both.

Anthropic-passthrough envelope

Used when the upstream model itself rejected the request. Identical to direct Anthropic API:

{
  "type": "error",
  "error": {
    "type": "rate_limit_error",
    "message": "Number of request tokens has exceeded your per-minute rate limit"
  },
  "request_id": "req_011CR..."
}

BUZZ gateway-side envelope

Used for failures that BUZZ itself produced before hitting the upstream — auth rejection, schema validation, channel routing failure. Note: no top-level type:"error" wrapper, no request_id field. The BUZZ request id is appended to error.message:

{
  "error": {
    "type": "buzz_error",
    "message": "Invalid token (request id: 202605260713594...)"
  }
}

Production code should branch on error.type and tolerate either envelope:

def parse_error(resp_json):
    err = resp_json.get("error", {})
    return {
        "type": err.get("type", "unknown"),
        "message": err.get("message", ""),
        # Anthropic shape exposes request_id at top level;
        # BUZZ shape embeds it in message.
        "request_id": resp_json.get("request_id"),
    }

Exponential backoff template

The right pattern for transient errors (429, 500, 529, and any network-level failure) is exponential backoff with full jitter. Cap the maximum delay so a single bad minute on the upstream does not stall your tail latency forever.

import os
import random
import time

import httpx

BASE_DELAY = 1.0      # seconds
MAX_DELAY = 32.0      # cap
MAX_ATTEMPTS = 5

RETRYABLE_STATUS = {429, 500, 502, 503, 504, 529}


def call_messages(payload):
    headers = {
        "Authorization": f"Bearer {os.environ['BUZZ_API_KEY']}",
        "anthropic-version": "2023-06-01",
        "Content-Type": "application/json",
    }
    last_err = None
    for attempt in range(MAX_ATTEMPTS):
        try:
            r = httpx.post(
                "https://buzzai.cc/v1/messages",
                json=payload,
                headers=headers,
                timeout=120,
            )
        except httpx.RequestError as e:
            last_err = e
        else:
            if r.status_code < 400:
                return r.json()
            if r.status_code not in RETRYABLE_STATUS:
                # 4xx that isn't 429: do not retry
                r.raise_for_status()
            last_err = httpx.HTTPStatusError(
                f"HTTP {r.status_code}: {r.text[:200]}",
                request=r.request, response=r,
            )
            # Honor server-provided retry-after on 429.
            retry_after = r.headers.get("retry-after")
            if retry_after:
                time.sleep(float(retry_after))
                continue

        # Full jitter: random in [0, base * 2^attempt], capped.
        delay = min(MAX_DELAY, BASE_DELAY * (2 ** attempt))
        time.sleep(random.uniform(0, delay))
    raise last_err

const RETRYABLE = new Set([429, 500, 502, 503, 504, 529]);
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 32_000;
const MAX_ATTEMPTS = 5;

const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

export async function callMessages(payload) {
  let lastErr;
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    try {
      const resp = await fetch("https://buzzai.cc/v1/messages", {
        method: "POST",
        headers: {
          "Authorization": `Bearer ${process.env.BUZZ_API_KEY}`,
          "anthropic-version": "2023-06-01",
          "Content-Type": "application/json",
        },
        body: JSON.stringify(payload),
      });

      if (resp.ok) return await resp.json();
      if (!RETRYABLE.has(resp.status)) {
        const body = await resp.text();
        throw new Error(`HTTP ${resp.status}: ${body.slice(0, 200)}`);
      }

      lastErr = new Error(`HTTP ${resp.status}`);
      const retryAfter = resp.headers.get("retry-after");
      if (retryAfter) {
        await sleep(Number(retryAfter) * 1000);
        continue;
      }
    } catch (e) {
      lastErr = e;
    }
    const cap = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
    await sleep(Math.random() * cap);
  }
  throw lastErr;
}

package buzz

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "math/rand"
    "net/http"
    "os"
    "strconv"
    "time"
)

var retryable = map[int]bool{
    429: true, 500: true, 502: true, 503: true, 504: true, 529: true,
}

const (
    baseDelay   = time.Second
    maxDelay    = 32 * time.Second
    maxAttempts = 5
)

func CallMessages(payload any) (map[string]any, error) {
    body, err := json.Marshal(payload)
    if err != nil {
        return nil, err
    }

    var lastErr error
    for attempt := 0; attempt < maxAttempts; attempt++ {
        req, _ := http.NewRequest(
            "POST",
            "https://buzzai.cc/v1/messages",
            bytes.NewReader(body),
        )
        req.Header.Set("Authorization", "Bearer "+os.Getenv("BUZZ_API_KEY"))
        req.Header.Set("anthropic-version", "2023-06-01")
        req.Header.Set("Content-Type", "application/json")

        resp, err := http.DefaultClient.Do(req)
        if err == nil {
            buf, _ := io.ReadAll(resp.Body)
            resp.Body.Close()
            if resp.StatusCode < 400 {
                var out map[string]any
                _ = json.Unmarshal(buf, &out)
                return out, nil
            }
            if !retryable[resp.StatusCode] {
                return nil, fmt.Errorf("HTTP %d: %s", resp.StatusCode, string(buf))
            }
            lastErr = fmt.Errorf("HTTP %d", resp.StatusCode)
            if ra := resp.Header.Get("Retry-After"); ra != "" {
                if secs, err2 := strconv.Atoi(ra); err2 == nil {
                    time.Sleep(time.Duration(secs) * time.Second)
                    continue
                }
            }
        } else {
            lastErr = err
        }

        cap := baseDelay << attempt
        if cap > maxDelay {
            cap = maxDelay
        }
        time.Sleep(time.Duration(rand.Int63n(int64(cap))))
    }
    return nil, lastErr
}

429: the three rate-limit dimensions

Anthropic enforces three independent budgets and any one of them can fire a 429. Knowing which one was hit determines whether you need to slow down, shrink prompts, or shrink outputs.

Dimension	What counts	Header (if exposed)	Mitigation
requests_per_minute	Number of POSTs in a 60-second window	`retry-after`	Add concurrency limits client-side; queue.
input_tokens_per_minute	Sum of `input_tokens` across the window	`retry-after`	Trim system prompt; use prompt caching to avoid re-sending the same tokens.
output_tokens_per_minute	Sum of `output_tokens` across the window	`retry-after`	Lower `max_tokens`; ask the model to be more concise.

The error message usually names the dimension explicitly:

{"type":"error","error":{"type":"rate_limit_error",
 "message":"Number of input tokens has exceeded your per-minute rate limit"}}

Your retry loop should always honor a retry-after response header before falling back to its own backoff schedule. The header value is in seconds.

Prompt caching is a 429 mitigation. A cached read costs cache_read_input_tokens, which is billed separately from input_tokens. Verified BUZZ behavior shows cold-call cache_creation=1200, input=2 on the second call where the prompt would otherwise have been 1202 input tokens. See the Prompt Caching concept.

503 buzz_error · model_not_found troubleshooting

This status is BUZZ-specific. It means: the model you asked for is not currently routable under your account's group. Anthropic itself does not return 503 — direct Anthropic returns 404 for unknown models. Treat 503 model_not_found as a routing problem, not an outage.

Diagnosis steps

Check the live model list. Hit GET /v1/models with the same API key. The response is the authoritative list of models your group can route to right now:
```
curl -H "Authorization: Bearer $BUZZ_API_KEY" https://buzzai.cc/v1/models
```
Use the dated alias. The undated alias claude-haiku-4-5 may be missing under some groups while the dated form claude-haiku-4-5-20251001 works. Both are valid identifiers; the dated form is the safest canonical value for production.
Check group permissions. If a model is in the global catalog but not in your GET /v1/models response, your group does not have access. Contact support to enable the channel for your group.

Common error message

HTTP/1.1 503 Service Unavailable
content-type: application/json

{"error":{"type":"buzz_error",
 "message":"No available channel for model claude-haiku-4-5-20251001 under group aws (request id: 202605260713...)"}}

The request id at the end of the message is your fastest path to a support diagnosis — paste it verbatim.

529: provider-wide overload and multi-route fallback

A 529 is qualitatively different from a 429. A 429 says your account is over budget. A 529 says everyone's upstream is saturated and your individual budget is fine. Raising your account tier does nothing for a 529 — the constraint is on the other side.

	429 rate_limit_error	529 overloaded_error
Scope	Per-account	Provider-wide
Fix on your side	Slow down or raise tier	Try a different route
Backoff	Short, jittered, honor `retry-after`	Long, jittered (5 to 30 s)
Stays for	Until the rolling window resets	Until provider capacity recovers

Fallback strategy

The clean answer to a 529 is to have more than one place to send the request. Two patterns work in practice:

Pattern A: model-tier fallback within BUZZ

If your task tolerates a smaller model, fall back from Sonnet to Haiku, or from Opus to Sonnet. Same gateway, different model id, same SDK call. This works because BUZZ does not lock you to one model per request.

def call_with_fallback(messages, max_tokens=400):
    fallbacks = [
        "claude-opus-4-7",
        "claude-sonnet-4-6",
        "claude-haiku-4-5-20251001",
    ]
    last = None
    for model in fallbacks:
        try:
            return call_messages({
                "model": model,
                "max_tokens": max_tokens,
                "messages": messages,
            })
        except Exception as e:
            # only fall back on transient errors (429, 500, 503, 529)
            if not is_transient(e):
                raise
            last = e
    raise last

Pattern B: client-side multi-gateway fallback

For applications with their own resilience requirements, keep BUZZ as primary and a second base URL as a backup. Both endpoints accept the same payload byte-for-byte; only the base URL and key change.

BASE_URLS = [
    ("https://buzzai.cc", os.environ["BUZZ_API_KEY"]),
    ("https://api.anthropic.com", os.environ["ANTHROPIC_API_KEY"]),
]

for base_url, key in BASE_URLS:
    try:
        return call(base_url, key, payload)
    except RetryableError:
        continue
raise LastError

This is exactly what BUZZ's transparent forwarding makes possible: the request bytes you send to BUZZ are the bytes Anthropic would have received, and the response bytes are the bytes Anthropic would have returned. Swapping endpoints is mechanical.

Non-retryable errors: what to do instead

4xx errors that are not 429 indicate your code is wrong, not the upstream. Retrying them wastes quota and amplifies the bug.

HTTP	Action
400 invalid_request_error	Surface the message to the developer. Common offenders: `messages` alternation broken, `max_tokens` too high for the model, malformed `tool_use` round-trip.
401	Re-issue the key from your dashboard. If you rotate keys, update your secret store before restarting.
403	The key is valid but lacks permission. Check the group / IP allow list on your account.
413	Trim the request. The 32 MB ceiling is on the encoded JSON body, not the prompt token count.

Errors during streaming

Once you've received HTTP 200 on a streamed request, any subsequent failure arrives as an SSE event: error frame, not as a non-2xx response. Frame shape:

event: error
data: {"type":"error","error":{"type":"overloaded_error","message":"..."}}

Treat it identically to the corresponding non-streaming status: retry on overloaded_error with long backoff, retry on rate_limit_error respecting retry-after, surface to caller on the rest. For mechanics see the streaming guide.

Production checklist

Branch on error.type, not just HTTP status. The same status (e.g. 400) covers different categories.
Tolerate both error envelope shapes (Anthropic and BUZZ).
Honor retry-after first, then fall back to exponential backoff with full jitter.
Cap retries at five attempts. Anything that needs more is a structural problem, not a transient one.
Never retry 400, 401, 403, 413 — you'll just rate-limit your own quota.
Log request_id (or the BUZZ request id from the message tail) on every failure. Support diagnosis depends on it.
For 529, plan for a fallback route, not just a longer sleep.

Error Handling and Retries

HTTP code reference

Two error envelope shapes

Anthropic-passthrough envelope

BUZZ gateway-side envelope

Exponential backoff template

429: the three rate-limit dimensions

503 buzz_error · model_not_found troubleshooting

Diagnosis steps

Common error message

529: provider-wide overload and multi-route fallback

Fallback strategy

Pattern A: model-tier fallback within BUZZ

Pattern B: client-side multi-gateway fallback

Non-retryable errors: what to do instead

Errors during streaming

Production checklist

See also