Error Handling and Retries

Status code first, message second. This guide covers every HTTP code BUZZ returns on the Messages API, how the BUZZ error envelope differs from Anthropic's, and exponential-backoff templates for Python, Node, and Go you can drop into production.

HTTP code reference

| HTTP | error.type | Retryable? | Typical cause |
| --- | --- | --- | --- |
| 400 | invalid_request_error | No | Malformed JSON, missing required field, schema-rejected input. |
| 401 | buzz_error / authentication_error | No | API key missing, malformed, or revoked. |
| 403 | permission_error | No | Key lacks permission for the model or group; IP not in allow list. |
| 413 | request_too_large | No | Request body greater than 32 MB on the Messages API. |
| 429 | rate_limit_error | Yes (with backoff) | Rate limit hit on requests, input tokens, or output tokens. Respect retry-after. |
| 500 | api_error / buzz_error | Yes | Transient internal error. Retry with backoff. |
| 503 | buzz_error · model_not_found | Conditional | BUZZ-specific. No upstream channel currently serves this model under your group. |
| 529 | overloaded_error | Yes (long backoff) | Anthropic upstream is overloaded provider-wide. Try a different route or wait. |

Two error envelope shapes

BUZZ returns two distinguishable error shapes depending on where the failure happened. Your client should handle both.

Anthropic-passthrough envelope

Used when the upstream model itself rejected the request. Identical to the direct Anthropic API:

{
  "type": "error",
  "error": {
    "type": "rate_limit_error",
    "message": "Number of request tokens has exceeded your per-minute rate limit"
  },
  "request_id": "req_011CR..."
}

BUZZ gateway-side envelope

Used for failures that BUZZ itself produced before hitting the upstream — auth rejection, schema validation, channel routing failure. Note: no top-level type:"error" wrapper, no request_id field. The BUZZ request id is appended to error.message:

{
  "error": {
    "type": "buzz_error",
    "message": "Invalid token (request id: 202605260713594...)"
  }
}

Production code should branch on error.type and tolerate either envelope:

def parse_error(resp_json):
    err = resp_json.get("error", {})
    return {
        "type": err.get("type", "unknown"),
        "message": err.get("message", ""),
        # Anthropic shape exposes request_id at top level;
        # BUZZ shape embeds it in message.
        "request_id": resp_json.get("request_id"),
    }
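Because the gateway-side envelope carries its request id only inside error.message, a small helper can recover it from either shape. The sketch below is illustrative: extract_request_id is a hypothetical name, and the pattern assumes the id always appears as "(request id: ...)", as it does in the examples in this guide.

```python
import re

# Assumption: the gateway-side id always appears as "(request id: ...)".
_REQUEST_ID_RE = re.compile(r"\(request id:\s*([^)]+)\)")


def extract_request_id(resp_json):
    """Return the request id from either envelope, or None if absent."""
    # Anthropic-passthrough shape: id is a top-level field.
    top_level = resp_json.get("request_id")
    if top_level:
        return top_level
    # Gateway-side shape: id is embedded in error.message.
    message = (resp_json.get("error") or {}).get("message", "")
    match = _REQUEST_ID_RE.search(message)
    return match.group(1).strip() if match else None
```

Logging this value next to error.type gives support a single field to search on, whichever envelope was returned.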

Exponential backoff template

The right pattern for transient errors (429, 500, 529, and any network-level failure) is exponential backoff with full jitter. Cap the maximum delay so a single bad minute on the upstream does not stall your tail latency forever.

Python

import os
import random
import time

import httpx

BASE_DELAY = 1.0      # seconds
MAX_DELAY = 32.0      # cap
MAX_ATTEMPTS = 5

RETRYABLE_STATUS = {429, 500, 502, 503, 504, 529}


def call_messages(payload):
    headers = {
        "Authorization": f"Bearer {os.environ['BUZZ_API_KEY']}",
        "anthropic-version": "2023-06-01",
        "Content-Type": "application/json",
    }
    last_err = None
    for attempt in range(MAX_ATTEMPTS):
        try:
            r = httpx.post(
                "https://buzzai.cc/v1/messages",
                json=payload,
                headers=headers,
                timeout=120,
            )
        except httpx.RequestError as e:
            last_err = e
        else:
            if r.status_code < 400:
                return r.json()
            if r.status_code not in RETRYABLE_STATUS:
                # 4xx that isn't 429: do not retry
                r.raise_for_status()
            last_err = httpx.HTTPStatusError(
                f"HTTP {r.status_code}: {r.text[:200]}",
                request=r.request, response=r,
            )
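            # Honor a server-provided retry-after (seconds) on any retryable status.
            retry_after = r.headers.get("retry-after")
            if retry_after:
                try:
                    time.sleep(float(retry_after))
                    continue
                except ValueError:
                    pass  # not a plain number of seconds: use jittered backoff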

        # Full jitter: random in [0, base * 2^attempt], capped.
        delay = min(MAX_DELAY, BASE_DELAY * (2 ** attempt))
        time.sleep(random.uniform(0, delay))
    raise last_err

Node

const RETRYABLE = new Set([429, 500, 502, 503, 504, 529]);
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 32_000;
const MAX_ATTEMPTS = 5;

const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

export async function callMessages(payload) {
  let lastErr;
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    let resp;
    try {
      resp = await fetch("https://buzzai.cc/v1/messages", {
        method: "POST",
        headers: {
          "Authorization": `Bearer ${process.env.BUZZ_API_KEY}`,
          "anthropic-version": "2023-06-01",
          "Content-Type": "application/json",
        },
        body: JSON.stringify(payload),
      });
    } catch (e) {
      // Network-level failure: retry with backoff.
      lastErr = e;
    }

    if (resp) {
      if (resp.ok) return await resp.json();
      if (!RETRYABLE.has(resp.status)) {
        // Thrown outside the try block so it is not caught and retried.
        const body = await resp.text();
        throw new Error(`HTTP ${resp.status}: ${body.slice(0, 200)}`);
      }

      lastErr = new Error(`HTTP ${resp.status}`);
      // Honor a numeric retry-after (seconds); otherwise fall through to jitter.
      const retryAfter = Number(resp.headers.get("retry-after"));
      if (Number.isFinite(retryAfter) && retryAfter > 0) {
        await sleep(retryAfter * 1000);
        continue;
      }
    }
    const cap = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
    await sleep(Math.random() * cap);
  }
  throw lastErr;
}

Go

package buzz

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "math/rand"
    "net/http"
    "os"
    "strconv"
    "time"
)

var retryable = map[int]bool{
    429: true, 500: true, 502: true, 503: true, 504: true, 529: true,
}

const (
    baseDelay   = time.Second
    maxDelay    = 32 * time.Second
    maxAttempts = 5
)

func CallMessages(payload any) (map[string]any, error) {
    body, err := json.Marshal(payload)
    if err != nil {
        return nil, err
    }

    var lastErr error
    for attempt := 0; attempt < maxAttempts; attempt++ {
        req, _ := http.NewRequest(
            "POST",
            "https://buzzai.cc/v1/messages",
            bytes.NewReader(body),
        )
        req.Header.Set("Authorization", "Bearer "+os.Getenv("BUZZ_API_KEY"))
        req.Header.Set("anthropic-version", "2023-06-01")
        req.Header.Set("Content-Type", "application/json")

        resp, err := http.DefaultClient.Do(req)
        if err == nil {
            buf, _ := io.ReadAll(resp.Body)
            resp.Body.Close()
            if resp.StatusCode < 400 {
                var out map[string]any
                if err := json.Unmarshal(buf, &out); err != nil {
                    return nil, fmt.Errorf("decode response: %w", err)
                }
                return out, nil
            }
            if !retryable[resp.StatusCode] {
                return nil, fmt.Errorf("HTTP %d: %s", resp.StatusCode, string(buf))
            }
            lastErr = fmt.Errorf("HTTP %d", resp.StatusCode)
            if ra := resp.Header.Get("Retry-After"); ra != "" {
                if secs, err2 := strconv.Atoi(ra); err2 == nil {
                    time.Sleep(time.Duration(secs) * time.Second)
                    continue
                }
            }
        } else {
            lastErr = err
        }

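        // Full jitter: random duration below min(maxDelay, baseDelay * 2^attempt).
        // Named ceiling rather than cap to avoid shadowing the builtin.
        ceiling := baseDelay << attempt
        if ceiling > maxDelay {
            ceiling = maxDelay
        }
        time.Sleep(time.Duration(rand.Int63n(int64(ceiling))))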
    }
    return nil, lastErr
}
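
As a worked check on the constants used in the templates above, the following sketch computes the jitter ceiling for each attempt. The helper names jitter_ceiling and full_jitter_delay are illustrative and not part of any BUZZ SDK.

```python
import random

BASE_DELAY = 1.0   # seconds, same value as the templates
MAX_DELAY = 32.0   # cap, same value as the templates
MAX_ATTEMPTS = 5


def jitter_ceiling(attempt, base=BASE_DELAY, cap=MAX_DELAY):
    """Upper bound of the sleep before the next attempt (0-indexed)."""
    return min(cap, base * (2 ** attempt))


def full_jitter_delay(attempt, rng=random):
    """Full jitter: uniform random delay between 0 and the ceiling."""
    return rng.uniform(0, jitter_ceiling(attempt))


ceilings = [jitter_ceiling(a) for a in range(MAX_ATTEMPTS)]
# Ceilings are 1, 2, 4, 8 and 16 seconds: at most 31 seconds of backoff sleep in total.
worst_case_sleep = sum(ceilings)
```

With five attempts the 32-second cap is never reached; it only starts to matter if MAX_ATTEMPTS is raised to six or more.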

429: the three rate-limit dimensions

Anthropic enforces three independent budgets and any one of them can fire a 429. Knowing which one was hit determines whether you need to slow down, shrink prompts, or shrink outputs.

| Dimension | What counts | Header (if exposed) | Mitigation |
| --- | --- | --- | --- |
| requests_per_minute | Number of POSTs in a 60-second window | retry-after | Add concurrency limits client-side; queue. |
| input_tokens_per_minute | Sum of input_tokens across the window | retry-after | Trim system prompt; use prompt caching to avoid re-sending the same tokens. |
| output_tokens_per_minute | Sum of output_tokens across the window | retry-after | Lower max_tokens; ask the model to be more concise. |

The error message usually names the dimension explicitly:

{"type":"error","error":{"type":"rate_limit_error",
 "message":"Number of input tokens has exceeded your per-minute rate limit"}}
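
Since the message text is the only place the dimension shows up, a client can map it to one of the three budgets with a simple substring check. This is a rough sketch: the phrases matched below are assumptions extrapolated from the single example message above, and the wording may differ in practice, so unmatched messages fall back to "unknown".

```python
# Assumption: messages name the dimension with phrases like these.
_DIMENSION_HINTS = (
    ("input tokens", "input_tokens_per_minute"),
    ("output tokens", "output_tokens_per_minute"),
    ("requests", "requests_per_minute"),
)


def rate_limit_dimension(message):
    """Guess which rate-limit budget a 429 message refers to."""
    text = message.lower()
    for needle, dimension in _DIMENSION_HINTS:
        if needle in text:
            return dimension
    return "unknown"
```

Counting 429s per dimension in your metrics shows whether to add queueing, trim prompts, or lower max_tokens.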

Your retry loop should always honor a retry-after response header before falling back to its own backoff schedule. The header value is in seconds.
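
A defensive parser for the header can be sketched as follows. The guide states the value is in seconds; the HTTP specification also permits an HTTP-date in Retry-After, so this illustrative helper (retry_after_seconds, a hypothetical name) accepts both and clamps the result. The max_wait default of 60 seconds is an assumption, not a BUZZ recommendation.

```python
import math
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None, max_wait=60.0):
    """Parse a retry-after header into seconds, or None if unusable."""
    if not value:
        return None
    value = value.strip()
    try:
        seconds = float(value)  # the common case: a number of seconds
    except ValueError:
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            return None
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        seconds = (when - now).total_seconds()
    if math.isnan(seconds):
        return None
    # Never sleep a negative amount, and never longer than max_wait.
    return min(max(seconds, 0.0), max_wait)
```

When the helper returns None, the retry loop falls back to its own jittered schedule.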

Prompt caching is a 429 mitigation. A cached read is reported as cache_read_input_tokens, which is billed separately from input_tokens. In verified BUZZ behavior, the cold call reported cache_creation=1200, and the second call reported input=2 for a prompt that would otherwise have counted as 1202 input tokens. See the Prompt Caching concept.

503 buzz_error · model_not_found troubleshooting

This status is BUZZ-specific. It means: the model you asked for is not currently routable under your account's group. Anthropic itself does not return 503 — direct Anthropic returns 404 for unknown models. Treat 503 model_not_found as a routing problem, not an outage.

Diagnosis steps

  1. Check the live model list. Hit GET /v1/models with the same API key. The response is the authoritative list of models your group can route to right now:
    curl -H "Authorization: Bearer $BUZZ_API_KEY" https://buzzai.cc/v1/models
  2. Use the dated alias. The undated alias claude-haiku-4-5 may be missing under some groups while the dated form claude-haiku-4-5-20251001 works. Both are valid identifiers; the dated form is the safest canonical value for production.
  3. Check group permissions. If a model is in the global catalog but not in your GET /v1/models response, your group does not have access. Contact support to enable the channel for your group.
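
Steps 1 and 2 can be combined into a small lookup. This is a minimal sketch under a loud assumption: that GET /v1/models returns a JSON object of the form {"data": [{"id": "..."}]}. This guide does not show the response body, so adjust the parsing to whatever your key actually receives. resolve_model_id is a hypothetical helper name.

```python
def resolve_model_id(models_response, wanted):
    """Return an id the group can route to for `wanted`, or None.

    Prefers an exact match, then the newest dated form of the alias
    (for example claude-haiku-4-5 -> claude-haiku-4-5-20251001).
    """
    # Assumed response shape: {"data": [{"id": "..."}, ...]}
    available = [m["id"] for m in models_response.get("data", []) if "id" in m]
    if wanted in available:
        return wanted
    prefix = wanted + "-"
    dated = sorted(
        model_id
        for model_id in available
        if model_id.startswith(prefix) and model_id[len(prefix):].isdigit()
    )
    return dated[-1] if dated else None
```

A None result corresponds to step 3: the model is not enabled for your group.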

Common error message

HTTP/1.1 503 Service Unavailable
content-type: application/json

{"error":{"type":"buzz_error",
 "message":"No available channel for model claude-haiku-4-5-20251001 under group aws (request id: 202605260713...)"}}

The request id at the end of the message is your fastest path to a support diagnosis — paste it verbatim.
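
Because a routing failure will not clear on its own, it helps to separate it from other 503s before a retry loop spends attempts on it. The sketch below keys off the "No available channel" wording from the example above; that the wording is stable is an assumption, and is_routing_failure is a hypothetical name.

```python
def is_routing_failure(status, resp_json):
    """True for a BUZZ 503 that means the model is not routable for this group."""
    err = (resp_json or {}).get("error") or {}
    return (
        status == 503
        and err.get("type") == "buzz_error"
        # Assumption: routing failures always carry this phrase.
        and "no available channel" in err.get("message", "").lower()
    )
```

When this returns True, switching model id or contacting support is more useful than backing off and retrying the same request.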

529: provider-wide overload and multi-route fallback

A 529 is qualitatively different from a 429. A 429 says your account is over budget. A 529 says everyone's upstream is saturated and your individual budget is fine. Raising your account tier does nothing for a 529 — the constraint is on the other side.

| | 429 rate_limit_error | 529 overloaded_error |
| --- | --- | --- |
| Scope | Per-account | Provider-wide |
| Fix on your side | Slow down or raise tier | Try a different route |
| Backoff | Short, jittered, honor retry-after | Long, jittered (5 to 30 s) |
| Stays for | Until the rolling window resets | Until provider capacity recovers |
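
The two backoff rows can be expressed as a single delay function. This is a sketch rather than a tuned policy: the 5 to 30 second band comes from the comparison above, the 1 second base and 32 second cap mirror the backoff templates, and backoff_seconds is a hypothetical name.

```python
import random


def backoff_seconds(error_type, attempt, retry_after=None, rng=random):
    """Pick a sleep duration based on which error was returned."""
    if error_type == "overloaded_error":
        # 529: provider-wide overload, so wait long and jittered (5 to 30 s).
        return rng.uniform(5.0, 30.0)
    if retry_after is not None:
        # 429 with a server hint: honor it as given.
        return float(retry_after)
    # Otherwise: short full-jitter backoff, capped at 32 s.
    return rng.uniform(0.0, min(32.0, 1.0 * 2 ** attempt))
```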

Fallback strategy

The clean answer to a 529 is to have more than one place to send the request. Two patterns work in practice:

Pattern A: model-tier fallback within BUZZ

If your task tolerates a smaller model, fall back from Sonnet to Haiku, or from Opus to Sonnet. Same gateway, different model id, same SDK call. This works because BUZZ does not lock you to one model per request.

def call_with_fallback(messages, max_tokens=400):
    fallbacks = [
        "claude-opus-4-7",
        "claude-sonnet-4-6",
        "claude-haiku-4-5-20251001",
    ]
    last = None
    for model in fallbacks:
        try:
            return call_messages({
                "model": model,
                "max_tokens": max_tokens,
                "messages": messages,
            })
        except Exception as e:
            # only fall back on transient errors (429, 500, 503, 529)
            if not is_transient(e):
                raise
            last = e
    raise last
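
The fallback loop relies on an is_transient helper that is not defined in this guide. One possible definition is sketched below, assuming exceptions carry the HTTP response on a .response attribute with a .status_code, as httpx.HTTPStatusError does. The network_errors parameter is an illustrative addition; with the httpx-based template, httpx.RequestError would be the natural value to use as its default.

```python
# Statuses the fallback comment lists as transient.
TRANSIENT_STATUS = {429, 500, 503, 529}


def is_transient(exc, network_errors=(ConnectionError, TimeoutError)):
    """Decide whether an exception justifies trying the next model."""
    # HTTP failures: look for a status code on exc.response.
    response = getattr(exc, "response", None)
    status = getattr(response, "status_code", None)
    if status is not None:
        return status in TRANSIENT_STATUS
    # No response at all: transient only if it is a known network error type.
    return isinstance(exc, network_errors)
```

Anything that is neither a transient status nor a listed network error, such as a 400 or a programming error, is re-raised by the loop instead of triggering a fallback.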

Pattern B: client-side multi-gateway fallback

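For applications with their own resilience requirements, keep BUZZ as primary and a second base URL as a backup. Both endpoints accept the same payload byte-for-byte; only the base URL and key change. One detail to verify before relying on this: the templates in this guide send the key as Authorization: Bearer, while the direct Anthropic API documents an x-api-key header for API keys, so the auth header may need to change along with the key.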

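import os

class RetryableError(Exception):
    """Transient failure: 429, 5xx, 529, or a network error."""

BASE_URLS = [
    ("https://buzzai.cc", os.environ["BUZZ_API_KEY"]),
    ("https://api.anthropic.com", os.environ["ANTHROPIC_API_KEY"]),
]

def call_with_gateway_fallback(payload, call):
    # call(base_url, key, payload) is your own request function and
    # must raise RetryableError on transient failures.
    last_err = None
    for base_url, key in BASE_URLS:
        try:
            return call(base_url, key, payload)
        except RetryableError as e:
            last_err = e
    raise last_err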

This is exactly what BUZZ's transparent forwarding makes possible: the request bytes you send to BUZZ are the bytes Anthropic would have received, and the response bytes are the bytes Anthropic would have returned. Swapping endpoints is mechanical.

Non-retryable errors: what to do instead

4xx errors that are not 429 indicate your code is wrong, not the upstream. Retrying them wastes quota and amplifies the bug.

| HTTP | Action |
| --- | --- |
| 400 invalid_request_error | Surface the message to the developer. Common offenders: messages alternation broken, max_tokens too high for the model, malformed tool_use round-trip. |
| 401 | Re-issue the key from your dashboard. If you rotate keys, update your secret store before restarting. |
| 403 | The key is valid but lacks permission. Check the group / IP allow list on your account. |
| 413 | Trim the request. The 32 MB ceiling is on the encoded JSON body, not the prompt token count. |

Errors during streaming

Once you've received HTTP 200 on a streamed request, any subsequent failure arrives as an SSE frame with event: error, not as a non-2xx response. Frame shape:

event: error
data: {"type":"error","error":{"type":"overloaded_error","message":"..."}}

Treat it identically to the corresponding non-streaming status: retry on overloaded_error with long backoff, retry on rate_limit_error respecting retry-after, surface to caller on the rest. For mechanics see the streaming guide.
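
A minimal sketch of that handling: parse one SSE frame and decide whether the error is worth retrying. It assumes the frame has already been split off the stream as a single string, and the function names are illustrative rather than part of any SDK.

```python
import json

# Per the guidance above: retry these, surface everything else.
RETRYABLE_STREAM_ERRORS = {"overloaded_error", "rate_limit_error"}


def parse_sse_error(frame):
    """Return the error dict from an 'event: error' frame, else None."""
    event, data_lines = None, []
    for line in frame.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip())
    if event != "error" or not data_lines:
        return None
    payload = json.loads("\n".join(data_lines))
    return payload.get("error", {})


def stream_error_is_retryable(error):
    """True when the streamed error type should be retried with backoff."""
    return error.get("type") in RETRYABLE_STREAM_ERRORS
```

A retry after a mid-stream error restarts the whole request, so any partial output already shown to the user needs to be discarded or reconciled by the caller.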
