Error Handling and Retries
Status code first, message second. This guide covers every HTTP code BUZZ returns on the Messages API, how the BUZZ error envelope differs from Anthropic's, and exponential-backoff templates for Python, Node, and Go you can drop into production.
HTTP code reference
| HTTP | error.type | Retryable? | Typical cause |
|---|---|---|---|
| 400 | invalid_request_error | No | Malformed JSON, missing required field, schema-rejected input. |
| 401 | buzz_error / authentication_error | No | API key missing, malformed, or revoked. |
| 403 | permission_error | No | Key lacks permission for the model or group; IP not in allow list. |
| 413 | request_too_large | No | Request body greater than 32 MB on the Messages API. |
| 429 | rate_limit_error | Yes (with backoff) | Rate limit hit on requests, input tokens, or output tokens. Respect retry-after. |
| 500 | api_error / buzz_error | Yes | Transient internal error. Retry with backoff. |
| 503 | buzz_error · model_not_found | Conditional | BUZZ-specific. No upstream channel currently serves this model under your group. |
| 529 | overloaded_error | Yes (long backoff) | Anthropic upstream is overloaded provider-wide. Try a different route or wait. |
Two error envelope shapes
BUZZ returns two distinguishable error shapes depending on where the failure happened. Your client should handle both.
Anthropic-passthrough envelope
Used when the upstream model itself rejected the request. Identical to the direct Anthropic API:
{
  "type": "error",
  "error": {
    "type": "rate_limit_error",
    "message": "Number of request tokens has exceeded your per-minute rate limit"
  },
  "request_id": "req_011CR..."
}
BUZZ gateway-side envelope
Used for failures that BUZZ itself produced before hitting the upstream — auth rejection, schema validation, channel routing failure. Note: no top-level type:"error" wrapper, no request_id field. The BUZZ request id is appended to error.message:
{
  "error": {
    "type": "buzz_error",
    "message": "Invalid token (request id: 202605260713594...)"
  }
}
Production code should branch on error.type and tolerate either envelope:
def parse_error(resp_json):
    err = resp_json.get("error", {})
    return {
        "type": err.get("type", "unknown"),
        "message": err.get("message", ""),
        # Anthropic shape exposes request_id at top level;
        # BUZZ shape embeds it in message.
        "request_id": resp_json.get("request_id"),
    }
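Because the gateway-side envelope carries its request id only inside error.message, a parser that wants a single id for logging has to extract it from the text. The following is a minimal sketch, assuming the id always appears in a trailing "(request id: ...)" suffix as in the sample messages in this guide. The function name and the regular expression are illustrative and are not part of any BUZZ SDK.

```python
import re

# Assumed format, taken from the sample messages: "... (request id: <id>)"
_REQUEST_ID_RE = re.compile(r"\(request id:\s*([^)\s]+)\)")


def parse_error_with_request_id(resp_json):
    """Normalize either envelope into type, message, and request_id."""
    err = resp_json.get("error") or {}
    message = err.get("message", "")
    request_id = resp_json.get("request_id")  # Anthropic-passthrough shape
    if request_id is None:
        match = _REQUEST_ID_RE.search(message)  # BUZZ gateway-side shape
        if match:
            request_id = match.group(1)
    return {
        "type": err.get("type", "unknown"),
        "message": message,
        "request_id": request_id,
    }
```

If the suffix format ever changes, the function still returns the type and message and simply leaves request_id as None.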
Exponential backoff template
The right pattern for transient errors (429, 500, 529, and any network-level failure) is exponential backoff with full jitter. Cap the maximum delay so a single bad minute on the upstream does not stall your tail latency forever.
import os
import random
import time

import httpx

BASE_DELAY = 1.0    # seconds
MAX_DELAY = 32.0    # cap
MAX_ATTEMPTS = 5
RETRYABLE_STATUS = {429, 500, 502, 503, 504, 529}


def call_messages(payload):
    headers = {
        "Authorization": f"Bearer {os.environ['BUZZ_API_KEY']}",
        "anthropic-version": "2023-06-01",
        "Content-Type": "application/json",
    }
    last_err = None
    for attempt in range(MAX_ATTEMPTS):
        try:
            r = httpx.post(
                "https://buzzai.cc/v1/messages",
                json=payload,
                headers=headers,
                timeout=120,
            )
        except httpx.RequestError as e:
            last_err = e
        else:
            if r.status_code < 400:
                return r.json()
            if r.status_code not in RETRYABLE_STATUS:
                # 4xx that isn't 429: do not retry
                r.raise_for_status()
            last_err = httpx.HTTPStatusError(
                f"HTTP {r.status_code}: {r.text[:200]}",
                request=r.request, response=r,
            )
            # Honor server-provided retry-after on 429.
            retry_after = r.headers.get("retry-after")
            if retry_after:
                time.sleep(float(retry_after))
                continue
        # Full jitter: random in [0, base * 2^attempt], capped.
        delay = min(MAX_DELAY, BASE_DELAY * (2 ** attempt))
        time.sleep(random.uniform(0, delay))
    raise last_err

const RETRYABLE = new Set([429, 500, 502, 503, 504, 529]);
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 32_000;
const MAX_ATTEMPTS = 5;
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

export async function callMessages(payload) {
  let lastErr;
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    let resp;
    try {
      resp = await fetch("https://buzzai.cc/v1/messages", {
        method: "POST",
        headers: {
          "Authorization": `Bearer ${process.env.BUZZ_API_KEY}`,
          "anthropic-version": "2023-06-01",
          "Content-Type": "application/json",
        },
        body: JSON.stringify(payload),
      });
    } catch (e) {
      // Only network-level failures are caught here, so they can be retried.
      lastErr = e;
    }
    if (resp) {
      if (resp.ok) return await resp.json();
      if (!RETRYABLE.has(resp.status)) {
        // Thrown outside the try block so a non-retryable 4xx is never retried.
        const body = await resp.text();
        throw new Error(`HTTP ${resp.status}: ${body.slice(0, 200)}`);
      }
      lastErr = new Error(`HTTP ${resp.status}`);
      // A missing header yields Number(null) === 0 and falls through to jitter.
      const retryAfter = Number(resp.headers.get("retry-after"));
      if (Number.isFinite(retryAfter) && retryAfter > 0) {
        await sleep(retryAfter * 1000);
        continue;
      }
    }
    // Full jitter: random in [0, base * 2^attempt), capped.
    const cap = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
    await sleep(Math.random() * cap);
  }
  throw lastErr;
}

package buzz
import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"math/rand"
	"net/http"
	"os"
	"strconv"
	"time"
)

var retryable = map[int]bool{
	429: true, 500: true, 502: true, 503: true, 504: true, 529: true,
}

const (
	baseDelay   = time.Second
	maxDelay    = 32 * time.Second
	maxAttempts = 5
)

func CallMessages(payload any) (map[string]any, error) {
	body, err := json.Marshal(payload)
	if err != nil {
		return nil, err
	}
	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		req, err := http.NewRequest(
			"POST",
			"https://buzzai.cc/v1/messages",
			bytes.NewReader(body),
		)
		if err != nil {
			return nil, err
		}
		req.Header.Set("Authorization", "Bearer "+os.Getenv("BUZZ_API_KEY"))
		req.Header.Set("anthropic-version", "2023-06-01")
		req.Header.Set("Content-Type", "application/json")
		resp, err := http.DefaultClient.Do(req)
		if err == nil {
			buf, _ := io.ReadAll(resp.Body)
			resp.Body.Close()
			if resp.StatusCode < 400 {
				var out map[string]any
				if err := json.Unmarshal(buf, &out); err != nil {
					return nil, err
				}
				return out, nil
			}
			if !retryable[resp.StatusCode] {
				return nil, fmt.Errorf("HTTP %d: %s", resp.StatusCode, string(buf))
			}
			lastErr = fmt.Errorf("HTTP %d", resp.StatusCode)
			// Honor a server-provided Retry-After given in whole seconds.
			if ra := resp.Header.Get("Retry-After"); ra != "" {
				if secs, err2 := strconv.Atoi(ra); err2 == nil {
					time.Sleep(time.Duration(secs) * time.Second)
					continue
				}
			}
		} else {
			lastErr = err
		}
		// Full jitter: random in [0, base * 2^attempt), capped.
		ceiling := baseDelay << attempt
		if ceiling > maxDelay {
			ceiling = maxDelay
		}
		time.Sleep(time.Duration(rand.Int63n(int64(ceiling))))
	}
	return nil, lastErr
}

429: the three rate-limit dimensions
Anthropic enforces three independent budgets and any one of them can fire a 429. Knowing which one was hit determines whether you need to slow down, shrink prompts, or shrink outputs.
| Dimension | What counts | Header (if exposed) | Mitigation |
|---|---|---|---|
| requests_per_minute | Number of POSTs in a 60-second window | retry-after | Add concurrency limits client-side; queue. |
| input_tokens_per_minute | Sum of input_tokens across the window | retry-after | Trim system prompt; use prompt caching to avoid re-sending the same tokens. |
| output_tokens_per_minute | Sum of output_tokens across the window | retry-after | Lower max_tokens; ask the model to be more concise. |
The error message usually names the dimension explicitly:
{"type":"error","error":{"type":"rate_limit_error",
"message":"Number of input tokens has exceeded your per-minute rate limit"}}
Your retry loop should always honor a retry-after response header before falling back to its own backoff schedule. The header value is in seconds.
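Since the mitigation depends on which budget fired, it can help to map the message text to a dimension before deciding what to do. The sketch below relies on keyword matching against the message wording shown in this guide. That wording is an assumption and may vary, so the function returns None when nothing matches. The function name is illustrative.

```python
def rate_limit_dimension(message):
    """Guess which budget a 429 message refers to; None if unrecognized."""
    text = message.lower()
    # Check the token dimensions first: their messages may also say "request".
    if "output token" in text:
        return "output_tokens_per_minute"
    if "input token" in text:
        return "input_tokens_per_minute"
    if "request" in text:
        return "requests_per_minute"
    return None
```

A caller might log the returned dimension next to the request id, then queue, trim the prompt, or lower max_tokens accordingly.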
cache_read_input_tokens, which is billed separately from input_tokens. Verified BUZZ behavior shows cold-call cache_creation=1200, input=2 on the second call where the prompt would otherwise have been 1202 input tokens. See the Prompt Caching concept.
503 buzz_error · model_not_found troubleshooting
This status is BUZZ-specific. It means: the model you asked for is not currently routable under your account's group. Anthropic itself does not return 503 — direct Anthropic returns 404 for unknown models. Treat 503 model_not_found as a routing problem, not an outage.
Diagnosis steps
- Check the live model list. Hit GET /v1/models with the same API key. The response is the authoritative list of models your group can route to right now:
  curl -H "Authorization: Bearer $BUZZ_API_KEY" https://buzzai.cc/v1/models
- Use the dated alias. The undated alias claude-haiku-4-5 may be missing under some groups while the dated form claude-haiku-4-5-20251001 works. Both are valid identifiers; the dated form is the safest canonical value for production.
- Check group permissions. If a model is in the global catalog but not in your GET /v1/models response, your group does not have access. Contact support to enable the channel for your group.
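The first two diagnosis steps can be automated once the model list has been fetched. The sketch below works on the already-parsed JSON and assumes the response has a top-level "data" list of objects with an "id" field, as the direct Anthropic models endpoint does. This guide does not show the BUZZ response shape, so verify it against a real response first. Both function names are illustrative.

```python
def routable_model_ids(models_json):
    """Collect model ids from a parsed GET /v1/models response.

    Assumes the shape {"data": [{"id": "..."}, ...]}.
    """
    return {m["id"] for m in models_json.get("data", []) if "id" in m}


def pick_model(models_json, preferred):
    """Return the first id in `preferred` that is routable, else None."""
    available = routable_model_ids(models_json)
    for model_id in preferred:
        if model_id in available:
            return model_id
    return None
```

Passing the dated form first, for example pick_model(models, ["claude-haiku-4-5-20251001", "claude-haiku-4-5"]), prefers the identifier recommended above and falls back to the undated alias only when the dated one is absent.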
Common error message
HTTP/1.1 503 Service Unavailable
content-type: application/json

{"error":{"type":"buzz_error",
"message":"No available channel for model claude-haiku-4-5-20251001 under group aws (request id: 202605260713...)"}}
The request id at the end of the message is your fastest path to a support diagnosis — paste it verbatim.
529: provider-wide overload and multi-route fallback
A 529 is qualitatively different from a 429. A 429 says your account is over budget. A 529 says everyone's upstream is saturated and your individual budget is fine. Raising your account tier does nothing for a 529 — the constraint is on the other side.
| | 429 rate_limit_error | 529 overloaded_error |
|---|---|---|
| Scope | Per-account | Provider-wide |
| Fix on your side | Slow down or raise tier | Try a different route |
| Backoff | Short, jittered, honor retry-after | Long, jittered (5 to 30 s) |
| Stays for | Until the rolling window resets | Until provider capacity recovers |
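The backoff row of the table can be expressed as a small delay policy. This is a sketch, not the gateway's prescribed schedule: the 5 to 30 second range comes from the table, the base and cap values mirror the backoff template earlier in this guide, and the function name is illustrative.

```python
import random

LONG_MIN_S, LONG_MAX_S = 5.0, 30.0  # 529 range from the table above
BASE_S, CAP_S = 1.0, 32.0           # same values as the backoff template


def backoff_seconds(status, attempt, retry_after=None):
    """Pick a sleep duration for a retryable status (attempt is 0-based)."""
    if status == 529:
        # Provider-wide overload: long, jittered wait regardless of attempt.
        return random.uniform(LONG_MIN_S, LONG_MAX_S)
    if status == 429 and retry_after is not None:
        # Per-account limit: the server's hint wins over our own schedule.
        return float(retry_after)
    # Everything else: exponential backoff with full jitter, capped.
    return random.uniform(0, min(CAP_S, BASE_S * 2 ** attempt))
```

Keeping the policy in one pure function makes it easy to unit-test and to reuse between the streaming and non-streaming paths.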
Fallback strategy
The clean answer to a 529 is to have more than one place to send the request. Two patterns work in practice:
Pattern A: model-tier fallback within BUZZ
If your task tolerates a smaller model, fall back from Sonnet to Haiku, or from Opus to Sonnet. Same gateway, different model id, same SDK call. This works because BUZZ does not lock you to one model per request.
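def is_transient(exc):
    # Helper assumed by this example: "transient" means a retryable HTTP
    # status or a network-level failure, reusing httpx and RETRYABLE_STATUS
    # from the backoff template above.
    if isinstance(exc, httpx.HTTPStatusError):
        return exc.response.status_code in RETRYABLE_STATUS
    return isinstance(exc, httpx.RequestError)


def call_with_fallback(messages, max_tokens=400):
    fallbacks = [
        "claude-opus-4-7",
        "claude-sonnet-4-6",
        "claude-haiku-4-5-20251001",
    ]
    last = None
    for model in fallbacks:
        try:
            return call_messages({
                "model": model,
                "max_tokens": max_tokens,
                "messages": messages,
            })
        except Exception as e:
            # only fall back on transient errors (429, 500, 503, 529)
            if not is_transient(e):
                raise
            last = e
    raise last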
Pattern B: client-side multi-gateway fallback
For applications with their own resilience requirements, keep BUZZ as primary and a second base URL as a backup. Both endpoints accept the same payload byte-for-byte; only the base URL and key change.
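import os


class RetryableError(Exception):
    """Raised by call() for transient failures (429, 5xx, 529, network)."""


# (base URL, API key) pairs, tried in order.
BASE_URLS = [
    ("https://buzzai.cc", os.environ["BUZZ_API_KEY"]),
    ("https://api.anthropic.com", os.environ["ANTHROPIC_API_KEY"]),
]


def call_with_gateway_fallback(payload):
    last_err = None
    for base_url, key in BASE_URLS:
        try:
            # call() is your own request helper. It must raise RetryableError
            # on transient failures and send the key in the header each
            # endpoint expects (the direct Anthropic API reads x-api-key).
            return call(base_url, key, payload)
        except RetryableError as e:
            last_err = e
    raise last_err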
This is exactly what BUZZ's transparent forwarding makes possible: the request bytes you send to BUZZ are the bytes Anthropic would have received, and the response bytes are the bytes Anthropic would have returned. Swapping endpoints is mechanical.
Non-retryable errors: what to do instead
4xx errors that are not 429 indicate your code is wrong, not the upstream. Retrying them wastes quota and amplifies the bug.
| HTTP | Action |
|---|---|
| 400 invalid_request_error | Surface the message to the developer. Common offenders: messages alternation broken, max_tokens too high for the model, malformed tool_use round-trip. |
| 401 | Re-issue the key from your dashboard. If you rotate keys, update your secret store before restarting. |
| 403 | The key is valid but lacks permission. Check the group / IP allow list on your account. |
| 413 | Trim the request. The 32 MB ceiling is on the encoded JSON body, not the prompt token count. |
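Because the 413 ceiling applies to the encoded body, a client can check the size before sending instead of waiting for the rejection. The sketch below assumes "32 MB" means 32 * 1024 * 1024 bytes. This guide does not say whether the limit is binary or decimal, so leave some headroom. The measured size is an approximation, since an HTTP client's own serializer may differ by a few bytes of whitespace. Function names are illustrative.

```python
import json

# Assumption: "32 MB" read as 32 * 1024 * 1024 bytes.
MAX_BODY_BYTES = 32 * 1024 * 1024


def encoded_body_size(payload):
    """Approximate size in bytes of the JSON body as sent on the wire."""
    return len(json.dumps(payload).encode("utf-8"))


def fits_body_limit(payload, limit=MAX_BODY_BYTES):
    """True if the encoded payload is within the request-size ceiling."""
    return encoded_body_size(payload) <= limit
```

Base64-encoded images and documents are the usual reason a body approaches the ceiling while the prompt token count stays small.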
Errors during streaming
Once you've received HTTP 200 on a streamed request, any subsequent failure arrives as an SSE event: error frame, not as a non-2xx response. Frame shape:
event: error
data: {"type":"error","error":{"type":"overloaded_error","message":"..."}}
Treat it identically to the corresponding non-streaming status: retry on overloaded_error with long backoff, retry on rate_limit_error respecting retry-after, surface to caller on the rest. For mechanics see the streaming guide.
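The mapping from an error frame to an action can be sketched as a small classifier over the frame's data line. The action labels and the function name are illustrative. The frame shape is the one shown above, and anything that fails to parse is surfaced to the caller rather than retried.

```python
import json


def stream_error_action(data_line):
    """Map the data line of an SSE `event: error` frame to an action.

    Returns "retry_long_backoff", "retry_after_backoff", or "raise".
    """
    payload = data_line.removeprefix("data:").strip()
    try:
        err_type = json.loads(payload).get("error", {}).get("type", "unknown")
    except (json.JSONDecodeError, AttributeError):
        # Not JSON, or not the expected object shape: do not guess.
        return "raise"
    if err_type == "overloaded_error":
        return "retry_long_backoff"
    if err_type == "rate_limit_error":
        return "retry_after_backoff"
    return "raise"
```

The sketch uses str.removeprefix, so it needs Python 3.9 or later.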
Production checklist
- Branch on error.type, not just HTTP status. The same status (e.g. 400) covers different categories.
- Tolerate both error envelope shapes (Anthropic and BUZZ).
- Honor retry-after first, then fall back to exponential backoff with full jitter.
- Cap retries at five attempts. Anything that needs more is a structural problem, not a transient one.
- Never retry 400, 401, 403, 413: you'll just rate-limit your own quota.
- Log request_id (or the BUZZ request id from the message tail) on every failure. Support diagnosis depends on it.
- For 529, plan for a fallback route, not just a longer sleep.