Anthropic API Error Code Reference: 401, 403, 429, 500, 529 — Root Cause and Fix
In production, read the status code first and the message second. The HTTP status tells you who is responsible for the failure. The error.type tells you which subsystem. The message is for humans, and is often the least useful part. This is a working engineer's Claude API error reference, organized by status code, with the actual root causes and the fixes that hold up under traffic.
Every team that ships a Claude integration eventually meets the same handful of errors. The names are predictable. The fixes are not always obvious. The goal of this article is to make the diagnosis path mechanical: see the status code, jump to the section, run the check, apply the fix. By the end you should be able to triage Anthropic API errors in under a minute, and know when an error is your bug, the network's bug, or Anthropic's bug.
The Anthropic API Error Map
Start here. Every error response from api.anthropic.com looks like this:
{
"type": "error",
"error": {
"type": "authentication_error",
"message": "invalid x-api-key"
}
}
The HTTP status and the error.type together tell you everything you need before reading the message. Use this map as your first stop.
| HTTP | error.type | Meaning | Root cause domain | Fix direction |
|---|---|---|---|---|
| 400 | invalid_request_error | Malformed request body or arguments | Client | Validate JSON shape, model name, max_tokens, role alternation |
| 401 | authentication_error | Credentials missing or invalid | Client / network | Verify x-api-key header, key value, gateway pass-through |
| 403 | permission_error | Authenticated but not allowed | Account / policy | Check workspace entitlements, model access, content policy |
| 404 | not_found_error | Resource (usually model) does not exist for this org | Client / account | Verify model identifier on the live models page |
| 413 | request_too_large | Body exceeds 32 MB | Client | Use the Files API for large documents and images |
| 429 | rate_limit_error | Account rate limit hit (RPM, ITPM, or OTPM) | Client traffic | Exponential backoff, batching, tier upgrade |
| 500 | api_error | Unexpected server-side failure | Anthropic | Retry with backoff, fail over if persistent |
| 529 | overloaded_error | Provider-wide capacity saturation | Anthropic | Multi-upstream fallback, queue, retry |
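The map above can be turned into a small lookup so that triage happens in code rather than in someone's head. The following is a minimal sketch, not an official client feature: the ERROR_MAP layout, the triage function, and the retryable flags are illustrative names and choices derived from the table.

```python
# Illustrative lookup built from the error map above.
# Tuple layout (assumed): (expected error.type, root cause domain, worth retrying?)
ERROR_MAP = {
    400: ("invalid_request_error", "client", False),
    401: ("authentication_error", "client/network", False),
    403: ("permission_error", "account/policy", False),
    404: ("not_found_error", "client/account", False),
    413: ("request_too_large", "client", False),
    429: ("rate_limit_error", "client traffic", True),
    500: ("api_error", "anthropic", True),
    529: ("overloaded_error", "anthropic", True),
}


def triage(status, body):
    """Map an HTTP status and a parsed error body to an owner and a retry decision."""
    error = body.get("error") if isinstance(body, dict) else None
    err_type = error.get("type", "") if isinstance(error, dict) else ""
    if status not in ERROR_MAP:
        return {"error_type": err_type, "domain": "unknown",
                "retryable": False, "type_matches_status": False}
    expected_type, domain, retryable = ERROR_MAP[status]
    return {
        "error_type": err_type,
        "domain": domain,
        "retryable": retryable,
        # A mismatch hints that a proxy or gateway rewrote the response on the way back.
        "type_matches_status": err_type == expected_type,
    }


if __name__ == "__main__":
    sample = {"type": "error",
              "error": {"type": "authentication_error", "message": "invalid x-api-key"}}
    print(triage(401, sample))
```

Keeping the status check ahead of any message parsing mirrors the "status code first, message second" rule: the message never influences the decision.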
Now the per-error sections, in the order you are most likely to hit them.
401 authentication_error
The 401 is the most common first-week error. It almost never means "your key is wrong." It usually means "the bytes you sent did not contain a valid x-api-key header." That is a more specific claim, and the fix changes accordingly.
A canonical 401 looks like this:
HTTP/1.1 401 Unauthorized
content-type: application/json
{
"type": "error",
"error": {
"type": "authentication_error",
"message": "invalid x-api-key"
}
}
The five root causes, in descending order of frequency:
1. API key typo or whitespace
Pasting a key from a dashboard often picks up a leading space, a trailing newline, or a zero-width character. The Anthropic API does not trim. Check the byte count first, and fall back to a hex dump if the count looks right but the key still fails:
printf '%s' "$ANTHROPIC_API_KEY" | wc -c
# Should match the exact length of your key (no trailing newline)
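A byte count only helps if you remember the expected length. As a complementary check, here is a small sketch that scans a key for the characters described above. The function name key_problems is hypothetical, and the set of "suspicious" Unicode categories is an assumption about what a pasted key should never contain.

```python
import unicodedata


def key_problems(key):
    """Return a list of human-readable problems found in an API key string."""
    problems = []
    if key != key.strip():
        problems.append("leading or trailing whitespace")
    for index, ch in enumerate(key):
        # Cf covers invisible format characters (zero-width space, BOM);
        # Cc covers control characters such as newline and tab.
        if ch.isspace() or unicodedata.category(ch) in ("Cf", "Cc"):
            problems.append(f"unexpected character U+{ord(ch):04X} at index {index}")
    return problems


if __name__ == "__main__":
    for candidate in ("sk-ant-example", " sk-ant-example\n", "sk-ant\u200b-example"):
        print(repr(candidate), key_problems(candidate))
```

Running this once at service startup, before the first request, turns an intermittent-looking 401 into a clear configuration error.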
2. Wrong header name
Anthropic uses x-api-key, not Authorization. This trips up developers coming from OpenAI. A request that looks correct but uses the wrong header will 401 every time:
# WRONG (OpenAI-style)
curl https://api.anthropic.com/v1/messages \
-H "Authorization: Bearer sk-ant-..." \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{"model":"claude-opus-4-8","max_tokens":64,"messages":[{"role":"user","content":"hi"}]}'
# RIGHT (Anthropic-style)
curl https://api.anthropic.com/v1/messages \
-H "x-api-key: sk-ant-..." \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{"model":"claude-opus-4-8","max_tokens":64,"messages":[{"role":"user","content":"hi"}]}'
3. Bearer typo
If you use a gateway that does accept Authorization: Bearer, double-check the prefix. The scheme is written Bearer, title-cased, followed by exactly one space and then the token. A lowercase bearer, two spaces after Bearer, or a colon (Bearer:) will be rejected by strict parsers, and many gateway parsers are strict.
4. Gateway swallowing the header
If you sit behind a corporate proxy, an API gateway, or a custom Lambda authorizer, that intermediary may strip x-api-key as a security default and never forward it. The symptom is identical: 401 from Anthropic, even though your client sent the header. Confirm with an outbound capture or by hitting the gateway with a debug endpoint that echoes received headers.
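If you do have an echo endpoint, comparing what you sent against what arrived is a one-liner. The sketch below assumes a hypothetical debug endpoint that returns the received headers as a JSON object; only the comparison is shown, and dropped_headers is an illustrative name.

```python
def dropped_headers(sent, echoed):
    """Return the names of headers that were sent but never reached the far side.

    Header names are compared case-insensitively, as HTTP requires.
    """
    received = {name.lower() for name in echoed}
    return sorted(name.lower() for name in sent if name.lower() not in received)


if __name__ == "__main__":
    sent = {
        "x-api-key": "sk-ant-...",
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    # What a hypothetical echo endpoint behind the proxy reported back.
    echoed = {"Anthropic-Version": "2023-06-01", "Content-Type": "application/json"}
    print(dropped_headers(sent, echoed))
```

If x-api-key shows up in the result, the fix belongs in the proxy or authorizer configuration, not in your client.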
5. Key revoked or workspace deleted
If you rotated keys yesterday and forgot to update one service, or a teammate revoked a shared key, the symptom is also a 401. Anthropic does not differentiate between "never existed" and "revoked" in the public error message. Check the dashboard to confirm the key is still active.
Diagnosis script
When in doubt, this short shell snippet isolates the auth layer from your application code:
#!/usr/bin/env bash
set -euo pipefail
curl -sS -o /tmp/resp.json -w "HTTP %{http_code}\n" \
https://api.anthropic.com/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{"model":"claude-haiku-4-5-20251001","max_tokens":16,"messages":[{"role":"user","content":"ping"}]}'
cat /tmp/resp.json
If this returns 200, your key is fine and the bug is in your client. If it returns 401, work backwards from header to value to dashboard.
403 permission_error
A 403 means the request authenticated, but the principal is not allowed to do this thing. That is a different bug from 401, and the fix lives in account configuration, not credentials.
{
"type": "error",
"error": {
"type": "permission_error",
"message": "your organization does not have access to this model"
}
}
Four root causes:
1. Model not entitled to the workspace
Some Claude models gate access by workspace tier or region. A workspace might be entitled to Sonnet and Haiku but not Opus, or might require an explicit opt-in for a new release. Check the model list in the Anthropic console for the workspace the key belongs to.
2. Content policy block
Anthropic returns 403 (not 400) when the request runs into a policy boundary that the platform refuses to process. The message will usually indicate the category. The fix is to revise the request, not retry it.
3. Workspace API access disabled
Admins can disable programmatic API access for a workspace while keeping the console interactive surface alive. The key still authenticates but every call returns 403. Confirm with a workspace admin.
4. Scoped key called outside its scope
If keys in your org use scoped permissions (read-only, limited models, restricted endpoints) and your code calls something outside the scope, the result is 403. The dashboard shows the scope of each key.
Diagnosis
The fastest test is to call a model the workspace definitely owns:
curl -sS https://api.anthropic.com/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{"model":"claude-haiku-4-5-20251001","max_tokens":8,"messages":[{"role":"user","content":"ok"}]}'
If this succeeds and only your original Opus call 403s, the workspace lacks Opus access. If both 403, the workspace itself is restricted.
404 not_found_error
Most 404s on the Messages API are about the model field. The endpoint exists, the route exists, but the model name does not resolve for your organization.
{
"type": "error",
"error": {
"type": "not_found_error",
"message": "model: claude-opus-4.7 not found"
}
}
Two patterns to check:
1. Model name typo
Anthropic uses hyphens, not dots. claude-opus-4-7 works. claude-opus-4.7 does not. Snapshots have date suffixes (claude-opus-4-5-20251101); a single-character drift returns 404.
The canonical model list is published at https://buzzai.cc/models for routable models on this gateway, and at the official Anthropic console for first-party access. Pin your code to a constant rather than scattering string literals across call sites.
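A cheap pre-flight check catches the dot-versus-hyphen typo before it reaches the API. This is a sketch under assumptions: check_model_id is a hypothetical helper, the list of available models is supplied by you (for example from the models endpoint), and the pattern encodes only the convention described above, not an official grammar.

```python
import re

# Assumed shape: lowercase words and digits joined by single hyphens.
MODEL_ID_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def check_model_id(model, available):
    """Classify a model identifier against a known list of reachable models."""
    if model in available:
        return "ok"
    # Most typos are a dot where a hyphen belongs, or stray case or whitespace.
    normalized = model.strip().lower().replace(".", "-")
    if normalized in available:
        return f"typo: did you mean {normalized!r}?"
    if not MODEL_ID_PATTERN.match(model):
        return "malformed: expected lowercase letters, digits, and hyphens"
    return "not in list: this key cannot reach that model"


if __name__ == "__main__":
    reachable = ["claude-opus-4-7", "claude-haiku-4-5-20251001"]
    for name in ("claude-opus-4-7", "claude-opus-4.7", "claude-opus-9-9"):
        print(name, "->", check_model_id(name, reachable))
```

Calling this from the one place where the model constant is defined keeps the check next to the value it protects.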
2. Cross-org isolation
If you have access to a model in workspace A but call from a key issued in workspace B, the model is invisible to that key, and Anthropic returns 404 (not 403, because the resource is treated as nonexistent for your principal). Verify which workspace owns the key:
curl -sS https://api.anthropic.com/v1/models \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" | jq '.data[].id'
If the model you want is not in this list, the key cannot reach it.
413 request_too_large
The Anthropic Messages API caps request bodies at 32 MB. A 413 means you went over.
{
"type": "error",
"error": {
"type": "request_too_large",
"message": "request body exceeds 32 MB limit"
}
}
This is rarely about text. 32 MB is something like 8 million tokens of text, far above any context window. The usual culprit is base64-encoded images or PDFs inlined into messages[].content. The fix is the Files API.
Upload the file once, reference it by ID:
```python
import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")

# Upload once
with open("contract.pdf", "rb") as f:
    file = client.beta.files.upload(file=("contract.pdf", f, "application/pdf"))

# Reference many times. File references go through the beta Messages surface;
# the beta flag below is the one documented for the Files API at the time of
# writing, so check the current docs before pinning it.
resp = client.beta.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    betas=["files-api-2025-04-14"],
    messages=[{
        "role": "user",
        "content": [
            {"type": "document", "source": {"type": "file", "file_id": file.id}},
            {"type": "text", "text": "Summarize the indemnification clauses."},
        ],
    }],
)
print(resp.content[0].text)
```
Two side benefits beyond the size limit. First, the request body shrinks to a few hundred bytes, so latency drops. Second, prompt caching keys on the file reference, so repeated questions against the same document hit the cache reliably.
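Because base64 inflates binary data by a factor of about 4/3, a file of roughly 24 MB already fills a 32 MB body once encoded. A rough pre-flight estimate can decide whether to inline an attachment or upload it first. The sketch below is an approximation under stated assumptions: it treats 32 MB as 32 MiB, and the overhead_bytes allowance for the JSON envelope and prompt text is a guess you should tune.

```python
import math

# Assumption: "32 MB" is interpreted as 32 MiB. Use 32_000_000 to be conservative.
LIMIT_BYTES = 32 * 1024 * 1024


def base64_size(raw_bytes):
    """Size in bytes of standard padded base64 output for raw_bytes of input."""
    return 4 * math.ceil(raw_bytes / 3)


def should_use_files_api(attachment_sizes, overhead_bytes=64 * 1024):
    """True when inlining the attachments as base64 would likely exceed the limit."""
    estimated = overhead_bytes + sum(base64_size(n) for n in attachment_sizes)
    return estimated > LIMIT_BYTES


if __name__ == "__main__":
    ten_mib = 10 * 1024 * 1024
    twenty_five_mib = 25 * 1024 * 1024
    print(should_use_files_api([ten_mib]))
    print(should_use_files_api([twenty_five_mib]))
    print(should_use_files_api([ten_mib, ten_mib, ten_mib]))
```

The estimate is deliberately simple; it only needs to be right about which side of the limit a request falls on, not about the exact byte count.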
429 rate_limit_error
The 429 is the error every production team sees once they have real traffic. The Claude API 429 rate_limit_error response carries enough information to know exactly which limit you hit.
HTTP/1.1 429 Too Many Requests
retry-after: 12
anthropic-ratelimit-requests-limit: 4000
anthropic-ratelimit-requests-remaining: 3871
anthropic-ratelimit-requests-reset: 2026-05-26T14:31:00Z
anthropic-ratelimit-input-tokens-limit: 400000
anthropic-ratelimit-input-tokens-remaining: 142000
anthropic-ratelimit-output-tokens-limit: 80000
anthropic-ratelimit-output-tokens-remaining: 0
{
"type": "error",
"error": {
"type": "rate_limit_error",
"message": "rate limit exceeded for output_tokens"
}
}
Three separate limits
Anthropic enforces three independent rate limits per model tier:
- Requests per minute (RPM). The number of API calls. Hit this when you have many small requests.
- Input tokens per minute (ITPM). Total input tokens. Hit this with large prompts, long documents, or aggressive parallelism on context-heavy calls. Prompt caching helps because cache reads count at a small fraction.
- Output tokens per minute (OTPM). Total generated tokens. Hit this with long generations or unbounded max_tokens values. Cap max_tokens and stream early to detect runaways.
The headers above tell you which one you hit. anthropic-ratelimit-output-tokens-remaining: 0 in the example points to OTPM, not RPM. The fixes differ. RPM is fixed by batching. ITPM is fixed by caching and trimming. OTPM is fixed by bounding output and chunking generation.
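Reading the headers by eye works during an incident, but it is easy to automate. The following sketch inspects the anthropic-ratelimit-*-remaining headers shown earlier and reports which budgets are at zero. The function name exhausted_limits and the RPM/ITPM/OTPM labels are illustrative; the header names are the ones from the example response.

```python
# Maps the middle part of each header name to the label used in this article.
LIMIT_HEADERS = {
    "requests": "RPM",
    "input-tokens": "ITPM",
    "output-tokens": "OTPM",
}


def exhausted_limits(headers):
    """Return the labels of every rate limit whose remaining budget is zero."""
    lowered = {name.lower(): value for name, value in headers.items()}
    hit = []
    for part, label in LIMIT_HEADERS.items():
        remaining = lowered.get(f"anthropic-ratelimit-{part}-remaining")
        if remaining is not None and int(remaining) == 0:
            hit.append(label)
    return hit


if __name__ == "__main__":
    response_headers = {
        "anthropic-ratelimit-requests-remaining": "25",
        "anthropic-ratelimit-input-tokens-remaining": "142000",
        "anthropic-ratelimit-output-tokens-remaining": "0",
    }
    print(exhausted_limits(response_headers))
```

Logging this label alongside each 429 makes it straightforward to see whether batching, caching, or output bounding is the fix that matters for a given service.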
Exponential backoff with jitter (Python)
```python
import random
import time

import anthropic
from anthropic import RateLimitError

client = anthropic.Anthropic()


def call_with_retry(messages, max_attempts=5):
    delay = 1.0  # fallback wait, used only when the response has no retry-after header
    for attempt in range(max_attempts):
        try:
            return client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=messages,
            )
        except RateLimitError as e:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Prefer the server's retry-after; otherwise use the exponential delay
            retry_after = float(e.response.headers.get("retry-after", delay))
            time.sleep(retry_after + random.uniform(0, 0.5))  # small jitter
            delay = min(delay * 2, 30.0)
```
Exponential backoff with jitter (Node.js)
```javascript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function callWithRetry(messages, maxAttempts = 5) {
  let delay = 1000;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await client.messages.create({
        model: "claude-sonnet-4-6",
        max_tokens: 1024,
        messages,
      });
    } catch (err) {
      if (err.status !== 429 || attempt === maxAttempts - 1) throw err;
      // Depending on the SDK version, err.headers is a plain object or a Headers instance.
      const raw =
        typeof err.headers?.get === "function"
          ? err.headers.get("retry-after")
          : err.headers?.["retry-after"];
      // Fall back to the exponential delay when the header is missing or unparsable.
      const retryAfter = Number(raw) * 1000 || delay;
      const jitter = Math.floor(Math.random() * 500);
      await new Promise((r) => setTimeout(r, retryAfter + jitter));
      delay = Math.min(delay * 2, 30000);
    }
  }
}
```
Exponential backoff with jitter (Go)
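```go
package main

import (
	"context"
	"errors"
	"math/rand"
	"time"

	"github.com/anthropics/anthropic-sdk-go"
)

// callWithRetry retries only on HTTP 429, doubling the delay and adding jitter.
// This version uses a computed delay only; it does not read Retry-After.
func callWithRetry(ctx context.Context, client *anthropic.Client, params anthropic.MessageNewParams) (*anthropic.Message, error) {
	const maxAttempts = 5
	delay := time.Second
	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		msg, err := client.Messages.New(ctx, params)
		if err == nil {
			return msg, nil
		}
		lastErr = err
		var apiErr *anthropic.Error
		// errors.As walks the error chain; anything other than a 429 is returned as-is.
		if !errors.As(err, &apiErr) || apiErr.StatusCode != 429 {
			return nil, err
		}
		if attempt == maxAttempts-1 {
			break // no point sleeping after the final attempt
		}
		jitter := time.Duration(rand.Intn(500)) * time.Millisecond
		time.Sleep(delay + jitter)
		if delay < 30*time.Second {
			delay *= 2
		}
	}
	return nil, lastErr
}
```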
Three rules that hold up in production:
- Always honor Retry-After. The server knows when the window resets; do not guess shorter.
- Add jitter. Without it, a fleet of clients synchronizes and stampedes the moment the window opens.
- Cap attempts. Three to five is plenty. If you cannot succeed in five tries, the right move is to surface the error to your caller, not to retry forever.
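The three rules reduce to one small, testable function if the delay calculation is separated from the sleeping. The sketch below is one way to do that, not the SDK's own algorithm: next_delay is a hypothetical helper, the 0.5 second jitter on top of Retry-After matches the Python example above, and the random source is injectable so the behavior can be checked deterministically.

```python
import random

MAX_ATTEMPTS = 5  # rule 3: cap attempts


def next_delay(attempt, retry_after=None, base=1.0, cap=30.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based).

    rng must return a float in [0, 1); it is a parameter so tests can pin it.
    """
    if retry_after is not None:
        # Rule 1: never wait less than the server asked for.
        # Rule 2: add a little jitter on top so clients do not wake in lockstep.
        return float(retry_after) + rng() * 0.5
    # No server hint: full jitter over an exponentially growing, capped ceiling.
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


if __name__ == "__main__":
    for attempt in range(MAX_ATTEMPTS):
        print(attempt, round(next_delay(attempt), 3))
    print("with Retry-After:", round(next_delay(0, retry_after=12), 3))
```

Separating the calculation from time.sleep also makes it easy to reuse the same policy for 500 and 529 responses.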
500 api_error
A 500 is Anthropic's way of saying "something inside our service went wrong, and it was not your request." The body is generic:
{
"type": "error",
"error": {
"type": "api_error",
"message": "internal server error"
}
}
The right policy is the same as 429: exponential backoff with jitter, capped attempts. Two additional rules:
- Do not let 500s convince you to mutate your request. The body did not cause it.
- If 500s persist for more than a couple of minutes on a particular model, escalate. Check the Anthropic status page. Consider failing over to a peer model family while the incident clears.
A simple isolation test: switch your call to Haiku temporarily. If Haiku works and Opus 500s, the incident is model-scoped. If both 500, it is broader.
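"Persist for more than a couple of minutes" is easier to act on when something is actually measuring it. Here is a minimal sketch of a per-model streak tracker. ServerErrorTracker is a hypothetical class, the 120 second threshold is an assumed reading of "a couple of minutes", and timestamps are passed in explicitly so the logic can be tested without waiting.

```python
class ServerErrorTracker:
    """Tracks how long each model has returned 500s with no other status in between."""

    def __init__(self, escalate_after_s=120.0):
        self.escalate_after_s = escalate_after_s
        self._streak_start = {}  # model -> timestamp of the first 500 in the current streak

    def record(self, model, status, now):
        """Record one response; return True once the 500 streak is long enough to escalate."""
        if status != 500:
            # Any other status ends the streak for this model.
            self._streak_start.pop(model, None)
            return False
        started = self._streak_start.setdefault(model, now)
        return now - started >= self.escalate_after_s


if __name__ == "__main__":
    tracker = ServerErrorTracker()
    for t, status in [(0, 500), (60, 500), (130, 500), (140, 200), (150, 500)]:
        print(t, status, tracker.record("claude-opus-4-8", status, now=t))
```

Because streaks are tracked per model, the same object also answers the isolation question: one model escalating while another stays quiet points to a model-scoped incident.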
529 overloaded_error
The 529 is the one that confuses teams the most, because it looks like a 429 at a glance and is not.
HTTP/1.1 529 Overloaded
{
"type": "error",
"error": {
"type": "overloaded_error",
"message": "API is temporarily overloaded"
}
}
How 529 differs from 429
| | 429 rate_limit_error | 529 overloaded_error |
|---|---|---|
| Scope | Per-account | Provider-wide |
| Cause | You sent too much, too fast | Anthropic infrastructure saturated |
| Fix | Slow down, batch, raise tier | Wait or fail over |
| Retry-After | Usually present | Often absent |
| Persistence | Resolves in seconds to a minute | Can last minutes during peak hours |
Raising your account tier does nothing for 529. The constraint is on the other side.
Multi-upstream fallback strategy
The clean answer to 529 is to have more than one place to send the request. A gateway with multi-route capability does this for you, but the application-level pattern is also straightforward:
```python
import anthropic

PRIMARY = anthropic.Anthropic(base_url="https://buzzai.cc", api_key="buzz-...")
FALLBACK_MODELS = ["claude-sonnet-4-6", "claude-haiku-4-5-20251001"]


def resilient_call(messages, primary_model="claude-opus-4-8"):
    candidates = [primary_model] + FALLBACK_MODELS
    last_err = None
    for model in candidates:
        try:
            return PRIMARY.messages.create(
                model=model,
                max_tokens=1024,
                messages=messages,
            )
        except anthropic.APIStatusError as e:
            if e.status_code in (429, 500, 529):
                last_err = e
                continue
            raise
    raise last_err
```
Three caveats. First, fail over only across requests where output quality differences are acceptable; do not silently downgrade a model in an eval pipeline. Second, log every fallback so you know your real availability picture. Third, if your gateway already does multi-route fallback under the hood, do not double-implement it; let the gateway absorb the spike and surface a clean response to your application.
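The first two caveats can be enforced in code rather than by convention. The sketch below is a variation on the fallback loop with the network call injected as a function, so it runs without an SDK. UpstreamError is a hypothetical stand-in for an SDK status error, and call_with_logged_fallback and allow_downgrade are illustrative names.

```python
import logging

logger = logging.getLogger("fallback")
RETRYABLE = {429, 500, 529}


class UpstreamError(Exception):
    """Hypothetical stand-in for an SDK status error; carries only the status code."""

    def __init__(self, status_code):
        super().__init__(f"upstream returned {status_code}")
        self.status_code = status_code


def call_with_logged_fallback(send, candidates, allow_downgrade=True):
    """Try each model in order; return (model_used, response) and log every fallback.

    send(model) performs the request and raises UpstreamError on failure.
    """
    if not candidates:
        raise ValueError("candidates must contain at least one model")
    if not allow_downgrade:
        candidates = candidates[:1]  # e.g. eval pipelines: never substitute a model
    last_err = None
    for position, model in enumerate(candidates):
        try:
            response = send(model)
        except UpstreamError as err:
            if err.status_code not in RETRYABLE:
                raise
            logger.warning("model=%s failed with status=%s", model, err.status_code)
            last_err = err
            continue
        if position > 0:
            logger.warning("served by fallback model=%s instead of %s", model, candidates[0])
        return model, response
    raise last_err
```

Returning the model that actually served the request, instead of only the response, lets the caller record the downgrade or reject it outright.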
How a gateway helps debug
A transparent gateway is not a layer that hides errors. It is a layer that gives you better tools to read them. Three properties matter for debugging Anthropic API errors:
1. Byte-faithful error envelopes
The gateway should pass upstream error.type and error.message through unchanged. If you see a 401 with authentication_error, that is the upstream's verdict, not the gateway's. BUZZ preserves the envelope and uses a gateway_ prefix on its own errors so you can branch cleanly:
```python
try:
    resp = client.messages.create(...)
except anthropic.APIStatusError as e:
    # e.body is not guaranteed to be a dict (it can be None or raw text)
    body = e.body if isinstance(e.body, dict) else {}
    err_type = (body.get("error") or {}).get("type", "")
    if err_type.startswith("gateway_"):
        # Auth, billing, or routing failure inside the gateway
        handle_gateway_error(e)
    else:
        # Upstream error, treat as if calling Anthropic directly
        handle_upstream_error(e)
```
2. Per-request observability without retention
For each request, a well-built gateway records the model, input and output token counts, latency, and final status code, without storing prompt content. That is the minimum metadata you need to answer "which feature is producing 429s" or "is Opus throwing more 500s than Sonnet today" without setting up your own logging pipeline. The dashboard rolls these up by key, model, and status code.
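To make the roll-up concrete, here is a sketch of the kind of aggregation such metadata supports. The record fields (key_id, model, status) are hypothetical and chosen to match the metadata listed above; this is not the schema of any particular gateway or dashboard.

```python
from collections import Counter


def rollup(records):
    """Count requests per (key_id, model, status) from metadata-only records."""
    counts = Counter()
    for record in records:
        counts[(record["key_id"], record["model"], record["status"])] += 1
    return counts


def error_rate(records, model, status):
    """Share of a model's requests that ended with the given status code."""
    total = sum(1 for r in records if r["model"] == model)
    hits = sum(1 for r in records if r["model"] == model and r["status"] == status)
    return hits / total if total else 0.0


if __name__ == "__main__":
    sample = [
        {"key_id": "k1", "model": "claude-opus-4-8", "status": 500},
        {"key_id": "k1", "model": "claude-opus-4-8", "status": 200},
        {"key_id": "k2", "model": "claude-sonnet-4-6", "status": 429},
        {"key_id": "k2", "model": "claude-sonnet-4-6", "status": 429},
    ]
    for group, count in rollup(sample).most_common():
        print(group, count)
    print(error_rate(sample, "claude-opus-4-8", 500))
```

Note that nothing here touches prompt content: the questions in the paragraph above are answerable from key, model, and status alone.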
3. Multi-route fallback on 529
The gateway can hold multiple upstream routes for the same model and try them in order on 529. From the application's perspective, a 529 you would have seen calling Anthropic directly turns into a successful response with slightly higher latency. This is the single biggest reliability lever for chatbots that have to keep flowing during peak hours.
That said, multi-route is not a license to lie. The gateway must not silently substitute a different model. The route is a different path to the same model identifier, not a different model. If a route holds Opus from upstream A and another holds Opus from upstream B, both are Opus, and the response is identical.
FAQ
What does a Claude API 401 authentication_error mean?
It means the upstream could not validate your credentials. The five usual causes are a typo or whitespace in the key, the wrong header name (x-api-key not Authorization), a malformed Bearer prefix, a gateway stripping the header, or a revoked key. Confirm by hitting /v1/messages with a minimal cURL request and the env-var key.
What is the difference between 429 rate_limit_error and 529 overloaded_error?
The 429 is per-account: your traffic exceeded a budget on RPM, ITPM, or OTPM. The 529 is provider-wide: Anthropic is saturated and your account is fine. Raising your tier helps with 429 and does nothing for 529. The right answer to 529 is multi-upstream fallback.
How should I retry on a Claude API 429 rate_limit_error?
Use exponential backoff with full jitter, honor Retry-After, and cap at three to five attempts. Anthropic's official SDKs already retry 429s automatically a small number of times with backoff by default, so account for that before layering your own loop on top. Do not retry sooner than the upstream-suggested delay; the limiter sees the burst and extends the window.
Why does my Claude request return 403 permission_error?
Authentication succeeded, but the principal is not allowed. Common causes are a model not entitled to the workspace, a content policy block, programmatic access disabled by an admin, or a scoped key being used outside its scope. Check with a known-good model first to isolate the problem.
What is the maximum request size for the Anthropic API?
32 MB. A 413 request_too_large means you exceeded it, almost always due to inlined base64 documents or images. The Files API is the right path: upload once, reference by file ID across requests.
What does 500 api_error mean and is it my fault?
No. 500 is an internal Anthropic failure. Retry with backoff. If it persists for minutes on one model, check the status page and consider failing over to a peer model family.
How do I distinguish input rate limit from output rate limit?
Read the anthropic-ratelimit-*-remaining headers in the 429 response. The one at zero is the limit you hit. Requests-per-minute is fixed by batching, input-tokens-per-minute by caching and trimming, output-tokens-per-minute by bounding max_tokens.
What does 404 not_found_error mean for a Claude API call?
Usually the model identifier in the request body does not exist for your organization. Common causes are a typo (claude-opus-4.7 instead of claude-opus-4-7), a deprecated snapshot, or calling a model that lives in a different workspace. List /v1/models to see what your key can actually reach.
Can a gateway help me debug Anthropic API errors?
Yes. A transparent gateway preserves error.type and error.message byte-for-byte, gives you per-request observability without storing prompt content, and can fail over across upstream routes on 529 so the application sees a clean response. The gateway prefixes its own errors with gateway_ so you can branch cleanly.
What is the right backoff for production retries?
Exponential backoff with full jitter, starting at one to two seconds, doubling each attempt, capped at thirty to sixty seconds, three to five attempts maximum. Always honor Retry-After. For chained tool-use loops, deduplicate side effects at the application layer so a retry does not double-count.
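Deduplicating side effects usually means keying each action on a stable identifier and refusing to run it twice. The following is a minimal in-memory sketch, assuming each tool call carries a stable ID you can use as the key. SideEffectLedger is a hypothetical name, and a real deployment would need shared, durable storage rather than a dict.

```python
class SideEffectLedger:
    """Runs each side effect at most once per idempotency key (in-memory sketch)."""

    def __init__(self):
        self._results = {}

    def run_once(self, idempotency_key, action):
        """Execute action() the first time a key is seen; return the stored result after."""
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = action()
        self._results[idempotency_key] = result
        return result


if __name__ == "__main__":
    ledger = SideEffectLedger()
    charges = []

    def charge_card():
        charges.append("charged")
        return "receipt-1"

    # The same tool call is replayed after a retried request.
    print(ledger.run_once("tool-call-001", charge_card))
    print(ledger.run_once("tool-call-001", charge_card))
    print(len(charges))
```

The retry loop can then stay simple: it may replay a request freely, because the ledger, not the loop, decides whether the side effect happens.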
Want fewer 529s in production?
BUZZ AI Gateway forwards Claude traffic transparently, preserves Anthropic error envelopes byte-for-byte, and runs multi-route fallback under the hood so your application sees clean responses through capacity spikes. Same SDK, same model names, lower per-token cost. Start at https://buzzai.cc, check live rates at https://buzzai.cc/api/pricing, and see the routable model list at https://buzzai.cc/models.
Last reviewed: 2026-05-26