Anthropic API Error Code Reference: 401, 403, 429, 500, 529 — Root Cause and Fix

Q: What does a Claude API 401 authentication_error mean?

A 401 authentication_error means the Anthropic API could not validate your credentials. The most common causes are a typo in the x-api-key header value, sending the key in an Authorization: Bearer header instead of x-api-key, a revoked or rotated key, an upstream gateway stripping the header, or accidentally pasting the key with surrounding whitespace. Verify the header name first, then the key value, then the gateway pass-through.

Q: What is the difference between 429 rate_limit_error and 529 overloaded_error?

A 429 rate_limit_error is account-specific: your key exceeded its requests-per-minute, input-tokens-per-minute, or output-tokens-per-minute budget. A 529 overloaded_error is provider-wide: Anthropic's servers are saturated and your individual key is fine. The fix differs. For 429 you slow down or raise your tier. For 529 you fail over to a different upstream route or wait for capacity to recover.

Q: How should I retry on a Claude API 429 rate_limit_error?

Use exponential backoff with jitter, and respect the Retry-After header if present. The official Anthropic SDKs already retry 429s with a short backoff by default (two retries unless you change max_retries). A sensible minimum policy waits about two, four, then eight seconds, each with a small random jitter, and stops after three to five attempts. Never retry sooner than one second or sooner than the Retry-After value: a request that arrives before the limit has replenished is simply rejected again.

Q: Why does my Claude request return 403 permission_error?

A 403 permission_error means your key authenticated successfully but is not permitted to perform the operation. The four common causes are: requesting a model the workspace is not entitled to, hitting a content policy block, your organization disabled the API for the workspace the key belongs to, or the key was scoped to a subset of endpoints and you called one outside the scope.

Q: What is the maximum request size for the Anthropic API?

The Anthropic Messages API rejects request bodies above 32 MB with a 413 request_too_large error. For large documents or images, the Files API is the correct path: upload once, reference by file ID across many requests. This also keeps prompt caching effective, because the file reference is stable across calls.

Q: What does 500 api_error mean and is it my fault?

A 500 api_error indicates an unexpected server-side failure inside Anthropic. It is not caused by your request body. The correct response is to retry with backoff. If 500s persist for more than a few minutes on a particular model, check the Anthropic status page and consider failing over to a different model family or a different upstream route.

Q: How do I distinguish input rate limit from output rate limit?

Anthropic publishes three separate rate limits per model tier: requests per minute, input tokens per minute, and output tokens per minute. The 429 response includes headers indicating which limit was hit (anthropic-ratelimit-requests-remaining, anthropic-ratelimit-input-tokens-remaining, anthropic-ratelimit-output-tokens-remaining). The fix differs: requests are batched, input is trimmed or cached, output is bounded with max_tokens.

Q: What does 404 not_found_error mean for a Claude API call?

A 404 not_found_error usually means the model identifier in the request body does not exist or is not visible to the calling organization. Common causes are a typo in the snapshot suffix, using a model that was deprecated and removed, or trying to use a model that belongs to a different workspace. Verify the model name against the live model list before checking other layers.

Q: Can a gateway help me debug Anthropic API errors?

Yes. A transparent gateway preserves Anthropic error envelopes byte-for-byte, so the error.type and error.message you see are the upstream's exact response. The gateway also gives you per-request observability (model, token counts, status code) without storing prompt content, and on 529 overloaded_error it can fail over to a peer route automatically while your application sees a clean retry.

Q: What is the right backoff for production retries?

Use exponential backoff with full jitter. Start at one to two seconds, double each attempt, cap the delay at thirty to sixty seconds, and stop after three to five attempts. Always honor the Retry-After header when present. For idempotent reads this is safe; for chained tool-use loops, deduplicate at the application level so a retry does not double-count side effects.

In production, read the status code first and the message second. The HTTP status tells you who is responsible for the failure. The error.type tells you which subsystem. The message is for humans, and is often the least useful part. This is a working engineer's Claude API error reference, organized by status code, with the actual root causes and the fixes that hold up under traffic.

Engineering reference · Anthropic Messages API · HTTP status, error.type, retry strategy

Every team that ships a Claude integration eventually meets the same eight errors. The names are predictable. The fixes are not always obvious. The goal of this article is to make the diagnosis path mechanical: see the status code, jump to the section, run the check, apply the fix. By the end you should be able to triage Anthropic API errors in under a minute, and know when an error is your bug, the network's bug, or Anthropic's bug.

The Anthropic API Error Map

Start here. Every error response from api.anthropic.com looks like this:

{
  "type": "error",
  "error": {
    "type": "authentication_error",
    "message": "invalid x-api-key"
  }
}

The HTTP status and the error.type together tell you everything you need before reading the message. Use this map as your first stop.

HTTP	error.type	Meaning	Root cause domain	Fix direction
400	`invalid_request_error`	Malformed request body or arguments	Client	Validate JSON shape, model name, max_tokens, role alternation
401	`authentication_error`	Credentials missing or invalid	Client / network	Verify `x-api-key` header, key value, gateway pass-through
403	`permission_error`	Authenticated but not allowed	Account / policy	Check workspace entitlements, model access, content policy
404	`not_found_error`	Resource (usually model) does not exist for this org	Client / account	Verify model identifier on the live models page
413	`request_too_large`	Body exceeds 32 MB	Client	Use the Files API for large documents and images
429	`rate_limit_error`	Account rate limit hit (RPM, ITPM, or OTPM)	Client traffic	Exponential backoff, batching, tier upgrade
500	`api_error`	Unexpected server-side failure	Anthropic	Retry with backoff, fail over if persistent
529	`overloaded_error`	Provider-wide capacity saturation	Anthropic	Multi-upstream fallback, queue, retry
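The map above can be turned into a lookup that an error handler consults before it reads the message. The following is a minimal sketch, not an official client feature: the names triage, ERROR_MAP, and envelope_matches_status are illustrative and introduced here, and the retry column simply encodes this article's guidance (429, 500, and 529 are retried; the rest are not).

```python
import json

# Triage table transcribed from the error map above.
# The last field marks statuses where resending the same request can succeed.
ERROR_MAP = {
    400: ("invalid_request_error", "client", False),
    401: ("authentication_error", "client/network", False),
    403: ("permission_error", "account/policy", False),
    404: ("not_found_error", "client/account", False),
    413: ("request_too_large", "client", False),
    429: ("rate_limit_error", "client traffic", True),
    500: ("api_error", "anthropic", True),
    529: ("overloaded_error", "anthropic", True),
}


def triage(status, raw_body):
    """Return the responsible domain and retryability for an error response."""
    try:
        error_type = json.loads(raw_body)["error"]["type"]
    except (ValueError, KeyError, TypeError):
        error_type = None  # body was not an Anthropic-style error envelope
    expected_type, domain, retry = ERROR_MAP.get(status, (None, "unknown", False))
    return {
        "status": status,
        "error_type": error_type,
        "domain": domain,
        "retry": retry,
        # A mismatch hints that a proxy or gateway rewrote the response.
        "envelope_matches_status": error_type is not None and error_type == expected_type,
    }
```

A handler can branch on the retry flag first and only log the message afterwards. When envelope_matches_status is false, it is worth inspecting whatever sits between the client and api.anthropic.com before changing the request.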

Now the per-error sections, in the order you are most likely to hit them.

401 authentication_error

The 401 is the most common first-week error. It almost never means "your key is wrong." It usually means "the bytes you sent did not contain a valid x-api-key header." That is a more specific claim, and the fix changes accordingly.

A canonical 401 looks like this:

HTTP/1.1 401 Unauthorized
content-type: application/json

{
  "type": "error",
  "error": {
    "type": "authentication_error",
    "message": "invalid x-api-key"
  }
}

The five root causes, in descending order of frequency:

1. API key typo or whitespace

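Pasting a key from a dashboard often picks up a leading space, a trailing newline, or a zero-width character. The Anthropic API does not trim. Check the byte count first, then dump the raw bytes if the count is off:

printf '%s' "$ANTHROPIC_API_KEY" | wc -c
# Should match the exact length of your key (no trailing newline)
printf '%s' "$ANTHROPIC_API_KEY" | od -c | tail -n 3
# Shows the final bytes: look for \n, \r, spaces, or octal runs such as 342 200 213 (a zero-width space)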
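The same check can run inside the application at startup, so a polluted key fails fast instead of surfacing as a 401 later. This is a small sketch using only the standard library; key_problems is a hypothetical helper name, and the rule that a key should contain only visible ASCII is an assumption based on the sk-ant-... format shown in this article.

```python
import unicodedata


def key_problems(key):
    """List characters that would silently change the bytes sent in x-api-key."""
    problems = []
    for i, ch in enumerate(key):
        if ch.isspace():
            problems.append(f"whitespace U+{ord(ch):04X} at index {i}")
        elif unicodedata.category(ch) == "Cf":
            # "Cf" covers invisible format characters such as U+200B and U+FEFF.
            problems.append(f"invisible character U+{ord(ch):04X} at index {i}")
        elif not ch.isascii():
            problems.append(f"non-ASCII character U+{ord(ch):04X} at index {i}")
    return problems
```

An empty list means the value is clean at the byte level. It says nothing about whether the key is still active, which is a dashboard check.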

2. Wrong header name

Anthropic uses x-api-key, not Authorization. This trips up developers coming from OpenAI. A request that looks correct but uses the wrong header will 401 every time:

# WRONG (OpenAI-style)
curl https://api.anthropic.com/v1/messages \
  -H "Authorization: Bearer sk-ant-..." \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-opus-4-8","max_tokens":64,"messages":[{"role":"user","content":"hi"}]}'

# RIGHT (Anthropic-style)
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: sk-ant-..." \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-opus-4-8","max_tokens":64,"messages":[{"role":"user","content":"hi"}]}'

3. Bearer typo

If you use a gateway that does accept Authorization: Bearer, double-check the prefix. The scheme is written Bearer, title-cased, followed by exactly one space and then the token. Lowercase bearer, Bearer followed by two spaces, and Bearer: with a colon all fail on strict parsers, and most parsers are strict.

4. Gateway swallowing the header

If you sit behind a corporate proxy, an API gateway, or a custom Lambda authorizer, that intermediary often strips x-api-key as a security default and never forwards it. The symptom is identical: 401 from Anthropic, even though your client sent the header. Confirm with an outbound capture or by hitting the gateway with a debug endpoint that echoes received headers.
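If no echo endpoint exists yet, one can be built from the standard library in a few lines. The sketch below is an illustration under assumptions: EchoHeaders, start_echo_server, and headers_seen_by are hypothetical names, and the idea is to place the echo server where Anthropic would normally sit (behind the proxy or gateway under test) and send a dummy key, never a real one.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHeaders(BaseHTTPRequestHandler):
    """Reply to any POST with the request headers the server actually received."""

    def do_POST(self):
        length = int(self.headers.get("content-length", 0))
        self.rfile.read(length)  # drain the body so the connection stays clean
        received = {name.lower(): value for name, value in self.headers.items()}
        payload = json.dumps(received).encode()
        self.send_response(200)
        self.send_header("content-type", "application/json")
        self.send_header("content-length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the console quiet


def start_echo_server(port=0):
    """Start the echo server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), EchoHeaders)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def headers_seen_by(url, headers):
    """POST an empty JSON body to url and return the headers the far end saw."""
    req = urllib.request.Request(url, data=b"{}", headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)
```

Point headers_seen_by at the gateway URL that fronts the echo server. If "x-api-key" is missing from the returned dictionary, the intermediary is dropping it and the 401 is not a credentials problem.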

5. Key revoked or workspace deleted

If you rotated keys yesterday and forgot to update one service, or a teammate revoked a shared key, the symptom is also a 401. Anthropic does not differentiate between "never existed" and "revoked" in the public error message. Check the dashboard to confirm the key is still active.

Diagnosis script

When in doubt, this short shell snippet isolates the auth layer from your application code:

#!/usr/bin/env bash
set -euo pipefail
curl -sS -o /tmp/resp.json -w "HTTP %{http_code}\n" \
  https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-haiku-4-5-20251001","max_tokens":16,"messages":[{"role":"user","content":"ping"}]}'
cat /tmp/resp.json

If this returns 200, your key is fine and the bug is in your client. If it returns 401, work backwards from header to value to dashboard.

403 permission_error

A 403 means the request authenticated, but the principal is not allowed to do this thing. That is a different bug from 401, and the fix lives in account configuration, not credentials.

{
  "type": "error",
  "error": {
    "type": "permission_error",
    "message": "your organization does not have access to this model"
  }
}

Four root causes:

1. Model not entitled to the workspace

Some Claude models gate access by workspace tier or region. A workspace might be entitled to Sonnet and Haiku but not Opus, or might require an explicit opt-in for a new release. Check the model list in the Anthropic console for the workspace the key belongs to.

2. Content policy block

Anthropic returns 403 (not 400) when the request runs into a policy boundary that the platform refuses to process. The message will usually indicate the category. The fix is to revise the request, not retry it.

3. Workspace API access disabled

Admins can disable programmatic API access for a workspace while keeping the console interactive surface alive. The key still authenticates but every call returns 403. Confirm with a workspace admin.

4. Scoped key called outside its scope

If keys in your org use scoped permissions (read-only, limited models, restricted endpoints) and your code calls something outside the scope, the result is 403. The dashboard shows the scope of each key.

Diagnosis

The fastest test is to call a model the workspace definitely owns:

curl -sS https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-haiku-4-5-20251001","max_tokens":8,"messages":[{"role":"user","content":"ok"}]}'

If this succeeds and only your original Opus call 403s, the workspace lacks Opus access. If both 403, the workspace itself is restricted.

404 not_found_error

Most 404s on the Messages API are about the model field. The endpoint exists, the route exists, but the model name does not resolve for your organization.

{
  "type": "error",
  "error": {
    "type": "not_found_error",
    "message": "model: claude-opus-4.7 not found"
  }
}

Two patterns to check:

1. Model name typo

Anthropic uses hyphens, not dots. claude-opus-4-7 works. claude-opus-4.7 does not. Snapshots have date suffixes (claude-opus-4-5-20251101); a single-character drift returns 404.

The canonical model list is published at https://buzzai.cc/models for routable models on this gateway, and at the official Anthropic console for first-party access. Pin your code to a constant rather than scattering string literals across call sites.
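Pinning and a cheap format check can live in one small module. The sketch below is illustrative only: the constant names and model_id_problem are hypothetical, and the regular expression is an assumed shape inferred from the identifiers used in this article, not an official grammar. It catches obvious typos before a request is sent; it cannot tell whether a well-formed identifier actually exists.

```python
import re

# Pin identifiers in one place; both values appear elsewhere in this article.
MODEL_FAST = "claude-haiku-4-5-20251001"
MODEL_DEFAULT = "claude-sonnet-4-6"

# Assumed shape: "claude" followed by lowercase alphanumeric groups joined by hyphens.
_MODEL_ID = re.compile(r"claude(-[a-z0-9]+)+")


def model_id_problem(model):
    """Return a description of an obvious formatting problem, or None."""
    if model != model.strip():
        return "surrounding whitespace"
    if "." in model:
        return "contains a dot; identifiers use hyphens (4-7, not 4.7)"
    if not _MODEL_ID.fullmatch(model):
        return "unexpected characters or casing"
    return None
```

Running the check in a unit test over every pinned constant is usually enough; the live model list remains the authority on what resolves.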

2. Cross-org isolation

If you have access to a model in workspace A but call from a key issued in workspace B, the model is invisible to that key, and Anthropic returns 404 (not 403, because the resource is treated as nonexistent for your principal). Verify which workspace owns the key:

curl -sS https://api.anthropic.com/v1/models \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" | jq '.data[].id'

If the model you want is not in this list, the key cannot reach it.

413 request_too_large

The Anthropic Messages API caps request bodies at 32 MB. A 413 means you went over.

{
  "type": "error",
  "error": {
    "type": "request_too_large",
    "message": "request body exceeds 32 MB limit"
  }
}

This is rarely about text. 32 MB is something like 8 million tokens of text, far above any context window. The usual culprit is base64-encoded images or PDFs inlined into messages[].content. The fix is the Files API.

Upload the file once, reference it by ID:

import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")

# Upload once
with open("contract.pdf", "rb") as f:
    file = client.beta.files.upload(file=("contract.pdf", f, "application/pdf"))

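# Reference many times. The Files API is a beta feature, so the call goes
# through client.beta.messages and names the beta explicitly.
resp = client.beta.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    betas=["files-api-2025-04-14"],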
    messages=[{
        "role": "user",
        "content": [
            {"type": "document", "source": {"type": "file", "file_id": file.id}},
            {"type": "text", "text": "Summarize the indemnification clauses."},
        ],
    }],
)
print(resp.content[0].text)

Two side benefits beyond the size limit. First, the request body shrinks to a few hundred bytes, so latency drops. Second, prompt caching keys on the file reference, so repeated questions against the same document hit the cache reliably.
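To decide ahead of time whether a payload needs the Files API, it helps to remember that base64 inflates binary data by a factor of 4/3, so a file well under 32 MB on disk can still exceed the cap once inlined. The following is a rough sizing sketch: base64_len and fits_inline are hypothetical helpers, the limit is assumed to be 32 binary megabytes, and the 64 KB allowance for the rest of the JSON body is an arbitrary illustrative margin.

```python
LIMIT_BYTES = 32 * 1024 * 1024  # assumes the 32 MB cap means binary megabytes


def base64_len(raw_bytes):
    """Length in bytes of the padded base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_inline(raw_sizes, overhead_bytes=64 * 1024):
    """Estimate whether files of the given raw sizes fit in one request body.

    overhead_bytes is a rough allowance for the JSON wrapper and text content.
    """
    encoded = sum(base64_len(n) for n in raw_sizes)
    return encoded + overhead_bytes <= LIMIT_BYTES
```

Under these assumptions a single 20 MB PDF still fits inline (about 26.7 MB encoded), while a 25 MB PDF does not (about 33.3 MB encoded), which is why the practical ceiling for inlined binaries sits closer to 24 MB than 32 MB.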

429 rate_limit_error

The 429 is the error every production team sees once they have real traffic. A 429 rate_limit_error response carries enough information to know exactly which limit you hit.

HTTP/1.1 429 Too Many Requests
retry-after: 12
anthropic-ratelimit-requests-limit: 4000
anthropic-ratelimit-requests-remaining: 3720
anthropic-ratelimit-requests-reset: 2026-05-26T14:31:00Z
anthropic-ratelimit-input-tokens-limit: 400000
anthropic-ratelimit-input-tokens-remaining: 142000
anthropic-ratelimit-output-tokens-limit: 80000
anthropic-ratelimit-output-tokens-remaining: 0

{
  "type": "error",
  "error": {
    "type": "rate_limit_error",
    "message": "rate limit exceeded for output_tokens"
  }
}

Three separate limits

Anthropic enforces three independent rate limits per model tier:

Requests per minute (RPM). The number of API calls. Hit this when you have many small requests.
Input tokens per minute (ITPM). Total input tokens. Hit this with large prompts, long documents, or aggressive parallelism on context-heavy calls. Prompt caching helps because cache reads count at a small fraction.
Output tokens per minute (OTPM). Total generated tokens. Hit this with long generations or unbounded max_tokens values. Cap max_tokens and stream early to detect runaways.

The headers above tell you which one you hit. anthropic-ratelimit-output-tokens-remaining: 0 in the example points to OTPM, not RPM. The fixes differ. RPM is fixed by batching. ITPM is fixed by caching and trimming. OTPM is fixed by bounding output and chunking generation.
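Reading the headers can be automated so that every 429 is logged with the limit that caused it. This is a minimal sketch: exhausted_limits is a hypothetical helper, it relies only on the three anthropic-ratelimit-*-remaining headers named in this article, and it treats header names as case-insensitive.

```python
# Header infix -> short label used in this article.
LIMIT_KINDS = {
    "requests": "RPM",
    "input-tokens": "ITPM",
    "output-tokens": "OTPM",
}


def exhausted_limits(headers):
    """Return labels of the limits whose remaining budget is reported as zero."""
    lowered = {name.lower(): value for name, value in headers.items()}
    hit = []
    for kind, label in LIMIT_KINDS.items():
        value = lowered.get(f"anthropic-ratelimit-{kind}-remaining")
        if value is not None and str(value).strip() == "0":
            hit.append(label)
    return hit
```

Tagging each 429 log line with the returned labels makes it easy to see whether a service needs batching, caching, or a lower max_tokens. An empty result on a 429 means the headers were missing or stripped, and error.message is the fallback.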

Exponential backoff with jitter (Python)

import random
import time
import anthropic
from anthropic import RateLimitError

# Disable the SDK's built-in retries so this loop is the only retry layer
client = anthropic.Anthropic(max_retries=0)

def call_with_retry(messages, max_attempts=5):
    delay = 1.0
    for attempt in range(max_attempts):
        try:
            return client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=messages,
            )
        except RateLimitError as e:
            retry_after = float(e.response.headers.get("retry-after", delay))
            sleep_for = retry_after + random.uniform(0, 0.5)
            if attempt == max_attempts - 1:
                raise
            time.sleep(sleep_for)
            delay = min(delay * 2, 30.0)

Exponential backoff with jitter (Node.js)

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function callWithRetry(messages, maxAttempts = 5) {
  let delay = 1000;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await client.messages.create({
        model: "claude-sonnet-4-6",
        max_tokens: 1024,
        messages,
      });
    } catch (err) {
      if (err.status !== 429 || attempt === maxAttempts - 1) throw err;
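      // Depending on SDK version, err.headers may be a Headers object or a plain object
      const raw = typeof err.headers?.get === "function" ? err.headers.get("retry-after") : err.headers?.["retry-after"];
      const retryAfter = Number(raw) * 1000 || delay;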
      const jitter = Math.floor(Math.random() * 500);
      await new Promise((r) => setTimeout(r, retryAfter + jitter));
      delay = Math.min(delay * 2, 30000);
    }
  }
}

Exponential backoff with jitter (Go)

package main

import (
    "context"
    "errors"
    "math/rand"
    "time"

    "github.com/anthropics/anthropic-sdk-go"
)

func callWithRetry(ctx context.Context, client *anthropic.Client, params anthropic.MessageNewParams) (*anthropic.Message, error) {
    const maxAttempts = 5
    delay := time.Second
    var lastErr error
    for attempt := 0; attempt < maxAttempts; attempt++ {
        msg, err := client.Messages.New(ctx, params)
        if err == nil {
            return msg, nil
        }
        lastErr = err
        // Only retry rate-limit responses; everything else goes back to the caller.
        var apiErr *anthropic.Error
        if !errors.As(err, &apiErr) || apiErr.StatusCode != 429 {
            return nil, err
        }
        if attempt == maxAttempts-1 {
            break // no point sleeping after the final attempt
        }
        jitter := time.Duration(rand.Intn(500)) * time.Millisecond
        time.Sleep(delay + jitter)
        if delay < 30*time.Second {
            delay *= 2
        }
    }
    return nil, lastErr
}

Three rules that hold up in production:

Always honor Retry-After. The server knows when the window resets; do not guess shorter.
Add jitter. Without it, a fleet of clients synchronizes and stampedes the moment the window opens.
Cap attempts. Three to five is plenty. If you cannot succeed in five tries, the right move is to surface the error to your caller, not to retry forever.
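The three rules reduce to one small function that decides how long to sleep. The sketch below is one possible reading, not the SDK's internal algorithm: next_delay and MAX_ATTEMPTS are hypothetical names, the jitter is "full jitter" (a uniform draw between zero and the exponential ceiling), and a Retry-After value is treated as a floor with the jitter added on top so clients that received the same value do not wake up together.

```python
import random

MAX_ATTEMPTS = 5  # upper end of the three-to-five range suggested above


def next_delay(attempt, retry_after=None, base=1.0, cap=30.0, rng=random.random):
    """Seconds to sleep before retrying after failed attempt `attempt` (0-based).

    The exponential ceiling is min(cap, base * 2**attempt); the jitter is a
    uniform draw from [0, ceiling). With Retry-After, the wait is never
    shorter than the server's value.
    """
    ceiling = min(cap, base * (2 ** attempt))
    jittered = rng() * ceiling
    if retry_after is None:
        return jittered
    return float(retry_after) + jittered
```

The rng parameter exists so tests can inject a deterministic source. A retry loop would call next_delay(attempt, retry_after) after each failure and give up once attempt reaches MAX_ATTEMPTS - 1.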

500 api_error

A 500 is Anthropic's way of saying "something inside our service went wrong, and it was not your request." The body is generic:

{
  "type": "error",
  "error": {
    "type": "api_error",
    "message": "internal server error"
  }
}

The right policy is the same as 429: exponential backoff with jitter, capped attempts. Two additional rules:

Do not let 500s convince you to mutate your request. The body did not cause it.
If 500s persist for more than a couple of minutes on a particular model, escalate. Check the Anthropic status page. Consider failing over to a peer model family while the incident clears.

A simple isolation test: switch your call to Haiku temporarily. If Haiku works and Opus 500s, the incident is model-scoped. If both 500, it is broader.
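The "persisting for more than a couple of minutes" rule is easy to get wrong by eye, so it can help to track it per model. This is a sketch under assumptions: PersistentErrorDetector is a hypothetical class, the 120-second threshold is just one reading of "a couple of minutes", and a streak is defined here as consecutive 500s with no successful response in between.

```python
import time


class PersistentErrorDetector:
    """Flag a model once its streak of 500s has lasted at least threshold_s."""

    def __init__(self, threshold_s=120.0, clock=time.monotonic):
        self.threshold_s = threshold_s
        self.clock = clock
        self.streak_start = {}  # model -> time of the first 500 in the streak

    def record(self, model, status):
        """Record one response; return True when it is time to escalate."""
        now = self.clock()
        if status == 500:
            started = self.streak_start.setdefault(model, now)
            return now - started >= self.threshold_s
        if status < 400:
            self.streak_start.pop(model, None)  # a success ends the streak
        return False
```

Because streaks are keyed by model, the detector also answers the isolation question above: if only one model escalates, the incident is model-scoped; if several escalate together, it is broader.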

529 overloaded_error

The 529 is the one that confuses teams the most, because it looks like a 429 at a glance and is not.

HTTP/1.1 529 Overloaded

{
  "type": "error",
  "error": {
    "type": "overloaded_error",
    "message": "API is temporarily overloaded"
  }
}

How 529 differs from 429

	429 rate_limit_error	529 overloaded_error
Scope	Per-account	Provider-wide
Cause	You sent too much, too fast	Anthropic infrastructure saturated
Fix	Slow down, batch, raise tier	Wait or fail over
Retry-After	Usually present	Often absent
Persistence	Resolves in seconds to a minute	Can last minutes during peak hours

Raising your account tier does nothing for 529. The constraint is on the other side.

Multi-upstream fallback strategy

The clean answer to 529 is to have more than one place to send the request. A gateway with multi-route capability does this for you. At the application level, the closest simple pattern is to walk a short list of fallback models whenever the call returns a retryable status:

import anthropic

PRIMARY = anthropic.Anthropic(base_url="https://buzzai.cc", api_key="buzz-...")
FALLBACK_MODELS = ["claude-sonnet-4-6", "claude-haiku-4-5-20251001"]

def resilient_call(messages, primary_model="claude-opus-4-8"):
    candidates = [primary_model] + FALLBACK_MODELS
    last_err = None
    for model in candidates:
        try:
            return PRIMARY.messages.create(
                model=model,
                max_tokens=1024,
                messages=messages,
            )
        except anthropic.APIStatusError as e:
            if e.status_code in (429, 500, 529):
                last_err = e
                continue
            raise
    raise last_err

Three caveats. First, fail over only across requests where output quality differences are acceptable; do not silently downgrade a model in an eval pipeline. Second, log every fallback so you know your real availability picture. Third, if your gateway already does multi-route fallback under the hood, do not double-implement it; let the gateway absorb the spike and surface a clean response to your application.
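When several upstream routes are available for the same model, the fallback can keep the model fixed and vary only the route, which avoids the silent-downgrade problem and satisfies the logging caveat. The sketch below is transport-agnostic and purely illustrative: UpstreamError, call_with_failover, and the (name, send) route pairs are hypothetical, and send stands in for whatever function actually performs the HTTP call.

```python
import logging

log = logging.getLogger("failover")

RETRYABLE = {429, 500, 529}


class UpstreamError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status):
        super().__init__(f"upstream returned {status}")
        self.status = status


def call_with_failover(routes, request):
    """Try each (name, send) route in order and return (name, response).

    send is any callable that takes the request and either returns a
    response or raises UpstreamError. The request, including its model,
    is passed to every route unchanged.
    """
    last_err = None
    for name, send in routes:
        try:
            response = send(request)
        except UpstreamError as err:
            if err.status not in RETRYABLE:
                raise  # client-side errors fail the same way on every route
            log.warning("route %s failed with %s, trying next", name, err.status)
            last_err = err
            continue
        return name, response
    if last_err is None:
        raise ValueError("no routes configured")
    raise last_err
```

Returning the route name alongside the response gives the caller what it needs to record which path served each request, so the real availability picture stays visible.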

How a gateway helps debug

A transparent gateway is not a layer that hides errors. It is a layer that gives you better tools to read them. Three properties matter for debugging Anthropic API errors:

1. Byte-faithful error envelopes

The gateway should pass upstream error.type and error.message through unchanged. If you see a 401 with authentication_error, that is the upstream's verdict, not the gateway's. BUZZ preserves the envelope and uses a gateway_ prefix on its own errors so you can branch cleanly:

try:
    resp = client.messages.create(...)
except anthropic.APIStatusError as e:
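    # e.body is the parsed JSON body when there is one; guard against None or non-dict values
    body = e.body if isinstance(e.body, dict) else {}
    error = body.get("error")
    err_type = error.get("type", "") if isinstance(error, dict) else ""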
    if err_type.startswith("gateway_"):
        # Auth, billing, or routing failure inside the gateway
        handle_gateway_error(e)
    else:
        # Upstream error, treat as if calling Anthropic directly
        handle_upstream_error(e)

2. Per-request observability without retention

For each request, a well-built gateway records the model, input and output token counts, latency, and final status code, without storing prompt content. That is the minimum metadata you need to answer "which feature is producing 429s" or "is Opus throwing more 500s than Sonnet today" without setting up your own logging pipeline. The dashboard rolls these up by key, model, and status code.
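The questions in the paragraph above reduce to simple aggregations over that metadata. The following sketch assumes each request record is a dictionary with key, model, and status fields; those field names, along with rollup and error_rate, are hypothetical and do not describe any particular gateway's export format.

```python
from collections import Counter


def rollup(records):
    """Count requests by (key label, model, status code)."""
    return Counter((r["key"], r["model"], r["status"]) for r in records)


def error_rate(records, model, status):
    """Share of a model's requests that ended with the given status."""
    total = sum(1 for r in records if r["model"] == model)
    hits = sum(1 for r in records if r["model"] == model and r["status"] == status)
    return hits / total if total else 0.0
```

Comparing error_rate for two models over the same window answers the "is Opus throwing more 500s than Sonnet today" question, and rollup(...).most_common() shows which key is producing the 429s. Neither needs any prompt content.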

3. Multi-route fallback on 529

The gateway can hold multiple upstream routes for the same model and try them in order on 529. From the application's perspective, a 529 you would have seen calling Anthropic directly turns into a successful response with slightly higher latency. This is the single biggest reliability lever for chatbots that have to keep flowing during peak hours.

That said, multi-route is not a license to lie. The gateway must not silently substitute a different model. The route is a different path to the same model identifier, not a different model. If a route holds Opus from upstream A and another holds Opus from upstream B, both are Opus, and the response comes from the model you asked for.

FAQ

What does a Claude API 401 authentication_error mean?

It means the upstream could not validate your credentials. The five usual causes are a typo or whitespace in the key, the wrong header name (x-api-key not Authorization), a malformed Bearer prefix, a gateway stripping the header, or a revoked key. Confirm by hitting /v1/messages with a minimal cURL request and the env-var key.

What is the difference between 429 rate_limit_error and 529 overloaded_error?

The 429 is per-account: your traffic exceeded a budget on RPM, ITPM, or OTPM. The 529 is provider-wide: Anthropic is saturated and your account is fine. Raising your tier helps with 429 and does nothing for 529. The right answer to 529 is multi-upstream fallback.

How should I retry on a Claude API 429 rate_limit_error?

Exponential backoff with full jitter, honor Retry-After, cap at three to five attempts. Anthropic's SDKs do this by default. Do not retry tighter than the upstream-suggested delay; a request that arrives before the limit has replenished is simply rejected again.

Why does my Claude request return 403 permission_error?

Authentication succeeded, but the principal is not allowed. Common causes are a model not entitled to the workspace, a content policy block, programmatic access disabled by an admin, or a scoped key being used outside its scope. Check with a known-good model first to isolate the problem.

What is the maximum request size for the Anthropic API?

32 MB. A 413 request_too_large means you exceeded it, almost always due to inlined base64 documents or images. The Files API is the right path: upload once, reference by file ID across requests.

What does 500 api_error mean and is it my fault?

No. 500 is an internal Anthropic failure. Retry with backoff. If it persists for minutes on one model, check the status page and consider failing over to a peer model family.

How do I distinguish input rate limit from output rate limit?

Read the anthropic-ratelimit-*-remaining headers in the 429 response. The one at zero is the limit you hit. Requests-per-minute is fixed by batching, input-tokens-per-minute by caching and trimming, output-tokens-per-minute by bounding max_tokens.

What does 404 not_found_error mean for a Claude API call?

Usually the model identifier in the request body does not exist for your organization. Common causes are a typo (claude-opus-4.7 instead of claude-opus-4-7), a deprecated snapshot, or calling a model that lives in a different workspace. List /v1/models to see what your key can actually reach.

Can a gateway help me debug Anthropic API errors?

Yes. A transparent gateway preserves error.type and error.message byte-for-byte, gives you per-request observability without storing prompt content, and can fail over across upstream routes on 529 so the application sees a clean response. The gateway prefixes its own errors with gateway_ so you can branch cleanly.

What is the right backoff for production retries?

Exponential backoff with full jitter, starting at one to two seconds, doubling each attempt, capped at thirty to sixty seconds, three to five attempts maximum. Always honor Retry-After. For chained tool-use loops, deduplicate side effects at the application layer so a retry does not double-count.

Want fewer 529s in production?

BUZZ AI Gateway forwards Claude traffic transparently, preserves Anthropic error envelopes byte-for-byte, and runs multi-route fallback under the hood so your application sees clean responses through capacity spikes. Same SDK, same model names, lower per-token cost. Start at https://buzzai.cc, check live rates at https://buzzai.cc/api/pricing, and see the routable model list at https://buzzai.cc/models.

Published: 2026-05-26
Last reviewed: 2026-05-26