BUZZ AI Gateway

Messages

Create a Claude message. The Messages API is BUZZ's primary endpoint for chat-style completion: a list of input messages in, a model-generated message out. Drop-in compatible with the Anthropic Messages API.

POST https://buzzai.cc/v1/messages
Drop-in Anthropic compatibility. If you have working code against Anthropic's https://api.anthropic.com/v1/messages, you only need to change the base URL and API key. Streaming, tool use, prompt caching, and extended thinking are all forwarded transparently to the upstream model.

Authentication

BUZZ accepts three authentication header forms:

| Header | Notes |
| --- | --- |
| Authorization: Bearer <KEY> | Recommended. Matches the OpenAI SDK convention used elsewhere in BUZZ. |
| Authorization: Bearer sk-<KEY> | The sk- prefix is automatically stripped. |
| x-api-key: <KEY> | Drop-in compatible with the Anthropic SDK default. |

The anthropic-version header is optional on BUZZ — it defaults to 2023-06-01 when omitted. We still recommend sending it explicitly so your code stays portable to direct Anthropic.

Authorization: Bearer <YOUR_BUZZ_KEY>
anthropic-version: 2023-06-01
content-type: application/json
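
The three header forms can be produced by one small helper. This is a minimal sketch: build_headers and its style labels ("bearer", "bearer-sk", "x-api-key") are hypothetical names chosen for illustration, not part of BUZZ or any SDK.

```python
def build_headers(key: str, style: str = "bearer") -> dict:
    """Return request headers for one of the three auth forms BUZZ accepts.

    The `style` labels are hypothetical names for this sketch:
      "bearer"    -> Authorization: Bearer <KEY>
      "bearer-sk" -> Authorization: Bearer sk-<KEY>
      "x-api-key" -> x-api-key: <KEY>
    """
    headers = {
        # Optional on BUZZ, but sent so the code stays portable to Anthropic.
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    if style == "bearer":
        headers["Authorization"] = f"Bearer {key}"
    elif style == "bearer-sk":
        headers["Authorization"] = f"Bearer sk-{key}"
    elif style == "x-api-key":
        headers["x-api-key"] = key
    else:
        raise ValueError(f"unknown auth style: {style!r}")
    return headers
```

Pass the result as the headers of any HTTP client call to POST https://buzzai.cc/v1/messages.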

Example request

curl -X POST https://buzzai.cc/v1/messages \
  -H "Authorization: Bearer $BUZZ_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-haiku-4-5-20251001",
    "max_tokens": 80,
    "messages": [
      {"role": "user", "content": "reply with exactly: hello world"}
    ]
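  }'

Python

import os

from anthropic import Anthropic

client = Anthropic(
    base_url="https://buzzai.cc",
    # Read the key from the environment, matching the curl and TypeScript examples.
    api_key=os.environ["BUZZ_API_KEY"],
)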

message = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=80,
    messages=[
        {"role": "user", "content": "reply with exactly: hello world"}
    ],
)

print(message.content[0].text)

TypeScript

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://buzzai.cc",
  apiKey: process.env.BUZZ_API_KEY,
});

const message = await client.messages.create({
  model: "claude-haiku-4-5-20251001",
  max_tokens: 80,
  messages: [
    { role: "user", content: "reply with exactly: hello world" },
  ],
});

console.log(message.content[0].text);

Response

{
  "id": "msg_01ouKJ3o9AnAJb7JtWF25Dk2",
  "type": "message",
  "role": "assistant",
  "model": "claude-haiku-4-5-20251001",
  "content": [
    {"type": "text", "text": "hello world"}
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 6,
    "output_tokens": 2,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "service_tier": "standard"
  }
}

Note: BUZZ may include additional fields (usage.iterations[], context_management.applied_edits) for accounting transparency. Your parser should accept unknown fields.
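
One way to stay tolerant of extra fields is to read only the keys you need and keep everything else aside. The sketch below assumes a hypothetical ParsedMessage container and parse_message helper; neither is part of the Anthropic SDK.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Top-level fields shown in the example response above.
KNOWN_FIELDS = {
    "id", "type", "role", "model", "content",
    "stop_reason", "stop_sequence", "usage",
}


@dataclass
class ParsedMessage:
    id: str
    stop_reason: Optional[str]
    text: str
    # Unknown top-level fields are preserved here instead of raising.
    extra: Dict[str, Any] = field(default_factory=dict)


def parse_message(raw: str) -> ParsedMessage:
    """Parse a response body, ignoring fields this client does not know."""
    body = json.loads(raw)
    text = "".join(
        block.get("text", "")
        for block in body.get("content", [])
        if block.get("type") == "text"
    )
    extra = {k: v for k, v in body.items() if k not in KNOWN_FIELDS}
    return ParsedMessage(
        id=body["id"],
        stop_reason=body.get("stop_reason"),
        text=text,
        extra=extra,
    )
```

Strict schema validators that reject unknown keys would need to be relaxed in the same spirit.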

Body parameters

model (required)

string — The model that will complete your prompt. Get the live list from GET /v1/models.

Common values:

Dated aliases like claude-haiku-4-5-20251001, claude-sonnet-4-5-20250929, claude-opus-4-5-20251101 are also accepted. If a model is not available under your group, you'll receive HTTP 503 model_not_found.

max_tokens (required)

integer — Maximum number of output tokens to generate. The ceiling varies by model; see the Anthropic model cards.

messages (required)

array — Conversation turns. Each message has role ("user" or "assistant") and content (string or array of content blocks).

"messages": [
  {"role": "user", "content": "Hello"},
  {"role": "assistant", "content": "Hi! How can I help?"},
  {"role": "user", "content": "What's 2+2?"}
]

Content can also be a structured array supporting text, images, tool use, and tool results:

{
  "role": "user",
  "content": [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image", "source": {...}}
  ]
}
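
Because content accepts both forms, client code often normalizes everything to the array form before appending turns. A minimal sketch, assuming (as Anthropic documents) that a plain string is shorthand for a single text block; to_blocks and normalize_messages are hypothetical helper names.

```python
from typing import Any, Dict, List, Union

Content = Union[str, List[Dict[str, Any]]]


def to_blocks(content: Content) -> List[Dict[str, Any]]:
    """Convert string content to the equivalent one-element block array."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return list(content)  # already an array of content blocks


def normalize_messages(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a copy of the conversation with every content in array form."""
    return [
        {"role": m["role"], "content": to_blocks(m["content"])}
        for m in messages
    ]
```

With a uniform shape, later steps such as appending tool_result blocks do not need to branch on the content type.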

system (optional)

string | array — System prompt. Provide context and instructions outside the conversation. The array form supports cache_control for prompt caching.

"system": [
  {
    "type": "text",
    "text": "You are a helpful assistant.",
    "cache_control": {"type": "ephemeral"}
  }
]

stream (optional)

boolean — Set true to receive Server-Sent Events. See the streaming section below.

temperature (optional)

number — Sampling temperature, 0.0 to 1.0. Lower is more deterministic.

Note: When using thinking mode (Opus 4.7), temperature / top_p / top_k are ignored by Anthropic.

top_p (optional)

number — Nucleus sampling. Use temperature or top_p, not both.

top_k (optional)

integer — Sample only from the top K options.
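
The sampling rules above (temperature within 0.0 to 1.0, and temperature or top_p but not both) can be checked client-side before sending. A minimal sketch; sampling_params is a hypothetical helper, and BUZZ or Anthropic may apply different or additional validation.

```python
from typing import Optional


def sampling_params(
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
) -> dict:
    """Return only the sampling fields that were set, after basic checks."""
    if temperature is not None and top_p is not None:
        raise ValueError("set temperature or top_p, not both")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    candidates = {"temperature": temperature, "top_p": top_p, "top_k": top_k}
    # Omit unset fields so the request body carries no nulls.
    return {k: v for k, v in candidates.items() if v is not None}
```

Merge the returned dict into the request body, for example {**base_request, **sampling_params(temperature=0.2)}.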

stop_sequences (optional)

array of strings — Custom text sequences that cause the model to stop.

tools (optional)

array — Tool definitions for tool use / function calling. See Tool Use concept.

"tools": [
  {
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "input_schema": {
      "type": "object",
      "properties": {"city": {"type": "string"}},
      "required": ["city"]
    }
  }
]

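tool_choice (optional)

object — How the model should use the provided tools. Anthropic defines four forms: {"type": "auto"} (the model decides), {"type": "any"} (the model must call some tool), {"type": "tool", "name": "..."} (the model must call the named tool), and {"type": "none"} (no tool calls).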
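
A small builder keeps tool_choice objects well-formed. The sketch assumes the four forms documented for Anthropic's Messages API (auto, any, tool, none) and that BUZZ forwards them unchanged; make_tool_choice is a hypothetical helper.

```python
from typing import Optional


def make_tool_choice(mode: str, name: Optional[str] = None) -> dict:
    """Build a tool_choice object for the request body."""
    if mode in ("auto", "any", "none"):
        if name is not None:
            raise ValueError(f"mode {mode!r} does not take a tool name")
        return {"type": mode}
    if mode == "tool":
        if not name:
            raise ValueError("mode 'tool' requires a tool name")
        return {"type": "tool", "name": name}
    raise ValueError(f"unknown tool_choice mode: {mode!r}")
```

For example, make_tool_choice("tool", "get_weather") forces a call to the get_weather tool defined above.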

thinking (optional)

object — Extended thinking control (Claude Opus 4.7+). Reserves internal reasoning tokens before the visible response.

"thinking": {
  "type": "enabled",
  "budget_tokens": 4096
}
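
The helper below adds a thinking block to an existing request. It is a sketch under stated assumptions: the 1024-token minimum and the rule that budget_tokens stays below max_tokens come from Anthropic's extended thinking documentation, not from this page, and with_thinking is a hypothetical name. Dropping the sampling fields is a design choice here, since they are ignored in thinking mode anyway.

```python
MIN_THINKING_BUDGET = 1024  # assumption: Anthropic's documented minimum


def with_thinking(request: dict, budget_tokens: int) -> dict:
    """Return a copy of `request` with extended thinking enabled."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= request["max_tokens"]:
        # The budget is carved out of max_tokens, so it must leave room
        # for the visible response.
        raise ValueError("budget_tokens must be smaller than max_tokens")
    out = dict(request)
    out["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
    # Sampling fields are ignored in thinking mode, so leave them out.
    for key in ("temperature", "top_p", "top_k"):
        out.pop(key, None)
    return out
```

The original request dict is left untouched, so the same base request can be sent with and without thinking.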

metadata (optional)

object — Free-form metadata. Anthropic primarily uses {"user_id": "..."}.

cache_control, top-level (optional)

object — Mark content for prompt caching. See Prompt Caching.

service_tier (advanced)

string — Anthropic service tier. Channel-gated: may be silently filtered unless your channel allows it.

Channel-gated parameters. The fields inference_geo, speed, and service_tier are silently dropped by BUZZ unless the upstream channel has the corresponding allow-flag enabled. If you need these, contact support.

Response

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique message ID, format msg_... |
| type | string | Always "message" |
| role | string | Always "assistant" |
| model | string | Model used. May be a dated variant of the requested model. |
| content | array | Array of content blocks: text, tool_use, thinking |
| stop_reason | string | One of end_turn, max_tokens, stop_sequence, tool_use, pause_turn, refusal |
| stop_sequence | string or null | Stop sequence that triggered termination, or null |
| usage | object | Token usage counters (see below) |

usage object

| Field | Type | Description |
| --- | --- | --- |
| input_tokens | integer | Tokens consumed by input (excluding cached) |
| output_tokens | integer | Tokens generated |
| cache_creation_input_tokens | integer | Tokens written to cache (this call) |
| cache_read_input_tokens | integer | Tokens read from cache (cache hit) |
| cache_creation | object | Breakdown by 5-minute / 1-hour TTL |
| service_tier | string | "standard", "priority", or "batch" |
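
Since input_tokens excludes cached tokens, the full prompt size is the sum of the three input counters. A minimal sketch with a hypothetical input_token_summary helper; the cache_hit_ratio name is an illustration, not an API field.

```python
def input_token_summary(usage: dict) -> dict:
    """Summarize prompt size and cache hit ratio from a usage object."""
    fresh = usage.get("input_tokens") or 0
    written = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    total = fresh + written + read
    return {
        "total_input_tokens": total,
        # Share of the prompt served from cache; 0.0 for an empty prompt.
        "cache_hit_ratio": read / total if total else 0.0,
    }
```

Missing or null counters are treated as zero, which keeps the helper usable on responses that omit the cache fields.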

Streaming

Set "stream": true in the request body to receive a Server-Sent Events (SSE) stream. The stream consists of these event types:

| Event | When |
| --- | --- |
| message_start | Beginning of the response, contains initial usage |
| content_block_start | A new content block begins (text, tool_use, thinking) |
| content_block_delta | Incremental content (text delta, partial JSON, thinking delta) |
| content_block_stop | A content block ends |
| message_delta | Final stop reason and usage update |
| message_stop | End of stream |
| ping | Keep-alive (may be sent any time) |
| error | Mid-stream error (after a 200 response) |

Captured stream sample

event: message_start
data: {"type":"message_start","message":{"id":"msg_...","model":"claude-haiku-4-5-20251001","content":[],...}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":5}}

event: message_stop
data: {"type":"message_stop"}
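
A stream like the one above can be consumed without an SDK by splitting on blank lines and decoding each data payload. The sketch below works on any iterable of text lines (for example, the line iterator of an HTTP client); iter_sse_events and collect_text are hypothetical helpers, and only text deltas are accumulated.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each SSE event."""
    data_parts = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("data:"):
            data_parts.append(line[5:].removeprefix(" "))
        elif line == "" and data_parts:
            # A blank line terminates the current event.
            yield json.loads("\n".join(data_parts))
            data_parts = []
    if data_parts:  # flush an event not followed by a blank line
        yield json.loads("\n".join(data_parts))


def collect_text(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Return (full_text, stop_reason) from a Messages SSE stream."""
    chunks = []
    stop_reason = None
    for event in iter_sse_events(lines):
        kind = event.get("type")
        if kind == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                chunks.append(delta["text"])
        elif kind == "message_delta":
            stop_reason = event.get("delta", {}).get("stop_reason")
        elif kind == "error":
            # Mid-stream errors arrive after the HTTP 200 status.
            message = event.get("error", {}).get("message", "stream error")
            raise RuntimeError(message)
        # ping and other event types are ignored
    return "".join(chunks), stop_reason
```

The event: lines are skipped because each data payload repeats the event name in its type field.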

Tool use example

Provide tool definitions and let Claude decide when to call them.

{
  "model": "claude-haiku-4-5-20251001",
  "max_tokens": 200,
  "tools": [
    {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  ],
  "messages": [
    {"role": "user", "content": "What is the weather in Tokyo?"}
  ]
}

Response (tool_use turn)

{
  "id": "msg_01gxQPtqeRobjbfSNuCTyijE",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "tool_use",
      "id": "toolu_IbId2k5Cs4dpj5vgdvJJDA",
      "name": "get_weather",
      "input": {"city": "Tokyo"}
    }
  ],
  "stop_reason": "tool_use",
  "usage": {"input_tokens": 35, "output_tokens": 6}
}

To complete the round-trip, append the assistant's tool_use message and a new user message containing a tool_result block:

"messages": [
  {"role": "user", "content": "What is the weather in Tokyo?"},
  {"role": "assistant", "content": [{"type": "tool_use", "id": "toolu_...", ...}]},
  {
    "role": "user",
    "content": [
      {
        "type": "tool_result",
        "tool_use_id": "toolu_...",
        "content": "Sunny, 22°C"
      }
    ]
  }
]
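
The round-trip can be automated by turning every tool_use block in the assistant message into a matching tool_result block. A minimal sketch; append_tool_results is a hypothetical helper, and run_tool is a caller-supplied function (name, input) -> str that actually executes the tool.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def append_tool_results(
    history: List[Message],
    assistant_message: Message,
    run_tool: Callable[[str, Dict[str, Any]], str],
) -> List[Message]:
    """Return history plus the assistant turn and a tool_result user turn."""
    results = []
    for block in assistant_message["content"]:
        if block.get("type") == "tool_use":
            results.append({
                "type": "tool_result",
                # Must echo the id of the tool_use block being answered.
                "tool_use_id": block["id"],
                "content": run_tool(block["name"], block["input"]),
            })
    return history + [
        {"role": "assistant", "content": assistant_message["content"]},
        {"role": "user", "content": results},
    ]
```

Send the returned list as messages in the next request, together with the same tools array, and repeat while stop_reason is "tool_use".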

Prompt caching

Mark content with cache_control: {"type": "ephemeral"} to enable Anthropic's prompt cache. The first call writes the marked prefix to the cache (reported as cache_creation_input_tokens); subsequent calls with the same prefix read it from the cache (reported as cache_read_input_tokens) at 1/10 the input price.

"system": [
  {
    "type": "text",
    "text": "You are an analyst. Below is the full company knowledge base...(20K tokens)",
    "cache_control": {"type": "ephemeral"}
  }
]

Verified BUZZ behavior:

| Call | input_tokens | cache_creation | cache_read |
| --- | --- | --- | --- |
| 1 (cold) | 2 | 1200 | 0 |
| 2 (warm) | 2 | 0 | 1200 |

BUZZ forwards cache_control directives without modification. See Prompt Caching concept.
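
To see what a cache hit saves, the usage counters can be converted into input-price units, with cache reads weighted at 1/10 as stated above. This is a sketch: input_cost_units is a hypothetical helper, and because this page does not state the price of cache writes, write_multiplier is left as a parameter (1.0 is only a placeholder default).

```python
def input_cost_units(usage: dict, write_multiplier: float = 1.0) -> float:
    """Return input cost in units of one regular input token.

    write_multiplier is an assumption supplied by the caller; the price of
    cache writes is not given on this page.
    """
    fresh = usage.get("input_tokens") or 0
    written = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    # Cache reads are billed at one tenth of the regular input price.
    return fresh + written * write_multiplier + read / 10
```

Multiply the result by the per-token input price of the model to get an estimate; output tokens are billed separately and are not included.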

Errors

| HTTP | error.type | Source | Typical cause |
| --- | --- | --- | --- |
| 400 | invalid_request_error | Anthropic | Malformed JSON or invalid field value |
| 401 | buzz_error / authentication_error | BUZZ | Missing or invalid API key |
| 403 | permission_error | BUZZ | Key lacks permission, IP not allow-listed |
| 413 | buzz_error / read_request_body_failed | BUZZ | Body > 32 MB. (Anthropic upstream uses request_too_large; on BUZZ this is wrapped before reaching the upstream.) |
| 429 | rate_limit_error | Anthropic | Rate limit hit; respect retry-after |
| 500 | buzz_error / api_error | Either | BUZZ wraps internal errors as buzz_error; uncaught upstream errors may surface as api_error. Retry with backoff. |
| 503 | buzz_error / model_not_found | BUZZ | BUZZ-specific: no available channel for the model+group |
| 529 | overloaded_error | Anthropic | Upstream Anthropic overloaded; not generated by BUZZ. Retry with longer backoff. |
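
The retry guidance in the table can be captured in one function that returns how long to wait, or None when retrying will not help. A minimal sketch: retry_delay, the base and cap values, and the doubling for 529 are illustrative assumptions, and 503 is treated as non-retryable only because the table gives no retry advice for it.

```python
from typing import Optional

RETRYABLE_STATUSES = {429, 500, 529}


def retry_delay(
    status: int,
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 60.0,
) -> Optional[float]:
    """Return seconds to wait before retry number `attempt` (0-based).

    Returns None for statuses that should not be retried.
    """
    if status not in RETRYABLE_STATUSES:
        return None
    if status == 429 and retry_after is not None:
        try:
            return float(retry_after)  # honor the retry-after header
        except ValueError:
            pass  # not a number of seconds; fall back to backoff
    if status == 529:
        base *= 2  # longer backoff when the upstream is overloaded
    return min(cap, base * 2 ** attempt)
```

In production code it is common to add random jitter to the returned delay so that many clients do not retry in lockstep.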

Error envelope (Anthropic-side):

{
  "type": "error",
  "error": {"type": "rate_limit_error", "message": "..."},
  "request_id": "req_..."
}

Error envelope (BUZZ gateway-side):

{
  "error": {
    "type": "buzz_error",
    "message": "... (request id: 202605260713594...)"
  }
}
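
Because the two envelopes differ, a client may want to fold them into one shape before logging or branching. The sketch below is a heuristic based only on the two samples above: it treats a top-level "type": "error" as the Anthropic form and pulls the request id out of the message text for the BUZZ form. normalize_error and the returned keys are hypothetical.

```python
import re
from typing import Any, Dict

# Matches the "(request id: ...)" suffix shown in the BUZZ sample message.
_REQUEST_ID = re.compile(r"\(request id: ([^)]+)\)")


def normalize_error(body: Dict[str, Any]) -> Dict[str, Any]:
    """Fold an Anthropic-side or BUZZ-side error body into one shape."""
    err = body.get("error") or {}
    message = err.get("message", "")
    source = "anthropic" if body.get("type") == "error" else "buzz"
    request_id = body.get("request_id")
    if request_id is None:
        match = _REQUEST_ID.search(message)
        if match:
            request_id = match.group(1)
    return {
        "source": source,
        "type": err.get("type"),
        "message": message,
        "request_id": request_id,
    }
```

Keeping the request id in a dedicated field makes it easier to quote when contacting support.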

See Claude API error code reference for diagnosis and fix.
