Messages

Create a Claude message. The Messages API is BUZZ's primary endpoint for chat-style completion: a list of input messages in, a model-generated message out. Drop-in compatible with the Anthropic Messages API.

POST https://buzzai.cc/v1/messages

Drop-in Anthropic compatibility. If you have working code against Anthropic's https://api.anthropic.com/v1/messages, you only need to change the base URL and API key. Streaming, tool use, prompt caching, and extended thinking are all forwarded transparently to the upstream model.

Authentication

BUZZ accepts three authentication header forms:

Header	Notes
`Authorization: Bearer <KEY>`	Recommended. Matches the OpenAI SDK convention used elsewhere in BUZZ.
`Authorization: Bearer sk-<KEY>`	The `sk-` prefix is automatically stripped.
`x-api-key: <KEY>`	Drop-in compatible with the Anthropic SDK default.

The anthropic-version header is optional on BUZZ and defaults to 2023-06-01 when omitted. We still recommend sending it explicitly so that your code stays portable if you later call Anthropic's API directly.

Authorization: Bearer <YOUR_BUZZ_KEY>
anthropic-version: 2023-06-01
content-type: application/json
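
The header forms above can be assembled by a small helper. This is a minimal sketch using only the header names from the table; `build_headers` and its `style` argument are hypothetical names chosen for illustration and are not part of any BUZZ SDK.

```python
def build_headers(api_key: str, style: str = "bearer") -> dict[str, str]:
    """Build request headers for one of the accepted authentication forms.

    `build_headers` and `style` are illustrative names, not a BUZZ API.
    A key that already starts with "sk-" can be passed unchanged with
    style="bearer"; per the table above, BUZZ strips that prefix itself.
    """
    headers = {
        "anthropic-version": "2023-06-01",  # optional on BUZZ, sent for portability
        "content-type": "application/json",
    }
    if style == "bearer":
        headers["Authorization"] = f"Bearer {api_key}"
    elif style == "x-api-key":
        headers["x-api-key"] = api_key
    else:
        raise ValueError(f"unknown auth style: {style!r}")
    return headers
```

Sending exactly one credential header per request keeps it unambiguous which form the gateway is evaluating.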

Example request

curl -X POST https://buzzai.cc/v1/messages \
  -H "Authorization: Bearer $BUZZ_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-haiku-4-5-20251001",
    "max_tokens": 80,
    "messages": [
      {"role": "user", "content": "reply with exactly: hello world"}
    ]
  }'

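import os

from anthropic import Anthropic

client = Anthropic(
    base_url="https://buzzai.cc",
    # Read the key from the environment, as the curl and TypeScript examples do.
    api_key=os.environ["BUZZ_API_KEY"],
)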

message = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=80,
    messages=[
        {"role": "user", "content": "reply with exactly: hello world"}
    ],
)

print(message.content[0].text)

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://buzzai.cc",
  apiKey: process.env.BUZZ_API_KEY,
});

const message = await client.messages.create({
  model: "claude-haiku-4-5-20251001",
  max_tokens: 80,
  messages: [
    { role: "user", content: "reply with exactly: hello world" },
  ],
});

console.log(message.content[0].text);

Response

{
  "id": "msg_01ouKJ3o9AnAJb7JtWF25Dk2",
  "type": "message",
  "role": "assistant",
  "model": "claude-haiku-4-5-20251001",
  "content": [
    {"type": "text", "text": "hello world"}
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 6,
    "output_tokens": 2,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "service_tier": "standard"
  }
}

Note: BUZZ may include additional fields (usage.iterations[], context_management.applied_edits) for accounting transparency. Your parser should accept unknown fields.
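
One way to stay tolerant of extra fields is to read only the keys you need and ignore the rest. The sketch below assumes the response shape shown above; `extract_text` is a hypothetical helper, and the empty-list values used for `usage.iterations` and `context_management.applied_edits` in the sample are placeholders, since their real structure is not documented here.

```python
import json


def extract_text(raw: str) -> str:
    """Join the text of all `text` blocks, ignoring anything unrecognized."""
    body = json.loads(raw)
    parts = []
    for block in body.get("content", []):
        # Block types this client does not handle (tool_use, thinking, ...) are skipped.
        if block.get("type") == "text":
            parts.append(block.get("text", ""))
    return "".join(parts)


# Sample response with extra fields; their contents here are placeholders.
sample = json.dumps({
    "id": "msg_example",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "hello world"}],
    "usage": {"input_tokens": 6, "output_tokens": 2, "iterations": []},
    "context_management": {"applied_edits": []},
})

print(extract_text(sample))  # hello world
```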

Body parameters

model (required)

string — The model that will complete your prompt. Get the live list from GET /v1/models.

Common values:

claude-opus-4-7 — most capable
claude-sonnet-4-6 — balanced
claude-haiku-4-5-20251001 — fastest, lowest cost

Dated model IDs such as claude-haiku-4-5-20251001, claude-sonnet-4-5-20250929, and claude-opus-4-5-20251101 are also accepted. If a model is not available under your group, you'll receive HTTP 503 model_not_found.

max_tokens (required)

integer — Maximum number of output tokens to generate. The upper limit varies by model; see the Anthropic model cards.

messages (required)

array — Conversation turns. Each message has role ("user" or "assistant") and content (string or array of content blocks).

"messages": [
  {"role": "user", "content": "Hello"},
  {"role": "assistant", "content": "Hi! How can I help?"},
  {"role": "user", "content": "What's 2+2?"}
]

Content can also be a structured array supporting text, images, tool use, and tool results:

{
  "role": "user",
  "content": [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image", "source": {...}}
  ]
}
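
Because content may be either a string or an array of blocks, client code that inspects messages is often simpler if it first converts everything to the array form. The following is a sketch under that assumption; `normalize_content` and `normalize_messages` are hypothetical helpers, and the conversion treats a plain string as a single text block.

```python
def normalize_content(content):
    """Return content as a list of blocks; a plain string becomes one text block."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return list(content)


def normalize_messages(messages):
    """Validate roles and convert every message to the structured-array form."""
    normalized = []
    for message in messages:
        role = message["role"]
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        normalized.append(
            {"role": role, "content": normalize_content(message["content"])}
        )
    return normalized
```

For example, `normalize_messages([{"role": "user", "content": "Hello"}])` yields one user message whose content is a single text block.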

system (optional)

string | array — System prompt. Provide context and instructions outside the conversation. The array form supports cache_control for prompt caching.

"system": [
  {
    "type": "text",
    "text": "You are a helpful assistant.",
    "cache_control": {"type": "ephemeral"}
  }
]

stream (optional)

boolean — Set true to receive Server-Sent Events. See the streaming section below.

temperature (optional)

number — Sampling temperature, 0.0 to 1.0. Lower is more deterministic.

Note: When using thinking mode (Opus 4.7), temperature / top_p / top_k are ignored by Anthropic.

top_p (optional)

number — Nucleus sampling. Use temperature or top_p, not both.
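
The two sampling rules stated so far (temperature within 0.0 to 1.0, and temperature or top_p but not both) can be checked on the client before sending. This is a minimal sketch; `check_sampling` is a hypothetical helper, and whether the server rejects or merely tolerates such requests is not specified here.

```python
def check_sampling(body: dict) -> list[str]:
    """Return a list of problems with the sampling parameters in a request body."""
    problems = []
    temperature = body.get("temperature")
    top_p = body.get("top_p")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        problems.append("temperature must be between 0.0 and 1.0")
    if temperature is not None and top_p is not None:
        problems.append("set temperature or top_p, not both")
    return problems
```

An empty list means the body passes these two checks; it says nothing about the other parameters.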

top_k (optional)

integer — Sample only from the top K options.

stop_sequences (optional)

array of strings — Custom text sequences that cause the model to stop.

tools (optional)

array — Tool definitions for tool use / function calling. See Tool Use concept.

"tools": [
  {
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "input_schema": {
      "type": "object",
      "properties": {"city": {"type": "string"}},
      "required": ["city"]
    }
  }
]

tool_choice (optional)

object — How the model should use tools.

{"type": "auto"} — model decides (default)
{"type": "any"} — must use one of the provided tools
{"type": "tool", "name": "..."} — must use this specific tool
{"type": "none"} — never use tools
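
The four forms listed above can be built by one small function so that a missing or stray tool name is caught early. This is a sketch only; `make_tool_choice` is a hypothetical helper name.

```python
from typing import Optional


def make_tool_choice(mode: str = "auto", name: Optional[str] = None) -> dict:
    """Build a tool_choice object for one of the four documented modes."""
    if mode in ("auto", "any", "none"):
        if name is not None:
            raise ValueError(f"mode {mode!r} does not take a tool name")
        return {"type": mode}
    if mode == "tool":
        if not name:
            raise ValueError("mode 'tool' requires a tool name")
        return {"type": "tool", "name": name}
    raise ValueError(f"unknown tool_choice mode: {mode!r}")
```

For instance, `make_tool_choice("tool", "get_weather")` produces the object that forces the `get_weather` tool.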

thinking (optional)

object — Extended thinking control (Claude Opus 4.7+). Reserves internal reasoning tokens before the visible response.

"thinking": {
  "type": "enabled",
  "budget_tokens": 4096
}

metadata (optional)

object — Free-form metadata. Anthropic primarily uses {"user_id": "..."}.

cache_control (top-level, optional)

object — Mark content for prompt caching. See Prompt Caching.

service_tier (advanced)

string — Anthropic service tier. Channel-gated: may be silently filtered unless your channel allows it.

Channel-gated parameters. The fields inference_geo, speed, and service_tier are silently dropped by BUZZ unless the upstream channel has the corresponding allow-flag enabled. If you need these, contact support.

Response fields

Field	Type	Description
id	string	Unique message ID, format `msg_...`
type	string	Always `"message"`
role	string	Always `"assistant"`
model	string	Model used. May be a dated variant of the requested model.
content	array	Array of content blocks: `text`, `tool_use`, `thinking`
stop_reason	string	`end_turn` | `max_tokens` | `stop_sequence` | `tool_use` | `pause_turn` | `refusal`
stop_sequence	string | null	Stop sequence that triggered termination, or null
usage	object	Token usage counters (see below)

usage object

Field	Type	Description
input_tokens	integer	Tokens consumed by input (excluding cached)
output_tokens	integer	Tokens generated
cache_creation_input_tokens	integer	Tokens written to cache (this call)
cache_read_input_tokens	integer	Tokens read from cache (cache hit)
cache_creation	object	Breakdown by 5-minute / 1-hour TTL
service_tier	string	`"standard"` | `"priority"` | `"batch"`
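
Since input_tokens excludes cached tokens, the full prompt size has to be reconstructed from three counters. The sketch below assumes the three input counters are disjoint, as the descriptions in the table suggest; `total_input_tokens` is a hypothetical helper.

```python
def total_input_tokens(usage: dict) -> int:
    """Sum uncached, cache-written, and cache-read input tokens for one call."""
    return (
        usage.get("input_tokens", 0)
        + usage.get("cache_creation_input_tokens", 0)
        + usage.get("cache_read_input_tokens", 0)
    )


usage = {
    "input_tokens": 6,
    "output_tokens": 2,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
}
print(total_input_tokens(usage))  # 6
```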

Streaming

Set "stream": true in the request body to receive a Server-Sent Events (SSE) stream. The stream consists of these event types:

Event	When
message_start	Beginning of the response, contains initial usage
content_block_start	A new content block begins (text, tool_use, thinking)
content_block_delta	Incremental content (text delta, partial JSON, thinking delta)
content_block_stop	A content block ends
message_delta	Final stop reason and usage update
message_stop	End of stream
ping	Keep-alive (may be sent any time)
error	Mid-stream error (after a 200 response)

Captured stream sample

event: message_start
data: {"type":"message_start","message":{"id":"msg_...","model":"claude-haiku-4-5-20251001","content":[],...}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":5}}

event: message_stop
data: {"type":"message_stop"}
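
A client can rebuild the final text by concatenating the text deltas as they arrive. The following is a minimal sketch that works on already-decoded SSE lines rather than a live connection; `collect_text` is a hypothetical helper, and the assumed shape of the `error` event payload (an `error` object with a `message`) is taken from the Anthropic-side error envelope rather than from a captured stream.

```python
import json


def collect_text(sse_lines):
    """Accumulate text deltas and the stop reason from an iterable of SSE lines."""
    text_parts = []
    stop_reason = None
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # `event:` lines and blank separators carry no payload
        event = json.loads(line[len("data:"):])
        kind = event.get("type")
        if kind == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                text_parts.append(delta.get("text", ""))
        elif kind == "message_delta":
            stop_reason = event.get("delta", {}).get("stop_reason", stop_reason)
        elif kind == "error":
            # Assumed payload shape; see the lead-in.
            raise RuntimeError(event.get("error", {}).get("message", "stream error"))
        elif kind == "message_stop":
            break
        # ping and any unknown event types are ignored
    return "".join(text_parts), stop_reason


sample = [
    'event: content_block_delta',
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}',
    '',
    'event: message_delta',
    'data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":5}}',
    '',
    'event: message_stop',
    'data: {"type":"message_stop"}',
]
print(collect_text(sample))  # ('Hello', 'end_turn')
```

Keying on the `type` field inside the data payload, rather than on the `event:` line, keeps the parser to a single pass over the lines.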

Tool use example

Provide tool definitions and let Claude decide when to call them.

{
  "model": "claude-haiku-4-5-20251001",
  "max_tokens": 200,
  "tools": [
    {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  ],
  "messages": [
    {"role": "user", "content": "What is the weather in Tokyo?"}
  ]
}

Response (tool_use turn)

{
  "id": "msg_01gxQPtqeRobjbfSNuCTyijE",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "tool_use",
      "id": "toolu_IbId2k5Cs4dpj5vgdvJJDA",
      "name": "get_weather",
      "input": {"city": "Tokyo"}
    }
  ],
  "stop_reason": "tool_use",
  "usage": {"input_tokens": 35, "output_tokens": 6}
}

To complete the round-trip, append the assistant's tool_use message and a new user message containing a tool_result block:

"messages": [
  {"role": "user", "content": "What is the weather in Tokyo?"},
  {"role": "assistant", "content": [{"type": "tool_use", "id": "toolu_...", ...}]},
  {
    "role": "user",
    "content": [
      {
        "type": "tool_result",
        "tool_use_id": "toolu_...",
        "content": "Sunny, 22°C"
      }
    ]
  }
]
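
The round-trip described above can be sketched as a function that takes the previous messages and the tool_use response and returns the message list for the follow-up request. `run_tools` and the `handlers` mapping are hypothetical names, the sample tool ID is a placeholder, and the sketch assumes each handler accepts the tool input as keyword arguments and returns something that can be converted to a string.

```python
def run_tools(messages, response, handlers):
    """Return a new message list extended with the tool_use turn and its results."""
    results = []
    for block in response["content"]:
        if block.get("type") != "tool_use":
            continue
        handler = handlers[block["name"]]  # hypothetical name-to-callable mapping
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # must echo the ID from the tool_use block
            "content": str(handler(**block["input"])),
        })
    return messages + [
        {"role": "assistant", "content": response["content"]},
        {"role": "user", "content": results},
    ]


history = [{"role": "user", "content": "What is the weather in Tokyo?"}]
response = {
    "role": "assistant",
    "content": [
        {"type": "tool_use", "id": "toolu_example", "name": "get_weather",
         "input": {"city": "Tokyo"}}
    ],
    "stop_reason": "tool_use",
}
follow_up = run_tools(history, response, {"get_weather": lambda city: "Sunny, 22°C"})
```

The returned list is what goes into the `messages` field of the next request, alongside the same `tools` definitions.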

Prompt caching

Mark content with cache_control: {"type": "ephemeral"} to enable Anthropic's prompt cache. The first call writes the marked prefix to the cache and reports it as cache_creation_input_tokens; later calls that hit the cache report it as cache_read_input_tokens, which are billed at 1/10 of the base input price.

"system": [
  {
    "type": "text",
    "text": "You are an analyst. Below is the full company knowledge base...(20K tokens)",
    "cache_control": {"type": "ephemeral"}
  }
]

Verified BUZZ behavior:

Call	input_tokens	cache_creation	cache_read
1 (cold)	2	1200	0
2 (warm)	2	0	1200

BUZZ forwards cache_control directives without modification. See Prompt Caching concept.
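
To make the saving concrete, the input side of a call can be expressed in units of one uncached input token. This is a rough sketch, not BUZZ's billing logic: `input_cost_units` is a hypothetical helper, the 1/10 read rate comes from the text above, and the `write_multiplier` default of 1.25 is an assumption about the cache-write surcharge that you should check against current pricing.

```python
def input_cost_units(usage: dict, write_multiplier: float = 1.25) -> float:
    """Input cost of one call, in units of one uncached input token.

    write_multiplier is an assumed surcharge for cache writes, not a
    value documented on this page.
    """
    return (
        usage.get("input_tokens", 0)
        + usage.get("cache_creation_input_tokens", 0) * write_multiplier
        + usage.get("cache_read_input_tokens", 0) / 10  # reads at 1/10 the input price
    )


cold = {"input_tokens": 2, "cache_creation_input_tokens": 1200, "cache_read_input_tokens": 0}
warm = {"input_tokens": 2, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 1200}
print(input_cost_units(cold))  # 1502.0
print(input_cost_units(warm))  # 122.0
```

Under these assumptions, the cold call costs more than an uncached call of the same size (1202 units), and the warm call costs far less, so caching pays off once the prefix is reused.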

Errors

HTTP	error.type	Source	Typical cause
400	invalid_request_error	Anthropic	Malformed JSON or invalid field value
401	buzz_error / authentication_error	BUZZ	Missing or invalid API key
403	permission_error	BUZZ	Key lacks permission, IP not allow-listed
413	buzz_error / read_request_body_failed	BUZZ	Body > 32 MB. (Anthropic upstream uses `request_too_large`; on BUZZ this is wrapped before reaching the upstream.)
429	rate_limit_error	Anthropic	Rate limit hit; respect `retry-after`
500	buzz_error / api_error	Either	BUZZ wraps internal errors as `buzz_error`; uncaught upstream errors may surface as `api_error`. Retry with backoff.
503	buzz_error / model_not_found	BUZZ	BUZZ-specific: no available channel for the model+group
529	overloaded_error	Anthropic	Upstream Anthropic overloaded; not generated by BUZZ. Retry with longer backoff.
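
The retry guidance in the table can be sketched as a single delay function. This is one possible policy, not a prescribed one: `retry_delay` and the `RETRYABLE` set are hypothetical, the base and cap values are arbitrary, `retry-after` is assumed to be given in seconds, and 503 model_not_found is treated as non-retryable on the assumption that it reflects configuration rather than transient load.

```python
RETRYABLE = {429, 500, 529}  # statuses the table above suggests retrying


def retry_delay(status, attempt, retry_after=None, base=1.0, cap=60.0):
    """Return seconds to wait before retry number `attempt` (0-based), or None.

    None means the request should not be retried.
    """
    if status not in RETRYABLE:
        return None
    if status == 429 and retry_after is not None:
        return float(retry_after)  # assumes the header value is in seconds
    if status == 529:
        base *= 2  # longer backoff while the upstream is overloaded
    return min(cap, base * 2 ** attempt)
```

In production you would typically add random jitter to the returned delay so that many clients do not retry in lockstep.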

Error envelope (Anthropic-side):

{
  "type": "error",
  "error": {"type": "rate_limit_error", "message": "..."},
  "request_id": "req_..."
}

Error envelope (BUZZ gateway-side):

{
  "error": {
    "type": "buzz_error",
    "message": "... (request id: 202605260713594...)"
  }
}
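
Because the two envelopes differ, error handling is simpler if both are reduced to one shape first. The sketch below is an illustration under assumptions: `parse_error` is a hypothetical helper, and the pattern used to pull the request ID out of a gateway-side message is inferred from the single sample above, so it may not match every message BUZZ produces.

```python
import re

# Pattern inferred from the sample gateway message; treat it as an assumption.
_REQUEST_ID = re.compile(r"\(request id: ([^)]+)\)")


def parse_error(body: dict) -> dict:
    """Reduce either error envelope to {type, message, request_id}."""
    error = body.get("error", {})
    message = error.get("message", "")
    request_id = body.get("request_id")  # present in the Anthropic-side envelope
    if request_id is None:
        match = _REQUEST_ID.search(message)
        if match:
            request_id = match.group(1)
    return {"type": error.get("type"), "message": message, "request_id": request_id}
```

Logging the normalized `request_id` alongside the HTTP status gives support a single identifier to search for regardless of which side produced the error.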

See the Claude API error code reference for diagnosis and fixes.
