Messages
Create a Claude message. The Messages API is BUZZ's primary endpoint for chat-style completion: a list of input messages in, a model-generated message out. It is drop-in compatible with the Anthropic Messages API.
If you already call https://api.anthropic.com/v1/messages, you only need to change the base URL and API key. Streaming, tool use, prompt caching, and extended thinking are all forwarded transparently to the upstream model.
Authentication
BUZZ accepts three authentication header forms:
| Header | Notes |
|---|---|
| Authorization: Bearer <KEY> | Recommended. Matches the OpenAI SDK convention used elsewhere in BUZZ. |
| Authorization: Bearer sk-<KEY> | The sk- prefix is automatically stripped. |
| x-api-key: <KEY> | Drop-in compatible with the Anthropic SDK default. |
The anthropic-version header is optional on BUZZ; it defaults to 2023-06-01 when omitted. We still recommend sending it explicitly so your code stays portable to the Anthropic API.
Authorization: Bearer <YOUR_BUZZ_KEY>
anthropic-version: 2023-06-01
content-type: application/json
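As a minimal sketch, the header forms above could be assembled in Python as follows. The function name `build_headers` and its `style` argument are hypothetical and not part of any SDK; the header names and the default version string are taken from the table and note above.

```python
ANTHROPIC_VERSION = "2023-06-01"  # default stated in the note above


def build_headers(api_key: str, style: str = "bearer") -> dict:
    """Return request headers for one of the accepted auth forms.

    style="bearer"    -> Authorization: Bearer <KEY>  (recommended form)
    style="x-api-key" -> x-api-key: <KEY>             (Anthropic SDK default)
    """
    headers = {
        "anthropic-version": ANTHROPIC_VERSION,
        "content-type": "application/json",
    }
    if style == "bearer":
        headers["Authorization"] = f"Bearer {api_key}"
    elif style == "x-api-key":
        headers["x-api-key"] = api_key
    else:
        raise ValueError(f"unknown auth style: {style!r}")
    return headers
```

Sending only one of the two credential headers per request keeps it unambiguous which form the gateway sees.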
Example request
curl -X POST https://buzzai.cc/v1/messages \
-H "Authorization: Bearer $BUZZ_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-haiku-4-5-20251001",
"max_tokens": 80,
"messages": [
{"role": "user", "content": "reply with exactly: hello world"}
]
}'

import os

from anthropic import Anthropic

# Point the official SDK at BUZZ and read the key from the environment.
client = Anthropic(
    base_url="https://buzzai.cc",
    api_key=os.environ["BUZZ_API_KEY"],
)
message = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=80,
    messages=[
        {"role": "user", "content": "reply with exactly: hello world"}
    ],
)
print(message.content[0].text)

import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic({
  baseURL: "https://buzzai.cc",
  apiKey: process.env.BUZZ_API_KEY,
});
const message = await client.messages.create({
  model: "claude-haiku-4-5-20251001",
  max_tokens: 80,
  messages: [
    { role: "user", content: "reply with exactly: hello world" },
  ],
});
console.log(message.content[0].text);

Response
{
"id": "msg_01ouKJ3o9AnAJb7JtWF25Dk2",
"type": "message",
"role": "assistant",
"model": "claude-haiku-4-5-20251001",
"content": [
{"type": "text", "text": "hello world"}
],
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 6,
"output_tokens": 2,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 0,
"service_tier": "standard"
}
}
Note: BUZZ may include additional fields (usage.iterations[], context_management.applied_edits) for accounting transparency. Your parser should accept unknown fields.
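One way to follow the advice above is to read only the fields you depend on and ignore everything else. The sketch below assumes the response has already been decoded into a Python dict; `MessageSummary` and `summarize_message` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MessageSummary:
    id: str
    text: str
    stop_reason: Optional[str]
    input_tokens: int
    output_tokens: int


def summarize_message(payload: dict) -> MessageSummary:
    """Pick out the fields we rely on; unknown fields are simply not read."""
    usage = payload.get("usage", {})
    # Join only text blocks; other block types (tool_use, thinking) are skipped.
    text = "".join(
        block.get("text", "")
        for block in payload.get("content", [])
        if block.get("type") == "text"
    )
    return MessageSummary(
        id=payload["id"],
        text=text,
        stop_reason=payload.get("stop_reason"),
        input_tokens=usage.get("input_tokens", 0),
        output_tokens=usage.get("output_tokens", 0),
    )
```

Because the function never iterates over all keys or validates against a closed schema, extra fields such as usage.iterations[] pass through without causing errors.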
Body parameters
model required
string — The model that will complete your prompt. Get the live list from GET /v1/models.
Common values:
- claude-opus-4-7: most capable
- claude-sonnet-4-6: balanced
- claude-haiku-4-5-20251001: fastest, lowest cost
Dated aliases like claude-haiku-4-5-20251001, claude-sonnet-4-5-20250929, claude-opus-4-5-20251101 are also accepted. If a model is not available under your group, you'll receive HTTP 503 model_not_found.
max_tokens required
integer — Maximum number of output tokens. Different models have different maximum values, see Anthropic model cards.
messages required
array — Conversation turns. Each message has role ("user" or "assistant") and content (string or array of content blocks).
"messages": [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi! How can I help?"},
{"role": "user", "content": "What's 2+2?"}
]
Content can also be a structured array supporting text, images, tool use, and tool results:
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{"type": "image", "source": {...}}
]
}
system optional
string | array — System prompt. Provide context and instructions outside the conversation. The array form supports cache_control for prompt caching.
"system": [
{
"type": "text",
"text": "You are a helpful assistant.",
"cache_control": {"type": "ephemeral"}
}
]
stream optional
boolean — Set true to receive Server-Sent Events. See the streaming section below.
temperature optional
number — Sampling temperature, 0.0 to 1.0. Lower is more deterministic.
Note: When using thinking mode (Opus 4.7), temperature / top_p / top_k are ignored by Anthropic.
top_p optional
number — Nucleus sampling. Use temperature or top_p, not both.
top_k optional
integer — Sample only from the top K options.
stop_sequences optional
array of strings — Custom text sequences that cause the model to stop.
tools optional
array — Tool definitions for tool use / function calling. See Tool Use concept.
"tools": [
{
"name": "get_weather",
"description": "Get the current weather for a city",
"input_schema": {
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"]
}
}
]
tool_choice optional
object — How the model should use tools.
- {"type": "auto"}: model decides (default)
- {"type": "any"}: must use one of the provided tools
- {"type": "tool", "name": "..."}: must use this specific tool
- {"type": "none"}: never use tools
thinking optional
object — Extended thinking control (Claude Opus 4.7+). Reserves internal reasoning tokens before the visible response.
"thinking": {
"type": "enabled",
"budget_tokens": 4096
}
metadata optional
object — Free-form metadata. Anthropic primarily uses {"user_id": "..."}.
cache_control (top-level) optional
object — Mark content for prompt caching. See Prompt Caching.
service_tier advanced
string — Anthropic service tier. Channel-gated: may be silently filtered unless your channel allows it.
inference_geo, speed, and service_tier are silently dropped by BUZZ unless the upstream channel has the corresponding allow-flag enabled. If you need these, contact support.
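Because these fields are dropped without an error, it can help to flag them on the client before sending. The following is a small sketch under that assumption; `CHANNEL_GATED_FIELDS` and `find_gated_fields` are hypothetical names, and the field list is copied from the sentence above.

```python
CHANNEL_GATED_FIELDS = ("inference_geo", "speed", "service_tier")


def find_gated_fields(body: dict) -> list:
    """Return the channel-gated top-level fields present in a request body."""
    return [name for name in CHANNEL_GATED_FIELDS if name in body]
```

A caller might log a warning when the returned list is non-empty, since the response alone may not reveal that a field was filtered.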
Response
| Field | Type | Description |
|---|---|---|
| id | string | Unique message ID, format msg_... |
| type | string | Always "message" |
| role | string | Always "assistant" |
| model | string | Model used. May be a dated variant of the requested model. |
| content | array | Array of content blocks: text, tool_use, thinking |
| stop_reason | string | One of end_turn, max_tokens, stop_sequence, tool_use, pause_turn, refusal |
| stop_sequence | string or null | Stop sequence that triggered termination, or null |
| usage | object | Token usage counters (see below) |
usage object
| Field | Type | Description |
|---|---|---|
| input_tokens | integer | Tokens consumed by input (excluding cached) |
| output_tokens | integer | Tokens generated |
| cache_creation_input_tokens | integer | Tokens written to cache (this call) |
| cache_read_input_tokens | integer | Tokens read from cache (cache hit) |
| cache_creation | object | Breakdown by 5-minute / 1-hour TTL |
| service_tier | string | "standard", "priority", or "batch" |
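Since input_tokens excludes cached tokens, the full prompt size is the sum of the three input counters. A minimal sketch of that arithmetic, assuming the usage object has been decoded into a dict (`summarize_usage` is a hypothetical helper name):

```python
def summarize_usage(usage: dict) -> dict:
    """Combine the usage counters into a total and a cache-hit ratio."""
    uncached = usage.get("input_tokens", 0)
    written = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    total_input = uncached + written + read
    return {
        "total_input_tokens": total_input,
        "output_tokens": usage.get("output_tokens", 0),
        # Share of the prompt served from cache; 0.0 when the prompt is empty.
        "cache_hit_ratio": read / total_input if total_input else 0.0,
    }
```

Missing counters are treated as zero, so the helper also works on responses from calls that did not use caching.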
Streaming
Set "stream": true in the request body to receive a Server-Sent Events (SSE) stream. The stream consists of these event types:
| Event | When |
|---|---|
| message_start | Beginning of the response, contains initial usage |
| content_block_start | A new content block begins (text, tool_use, thinking) |
| content_block_delta | Incremental content (text delta, partial JSON, thinking delta) |
| content_block_stop | A content block ends |
| message_delta | Final stop reason and usage update |
| message_stop | End of stream |
| ping | Keep-alive (may be sent any time) |
| error | Mid-stream error (after a 200 response) |
Captured stream sample
event: message_start
data: {"type":"message_start","message":{"id":"msg_...","model":"claude-haiku-4-5-20251001","content":[],...}}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":5}}
event: message_stop
data: {"type":"message_stop"}
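A stream like the sample above can be reduced to its final text with a small parser. This is a sketch, not a full SSE implementation: it assumes each event carries exactly one data line, as in the sample, and the function names `iter_sse_events` and `collect_text` are hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, data) pairs from raw SSE lines."""
    event_name = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:") and event_name is not None:
            yield event_name, json.loads(line[len("data:"):].strip())
            event_name = None


def collect_text(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Accumulate text deltas and return (text, stop_reason)."""
    parts = []
    stop_reason = None
    for name, data in iter_sse_events(lines):
        if name == "content_block_delta" and data["delta"].get("type") == "text_delta":
            parts.append(data["delta"]["text"])
        elif name == "message_delta":
            stop_reason = data["delta"].get("stop_reason")
        elif name == "error":
            # Mid-stream errors arrive after the HTTP 200, so surface them here.
            raise RuntimeError(f"stream error: {data}")
        elif name == "message_stop":
            break
    return "".join(parts), stop_reason
```

Events the function does not recognize, including ping, fall through without effect, which matches the note that keep-alives may arrive at any time.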
Tool use example
Provide tool definitions and let Claude decide when to call them.
{
"model": "claude-haiku-4-5-20251001",
"max_tokens": 200,
"tools": [
{
"name": "get_weather",
"description": "Get the current weather for a city",
"input_schema": {
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"]
}
}
],
"messages": [
{"role": "user", "content": "What is the weather in Tokyo?"}
]
}
Response (tool_use turn)
{
"id": "msg_01gxQPtqeRobjbfSNuCTyijE",
"type": "message",
"role": "assistant",
"content": [
{
"type": "tool_use",
"id": "toolu_IbId2k5Cs4dpj5vgdvJJDA",
"name": "get_weather",
"input": {"city": "Tokyo"}
}
],
"stop_reason": "tool_use",
"usage": {"input_tokens": 35, "output_tokens": 6}
}
To complete the round-trip, append the assistant's tool_use message and a new user message containing a tool_result block:
"messages": [
{"role": "user", "content": "What is the weather in Tokyo?"},
{"role": "assistant", "content": [{"type": "tool_use", "id": "toolu_...", ...}]},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_...",
"content": "Sunny, 22°C"
}
]
}
]
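The round-trip described above can be sketched as a helper that runs each requested tool locally and builds the follow-up message list. The name `build_tool_followup` and the `tools` registry (a dict mapping tool names to Python callables) are hypothetical; the block shapes follow the JSON shown above.

```python
def build_tool_followup(messages: list, response: dict, tools: dict) -> list:
    """Return a new message list extended with the tool_use turn and its results.

    `tools` maps a tool name to a local callable (an illustrative registry).
    """
    results = []
    for block in response["content"]:
        if block.get("type") != "tool_use":
            continue
        # Call the local implementation with the model-provided arguments.
        output = tools[block["name"]](**block["input"])
        results.append(
            {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": str(output),
            }
        )
    # Echo the assistant turn unchanged, then answer it with the results.
    return messages + [
        {"role": "assistant", "content": response["content"]},
        {"role": "user", "content": results},
    ]
```

The returned list can be sent as the messages field of the next request; the input list is left unmodified so the caller can retry safely.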
Prompt caching
Mark content with cache_control: {"type": "ephemeral"} to enable Anthropic's prompt cache. The first call writes to cache (counts as cache_creation_input_tokens); subsequent calls read from cache at 1/10 the input price.
"system": [
{
"type": "text",
"text": "You are an analyst. Below is the full company knowledge base...(20K tokens)",
"cache_control": {"type": "ephemeral"}
}
]
Verified BUZZ behavior:
| Call | input_tokens | cache_creation_input_tokens | cache_read_input_tokens |
|---|---|---|---|
| 1 (cold) | 2 | 1200 | 0 |
| 2 (warm) | 2 | 0 | 1200 |
BUZZ forwards cache_control directives without modification. See Prompt Caching concept.
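To see what the 1/10 read price means for a call, the usage counters can be converted into a relative input cost. In the sketch below, the read multiplier comes from the text above, while the write multiplier of 1.25 is an assumption that this page does not state; check current pricing before relying on it. `relative_input_cost` is a hypothetical helper name.

```python
CACHE_READ_MULTIPLIER = 0.1    # "1/10 the input price", from the text above
CACHE_WRITE_MULTIPLIER = 1.25  # ASSUMPTION: not stated in this document


def relative_input_cost(usage: dict, write_multiplier: float = CACHE_WRITE_MULTIPLIER) -> float:
    """Return input cost in units of one uncached input token."""
    return (
        usage.get("input_tokens", 0)
        + usage.get("cache_creation_input_tokens", 0) * write_multiplier
        + usage.get("cache_read_input_tokens", 0) * CACHE_READ_MULTIPLIER
    )
```

For the warm call in the table above this gives about 122 token-units, against 1202 if the same prompt were sent uncached.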
Errors
| HTTP | error.type | Source | Typical cause |
|---|---|---|---|
| 400 | invalid_request_error | Anthropic | Malformed JSON or invalid field value |
| 401 | buzz_error / authentication_error | BUZZ | Missing or invalid API key |
| 403 | permission_error | BUZZ | Key lacks permission, IP not allow-listed |
| 413 | buzz_error / read_request_body_failed | BUZZ | Body > 32 MB. (Anthropic upstream uses request_too_large; on BUZZ this is wrapped before reaching the upstream.) |
| 429 | rate_limit_error | Anthropic | Rate limit hit; respect retry-after |
| 500 | buzz_error / api_error | Either | BUZZ wraps internal errors as buzz_error; uncaught upstream errors may surface as api_error. Retry with backoff. |
| 503 | buzz_error / model_not_found | BUZZ | BUZZ-specific: no available channel for the model+group |
| 529 | overloaded_error | Anthropic | Upstream Anthropic overloaded; not generated by BUZZ. Retry with longer backoff. |
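The retry guidance in the table can be sketched as a delay function. The choice of retryable statuses (429, 500, 529) follows the table; the backoff constants, the jitter, and the name `retry_delay` are illustrative assumptions, and `headers` is assumed to be a dict with lowercase keys.

```python
import random

RETRYABLE_STATUSES = {429, 500, 529}


def retry_delay(status: int, headers: dict, attempt: int,
                base: float = 1.0, cap: float = 60.0):
    """Return seconds to wait before retrying, or None if not retryable."""
    if status not in RETRYABLE_STATUSES:
        return None
    if status == 429:
        retry_after = headers.get("retry-after")
        if retry_after is not None:
            try:
                return float(retry_after)
            except ValueError:
                pass  # not a number of seconds; fall back to backoff below
    # 529 gets a longer base delay, per the table's "longer backoff" advice.
    scale = 4.0 if status == 529 else 1.0
    delay = min(cap, base * scale * (2 ** attempt))
    # Jitter spreads retries out so clients do not retry in lockstep.
    return delay * random.uniform(0.5, 1.0)
```

Statuses such as 400, 401, 403, and 413 return None because resending the same request would fail the same way.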
Error envelope (Anthropic-side):
{
"type": "error",
"error": {"type": "rate_limit_error", "message": "..."},
"request_id": "req_..."
}
Error envelope (BUZZ gateway-side):
{
"error": {
"type": "buzz_error",
"message": "... (request id: 202605260713594...)"
}
}
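Since two envelope shapes can come back, client code may want to normalize them. The sketch below tells them apart by the top-level "type": "error" field, which is a heuristic based only on the two samples above, and extracts the BUZZ request id from the message text with a pattern inferred from the sample; `ApiError` and `parse_error` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the sample message "... (request id: ...)".
_REQUEST_ID_IN_MESSAGE = re.compile(r"\(request id: ([^)]+)\)")


class ApiError(NamedTuple):
    source: str                # "anthropic" or "buzz"
    error_type: str
    message: str
    request_id: Optional[str]


def parse_error(body: dict) -> ApiError:
    """Normalize either error envelope into one structure."""
    err = body.get("error", {})
    message = err.get("message", "")
    if body.get("type") == "error":
        # Anthropic-side envelope: top-level "type" and "request_id".
        return ApiError("anthropic", err.get("type", ""), message, body.get("request_id"))
    # BUZZ gateway envelope: the request id, if any, is embedded in the message.
    match = _REQUEST_ID_IN_MESSAGE.search(message)
    return ApiError("buzz", err.get("type", ""), message, match.group(1) if match else None)
```

Keeping the request id in either case makes it easier to quote the right identifier when contacting support.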
See Claude API error code reference for diagnosis and fix.
See also
- POST /v1/chat/completions: OpenAI-compatible alternative
- Tool Use concept
- Prompt Caching concept
- Streaming guide
- Prompt caching playbook