Chat Completions

Create a chat completion using the OpenAI-compatible schema. The same endpoint can target OpenAI, Claude, Gemini, Qwen, DeepSeek and other backends — BUZZ resolves the upstream from the model field and converts protocols when needed.

POST https://buzzai.cc/v1/chat/completions

Drop-in OpenAI compatibility. Code that already speaks https://api.openai.com/v1/chat/completions only needs the base URL and API key changed. All 35 official OpenAI request fields are accepted, including tools, response_format, web_search_options, reasoning_effort, stream_options and verbosity. POST /v1/completions is also routed through the same OpenAI format.

Authentication

BUZZ accepts the standard OpenAI bearer-token form:

Header	Notes
`Authorization: Bearer <KEY>`	Recommended. Standard OpenAI SDK convention.
`Authorization: Bearer sk-<KEY>`	The `sk-` prefix is automatically stripped server-side.

The Anthropic x-api-key header is honored on /v1/messages and /v1/models; for /v1/chat/completions use Authorization: Bearer.

Authorization: Bearer <YOUR_BUZZ_KEY>
Content-Type: application/json

Example request

curl -X POST https://buzzai.cc/v1/chat/completions \
  -H "Authorization: Bearer $BUZZ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-haiku-4-5-20251001",
    "messages": [
      {"role": "user", "content": "reply with exactly: hello world"}
    ],
    "max_tokens": 32
  }'

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://buzzai.cc/v1",
    api_key=os.environ["BUZZ_API_KEY"],  # same environment variable as the curl example
)

resp = client.chat.completions.create(
    model="claude-haiku-4-5-20251001",
    messages=[
        {"role": "user", "content": "reply with exactly: hello world"}
    ],
    max_tokens=32,
)

print(resp.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://buzzai.cc/v1",
  apiKey: process.env.BUZZ_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "claude-haiku-4-5-20251001",
  messages: [
    { role: "user", content: "reply with exactly: hello world" },
  ],
  max_tokens: 32,
});

console.log(resp.choices[0].message.content);

Response

{
  "id": "chatcmpl-AbCdEf...",
  "object": "chat.completion",
  "created": 1748246400,
  "model": "claude-haiku-4-5-20251001",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "hello world"},
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 14, "completion_tokens": 2, "total_tokens": 16}
}

Body parameters

BUZZ accepts every field defined in the OpenAI CreateChatCompletionRequest schema. The table below lists all 35 official fields plus top_k, which is widely used by non-OpenAI upstreams.

Core fields

Field	Type	Description
model (required)	string	The model that will complete the request. See `GET /v1/models`.
messages (required)	array	Conversation turns. Roles: `system` / `developer` / `user` / `assistant` / `tool`. Must be non-empty.
max_tokens	integer	Maximum output tokens. Deprecated by OpenAI in favor of `max_completion_tokens` but still accepted.
max_completion_tokens	integer	Modern token cap; also limits reasoning tokens.
stream	boolean	If `true`, returns a Server-Sent Events stream. Default `false`.
stream_options	object	`{"include_usage": true}` emits final usage on the last chunk.
n	integer	Number of completions, 1..128. Default 1.
stop	string \| array	Up to 4 sequences that halt generation.

Sampling

Field	Type	Description
temperature	number	Sampling temperature, 0..2. Default 1.
top_p	number	Nucleus sampling, 0..1. Default 1. Use either `temperature` or `top_p`, not both.
top_k	integer	Non-OpenAI extension; passed through to backends that support it (Claude, Gemini, Qwen).
seed	integer	Best-effort determinism. Beta on OpenAI.
presence_penalty	number	-2..2. Default 0.
frequency_penalty	number	-2..2. Default 0.
logit_bias	object	Token id → bias (-100..100).
logprobs	boolean	Return per-token log probabilities. Default `false`.
top_logprobs	integer	0..20. Requires `logprobs: true`.

Tools and structured output

Field	Type	Description
tools	array	Function or custom tool definitions. See Tool Use.
tool_choice	string \| object	`"auto"`, `"required"`, `"none"`, or `{"type":"function","function":{"name":"..."}}`.
parallel_tool_calls	boolean	Allow multiple tool calls in one turn.
response_format	object	`{"type":"text"}`, `{"type":"json_object"}`, or `{"type":"json_schema","json_schema":{...}}`.
function_call	string \| object	Deprecated. Pre-tools API; still accepted.
functions	array	Deprecated. Use `tools` instead.
prediction	object	Predicted output to speed up rewrite-style generation.

Reasoning and modalities

Field	Type	Description
reasoning_effort	string	`minimal` / `low` / `medium` / `high`. On Claude paths, BUZZ maps `low`/`medium`/`high` to a `thinking.budget_tokens` of 1280 / 2048 / 4096 respectively.
verbosity	string	`low` / `medium` / `high` for GPT-5 family.
modalities	array	Output modalities, e.g. `["text"]`, `["text","audio"]`.
audio	object	`{"voice":"...","format":"..."}` for audio outputs.
web_search_options	object	Native web-search hint. `search_context_size` accepts `low`/`medium`/`high` (default `medium`). On Claude paths it is converted to the `web_search_20250305` tool with `max_uses` of 1 / 5 / 10.

Caching, accounting and identity

Field	Type	Description
prompt_cache_key	string	Routing hint to maximize prompt-cache hit rate.
prompt_cache_retention	string	`in_memory` or `24h`.
store	boolean	Store the request/response on the upstream. Forwarded by default; channels can opt out via `disable_store`.
metadata	object	Free-form string-to-string map.
user	string	Deprecated end-user identifier.
safety_identifier	string	Filtered by default. See note below.
service_tier	string	Filtered by default. See note below.

Channel-gated fields. The following fields are stripped from the upstream payload unless the channel has the matching allow-flag enabled: service_tier, safety_identifier, stream_options.include_obfuscation, and the Claude-specific inference_geo and speed. store is forwarded by default but can be disabled per channel. If a channel or the gateway is in pass-through mode, these filters are skipped.
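The gating rule above can be sketched as a small filter. This is an illustration only, not the gateway's code: the `allow_*` flag names are hypothetical, and only `disable_store` is named in this reference.

```python
# Gated top-level field -> hypothetical allow-flag name (illustrative only).
GATED_FIELDS = {
    "service_tier": "allow_service_tier",
    "safety_identifier": "allow_safety_identifier",
    "inference_geo": "allow_inference_geo",
    "speed": "allow_speed",
}


def filter_gated(body, channel_flags, pass_through=False):
    """Return a copy of the request body with gated fields removed."""
    if pass_through:
        # Pass-through mode skips every filter.
        return dict(body)
    out = {
        key: value
        for key, value in body.items()
        if key not in GATED_FIELDS or GATED_FIELDS[key] in channel_flags
    }
    # stream_options.include_obfuscation is nested, so handle it separately.
    stream_options = out.get("stream_options")
    if (
        isinstance(stream_options, dict)
        and "include_obfuscation" in stream_options
        and "allow_include_obfuscation" not in channel_flags
    ):
        out["stream_options"] = {
            k: v for k, v in stream_options.items() if k != "include_obfuscation"
        }
    # store is forwarded unless the channel opts out.
    if "disable_store" in channel_flags:
        out.pop("store", None)
    return out
```

With no flags set, `service_tier` is removed and `store` survives; adding `disable_store` removes `store` as well.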

When the model is a Claude model

The same /v1/chat/completions endpoint can target Claude models (e.g. claude-haiku-4-5-20251001, claude-sonnet-4-6, claude-opus-4-7). BUZZ rewrites the request into the Anthropic Messages API and converts the response back to OpenAI shape — but several OpenAI-only fields have no Anthropic equivalent and are silently dropped.

Fields silently dropped on Claude paths

The Claude request builder does not include any of these fields, so values you pass are ignored. Plan accordingly:

Field	Effect on Claude
n	Always single completion. `n > 1` is not supported.
presence_penalty	Dropped.
frequency_penalty	Dropped.
logit_bias	Dropped.
logprobs / top_logprobs	Dropped. `logprobs` is always `null` in the response.
seed	Dropped.
response_format	Dropped. JSON-mode and JSON-schema enforcement do not pass through. Use prompt-engineered JSON or call a GPT model.
function_call / functions	Dropped (use modern `tools`).
prediction / modalities / audio	Dropped.
verbosity	Dropped.
user / safety_identifier	Dropped.
prompt_cache_key	Dropped.
metadata	Dropped.
service_tier / store	Dropped.
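Because these fields are dropped without an error, a client-side check can surface them before a request is sent. The following is a minimal sketch built from the table above; the helper name `dropped_fields` is hypothetical.

```python
# Top-level request fields the table above lists as dropped on Claude paths.
DROPPED_ON_CLAUDE = frozenset({
    "n", "presence_penalty", "frequency_penalty", "logit_bias",
    "logprobs", "top_logprobs", "seed", "response_format",
    "function_call", "functions", "prediction", "modalities", "audio",
    "verbosity", "user", "safety_identifier", "prompt_cache_key",
    "metadata", "service_tier", "store",
})


def dropped_fields(body):
    """Return the sorted names of fields in body that Claude paths ignore."""
    return sorted(key for key in body if key in DROPPED_ON_CLAUDE)


request = {
    "model": "claude-haiku-4-5-20251001",
    "messages": [{"role": "user", "content": "hi"}],
    "temperature": 0.2,
    "seed": 7,
    "n": 2,
}
print(dropped_fields(request))
```

A caller could log the returned names or fail fast when a field such as `response_format` is essential to the request.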

Sampling overrides on Claude

When a Claude model runs in thinking mode or is opus-4-7, sampling parameters are overridden by BUZZ before the request leaves the gateway:

For opus-4-7 with the -thinking suffix: temperature, top_p and top_k are all cleared, thinking.type is set to adaptive, and output_config.effort is set to high.
For other models with the -thinking suffix: thinking.type is set to enabled, budget_tokens is set to 80% of max_tokens, top_p is cleared, and temperature is forced to 1.0.
Without thinking, temperature / top_p / top_k pass through unchanged.
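The three rules above can be sketched as one function. This is an approximation under stated assumptions: thinking mode is detected by a model name ending in `-thinking`, opus-4-7 is detected by substring match, `max_tokens` is present on thinking requests, and the 80% budget rounds down. None of these details are confirmed by this reference.

```python
def apply_sampling_overrides(request):
    """Return a copy of request with the Claude sampling overrides applied."""
    out = dict(request)
    model = out.get("model", "")
    if not model.endswith("-thinking"):
        # No thinking: temperature / top_p / top_k pass through unchanged.
        return out
    if "opus-4-7" in model:
        for key in ("temperature", "top_p", "top_k"):
            out.pop(key, None)
        out["thinking"] = {"type": "adaptive"}
        out["output_config"] = {"effort": "high"}
    else:
        # Integer arithmetic; rounding down is an assumption.
        budget = out["max_tokens"] * 4 // 5
        out["thinking"] = {"type": "enabled", "budget_tokens": budget}
        out.pop("top_p", None)
        out["temperature"] = 1.0
    return out
```

For example, a `-thinking` request with `max_tokens` of 1000 and `temperature` of 0.2 comes out with a budget of 800 and `temperature` of 1.0.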

Field translations

OpenAI input	Claude representation
messages with role `system`	Hoisted into Claude top-level `system: [{type:"text",...}]`
tools / tool_choice / parallel_tool_calls	Translated to Anthropic `tools` with `name` / `description` / `input_schema`
image_url / input_audio / file / video_url content parts	Fetched and re-encoded as Claude base64 `image` / `document` sources
stop (string or array)	`stop_sequences` array
max_tokens / max_completion_tokens	Larger of the two becomes Claude `max_tokens`
web_search_options	`web_search_20250305` tool with `max_uses` 1 / 5 / 10
reasoning_effort	`thinking.budget_tokens` 1280 / 2048 / 4096
reasoning (OpenRouter style)	Parsed; `reasoning.max_tokens` overrides the budget
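A few rows of the table can be sketched as follows. The sketch covers only system hoisting, token caps, stop sequences, reasoning budgets and web search; tool and media translation are omitted. It assumes string-valued system content and `thinking.type` of `enabled`, neither of which is stated in this reference.

```python
EFFORT_BUDGET = {"low": 1280, "medium": 2048, "high": 4096}
SEARCH_MAX_USES = {"low": 1, "medium": 5, "high": 10}


def translate_to_claude(body):
    """Translate a subset of an OpenAI-style body into Claude-style fields."""
    out = {"model": body["model"]}
    # System messages are hoisted into a top-level list of text blocks.
    system = [
        {"type": "text", "text": m["content"]}
        for m in body["messages"] if m["role"] == "system"
    ]
    if system:
        out["system"] = system
    out["messages"] = [m for m in body["messages"] if m["role"] != "system"]
    # The larger of the two token caps wins.
    caps = [body[k] for k in ("max_tokens", "max_completion_tokens") if k in body]
    if caps:
        out["max_tokens"] = max(caps)
    if "stop" in body:
        stop = body["stop"]
        out["stop_sequences"] = [stop] if isinstance(stop, str) else list(stop)
    effort = body.get("reasoning_effort")
    if effort in EFFORT_BUDGET:
        out["thinking"] = {"type": "enabled", "budget_tokens": EFFORT_BUDGET[effort]}
    # OpenRouter-style reasoning.max_tokens overrides the budget.
    override = (body.get("reasoning") or {}).get("max_tokens")
    if override is not None:
        out["thinking"] = {"type": "enabled", "budget_tokens": override}
    search = body.get("web_search_options")
    if search is not None:
        size = search.get("search_context_size", "medium")
        out["tools"] = [{
            "type": "web_search_20250305",
            "name": "web_search",
            "max_uses": SEARCH_MAX_USES[size],
        }]
    return out
```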

Response

Non-streaming responses follow the standard OpenAI chat.completion shape.

Field	Type	Description
id	string	OpenAI: `chatcmpl-...`. Claude: the upstream `msg_...` id is reused.
object	string	Always `"chat.completion"`.
created	integer	Unix timestamp.
model	string	Model the upstream actually used; may be a dated alias.
choices[].index	integer	0-based choice index.
choices[].message.role	string	Always `"assistant"`.
choices[].message.content	string \| null	Concatenated text. Null if only tool calls were returned.
choices[].message.tool_calls	array	Function tool calls; on Claude paths, built from `tool_use` blocks.
choices[].message.reasoning_content	string	BUZZ extension. Populated for o-series, gpt-5 thinking models, and Claude thinking blocks.
choices[].finish_reason	string	`stop` / `length` / `tool_calls` / `content_filter` / `function_call`. Claude maps `end_turn→stop`, `max_tokens→length`, `tool_use→tool_calls`, `refusal→content_filter`.
choices[].logprobs	object \| null	Forwarded for OpenAI; always null for Claude.
system_fingerprint	string	Forwarded only when present upstream. Never set on Claude responses.
service_tier	string \| null	Forwarded for OpenAI; not returned for Claude.
usage	object	Token counters; see below.
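The `finish_reason` mapping and the message-building rules in the table can be sketched as below. The block shapes follow the Anthropic Messages API content blocks. Falling back to `stop` for unlisted stop reasons is an assumption, not documented behavior.

```python
import json

FINISH_REASON = {
    "end_turn": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
    "refusal": "content_filter",
}


def map_finish_reason(stop_reason):
    """Map a Claude stop_reason to an OpenAI finish_reason."""
    # The "stop" fallback for unlisted reasons is an assumption.
    return FINISH_REASON.get(stop_reason, "stop")


def to_openai_message(blocks):
    """Build an OpenAI-style assistant message from Claude content blocks."""
    text = "".join(b["text"] for b in blocks if b["type"] == "text")
    calls = [
        {
            "id": b["id"],
            "type": "function",
            "function": {"name": b["name"], "arguments": json.dumps(b["input"])},
        }
        for b in blocks if b["type"] == "tool_use"
    ]
    # content is null only when tool calls came back without any text.
    message = {"role": "assistant", "content": text if (text or not calls) else None}
    if calls:
        message["tool_calls"] = calls
    return message
```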

usage object

OpenAI fields are returned verbatim. BUZZ adds extension fields for cache and reasoning visibility.

Field	Description
prompt_tokens	Input tokens (on Claude paths, `input_tokens + cached_tokens + cached_creation_tokens` combined).
completion_tokens	Output tokens.
total_tokens	`prompt_tokens + completion_tokens`.
prompt_tokens_details	`{cached_tokens, cached_creation_tokens, text_tokens, audio_tokens, image_tokens}`.
completion_tokens_details	`{text_tokens, audio_tokens, image_tokens, reasoning_tokens}`.
prompt_cache_hit_tokens	BUZZ extension. Convenience scalar for cache hits.
input_tokens / output_tokens	BUZZ extension. Anthropic-style aliases for cross-protocol consumers.
claude_cache_creation_5m_tokens / _1h_tokens	BUZZ extension. Splits Claude cache-creation tokens into the 5-minute and 1-hour cache tiers.
usage_semantic / usage_source	BUZZ extension. `usage_source = "anthropic"` indicates the call ran through Claude.
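The `prompt_tokens` arithmetic on Claude paths can be sketched as follows. It assumes that `cached_tokens` and `cached_creation_tokens` correspond to the Anthropic usage fields `cache_read_input_tokens` and `cache_creation_input_tokens`; this reference does not state that mapping.

```python
def to_openai_usage(claude_usage):
    """Combine Claude usage counters into OpenAI-style totals."""
    fresh = claude_usage.get("input_tokens", 0)
    cached = claude_usage.get("cache_read_input_tokens", 0)
    created = claude_usage.get("cache_creation_input_tokens", 0)
    completion = claude_usage.get("output_tokens", 0)
    # prompt_tokens combines uncached, cache-read and cache-creation input.
    prompt = fresh + cached + created
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
        "prompt_tokens_details": {
            "cached_tokens": cached,
            "cached_creation_tokens": created,
        },
    }
```

With 10 uncached, 100 cache-read and 20 cache-creation input tokens plus 5 output tokens, `prompt_tokens` is 130 and `total_tokens` is 135.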

Streaming

Set "stream": true to receive a Server-Sent Events stream. Each event is a single data: {...} line, and the stream ends with data: [DONE].

Stream chunk shape

{
  "id": "chatcmpl-...",
  "object": "chat.completion.chunk",
  "created": 1748246400,
  "model": "claude-haiku-4-5-20251001",
  "choices": [
    {
      "index": 0,
      "delta": {"role": "assistant", "content": "Hello"},
      "logprobs": null,
      "finish_reason": null
    }
  ]
}

Set stream_options.include_usage = true to receive a final chunk where choices is empty and usage is populated.
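A client can consume the stream by reading `data:` lines until `[DONE]`. The sketch below works on any iterable of text lines, so it is independent of the HTTP library; the helper name `collect_stream` is hypothetical.

```python
import json


def collect_stream(lines):
    """Accumulate streamed text and the final usage object, if any."""
    parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # With include_usage, the final chunk has empty choices and a usage object.
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts), usage
```

The function returns the concatenated text and the usage object, which stays `None` when `include_usage` was not requested.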

Claude-stream translations

When the model is a Claude model, BUZZ converts the Anthropic event stream into OpenAI chunks:

Claude event	OpenAI chunk delta
message_start	First chunk with `delta.role="assistant"`, `delta.content=""`
content_block_start (text)	Initial text chunk
content_block_start (tool_use)	`delta.tool_calls[].id` and `function.name`
content_block_delta (text_delta)	`delta.content` increment
content_block_delta (input_json_delta)	`delta.tool_calls[].function.arguments` increment
content_block_delta (thinking_delta)	`delta.reasoning_content` increment
content_block_delta (signature_delta)	Newline placeholder appended to `reasoning_content` (signed segment is not exposed)
message_delta	`finish_reason` set via OpenAI mapping
message_stop	Skipped; `data: [DONE]` is appended by the gateway
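The table can be read as a function from one Claude event to one OpenAI `delta`. The sketch below uses the event shapes of the Anthropic streaming API and simplifies the output: it omits the `index` on tool-call entries and returns `None` for `message_delta` and `message_stop`, since those affect `finish_reason` and stream termination rather than `delta`.

```python
def claude_event_to_delta(event):
    """Return the OpenAI-style delta for a Claude stream event, or None."""
    kind = event["type"]
    if kind == "message_start":
        return {"role": "assistant", "content": ""}
    if kind == "content_block_start":
        block = event["content_block"]
        if block["type"] == "tool_use":
            return {"tool_calls": [{
                "id": block["id"],
                "type": "function",
                "function": {"name": block["name"], "arguments": ""},
            }]}
        return {"content": block.get("text", "")}
    if kind == "content_block_delta":
        delta = event["delta"]
        if delta["type"] == "text_delta":
            return {"content": delta["text"]}
        if delta["type"] == "input_json_delta":
            return {"tool_calls": [{"function": {"arguments": delta["partial_json"]}}]}
        if delta["type"] == "thinking_delta":
            return {"reasoning_content": delta["thinking"]}
        if delta["type"] == "signature_delta":
            # The signed segment is not exposed; a newline placeholder is emitted.
            return {"reasoning_content": "\n"}
    return None
```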

Identifying which upstream answered

Because both OpenAI and Claude are reachable through this endpoint, you can fingerprint which family handled a given response without parsing the model name:

Signal	OpenAI / GPT	Claude
`id` prefix	`chatcmpl-...`	`msg_...`
`system_fingerprint`	Present when upstream returns it	Never present
`choices[].logprobs`	Object when requested	Always `null`
`usage.usage_source`	Empty	`"anthropic"`
`service_tier`	May be present	Not returned
`choices[].message.reasoning_content`	Only on o-series / gpt-5 thinking	Always present in thinking mode
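Two of the signals in the table are enough for a simple classifier. The following sketch checks `usage.usage_source` first and falls back to the `id` prefix; the helper name and the `"unknown"` result are illustrative choices.

```python
def upstream_family(response):
    """Guess which upstream family produced a chat.completion response."""
    usage = response.get("usage") or {}
    if usage.get("usage_source") == "anthropic":
        return "claude"
    response_id = response.get("id", "")
    if response_id.startswith("msg_"):
        return "claude"
    if response_id.startswith("chatcmpl-"):
        return "openai"
    # Other backends (Gemini, Qwen, DeepSeek) are not covered by the table.
    return "unknown"
```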

Errors

OpenAI-format error envelope:

{
  "error": {
    "code": "",
    "message": "Invalid token (request id: 202605260708...)",
    "type": "buzz_error"
  }
}

HTTP	error.type	Typical cause
400	invalid_request_error	Malformed JSON, missing `model`, empty `messages`, `max_tokens` too large
401	buzz_error / authentication_error	Missing or unrecognized API key
403	permission_error	Key lacks permission for the requested model or group
413	request_too_large	Body exceeds the gateway limit
429	rate_limit_error	Per-key, per-model, or per-channel rate limit hit; respect `retry-after`
500	api_error	Internal error; retry with backoff
503	buzz_error / model_not_found	BUZZ-specific: no available channel for the model + group
529	overloaded_error	Upstream overloaded; retry with longer backoff
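The retry guidance in the table can be sketched as a delay calculator. The base delay, the cap and the 4x multiplier for 529 are illustrative assumptions, and header names are assumed to be lower-cased. 503 is treated as non-retryable here because the table does not recommend retrying it.

```python
RETRYABLE_STATUSES = {429, 500, 529}


def retry_delay(status, headers, attempt, base=1.0, cap=60.0):
    """Return seconds to wait before retrying, or None if not retryable."""
    if status not in RETRYABLE_STATUSES:
        return None
    retry_after = headers.get("retry-after")
    if status == 429 and retry_after is not None:
        try:
            return float(retry_after)  # respect the server's hint
        except ValueError:
            pass  # non-numeric value: fall back to exponential backoff
    # Longer backoff for overloaded upstreams (multiplier is an assumption).
    factor = 4.0 if status == 529 else 1.0
    return min(cap, base * factor * 2 ** attempt)
```

A caller would sleep for the returned number of seconds and stop retrying when the function returns `None` or a chosen attempt limit is reached.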
