Chat Completions
Create a chat completion using the OpenAI-compatible schema. The same endpoint can target OpenAI, Claude, Gemini, Qwen, DeepSeek and other backends — BUZZ resolves the upstream from the model field and converts protocols when needed.
An existing integration that calls https://api.openai.com/v1/chat/completions only needs the base URL and API key changed. All 35 official OpenAI request fields are accepted, including tools, response_format, web_search_options, reasoning_effort, stream_options and verbosity. POST /v1/completions is also routed through the same OpenAI format.
Authentication
BUZZ accepts the standard OpenAI bearer-token form:
| Header | Notes |
|---|---|
| Authorization: Bearer <KEY> | Recommended. Standard OpenAI SDK convention. |
| Authorization: Bearer sk-<KEY> | The sk- prefix is automatically stripped server-side. |
The Anthropic x-api-key header is honored on /v1/messages and /v1/models; for /v1/chat/completions use Authorization: Bearer.
Authorization: Bearer <YOUR_BUZZ_KEY>
Content-Type: application/json
Example request
curl -X POST https://buzzai.cc/v1/chat/completions \
-H "Authorization: Bearer $BUZZ_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-haiku-4-5-20251001",
"messages": [
{"role": "user", "content": "reply with exactly: hello world"}
],
"max_tokens": 32
}'

from openai import OpenAI
client = OpenAI(
base_url="https://buzzai.cc/v1",
api_key="<YOUR_BUZZ_KEY>",
)
resp = client.chat.completions.create(
model="claude-haiku-4-5-20251001",
messages=[
{"role": "user", "content": "reply with exactly: hello world"}
],
max_tokens=32,
)
print(resp.choices[0].message.content)

import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://buzzai.cc/v1",
apiKey: process.env.BUZZ_API_KEY,
});
const resp = await client.chat.completions.create({
model: "claude-haiku-4-5-20251001",
messages: [
{ role: "user", content: "reply with exactly: hello world" },
],
max_tokens: 32,
});
console.log(resp.choices[0].message.content);

Response
{
"id": "chatcmpl-AbCdEf...",
"object": "chat.completion",
"created": 1748246400,
"model": "claude-haiku-4-5-20251001",
"choices": [
{
"index": 0,
"message": {"role": "assistant", "content": "hello world"},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {"prompt_tokens": 14, "completion_tokens": 2, "total_tokens": 16}
}
Body parameters
BUZZ accepts every field defined in the OpenAI CreateChatCompletionRequest schema. The table below lists all 35 official fields plus top_k, which is widely used by non-OpenAI upstreams.
Core fields
| Field | Type | Description |
|---|---|---|
| model (required) | string | The model that will complete the request. See GET /v1/models. |
| messages (required) | array | Conversation turns. Roles: system / developer / user / assistant / tool. Must be non-empty. |
| max_tokens | integer | Maximum output tokens. Deprecated by OpenAI in favor of max_completion_tokens but still accepted. |
| max_completion_tokens | integer | Modern token cap; also limits reasoning tokens. |
| stream | boolean | If true, returns a Server-Sent Events stream. Default false. |
| stream_options | object | {"include_usage": true} emits final usage on the last chunk. |
| n | integer | Number of completions, 1..128. Default 1. |
| stop | string or array | Up to 4 sequences that halt generation. |
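
The limits in the table above can be checked client-side before a request is sent. The following is a minimal sketch, not part of BUZZ or the OpenAI SDK: validate_core_fields is a hypothetical helper, and it assumes the gateway enforces exactly the limits listed in the table.

```python
from typing import Any


def validate_core_fields(body: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in the core request fields.

    Hypothetical helper: it mirrors the limits documented in the table,
    not the gateway's actual validation code.
    """
    problems: list[str] = []
    model = body.get("model")
    if not isinstance(model, str) or not model:
        problems.append("model is required and must be a non-empty string")
    messages = body.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages is required and must be a non-empty array")
    n = body.get("n", 1)
    if not isinstance(n, int) or not 1 <= n <= 128:
        problems.append("n must be an integer in 1..128")
    stop = body.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")
    return problems
```

An empty list means the body passed these checks; the gateway may still reject it for other reasons.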
Sampling
| Field | Type | Description |
|---|---|---|
| temperature | number | Sampling temperature, 0..2. Default 1. |
| top_p | number | Nucleus sampling, 0..1. Default 1. Use either temperature or top_p, not both. |
| top_k | integer | Non-OpenAI extension; passed through to backends that support it (Claude, Gemini, Qwen). |
| seed | integer | Best-effort determinism. Beta on OpenAI. |
| presence_penalty | number | -2..2. Default 0. |
| frequency_penalty | number | -2..2. Default 0. |
| logit_bias | object | Token id → bias (-100..100). |
| logprobs | boolean | Return per-token log probabilities. Default false. |
| top_logprobs | integer | 0..20. Requires logprobs: true. |
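
The ranges above lend themselves to a quick pre-flight check. This is a sketch under the assumption that the documented ranges are inclusive; check_sampling and SAMPLING_RANGES are hypothetical names, not BUZZ or SDK identifiers.

```python
from typing import Any

# Inclusive ranges taken from the table above (assumed inclusive).
SAMPLING_RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "presence_penalty": (-2.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
}


def check_sampling(body: dict[str, Any]) -> list[str]:
    """Return problems found in the sampling fields of a request body."""
    problems: list[str] = []
    for field, (low, high) in SAMPLING_RANGES.items():
        if field in body and not low <= body[field] <= high:
            problems.append(f"{field} must be in {low}..{high}")
    if "temperature" in body and "top_p" in body:
        problems.append("use either temperature or top_p, not both")
    if "top_logprobs" in body:
        if not body.get("logprobs"):
            problems.append("top_logprobs requires logprobs: true")
        if not 0 <= body["top_logprobs"] <= 20:
            problems.append("top_logprobs must be in 0..20")
    for token_id, bias in body.get("logit_bias", {}).items():
        if not -100 <= bias <= 100:
            problems.append(f"logit_bias[{token_id}] must be in -100..100")
    return problems
```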
Tools and structured output
| Field | Type | Description |
|---|---|---|
| tools | array | Function or custom tool definitions. See Tool Use. |
| tool_choice | string or object | "auto", "required", "none", or {"type":"function","function":{"name":"..."}}. |
| parallel_tool_calls | boolean | Allow multiple tool calls in one turn. |
| response_format | object | {"type":"text"}, {"type":"json_object"}, or {"type":"json_schema","json_schema":{...}}. |
| function_call | string or object | Deprecated. Pre-tools API; still accepted. |
| functions | array | Deprecated. Use tools instead. |
| prediction | object | Predicted output to speed up rewrite-style generation. |
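
To make the tools and tool_choice shapes concrete, here is a sketch of a request body that forces a single function call. The get_weather tool and the build_tool_request helper are hypothetical; only the field layout follows the OpenAI schema described above.

```python
import json


def build_tool_request(model: str, question: str) -> dict:
    """Build a chat completion body that forces one function tool call."""
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "tools": [weather_tool],
        # Name a specific function instead of "auto" to force that call.
        "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
        "parallel_tool_calls": False,
    }


if __name__ == "__main__":
    body = build_tool_request("claude-haiku-4-5-20251001", "Weather in Oslo?")
    print(json.dumps(body, indent=2))
```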
Reasoning and modalities
| Field | Type | Description |
|---|---|---|
| reasoning_effort | string | minimal / low / medium / high. On Claude paths, BUZZ maps low/medium/high to a thinking.budget_tokens of 1280 / 2048 / 4096 respectively. |
| verbosity | string | low / medium / high for GPT-5 family. |
| modalities | array | Output modalities, e.g. ["text"], ["text","audio"]. |
| audio | object | {"voice":"...","format":"..."} for audio outputs. |
| web_search_options | object | Native web-search hint. search_context_size accepts low/medium/high (default medium). On Claude paths it is converted to the web_search_20250305 tool with max_uses of 1 / 5 / 10. |
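
The two Claude-path mappings in this table can be written out as lookup tables. This is an illustrative sketch only: the function names are hypothetical, the "name": "web_search" key is an assumption about the emitted tool object, and the document does not say how minimal is mapped, so the sketch returns None for it.

```python
from typing import Optional

# Values taken from the table above.
REASONING_BUDGET_TOKENS = {"low": 1280, "medium": 2048, "high": 4096}
WEB_SEARCH_MAX_USES = {"low": 1, "medium": 5, "high": 10}


def claude_thinking_budget(reasoning_effort: str) -> Optional[int]:
    """Map reasoning_effort to thinking.budget_tokens; None if unmapped."""
    return REASONING_BUDGET_TOKENS.get(reasoning_effort)


def claude_web_search_tool(web_search_options: dict) -> dict:
    """Convert web_search_options into a web_search_20250305 tool entry."""
    size = web_search_options.get("search_context_size", "medium")
    return {
        "type": "web_search_20250305",
        "name": "web_search",  # assumption: not stated in this document
        "max_uses": WEB_SEARCH_MAX_USES[size],
    }
```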
Caching, accounting and identity
| Field | Type | Description |
|---|---|---|
| prompt_cache_key | string | Routing hint to maximize prompt-cache hit rate. |
| prompt_cache_retention | string | in_memory or 24h. |
| store | boolean | Store the request/response on the upstream. Forwarded by default; channels can opt out via disable_store. |
| metadata | object | Free-form string-to-string map. |
| user | string | Deprecated end-user identifier. |
| safety_identifier | string | Filtered by default. See note below. |
| service_tier | string | Filtered by default. See note below. |
Note: the following fields are filtered by default and are not forwarded upstream: service_tier, safety_identifier, stream_options.include_obfuscation, plus the Claude-specific inference_geo and speed. store is forwarded by default but can be disabled per channel. If a channel or the gateway is in pass-through mode, these filters are skipped.
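
The filtering rule above can be pictured as a small transformation on the request body. The sketch below is an assumption about the behavior, not the gateway's code: apply_default_filters, pass_through and disable_store are hypothetical names, and it assumes pass-through mode also skips the per-channel store opt-out.

```python
from typing import Any

FILTERED_TOP_LEVEL = ("service_tier", "safety_identifier", "inference_geo", "speed")


def apply_default_filters(
    body: dict[str, Any],
    pass_through: bool = False,
    disable_store: bool = False,
) -> dict[str, Any]:
    """Return a copy of body with the default-filtered fields removed."""
    if pass_through:
        # Assumption: pass-through mode skips every filter, store included.
        return dict(body)
    out = {k: v for k, v in body.items() if k not in FILTERED_TOP_LEVEL}
    if disable_store:
        out.pop("store", None)
    stream_options = out.get("stream_options")
    if isinstance(stream_options, dict):
        out["stream_options"] = {
            k: v for k, v in stream_options.items() if k != "include_obfuscation"
        }
    return out
```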
When the model is a Claude model
The same /v1/chat/completions endpoint can target Claude models (e.g. claude-haiku-4-5-20251001, claude-sonnet-4-6, claude-opus-4-7). BUZZ rewrites the request into the Anthropic Messages API and converts the response back to OpenAI shape — but several OpenAI-only fields have no Anthropic equivalent and are silently dropped.
Fields silently dropped on Claude paths
The Claude request builder does not include any of these fields, so values you pass are ignored. Plan accordingly:
| Field | Effect on Claude |
|---|---|
| n | Always single completion. n > 1 is not supported. |
| presence_penalty | Dropped. |
| frequency_penalty | Dropped. |
| logit_bias | Dropped. |
| logprobs / top_logprobs | Dropped. logprobs is always null in the response. |
| seed | Dropped. |
| response_format | Dropped. JSON-mode and JSON-schema enforcement do not pass through. Use prompt-engineered JSON or call a GPT model. |
| function_call / functions | Dropped (use modern tools). |
| prediction / modalities / audio | Dropped. |
| verbosity | Dropped. |
| user / safety_identifier | Dropped. |
| prompt_cache_key | Dropped. |
| metadata | Dropped. |
| service_tier / store | Dropped. |
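
Because these fields are dropped without an error, a client-side check can surface them before the request is sent. A minimal sketch, assuming Claude models can be recognized by a claude- name prefix (an assumption made for illustration; BUZZ resolves the upstream itself). dropped_fields is a hypothetical helper.

```python
from typing import Any

# Field names taken from the table above; n is handled separately.
DROPPED_ON_CLAUDE = frozenset({
    "presence_penalty", "frequency_penalty", "logit_bias", "logprobs",
    "top_logprobs", "seed", "response_format", "function_call", "functions",
    "prediction", "modalities", "audio", "verbosity", "user",
    "safety_identifier", "prompt_cache_key", "metadata", "service_tier", "store",
})


def dropped_fields(body: dict[str, Any]) -> list[str]:
    """List the fields in body that a Claude path would silently ignore."""
    if not str(body.get("model", "")).startswith("claude-"):
        return []
    dropped = [key for key in body if key in DROPPED_ON_CLAUDE]
    if body.get("n", 1) != 1:
        dropped.append("n")  # n > 1 is not supported on Claude paths
    return sorted(dropped)
```

A caller might log the returned names as warnings, or refuse to send a request that relies on response_format.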
Sampling overrides on Claude
When a Claude model runs in thinking mode or is opus-4-7, sampling parameters are overridden by BUZZ before the request leaves the gateway:
- For opus-4-7 with thinking suffix: temperature, top_p, top_k are all cleared, thinking.type=adaptive, output_config.effort=high.
- For other models with -thinking suffix: thinking.type=enabled, budget_tokens = 80% of max_tokens, top_p cleared, temperature forced to 1.0.
- Without thinking, temperature / top_p / top_k pass through normally.
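
The three rules above can be sketched as a single function. Several details here are assumptions rather than documented behavior: thinking mode is detected by a -thinking suffix on the model name, opus-4-7 by a substring match, max_tokens is assumed to be present, and the 80% budget is rounded down. apply_claude_sampling_overrides is a hypothetical name.

```python
from typing import Any


def apply_claude_sampling_overrides(model: str, body: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of body with the thinking-mode overrides applied."""
    out = dict(body)
    if not model.endswith("-thinking"):
        # Without thinking, sampling parameters pass through unchanged.
        return out
    if "opus-4-7" in model:
        for field in ("temperature", "top_p", "top_k"):
            out.pop(field, None)
        out["thinking"] = {"type": "adaptive"}
        out["output_config"] = {"effort": "high"}
    else:
        out.pop("top_p", None)
        out["temperature"] = 1.0
        # 80% of max_tokens, rounded down (the rounding is an assumption).
        out["thinking"] = {
            "type": "enabled",
            "budget_tokens": out["max_tokens"] * 4 // 5,
        }
    return out
```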
Field translations
| OpenAI input | Claude representation |
|---|---|
| messages with role system | Hoisted into Claude top-level system: [{type:"text",...}] |
| tools / tool_choice / parallel_tool_calls | Translated to Anthropic tools with name / description / input_schema |
| image_url / input_audio / file / video_url content parts | Fetched and re-encoded as Claude base64 image / document sources |
| stop (string or array) | stop_sequences array |
| max_tokens / max_completion_tokens | Larger of the two becomes Claude max_tokens |
| web_search_options | web_search_20250305 tool with max_uses 1 / 5 / 10 |
| reasoning_effort | thinking.budget_tokens 1280 / 2048 / 4096 |
| reasoning (OpenRouter style) | Parsed; reasoning.max_tokens overrides the budget |
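
Three of the simpler translations (system hoisting, stop, and the token cap) can be sketched as follows. This illustrates the table rather than the gateway's implementation: to_claude_request is a hypothetical name, system message content is assumed to be a plain string, and tools, media parts and reasoning fields are left out.

```python
from typing import Any


def to_claude_request(body: dict[str, Any]) -> dict[str, Any]:
    """Translate a subset of an OpenAI-style body into Anthropic Messages shape."""
    # System messages are hoisted into a top-level list of text blocks.
    system = [
        {"type": "text", "text": message["content"]}
        for message in body["messages"]
        if message["role"] == "system"
    ]
    out: dict[str, Any] = {
        "model": body["model"],
        "messages": [m for m in body["messages"] if m["role"] != "system"],
    }
    if system:
        out["system"] = system
    stop = body.get("stop")
    if stop is not None:
        # A single string becomes a one-element stop_sequences array.
        out["stop_sequences"] = [stop] if isinstance(stop, str) else list(stop)
    caps = [
        body[key]
        for key in ("max_tokens", "max_completion_tokens")
        if body.get(key) is not None
    ]
    if caps:
        out["max_tokens"] = max(caps)  # the larger of the two wins
    return out
```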
Response
Non-streaming responses follow the standard OpenAI chat.completion shape.
| Field | Type | Description |
|---|---|---|
| id | string | OpenAI: chatcmpl-.... Claude: the upstream msg_... id is reused. |
| object | string | Always "chat.completion". |
| created | integer | Unix timestamp. |
| model | string | Model the upstream actually used; may be a dated alias. |
| choices[].index | integer | 0-based choice index. |
| choices[].message.role | string | Always "assistant". |
| choices[].message.content | string or null | Concatenated text. Null if only tool calls were returned. |
| choices[].message.tool_calls | array | Function tool calls; on Claude paths, built from tool_use blocks. |
| choices[].message.reasoning_content | string | BUZZ extension. Populated for o-series, gpt-5 thinking models, and Claude thinking blocks. |
| choices[].finish_reason | string | stop / length / tool_calls / content_filter / function_call. Claude maps end_turn→stop, max_tokens→length, tool_use→tool_calls, refusal→content_filter. |
| choices[].logprobs | object or null | Forwarded for OpenAI; always null for Claude. |
| system_fingerprint | string | Forwarded only when present upstream. Never set on Claude responses. |
| service_tier | string or null | Forwarded for OpenAI; not returned for Claude. |
| usage | object | Token counters; see below. |
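
The finish_reason mapping and the nullable content field are the two places where response handling most often goes wrong. The sketch below shows one way to handle both. map_finish_reason and message_text are hypothetical helpers, and falling back to "stop" for Claude stop reasons not listed in the table is an assumption.

```python
from typing import Any

# Mapping taken from the finish_reason row above.
CLAUDE_STOP_REASON_TO_FINISH_REASON = {
    "end_turn": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
    "refusal": "content_filter",
}


def map_finish_reason(stop_reason: str) -> str:
    """Map a Claude stop_reason to an OpenAI finish_reason."""
    # Assumption: unlisted stop reasons are treated as a normal stop.
    return CLAUDE_STOP_REASON_TO_FINISH_REASON.get(stop_reason, "stop")


def message_text(response: dict[str, Any]) -> str:
    """Return the first choice's text, or "" when content is null."""
    content = response["choices"][0]["message"].get("content")
    return content or ""
```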
usage object
OpenAI fields are returned verbatim. BUZZ adds extension fields for cache and reasoning visibility.
| Field | Description |
|---|---|
| prompt_tokens | Input tokens (on Claude paths, input_tokens + cached_tokens + cached_creation_tokens combined). |
| completion_tokens | Output tokens. |
| total_tokens | prompt_tokens + completion_tokens. |
| prompt_tokens_details | {cached_tokens, cached_creation_tokens, text_tokens, audio_tokens, image_tokens}. |
| completion_tokens_details | {text_tokens, audio_tokens, image_tokens, reasoning_tokens}. |
| prompt_cache_hit_tokens | BUZZ extension. Convenience scalar for cache hits. |
| input_tokens / output_tokens | BUZZ extension. Anthropic-style aliases for cross-protocol consumers. |
| claude_cache_creation_5m_tokens / _1h_tokens | BUZZ extension. Splits Claude cache-creation tokens into the 5-minute and 1-hour cache tiers. |
| usage_semantic / usage_source | BUZZ extension. usage_source = "anthropic" indicates the call ran through Claude. |
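
As an example of consuming these fields, the sketch below derives a cache hit ratio and checks whether a call ran through Claude. Both helper names are hypothetical, and the ratio definition (cached_tokens divided by prompt_tokens) is one reasonable choice rather than a BUZZ-defined metric.

```python
from typing import Any


def cache_hit_ratio(usage: dict[str, Any]) -> float:
    """Share of prompt tokens served from cache, 0.0 when unknown."""
    prompt_tokens = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details") or {}
    cached_tokens = details.get("cached_tokens", 0)
    return cached_tokens / prompt_tokens if prompt_tokens else 0.0


def ran_through_claude(usage: dict[str, Any]) -> bool:
    """True when usage_source marks the call as handled by Claude."""
    return usage.get("usage_source") == "anthropic"
```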
Streaming
Set "stream": true to receive a Server-Sent Events stream. Each event is a single data: {...} line, and the stream ends with data: [DONE].
Stream chunk shape
{
"id": "chatcmpl-...",
"object": "chat.completion.chunk",
"created": 1748246400,
"model": "claude-haiku-4-5-20251001",
"choices": [
{
"index": 0,
"delta": {"role": "assistant", "content": "Hello"},
"logprobs": null,
"finish_reason": null
}
]
}
Set stream_options.include_usage = true to receive a final chunk where choices is empty and usage is populated.
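
A consumer of this stream needs to strip the data: prefix, stop at [DONE], concatenate delta.content, and pick up usage from the final chunk when it is requested. The sketch below works on an iterable of already-decoded lines so it stays independent of any HTTP client; collect_stream is a hypothetical helper.

```python
import json
from typing import Iterable, Optional


def collect_stream(lines: Iterable[str]) -> tuple[str, Optional[dict]]:
    """Accumulate text and final usage from decoded SSE lines."""
    text_parts: list[str] = []
    usage: Optional[dict] = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            # With include_usage, the last chunk has empty choices and usage set.
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                text_parts.append(piece)
    return "".join(text_parts), usage
```

With an HTTP client that exposes a line iterator over the response body, those lines can be passed in directly after decoding to str.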
Claude-stream translations
When the model is a Claude model, BUZZ converts Anthropic's named stream events (event: lines) into OpenAI chunks:
| Claude event | OpenAI chunk delta |
|---|---|
| message_start | First chunk with delta.role="assistant", delta.content="" |
| content_block_start (text) | Initial text chunk |
| content_block_start (tool_use) | delta.tool_calls[].id and function.name |
| content_block_delta (text_delta) | delta.content increment |
| content_block_delta (input_json_delta) | delta.tool_calls[].function.arguments increment |
| content_block_delta (thinking_delta) | delta.reasoning_content increment |
| content_block_delta (signature_delta) | Newline placeholder appended to reasoning_content (signed segment is not exposed) |
| message_delta | finish_reason set via OpenAI mapping |
| message_stop | Skipped; data: [DONE] is appended by the gateway |
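
The delta-producing rows of this table can be sketched as a pure function over already-parsed Anthropic events. This is an illustration, not the gateway's converter: claude_event_to_delta is a hypothetical name, reusing the Claude block index as the tool call index is an assumption, and message_delta is omitted because finish_reason sits on the choice rather than in delta.

```python
from typing import Any, Optional


def claude_event_to_delta(event: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the OpenAI-style delta for one Claude event, or None to skip it."""
    kind = event.get("type")
    if kind == "message_start":
        return {"role": "assistant", "content": ""}
    if kind == "content_block_start":
        block = event["content_block"]
        if block["type"] == "text":
            return {"content": block.get("text", "")}
        if block["type"] == "tool_use":
            return {"tool_calls": [{
                "index": event["index"],  # assumption: block index reused
                "id": block["id"],
                "type": "function",
                "function": {"name": block["name"], "arguments": ""},
            }]}
        return None
    if kind == "content_block_delta":
        delta = event["delta"]
        if delta["type"] == "text_delta":
            return {"content": delta["text"]}
        if delta["type"] == "input_json_delta":
            return {"tool_calls": [{
                "index": event["index"],
                "function": {"arguments": delta["partial_json"]},
            }]}
        if delta["type"] == "thinking_delta":
            return {"reasoning_content": delta["thinking"]}
        if delta["type"] == "signature_delta":
            # The signed segment is not exposed; only a newline placeholder is.
            return {"reasoning_content": "\n"}
    return None  # message_stop and other events produce no delta
```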
Identifying which upstream answered
Because both OpenAI and Claude are reachable through this endpoint, you can fingerprint which family handled a given response without parsing the model name:
| Signal | OpenAI / GPT | Claude |
|---|---|---|
| id prefix | chatcmpl-... | msg_... |
| system_fingerprint | Present when upstream returns it | Never present |
| choices[].logprobs | Object when requested | Always null |
| usage.usage_source | Empty | "anthropic" |
| service_tier | May be present | Not returned |
| choices[].message.reasoning_content | Only on o-series / gpt-5 thinking | Always present in thinking mode |
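
Two of these signals are enough for a simple classifier. The sketch below checks usage.usage_source first and falls back to the id prefix. upstream_family is a hypothetical helper, and responses from other backends (Gemini, Qwen, DeepSeek) are reported as "unknown" because this table does not describe them.

```python
from typing import Any


def upstream_family(response: dict[str, Any]) -> str:
    """Classify a non-streaming response as "claude", "openai" or "unknown"."""
    usage = response.get("usage") or {}
    if usage.get("usage_source") == "anthropic":
        return "claude"
    response_id = response.get("id", "")
    if response_id.startswith("msg_"):
        return "claude"
    if response_id.startswith("chatcmpl-"):
        return "openai"
    return "unknown"
```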
Errors
OpenAI-format error envelope:
{
"error": {
"code": "",
"message": "Invalid token (request id: 202605260708...)",
"type": "buzz_error"
}
}
| HTTP | error.type | Typical cause |
|---|---|---|
| 400 | invalid_request_error | Malformed JSON, missing model, empty messages, max_tokens too large |
| 401 | buzz_error / authentication_error | Missing or unrecognized API key |
| 403 | permission_error | Key lacks permission for the requested model or group |
| 413 | request_too_large | Body exceeds the gateway limit |
| 429 | rate_limit_error | Per-key, per-model, or per-channel rate limit hit; respect retry-after |
| 500 | api_error | Internal error; retry with backoff |
| 503 | buzz_error / model_not_found | BUZZ-specific: no available channel for the model + group |
| 529 | overloaded_error | Upstream overloaded; retry with longer backoff |
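
The retry guidance in this table can be turned into a small delay policy. The sketch below makes several assumptions that the table does not state: retry-after is given in seconds, 503 is not retried because it signals that no channel is available, backoff doubles per attempt, and the "longer backoff" for 529 is modeled as twice the normal delay. retry_delay is a hypothetical helper.

```python
from typing import Optional

RETRYABLE_STATUSES = frozenset({429, 500, 529})


def retry_delay(
    status: int,
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 60.0,
) -> Optional[float]:
    """Seconds to wait before retrying, or None if the error is not retryable.

    attempt is 0 for the first retry.
    """
    if status not in RETRYABLE_STATUSES:
        return None
    if status == 429 and retry_after is not None:
        return float(retry_after)  # assumption: header value is in seconds
    delay = base * 2 ** attempt
    if status == 529:
        delay *= 2  # assumption: "longer backoff" modeled as double the delay
    return min(delay, cap)
```

In production code, adding random jitter to the returned delay is usually advisable so that many clients do not retry in lockstep.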
See also
- POST /v1/messages: native Anthropic format
- GET /v1/models: discover available models
- Tool Use concept
- Prompt Caching concept
- Streaming guide