BUZZ AI Gateway

Chat Completions

Create a chat completion using the OpenAI-compatible schema. The same endpoint can target OpenAI, Claude, Gemini, Qwen, DeepSeek and other backends — BUZZ resolves the upstream from the model field and converts protocols when needed.

POST https://buzzai.cc/v1/chat/completions
Drop-in OpenAI compatibility. Code that already speaks https://api.openai.com/v1/chat/completions only needs the base URL and API key changed. All 35 official OpenAI request fields are accepted, including tools, response_format, web_search_options, reasoning_effort, stream_options and verbosity. POST /v1/completions is also routed through the same OpenAI format.

Authentication

BUZZ accepts the standard OpenAI bearer-token form:

| Header | Notes |
| --- | --- |
| Authorization: Bearer <KEY> | Recommended. Standard OpenAI SDK convention. |
| Authorization: Bearer sk-<KEY> | The sk- prefix is automatically stripped server-side. |

The Anthropic x-api-key header is honored on /v1/messages and /v1/models; for /v1/chat/completions use Authorization: Bearer.

Authorization: Bearer <YOUR_BUZZ_KEY>
Content-Type: application/json

Example request

cURL

curl -X POST https://buzzai.cc/v1/chat/completions \
  -H "Authorization: Bearer $BUZZ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-haiku-4-5-20251001",
    "messages": [
      {"role": "user", "content": "reply with exactly: hello world"}
    ],
    "max_tokens": 32
  }'

Python

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://buzzai.cc/v1",
    api_key=os.environ["BUZZ_API_KEY"],  # same variable the curl example reads
)

resp = client.chat.completions.create(
    model="claude-haiku-4-5-20251001",
    messages=[
        {"role": "user", "content": "reply with exactly: hello world"}
    ],
    max_tokens=32,
)

print(resp.choices[0].message.content)

JavaScript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://buzzai.cc/v1",
  apiKey: process.env.BUZZ_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "claude-haiku-4-5-20251001",
  messages: [
    { role: "user", content: "reply with exactly: hello world" },
  ],
  max_tokens: 32,
});

console.log(resp.choices[0].message.content);

Response

{
  "id": "chatcmpl-AbCdEf...",
  "object": "chat.completion",
  "created": 1748246400,
  "model": "claude-haiku-4-5-20251001",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "hello world"},
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 14, "completion_tokens": 2, "total_tokens": 16}
}

Body parameters

BUZZ accepts every field defined in the OpenAI CreateChatCompletionRequest schema. The tables below list all 35 official fields plus top_k, which is widely used by non-OpenAI upstreams.

Core fields

| Field | Type | Description |
| --- | --- | --- |
| model (required) | string | The model that will complete the request. See GET /v1/models. |
| messages (required) | array | Conversation turns. Roles: system / developer / user / assistant / tool. Required, non-empty. |
| max_tokens | integer | Maximum output tokens. Deprecated by OpenAI in favor of max_completion_tokens but still accepted. |
| max_completion_tokens | integer | Modern token cap; also limits reasoning tokens. |
| stream | boolean | If true, returns a Server-Sent Events stream. Default false. |
| stream_options | object | {"include_usage": true} emits final usage on the last chunk. |
| n | integer | Number of completions, 1..128. Default 1. |
| stop | string or array | Up to 4 sequences that halt generation. |

Sampling

| Field | Type | Description |
| --- | --- | --- |
| temperature | number | Sampling temperature, 0..2. Default 1. |
| top_p | number | Nucleus sampling, 0..1. Default 1. Use either temperature or top_p, not both. |
| top_k | integer | Non-OpenAI extension; passed through to backends that support it (Claude, Gemini, Qwen). |
| seed | integer | Best-effort determinism. Beta on OpenAI. |
| presence_penalty | number | -2..2. Default 0. |
| frequency_penalty | number | -2..2. Default 0. |
| logit_bias | object | Token id → bias (-100..100). |
| logprobs | boolean | Return per-token log probabilities. Default false. |
| top_logprobs | integer | 0..20. Requires logprobs: true. |

Tools and structured output

| Field | Type | Description |
| --- | --- | --- |
| tools | array | Function or custom tool definitions. See Tool Use. |
| tool_choice | string or object | "auto", "required", "none", or {"type":"function","function":{"name":"..."}}. |
| parallel_tool_calls | boolean | Allow multiple tool calls in one turn. |
| response_format | object | {"type":"text"}, {"type":"json_object"}, or {"type":"json_schema","json_schema":{...}}. |
| function_call | string or object | Deprecated. Pre-tools API; still accepted. |
| functions | array | Deprecated. Use tools instead. |
| prediction | object | Predicted output to speed up rewrite-style generation. |

Reasoning and modalities

| Field | Type | Description |
| --- | --- | --- |
| reasoning_effort | string | minimal / low / medium / high. On Claude paths, BUZZ maps low/medium/high to a thinking.budget_tokens of 1280 / 2048 / 4096 respectively. |
| verbosity | string | low / medium / high for GPT-5 family. |
| modalities | array | Output modalities, e.g. ["text"], ["text","audio"]. |
| audio | object | {"voice":"...","format":"..."} for audio outputs. |
| web_search_options | object | Native web-search hint. search_context_size accepts low/medium/high (default medium). On Claude paths it is converted to the web_search_20250305 tool with max_uses of 1 / 5 / 10. |
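
The two Claude-path mappings in the table above can be written out as lookups. This is a minimal sketch of the documented numbers only, not BUZZ's actual code; the dictionary and function names are illustrative, and because the page gives no mapping for "minimal", the sketch returns None for it.

```python
from typing import Optional

# Values copied from the table above; names are illustrative, not BUZZ internals.
EFFORT_TO_BUDGET = {"low": 1280, "medium": 2048, "high": 4096}
CONTEXT_SIZE_TO_MAX_USES = {"low": 1, "medium": 5, "high": 10}


def claude_thinking_budget(reasoning_effort: Optional[str]) -> Optional[int]:
    """Return the documented thinking.budget_tokens, or None if undocumented."""
    if reasoning_effort is None:
        return None
    # "minimal" has no documented mapping, so it falls through to None.
    return EFFORT_TO_BUDGET.get(reasoning_effort)


def claude_web_search_max_uses(web_search_options: Optional[dict]) -> Optional[int]:
    """Return the documented max_uses for the web_search_20250305 tool."""
    if web_search_options is None:
        return None
    # "medium" is the documented default for search_context_size.
    size = web_search_options.get("search_context_size", "medium")
    if size not in CONTEXT_SIZE_TO_MAX_USES:
        raise ValueError(f"unsupported search_context_size: {size!r}")
    return CONTEXT_SIZE_TO_MAX_USES[size]
```

A client can use helpers like these to estimate the thinking budget or search allowance a request will receive before sending it.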

Caching, accounting and identity

| Field | Type | Description |
| --- | --- | --- |
| prompt_cache_key | string | Routing hint to maximize prompt-cache hit rate. |
| prompt_cache_retention | string | in_memory or 24h. |
| store | boolean | Store the request/response on the upstream. Forwarded by default; channels can opt out via disable_store. |
| metadata | object | Free-form string-to-string map. |
| user | string | Deprecated end-user identifier. |
| safety_identifier | string | Filtered by default. See note below. |
| service_tier | string | Filtered by default. See note below. |

Channel-gated fields. Six fields are subject to per-channel gating. Five are stripped from the upstream payload unless the channel has the matching allow-flag enabled: service_tier, safety_identifier, stream_options.include_obfuscation, and the Claude-specific inference_geo and speed. The sixth, store, is forwarded by default but can be disabled per channel via disable_store. If a channel or the gateway is in pass-through mode, these filters are skipped.
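
The gating rule in the note above can be sketched as a small filter. This is an illustration, not the gateway's implementation: only disable_store is named on this page, so every allow_* flag name below is hypothetical, as is the shape of the channel settings dictionary.

```python
import copy

# Hypothetical flag names: the page only says each field has a "matching allow-flag".
GATED_TOP_LEVEL = {
    "service_tier": "allow_service_tier",
    "safety_identifier": "allow_safety_identifier",
    "inference_geo": "allow_inference_geo",
    "speed": "allow_speed",
}


def apply_channel_gates(payload: dict, channel: dict, pass_through: bool = False) -> dict:
    """Return a copy of payload with channel-gated fields removed."""
    out = copy.deepcopy(payload)
    if pass_through:
        # Pass-through mode skips every filter.
        return out
    for field, flag in GATED_TOP_LEVEL.items():
        if not channel.get(flag, False):
            out.pop(field, None)
    # include_obfuscation is nested under stream_options, so it is gated separately.
    stream_options = out.get("stream_options")
    if isinstance(stream_options, dict) and not channel.get("allow_include_obfuscation", False):
        stream_options.pop("include_obfuscation", None)
    # store is the inverse case: forwarded unless the channel disables it.
    if channel.get("disable_store", False):
        out.pop("store", None)
    return out
```

The sketch copies the payload rather than mutating it, so the caller's original request stays intact for logging or retries on another channel.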

When the model is a Claude model

The same /v1/chat/completions endpoint can target Claude models (e.g. claude-haiku-4-5-20251001, claude-sonnet-4-6, claude-opus-4-7). BUZZ rewrites the request into the Anthropic Messages API and converts the response back to OpenAI shape — but several OpenAI-only fields have no Anthropic equivalent and are silently dropped.

Fields silently dropped on Claude paths

The Claude request builder does not include any of these fields, so values you pass are ignored. Plan accordingly:

| Field | Effect on Claude |
| --- | --- |
| n | Always single completion. n > 1 is not supported. |
| presence_penalty | Dropped. |
| frequency_penalty | Dropped. |
| logit_bias | Dropped. |
| logprobs / top_logprobs | Dropped. logprobs is always null in the response. |
| seed | Dropped. |
| response_format | Dropped. JSON-mode and JSON-schema enforcement do not pass through. Use prompt-engineered JSON or call a GPT model. |
| function_call / functions | Dropped (use modern tools). |
| prediction / modalities / audio | Dropped. |
| verbosity | Dropped. |
| user / safety_identifier | Dropped. |
| prompt_cache_key | Dropped. |
| metadata | Dropped. |
| service_tier / store | Dropped. |
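
Because these fields are dropped silently, a client-side check can surface them before a request is sent. The helper below is a sketch built from the table above; the function name is illustrative, and treating any model id that starts with "claude-" as a Claude path is an assumption based on the example model names on this page.

```python
from typing import List

# Field names taken from the table above.
CLAUDE_DROPPED_FIELDS = frozenset({
    "n", "presence_penalty", "frequency_penalty", "logit_bias",
    "logprobs", "top_logprobs", "seed", "response_format",
    "function_call", "functions", "prediction", "modalities", "audio",
    "verbosity", "user", "safety_identifier", "prompt_cache_key",
    "metadata", "service_tier", "store",
})


def fields_ignored_on_claude(request: dict) -> List[str]:
    """List request fields that the page says are dropped on Claude paths."""
    model = request.get("model", "")
    # Assumption: Claude model ids start with "claude-".
    if not model.startswith("claude-"):
        return []
    return sorted(name for name in request if name in CLAUDE_DROPPED_FIELDS)
```

For example, a request that sets seed and response_format on claude-haiku-4-5-20251001 yields ["response_format", "seed"], which a caller could log as a warning.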

Sampling overrides on Claude

When a Claude model runs in thinking mode or is opus-4-7, sampling parameters are overridden by BUZZ before the request leaves the gateway:

Field translations

| OpenAI input | Claude representation |
| --- | --- |
| messages with role system | Hoisted into Claude top-level system: [{type:"text",...}] |
| tools / tool_choice / parallel_tool_calls | Translated to Anthropic tools with name / description / input_schema |
| image_url / input_audio / file / video_url content parts | Fetched and re-encoded as Claude base64 image / document sources |
| stop (string or array) | stop_sequences array |
| max_tokens / max_completion_tokens | Larger of the two becomes Claude max_tokens |
| web_search_options | web_search_20250305 tool with max_uses 1 / 5 / 10 |
| reasoning_effort | thinking.budget_tokens 1280 / 2048 / 4096 |
| reasoning (OpenRouter style) | Parsed; reasoning.max_tokens overrides the budget |
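
Three of the simpler rows above (system hoisting, stop, and the token cap) can be sketched as follows. This is not BUZZ's request builder: the function name is illustrative, only string system content is handled, and tools, media parts, and reasoning are left out entirely.

```python
def to_claude_request(req: dict) -> dict:
    """Apply the system, stop, and max_tokens translations from the table above."""
    system_blocks = []
    messages = []
    for msg in req["messages"]:
        # Simplification: only plain-string system content is hoisted here.
        if msg["role"] == "system" and isinstance(msg.get("content"), str):
            system_blocks.append({"type": "text", "text": msg["content"]})
        else:
            messages.append(msg)

    out = {"model": req["model"], "messages": messages}
    if system_blocks:
        out["system"] = system_blocks

    # stop may be a single string or a list; Claude takes a stop_sequences array.
    stop = req.get("stop")
    if stop is not None:
        out["stop_sequences"] = [stop] if isinstance(stop, str) else list(stop)

    # The larger of max_tokens and max_completion_tokens becomes Claude max_tokens.
    caps = [req[key] for key in ("max_tokens", "max_completion_tokens")
            if req.get(key) is not None]
    if caps:
        out["max_tokens"] = max(caps)
    return out
```

Running it on a request with a system message, "stop": "END", max_tokens 32 and max_completion_tokens 64 produces a top-level system list, stop_sequences ["END"], and max_tokens 64.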

Response

Non-streaming responses follow the standard OpenAI chat.completion shape.

| Field | Type | Description |
| --- | --- | --- |
| id | string | OpenAI: chatcmpl-.... Claude: the upstream msg_... id is reused. |
| object | string | Always "chat.completion". |
| created | integer | Unix timestamp. |
| model | string | Model the upstream actually used; may be a dated alias. |
| choices[].index | integer | 0-based choice index. |
| choices[].message.role | string | Always "assistant". |
| choices[].message.content | string or null | Concatenated text. Null if only tool calls were returned. |
| choices[].message.tool_calls | array | Function tool calls; on Claude paths, built from tool_use blocks. |
| choices[].message.reasoning_content | string | BUZZ extension. Populated for o-series, gpt-5 thinking models, and Claude thinking blocks. |
| choices[].finish_reason | string | stop / length / tool_calls / content_filter / function_call. Claude maps end_turn→stop, max_tokens→length, tool_use→tool_calls, refusal→content_filter. |
| choices[].logprobs | object or null | Forwarded for OpenAI; always null for Claude. |
| system_fingerprint | string | Forwarded only when present upstream. Never set on Claude responses. |
| service_tier | string or null | Forwarded for OpenAI; not returned for Claude. |
| usage | object | Token counters; see below. |
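
Since content can be null and reasoning_content is optional, response handling benefits from a null-safe reader. The helper below is a sketch that works on the parsed JSON body as a plain dictionary; its name and return shape are illustrative.

```python
import json


def read_first_choice(response: dict) -> dict:
    """Extract text, reasoning, and tool calls from the first choice."""
    choice = response["choices"][0]
    message = choice["message"]

    calls = []
    # tool_calls may be missing or null when the model answered with text only.
    for call in message.get("tool_calls") or []:
        function = call["function"]
        # In the OpenAI shape, arguments is a JSON-encoded string.
        calls.append((function["name"], json.loads(function["arguments"] or "{}")))

    return {
        # content is null when only tool calls were returned.
        "text": message.get("content") or "",
        # reasoning_content is a BUZZ extension and may be absent.
        "reasoning": message.get("reasoning_content"),
        "tool_calls": calls,
        "finish_reason": choice.get("finish_reason"),
    }
```

With the OpenAI Python SDK, a dictionary of this shape can be obtained from resp.model_dump().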

usage object

OpenAI fields are returned verbatim. BUZZ adds extension fields for cache and reasoning visibility.

| Field | Description |
| --- | --- |
| prompt_tokens | Input tokens (on Claude paths, the sum of input_tokens, cached_tokens, and cached_creation_tokens). |
| completion_tokens | Output tokens. |
| total_tokens | prompt_tokens + completion_tokens. |
| prompt_tokens_details | {cached_tokens, cached_creation_tokens, text_tokens, audio_tokens, image_tokens}. |
| completion_tokens_details | {text_tokens, audio_tokens, image_tokens, reasoning_tokens}. |
| prompt_cache_hit_tokens | BUZZ extension. Convenience scalar for cache hits. |
| input_tokens / output_tokens | BUZZ extension. Anthropic-style aliases for cross-protocol consumers. |
| claude_cache_creation_5m_tokens / claude_cache_creation_1h_tokens | BUZZ extension. Split Claude cache-creation tokens by cache lifetime (5-minute and 1-hour). |
| usage_semantic / usage_source | BUZZ extension. usage_source = "anthropic" indicates the call ran through Claude. |
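
The prompt_tokens arithmetic on Claude paths, and one way to derive a cache hit ratio from it, can be sketched as below. Both helpers are illustrative; the ratio is not a field BUZZ returns, just a derived number a client might compute.

```python
def claude_prompt_tokens(input_tokens: int, cached_tokens: int,
                         cached_creation_tokens: int) -> int:
    """Combine the three Claude counters into prompt_tokens, as described above."""
    return input_tokens + cached_tokens + cached_creation_tokens


def cache_hit_ratio(usage: dict) -> float:
    """Share of prompt tokens served from cache (a derived, client-side number)."""
    prompt = usage.get("prompt_tokens", 0)
    if prompt == 0:
        return 0.0
    details = usage.get("prompt_tokens_details") or {}
    return details.get("cached_tokens", 0) / prompt
```

For instance, 10 uncached input tokens plus 90 cached tokens give prompt_tokens of 100 and a hit ratio of 0.9.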

Streaming

Set "stream": true to receive a Server-Sent Events stream. Each event is a single data: {...} line, and the stream ends with data: [DONE].

Stream chunk shape

{
  "id": "chatcmpl-...",
  "object": "chat.completion.chunk",
  "created": 1748246400,
  "model": "claude-haiku-4-5-20251001",
  "choices": [
    {
      "index": 0,
      "delta": {"role": "assistant", "content": "Hello"},
      "logprobs": null,
      "finish_reason": null
    }
  ]
}

Set stream_options.include_usage = true to receive a final chunk where choices is empty and usage is populated.
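
A client consuming the stream has to handle three cases: content chunks, the optional usage-only chunk, and the [DONE] sentinel. The function below is a minimal sketch that works on any iterable of already-decoded text lines (for example, lines read from an HTTP response body); the function name is illustrative.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Accumulate delta.content from SSE lines and capture the final usage."""
    parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and any non-data SSE fields.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        # With include_usage, the last chunk has empty choices and a usage object.
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices") or []:
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts), usage
```

The sketch only gathers text; tool-call and reasoning_content deltas would need their own accumulators.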

Claude-stream translations

When the model is a Claude model, BUZZ converts the Anthropic event stream into OpenAI chunks:

| Claude event | OpenAI chunk delta |
| --- | --- |
| message_start | First chunk with delta.role="assistant", delta.content="" |
| content_block_start (text) | Initial text chunk |
| content_block_start (tool_use) | delta.tool_calls[].id and function.name |
| content_block_delta (text_delta) | delta.content increment |
| content_block_delta (input_json_delta) | delta.tool_calls[].function.arguments increment |
| content_block_delta (thinking_delta) | delta.reasoning_content increment |
| content_block_delta (signature_delta) | Newline placeholder appended to reasoning_content (signed segment is not exposed) |
| message_delta | finish_reason set via OpenAI mapping |
| message_stop | Skipped; data: [DONE] is appended by the gateway |
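
The table above can be read as a per-event translation function. The sketch below follows that table row by row using the Anthropic event field names (content_block, partial_json, thinking, stop_reason); it is an illustration, not the gateway's converter. It omits tool-call indexes and usage handling, and returning None for unlisted events is an assumption.

```python
from typing import Optional

# finish_reason mapping as documented in the Response section of this page.
STOP_REASON_TO_FINISH = {
    "end_turn": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
    "refusal": "content_filter",
}


def claude_event_to_choice(event: dict) -> Optional[dict]:
    """Translate one parsed Claude stream event into an OpenAI-style choice."""
    etype = event.get("type")
    delta: dict = {}
    finish_reason = None

    if etype == "message_start":
        delta = {"role": "assistant", "content": ""}
    elif etype == "content_block_start":
        block = event.get("content_block", {})
        if block.get("type") == "tool_use":
            delta = {"tool_calls": [{
                "id": block["id"],
                "type": "function",
                "function": {"name": block["name"], "arguments": ""},
            }]}
        elif block.get("type") == "text":
            delta = {"content": block.get("text", "")}
        else:
            return None  # assumption: other block types emit no chunk here
    elif etype == "content_block_delta":
        inner = event.get("delta", {})
        kind = inner.get("type")
        if kind == "text_delta":
            delta = {"content": inner["text"]}
        elif kind == "input_json_delta":
            delta = {"tool_calls": [{"function": {"arguments": inner["partial_json"]}}]}
        elif kind == "thinking_delta":
            delta = {"reasoning_content": inner["thinking"]}
        elif kind == "signature_delta":
            # The signature itself is not exposed; a newline stands in for it.
            delta = {"reasoning_content": "\n"}
        else:
            return None
    elif etype == "message_delta":
        stop_reason = event.get("delta", {}).get("stop_reason")
        finish_reason = STOP_REASON_TO_FINISH.get(stop_reason)
    else:
        return None  # message_stop and anything unlisted produce no chunk

    return {"index": 0, "delta": delta, "logprobs": None, "finish_reason": finish_reason}
```

Each non-None result would sit inside the choices array of a chat.completion.chunk object like the one shown under Stream chunk shape.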

Identifying which upstream answered

Because both OpenAI and Claude are reachable through this endpoint, you can fingerprint which family handled a given response without parsing the model name:

| Signal | OpenAI / GPT | Claude |
| --- | --- | --- |
| id prefix | chatcmpl-... | msg_... |
| system_fingerprint | Present when upstream returns it | Never present |
| choices[].logprobs | Object when requested | Always null |
| usage.usage_source | Empty | "anthropic" |
| service_tier | May be present | Not returned |
| choices[].message.reasoning_content | Only on o-series / gpt-5 thinking | Always present in thinking mode |
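
The two strongest signals in the table, usage.usage_source and the id prefix, are enough for a simple classifier. The function below is a sketch with illustrative names and return labels; it treats the table as authoritative and returns "unknown" when neither signal matches, which may happen for other backends routed through this endpoint.

```python
def guess_upstream(response: dict) -> str:
    """Classify a parsed response as "claude", "openai", or "unknown"."""
    usage = response.get("usage") or {}
    # usage_source = "anthropic" is documented as the Claude marker.
    if usage.get("usage_source") == "anthropic":
        return "claude"
    response_id = response.get("id") or ""
    if response_id.startswith("msg_"):
        return "claude"
    if response_id.startswith("chatcmpl-"):
        return "openai"
    return "unknown"
```

Checking usage_source first keeps the result stable even if an id is rewritten somewhere along the path.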

Errors

OpenAI-format error envelope:

{
  "error": {
    "code": "",
    "message": "Invalid token (request id: 202605260708...)",
    "type": "buzz_error"
  }
}

| HTTP | error.type | Typical cause |
| --- | --- | --- |
| 400 | invalid_request_error | Malformed JSON, missing model, empty messages, max_tokens too large |
| 401 | buzz_error / authentication_error | Missing or unrecognized API key |
| 403 | permission_error | Key lacks permission for the requested model or group |
| 413 | request_too_large | Body exceeds the gateway limit |
| 429 | rate_limit_error | Per-key, per-model, or per-channel rate limit hit; respect retry-after |
| 500 | api_error | Internal error; retry with backoff |
| 503 | buzz_error / model_not_found | BUZZ-specific: no available channel for the model + group |
| 529 | overloaded_error | Upstream overloaded; retry with longer backoff |
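
The retry guidance in the table (respect retry-after on 429, back off on 500, back off longer on 529) can be turned into a delay calculator. This is a sketch under stated assumptions: the base delays, the 60-second cap, and the choice not to retry 503 are illustrative and not specified by this page, and only the numeric form of retry-after is parsed.

```python
from typing import Optional

# Statuses the table explicitly marks as retryable.
RETRYABLE_STATUSES = {429, 500, 529}


def retry_delay(status: int, attempt: int,
                retry_after: Optional[str] = None) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the call should not be retried.

    attempt is 0 for the first retry, 1 for the second, and so on.
    """
    if status not in RETRYABLE_STATUSES:
        return None
    if status == 429 and retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch; fall back to backoff
    # Assumed base delays: 1 s normally, 2 s for 529 ("longer backoff").
    base = 2.0 if status == 529 else 1.0
    return min(base * (2 ** attempt), 60.0)
```

A caller would sleep for the returned number of seconds and give up after a fixed number of attempts.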
