Use the OpenAI SDK with BUZZ
One SDK, one base URL, every supported model family. Point the official OpenAI client at https://buzzai.cc/v1 and call Claude, Gemini, Grok, or GPT by name. Streaming, tool use, retry policies, and observability wrappers keep working unchanged.
BUZZ exposes an OpenAI-compatible chat.completions endpoint at /v1. The gateway parses your request as chat.completions, dispatches it to the right upstream based on the model field, and translates the response back. For OpenAI models the bytes flow through nearly untouched; for Claude / Gemini / Grok the gateway bridges the request and response shapes on your behalf.
1. Install & configure
Install the official client. There is no BUZZ-specific package.
pip install openai    # Python
npm install openai    # Node.js
Construct the client with two strings: base_url and api_key.
# Python
from openai import OpenAI

client = OpenAI(
    api_key="sk-YOUR_BUZZ_KEY",
    base_url="https://buzzai.cc/v1",
)

// Node.js
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.BUZZ_API_KEY,
  baseURL: "https://buzzai.cc/v1",
});

Both SDKs send the key as Authorization: Bearer <value>, which BUZZ accepts directly. The sk- prefix is optional; BUZZ strips it server-side.
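The Python example above hard-codes the key, while the Node.js one reads it from the environment. A minimal sketch of the environment-based pattern in Python follows. The helper name buzz_client_kwargs is hypothetical, and the BUZZ_API_KEY variable name is simply borrowed from the Node.js example.

```python
import os

BUZZ_BASE_URL = "https://buzzai.cc/v1"


def buzz_client_kwargs(env=None):
    """Return the two constructor arguments the OpenAI client needs for BUZZ.

    Hypothetical helper: reads the key from BUZZ_API_KEY (the variable name
    used in the Node.js example) and fails early if it is missing.
    """
    env = os.environ if env is None else env
    key = env.get("BUZZ_API_KEY", "").strip()
    if not key:
        raise RuntimeError("BUZZ_API_KEY is not set")
    return {"api_key": key, "base_url": BUZZ_BASE_URL}


# Usage (requires `pip install openai`):
#   from openai import OpenAI
#   client = OpenAI(**buzz_client_kwargs())
```

Failing at construction time keeps a missing key from surfacing later as a 401 on the first request.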
2. Call Claude
Set model to a Claude identifier and call chat.completions.create. The gateway converts the request to the Anthropic Messages schema, calls upstream, and converts the response back. Response shape is the same OpenAI ChatCompletion your code already parses.
# Python
resp = client.chat.completions.create(
    model="claude-haiku-4-5-20251001",
    messages=[
        {"role": "system", "content": "You are a precise technical writer."},
        {"role": "user", "content": "Explain content-addressable storage in three sentences."},
    ],
    temperature=0.3,
    max_tokens=400,
)
print(resp.choices[0].message.content)
print(resp.usage)

// Node.js
const resp = await client.chat.completions.create({
  model: "claude-haiku-4-5-20251001",
  messages: [
    { role: "system", content: "You are a precise technical writer." },
    { role: "user", content: "Explain content-addressable storage in three sentences." },
  ],
  temperature: 0.3,
  max_tokens: 400,
});
console.log(resp.choices[0].message.content);

# curl
curl -sS https://buzzai.cc/v1/chat/completions \
  -H "Authorization: Bearer $BUZZ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-haiku-4-5-20251001",
    "messages": [
      {"role": "system", "content": "You are a precise technical writer."},
      {"role": "user", "content": "Explain content-addressable storage in three sentences."}
    ],
    "max_tokens": 400
  }'

Common Claude model identifiers (live list at GET /v1/models):
- claude-opus-4-7 · claude-opus-4-6 · claude-opus-4-5-20251101
- claude-sonnet-4-6 · claude-sonnet-4-5-20250929
- claude-haiku-4-5-20251001
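Because the identifier list changes over time, it may be safer to query GET /v1/models than to hard-code names. The sketch below uses the SDK's client.models.list() call; the helper name list_model_ids and the prefix filter are illustrative, and it assumes the gateway returns the standard OpenAI model-list shape.

```python
def list_model_ids(client, prefix=""):
    """Return the sorted model ids visible to this key, optionally filtered by prefix.

    `client` is an OpenAI client pointed at the BUZZ base URL; models.list()
    issues GET /v1/models and yields objects with an `id` attribute.
    """
    return sorted(m.id for m in client.models.list() if m.id.startswith(prefix))


# Usage, assuming `client` was built as in section 1:
#   print(list_model_ids(client, prefix="claude-"))
```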
3. Call Gemini, Grok, and GPT
Routing across families is a model-name change. Same client, same code path, same retry wrapper.
# Same client, different model family.
gemini = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Summarize this transcript..."}],
)

grok = client.chat.completions.create(
    model="grok-4",
    messages=[{"role": "user", "content": "What happened in markets this week?"}],
)

gpt = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Outline a migration plan."}],
)
The full live list of supported identifiers is at buzzai.cc/models. Live per-token rates are at buzzai.cc/api/pricing.
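Since only the model field changes between families, comparing models on one prompt reduces to a loop. The following is a small sketch of that idea; the function name ask_each is hypothetical, and it assumes each reply is plain text in choices[0].message.content.

```python
def ask_each(client, models, prompt):
    """Send the same prompt to each model and return {model: reply text}.

    Only the `model` field changes between calls; everything else is identical.
    """
    replies = {}
    for model in models:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        replies[model] = resp.choices[0].message.content
    return replies


# Usage, assuming `client` was built as in section 1:
#   ask_each(client, ["gemini-2.5-pro", "grok-4", "gpt-5"], "Outline a migration plan.")
```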
4. Streaming
Pass stream=True (stream: true in Node.js) and iterate over the response. The gateway forwards SSE deltas in OpenAI chat.completion.chunk shape regardless of whether the upstream is Anthropic-typed or OpenAI-typed.
# Python
stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a haiku about cold storage."}],
    stream=True,
)
for chunk in stream:
    # Skip chunks without choices (mirrors the optional chaining in the Node.js version).
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

// Node.js
const stream = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  messages: [{ role: "user", content: "Write a haiku about cold storage." }],
  stream: true,
});
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}

To capture per-call token usage at the end of a stream, add stream_options:

stream = client.chat.completions.create(
    model="claude-haiku-4-5-20251001",
    messages=[{"role": "user", "content": "ping"}],
    stream=True,
    stream_options={"include_usage": True},
)
The final chunk carries a non-null usage object. When the upstream is Claude, BUZZ also tags usage.usage_source = "anthropic" so you can identify cross-family billing in observability pipelines.
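One way to consume such a stream is sketched below. It assumes the usage-carrying final chunk has an empty choices list, as in OpenAI's own chat.completion.chunk format, so the loop never indexes choices[0] unconditionally. The name collect_stream is illustrative.

```python
def collect_stream(chunks):
    """Consume chat.completion.chunk objects and return (text, usage).

    With include_usage, the final chunk normally has an empty `choices` list
    and a non-null `usage`, so each chunk is checked before indexing.
    """
    parts = []
    usage = None
    for chunk in chunks:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts), usage


# Usage, assuming `stream` was created with stream_options={"include_usage": True}:
#   text, usage = collect_stream(stream)
#   source = getattr(usage, "usage_source", None)  # "anthropic" on the Claude path
```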
5. Tool use / function calling
Define tools in the standard OpenAI shape. The gateway maps them to Anthropic tool_use blocks (or Gemini functionDeclarations) on the way out, and maps the upstream tool selection back to tool_calls on the way in. Existing tool-calling loops continue to work without code changes.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "units": {"type": "string", "enum": ["c", "f"]},
            },
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="claude-opus-4-7",
    messages=[{"role": "user", "content": "What is the weather in Tokyo right now?"}],
    tools=tools,
    tool_choice="auto",
)

call = resp.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)

# After running the tool, append the assistant message and a follow-up
# {"role": "tool", "tool_call_id": call.id, "content": "..."} message and call again.
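The follow-up step described in the closing comment can be sketched as a small helper. The handlers mapping and the name run_tool_call are hypothetical, supplied by your application; the only wire-level facts relied on are that function.arguments arrives as a JSON string and that the reply message carries the matching tool_call_id.

```python
import json


def run_tool_call(call, handlers):
    """Execute one tool call and return the {"role": "tool", ...} reply message.

    `call` is an entry of message.tool_calls; `handlers` maps tool names to
    plain Python callables (hypothetical, supplied by your application).
    """
    args = json.loads(call.function.arguments or "{}")  # arguments arrive as a JSON string
    result = handlers[call.function.name](**args)
    return {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}


# Usage, assuming `messages` holds the list sent in the first request and
# get_weather is a stand-in implementation:
#   handlers = {"get_weather": lambda city, units="c": {"city": city, "units": units}}
#   messages.append(resp.choices[0].message)        # the assistant turn with tool_calls
#   messages.append(run_tool_call(call, handlers))  # the tool result
#   final = client.chat.completions.create(model="claude-opus-4-7", messages=messages, tools=tools)
```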
6. Fields silently dropped on the Claude path
When the gateway routes to a Claude upstream, it constructs a Messages-API request from your chat.completions body. Fields that have no Anthropic-side equivalent are not included. They do not produce a 400 — they simply disappear from the upstream call. Plan around the table below.
| Sent in chat.completions | What happens on the Claude path |
|---|---|
| n (> 1) | Dropped. Anthropic returns a single choice; n > 1 is not supported. |
| presence_penalty | Dropped. No Anthropic equivalent. |
| frequency_penalty | Dropped. No Anthropic equivalent. |
| logit_bias | Dropped. No Anthropic equivalent. |
| logprobs / top_logprobs | Dropped. Logprobs are never returned on Claude responses. |
| seed | Dropped. No deterministic-sampling knob on Anthropic. |
| response_format | Dropped. JSON-mode and JSON-Schema constraints are not enforced on Claude; achieve the same via prompt or via tool-calling. |
| function_call / functions | Dropped (the legacy form). Use tools + tool_choice; that path is wired up. |
| prediction / modalities / audio | Dropped. Anthropic Messages does not expose these. |
| verbosity | Dropped. GPT-only knob. |
| user / safety_identifier / prompt_cache_key / metadata / store | Dropped on the Claude path. Some of these are gated even on the OpenAI path; see channel-gating note below. |
| service_tier | Dropped by default. May be enabled per-channel by support. |
| temperature / top_p / top_k with opus-4-7 or -thinking models | Coerced. When the routed model uses extended-thinking, sampling parameters are intentionally cleared, even if you sent values. |
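Because dropped fields produce no error, a client-side check can make them visible before a request is sent. The sketch below is a hypothetical helper built only from the field names in the table above; it does not model the sampling-parameter coercion in the last row, and it treats service_tier as dropped (the default behavior).

```python
# Field names copied from the table above; the helper itself is hypothetical.
DROPPED_ON_CLAUDE = {
    "presence_penalty", "frequency_penalty", "logit_bias", "logprobs",
    "top_logprobs", "seed", "response_format", "function_call", "functions",
    "prediction", "modalities", "audio", "verbosity", "user",
    "safety_identifier", "prompt_cache_key", "metadata", "store",
    "service_tier",  # dropped by default; may be enabled per-channel
}


def dropped_on_claude(params):
    """Return the sorted request fields that would not reach a Claude upstream."""
    dropped = {k for k, v in params.items() if k in DROPPED_ON_CLAUDE and v is not None}
    if (params.get("n") or 1) > 1:  # n only matters when more than one choice is requested
        dropped.add("n")
    return sorted(dropped)
```

A wrapper could log the returned names as a warning whenever the model identifier starts with claude-, so a silently ignored seed or response_format does not go unnoticed.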
Conversely, the gateway maps several OpenAI fields to Claude-side concepts on your behalf:
- stop (string or array) becomes Anthropic stop_sequences
- tools + tool_choice become Anthropic tools + tool_choice (with parallel-tool-calls mapping)
- web_search_options becomes the web_search_20250305 tool; search_context_size low/medium/high translates to maxUses 1/5/10
- reasoning_effort low/medium/high becomes thinking.budget_tokens 1280/2048/4096
- Image, audio, file, and video parts become Anthropic content blocks with base64 sources
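The numeric mappings in this list can be written down as lookup tables. The sketch below previews what a request would translate to; the values come from the list, but the function name and the layout of the returned dict are assumptions for illustration, not the gateway's actual wire format.

```python
# Values taken from the mapping list above; names and output layout are illustrative.
REASONING_EFFORT_TO_BUDGET_TOKENS = {"low": 1280, "medium": 2048, "high": 4096}
SEARCH_CONTEXT_SIZE_TO_MAX_USES = {"low": 1, "medium": 5, "high": 10}


def preview_claude_mapping(params):
    """Sketch which Anthropic-side values a chat.completions body would turn into."""
    out = {}
    if "stop" in params:
        stop = params["stop"]
        # A single string becomes a one-element stop_sequences list.
        out["stop_sequences"] = [stop] if isinstance(stop, str) else list(stop)
    if "reasoning_effort" in params:
        budget = REASONING_EFFORT_TO_BUDGET_TOKENS[params["reasoning_effort"]]
        out["thinking"] = {"budget_tokens": budget}
    size = (params.get("web_search_options") or {}).get("search_context_size")
    if size is not None:
        out["web_search"] = {
            "type": "web_search_20250305",
            "maxUses": SEARCH_CONTEXT_SIZE_TO_MAX_USES[size],
        }
    return out
```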
Channel-gating note. The following fields are gated per channel: service_tier, safety_identifier, stream_options.include_obfuscation, and the Claude-side inference_geo / speed. If you need any of these end-to-end, contact support to enable the gate.
7. When to reach for the Anthropic SDK instead
For features that have no canonical place in chat.completions, call the same gateway through the Anthropic SDK on the same key. Same gateway, same key, different protocol.
- Extended thinking blocks. Claude can return a structured thinking block. The OpenAI adapter folds this into delta.reasoning_content; if you need the typed block (and signature) raw, use the Anthropic SDK.
- Prompt caching. cache_control on specific content blocks is an Anthropic-native concept. Use the Anthropic SDK so you can attach cache_control directly. See Prompt Caching.
- Provider-typed streaming events. If you need content_block_start / message_delta framing (e.g. to render a thinking block separately), use the Anthropic SDK; the OpenAI adapter collapses these into delta events.
from anthropic import Anthropic

ant = Anthropic(
    api_key="sk-YOUR_BUZZ_KEY",
    base_url="https://buzzai.cc",
)

msg = ant.messages.create(
    model="claude-opus-4-7",
    # The Anthropic Messages API requires max_tokens to exceed thinking.budget_tokens.
    max_tokens=16000,
    messages=[{"role": "user", "content": "Plan a refactor of this module..."}],
    extra_body={"thinking": {"type": "enabled", "budget_tokens": 8000}},
)
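To read the typed blocks from such a response, something like the following may help. It relies on the Anthropic Messages response shape, where msg.content is a list of blocks with a type attribute; the helper name split_thinking is illustrative, and redacted thinking blocks are ignored in this sketch.

```python
def split_thinking(content_blocks):
    """Separate typed thinking blocks from the visible answer text.

    Anthropic Messages responses carry a list of blocks in `msg.content`;
    thinking blocks have type "thinking" (with `thinking` and `signature`),
    text blocks have type "text".
    """
    thinking = [b for b in content_blocks if b.type == "thinking"]
    answer = "".join(b.text for b in content_blocks if b.type == "text")
    return thinking, answer


# Usage, continuing the example above:
#   thinking, answer = split_thinking(msg.content)
#   for block in thinking:
#       print(block.thinking, block.signature)
```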
8. Framework integration
Anything that wraps the OpenAI client and exposes a baseURL override works. Three concrete wirings:
LangChain (Python)
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="claude-haiku-4-5-20251001",
    api_key="sk-YOUR_BUZZ_KEY",
    base_url="https://buzzai.cc/v1",
    temperature=0.2,
)
print(llm.invoke("Summarize the BUZZ gateway in one sentence.").content)
LangGraph
LangGraph nodes typically receive a chat model instance. Build it with ChatOpenAI as above and pass it into the graph. No graph-level changes are needed; the routing decision is on the model identifier.
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="claude-sonnet-4-6",
    api_key="sk-YOUR_BUZZ_KEY",
    base_url="https://buzzai.cc/v1",
)
agent = create_react_agent(model, tools=[...])
result = agent.invoke({"messages": [("user", "What did the deploy do?")]})
Vercel AI SDK
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const openai = createOpenAI({
  apiKey: process.env.BUZZ_API_KEY,
  baseURL: "https://buzzai.cc/v1",
});

const { text } = await generateText({
  model: openai("claude-haiku-4-5-20251001"),
  prompt: "Write a short release note for the cache improvement.",
});
Streaming with streamText, tool calls with the tools option, and structured output via Zod schemas all continue to work. The gateway preserves the wire shape that @ai-sdk/openai expects.
9. Observability and retry
Because the wire format on both directions is OpenAI-native, anything that wraps the OpenAI client keeps working without modification: OpenTelemetry GenAI semantic conventions, Langfuse / Helicone / Phoenix style trace processors, retry decorators on 429s, and request/response loggers. The only field worth surfacing on the BUZZ side is usage.usage_source, which is set to "anthropic" when the response came from a Claude upstream — useful for cross-family cost attribution.
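As a sketch of cross-family cost attribution, the helper below sums token counts per usage_source. It assumes usage records are plain dicts (for example from resp.usage.model_dump(), assuming the SDK keeps the extra field in the dump); the function name and the "other" bucket label are choices made here, not BUZZ conventions.

```python
from collections import defaultdict


def tokens_by_source(usages):
    """Sum prompt/completion tokens per usage_source.

    `usages` is an iterable of usage dicts. Records without usage_source
    are grouped under "other", a label chosen for this sketch.
    """
    totals = defaultdict(lambda: {"prompt_tokens": 0, "completion_tokens": 0})
    for u in usages:
        bucket = totals[u.get("usage_source") or "other"]
        bucket["prompt_tokens"] += u.get("prompt_tokens") or 0
        bucket["completion_tokens"] += u.get("completion_tokens") or 0
    return dict(totals)
```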
10. Errors
| HTTP | error.type | Typical cause |
|---|---|---|
| 400 | invalid_request_error / buzz_error | Malformed JSON or missing required field |
| 401 | buzz_error | Missing or invalid API key |
| 403 | permission_error | Key lacks permission, IP not allow-listed |
| 429 | rate_limit_error | Rate limit hit; respect retry-after |
| 500 | api_error | Internal error; retry with backoff |
| 503 | buzz_error / model_not_found | No upstream channel under your group serves this model. Pick a different alias or check GET /v1/models. |
BUZZ-side errors return {"error": {"type": "buzz_error", "message": "... (request id: ...)"}}. Upstream-passthrough errors keep the upstream provider's envelope (Anthropic-style or OpenAI-style depending on the routed model).
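For custom wrappers, the retry guidance in the table might be encoded as below. This is a sketch under stated assumptions: only 429 and 500 are retried, 503 is treated as a routing problem to fix rather than retry, and the base and cap values are arbitrary defaults. The official OpenAI client also has its own max_retries setting, which may be enough on its own.

```python
import random

# Per the table above: 429 and 500 are worth retrying. Treating 503 (no channel
# serves the model) and other 4xx codes as non-retryable is an assumption here.
RETRYABLE_STATUSES = {429, 500}


def retry_delay(status, headers, attempt, base=0.5, cap=30.0):
    """Return seconds to wait before retry number `attempt` (0-based), or None to give up.

    Honors a numeric retry-after header when present; otherwise uses capped
    exponential backoff with full jitter. `base` and `cap` are arbitrary defaults.
    """
    if status not in RETRYABLE_STATUSES:
        return None
    retry_after = {k.lower(): v for k, v in headers.items()}.get("retry-after")
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # HTTP-date form; fall through to backoff
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```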
See also
- POST /v1/chat/completions: full parameter reference
- POST /v1/messages: the Anthropic-native path on the same key
- Claude Code guide: configure the CLI on the same key
- OpenAI SDK / Claude compatibility deep dive