
Use the OpenAI SDK with BUZZ

One SDK, one base URL, every supported model family. Point the official OpenAI client at https://buzzai.cc/v1 and call Claude, Gemini, Grok, or GPT by name. Streaming, tool use, retry policies, and observability wrappers keep working unchanged.

POST https://buzzai.cc/v1/chat/completions

Why this works. BUZZ exposes an OpenAI-compatible chat.completions endpoint at /v1. The gateway parses your request as chat.completions, dispatches to the right upstream based on the model field, and translates the response back. For OpenAI models the bytes flow through nearly untouched; for Claude / Gemini / Grok the gateway bridges the request and response shape on your behalf.

1. Install & configure

Install the official client. There is no BUZZ-specific package.

pip install openai

npm install openai

Construct the client with two strings: base_url and api_key.

from openai import OpenAI

client = OpenAI(
    api_key="sk-YOUR_BUZZ_KEY",
    base_url="https://buzzai.cc/v1",
)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.BUZZ_API_KEY,
  baseURL: "https://buzzai.cc/v1",
});

Both SDKs send the key as Authorization: Bearer <value>, which BUZZ accepts directly. The sk- prefix is optional; BUZZ strips it server-side.
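
The two SDKs used in this guide take slightly different base URLs: the OpenAI client points at https://buzzai.cc/v1, while the Anthropic client in section 7 points at https://buzzai.cc. A minimal sketch of keeping both in one place, assuming the key lives in the BUZZ_API_KEY environment variable (the name the JavaScript example uses). `buzz_client_kwargs` is a hypothetical helper name, not part of either SDK.

```python
import os

BUZZ_OPENAI_BASE_URL = "https://buzzai.cc/v1"    # OpenAI SDK: includes /v1
BUZZ_ANTHROPIC_BASE_URL = "https://buzzai.cc"    # Anthropic SDK: no /v1 (section 7)


def buzz_client_kwargs(sdk="openai", env=None):
    """Return constructor kwargs for OpenAI(...) or Anthropic(...).

    Hypothetical helper. Reads the key from BUZZ_API_KEY and fails early
    when it is missing instead of sending an empty bearer token.
    """
    env = os.environ if env is None else env
    key = env.get("BUZZ_API_KEY")
    if not key:
        raise RuntimeError("BUZZ_API_KEY is not set")
    base_urls = {
        "openai": BUZZ_OPENAI_BASE_URL,
        "anthropic": BUZZ_ANTHROPIC_BASE_URL,
    }
    if sdk not in base_urls:
        raise ValueError(f"unknown sdk: {sdk!r}")
    return {"api_key": key, "base_url": base_urls[sdk]}
```

Usage would look like `client = OpenAI(**buzz_client_kwargs())`, which keeps the key out of source control.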

2. Call Claude

Set model to a Claude identifier and call chat.completions.create. The gateway converts the request to the Anthropic Messages schema, calls upstream, and converts the response back. The response has the same OpenAI ChatCompletion shape your code already parses.

resp = client.chat.completions.create(
    model="claude-haiku-4-5-20251001",
    messages=[
        {"role": "system", "content": "You are a precise technical writer."},
        {"role": "user", "content": "Explain content-addressable storage in three sentences."},
    ],
    temperature=0.3,
    max_tokens=400,
)

print(resp.choices[0].message.content)
print(resp.usage)

const resp = await client.chat.completions.create({
  model: "claude-haiku-4-5-20251001",
  messages: [
    { role: "system", content: "You are a precise technical writer." },
    { role: "user", content: "Explain content-addressable storage in three sentences." },
  ],
  temperature: 0.3,
  max_tokens: 400,
});

console.log(resp.choices[0].message.content);

curl -sS https://buzzai.cc/v1/chat/completions \
  -H "Authorization: Bearer $BUZZ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-haiku-4-5-20251001",
    "messages": [
      {"role": "system", "content": "You are a precise technical writer."},
      {"role": "user", "content": "Explain content-addressable storage in three sentences."}
    ],
    "max_tokens": 400
  }'

Common Claude model identifiers (live list at GET /v1/models):

claude-opus-4-7 · claude-opus-4-6 · claude-opus-4-5-20251101
claude-sonnet-4-6 · claude-sonnet-4-5-20250929
claude-haiku-4-5-20251001
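
Because the list above can drift, it may be worth reading identifiers from GET /v1/models at startup. A minimal sketch, assuming the gateway returns the standard OpenAI model-list shape so that the SDK's `client.models.list()` can iterate it; `list_model_ids` is a hypothetical helper name.

```python
def list_model_ids(client, prefix=""):
    """Return sorted model ids from GET /v1/models, filtered by prefix.

    `client` is an OpenAI client configured with the BUZZ base URL.
    Assumes each listed model exposes an `id` attribute, as in the OpenAI SDK.
    """
    return sorted(m.id for m in client.models.list() if m.id.startswith(prefix))
```

For example, `list_model_ids(client, "claude-")` would return only the Claude identifiers your key can reach.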

3. Call Gemini, Grok, and GPT

Routing across families is a model-name change. Same client, same code path, same retry wrapper.

# Same client, different model family.
gemini = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Summarize this transcript..."}],
)

grok = client.chat.completions.create(
    model="grok-4",
    messages=[{"role": "user", "content": "What happened in markets this week?"}],
)

gpt = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Outline a migration plan."}],
)

The full live list of supported identifiers is at buzzai.cc/models. Live per-token rates are at buzzai.cc/api/pricing.

4. Streaming

Pass stream=True (stream: true in JavaScript) and iterate over the response. The gateway forwards SSE deltas in OpenAI chat.completion.chunk shape regardless of whether the upstream is Anthropic-typed or OpenAI-typed.

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a haiku about cold storage."}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:  # e.g. a usage-only final chunk
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

const stream = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  messages: [{ role: "user", content: "Write a haiku about cold storage." }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}

To capture per-call token usage at the end of a stream, add stream_options:

stream = client.chat.completions.create(
    model="claude-haiku-4-5-20251001",
    messages=[{"role": "user", "content": "ping"}],
    stream=True,
    stream_options={"include_usage": True},
)

The final chunk carries a non-null usage object. When the upstream is Claude, BUZZ also tags usage.usage_source = "anthropic" so you can identify cross-family billing in observability pipelines.
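
In the OpenAI wire format, the usage-bearing final chunk has an empty choices list, so a loop that indexes choices[0] unconditionally would fail on it. A minimal sketch of a consumer that collects both the text and the usage object; `collect_stream` is a hypothetical helper, and reading usage_source with getattr assumes the SDK exposes the extra BUZZ field as an attribute.

```python
def collect_stream(stream):
    """Consume a chat.completions stream and return (text, usage).

    Skips chunks without choices (the usage-only final chunk) and keeps
    the last non-null usage object it sees. `usage` stays None when
    stream_options.include_usage was not requested.
    """
    parts = []
    usage = None
    for chunk in stream:
        if chunk.choices:
            delta = chunk.choices[0].delta.content
            if delta:
                parts.append(delta)
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
    return "".join(parts), usage
```

After `text, usage = collect_stream(stream)`, `getattr(usage, "usage_source", None)` would tell you whether the call was served by a Claude upstream.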

5. Tool use / function calling

Define tools in the standard OpenAI shape. The gateway maps them to Anthropic tool_use blocks (or Gemini functionDeclarations) on the way out, and maps the upstream tool selection back to tool_calls on the way in. Existing tool-calling loops continue to work without code changes.

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "units": {"type": "string", "enum": ["c", "f"]},
            },
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="claude-opus-4-7",
    messages=[{"role": "user", "content": "What is the weather in Tokyo right now?"}],
    tools=tools,
    tool_choice="auto",
)

message = resp.choices[0].message
if message.tool_calls:  # None when the model answers in plain text
    call = message.tool_calls[0]
    print(call.function.name, call.function.arguments)
    # After running the tool, append the assistant message and a follow-up
    # {"role": "tool", "tool_call_id": call.id, "content": "..."} message and call again.
else:
    print(message.content)
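
The follow-up step described in the comment above can be sketched as a small loop. This is an illustration under assumptions, not a prescribed pattern: `run_tool_loop` and the `handlers` mapping are hypothetical names, tool results are serialized as JSON strings, and error handling is left out.

```python
import json


def run_tool_loop(client, model, messages, tools, handlers, max_rounds=5):
    """Call the model, run any requested tools, repeat until it answers in text.

    `handlers` maps a tool name to a Python callable taking keyword arguments.
    """
    messages = list(messages)  # do not mutate the caller's list
    for _ in range(max_rounds):
        resp = client.chat.completions.create(
            model=model, messages=messages, tools=tools, tool_choice="auto"
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:  # None or empty: the model answered directly
            return msg.content
        # Echo the assistant turn back, including the tool calls it made.
        messages.append({
            "role": "assistant",
            "content": msg.content,
            "tool_calls": [
                {
                    "id": c.id,
                    "type": "function",
                    "function": {
                        "name": c.function.name,
                        "arguments": c.function.arguments,
                    },
                }
                for c in msg.tool_calls
            ],
        })
        # One tool message per call, matched to it by tool_call_id.
        for c in msg.tool_calls:
            args = json.loads(c.function.arguments or "{}")
            result = handlers[c.function.name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": c.id,
                "content": json.dumps(result),
            })
    raise RuntimeError("tool loop did not finish within max_rounds")
```

The max_rounds cap is a design choice to keep a misbehaving model from looping indefinitely.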

6. Fields silently dropped on the Claude path

When the gateway routes to a Claude upstream, it constructs a Messages-API request from your chat.completions body. Fields that have no Anthropic-side equivalent are not included. They do not produce a 400 — they simply disappear from the upstream call. Plan around the table below.

Sent in chat.completions	What happens on the Claude path
`n` (> 1)	Dropped. Anthropic returns a single choice; `n > 1` is not supported.
`presence_penalty`	Dropped. No Anthropic equivalent.
`frequency_penalty`	Dropped. No Anthropic equivalent.
`logit_bias`	Dropped. No Anthropic equivalent.
`logprobs` / `top_logprobs`	Dropped. Logprobs are never returned on Claude responses.
`seed`	Dropped. No deterministic-sampling knob on Anthropic.
`response_format`	Dropped. JSON-mode and JSON-Schema constraints are not enforced on Claude; achieve the same via prompt or via tool-calling.
`function_call` / `functions`	Dropped (the legacy form). Use `tools` + `tool_choice`; that path is wired up.
`prediction` / `modalities` / `audio`	Dropped. Anthropic Messages does not expose these.
`verbosity`	Dropped. GPT-only knob.
`user` / `safety_identifier` / `prompt_cache_key` / `metadata` / `store`	Dropped on the Claude path. Some of these are gated even on the OpenAI path; see channel-gating note below.
`service_tier`	Dropped by default. May be enabled per-channel by support.
`temperature` / `top_p` / `top_k` with `opus-4-7` or `-thinking` models	Coerced. When the routed model uses extended-thinking, sampling parameters are intentionally cleared, even if you sent values.
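
Since these fields disappear without an error, a client-side check can surface them before the request is sent. A minimal sketch with field names copied from the table above; `claude_path_warnings` is a hypothetical helper, and treating any model id that starts with "claude-" as routed to the Claude path is an assumption, not confirmed gateway behavior.

```python
# Field names copied from the table above; keep this set in sync with it.
DROPPED_ON_CLAUDE = frozenset({
    "n", "presence_penalty", "frequency_penalty", "logit_bias",
    "logprobs", "top_logprobs", "seed", "response_format",
    "function_call", "functions", "prediction", "modalities", "audio",
    "verbosity", "user", "safety_identifier", "prompt_cache_key",
    "metadata", "store", "service_tier",
})
SAMPLING_FIELDS = frozenset({"temperature", "top_p", "top_k"})


def claude_path_warnings(request):
    """Return sorted field names in `request` that will not reach Claude.

    Assumption: model ids starting with "claude-" take the Claude path.
    """
    model = request.get("model", "")
    if not model.startswith("claude-"):
        return []
    ignored = set(DROPPED_ON_CLAUDE.intersection(request))
    if (request.get("n") or 1) <= 1:
        ignored.discard("n")  # n=1 loses nothing
    # Sampling parameters are cleared for opus-4-7 and -thinking models.
    if "opus-4-7" in model or "-thinking" in model:
        ignored |= SAMPLING_FIELDS.intersection(request)
    return sorted(ignored)
```

Logging the returned names in development is usually enough; failing the request outright would be stricter than the gateway itself.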

Conversely, the gateway maps several OpenAI fields to Claude-side concepts on your behalf:

stop (string or array) becomes Anthropic stop_sequences
tools + tool_choice become Anthropic tools + tool_choice (with parallel-tool-calls mapping)
web_search_options becomes the web_search_20250305 tool; search_context_size low/medium/high translates to maxUses 1/5/10
reasoning_effort low/medium/high becomes thinking.budget_tokens 1280/2048/4096
Image, audio, file, and video parts become Anthropic content blocks with base64 sources
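
The numeric mappings in the list above can be written down as lookup tables, which helps when estimating what a request will turn into upstream. This is a client-side illustration of the documented values, not gateway code; `describe_claude_mapping` and the keys of the dictionary it returns are hypothetical labels, and wrapping a single stop string in a one-element list is an assumption.

```python
# Values taken from the mapping list above.
THINKING_BUDGET_TOKENS = {"low": 1280, "medium": 2048, "high": 4096}
WEB_SEARCH_MAX_USES = {"low": 1, "medium": 5, "high": 10}


def describe_claude_mapping(request):
    """Return the Claude-side values the documented mapping implies."""
    mapped = {}
    stop = request.get("stop")
    if stop is not None:
        # Assumption: a single string becomes a one-element list.
        mapped["stop_sequences"] = [stop] if isinstance(stop, str) else list(stop)
    effort = request.get("reasoning_effort")
    if effort in THINKING_BUDGET_TOKENS:
        mapped["thinking.budget_tokens"] = THINKING_BUDGET_TOKENS[effort]
    size = (request.get("web_search_options") or {}).get("search_context_size")
    if size in WEB_SEARCH_MAX_USES:
        mapped["web_search.maxUses"] = WEB_SEARCH_MAX_USES[size]
    return mapped
```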

Channel-gated fields. A handful of inputs are silently filtered unless the upstream channel has the matching allow-flag enabled: service_tier, safety_identifier, stream_options.include_obfuscation, and the Claude-side inference_geo / speed. If you need any of these end-to-end, contact support to enable the gate.

7. When to reach for the Anthropic SDK instead

For features that have no canonical place in chat.completions, use the Anthropic SDK against the same gateway with the same key; only the protocol changes.

Extended thinking blocks. Claude can return a structured thinking block. The OpenAI adapter folds this into delta.reasoning_content; if you need the typed block (and signature) raw, use the Anthropic SDK.
Prompt caching. cache_control on specific content blocks is an Anthropic-native concept. Use the Anthropic SDK so you can attach cache_control directly. See Prompt Caching.
Provider-typed streaming events. If you need content_block_start / message_delta framing (e.g. to render a thinking block separately), use the Anthropic SDK; the OpenAI adapter collapses these into delta events.

from anthropic import Anthropic

ant = Anthropic(
    api_key="sk-YOUR_BUZZ_KEY",
    base_url="https://buzzai.cc",
)

msg = ant.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Plan a refactor of this module..."}],
    extra_body={"thinking": {"type": "enabled", "budget_tokens": 8000}},
)
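
Once the response arrives, the typed blocks can be separated by their type field. A minimal sketch, assuming the response content follows the Anthropic Messages shape in which thinking blocks carry thinking and signature and text blocks carry text; `split_thinking` is a hypothetical helper, and other block types (for example redacted thinking) are ignored here.

```python
def split_thinking(message):
    """Separate an Anthropic Messages response into thinking and text.

    Returns (thinking_blocks, text), where thinking_blocks is a list of
    (thinking_text, signature) tuples in the order they appeared.
    """
    thinking = []
    text = []
    for block in message.content:
        if block.type == "thinking":
            thinking.append((block.thinking, block.signature))
        elif block.type == "text":
            text.append(block.text)
    return thinking, "".join(text)
```

Calling `thinking, answer = split_thinking(msg)` lets you render the reasoning separately from the answer.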

8. Framework integration

Anything that wraps the OpenAI client and exposes a baseURL override works. Three concrete wirings:

LangChain (Python)

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="claude-haiku-4-5-20251001",
    api_key="sk-YOUR_BUZZ_KEY",
    base_url="https://buzzai.cc/v1",
    temperature=0.2,
)

print(llm.invoke("Summarize the BUZZ gateway in one sentence.").content)

LangGraph

LangGraph nodes typically receive a chat model instance. Build it with ChatOpenAI as above and pass it into the graph. No graph-level changes are needed; the routing decision is on the model identifier.

from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="claude-sonnet-4-6",
    api_key="sk-YOUR_BUZZ_KEY",
    base_url="https://buzzai.cc/v1",
)

agent = create_react_agent(model, tools=[...])
result = agent.invoke({"messages": [("user", "What did the deploy do?")]})

Vercel AI SDK

import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const openai = createOpenAI({
  apiKey: process.env.BUZZ_API_KEY,
  baseURL: "https://buzzai.cc/v1",
});

const { text } = await generateText({
  model: openai("claude-haiku-4-5-20251001"),
  prompt: "Write a short release note for the cache improvement.",
});

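Streaming with streamText and tool calls with the tools option continue to work, because the gateway preserves the wire shape that @ai-sdk/openai expects. Structured output via Zod schemas also goes through, but note that response_format is dropped on the Claude path (section 6): if the schema is sent that way, the constraint is not enforced upstream, so validate the result or route structured output through tool calling.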

9. Observability and retry

Because the wire format in both directions is OpenAI-native, anything that wraps the OpenAI client keeps working without modification: OpenTelemetry GenAI semantic conventions, Langfuse / Helicone / Phoenix style trace processors, retry decorators on 429s, and request/response loggers. The only BUZZ-specific field worth surfacing is usage.usage_source, which is set to "anthropic" when the response came from a Claude upstream; this is useful for cross-family cost attribution.
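
A retry wrapper of the kind mentioned above might look like the following sketch. It retries only the statuses that section 10 marks as retryable (429 and 500) and honors a numeric retry-after header when one is present. `with_retry` is a hypothetical helper; it relies only on the exception exposing status_code and response.headers, which openai.APIStatusError does. The OpenAI Python client also retries some failures on its own (see its max_retries option), so a wrapper like this is mainly useful when you want explicit control.

```python
import random
import time


def with_retry(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call(); retry on HTTP 429/500 with backoff, re-raise anything else."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status not in (429, 500) or attempt == max_attempts:
                raise
            response = getattr(exc, "response", None)
            headers = getattr(response, "headers", None) or {}
            try:
                # Prefer the server's hint when it is a plain number of seconds.
                delay = float(headers.get("retry-after"))
            except (TypeError, ValueError):
                # No usable header: exponential backoff with a little jitter.
                delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.25)
            sleep(delay)
```

Usage: `resp = with_retry(lambda: client.chat.completions.create(model="claude-sonnet-4-6", messages=msgs))`, where `msgs` is your message list.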

10. Errors

HTTP	error.type	Typical cause
400	`invalid_request_error` / `buzz_error`	Malformed JSON or missing required field
401	`buzz_error`	Missing or invalid API key
403	`permission_error`	Key lacks permission, IP not allow-listed
429	`rate_limit_error`	Rate limit hit; respect `retry-after`
500	`api_error`	Internal error; retry with backoff
503	`buzz_error / model_not_found`	No upstream channel under your group serves this model. Pick a different alias or check `GET /v1/models`.

BUZZ-side errors return {"error": {"type": "buzz_error", "message": "... (request id: ...)"}}. Upstream-passthrough errors keep the upstream provider's envelope (Anthropic-style or OpenAI-style depending on the routed model).
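
Because three envelope styles can come back, error handling is simpler if the body is normalized first. A minimal sketch, with assumptions: a passed-through Anthropic error is recognized by Anthropic's top-level "type": "error" wrapper, and anything else with an error object is treated as OpenAI-style, which is a guess rather than documented behavior. `describe_error` is a hypothetical helper.

```python
def describe_error(body):
    """Classify a decoded JSON error body.

    Returns (origin, error_type, message), where origin is one of
    "buzz", "anthropic", "openai", or "unknown".
    """
    err = body.get("error") if isinstance(body, dict) else None
    if not isinstance(err, dict):
        return ("unknown", None, None)
    error_type = err.get("type")
    message = err.get("message")
    if error_type == "buzz_error":
        origin = "buzz"
    elif body.get("type") == "error":
        # Anthropic wraps errors as {"type": "error", "error": {...}}.
        origin = "anthropic"
    else:
        origin = "openai"
    return (origin, error_type, message)
```

With the OpenAI Python SDK, `exc.response.json()` on an openai.APIStatusError should give the decoded body to pass in.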