BUZZ AI Gateway

Use the OpenAI SDK with BUZZ

One SDK, one base URL, every supported model family. Point the official OpenAI client at https://buzzai.cc/v1 and call Claude, Gemini, Grok, or GPT by name. Streaming, tool use, retry policies, and observability wrappers keep working unchanged.

POST https://buzzai.cc/v1/chat/completions
Why this works. BUZZ exposes an OpenAI-compatible chat.completions endpoint at /v1. The gateway parses your request as chat.completions, dispatches to the right upstream based on the model field, and translates the response back. For OpenAI models the bytes flow through nearly untouched; for Claude / Gemini / Grok the gateway bridges the request and response shape on your behalf.
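
As a rough mental model of the dispatch step, the sketch below maps a model name to an upstream family by prefix. The prefix table and the names `FAMILY_BY_PREFIX`, `route_family`, and `needs_translation` are hypothetical illustrations; the gateway's actual routing rules are not documented in this guide and may differ.

```python
# Hypothetical prefix table. The real gateway resolves models from its own
# channel configuration, which this guide does not describe.
FAMILY_BY_PREFIX = {
    "claude-": "anthropic",
    "gemini-": "google",
    "grok-": "xai",
    "gpt-": "openai",
}


def route_family(model: str) -> str:
    """Return the upstream family a model name would be dispatched to."""
    for prefix, family in FAMILY_BY_PREFIX.items():
        if model.startswith(prefix):
            return family
    raise ValueError(f"no upstream family known for model {model!r}")


def needs_translation(model: str) -> bool:
    """Non-OpenAI families need the request and response shapes bridged."""
    return route_family(model) != "openai"
```

The point of the sketch is only that the model field alone selects the upstream; nothing else in the request has to change.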

1. Install & configure

Install the official client. There is no BUZZ-specific package.

pip install openai   # Python
npm install openai   # Node.js

Construct the client with two strings: base_url and api_key.

from openai import OpenAI

client = OpenAI(
    api_key="sk-YOUR_BUZZ_KEY",
    base_url="https://buzzai.cc/v1",
)

// Node.js
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.BUZZ_API_KEY,
  baseURL: "https://buzzai.cc/v1",
});

Both SDKs send the key as Authorization: Bearer <value>, which BUZZ accepts directly. The sk- prefix is optional; BUZZ strips it server-side.
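
To keep the key out of source code in Python as well, one option is to read both strings from the environment. This is a minimal sketch: `buzz_client_kwargs` is a hypothetical helper, and `BUZZ_BASE_URL` is an assumed optional override that BUZZ itself does not define.

```python
import os


def buzz_client_kwargs(env=os.environ) -> dict:
    """Build keyword arguments for OpenAI(...) from environment variables."""
    key = env.get("BUZZ_API_KEY", "").strip()
    if not key:
        raise RuntimeError("BUZZ_API_KEY is not set")
    return {
        "api_key": key,
        # BUZZ_BASE_URL is a hypothetical override, e.g. for a staging proxy.
        "base_url": env.get("BUZZ_BASE_URL", "https://buzzai.cc/v1"),
    }


# Usage (requires `pip install openai`):
#   from openai import OpenAI
#   client = OpenAI(**buzz_client_kwargs())
```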

2. Call Claude

Set model to a Claude identifier and call chat.completions.create. The gateway converts the request to the Anthropic Messages schema, calls the upstream, and converts the response back. The response has the same OpenAI ChatCompletion shape your code already parses.

resp = client.chat.completions.create(
    model="claude-haiku-4-5-20251001",
    messages=[
        {"role": "system", "content": "You are a precise technical writer."},
        {"role": "user", "content": "Explain content-addressable storage in three sentences."},
    ],
    temperature=0.3,
    max_tokens=400,
)

print(resp.choices[0].message.content)
print(resp.usage)

// Node.js
const resp = await client.chat.completions.create({
  model: "claude-haiku-4-5-20251001",
  messages: [
    { role: "system", content: "You are a precise technical writer." },
    { role: "user", content: "Explain content-addressable storage in three sentences." },
  ],
  temperature: 0.3,
  max_tokens: 400,
});

console.log(resp.choices[0].message.content);

# curl
curl -sS https://buzzai.cc/v1/chat/completions \
  -H "Authorization: Bearer $BUZZ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-haiku-4-5-20251001",
    "messages": [
      {"role": "system", "content": "You are a precise technical writer."},
      {"role": "user", "content": "Explain content-addressable storage in three sentences."}
    ],
    "max_tokens": 400
  }'
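
The request conversion described in this section can be pictured roughly as follows. This is an illustrative sketch, not the gateway's code: `to_anthropic_request` and its `default_max_tokens` fallback are hypothetical. What it relies on is that the Anthropic Messages API takes the system prompt as a top-level `system` field and requires `max_tokens`.

```python
def to_anthropic_request(body: dict, default_max_tokens: int = 1024) -> dict:
    """Translate a chat.completions body into a Messages-style request."""
    system_parts = []
    messages = []
    for msg in body["messages"]:
        if msg["role"] == "system":
            # Anthropic takes the system prompt as a top-level field,
            # not as an entry in the messages list.
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})

    request = {
        "model": body["model"],
        "messages": messages,
        # max_tokens is required by the Messages API. The fallback value
        # is an assumption, not a documented gateway default.
        "max_tokens": body.get("max_tokens", default_max_tokens),
    }
    if system_parts:
        request["system"] = "\n\n".join(system_parts)
    if "temperature" in body:
        request["temperature"] = body["temperature"]
    return request
```

Setting max_tokens explicitly, as the examples above do, avoids depending on whatever default the gateway applies.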

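Claude model identifiers used in this guide include claude-haiku-4-5-20251001, claude-sonnet-4-6, and claude-opus-4-7. The live list is available at GET /v1/models.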

3. Call Gemini, Grok, and GPT

Routing across families is a model-name change. Same client, same code path, same retry wrapper.

# Same client, different model family.
gemini = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Summarize this transcript..."}],
)

grok = client.chat.completions.create(
    model="grok-4",
    messages=[{"role": "user", "content": "What happened in markets this week?"}],
)

gpt = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Outline a migration plan."}],
)

The full live list of supported identifiers is at buzzai.cc/models. Live per-token rates are at buzzai.cc/api/pricing.
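
Because only the model name changes between families, a cross-family fallback can be a small loop. The sketch below is one possible shape, with a hypothetical helper name `first_successful`; it takes the `create` callable as a parameter so it can be exercised without network access, and it catches `Exception` broadly only for brevity.

```python
from typing import Callable, Sequence


def first_successful(create: Callable[..., object], models: Sequence[str],
                     messages: list, **params):
    """Try each model in order and return (model, response) for the first success."""
    last_error = None
    for model in models:
        try:
            return model, create(model=model, messages=messages, **params)
        except Exception as exc:  # narrow to the SDK's error types in real code
            last_error = exc
    raise RuntimeError("all models failed") from last_error


# Usage with the client constructed earlier:
#   model, resp = first_successful(
#       client.chat.completions.create,
#       ["claude-sonnet-4-6", "gemini-2.5-pro", "gpt-5"],
#       [{"role": "user", "content": "Outline a migration plan."}],
#   )
```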

4. Streaming

Pass stream=True (stream: true in Node.js) and iterate over the response. The gateway forwards SSE deltas in the OpenAI chat.completion.chunk shape whether the upstream is Anthropic-typed or OpenAI-typed.

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a haiku about cold storage."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

// Node.js
const stream = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  messages: [{ role: "user", content: "Write a haiku about cold storage." }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}

To capture per-call token usage at the end of a stream, add stream_options:

stream = client.chat.completions.create(
    model="claude-haiku-4-5-20251001",
    messages=[{"role": "user", "content": "ping"}],
    stream=True,
    stream_options={"include_usage": True},
)

The final chunk carries a non-null usage object. When the upstream is Claude, BUZZ also tags usage.usage_source = "anthropic" so you can identify cross-family billing in observability pipelines.
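
In the OpenAI wire format, the usage-bearing final chunk has an empty choices list, so a loop that indexes `chunk.choices[0]` unconditionally can raise on it. Assuming BUZZ mirrors that behavior, a consumer along these lines collects both the text and the usage object. `consume_stream` is a hypothetical helper name, not part of the SDK.

```python
def consume_stream(chunks):
    """Collect streamed text and the trailing usage object, if any."""
    parts = []
    usage = None
    for chunk in chunks:
        # With include_usage, the last chunk carries usage and an empty
        # choices list, so check before indexing.
        if chunk.choices:
            delta = chunk.choices[0].delta.content
            if delta:
                parts.append(delta)
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
    return "".join(parts), usage


# Usage with the stream created above:
#   text, usage = consume_stream(stream)
```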

5. Tool use / function calling

Define tools in the standard OpenAI shape. The gateway maps them to Anthropic tool_use blocks (or Gemini functionDeclarations) on the way out, and maps the upstream tool selection back to tool_calls on the way in. Existing tool-calling loops continue to work without code changes.

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "units": {"type": "string", "enum": ["c", "f"]},
            },
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="claude-opus-4-7",
    messages=[{"role": "user", "content": "What is the weather in Tokyo right now?"}],
    tools=tools,
    tool_choice="auto",
)

message = resp.choices[0].message
if message.tool_calls:  # None when the model answers directly
    call = message.tool_calls[0]
    print(call.function.name, call.function.arguments)
    # After running the tool, append the assistant message and a follow-up
    # {"role": "tool", "tool_call_id": call.id, "content": "..."} message and call again.
else:
    print(message.content)
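
The follow-up step mentioned in the comments can be sketched as a small dispatcher. Everything here is illustrative: `get_weather` is a stand-in that returns canned data, and `TOOL_REGISTRY` and `run_tool_call` are hypothetical names. The only protocol details it relies on are that `function.arguments` arrives as a JSON string and that the reply message uses role "tool" with the matching `tool_call_id`.

```python
import json


def get_weather(city: str, units: str = "c") -> dict:
    """Stand-in tool returning canned data; a real one would query a weather service."""
    return {"city": city, "units": units, "temp": 21}


TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(call, registry=TOOL_REGISTRY) -> dict:
    """Execute one tool call and build the follow-up `tool` message."""
    func = registry.get(call.function.name)
    if func is None:
        content = json.dumps({"error": f"unknown tool {call.function.name}"})
    else:
        # arguments is a JSON-encoded string, not a dict.
        args = json.loads(call.function.arguments or "{}")
        content = json.dumps(func(**args))
    return {"role": "tool", "tool_call_id": call.id, "content": content}
```

Append the assistant message that contained the tool call, then the dict returned by `run_tool_call`, to the messages list before calling chat.completions.create again.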

6. Fields silently dropped on the Claude path

When the gateway routes to a Claude upstream, it constructs a Messages-API request from your chat.completions body. Fields that have no Anthropic-side equivalent are not included. They do not produce a 400 — they simply disappear from the upstream call. Plan around the table below.

Sent in chat.completions | What happens on the Claude path
n (> 1) | Dropped. Anthropic returns a single choice; n > 1 is not supported.
presence_penalty | Dropped. No Anthropic equivalent.
frequency_penalty | Dropped. No Anthropic equivalent.
logit_bias | Dropped. No Anthropic equivalent.
logprobs / top_logprobs | Dropped. Logprobs are never returned on Claude responses.
seed | Dropped. No deterministic-sampling knob on Anthropic.
response_format | Dropped. JSON-mode and JSON-Schema constraints are not enforced on Claude; achieve the same via prompt or via tool-calling.
function_call / functions | Dropped (the legacy form). Use tools + tool_choice; that path is wired up.
prediction / modalities / audio | Dropped. Anthropic Messages does not expose these.
verbosity | Dropped. GPT-only knob.
user / safety_identifier / prompt_cache_key / metadata / store | Dropped on the Claude path. Some of these are gated even on the OpenAI path; see channel-gating note below.
service_tier | Dropped by default. May be enabled per-channel by support.
temperature / top_p / top_k with opus-4-7 or -thinking models | Coerced. When the routed model uses extended-thinking, sampling parameters are intentionally cleared, even if you sent values.
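
Since these fields vanish without an error, a client-side check before sending can surface them early. The sketch below copies the field names from the table above; `DROPPED_ON_CLAUDE` and `fields_dropped_on_claude` are hypothetical names, and the list would need updating if the gateway's behavior changes.

```python
# Field names copied from the table above.
DROPPED_ON_CLAUDE = {
    "presence_penalty", "frequency_penalty", "logit_bias", "logprobs",
    "top_logprobs", "seed", "response_format", "function_call", "functions",
    "prediction", "modalities", "audio", "verbosity", "user",
    "safety_identifier", "prompt_cache_key", "metadata", "store",
    "service_tier",
}


def fields_dropped_on_claude(body: dict) -> list:
    """Return the request fields the Claude path would silently discard."""
    dropped = sorted(k for k in body if k in DROPPED_ON_CLAUDE)
    # n only matters when it asks for more than one choice.
    if (body.get("n") or 1) > 1:
        dropped.append("n")
    return dropped
```

Logging the result as a warning whenever the model name starts with "claude-" is one way to catch, for example, a seed or response_format setting that will have no effect.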

Conversely, the gateway maps several OpenAI fields to Claude-side concepts on your behalf:

Channel-gated fields. A handful of inputs are silently filtered unless the upstream channel has the matching allow-flag enabled: service_tier, safety_identifier, stream_options.include_obfuscation, and the Claude-side inference_geo / speed. If you need any of these end-to-end, contact support to enable the gate.

7. When to reach for the Anthropic SDK instead

For features that have no canonical place in chat.completions, call the same gateway through the Anthropic SDK on the same key. Same gateway, same key, different protocol.

from anthropic import Anthropic

ant = Anthropic(
    api_key="sk-YOUR_BUZZ_KEY",
    base_url="https://buzzai.cc",
)

msg = ant.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,  # must be larger than thinking.budget_tokens
    messages=[{"role": "user", "content": "Plan a refactor of this module..."}],
    extra_body={"thinking": {"type": "enabled", "budget_tokens": 8000}},
)

8. Framework integration

Anything that wraps the OpenAI client and exposes a baseURL override works. Three concrete wirings:

LangChain (Python)

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="claude-haiku-4-5-20251001",
    api_key="sk-YOUR_BUZZ_KEY",
    base_url="https://buzzai.cc/v1",
    temperature=0.2,
)

print(llm.invoke("Summarize the BUZZ gateway in one sentence.").content)

LangGraph

LangGraph nodes typically receive a chat model instance. Build it with ChatOpenAI as above and pass it into the graph. No graph-level changes are needed; the routing decision is on the model identifier.

from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="claude-sonnet-4-6",
    api_key="sk-YOUR_BUZZ_KEY",
    base_url="https://buzzai.cc/v1",
)

agent = create_react_agent(model, tools=[...])
result = agent.invoke({"messages": [("user", "What did the deploy do?")]})

Vercel AI SDK

import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const openai = createOpenAI({
  apiKey: process.env.BUZZ_API_KEY,
  baseURL: "https://buzzai.cc/v1",
});

const { text } = await generateText({
  model: openai("claude-haiku-4-5-20251001"),
  prompt: "Write a short release note for the cache improvement.",
});

Streaming with streamText, tool calls with the tools option, and structured output via Zod schemas all continue to work. The gateway preserves the wire shape that @ai-sdk/openai expects.

9. Observability and retry

Because the wire format in both directions is OpenAI-native, anything that wraps the OpenAI client keeps working without modification: OpenTelemetry GenAI semantic conventions, Langfuse / Helicone / Phoenix style trace processors, retry decorators on 429s, and request/response loggers. The only BUZZ-specific field worth surfacing is usage.usage_source, which is set to "anthropic" when the response came from a Claude upstream. This is useful for cross-family cost attribution.
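
For cost attribution, token counts can be grouped by usage.usage_source. This is a minimal sketch over plain dicts with a hypothetical function name `tokens_by_source`. The guide documents only the "anthropic" value, so bucketing responses without the field as "unlabeled" is an assumption.

```python
from collections import defaultdict


def tokens_by_source(usages):
    """Sum prompt and completion tokens per usage_source."""
    totals = defaultdict(lambda: {"prompt_tokens": 0, "completion_tokens": 0})
    for usage in usages:
        # Only "anthropic" is documented; the "unlabeled" bucket for
        # responses without the field is an assumption.
        source = usage.get("usage_source") or "unlabeled"
        totals[source]["prompt_tokens"] += usage.get("prompt_tokens", 0)
        totals[source]["completion_tokens"] += usage.get("completion_tokens", 0)
    return dict(totals)
```

The input dicts would typically come from converting each response's usage object to a dict in your logging layer.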

10. Errors

HTTP | error.type | Typical cause
400 | invalid_request_error / buzz_error | Malformed JSON or missing required field
401 | buzz_error | Missing or invalid API key
403 | permission_error | Key lacks permission, IP not allow-listed
429 | rate_limit_error | Rate limit hit; respect retry-after
500 | api_error | Internal error; retry with backoff
503 | buzz_error / model_not_found | No upstream channel under your group serves this model. Pick a different alias or check GET /v1/models.

BUZZ-side errors return {"error": {"type": "buzz_error", "message": "... (request id: ...)"}}. Upstream-passthrough errors keep the upstream provider's envelope (Anthropic-style or OpenAI-style depending on the routed model).
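
The table suggests a simple retry policy: honor retry-after on 429, back off on 500, and treat everything else, including the 503 model_not_found case, as not worth retrying. The sketch below is one way to encode that. `retry_delay`, its `base` and `cap` defaults, and the use of full jitter are choices made for illustration, and it assumes retry-after is given in seconds.

```python
import random

RETRYABLE_STATUS = {429, 500}


def retry_delay(status: int, attempt: int, retry_after=None,
                base: float = 0.5, cap: float = 30.0):
    """Return seconds to wait before retrying, or None if a retry will not help."""
    if status not in RETRYABLE_STATUS:
        return None
    if status == 429 and retry_after is not None:
        try:
            # Assumes the header holds a number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not numeric (e.g. an HTTP date); fall back to backoff
    # Exponential backoff with full jitter, capped at `cap` seconds.
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

A caller would sleep for the returned value and retry, stopping when the function returns None or a maximum attempt count is reached.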
