Using the OpenAI SDK to Talk to Claude (and Gemini, and Grok)
One SDK. One base_url. The OpenAI-compatible adapter that lets your existing Python code reach Anthropic, Google, and xAI models without a rewrite.
The OpenAI Python SDK is the closest thing the LLM world has to a common dialect. It is not the most expressive client and it does not always map perfectly to every provider, but it is the one your codebase, your dependencies, your tracing layer, and your retry policies almost certainly already speak. The interesting question is not whether to use it. The interesting question is whether you can keep using it once your model strategy expands beyond OpenAI.
You can. With BUZZ AI Gateway, the same openai client that calls gpt-5 can call claude-opus-4-8, a Gemini model, or a Grok model through the same call. Apart from swapping in a BUZZ key, only two strings change: the base_url and the model name. This piece is a hands-on look at why that works, what it costs, and the corners where it stops being seamless.
1. Why developers reach for the OpenAI SDK to call Claude
If you have ever maintained a service that talks to two model families, the friction is not the API call itself. It is everything around it.
You already have OpenAI plumbing. Most LLM-touching codebases written since 2023 contain the line from openai import OpenAI. There is a thin wrapper somewhere that sets timeouts, attaches an HTTP client with the right TLS settings, registers a tracer, and adds a retry policy that knows how to back off on 429s. Every additional SDK forces you to either duplicate that wrapper or extract a higher-level abstraction. Both are real engineering work that does not move the product forward.
Frameworks already speak it. LangChain’s ChatOpenAI, LlamaIndex’s OpenAI integration, Vercel AI SDK’s openai provider, and most agent frameworks accept a base_url override. Pointing them at an OpenAI-compatible endpoint takes one line in a config file. Switching them to a foreign SDK takes a feature branch.
Your observability stack is calibrated for it. OpenTelemetry's GenAI semantic conventions map closely onto the OpenAI request and response shape. Langfuse, Helicone, Phoenix, and most homegrown loggers parse chat.completions request and response bodies. When the wire format stays the same, your dashboards stay the same.
The mental model is uniform. Engineers reading the code six months from now do not need to remember which SDK uses messages versus contents, or which one wants system as a top-level field versus the first message in a list. client.chat.completions.create(model="...", messages=[...]) is enough.
None of this means the Anthropic SDK is bad. It means that for the majority of application code, the value of one SDK across all model families is higher than the value of provider-native ergonomics for one of them.
2. The OpenAI-compatible adapter at /v1
BUZZ exposes two protocols on the same key:
- Anthropic Messages at https://buzzai.cc, a drop-in for the official Anthropic SDK and for Claude Code.
- OpenAI chat.completions at https://buzzai.cc/v1, a drop-in for the official OpenAI SDK and for any client that accepts an OpenAI-compatible base URL.
The /v1 endpoint accepts the OpenAI request shape, dispatches to the correct upstream based on the model field, and translates the response back into chat.completions format. The translation is mechanical: messages map to Anthropic content blocks or Gemini contents, tools and tool_choice map to Anthropic tool_use or Gemini functionDeclarations, streaming chunks map to OpenAI delta events, and finish reasons map to stop, length, or tool_calls. The user-visible content stays untouched.
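To make "mechanical" concrete, here is a minimal sketch of two of those mappings in plain Python: lifting system messages out of an OpenAI-style list into a separate system string (the Anthropic Messages shape takes system as a top-level field), and mapping Anthropic stop reasons onto OpenAI finish reasons. The function names and the exact mapping table are assumptions for illustration, not the gateway's actual code.

```python
# Illustrative sketch only: names and the mapping table are assumptions,
# not BUZZ's real implementation.

# Anthropic stop_reason -> OpenAI finish_reason (assumed mapping).
FINISH_REASON_MAP = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}


def split_system(messages):
    """Separate OpenAI-style system messages from the rest.

    Returns (system_text, remaining_messages). Multiple system messages
    are joined with a blank line between them.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return "\n\n".join(system_parts), rest


def to_finish_reason(stop_reason):
    """Map an upstream stop reason to a finish_reason, defaulting to 'stop'."""
    return FINISH_REASON_MAP.get(stop_reason, "stop")


if __name__ == "__main__":
    system, rest = split_system([
        {"role": "system", "content": "You are a precise technical writer."},
        {"role": "user", "content": "Explain content-addressable storage."},
    ])
    print(system)
    print(rest)
    print(to_finish_reason("max_tokens"))
```

Note that neither function touches the message text itself; only the envelope around it changes.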
3. Hands-on: Claude through the OpenAI SDK
Install the regular OpenAI client. There is nothing custom to install for BUZZ.
pip install openai
Configure the client with your BUZZ key and the gateway’s OpenAI-compatible base URL.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_BUZZ_KEY",
    base_url="https://buzzai.cc/v1",
)
resp = client.chat.completions.create(
    model="claude-opus-4-8",
    messages=[
        {"role": "system", "content": "You are a precise technical writer."},
        {"role": "user", "content": "Explain content-addressable storage in three sentences."},
    ],
    temperature=0.3,
    max_tokens=400,
)
print(resp.choices[0].message.content)
That is the entire integration. The model is Claude. The SDK is OpenAI’s. The response object is a ChatCompletion with choices, message, finish_reason, and usage exactly where you expect them.
Streaming
Streaming uses the same flag and the same iteration pattern as a real OpenAI call. Tokens arrive as delta.content on each chunk.
stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a haiku about cold storage."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
The gateway forwards the upstream server-sent event stream as it arrives. Token text is not buffered or altered; only the event envelope is translated into the OpenAI chunk format. If you have a wrapper that consumes iter_lines directly, it still works because the output keeps OpenAI's SSE framing.
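For wrappers that read the raw stream rather than SDK objects, the sketch below shows the line-level parsing this implies. It assumes the standard OpenAI streaming framing (data: lines carrying JSON chunks, terminated by data: [DONE]); the helper name iter_delta_text is hypothetical, and the sample lines are hand-written rather than captured from the gateway.

```python
import json


def iter_delta_text(lines):
    """Yield content strings from OpenAI-style SSE lines.

    `lines` is any iterable of decoded text lines, for example the output
    of an HTTP client's line iterator.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some chunks carry no choices (e.g. usage only)
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text


if __name__ == "__main__":
    sample = [
        'data: {"choices": [{"delta": {"role": "assistant"}}]}',
        "",
        'data: {"choices": [{"delta": {"content": "Cold "}}]}',
        'data: {"choices": [{"delta": {"content": "storage"}}]}',
        "data: [DONE]",
    ]
    print("".join(iter_delta_text(sample)))
```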
Tool use through the OpenAI tools schema
Define tools in the OpenAI shape. The gateway maps them to Anthropic tool_use blocks on the way out and maps the model’s tool selection back to OpenAI tool_calls on the way in.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "units": {"type": "string", "enum": ["c", "f"]},
            },
            "required": ["city"],
        },
    },
}]
resp = client.chat.completions.create(
    model="claude-opus-4-8",
    messages=[{"role": "user", "content": "What is the weather in Tokyo right now?"}],
    tools=tools,
    tool_choice="auto",
)
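# tool_calls is None when the model answers directly instead of calling a tool.
tool_calls = resp.choices[0].message.tool_calls
if tool_calls:
    call = tool_calls[0]
    print(call.function.name, call.function.arguments)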
If your existing OpenAI tool-calling loop already handles tool_calls, parses arguments, runs the function, and appends a {"role": "tool", "tool_call_id": ...} message back to the conversation, that loop continues to work unchanged with Claude on the other end.
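For readers who do not have such a loop yet, here is a minimal sketch of its core step: take one tool call, parse the JSON arguments, run the matching function, and build the role "tool" message to append. To stay runnable without the SDK it works on plain dicts; with real SDK objects you would convert first (for example with the object's model_dump() method). The get_weather implementation, the registry, and the canned return value are all hypothetical.

```python
import json


def get_weather(city, units="c"):
    """Hypothetical local tool: returns canned data instead of calling an API."""
    return {"city": city, "units": units, "temperature": 21}


# Maps tool names (as declared in the tools schema) to Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call):
    """Execute one OpenAI-style tool call and build the follow-up message.

    `tool_call` is a dict shaped like one entry of message.tool_calls.
    """
    name = tool_call["function"]["name"]
    # Arguments arrive as a JSON string, not as a dict.
    args = json.loads(tool_call["function"]["arguments"] or "{}")
    result = TOOL_REGISTRY[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }


if __name__ == "__main__":
    example_call = {
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'},
    }
    print(run_tool_call(example_call))
```

In a full loop, the returned message is appended to the conversation after the assistant message that contained the tool call, and the request is sent again.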
Switching to Gemini or Grok
Routing to a different family is a model-name change. The same client, the same code path, the same retry wrapper.
# Same client, different model.
gemini_resp = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Summarize this 200-page PDF transcript..."}],
)
grok_resp = client.chat.completions.create(
    model="grok-4",
    messages=[{"role": "user", "content": "What happened in markets this week?"}],
)
For the live, authoritative list of supported model identifiers and their per-token rates, see https://buzzai.cc/models and https://buzzai.cc/api/pricing.
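One way to picture "the model name is the routing key" is a prefix lookup like the sketch below. This is an assumption about how such dispatch could work, not a description of BUZZ's routing rules; the prefix table and the family labels are made up for illustration, and the published model list remains the authority on what is accepted.

```python
# Assumed prefix table for illustration; the gateway's real rules may differ.
FAMILY_PREFIXES = (
    ("claude-", "anthropic"),
    ("gpt-", "openai"),
    ("gemini-", "google"),
    ("grok-", "xai"),
)


def family_for(model):
    """Return the upstream family label for a model identifier."""
    for prefix, family in FAMILY_PREFIXES:
        if model.startswith(prefix):
            return family
    raise ValueError(f"unknown model family for {model!r}")


if __name__ == "__main__":
    for name in ("claude-opus-4-8", "gpt-5", "gemini-2.5-pro", "grok-4"):
        print(name, "->", family_for(name))
```

A client-side helper like this can also be handy for tagging traces or metrics by family while the call site stays identical.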
4. What changes versus calling OpenAI directly
Almost nothing in the call site. The differences are surgical.
| Concern | Direct OpenAI | BUZZ /v1 |
|---|---|---|
| base_url | https://api.openai.com/v1 (default) | https://buzzai.cc/v1 |
| api_key | OpenAI key | BUZZ key (one for all families) |
| Model namespace | OpenAI families only | Claude, GPT, Gemini, Grok |
| Request shape | chat.completions | chat.completions |
| Response shape | ChatCompletion | ChatCompletion |
| Streaming | SSE deltas | SSE deltas |
| Tool calling | tools + tool_calls | tools + tool_calls |
| Retries, tracing, logging | Your existing wrapper | Your existing wrapper, unchanged |
The migration is small enough that it usually fits in a single config diff. If your client construction is centralized, it is one file. If it is not, that is a separate refactor you wanted to do anyway.
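A minimal sketch of what centralized client construction can look like, so that the switch really is a config change. The environment variable names LLM_BASE_URL and LLM_API_KEY and the helper client_settings are hypothetical, and the timeout and retry values are placeholders; base_url, api_key, timeout, and max_retries are keyword arguments the OpenAI client constructor accepts.

```python
import os

# Used when no override is configured.
DEFAULT_BASE_URL = "https://api.openai.com/v1"


def client_settings(env=None):
    """Build keyword arguments for an OpenAI-compatible client.

    Reads the hypothetical LLM_BASE_URL and LLM_API_KEY variables from
    `env` (defaults to os.environ). Raises KeyError if the key is missing.
    """
    env = os.environ if env is None else env
    return {
        "base_url": env.get("LLM_BASE_URL", DEFAULT_BASE_URL),
        "api_key": env["LLM_API_KEY"],
        "timeout": 30.0,   # seconds; placeholder value
        "max_retries": 3,  # placeholder value
    }


# Usage, assuming the openai package is installed:
#   from openai import OpenAI
#   client = OpenAI(**client_settings())
```

With this in place, pointing a deployment at the gateway means setting LLM_BASE_URL=https://buzzai.cc/v1 and supplying the BUZZ key, with no change to the calling code.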
Sampling parameters
Common parameters (temperature, top_p, max_tokens, stop, seed where supported, presence_penalty and frequency_penalty where supported) are forwarded to the closest upstream equivalent. Behavior follows the upstream model, not OpenAI. temperature=0.7 against Claude does not produce the same distribution as temperature=0.7 against GPT-5. That is true whether you call through a gateway or directly. Tune per model, not per number.
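"Tune per model" is easier to enforce when the defaults live in one table keyed by model name instead of being scattered across call sites. The sketch below is one way to do that; the helper name and every number in it are placeholders, not recommendations for any model.

```python
# Placeholder values for illustration only; tune these against your own evals.
SAMPLING_DEFAULTS = {
    "claude-opus-4-8": {"temperature": 0.3, "max_tokens": 400},
    "gemini-2.5-pro": {"temperature": 0.6, "max_tokens": 400},
}

# Used for models without an explicit entry.
FALLBACK_SAMPLING = {"temperature": 0.5, "max_tokens": 400}


def sampling_for(model, **overrides):
    """Return sampling kwargs for `model`, with per-call overrides applied."""
    params = dict(SAMPLING_DEFAULTS.get(model, FALLBACK_SAMPLING))
    params.update(overrides)
    return params


# Usage with an existing client:
#   client.chat.completions.create(
#       model=model, messages=messages, **sampling_for(model)
#   )
```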
5. What does not translate cleanly
The chat.completions schema is a lowest-common-denominator format. Most application code only ever needs that surface. A few features sit outside it.
Anthropic extended thinking. Claude can return a thinking block alongside the answer. There is no canonical place for that block in the OpenAI response, so it is not exposed through the /v1 adapter. If your app actually consumes the thinking trace, call the same gateway with the Anthropic SDK at https://buzzai.cc and read the structured response.
Anthropic prompt caching. Cache-control markers on specific content blocks (the cache_control field) are an Anthropic-native concept. The OpenAI request shape has no field for them. To use prompt caching with Claude, use the Anthropic SDK against the gateway.
Gemini-specific safety and grounding controls. Per-category safety thresholds, grounding-with-search, and similar Gemini-only knobs do not have chat.completions equivalents. They exist only in the native Google protocol, so the /v1 adapter cannot carry them.
Provider-native streaming events. Anthropic emits typed events like content_block_start, content_block_delta, and message_delta. The OpenAI adapter collapses these into chat.completion.chunk deltas. Token text is preserved; the typed structure is not. If you specifically need the Anthropic event taxonomy (for example, to render a thinking block separately from the answer), use the Anthropic SDK.
The pragmatic pattern is to keep the OpenAI SDK as the default for application code and add the Anthropic SDK as a second client, against the same key and the same gateway, only in the few places where a Claude-native feature actually matters. pip install anthropic, base_url="https://buzzai.cc", done.
from anthropic import Anthropic

ant = Anthropic(
    api_key="YOUR_BUZZ_KEY",
    base_url="https://buzzai.cc",
)
msg = ant.messages.create(
    model="claude-opus-4-8",
    # max_tokens must be larger than the thinking budget below.
    max_tokens=16000,
    messages=[{"role": "user", "content": "Plan a refactor of this module..."}],
    extra_body={"thinking": {"type": "enabled", "budget_tokens": 8000}},
)
Same key. Same gateway. Different protocol when, and only when, you need it.
6. A note on cost
The unusual property of this setup is that the OpenAI-compatible adapter is not a wrapper around OpenAI. It is a wrapper that can dispatch to any supported family, and it bills at BUZZ rates rather than first-party rates. The same script that pointed at https://api.openai.com/v1 yesterday and ran gpt-5 will, after pointing at https://buzzai.cc/v1, run the same prompt at a meaningfully lower cost. No code change beyond the URL.
BUZZ does not publish a fixed discount percentage because the savings vary by model and by token mix. The honest answer is to read the live numbers. Per-model rates for every supported family, including all Claude, GPT, Gemini, and Grok variants, are at https://buzzai.cc/api/pricing. Compare them against the upstream provider’s pricing page, multiply by your monthly token volume, and the spreadsheet will give you a real number.
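The spreadsheet arithmetic is simple enough to sketch in a few lines. Every rate and volume below is a made-up placeholder chosen to keep the numbers round; none of them is a real price from BUZZ or from any provider. Substitute the figures from the two pricing pages and your own token counts.

```python
def monthly_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Return cost in dollars; rates are dollars per million tokens."""
    input_cost = (input_tokens / 1_000_000) * input_rate
    output_cost = (output_tokens / 1_000_000) * output_rate
    return input_cost + output_cost


if __name__ == "__main__":
    # PLACEHOLDER numbers only, not real prices or real traffic.
    volume = {"input_tokens": 500_000_000, "output_tokens": 100_000_000}
    upstream = monthly_cost(**volume, input_rate=10.0, output_rate=30.0)
    gateway = monthly_cost(**volume, input_rate=8.0, output_rate=24.0)
    print(f"upstream: ${upstream:,.2f}")
    print(f"gateway:  ${gateway:,.2f}")
    print(f"saving:   ${upstream - gateway:,.2f}")
```

Input and output tokens are priced separately, which is why the token mix matters as much as the total volume.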
Claude Code on the same key
If you also use Claude Code as your terminal coding agent, the same BUZZ key works there. One install command points the CLI at the gateway:
curl -fsSL https://buzzai.cc/sh/claudecode.sh | bash
Your application code talks to https://buzzai.cc/v1 through the OpenAI SDK. Claude Code talks to https://buzzai.cc through the Anthropic protocol. Both bills land on the same key and the same dashboard.
7. FAQ
Can I really call Claude with the official OpenAI Python SDK?
Yes. Set base_url="https://buzzai.cc/v1", set api_key to your BUZZ key, and pass model="claude-opus-4-8" (or any other supported identifier) to client.chat.completions.create. The SDK does not need to know that the upstream is Anthropic. The gateway translates between chat.completions and the Anthropic Messages schema for you.
Does streaming still work?
Yes. Pass stream=True. The gateway forwards SSE deltas in the OpenAI chunk format. Iterating over the response yields chunk.choices[0].delta.content exactly like a real OpenAI stream, with no buffering on the gateway side.
What about tool calling and function calling?
Tool calls work through the standard OpenAI tools and tool_choice parameters. The gateway maps them to Anthropic tool_use blocks or Gemini functionDeclarations on the way out, and maps the upstream tool selection back to OpenAI tool_calls on the way in. Existing tool-calling loops continue to work.
Do I need to change my retry, logging, or instrumentation code?
No. Anything that wraps the OpenAI client, including LangChain, LlamaIndex, OpenTelemetry instrumentations, retry decorators, and request loggers, keeps working unchanged. Only base_url and the model name are different.
Are temperature, max_tokens, and other sampling parameters honored?
Standard chat.completions parameters (temperature, top_p, max_tokens, stop, presence_penalty, frequency_penalty, seed where the upstream supports it) are forwarded to the closest equivalent. Behavior matches the upstream model, not OpenAI semantics, so identical numeric values can produce different distributions across families. Tune per model.
What does NOT translate cleanly through the OpenAI adapter?
Provider-native features that have no canonical place in chat.completions: Anthropic extended thinking blocks, Anthropic prompt-cache control, Gemini-specific safety and grounding controls, and provider-typed streaming events. When you need the Claude-native ones, call the same gateway through the Anthropic SDK at https://buzzai.cc instead. The same key works on both protocols.
How do model names work across families?
The model parameter is a routing key. Identifiers like claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5, gpt-5, gpt-5.5, gpt-5.4, gpt-5.4-mini, plus current Gemini and Grok names all resolve to the correct upstream. The full live list is published at https://buzzai.cc/models.
How does pricing compare to calling OpenAI directly?
BUZZ rates are meaningfully below first-party pricing across the supported families. Because only the base_url changes, the same code path costs less without an application change. Live per-model rates are at https://buzzai.cc/api/pricing.
Can I use the same key with Claude Code?
Yes. curl -fsSL https://buzzai.cc/sh/claudecode.sh | bash installs Claude Code preconfigured for the gateway. The CLI uses the Anthropic protocol against https://buzzai.cc with the same key you use from the OpenAI SDK against https://buzzai.cc/v1.
Is this a real proxy, or does the gateway rewrite my prompt?
BUZZ forwards request and response bodies transparently. It does not modify system prompts, inject instructions, or silently swap models. The only transformation is the schema bridge between chat.completions and the native upstream protocol, and that bridge touches structure, not content.
8. Conclusion
The OpenAI Python SDK has won the lingua franca race for application-side LLM code, and that is fine. It is good enough for the vast majority of what production code needs to do: send messages, stream tokens, call tools, surface usage. The mistake is treating it as a contract that locks you to one model family. Once a gateway speaks the same protocol on the way in and dispatches to whichever upstream you name, the SDK becomes a transport detail and the model name becomes the choice.
Pointing the OpenAI SDK at https://buzzai.cc/v1 turns one line of configuration into access to every major model family at lower cost, with zero data retention, with streaming and tool use intact, and without disturbing the wrappers your codebase has accumulated. When a Claude-native feature like extended thinking actually matters, the Anthropic SDK is one extra dependency away on the same key. The boring parts stay boring, which is the only honest goal of infrastructure work.
Drop the URL into your client constructor. Change the model name. Run your existing tests. The migration is the kind that takes a coffee, not a sprint.