Using the OpenAI SDK to Talk to Claude (and Gemini, and Grok)

One SDK. One base_url. The OpenAI-compatible adapter that lets your existing Python code reach Anthropic, Google, and xAI models without a rewrite.

Tags: OpenAI SDK, Claude, Gemini, Grok, chat.completions, Python, Streaming, Tool Use

By BUZZ AI Gateway Engineering · 11 minute read

The OpenAI Python SDK is the closest thing the LLM world has to a common dialect. It is not the most expressive client and it does not always map perfectly to every provider, but it is the one your codebase, your dependencies, your tracing layer, and your retry policies almost certainly already speak. The interesting question is not whether to use it. The interesting question is whether you can keep using it once your model strategy expands beyond OpenAI.

You can. With BUZZ AI Gateway, the same openai client that calls gpt-5 can call claude-opus-4-8, a Gemini model, or a Grok model with the same line of code. Relative to a direct OpenAI setup, the call site changes in two strings, the base_url and the model name, and the api_key becomes your BUZZ key. This piece is a hands-on look at why that works, what it costs, and the corners where it stops being seamless.

1. Why developers reach for the OpenAI SDK to call Claude

If you have ever maintained a service that talks to two model families, the friction is not the API call itself. It is everything around it.

You already have OpenAI plumbing. Most LLM-touching codebases written since 2023 import from openai import OpenAI. There is a thin wrapper somewhere that sets timeouts, attaches an HTTP client with the right TLS settings, registers a tracer, and adds a retry policy that knows how to back off on 429s. Every additional SDK forces you to either duplicate that wrapper or extract a higher-level abstraction. Both are real engineering work that does not move product forward.

Frameworks already speak it. LangChain’s ChatOpenAI, LlamaIndex’s OpenAI integration, Vercel AI SDK’s openai provider, and most agent frameworks accept a base_url override. Pointing them at an OpenAI-compatible endpoint takes one line in a config file. Switching them to a foreign SDK takes a feature branch.

Your observability stack is calibrated for it. OpenTelemetry semantic conventions for GenAI are written around the OpenAI shape. Langfuse, Helicone, Phoenix, and most internal homegrown loggers parse chat.completions request and response bodies. When the wire format stays the same, your dashboards stay the same.

The mental model is uniform. Engineers reading the code six months from now do not need to remember which SDK uses messages versus contents, or which one wants system as a top-level field versus the first message in a list. client.chat.completions.create(model="...", messages=[...]) is enough.

None of this means the Anthropic SDK is bad. It means that for the majority of application code, the value of one SDK across all model families is higher than the value of provider-native ergonomics for one of them.

2. The OpenAI-compatible adapter at `/v1`

BUZZ exposes two protocols on the same key:

Anthropic Messages at https://buzzai.cc — drop-in for the official Anthropic SDK and for Claude Code.
OpenAI chat.completions at https://buzzai.cc/v1 — drop-in for the official OpenAI SDK and for any client that accepts an OpenAI-compatible base URL.

The /v1 endpoint accepts the OpenAI request shape, dispatches to the correct upstream based on the model field, and translates the response back into chat.completions format. The translation is mechanical: messages map to Anthropic content blocks or Gemini contents, tools and tool_choice map to Anthropic tool_use or Gemini functionDeclarations, streaming chunks map to OpenAI delta events, and finish reasons map to stop, length, or tool_calls. The user-visible content stays untouched.
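To make the mapping concrete, here is a rough sketch of the kind of translation described above. It is illustrative only and is not the gateway's actual code: the function name and the default max_tokens are assumptions, and it handles only plain-text system, user, and assistant messages.

```python
# Illustrative only: a sketch of the mapping, not the gateway's implementation.

# Anthropic stop_reason -> OpenAI finish_reason
FINISH_REASON = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}


def to_anthropic_request(openai_request: dict) -> dict:
    """Map a minimal chat.completions request onto the Anthropic Messages shape."""
    system_parts = []
    messages = []
    for msg in openai_request["messages"]:
        if msg["role"] == "system":
            # Anthropic takes the system prompt as a top-level field.
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})

    body = {
        "model": openai_request["model"],
        "messages": messages,
        # The Messages API requires max_tokens; 1024 is an arbitrary default here.
        "max_tokens": openai_request.get("max_tokens", 1024),
    }
    if system_parts:
        body["system"] = "\n\n".join(system_parts)
    if "temperature" in openai_request:
        body["temperature"] = openai_request["temperature"]
    return body
```

The point of the sketch is that the translation moves fields between positions; the text of each message passes through unchanged.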

Transparent forwarding. The adapter does not modify your system prompt, inject instructions, silently substitute models, or buffer streamed responses. It does the protocol translation and nothing else. Browse the live model catalog at https://buzzai.cc/models.

3. Hands-on: Claude through the OpenAI SDK

Install the regular OpenAI client. There is nothing custom to install for BUZZ.

pip install openai

Configure the client with your BUZZ key and the gateway’s OpenAI-compatible base URL.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_BUZZ_KEY",
    base_url="https://buzzai.cc/v1",
)

resp = client.chat.completions.create(
    model="claude-opus-4-8",
    messages=[
        {"role": "system", "content": "You are a precise technical writer."},
        {"role": "user", "content": "Explain content-addressable storage in three sentences."},
    ],
    temperature=0.3,
    max_tokens=400,
)

print(resp.choices[0].message.content)

That is the entire integration. The model is Claude. The SDK is OpenAI’s. The response object is a ChatCompletion with choices, message, finish_reason, and usage exactly where you expect them.
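If you log these fields, a small helper along the following lines may be handy. It is a sketch: summarize is a hypothetical name, and it assumes only the standard ChatCompletion attributes mentioned above.

```python
def summarize(resp) -> dict:
    """Pull the commonly logged fields out of a ChatCompletion-shaped object."""
    choice = resp.choices[0]
    usage = resp.usage  # may be None if no usage was reported
    return {
        "text": choice.message.content,
        "finish_reason": choice.finish_reason,
        "prompt_tokens": usage.prompt_tokens if usage else None,
        "completion_tokens": usage.completion_tokens if usage else None,
        "total_tokens": usage.total_tokens if usage else None,
    }
```

Calling summarize(resp) on the response from the example above should give a plain dict that is easy to hand to a logger or tracer.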

Streaming

Streaming uses the same flag and the same iteration pattern as a real OpenAI call. Tokens arrive as delta.content on each chunk.

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a haiku about cold storage."}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue  # some chunks (for example a usage-only chunk) carry no choices
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

The gateway forwards the upstream server-sent event stream as it arrives. Tokens are not buffered or rewritten. If you have a wrapper that consumes iter_lines directly, that still works because the SSE framing is preserved.
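When you want the full text rather than incremental printing, the same iteration can be wrapped in a helper. This is a minimal sketch: collect_stream is a hypothetical name, and it assumes only that each chunk has the standard choices[0].delta.content shape.

```python
def collect_stream(stream) -> str:
    """Concatenate delta.content from an iterable of chat.completion.chunk objects."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            # Some chunks (for example a trailing usage chunk) carry no choices.
            continue
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
    return "".join(parts)
```

Because the helper only iterates, it should work on a stream returned by client.chat.completions.create(..., stream=True) regardless of which upstream family served the request.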

Tool use through the OpenAI tools schema

Define tools in the OpenAI shape. The gateway maps them to Anthropic tool_use blocks on the way out and maps the model’s tool selection back to OpenAI tool_calls on the way in.

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "units": {"type": "string", "enum": ["c", "f"]},
            },
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="claude-opus-4-8",
    messages=[{"role": "user", "content": "What is the weather in Tokyo right now?"}],
    tools=tools,
    tool_choice="auto",
)

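tool_calls = resp.choices[0].message.tool_calls
if tool_calls:  # None when the model answers without calling a tool
    call = tool_calls[0]
    # arguments is a JSON-encoded string
    print(call.function.name, call.function.arguments)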

If your existing OpenAI tool-calling loop already handles tool_calls, parses arguments, runs the function, and appends a {"role": "tool", "tool_call_id": ...} message back to the conversation, that loop continues to work unchanged with Claude on the other end.
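For readers who do not have such a loop yet, one step of it might look like the sketch below. The names run_tool_calls and registry are hypothetical, and get_weather is a stub standing in for a real lookup.

```python
import json


def run_tool_calls(message, registry: dict) -> list:
    """Execute each requested tool and build the role="tool" replies."""
    replies = []
    for call in message.tool_calls or []:
        func = registry[call.function.name]
        # arguments arrives as a JSON-encoded string
        args = json.loads(call.function.arguments)
        result = func(**args)
        replies.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
    return replies


def get_weather(city: str, units: str = "c") -> dict:
    # Hypothetical stub; a real implementation would call a weather service.
    return {"city": city, "units": units, "temp": 21}
```

To continue the conversation, append the assistant message and then the replies from run_tool_calls(resp.choices[0].message, {"get_weather": get_weather}) to messages, and call client.chat.completions.create again with the same tools.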

Switching to Gemini or Grok

Routing to a different family is a model-name change. The same client, the same code path, the same retry wrapper.

# Same client, different model.
gemini_resp = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Summarize this 200-page PDF transcript..."}],
)

grok_resp = client.chat.completions.create(
    model="grok-4",
    messages=[{"role": "user", "content": "What happened in markets this week?"}],
)

For the live, authoritative list of supported model identifiers and their per-token rates, see https://buzzai.cc/models and https://buzzai.cc/api/pricing.
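Since only the model string varies, comparing families on one prompt can be a short loop. The helpers below are a sketch with hypothetical names (ask, ask_each); they take the client as an argument and use nothing beyond chat.completions.create.

```python
def ask(client, model: str, prompt: str) -> str:
    """Send one user prompt to the named model and return the reply text."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def ask_each(client, models: list, prompt: str) -> dict:
    """Run the same prompt against several models; only the model string varies."""
    return {model: ask(client, model, prompt) for model in models}
```

For example, ask_each(client, ["claude-sonnet-4-6", "gemini-2.5-pro", "grok-4"], "Define idempotency in one sentence.") would return one reply per model name, assuming all three identifiers are currently listed in the model catalog.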

4. What changes versus calling OpenAI directly

Almost nothing in the call site. The differences are surgical.

| Concern | Direct OpenAI | BUZZ /v1 |
| --- | --- | --- |
| `base_url` | `https://api.openai.com/v1` (default) | `https://buzzai.cc/v1` |
| `api_key` | OpenAI key | BUZZ key (one for all families) |
| `model` namespace | OpenAI families only | Claude, GPT, Gemini, Grok |
| Request shape | `chat.completions` | `chat.completions` |
| Response shape | `ChatCompletion` | `ChatCompletion` |
| Streaming | SSE deltas | SSE deltas |
| Tool calling | `tools` + `tool_calls` | `tools` + `tool_calls` |
| Retries, tracing, logging | Your existing wrapper | Your existing wrapper, unchanged |

The migration is small enough that it usually fits in a single config diff. If your client construction is centralized, it is one file. If it is not, that is a separate refactor you wanted to do anyway.
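One way to keep client construction centralized is to build the constructor arguments in a single function. The sketch below assumes hypothetical environment variable names (BUZZ_API_KEY, LLM_BASE_URL, LLM_TIMEOUT_SECONDS, LLM_MAX_RETRIES) and arbitrary defaults; api_key, base_url, timeout, and max_retries are regular OpenAI client options.

```python
import os


def client_kwargs(env=os.environ) -> dict:
    """Build OpenAI(...) constructor arguments from the environment."""
    return {
        # The variable names below are hypothetical; use whatever your config defines.
        "api_key": env["BUZZ_API_KEY"],
        "base_url": env.get("LLM_BASE_URL", "https://buzzai.cc/v1"),
        "timeout": float(env.get("LLM_TIMEOUT_SECONDS", "60")),
        "max_retries": int(env.get("LLM_MAX_RETRIES", "2")),
    }
```

With this in place, client = OpenAI(**client_kwargs()) is the only construction site, and switching between a direct OpenAI setup and the gateway becomes an environment change rather than a code change.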

Sampling parameters

Common parameters (temperature, top_p, max_tokens, stop, seed where supported, presence_penalty and frequency_penalty where supported) are forwarded to the closest upstream equivalent. Behavior follows the upstream model, not OpenAI. temperature=0.7 against Claude does not produce the same distribution as temperature=0.7 against GPT-5. That is true whether you call through a gateway or directly. Tune per model, not per number.
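Tuning per model can be as simple as keeping sampling settings in a table keyed by family prefix. The sketch below uses placeholder values that are purely hypothetical and are not recommendations; sampling_for is an illustrative name.

```python
# Hypothetical values: placeholders to show the structure, not recommendations.
SAMPLING_BY_FAMILY = {
    "claude-": {"temperature": 0.3, "max_tokens": 400},
    "gpt-": {"temperature": 0.7, "max_tokens": 400},
    "gemini-": {"temperature": 0.5, "max_tokens": 400},
}
DEFAULT_SAMPLING = {"max_tokens": 400}


def sampling_for(model: str) -> dict:
    """Return the sampling kwargs tuned for the family that `model` belongs to."""
    for prefix, params in SAMPLING_BY_FAMILY.items():
        if model.startswith(prefix):
            return dict(params)  # copy so callers cannot mutate the table
    return dict(DEFAULT_SAMPLING)
```

A call site can then spread the result into the request, for example client.chat.completions.create(model=m, messages=msgs, **sampling_for(m)).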

5. What does not translate cleanly

The chat.completions schema is a lowest-common-denominator format. Most application code only ever needs that surface. A few features sit outside it.

Anthropic extended thinking. Claude can return a thinking block alongside the answer. There is no canonical place for that block in the OpenAI response, so it is not exposed through the /v1 adapter. If your app actually consumes the thinking trace, call the same gateway with the Anthropic SDK at https://buzzai.cc and read the structured response.

Anthropic prompt caching. Caching is controlled by cache_control markers on specific content blocks, which is an Anthropic-native concept. The OpenAI request shape has no field for them. To use prompt caching with Claude, use the Anthropic SDK against the gateway.
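As a sketch of what that looks like in the Anthropic request shape, the helper below builds messages.create arguments with a cache_control marker on the system block. The function name and the max_tokens value are assumptions for illustration.

```python
def cached_system_request(model: str, long_context: str, question: str) -> dict:
    """Build messages.create(...) kwargs that mark the system block as cacheable."""
    return {
        "model": model,
        "max_tokens": 1024,  # arbitrary value for the sketch
        "system": [
            {
                "type": "text",
                "text": long_context,
                # Marks the prompt prefix ending at this block as cacheable.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": question}],
    }
```

With an Anthropic client whose base_url is https://buzzai.cc, the call would be client.messages.create(**cached_system_request("claude-opus-4-8", reference_text, "What does section 3 require?")), where reference_text is whatever large, stable context you want reused across requests.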

Gemini-specific safety and grounding controls. Per-category safety thresholds, grounding-with-search, and similar Gemini-only knobs do not have chat.completions equivalents. They round-trip through the native Google protocol.

Provider-native streaming events. Anthropic emits typed events like content_block_start, content_block_delta, and message_delta. The OpenAI adapter collapses these into chat.completion.chunk deltas. Token text is preserved; the typed structure is not. If you specifically need the Anthropic event taxonomy (for example, to render a thinking block separately from the answer), use the Anthropic SDK.
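For a sense of what the typed structure buys you, the sketch below separates thinking text from answer text in an Anthropic event stream. split_stream is a hypothetical helper; it relies on content_block_delta events carrying a delta whose type is thinking_delta or text_delta.

```python
def split_stream(events) -> tuple:
    """Separate thinking text from answer text in an Anthropic event stream."""
    thinking, answer = [], []
    for event in events:
        if event.type != "content_block_delta":
            continue  # skip content_block_start, message_delta, and similar events
        delta = event.delta
        if delta.type == "thinking_delta":
            thinking.append(delta.thinking)
        elif delta.type == "text_delta":
            answer.append(delta.text)
    return "".join(thinking), "".join(answer)
```

Through the OpenAI adapter the same response would arrive as a single run of delta.content strings, which is why this kind of separation needs the Anthropic protocol.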

The pragmatic pattern is to keep the OpenAI SDK as the default for application code and add the Anthropic SDK as a second client, against the same key and the same gateway, only in the few places where a Claude-native feature actually matters. pip install anthropic, base_url="https://buzzai.cc", done.

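from anthropic import Anthropic

ant = Anthropic(
    api_key="YOUR_BUZZ_KEY",
    base_url="https://buzzai.cc",
)

msg = ant.messages.create(
    model="claude-opus-4-8",
    # max_tokens must be larger than the thinking budget, which counts toward it.
    max_tokens=16000,
    messages=[{"role": "user", "content": "Plan a refactor of this module..."}],
    thinking={"type": "enabled", "budget_tokens": 8000},
)

# The response content is a list of typed blocks: thinking first, then text.
for block in msg.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking)
    elif block.type == "text":
        print(block.text)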

Same key. Same gateway. Different protocol when, and only when, you need it.

6. A note on cost

The unusual property of this setup is that the OpenAI-compatible adapter is not a wrapper around OpenAI. It is a wrapper that can dispatch to any supported family, and it bills at BUZZ rates rather than first-party rates. The same script that pointed at https://api.openai.com/v1 yesterday and ran gpt-5 will, after pointing at https://buzzai.cc/v1, run the same prompt at a meaningfully lower cost. No code change beyond the URL.

BUZZ does not publish a fixed discount percentage because the savings vary by model and by token mix. The honest answer is to read the live numbers. Per-model rates for every supported family, including all Claude, GPT, Gemini, and Grok variants, are at https://buzzai.cc/api/pricing. Compare them against the upstream provider’s pricing page, multiply by your monthly token volume, and the spreadsheet will give you a real number.
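The spreadsheet arithmetic is simple enough to sketch in a few lines. The functions below assume rates quoted in dollars per million tokens, which you should confirm against the units on each pricing page; the names are illustrative and no real rates are implied.

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost in dollars, with rates quoted per million tokens."""
    return ((input_tokens / 1_000_000) * input_rate
            + (output_tokens / 1_000_000) * output_rate)


def monthly_savings(input_tokens: int, output_tokens: int,
                    first_party: tuple, gateway: tuple) -> float:
    """Difference between two (input_rate, output_rate) pairs at the same volume."""
    return (monthly_cost(input_tokens, output_tokens, *first_party)
            - monthly_cost(input_tokens, output_tokens, *gateway))
```

Plug in your own monthly input and output token counts and the two published rate pairs for the model you use. Input and output are kept separate because the two rates usually differ, so the token mix affects the result.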

Zero data retention. Prompts and completions are not written to disk, database, or logs. Only token counts and model identifiers are kept for billing. The protocol bridge is in-memory and the payload is gone the moment the response finishes.

Claude Code on the same key

If you also use Claude Code as your terminal coding agent, the same BUZZ key works there. One install command points the CLI at the gateway:

curl -fsSL https://buzzai.cc/sh/claudecode.sh | bash

Your application code talks to https://buzzai.cc/v1 through the OpenAI SDK. Claude Code talks to https://buzzai.cc through the Anthropic protocol. Both bills land on the same key and the same dashboard.

7. FAQ

Can I really call Claude with the official OpenAI Python SDK?

Yes. Set base_url="https://buzzai.cc/v1", set api_key to your BUZZ key, and pass model="claude-opus-4-8" (or any other supported identifier) to client.chat.completions.create. The SDK does not need to know that the upstream is Anthropic. The gateway translates between chat.completions and the Anthropic Messages schema for you.

Does streaming still work?

Yes. Pass stream=True. The gateway forwards SSE deltas in the OpenAI chunk format. Iterating over the response yields chunk.choices[0].delta.content exactly like a real OpenAI stream, with no buffering on the gateway side.

What about tool calling and function calling?

Tool calls work through the standard OpenAI tools and tool_choice parameters. The gateway maps them to Anthropic tool_use blocks or Gemini functionDeclarations on the way out, and maps the upstream tool selection back to OpenAI tool_calls on the way in. Existing tool-calling loops continue to work.

Do I need to change my retry, logging, or instrumentation code?

No. Anything that wraps the OpenAI client, including LangChain, LlamaIndex, OpenTelemetry instrumentations, retry decorators, and request loggers, keeps working unchanged. Only base_url and the model name are different.

Are `temperature`, `max_tokens`, and other sampling parameters honored?

Standard chat.completions parameters (temperature, top_p, max_tokens, stop, presence_penalty, frequency_penalty, seed where the upstream supports it) are forwarded to the closest equivalent. Behavior matches the upstream model, not OpenAI semantics, so identical numeric values can produce different distributions across families. Tune per model.

What does NOT translate cleanly through the OpenAI adapter?

Provider-native features without a canonical place in chat.completions: Anthropic extended thinking blocks, Anthropic prompt-cache control, Gemini-specific safety and grounding controls, and provider-typed streaming events. When you need those, call the same gateway through the Anthropic SDK at https://buzzai.cc instead. The same key works on both protocols.

How do model names work across families?

The model parameter is a routing key. Identifiers like claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5, gpt-5, gpt-5.5, gpt-5.4, gpt-5.4-mini, plus current Gemini and Grok names all resolve to the correct upstream. The full live list is published at https://buzzai.cc/models.

How does pricing compare to calling OpenAI directly?

BUZZ rates are meaningfully below first-party pricing across the supported families. Because only the base_url changes, the same code path costs less without an application change. Live per-model rates are at https://buzzai.cc/api/pricing.

Can I use the same key with Claude Code?

Yes. curl -fsSL https://buzzai.cc/sh/claudecode.sh | bash installs Claude Code preconfigured for the gateway. The CLI uses the Anthropic protocol against https://buzzai.cc with the same key you use from the OpenAI SDK against https://buzzai.cc/v1.

Is this a real proxy, or does the gateway rewrite my prompt?

BUZZ forwards request and response bodies transparently. It does not modify system prompts, inject instructions, or silently swap models. The only transformation is the schema bridge between chat.completions and the native upstream protocol, and that bridge touches structure, not content.

8. Conclusion

The OpenAI Python SDK has become the lingua franca for application-side LLM code, and that is fine. It is good enough for the vast majority of what production code needs to do: send messages, stream tokens, call tools, surface usage. The mistake is treating it as a contract that locks you to one model family. Once a gateway speaks the same protocol on the way in and dispatches to whichever upstream you name, the SDK becomes a transport detail and the model name becomes the choice.

Pointing the OpenAI SDK at https://buzzai.cc/v1 turns one line of configuration into access to every major model family at lower cost, with zero data retention, with streaming and tool use intact, and without disturbing the wrappers your codebase has accumulated. When a Claude-native feature like extended thinking actually matters, the Anthropic SDK is one extra dependency away on the same key. The boring parts stay boring, which is the only honest goal of infrastructure work.

Drop the URL into your client constructor. Change the model name. Run your existing tests. The migration is the kind that takes a coffee, not a sprint.

Published: 2026-05-22 · Last reviewed: 2026-05-22
