What is the difference between a Claude relay station and an AI gateway?

A traditional Claude relay station typically modifies your request to cut its own costs: injecting hidden system prompts, trimming context, or silently swapping in a cheaper model. An AI gateway like BUZZ does transparent forwarding. The bytes you send are the bytes Anthropic receives, and the bytes Anthropic returns are the bytes you get back. Prompt caching, tool use, and streaming all stay byte-for-byte faithful.

Home › Blog › BUZZ vs Traditional Claude Relay Stations

BUZZ vs Traditional Claude Relay Stations: Why Claude Code Users Are Switching

Q: How do I point Claude Code at BUZZ?

Two environment variables: export ANTHROPIC_BASE_URL=https://buzzai.cc and export ANTHROPIC_AUTH_TOKEN= . Nothing else in Claude Code's config has to change. BUZZ is fully compatible with Anthropic /v1/messages, so prompt caching kicks in automatically across long sessions.

Q: Does BUZZ support prompt caching?

Yes, natively. The cache_control: {type: 'ephemeral'} marker in your request is forwarded to Anthropic untouched. Cache hits are computed by the upstream cluster, and the latency and price discount you see match Anthropic's published numbers. In long Claude Code sessions, cache hit rates above 90% are typical.

Q: Does BUZZ store my prompts or responses?

Zero retention. Request and response bodies are never written to disk, never enter any training pipeline, and are never reviewed by humans. Audit logs only contain token counts, status codes, latency, and a user identifier. They contain no prompt text and no completion text.

If you have been running Claude Code through a relay station and something feels off, context shrinking on its own, prompt cache never hitting, tool use sequences breaking mid-loop, long sessions getting strangely expensive, here is the uncomfortable framing: most relay stations are not gateways. They are request mutators. This article walks through the nine axes where BUZZ and traditional relay stations differ, and why heavy Claude Code users keep migrating off the latter.

2026-05-26 · About 8 minutes · For Claude API and Claude Code users

90B+Tokens served

187 msAvg latency

99.9%30-day uptime

12Edge nodes

What "traditional Claude relay station" actually means

Over the last couple of years, an entire layer of "Claude relay stations," "Claude mirrors," and "Anthropic API proxies" has sprung up. The problem they solve is real: not every developer wants to deal with Anthropic's account, billing, and capacity surface directly. Relay stations promise a lower-friction path.

The problem is how most of them make money. The cheapest path to margin is to quietly cut costs at your expense:

Inject hidden system prompts. A short pre-amble like "do not generate code longer than 100 lines" can shave 20% off output tokens that the relay would have to pay for.
Trim context. Long inputs get clipped at some internal threshold. You think you sent 50K tokens of context, the model actually sees 20K.
Swap models silently. You request claude-opus-4-8, the relay routes to claude-haiku-4-5, and the response object still claims "model": "claude-opus-4-8".
Strip prompt caching. The relay removes cache_control markers because maintaining cache state in their own infrastructure is hard. The cost lands on you, every long session pays full input price every turn.
Buffer streaming. Server-Sent Events get re-batched into a single response. Claude Code feels like it stalls for several seconds, then dumps everything at once.

None of this is visible from the outside. You only notice when long Claude Code sessions cost more than they should, when tool calls fail mid-loop, or when the assistant produces output that does not match the prompt you wrote.

Nine-axis comparison: BUZZ vs traditional relay stations

Axis	BUZZ AI Gateway	Traditional relay station
Request forwarding	Byte-level transparent, 1:1	Injects system prompts, trims context
Prompt caching	Native `cache_control` passthrough	Usually stripped, never hits
Function calling / tool use	Full `tool_use` / `tool_result` loop preserved	Long loops break, JSON occasionally rewritten
Streaming (SSE)	All event types pass through faithfully	Some relays buffer and dump in batches
SDK compatibility	Anthropic SDK and OpenAI SDK both work	Usually OpenAI-compatible path only
Claude Code support	Two env vars and you are done	Unsupported or needs a wrapper
Data retention	Zero retention, only token counts logged	Requests and responses often retained
Billing model	Pay-per-token, no monthly fee	Monthly plans or traffic packages
Pricing transparency	Public rates, cache discount auto-applied	Opaque, often hides a markup

1. Transparent forwarding (not a single byte changed)

The contract BUZZ is built around is transparent forwarding. The request bytes you send and the bytes the upstream model receives are identical. The bytes the model returns and the bytes you get back are identical.

The model field is never substituted.
The messages array is never trimmed.
No hidden system prompt is injected.
tool_use JSON is never rewritten.
Streaming events are not re-batched.

How to verify: send the same request through BUZZ and directly to Anthropic, then compare usage.input_tokens and usage.output_tokens on the responses. They must match. If a relay's input token count is lower, it trimmed your context. If higher, it injected something. The numbers do not lie.

2. Native prompt caching (the lever for cheap long sessions)

Anthropic's prompt caching is the single biggest reason Claude Code can stay affordable across long sessions. BUZZ leaves cache_control alone, so the cache is computed inside Anthropic's cluster and you get the same discount you would get from a direct integration.

// Your code does not change
const message = await anthropic.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are a code review assistant. Project context follows... (20K tokens)",
      cache_control: { type: "ephemeral" }   // forwarded by BUZZ untouched
    }
  ],
  messages: [...]
});

console.log(message.usage);
// {
//   input_tokens: 50,
//   cache_creation_input_tokens: 20000,  // first call writes the cache
//   cache_read_input_tokens: 0,
//   output_tokens: 200
// }
//
// Second call with the same cached prefix:
// cache_read_input_tokens: 20000  // hit, billed at 1/10 of input

Many relays strip cache_control outright because their architecture cannot maintain cache state across hops. Every call ends up billed at full input price. For long Claude Code sessions, that is a 10x to 50x cost difference over the life of a project.

3. Function calling and tool use, full fidelity

Claude Code's core mechanic is the tool_use to tool_result loop, often nested across many turns. Any relay that "cleans" the JSON schema breaks Claude Code immediately:

Error: tool_use_id "toolu_01ABC..." not found in messages

BUZZ does not parse tool_use bodies. It forwards them as bytes. Parallel tool calls, nested calls, and custom user-defined tools all behave exactly as Anthropic documents.

4. Full streaming surface

Every Server-Sent Events event type is preserved end to end:

message_start, message_delta, message_stop
content_block_start, content_block_delta, content_block_stop
ping heartbeats
error events

Tokens are pushed as they arrive, no buffering, no batching. The typewriter effect inside Claude Code, IDE integrations, and any UI that streams partial output all depend on this real-time delivery.

5. SDK compatibility on both surfaces

BUZZ implements two parallel interfaces.

Native Anthropic surface (recommended)

import { Anthropic } from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://buzzai.cc",
  apiKey: process.env.BUZZ_API_KEY,
});

const message = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello" }],
});

OpenAI-compatible surface

from openai import OpenAI

client = OpenAI(
    base_url="https://buzzai.cc/v1",
    api_key="",
)

resp = client.chat.completions.create(
    model="claude-opus-4-8",
    messages=[{"role": "user", "content": "Hello"}],
)

Calling Claude through the OpenAI SDK is convenient for legacy codebases. Existing LangChain pipelines, in-house wrappers, and OpenAI-shaped frameworks usually need zero code changes to start routing to Claude.

6. Claude Code in two environment variables

This is the line that gets the most Claude Code users to actually try BUZZ:

export ANTHROPIC_BASE_URL=https://buzzai.cc
export ANTHROPIC_AUTH_TOKEN=<your BUZZ key>

claude   # ready to go

No wrapper, no proxy, no local config edits. Every native Anthropic feature, prompt caching, tool use, vision, streaming, behaves the same way it would against api.anthropic.com.

7. Zero data retention by default

Request and response bodies do not persist anywhere on the BUZZ side:

Not written to log files.
Not stored in any database.
Not reviewed by humans.
Not fed into any training pipeline.

Audit logs only contain timestamp, user ID, model, token counts, status code, and latency. They do not contain prompt text or completion text.

This matters for enterprise compliance and for individual developers shipping prompts that contain code, business logic, or internal documentation. Before sending sensitive material through any relay, ask the operator their exact retention policy in writing.

8. Pay-per-token pricing, no monthly fee

Model	Input / 1M tokens	Output / 1M tokens
Claude Opus 4.8	$0.20	$1.00
Claude Sonnet 4.6	$0.12	$0.60
Claude Haiku 4.5	$0.0361	$0.1805

Prompt cache hits are billed automatically using Anthropic's official multipliers. Cache reads are typically 1/10 of the input rate.

No monthly subscription, no minimum spend, no "traffic package" bundling. You pay for what you use, and balance does not expire. Live numbers are at https://buzzai.cc/api/pricing.

9. Multi-region, low latency (187 ms average)

BUZZ runs 12 edge nodes spread across regions, with a measured average latency of 187 ms (gateway-to-upstream plus return).

By comparison, single-region relays based out of one or two North American points of presence often add 400 to 800 ms for users far from the egress. Claude Code's interactive feel suffers visibly at that range.

Hands-on Claude Code setup

macOS / Linux

# ~/.zshrc or ~/.bashrc
export ANTHROPIC_BASE_URL=https://buzzai.cc
export ANTHROPIC_AUTH_TOKEN=sk-buzz-xxxxxxxx

# reload
source ~/.zshrc

# verify
claude --version
claude
# Claude Code is now routing through BUZZ with full prompt cache and tool use fidelity

Windows (PowerShell)

[Environment]::SetEnvironmentVariable("ANTHROPIC_BASE_URL", "https://buzzai.cc", "User")
[Environment]::SetEnvironmentVariable("ANTHROPIC_AUTH_TOKEN", "sk-buzz-xxxxxxxx", "User")

# restart PowerShell to pick up the new variables
claude

Confirming prompt cache is hitting

// Run the same project analysis twice in a row inside Claude Code.
// The second run should be noticeably faster and cheaper.
// In the BUZZ dashboard, under API Key > Usage, the cache_read_input_tokens
// share will tell you how much of the workload is hitting cache.

FAQ

Q1: What is the difference between a relay station and an AI gateway?

A traditional Claude relay station typically modifies your request to cut its own costs: injecting system prompts, trimming context, or swapping in a cheaper model. An AI gateway like BUZZ is built around transparent forwarding. The bytes you send are the bytes Anthropic receives, and back again. Prompt caching, tool use, and streaming all stay byte-for-byte faithful.

Q2: How do I point Claude Code at BUZZ?

Two environment variables: export ANTHROPIC_BASE_URL=https://buzzai.cc and export ANTHROPIC_AUTH_TOKEN=<your BUZZ key>. Nothing else in Claude Code's config has to change. BUZZ is fully compatible with Anthropic /v1/messages, so prompt caching kicks in automatically across long sessions.

Q3: Does BUZZ support prompt caching?

Yes, natively. The cache_control: {type: "ephemeral"} marker is forwarded to Anthropic untouched. Cache hits are computed by the upstream cluster, with the same latency and price discount you would see from a direct integration. In long Claude Code sessions, hit rates above 90% are typical.

Q4: Does BUZZ store my prompts or responses?

Zero retention. Request and response bodies are never written to disk, never enter any training pipeline, and are never reviewed by humans. Audit logs only contain token counts, status codes, latency, and a user identifier. They contain no prompt text and no completion text.

Q5: How is BUZZ priced? Is there a monthly subscription?

Pure pay-per-token, no monthly fee, no minimum spend. Opus 4.8 input is $0.20 per million tokens, Sonnet 4.6 input is $0.12 per million, Haiku 4.5 input is $0.04 per million. Prompt cache hits are billed automatically using Anthropic's official discount multipliers.

Q6: Which SDKs work with BUZZ?

The native Anthropic SDKs (Python, TypeScript, Go), the OpenAI SDK against the compatibility endpoint (so you can call Claude with the OpenAI client), and frameworks built on top such as LangChain, LangGraph, and the Vercel AI SDK. Tool use, function calling, and streaming all pass through with full fidelity.

Q7: What about latency and uptime?

BUZZ runs 12 edge nodes worldwide with a measured average round-trip latency of 187 ms. Trailing 30-day availability is 99.9%.

Q8: How can I verify a relay is not modifying my requests?

Compare token usage. Send the same prompt through the relay and directly to Anthropic and look at usage.input_tokens. If the relay's count is lower than the direct number, it trimmed your context. If it is higher, it injected something. BUZZ token counts match Anthropic's direct counts exactly, because the request bytes are identical.

Getting started

Sign up, top up the balance, copy two environment variables, and Claude Code is ready to go. First-time top-ups include a starter credit.

Create an account

Published: 2026-05-26
Last reviewed: 2026-05-26