BUZZ AI Gateway

Multi-Vendor Routing

A single BUZZ key reaches many upstream vendors of the same Claude model. Behind the curtain, BUZZ resolves your request to a healthy channel, retries on transient failures, and falls back across vendors when one is overloaded. From your code's perspective, none of this is visible: same URL, same key, same response shape.

The problem this solves

If you call Anthropic directly, you have one upstream. When Anthropic returns 529 overloaded, you wait. When a regional outage degrades your latency, you wait. If you hold capacity across multiple Anthropic-compatible vendors (Anthropic first-party, AWS Bedrock, Google Vertex, third-party resellers), you can route around outages, but now your application has to know about each vendor, hold each set of credentials, and implement its own failover logic.

Multi-vendor routing pushes that logic into the gateway. You hold one BUZZ key. BUZZ holds the relationship with each vendor.

The mental model: groups and channels

Two concepts are enough to describe the entire system.

A channel is a single upstream credential: one Anthropic key, one Bedrock IAM session, one Vertex service account. Each channel carries metadata about which models it serves, its priority, its weight, and which advanced flags (service_tier, inference_geo, speed) it is allowed to forward.

A group is a named collection of channels with a routing policy. Your API key is bound to exactly one group. When a request arrives, BUZZ looks at the requested model, filters to channels in your group that serve it, and applies the group's policy to pick one.

  YOUR KEY (group: "production")
          |
          v
  +-------------------------------------------------------------------+
  | group: production                                                 |
  | policy: priority + weight                                         |
  | ----                                                              |
  | channel A (anthropic)  pri=1 weight=10 models=[opus,sonnet,haiku] |
  | channel B (bedrock-us) pri=1 weight=5  models=[sonnet,haiku]      |
  | channel C (vertex-eu)  pri=2 weight=1  models=[sonnet]            |
  +-------------------------------------------------------------------+
          |
          v
  selected = highest-priority healthy channel that serves the requested model
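The two concepts map naturally onto two small record types. The following is a minimal sketch, not BUZZ's actual schema: the class names, field names, and the `serving` helper are hypothetical, and it assumes (as the diagram suggests) that a lower `pri` number means the channel is tried first.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Channel:
    """One upstream credential plus its routing metadata (hypothetical fields)."""
    name: str
    vendor: str
    priority: int                      # assumption: lower number = tried first
    weight: int
    models: frozenset[str]
    allowed_flags: frozenset[str] = frozenset()  # e.g. {"service_tier"}


@dataclass
class Group:
    """A named collection of channels with a routing policy."""
    name: str
    policy: str
    channels: list[Channel] = field(default_factory=list)

    def serving(self, model: str) -> list[Channel]:
        # Channels in this group that list the requested model.
        return [c for c in self.channels if model in c.models]


# The "production" group from the diagram above.
production = Group(
    name="production",
    policy="priority + weight",
    channels=[
        Channel("A", "anthropic", 1, 10, frozenset({"opus", "sonnet", "haiku"})),
        Channel("B", "bedrock-us", 1, 5, frozenset({"sonnet", "haiku"})),
        Channel("C", "vertex-eu", 2, 1, frozenset({"sonnet"})),
    ],
)
```

In this sketch, `production.serving("opus")` leaves only channel A, while `production.serving("sonnet")` leaves all three.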

This structure also supports tiered offerings: a "free" group might point only at slow, cheaper channels; a "pro" group at first-party Anthropic; an "enterprise" group at a private, dedicated channel pool.

Channel selection

For each request, BUZZ runs a short, fixed sequence of selection steps:

  1. Filter by model. Drop channels that do not list the requested model.
  2. Filter by health. Drop channels currently in cool-down due to recent failures.
  3. Filter by capability. If the request uses a channel-gated parameter (service_tier, etc.), drop channels that do not allow it.
  4. Pick by priority, then weight. Among the surviving channels, take those at the highest priority and pick one weighted-randomly. Lower-priority channels are reserved for failover.
  5. Forward. Stream the request through the selected channel.

If no channel survives the filter, BUZZ returns HTTP 503 buzz_error / model_not_found. This is distinct from Anthropic's own model errors, so you can tell at a glance whether the problem is upstream availability or a typo in the model name.
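The selection steps above can be sketched in a few lines. This is an illustration under stated assumptions rather than BUZZ's implementation: `Channel`, `select_channel`, `NoChannelAvailable`, and the `cooling` set are hypothetical names, a lower `priority` number is assumed to mean "tried first", and the exception stands in for the 503 buzz_error / model_not_found response.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    name: str
    priority: int                       # assumption: lower number = tried first
    weight: int
    models: frozenset
    allowed_flags: frozenset = frozenset()


class NoChannelAvailable(Exception):
    """Stands in for the HTTP 503 buzz_error / model_not_found response."""


def select_channel(channels, model, gated_flags=frozenset(),
                   cooling=frozenset(), rng=random):
    # Steps 1-3: filter by model, by health, and by capability.
    candidates = [
        c for c in channels
        if model in c.models
        and c.name not in cooling
        and set(gated_flags) <= c.allowed_flags
    ]
    if not candidates:
        raise NoChannelAvailable(model)
    # Step 4: keep only the best priority tier, then pick weighted-randomly.
    best = min(c.priority for c in candidates)
    tier = [c for c in candidates if c.priority == best]
    return rng.choices(tier, weights=[c.weight for c in tier], k=1)[0]


CHANNELS = [
    Channel("A", 1, 10, frozenset({"opus", "sonnet", "haiku"}),
            frozenset({"service_tier"})),
    Channel("B", 1, 5, frozenset({"sonnet", "haiku"})),
    Channel("C", 2, 1, frozenset({"sonnet"})),
]
```

With these channels, a `sonnet` request is normally split between A and B; channel C is chosen only when both A and B are cooling down, for example `select_channel(CHANNELS, "sonnet", cooling={"A", "B"})`.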

Failover and retries

Real upstreams fail. BUZZ classifies upstream responses into three buckets:

Class                    | Examples                                      | Action
-------------------------+-----------------------------------------------+-------------------------------------------------------------------
Permanent client error   | 400 invalid_request, 401 auth, 403 permission | Return to caller. Do not retry.
Transient upstream error | 500, 503, 529 overloaded, connection reset    | Mark channel cooled-down, retry on next priority. Total budget capped per request.
Rate limit               | 429 with retry-after                          | Mark channel cooled-down for the suggested duration, fail over.
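The classification in the table can be expressed as a small lookup. This is a sketch that covers only the status codes the table lists: the `FailureClass` enum and `classify` function are hypothetical names, a connection reset is modelled as a missing status (`None`), and any status not in the table is returned as `None` (unclassified) rather than guessed at.

```python
from enum import Enum
from typing import Optional


class FailureClass(Enum):
    PERMANENT = "permanent client error"      # return to caller, do not retry
    TRANSIENT = "transient upstream error"    # cool down, retry elsewhere
    RATE_LIMIT = "rate limit"                 # cool down for retry-after, fail over


def classify(status: Optional[int]) -> Optional[FailureClass]:
    """Map an upstream HTTP status to one of the three buckets.

    `status=None` represents a connection reset (no response received).
    Statuses not listed in the table yield None.
    """
    if status is None:
        return FailureClass.TRANSIENT
    if status == 429:
        return FailureClass.RATE_LIMIT
    if status in (400, 401, 403):
        return FailureClass.PERMANENT
    if status in (500, 503, 529):
        return FailureClass.TRANSIENT
    return None
```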

The retry budget is small (typically two attempts) so latency stays bounded. Failover happens before any bytes have been sent to your client; once streaming has started, BUZZ does not silently switch upstreams mid-stream because that would corrupt the SSE event stream.
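A bounded failover loop might look like the sketch below. It is illustrative only: `forward_with_failover`, the `send` callback, and `DEFAULT_COOLDOWN` are hypothetical, "two attempts" is read as two upstream calls in total, and the loop models only the phase before any bytes reach the client (it does not model streaming).

```python
DEFAULT_COOLDOWN = 30.0          # seconds; hypothetical default, not a BUZZ value
TRANSIENT = {500, 503, 529}


def forward_with_failover(ordered_channels, send, max_attempts=2, cooling=None):
    """Try channels in order until one returns a final answer.

    ordered_channels: channel names in the order the selector would try them.
    send(name) -> (status, retry_after): status is an int, or None for a
                  connection reset; retry_after is seconds or None.
    cooling: dict of channel name -> cool-down seconds, updated in place.
    Returns (channel_name, status), or (None, last_status) if the budget
    ran out before any channel gave a final answer.
    """
    cooling = {} if cooling is None else cooling
    attempts = 0
    last_status = None
    for name in ordered_channels:
        if attempts >= max_attempts:
            break                                  # retry budget exhausted
        if name in cooling:
            continue                               # skip unhealthy channels
        attempts += 1
        status, retry_after = send(name)
        if status is None or status in TRANSIENT:
            cooling[name] = DEFAULT_COOLDOWN       # transient: cool down, move on
        elif status == 429:
            # Rate limit: honour the upstream's suggested duration if present.
            cooling[name] = DEFAULT_COOLDOWN if retry_after is None else retry_after
        else:
            return name, status                    # success or permanent client error
        last_status = status
    return None, last_status
```

Permanent client errors such as 400 take the same early-return path as a success, so they reach the caller unchanged after a single attempt.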

Priority routing

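Priority is the operator's lever for shaping cost and quality. A common configuration, like the "production" group shown earlier, puts the preferred channels at priority 1 and keeps a failover channel at priority 2.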

Within a priority, weight controls traffic share. Two channels at priority 1 with weights 10 and 5 receive roughly two-thirds and one-third of the traffic respectively.
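The traffic split follows directly from the weights: each channel's expected share is its weight divided by the sum of weights in its priority tier. The helper below is a hypothetical illustration of that arithmetic, assuming both channels are healthy and serve the requested model.

```python
from fractions import Fraction


def traffic_shares(weights):
    """Expected share of traffic per channel within one priority tier."""
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}


shares = traffic_shares({"A": 10, "B": 5})
# 10 / (10 + 5) = 2/3 for A, 5 / (10 + 5) = 1/3 for B
```

Because selection is weighted-random, these are long-run averages; a short burst of requests can deviate from them.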

This is also how BUZZ exposes premium features without breaking compatibility: a channel with allow_service_tier=true can be placed in a higher-tier group; clients that send service_tier: priority are routed there, while everyone else falls through to standard channels.

What you see in responses

By design, the chosen vendor is not echoed in the standard Anthropic response shape, because doing so would break drop-in compatibility. The information is available in two places without ambiguity:

For most workloads the right answer is "do not look at it"; routing is an operator concern, not an application concern. For workloads that do care (e.g., billing analytics that bucket by upstream), opt in to the channel header.

See also