Engineering Blog

Notes from the gateway

Practical writing on multi-vendor LLM routing, prompt caching, billing internals, and running Claude, GPT, and Gemini under a single key.

Latest · Security

When Billing Overflows: How One Integer Can Turn Charges Into Credits

A field note on a class of LLM billing bugs: user-controlled parameters like image count, video duration, or max_tokens overflowing an int64 and flipping a charge into a credit. Why it happens, how to spot it, and the two layers that shut it down.

July 8, 2026

All posts

Announcement · Jun 30, 2026

Claude Sonnet 5 Is Live on BUZZ: Model ID, Pricing, and Migration Notes

claude-sonnet-5 is the new balanced workhorse of the Claude 5 family — newer and cheaper than Sonnet 4.6. Model ID, pricing, and a zero-change upgrade path.

Model Selection · Jun 30, 2026

Claude 5 Family: How to Choose Between Fable 5, Sonnet 5, and Haiku

A decision tree for routing across the Claude 5 family under one key. Covers matching request difficulty to model, a default-plus-escalate pattern, and the cost math that makes it worth it.

Trend Note · Jun 9, 2026

Above Opus: What a "Mythos-Class" Tier Signals About Where Models Are Going

Claude 5 introduced a Mythos-class tier that sits above Opus. What a new top shelf means for capability, cost, and the widening gap between frontier and default models — and why routing beats defaulting to the newest thing.

Announcement · Jun 9, 2026

Claude Fable 5 Is Live on BUZZ: Model ID, Pricing, and Migration Notes

claude-fable-5, the first model in Anthropic's Claude 5 family and its most capable generally available model, is routable now through the same endpoint and key. Model ID, pay-per-token pricing, and a drop-in path from Opus 4.8.

Announcement · May 29, 2026

Claude Opus 4.8 Is Live on BUZZ: Model ID, Pricing, and Migration Notes

claude-opus-4-8 is routable now through the same endpoint and key. Exact model identifier, pay-per-token pricing, prompt cache warm-up behavior, and a zero-change upgrade from claude-opus-4-7.

Configuration · May 26, 2026

How to Point Claude Code at a Custom Endpoint: Complete ANTHROPIC_BASE_URL Reference

Practical reference covering env var vs settings.json, ANTHROPIC_AUTH_TOKEN vs ANTHROPIC_API_KEY, and verification across macOS / Linux / Windows.

Errors · May 26, 2026

Anthropic API Error Code Reference: 401, 403, 429, 500, 529 — Root Cause and Fix

Complete map of Anthropic API errors. 5 root causes for 401, exponential backoff in Python/Node/Go for 429, multi-upstream fallback for 529.

Optimization · May 26, 2026

Why Your cache_creation_input_tokens Is Zero: 7 Prompt Cache Antipatterns

7 cache_control patterns that quietly break your hit rate, with broken/fixed code pairs. Diagnose from usage fields. Hit rate from 30% to 92% in real workloads.

Migration · May 26, 2026

Drop-in OpenAI to Claude Migration: A Per-Field Compatibility Matrix

26-field compatibility matrix for the OpenAI-compatible path to Claude. Each field is rated fully compatible, partially compatible, or silently dropped, with verification scripts.

Comparison · May 26, 2026

BUZZ vs Traditional Claude Relay Stations: Why Claude Code Users Are Switching

A 9-dimension comparison between BUZZ AI Gateway and traditional Claude relay stations. Transparent forwarding, native Prompt Cache, full Tool Use fidelity, drop-in Claude Code compatibility.

Architecture · May 22, 2026

Choosing an AI Gateway: BUZZ vs OpenRouter vs Helicone vs LiteLLM

A practical, side-by-side decision guide for picking an AI gateway. Cost, retention posture, multi-model coverage, and operational ergonomics.

Cost Optimization · May 22, 2026

Anthropic Prompt Caching in Production: A Practical Cost-Reduction Playbook

A production playbook for Anthropic prompt caching. cache_control mechanics, hit-rate diagnostics, and the operational gotchas no one writes about.

Engineering · May 22, 2026

Claude API Through a Gateway: A Practical Guide

Putting a transparent, zero-retention gateway in front of Anthropic Claude. Streaming, tool use, prompt caching, and pricing notes — with Python code.

Cost Optimization · May 22, 2026

Cutting Claude Code Costs Without Losing Capability

Claude Code is powerful but expensive. How routing the CLI through a gateway preserves quality while reducing spend.

Engineering · May 22, 2026

Tool Use With Claude Through a Gateway: Streaming, Errors, and Cost Patterns

A walkthrough of Claude tool use through a transparent gateway. Streaming, error handling, and the cost shape of multi-turn function calls.

Multi-Model · May 22, 2026

One API Key for Claude, GPT, Gemini, and Grok

Code-first walkthrough of running Anthropic Claude, OpenAI GPT, Google Gemini, and xAI Grok behind a single, OpenAI-compatible key.

Compatibility · May 22, 2026

Using the OpenAI SDK to Talk to Claude (and Gemini, and Grok)

A hands-on guide to calling Claude, Gemini, and Grok with the official OpenAI SDK — drop-in compatibility through a transparent forwarder.

Compliance · May 22, 2026

Zero-Retention LLM Gateways: Why Enterprises Need Forwarders That Forget

A practical guide to zero-retention LLM gateways for enterprise teams. Privacy posture, vendor settings, and what auditors actually look for.