Engineering Blog
Practical notes on multi-vendor LLM gateways, prompt caching, zero-retention forwarding, and operating Claude / GPT / Gemini under a single key.
-
Claude Opus 4.8 Is Live on BUZZ: Model ID, Pricing, and Migration Notes
claude-opus-4-8 is routable now through the same endpoint and key. Exact model identifier, pay-per-token pricing, prompt cache warm-up behavior, and a zero-change upgrade from claude-opus-4-7.
-
How to Point Claude Code at a Custom Endpoint: Complete ANTHROPIC_BASE_URL Reference
Practical reference covering environment variables vs settings.json, ANTHROPIC_AUTH_TOKEN vs ANTHROPIC_API_KEY, and verification across macOS / Linux / Windows.
-
Anthropic API Error Code Reference: 401, 403, 429, 500, 529 — Root Cause and Fix
A complete map of Anthropic API errors: five root causes for 401, exponential backoff in Python/Node/Go for 429, and multi-upstream fallback for 529.
-
Why Your cache_creation_input_tokens Is Zero: 7 Prompt Cache Antipatterns
Seven cache_control patterns that quietly break your hit rate, with broken/fixed code pairs. Diagnose from the usage fields. Hit rates went from 30% to 92% in real workloads.
-
Drop-in OpenAI to Claude Migration: A Per-Field Compatibility Matrix
A 26-field compatibility matrix for the OpenAI-compatible path to Claude. Each field is rated fully compatible, partially compatible, or silently dropped, with verification scripts.
-
BUZZ vs Traditional Claude Relay Stations: Why Claude Code Users Are Switching
A 9-dimension comparison between BUZZ AI Gateway and traditional Claude relay stations. Transparent forwarding, native Prompt Cache, full Tool Use fidelity, drop-in Claude Code compatibility.
-
Choosing an AI Gateway: BUZZ vs OpenRouter vs Helicone vs LiteLLM
A practical, side-by-side decision guide for picking an AI gateway. Cost, retention posture, multi-model coverage, and operational ergonomics.
-
Anthropic Prompt Caching in Production: A Practical Cost-Reduction Playbook
A production playbook for Anthropic prompt caching. cache_control mechanics, hit-rate diagnostics, and the operational gotchas no one writes about.
-
Claude API Through a Gateway: A Practical Guide
Putting a transparent, zero-retention gateway in front of Anthropic Claude. Streaming, tool use, prompt caching, and pricing notes — with Python code.
-
Cutting Claude Code Costs Without Losing Capability
Claude Code is powerful but expensive. How routing the CLI through a gateway preserves quality while reducing spend.
-
Tool Use With Claude Through a Gateway: Streaming, Errors, and Cost Patterns
A walkthrough of Claude tool use through a transparent gateway. Streaming, error handling, and the cost shape of multi-turn function calls.
-
One API Key for Claude, GPT, Gemini, and Grok
Code-first walkthrough of running Anthropic Claude, OpenAI GPT, Google Gemini, and xAI Grok behind a single, OpenAI-compatible key.
-
Using the OpenAI SDK to Talk to Claude (and Gemini, and Grok)
A hands-on guide to calling Claude, Gemini, and Grok with the official OpenAI SDK — drop-in compatibility through a transparent forwarder.
-
Zero-Retention LLM Gateways: Why Enterprises Need Forwarders That Forget
A practical guide to zero-retention LLM gateways for enterprise teams. Privacy posture, vendor settings, and what auditors actually look for.