BUZZ AI Gateway

Streaming with SSE

Stream Claude responses as they are generated using Server-Sent Events (SSE). This guide covers when to stream, each event type, parsing with the Python and Node.js SDKs and with raw fetch, and how to handle mid-stream errors. BUZZ forwards SSE frames from the upstream model without modification.

POST https://buzzai.cc/v1/messages with "stream": true in the request body

When to use streaming

Streaming sends incremental updates over a single HTTP response so your application can render output as it is generated. Reach for it when one of these is true:

Enable streaming

Add "stream": true to the request body. Everything else stays the same.

curl:

curl -N -X POST https://buzzai.cc/v1/messages \
  -H "Authorization: Bearer $BUZZ_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-haiku-4-5-20251001",
    "max_tokens": 200,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Count from 1 to 5."}
    ]
  }'

Python:

import os

from anthropic import Anthropic

client = Anthropic(
    base_url="https://buzzai.cc",
    api_key=os.environ["BUZZ_API_KEY"],  # read the key from the environment
)

with client.messages.stream(
    model="claude-haiku-4-5-20251001",
    max_tokens=200,
    messages=[{"role": "user", "content": "Count from 1 to 5."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Node.js:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://buzzai.cc",
  apiKey: process.env.BUZZ_API_KEY,
});

const stream = await client.messages.stream({
  model: "claude-haiku-4-5-20251001",
  max_tokens: 200,
  messages: [{ role: "user", content: "Count from 1 to 5." }],
});

for await (const chunk of stream) {
  if (chunk.type === "content_block_delta" && chunk.delta.type === "text_delta") {
    process.stdout.write(chunk.delta.text);
  }
}

The -N flag on curl disables output buffering so the terminal flushes each SSE chunk as it arrives.

The eight event types

Every stream is a sequence of named events. Most clients only care about content_block_delta, but each event has a defined role.

| Event | Meaning | Typical action |
| --- | --- | --- |
| message_start | Header of the response. Carries the id, model, and the initial usage object. | Capture the message id for logging or retry correlation. |
| content_block_start | A new content block begins. The block type is one of text, tool_use, or thinking. | Allocate a buffer keyed on index. |
| content_block_delta | An incremental piece of the current block. See the four delta shapes below. | Append to the buffer for that index. |
| content_block_stop | The current block is complete. | Finalize the buffer; for tool_use blocks, parse the accumulated partial_json into the input object. |
| message_delta | Final stop_reason and the cumulative usage totals. | Update token accounting; check the stop reason. |
| message_stop | End of stream. No further frames. | Close the response handle. |
| ping | Keep-alive heartbeat. May be sent at any point. | Ignore (or reset an idle timer). |
| error | A mid-stream failure after the initial 200 response. See Mid-stream errors. | Stop reading and surface the error to the caller. |

Delta payloads inside content_block_delta

| delta.type | Field | Used for |
| --- | --- | --- |
| text_delta | delta.text | Visible text generation |
| input_json_delta | delta.partial_json | Streaming tool input arguments |
| thinking_delta | delta.thinking | Extended thinking traces (Opus 4.7) |
| signature_delta | delta.signature | Signed thinking blocks |
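
The "typical action" column can be read as a small state machine. The sketch below is one possible way to fold already-decoded events (plain dicts) into finished content blocks. The function name assemble_blocks and the returned tuple are illustrative assumptions, not part of any SDK, and it presumes each event dict has the shape shown in this guide.

```python
import json

# Which delta field carries the payload for each delta type (from the table above).
DELTA_FIELDS = {
    "text_delta": "text",
    "input_json_delta": "partial_json",
    "thinking_delta": "thinking",
    "signature_delta": "signature",
}


def assemble_blocks(events):
    """Fold decoded stream events into (blocks, stop_reason, usage).

    Hypothetical helper for illustration; not an SDK function.
    """
    open_blocks = {}  # block index -> (block dict, {field: [pieces]})
    finished = {}     # block index -> finalized block dict
    usage = {}
    stop_reason = None

    for event in events:
        kind = event["type"]
        if kind == "message_start":
            usage.update(event["message"].get("usage", {}))
        elif kind == "content_block_start":
            # Allocate a buffer keyed on index, seeded with the block header.
            open_blocks[event["index"]] = (dict(event["content_block"]), {})
        elif kind == "content_block_delta":
            delta = event["delta"]
            field = DELTA_FIELDS.get(delta["type"])
            if field is not None:  # delta types unknown to this sketch are skipped
                _, pieces = open_blocks[event["index"]]
                pieces.setdefault(field, []).append(delta[field])
        elif kind == "content_block_stop":
            block, pieces = open_blocks.pop(event["index"])
            for field, parts in pieces.items():
                block[field] = "".join(parts)
            if block["type"] == "tool_use":
                # Tool input arrives as JSON fragments; parse only once complete.
                block["input"] = json.loads(block.pop("partial_json", "") or "{}")
            finished[event["index"]] = block
        elif kind == "message_delta":
            stop_reason = event["delta"].get("stop_reason")
            usage.update(event.get("usage", {}))  # later totals overwrite earlier ones
        # ping and message_stop need no bookkeeping in this sketch

    blocks = [finished[i] for i in sorted(finished)]
    return blocks, stop_reason, usage
```

Keeping the pieces in a list and joining once at content_block_stop avoids repeated string concatenation, and it means a tool_use input is never parsed while it is still an incomplete JSON fragment.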

Captured wire sample

This is a real frame sequence captured from https://buzzai.cc/v1/messages for the prompt "Count from 1 to 3":

event: message_start
data: {"type":"message_start","message":{"model":"claude-haiku-4-5-20251001","id":"msg_01YkyqfgStqigCHAgJ6uUDfd","type":"message","role":"assistant","content":[],"stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":7,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":1,"service_tier":"standard"}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"1\n2\n3"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"input_tokens":7,"output_tokens":5}}

event: message_stop
data: {"type":"message_stop"}
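
To make the frame layout concrete, here is a minimal Python sketch that splits a complete SSE body, such as the capture above saved to a string, into (event name, payload) pairs. The function name parse_sse is an assumption for illustration, and the sketch presumes the whole body is already in memory; it does not handle frames split across network reads.

```python
import json


def parse_sse(text):
    """Split a complete SSE body into (event_name, payload) pairs.

    Hypothetical helper for illustration; assumes every data line holds JSON.
    """
    frames = []
    # Normalize CRLF so that a blank line is always "\n\n".
    for frame in text.replace("\r\n", "\n").split("\n\n"):
        event_name, data_lines = "message", []
        for line in frame.split("\n"):
            if line.startswith(":"):
                continue  # SSE comment line, carries no data
            if line.startswith("event:"):
                event_name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            frames.append((event_name, json.loads("\n".join(data_lines))))
    return frames


# Abbreviated version of the capture: one text delta, then the end of stream.
sample = (
    'event: content_block_delta\n'
    'data: {"type":"content_block_delta","index":0,'
    '"delta":{"type":"text_delta","text":"1\\n2\\n3"}}\n'
    '\n'
    'event: message_stop\n'
    'data: {"type":"message_stop"}\n'
    '\n'
)

for name, payload in parse_sse(sample):
    print(name, payload["type"])
```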

Parsing with the Python SDK

The official Anthropic Python SDK exposes streaming through a context-managed helper. Pointing it at BUZZ is one base-URL change.

import os

from anthropic import Anthropic

client = Anthropic(
    base_url="https://buzzai.cc",
    api_key=os.environ["BUZZ_API_KEY"],  # read the key from the environment
)

with client.messages.stream(
    model="claude-haiku-4-5-20251001",
    max_tokens=400,
    messages=[{"role": "user", "content": "Write a haiku about caching."}],
) as stream:
    # Option A: text-only iteration
    for text in stream.text_stream:
        print(text, end="", flush=True)

    # Option B: full event iteration with type dispatch
    # for event in stream:
    #     if event.type == "content_block_delta":
    #         if event.delta.type == "text_delta":
    #             print(event.delta.text, end="", flush=True)
    #         elif event.delta.type == "input_json_delta":
    #             handle_tool_input_chunk(event.index, event.delta.partial_json)

    final_message = stream.get_final_message()
    print()
    print("stop_reason:", final_message.stop_reason)
    print("usage:", final_message.usage)

The context manager guarantees the underlying HTTP response is closed even if the consumer iterates partially. get_final_message() returns the assembled Message object once the stream has terminated.

Parsing with the Node.js SDK

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://buzzai.cc",
  apiKey: process.env.BUZZ_API_KEY,
});

const stream = await client.messages.stream({
  model: "claude-haiku-4-5-20251001",
  max_tokens: 400,
  messages: [{ role: "user", content: "Write a haiku about caching." }],
});

for await (const event of stream) {
  switch (event.type) {
    case "content_block_delta":
      if (event.delta.type === "text_delta") {
        process.stdout.write(event.delta.text);
      } else if (event.delta.type === "input_json_delta") {
        // accumulate event.delta.partial_json keyed on event.index
      }
      break;
    case "message_delta":
      // final stop_reason and cumulative usage
      break;
  }
}

const finalMessage = await stream.finalMessage();
console.log("\nstop_reason:", finalMessage.stop_reason);
console.log("usage:", finalMessage.usage);

The SDK handles SSE framing, JSON parsing, and buffering of partial frames. Most production clients should use it directly rather than parsing the wire format.

Parsing raw SSE with fetch

Useful when you cannot pull in a full SDK, when you are streaming through an edge runtime, or when you want full control over framing. The wire format is plain text: lines starting with event: name the event, lines starting with data: carry the JSON payload, and an empty line terminates each frame.

async function streamMessages() {
  const resp = await fetch("https://buzzai.cc/v1/messages", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.BUZZ_API_KEY}`,
      "anthropic-version": "2023-06-01",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-haiku-4-5-20251001",
      max_tokens: 400,
      stream: true,
      messages: [{ role: "user", content: "Write a haiku about caching." }],
    }),
  });

  if (!resp.ok) {
    throw new Error(`HTTP ${resp.status} ${await resp.text()}`);
  }

  const reader = resp.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE frames are separated by a blank line.
    const frames = buffer.split("\n\n");
    buffer = frames.pop() ?? "";

    for (const frame of frames) {
      const lines = frame.split("\n");
      let eventName = "message";
      let dataLines = [];
      for (const line of lines) {
        if (line.startsWith("event:")) eventName = line.slice(6).trim();
        else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
      }
      if (dataLines.length === 0) continue;
      const payload = JSON.parse(dataLines.join("\n"));

      if (eventName === "content_block_delta" && payload.delta?.type === "text_delta") {
        process.stdout.write(payload.delta.text);
      } else if (eventName === "error") {
        throw new Error(`stream error: ${payload.error?.type} ${payload.error?.message}`);
      } else if (eventName === "message_stop") {
        return;
      }
    }
  }
}
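
The loop above depends on two buffering behaviours: a frame can be split across network reads, and a multi-byte UTF-8 character can be split across them too (which is what the stream option on TextDecoder guards against). The sketch below shows the same idea in Python under stated assumptions: the class name SSEBuffer and its feed method are hypothetical, and it additionally normalizes CRLF line endings, which the JavaScript loop does not do.

```python
import codecs
import json


class SSEBuffer:
    """Incrementally turn raw byte chunks into (event_name, payload) pairs.

    Hypothetical helper for illustration; assumes every data line holds JSON.
    """

    def __init__(self):
        # The incremental decoder holds back a trailing partial UTF-8 sequence.
        self._decoder = codecs.getincrementaldecoder("utf-8")()
        self._pending = ""

    def feed(self, chunk):
        """Accept one chunk of bytes and return the frames it completed."""
        text = self._decoder.decode(chunk)
        # Normalize over the whole pending buffer so a CRLF pair that was
        # split across two chunks is still collapsed to "\n".
        self._pending = (self._pending + text).replace("\r\n", "\n")
        # Everything before the last blank line is complete; the rest waits.
        *complete, self._pending = self._pending.split("\n\n")

        frames = []
        for frame in complete:
            event_name, data_lines = "message", []
            for line in frame.split("\n"):
                if line.startswith("event:"):
                    event_name = line[len("event:"):].strip()
                elif line.startswith("data:"):
                    data_lines.append(line[len("data:"):].strip())
            if data_lines:
                frames.append((event_name, json.loads("\n".join(data_lines))))
        return frames
```

Calling feed once per chunk returns only the frames that chunk completed, so the caller never sees a partially received JSON payload regardless of where the transport cuts the bytes.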

Two details that surprise people writing their first SSE parser:

Mid-stream errors and disconnects

Once the gateway has sent HTTP 200 OK headers, any failure that happens later arrives as an SSE event, not as a non-200 status. There are three cases worth handling explicitly.

1. event: error frame

The upstream may emit a typed error mid-stream — for example a rate_limit_error or overloaded_error that fired only after generation began. The frame shape is:

event: error
data: {"type":"error","error":{"type":"overloaded_error","message":"..."}}

Treat this as a hard failure: stop reading, surface error.type to the caller, and decide whether to retry. See the error-handling guide for backoff strategies per error type.
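
A minimal sketch of that handling in Python, for clients that parse frames themselves. StreamError, check_event, should_retry, and the RETRYABLE set are all hypothetical names introduced here; in particular, which error types deserve a retry is an assumption made for illustration, and the error-handling guide remains the authority.

```python
class StreamError(Exception):
    """Raised when the stream carries an `event: error` frame."""

    def __init__(self, error_type, message):
        super().__init__(f"{error_type}: {message}")
        self.error_type = error_type
        self.message = message


# ASSUMPTION: treating these two types as retryable is a policy choice made
# for this sketch, not something the gateway prescribes.
RETRYABLE = {"overloaded_error", "rate_limit_error"}


def check_event(event_name, payload):
    """Raise StreamError for an error frame; pass any other payload through."""
    if event_name == "error":
        err = payload.get("error", {})
        raise StreamError(err.get("type", "unknown"), err.get("message", ""))
    return payload


def should_retry(exc):
    """Decide whether the caller may re-issue the request after this error."""
    return exc.error_type in RETRYABLE
```

Raising an exception rather than returning a flag makes it hard for a caller to keep reading frames after the error by accident.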

2. TCP disconnect mid-stream

The connection drops without a message_stop. This shows up in your client as a stream that ends without a final event. Both the Python and Node SDKs surface it as an APIConnectionError exception.

Resumption requires re-issuing the request from scratch — Claude streams have no native resume token. For idempotency, use metadata.user_id plus a request-side correlation id you control.
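
For clients that parse the wire format themselves, a truncated stream can be detected by checking whether message_stop was ever seen. The sketch below is one way to do that and to re-issue the request from scratch; IncompleteStream, collect_text, and run_with_retry are hypothetical names, open_stream stands for any caller-supplied function that starts a fresh request and yields decoded events, and the backoff values are arbitrary.

```python
import time


class IncompleteStream(Exception):
    """The stream ended before a message_stop event arrived."""


def collect_text(events):
    """Join text deltas; raise if the stream ends without message_stop."""
    parts = []
    for event in events:
        if event["type"] == "content_block_delta" and event["delta"]["type"] == "text_delta":
            parts.append(event["delta"]["text"])
        elif event["type"] == "message_stop":
            return "".join(parts)
    raise IncompleteStream("stream ended without message_stop")


def run_with_retry(open_stream, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Re-issue the whole request after a truncated stream, backing off each time."""
    for attempt in range(attempts):
        try:
            # Partial text from a failed attempt is discarded, never stitched
            # onto the next attempt, because there is no resume token.
            return collect_text(open_stream())
        except IncompleteStream:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # exponential backoff between attempts
```

If partial output was already rendered to the user, the caller also needs to clear or replace it before showing the retried response, since the second attempt may generate different text.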

3. Client-side cancellation

Closing the response (Python: exit the with block; Node: call stream.controller.abort(); raw fetch: call reader.cancel()) signals the upstream to stop generation. Tokens already produced still count toward your usage, so cancel as early as possible if the user closes the chat.

HTTP status is set before any frame arrives. A 4xx or 5xx response means generation never started — the body is the standard error envelope (see the error table). Once you get a 200, all subsequent failures live inside SSE frames.

Transparent forwarding on BUZZ

BUZZ does not buffer, repackage, or rewrite the SSE stream. The frame boundaries you see are the boundaries the upstream model produced, in the order produced. Concretely:

Production checklist
