Streaming with SSE

Stream Claude responses as they are generated using Server-Sent Events (SSE). This guide covers when to stream, every event type, parsing with the Python and Node.js SDKs and with raw fetch, and how to handle mid-stream errors. BUZZ forwards SSE frames transparently from the upstream model.

POST https://buzzai.cc/v1/messages + "stream": true

When to use streaming

Streaming sends incremental updates over a single HTTP response so your application can render output as it is generated. Reach for it when one of these is true:

Long responses. Anything past a few hundred output tokens benefits from streaming. The user sees output as soon as the first tokens are generated instead of waiting for the whole completion.
Interactive UX. Chat UIs, code assistants, and any product where perceived latency matters. The first token arrives long before the model has finished generating the full response.
Tool use loops. When the model emits a tool_use block, your application can begin scheduling the tool call as the JSON arguments stream in via input_json_delta rather than waiting for the full message.
Extended thinking. With Opus 4.7 thinking enabled, the thinking_delta stream lets you display reasoning progress without blocking on the visible answer.
Long-running connections. The ping event keeps the TCP connection alive across slow generations and proxies that idle-timeout.

Enable streaming

Add "stream": true to the request body. Everything else stays the same.

curl -N -X POST https://buzzai.cc/v1/messages \
  -H "Authorization: Bearer $BUZZ_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-haiku-4-5-20251001",
    "max_tokens": 200,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Count from 1 to 5."}
    ]
  }'

import os

from anthropic import Anthropic

client = Anthropic(
    base_url="https://buzzai.cc",
    api_key=os.environ["BUZZ_API_KEY"],  # same variable the curl and Node examples read
)

with client.messages.stream(
    model="claude-haiku-4-5-20251001",
    max_tokens=200,
    messages=[{"role": "user", "content": "Count from 1 to 5."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://buzzai.cc",
  apiKey: process.env.BUZZ_API_KEY,
});

const stream = await client.messages.stream({
  model: "claude-haiku-4-5-20251001",
  max_tokens: 200,
  messages: [{ role: "user", content: "Count from 1 to 5." }],
});

for await (const chunk of stream) {
  if (chunk.type === "content_block_delta" && chunk.delta.type === "text_delta") {
    process.stdout.write(chunk.delta.text);
  }
}

In the curl example, the -N flag disables curl's output buffering so each SSE chunk is printed as it arrives.

The event types

Every stream is a sequence of named events. Most clients only care about content_block_delta, but each event has a defined role.

Event	Meaning	Typical action
message_start	Header of the response. Carries the `id`, `model`, and the initial `usage` object.	Capture the message id for logging or retry correlation.
content_block_start	A new content block begins. The block `type` is one of `text`, `tool_use`, or `thinking`.	Allocate a buffer keyed on `index`.
content_block_delta	An incremental piece of the current block. See the four delta shapes below.	Append to the buffer for that `index`.
content_block_stop	The current block is complete.	Finalize the buffer; for `tool_use` blocks parse the accumulated `partial_json` into the input object.
message_delta	Final `stop_reason` and the cumulative `usage` totals.	Update token accounting; check the stop reason.
message_stop	End of stream. No further frames.	Close the response handle.
ping	Keep-alive heartbeat. May be sent at any point.	Ignore (or reset an idle timer).
error	A mid-stream failure after the initial 200 response. See Mid-stream errors.	Stop reading and surface the error to the caller.

Delta payloads inside `content_block_delta`

delta.type	Field	Used for
text_delta	`delta.text`	Visible text generation
input_json_delta	`delta.partial_json`	Streaming tool input arguments
thinking_delta	`delta.thinking`	Extended thinking traces (Opus 4.7)
signature_delta	`delta.signature`	Signed thinking blocks
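
The two tables above can be turned into a small per-index accumulator. The following is a minimal sketch, not the SDK's implementation: the class name BlockAccumulator and its attributes are hypothetical, and events are assumed to arrive as already-parsed dicts with the shapes shown in the tables.

```python
import json
from typing import Any


class BlockAccumulator:
    """Collects content_block_delta payloads into per-index buffers (hypothetical helper)."""

    # Maps each delta.type to the field that carries its payload.
    DELTA_FIELDS = {
        "text_delta": "text",
        "input_json_delta": "partial_json",
        "thinking_delta": "thinking",
        "signature_delta": "signature",
    }

    def __init__(self) -> None:
        self.block_types: dict[int, str] = {}
        self.buffers: dict[int, dict[str, list[str]]] = {}
        self.finished: dict[int, Any] = {}

    def handle(self, event: dict[str, Any]) -> None:
        kind = event.get("type")
        if kind == "content_block_start":
            index = event["index"]
            self.block_types[index] = event["content_block"]["type"]
            self.buffers[index] = {}
        elif kind == "content_block_delta":
            delta = event["delta"]
            field = self.DELTA_FIELDS.get(delta.get("type"))
            if field is None:
                return  # unknown delta type: skip it rather than fail
            buffer = self.buffers.setdefault(event["index"], {})
            buffer.setdefault(field, []).append(delta[field])
        elif kind == "content_block_stop":
            index = event["index"]
            parts = {k: "".join(v) for k, v in self.buffers.pop(index, {}).items()}
            if self.block_types.get(index) == "tool_use":
                # Parse only after the block is complete; no deltas means empty input.
                self.finished[index] = json.loads(parts.get("partial_json") or "{}")
            else:
                self.finished[index] = parts
```

Feeding every parsed event to handle() leaves finished[index] holding either the joined string fields of a text or thinking block, or the parsed input object of a tool_use block. Parsing is deferred to content_block_stop because a partial_json fragment is usually not valid JSON on its own.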

Captured wire sample

This is a real frame sequence captured from https://buzzai.cc/v1/messages for the prompt "Count from 1 to 3":

event: message_start
data: {"type":"message_start","message":{"model":"claude-haiku-4-5-20251001","id":"msg_01YkyqfgStqigCHAgJ6uUDfd","type":"message","role":"assistant","content":[],"stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":7,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":1,"service_tier":"standard"}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"1\n2\n3"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"input_tokens":7,"output_tokens":5}}

event: message_stop
data: {"type":"message_stop"}
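
To make the sequence concrete, the sketch below folds the data: payloads from this sample into the values most clients need. The function name summarize is hypothetical, the message_start payload is left out for brevity, and the payloads are copied from the capture as raw strings so the \n escapes stay inside the JSON.

```python
import json

# data: payloads copied from the captured sample (message_start omitted).
DATA_LINES = [
    r'{"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}',
    r'{"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"1\n2\n3"}}',
    r'{"type":"content_block_stop","index":0}',
    r'{"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"input_tokens":7,"output_tokens":5}}',
    r'{"type":"message_stop"}',
]


def summarize(data_lines):
    """Fold a frame sequence into text, stop_reason, and final output tokens."""
    text_parts = []
    stop_reason = None
    output_tokens = None
    complete = False
    for raw in data_lines:
        payload = json.loads(raw)
        kind = payload["type"]
        if kind == "content_block_delta" and payload["delta"]["type"] == "text_delta":
            text_parts.append(payload["delta"]["text"])
        elif kind == "message_delta":
            # message_delta carries the final stop_reason and cumulative usage.
            stop_reason = payload["delta"].get("stop_reason")
            output_tokens = payload.get("usage", {}).get("output_tokens")
        elif kind == "message_stop":
            complete = True
    return {
        "text": "".join(text_parts),
        "stop_reason": stop_reason,
        "output_tokens": output_tokens,
        "complete": complete,
    }
```

Calling summarize(DATA_LINES) yields the text "1\n2\n3" (three lines), stop_reason "end_turn", output_tokens 5, and complete set to True. A result with complete still False would indicate a stream that ended before message_stop.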

Parsing with the Python SDK

The official Anthropic Python SDK exposes streaming through a context-managed helper. Pointing it at BUZZ is one base-URL change.

import os

from anthropic import Anthropic

client = Anthropic(
    base_url="https://buzzai.cc",
    api_key=os.environ["BUZZ_API_KEY"],  # read the key from the environment
)

with client.messages.stream(
    model="claude-haiku-4-5-20251001",
    max_tokens=400,
    messages=[{"role": "user", "content": "Write a haiku about caching."}],
) as stream:
    # Option A: text-only iteration
    for text in stream.text_stream:
        print(text, end="", flush=True)

    # Option B: full event iteration with type dispatch
    # for event in stream:
    #     if event.type == "content_block_delta":
    #         if event.delta.type == "text_delta":
    #             print(event.delta.text, end="", flush=True)
    #         elif event.delta.type == "input_json_delta":
    #             handle_tool_input_chunk(event.index, event.delta.partial_json)

    final_message = stream.get_final_message()
    print()
    print("stop_reason:", final_message.stop_reason)
    print("usage:", final_message.usage)

The context manager guarantees the underlying HTTP response is closed even if the consumer iterates partially. get_final_message() returns the assembled Message object once the stream has terminated.

Parsing with the Node.js SDK

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://buzzai.cc",
  apiKey: process.env.BUZZ_API_KEY,
});

const stream = await client.messages.stream({
  model: "claude-haiku-4-5-20251001",
  max_tokens: 400,
  messages: [{ role: "user", content: "Write a haiku about caching." }],
});

for await (const event of stream) {
  switch (event.type) {
    case "content_block_delta":
      if (event.delta.type === "text_delta") {
        process.stdout.write(event.delta.text);
      } else if (event.delta.type === "input_json_delta") {
        // accumulate event.delta.partial_json keyed on event.index
      }
      break;
    case "message_delta":
      // final stop_reason and cumulative usage
      break;
  }
}

const finalMessage = await stream.finalMessage();
console.log("\nstop_reason:", finalMessage.stop_reason);
console.log("usage:", finalMessage.usage);

The SDK handles SSE framing, JSON parsing, and buffering of partial frames. Most production clients should use it directly rather than parsing the wire format.

Parsing raw SSE with `fetch`

Useful when you cannot pull in a full SDK, when you are streaming through an edge runtime, or when you want full control over framing. The wire format is plain text: lines starting with event: name the event, lines starting with data: carry the JSON payload, and an empty line terminates each frame.

async function streamMessages() {
  const resp = await fetch("https://buzzai.cc/v1/messages", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.BUZZ_API_KEY}`,
      "anthropic-version": "2023-06-01",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-haiku-4-5-20251001",
      max_tokens: 400,
      stream: true,
      messages: [{ role: "user", content: "Write a haiku about caching." }],
    }),
  });

  if (!resp.ok) {
    throw new Error(`HTTP ${resp.status} ${await resp.text()}`);
  }

  const reader = resp.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE frames are separated by a blank line.
    const frames = buffer.split("\n\n");
    buffer = frames.pop() ?? "";

    for (const frame of frames) {
      const lines = frame.split("\n");
      let eventName = "message";
      let dataLines = [];
      for (const line of lines) {
        if (line.startsWith("event:")) eventName = line.slice(6).trim();
        else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
      }
      if (dataLines.length === 0) continue;
      const payload = JSON.parse(dataLines.join("\n"));

      if (eventName === "content_block_delta" && payload.delta?.type === "text_delta") {
        process.stdout.write(payload.delta.text);
      } else if (eventName === "error") {
        throw new Error(`stream error: ${payload.error?.type} ${payload.error?.message}`);
      } else if (eventName === "message_stop") {
        return;
      }
    }
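  }

  // The body ended without a message_stop frame: treat it as a dropped connection.
  throw new Error("stream ended before message_stop");
}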

Two details that surprise people writing their first SSE parser:

Frames can split across read() calls. Always carry a buffer across iterations and only consume up to the last complete \n\n.
A single frame can have multiple data: lines. Concatenate them with \n before JSON-parsing.
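
Both details can be handled by one small incremental decoder. Here is a Python sketch of the same buffering idea; the class name SSEDecoder is hypothetical, chunks are assumed to be already decoded to str, and the CRLF normalization is an addition that the fetch example above does not perform.

```python
import json


class SSEDecoder:
    """Incrementally turns text chunks into (event_name, payload) pairs (hypothetical helper)."""

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, chunk: str) -> list[tuple[str, dict]]:
        # Normalize CRLF line endings, which the SSE format also permits.
        self._buffer = (self._buffer + chunk).replace("\r\n", "\n")
        # Everything before the last blank line is complete; keep the remainder.
        *frames, self._buffer = self._buffer.split("\n\n")
        events = []
        for frame in frames:
            name = "message"  # SSE default when no event: line is present
            data_lines = []
            for line in frame.split("\n"):
                if line.startswith("event:"):
                    name = line[len("event:"):].strip()
                elif line.startswith("data:"):
                    data_lines.append(line[len("data:"):].strip())
            if data_lines:
                # Multiple data: lines belong to one payload, joined with newlines.
                events.append((name, json.loads("\n".join(data_lines))))
        return events
```

Each call to feed() returns only the frames completed so far, so a frame split across two network reads is emitted on the second call. Returning a list rather than a generator keeps the buffer update eager even when the caller ignores the result.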

Mid-stream errors and disconnects

Once the gateway has sent HTTP 200 OK headers, any failure that happens later arrives as an SSE event, not as a non-200 status. There are three cases worth handling explicitly.

1. `event: error` frame

The upstream may emit a typed error mid-stream — for example a rate_limit_error or overloaded_error that fired only after generation began. The frame shape is:

event: error
data: {"type":"error","error":{"type":"overloaded_error","message":"..."}}

Treat this as a hard failure: stop reading, surface error.type to the caller, and decide whether to retry. See the error-handling guide for backoff strategies per error type.
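
One way to surface the frame to callers is to convert it into an exception that records whether a retry seems reasonable. This is a sketch only: StreamError, raise_if_error, and the RETRYABLE_TYPES set are hypothetical names, and treating the two error types mentioned above as retryable is an assumption, not a rule stated by the gateway.

```python
import json

# Assumption: the two types named in this section are transient and worth retrying.
RETRYABLE_TYPES = {"rate_limit_error", "overloaded_error"}


class StreamError(Exception):
    """Raised when an `event: error` frame arrives after the 200 response."""

    def __init__(self, error_type: str, message: str) -> None:
        super().__init__(f"{error_type}: {message}")
        self.error_type = error_type
        self.message = message
        self.retryable = error_type in RETRYABLE_TYPES


def raise_if_error(event_name: str, data: str) -> None:
    """Raise StreamError for an error frame; do nothing for any other event."""
    if event_name != "error":
        return
    error = json.loads(data).get("error", {})
    raise StreamError(error.get("type", "unknown"), error.get("message", ""))
```

Calling raise_if_error(event_name, data) on every frame before normal dispatch keeps the failure path in one place, and the caller can branch on the retryable attribute when deciding whether to re-issue the request.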

2. TCP disconnect mid-stream

The connection drops without a message_stop. This shows up in your client as a stream that ends without a final event. Both the Python and Node SDKs surface it as an APIConnectionError exception.

Resuming requires re-issuing the request from scratch, because Claude streams have no native resume token. To recognize and deduplicate retries on your side, key them on metadata.user_id plus a request-side correlation id you control.
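
A restart-from-scratch loop might look like the sketch below. Everything named here is hypothetical: open_stream stands for whatever function issues the request and yields parsed events, StreamInterrupted marks a stream that ended early, and the correlation id is a client-side value used only for your own logging. Backoff between attempts is omitted to keep the example short.

```python
import uuid
from typing import Callable, Iterable


class StreamInterrupted(Exception):
    """Hypothetical marker for a stream that ended before message_stop."""


def collect_with_restart(
    open_stream: Callable[[str], Iterable[dict]],
    max_attempts: int = 3,
) -> tuple[str, list[dict]]:
    """Re-issue the whole request until one attempt reaches message_stop."""
    correlation_id = str(uuid.uuid4())  # one id shared by every attempt
    for attempt in range(1, max_attempts + 1):
        events: list[dict] = []  # partial output from a failed attempt is discarded
        try:
            for event in open_stream(correlation_id):
                events.append(event)
                if event.get("type") == "message_stop":
                    return correlation_id, events
            raise StreamInterrupted("stream ended without message_stop")
        except (StreamInterrupted, ConnectionError):
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")
```

Because nothing from a failed attempt is reused, any text already rendered to the user has to be cleared or replaced when the retry starts producing output.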

3. Client-side cancellation

Closing the response (Python: exit the with block; Node: call stream.controller.abort(); raw fetch: call reader.cancel()) signals the upstream to stop generation. Tokens already produced still count toward your usage, so cancel as early as possible if the user closes the chat.

HTTP status is set before any frame arrives. A 4xx or 5xx response means generation never started — the body is the standard error envelope (see the error table). Once you get a 200, all subsequent failures live inside SSE frames.

Transparent forwarding on BUZZ

BUZZ does not buffer, repackage, or rewrite the SSE stream. The frame boundaries you see are the boundaries the upstream model produced, in the order produced. Concretely:

No event collapsing. If the upstream emits eight content_block_delta frames, you see eight frames. The gateway does not coalesce small text deltas to reduce frame count.
No JSON normalization. The data: payloads are forwarded as bytes. Unknown fields the upstream introduces show up unchanged on your client.
No prompt rewriting. Your request bytes are the bytes the upstream model receives, including any cache_control directives. This is what makes the prompt cache work end-to-end (verified: cold-call cache_creation_input_tokens=1200, warm-call cache_read_input_tokens=1200 for the same payload).
BUZZ may add fields, never remove them. You may observe usage.iterations[] for per-iteration accounting transparency or context_management.applied_edits. Tolerate unknown fields in your parser.
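
Tolerating unknown fields usually means reading the keys you need and carrying the rest along untouched. The sketch below shows one way to do that for a usage object; the Usage class and its extras attribute are hypothetical names, and only input_tokens and output_tokens are treated as known.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Usage:
    """Typed view of a usage payload that keeps unrecognized keys (hypothetical helper)."""

    input_tokens: int = 0
    output_tokens: int = 0
    extras: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_payload(cls, payload: dict[str, Any]) -> "Usage":
        known = {"input_tokens", "output_tokens"}
        return cls(
            input_tokens=payload.get("input_tokens", 0),
            output_tokens=payload.get("output_tokens", 0),
            # Preserve anything the gateway or upstream adds instead of rejecting it.
            extras={k: v for k, v in payload.items() if k not in known},
        )
```

A strict schema that rejects extra keys would break the first time a new field appears, whereas this shape keeps working and still exposes the new data through extras for logging.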

Production checklist

Use the official SDK if you can. Hand-rolled SSE parsers eventually hit a frame-split bug.
Always handle the event: error frame separately from HTTP 4xx/5xx.
Keep a per-index buffer for content_block_delta; tool-use blocks need the full partial_json assembled before parsing.
Read message_delta.usage for the final token count, not message_start.usage.
Respect retry-after on 429 and back off on 529. See the error-handling guide.
Cancel streams on user-side abort to avoid paying for tokens you discard.
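
The retry item in the checklist could be sketched as a delay calculator like the one below. The function name retry_delay and its defaults are hypothetical, header keys are assumed to be lower-cased already, and exponential backoff with full jitter is one common choice rather than something this guide prescribes; the error-handling guide remains the reference for per-error strategy.

```python
import random
from typing import Mapping, Optional


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    base: float = 1.0,
    cap: float = 30.0,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the status is not retried."""
    if status == 429:
        value = headers.get("retry-after")
        if value is not None:
            try:
                return max(0.0, float(value))  # honor the server's hint
            except ValueError:
                pass  # not a plain number of seconds; fall back to backoff
    if status in (429, 529):
        # Exponential backoff with full jitter, capped at `cap` seconds.
        return random.uniform(0.0, min(cap, base * 2 ** attempt))
    return None
```

With attempt counted from 0, the upper bound of the jittered delay doubles on each retry until it reaches the cap, and any status other than 429 or 529 returns None so the caller can fail fast.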
