Streaming with SSE
Stream Claude responses as they are generated using Server-Sent Events. This guide covers when to stream, every event type, parsing with the Python and Node.js SDKs and with raw fetch, and how to handle mid-stream errors. BUZZ forwards SSE frames transparently from the upstream model.
"stream": true
When to use streaming
Streaming sends incremental updates over a single HTTP response so your application can render output as it is generated. Reach for it when one of these is true:
- Long responses. Anything past a few hundred output tokens benefits from streaming. The user sees progress within tens of milliseconds instead of waiting for the whole completion.
- Interactive UX. Chat UIs, code assistants, and any product where perceived latency matters. The first tokens arrive while the model is still generating the rest of the response.
- Tool use loops. When the model emits a tool_use block, your application can begin scheduling the tool call as the JSON arguments stream in via input_json_delta rather than waiting for the full message.
- Extended thinking. With Opus 4.7 thinking enabled, the thinking_delta stream lets you display reasoning progress without blocking on the visible answer.
- Long-running connections. The ping event keeps the TCP connection alive across slow generations and proxies that idle-timeout.
Enable streaming
Add "stream": true to the request body. Everything else stays the same.
curl -N -X POST https://buzzai.cc/v1/messages \
  -H "Authorization: Bearer $BUZZ_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-haiku-4-5-20251001",
    "max_tokens": 200,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Count from 1 to 5."}
    ]
  }'

import os
from anthropic import Anthropic

# Read the key from the environment, as the curl and Node examples do.
client = Anthropic(
    base_url="https://buzzai.cc",
    api_key=os.environ["BUZZ_API_KEY"],
)
with client.messages.stream(
    model="claude-haiku-4-5-20251001",
    max_tokens=200,
    messages=[{"role": "user", "content": "Count from 1 to 5."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic({
  baseURL: "https://buzzai.cc",
  apiKey: process.env.BUZZ_API_KEY,
});
const stream = await client.messages.stream({
  model: "claude-haiku-4-5-20251001",
  max_tokens: 200,
  messages: [{ role: "user", content: "Count from 1 to 5." }],
});
for await (const chunk of stream) {
  if (chunk.type === "content_block_delta" && chunk.delta.type === "text_delta") {
    process.stdout.write(chunk.delta.text);
  }
}

The -N flag on curl disables output buffering so the terminal flushes each SSE chunk as it arrives.
The seven event types
Every stream is a sequence of named events. The table lists the seven events of a normal stream plus error, which appears only on failure. Most clients only care about content_block_delta, but each event has a defined role.
| Event | Meaning | Typical action |
|---|---|---|
| message_start | Header of the response. Carries the id, model, and the initial usage object. | Capture the message id for logging or retry correlation. |
| content_block_start | A new content block begins. The block type is one of text, tool_use, or thinking. | Allocate a buffer keyed on index. |
| content_block_delta | An incremental piece of the current block. See the four delta shapes below. | Append to the buffer for that index. |
| content_block_stop | The current block is complete. | Finalize the buffer; for tool_use blocks parse the accumulated partial_json into the input object. |
| message_delta | Final stop_reason and the cumulative usage totals. | Update token accounting; check the stop reason. |
| message_stop | End of stream. No further frames. | Close the response handle. |
| ping | Keep-alive heartbeat. May be sent at any point. | Ignore (or reset an idle timer). |
| error | A mid-stream failure after the initial 200 response. See Mid-stream errors. | Stop reading and surface the error to the caller. |
Delta payloads inside content_block_delta
| delta.type | Field | Used for |
|---|---|---|
| text_delta | delta.text | Visible text generation |
| input_json_delta | delta.partial_json | Streaming tool input arguments |
| thinking_delta | delta.thinking | Extended thinking traces (Opus 4.7) |
| signature_delta | delta.signature | Signed thinking blocks |
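The two tables can be combined into one small accumulator. The following is a minimal Python sketch, assuming the events have already been JSON-decoded into plain dicts. The BlockAccumulator class and its method names are hypothetical and exist only for illustration; they are not part of any SDK.

```python
import json


class BlockAccumulator:
    """Collects content blocks from parsed stream events, keyed on index.

    Hypothetical helper for illustration; not part of any SDK.
    """

    # Maps each delta.type to the field that carries its payload.
    DELTA_FIELDS = {
        "text_delta": "text",
        "input_json_delta": "partial_json",
        "thinking_delta": "thinking",
        "signature_delta": "signature",
    }

    def __init__(self):
        self.open_blocks = {}  # index -> {"type": ..., "parts": {field: [chunks]}}
        self.finished = {}     # index -> finalized block

    def feed(self, event):
        kind = event.get("type")
        if kind == "content_block_start":
            # Allocate a buffer keyed on index.
            self.open_blocks[event["index"]] = {
                "type": event["content_block"]["type"],
                "parts": {},
            }
        elif kind == "content_block_delta":
            delta = event["delta"]
            field = self.DELTA_FIELDS.get(delta.get("type"))
            if field is None:
                return  # unknown delta shape: skip it rather than fail
            parts = self.open_blocks[event["index"]]["parts"]
            parts.setdefault(field, []).append(delta[field])
        elif kind == "content_block_stop":
            block = self.open_blocks.pop(event["index"])
            joined = {f: "".join(chunks) for f, chunks in block["parts"].items()}
            if block["type"] == "tool_use":
                # Parse only now: a partial JSON string is not valid JSON.
                raw = joined.get("partial_json", "")
                joined["input"] = json.loads(raw) if raw else {}
            self.finished[event["index"]] = {"type": block["type"], **joined}
```

Feeding every parsed event into feed() leaves the completed blocks in finished, so a tool call can be dispatched as soon as its content_block_stop arrives.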
Captured wire sample
This is a real frame sequence captured from https://buzzai.cc/v1/messages for the prompt "Count from 1 to 3":
event: message_start
data: {"type":"message_start","message":{"model":"claude-haiku-4-5-20251001","id":"msg_01YkyqfgStqigCHAgJ6uUDfd","type":"message","role":"assistant","content":[],"stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":7,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":1,"service_tier":"standard"}}}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"1\n2\n3"}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"input_tokens":7,"output_tokens":5}}
event: message_stop
data: {"type":"message_stop"}
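In the sample above, output_tokens is 1 in message_start and 5 in message_delta, which is why token accounting should use the later value. A minimal Python sketch of that bookkeeping follows, assuming the frames have already been decoded into dicts; the summarize_stream name is hypothetical.

```python
def summarize_stream(events):
    """Return (message_id, stop_reason, usage) from parsed stream events.

    Hypothetical helper for illustration.
    """
    message_id = None
    stop_reason = None
    usage = {}
    for event in events:
        if event["type"] == "message_start":
            message_id = event["message"]["id"]
            usage.update(event["message"].get("usage", {}))
        elif event["type"] == "message_delta":
            stop_reason = event["delta"].get("stop_reason")
            # Later values win: message_delta carries the cumulative totals.
            usage.update(event.get("usage", {}))
    return message_id, stop_reason, usage
```

Fields that appear only in message_start, such as service_tier in the sample, are kept, while output_tokens is overwritten by the final count.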
Parsing with the Python SDK
The official Anthropic Python SDK exposes streaming through a context-managed helper. Pointing it at BUZZ is one base-URL change.
import os
from anthropic import Anthropic

# Read the key from the environment rather than hard-coding it.
client = Anthropic(
    base_url="https://buzzai.cc",
    api_key=os.environ["BUZZ_API_KEY"],
)
with client.messages.stream(
    model="claude-haiku-4-5-20251001",
    max_tokens=400,
    messages=[{"role": "user", "content": "Write a haiku about caching."}],
) as stream:
    # Option A: text-only iteration
    for text in stream.text_stream:
        print(text, end="", flush=True)
    # Option B: full event iteration with type dispatch
    # for event in stream:
    #     if event.type == "content_block_delta":
    #         if event.delta.type == "text_delta":
    #             print(event.delta.text, end="", flush=True)
    #         elif event.delta.type == "input_json_delta":
    #             handle_tool_input_chunk(event.index, event.delta.partial_json)
    final_message = stream.get_final_message()
print()
print("stop_reason:", final_message.stop_reason)
print("usage:", final_message.usage)
The context manager guarantees the underlying HTTP response is closed even if the consumer iterates partially. get_final_message() returns the assembled Message object once the stream has terminated.
Parsing with the Node.js SDK
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://buzzai.cc",
  apiKey: process.env.BUZZ_API_KEY,
});
const stream = await client.messages.stream({
  model: "claude-haiku-4-5-20251001",
  max_tokens: 400,
  messages: [{ role: "user", content: "Write a haiku about caching." }],
});
for await (const event of stream) {
  switch (event.type) {
    case "content_block_delta":
      if (event.delta.type === "text_delta") {
        process.stdout.write(event.delta.text);
      } else if (event.delta.type === "input_json_delta") {
        // accumulate event.delta.partial_json keyed on event.index
      }
      break;
    case "message_delta":
      // final stop_reason and cumulative usage
      break;
  }
}
const finalMessage = await stream.finalMessage();
console.log("\nstop_reason:", finalMessage.stop_reason);
console.log("usage:", finalMessage.usage);
The SDK handles SSE framing, JSON parsing, and reconnect-aware buffering. Most production clients should use it directly rather than parsing the wire format.
Parsing raw SSE with fetch
Useful when you cannot pull in a full SDK, when you are streaming through an edge runtime, or when you want full control over framing. The wire format is plain text: lines starting with event: name the event, lines starting with data: carry the JSON payload, and an empty line terminates each frame.
async function streamMessages() {
  const resp = await fetch("https://buzzai.cc/v1/messages", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.BUZZ_API_KEY}`,
      "anthropic-version": "2023-06-01",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-haiku-4-5-20251001",
      max_tokens: 400,
      stream: true,
      messages: [{ role: "user", content: "Write a haiku about caching." }],
    }),
  });
  if (!resp.ok) {
    throw new Error(`HTTP ${resp.status} ${await resp.text()}`);
  }
  const reader = resp.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // SSE frames are separated by a blank line.
    const frames = buffer.split("\n\n");
    // The last element is an incomplete frame (or ""); keep it for the next read.
    buffer = frames.pop() ?? "";
    for (const frame of frames) {
      const lines = frame.split("\n");
      let eventName = "message";
      const dataLines = [];
      for (const line of lines) {
        if (line.startsWith("event:")) eventName = line.slice(6).trim();
        else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
      }
      if (dataLines.length === 0) continue;
      const payload = JSON.parse(dataLines.join("\n"));
      if (eventName === "content_block_delta" && payload.delta?.type === "text_delta") {
        process.stdout.write(payload.delta.text);
      } else if (eventName === "error") {
        throw new Error(`stream error: ${payload.error?.type} ${payload.error?.message}`);
      } else if (eventName === "message_stop") {
        return;
      }
    }
  }
}
Two details that surprise people writing their first SSE parser:
- Frames can split across read() calls. Always carry a buffer across iterations and only consume up to the last complete \n\n.
- A single frame can have multiple data: lines. Concatenate them with \n before JSON-parsing.
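Both details can be seen in a small incremental parser. This is a minimal Python sketch, assuming the caller has already decoded the response bytes to text and that lines end in \n; the SSEParser class is a hypothetical helper, not an SDK API.

```python
import json


class SSEParser:
    """Incremental SSE frame parser. Hypothetical helper for illustration."""

    def __init__(self):
        self._buffer = ""

    def feed(self, chunk):
        """Accept a decoded text chunk; return the (event, payload) pairs it completes."""
        self._buffer += chunk
        frames = self._buffer.split("\n\n")
        # The last piece is an unfinished frame (or ""); carry it to the next call.
        self._buffer = frames.pop()
        results = []
        for frame in frames:
            event_name = "message"
            data_lines = []
            for line in frame.split("\n"):
                if line.startswith("event:"):
                    event_name = line[len("event:"):].strip()
                elif line.startswith("data:"):
                    data_lines.append(line[len("data:"):].strip())
            if data_lines:
                # Multiple data: lines are joined with a newline before parsing.
                results.append((event_name, json.loads("\n".join(data_lines))))
        return results
```

Calling feed() once per network read returns nothing until a blank line closes a frame, so a chunk boundary in the middle of a JSON payload is harmless.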
Mid-stream errors and disconnects
Once the gateway has sent the HTTP 200 OK headers, the status code can no longer change, so any later failure reaches you inside the stream rather than as a non-200 status. There are three cases worth handling explicitly.
1. event: error frame
The upstream may emit a typed error mid-stream — for example a rate_limit_error or overloaded_error that fired only after generation began. The frame shape is:
event: error
data: {"type":"error","error":{"type":"overloaded_error","message":"..."}}
Treat this as a hard failure: stop reading, surface error.type to the caller, and decide whether to retry. See the error-handling guide for backoff strategies per error type.
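One way to surface error.type to the caller is to turn the frame into an exception. The sketch below is Python and works on an already-parsed (event name, payload) pair. StreamError, check_event, and the RETRYABLE_TYPES set are hypothetical names, and treating these two error types as retryable is an assumption for illustration rather than a documented policy.

```python
class StreamError(Exception):
    """Raised when an event: error frame arrives mid-stream (hypothetical name)."""

    def __init__(self, error_type, message):
        super().__init__(f"{error_type}: {message}")
        self.error_type = error_type


# Assumption for illustration: these two types are treated as transient.
RETRYABLE_TYPES = {"overloaded_error", "rate_limit_error"}


def check_event(event_name, payload):
    """Raise StreamError for an error frame; return the payload otherwise."""
    if event_name == "error":
        error = payload.get("error", {})
        raise StreamError(error.get("type", "unknown"), error.get("message", ""))
    return payload


def is_retryable(exc):
    """Decide whether the caller may re-issue the request after this error."""
    return exc.error_type in RETRYABLE_TYPES
```

Raising stops the read loop immediately, and the caller can branch on exc.error_type when choosing a backoff.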
2. TCP disconnect mid-stream
The connection drops without a message_stop, so your client sees a stream that ends without a final event. The SDKs surface this as an exception (APIConnectionError in both the Python and Node.js SDKs).
Resumption requires re-issuing the request from scratch — Claude streams have no native resume token. For idempotency, use metadata.user_id plus a request-side correlation id you control.
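A hand-rolled client has to detect the missing message_stop itself. The following Python sketch shows one way to do that and to restart from scratch. It assumes start_stream is a caller-supplied function that re-issues the request and returns a fresh iterable of parsed events; IncompleteStream, collect_text, and run_with_restart are hypothetical names.

```python
class IncompleteStream(Exception):
    """The stream ended without a message_stop event (hypothetical name)."""


def collect_text(events):
    """Join text deltas; raise IncompleteStream if message_stop never arrives."""
    parts = []
    for event in events:
        if event["type"] == "content_block_delta":
            delta = event["delta"]
            if delta.get("type") == "text_delta":
                parts.append(delta["text"])
        elif event["type"] == "message_stop":
            return "".join(parts)
    raise IncompleteStream("stream ended before message_stop")


def run_with_restart(start_stream, max_attempts=3):
    """Re-issue the whole request after a truncated stream.

    Text from a failed attempt is discarded because there is no resume token.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return collect_text(start_stream())
        except IncompleteStream:
            if attempt == max_attempts:
                raise
```

Because each attempt starts over, a UI that has already rendered partial text needs to clear it before the retry begins.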
3. Client-side cancellation
Closing the response (Python: exit the with block; Node: call stream.controller.abort(); raw fetch: call reader.cancel()) signals the upstream to stop generation. Tokens already produced still count toward your usage, so cancel as early as possible if the user closes the chat.
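For the Python case, cancellation can be wired to a flag that another thread sets when the user leaves. This is a minimal sketch: stream_until_cancelled is a hypothetical helper, and open_stream is assumed to be a caller-supplied function returning a context manager with a text_stream attribute, for example lambda: client.messages.stream(...).

```python
import threading


def stream_until_cancelled(open_stream, cancelled: threading.Event, on_text):
    """Forward text chunks until cancelled is set.

    Returns True if the stream ran to completion, False if it was cut short.
    """
    with open_stream() as stream:
        for text in stream.text_stream:
            if cancelled.is_set():
                # Returning here exits the with block, which closes the response.
                return False
            on_text(text)
    return True
```

The flag is checked once per chunk, so cancellation takes effect when the next chunk arrives rather than instantly.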
Transparent forwarding on BUZZ
BUZZ does not buffer, repackage, or rewrite the SSE stream. The frame boundaries you see are the boundaries the upstream model produced, in the order produced. Concretely:
- No event collapsing. If the upstream emits eight content_block_delta frames, you see eight frames. The gateway does not coalesce small text deltas to reduce frame count.
- No JSON normalization. The data: payloads are forwarded as bytes. Unknown fields the upstream introduces show up unchanged on your client.
- No prompt rewriting. Your request bytes are the bytes the upstream model receives, including any cache_control directives. This is what makes the prompt cache work end-to-end (verified: cold-call cache_creation_input_tokens=1200, warm-call cache_read_input_tokens=1200 for the same payload).
- BUZZ may add fields, never remove them. You may observe usage.iterations[] for per-iteration accounting transparency or context_management.applied_edits. Tolerate unknown fields in your parser.
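Tolerating unknown fields mostly means not unpacking a payload straight into a fixed schema. A minimal Python sketch follows; the Usage dataclass is hypothetical and reads only the two token counts, keeping everything else untouched in extra.

```python
from dataclasses import dataclass, field


@dataclass
class Usage:
    """Typed view of a usage object that keeps unrecognized keys (hypothetical)."""

    input_tokens: int = 0
    output_tokens: int = 0
    extra: dict = field(default_factory=dict)

    @classmethod
    def from_payload(cls, payload):
        known = {"input_tokens", "output_tokens"}
        return cls(
            input_tokens=payload.get("input_tokens", 0),
            output_tokens=payload.get("output_tokens", 0),
            # Anything not recognized is preserved instead of raising.
            extra={k: v for k, v in payload.items() if k not in known},
        )
```

By contrast, calling Usage(**payload) directly raises TypeError as soon as the payload contains a key the dataclass does not declare.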
Production checklist
- Use the official SDK if you can. Hand-rolled SSE parsers eventually hit a frame-split bug.
- Always handle the event: error frame separately from HTTP 4xx/5xx.
- Keep a per-index buffer for content_block_delta; tool-use blocks need the full partial_json assembled before parsing.
- Read message_delta.usage for the final token count, not message_start.usage.
- Respect retry-after on 429 and back off on 529. See the error-handling guide.
- Cancel streams on user-side abort to avoid paying for tokens you discard.