Enable streaming responses - Weights & Biases Documentation

Setting the stream option to true returns the model’s response incrementally as a stream of chunks, so you can display results as they arrive instead of waiting for the entire response. This is helpful when models take time to generate output. All hosted models support streaming output. We recommend streaming for reasoning models, since non-streaming requests can time out if the model takes a long time to start producing output.

The following examples enable streaming for a chat completion request, first in Python with the openai client library and then in Bash with curl:

import openai

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key="[YOUR-API-KEY]",  # Create an API key at https://wandb.ai/settings
)

stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Tell me a rambling joke"}
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    else:
        print(chunk)  # No choices: this chunk carries the CompletionUsage object
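
If you need the complete reply as a single string once streaming finishes, the loop above can be wrapped in a small helper. The following is a minimal sketch, not part of the API: `collect_stream` and `fake_chunk` are hypothetical names, and the stand-in chunks only mimic the attributes the loop reads (`choices`, `delta.content`, `usage`) so the helper can be tried without a network call.

```python
from types import SimpleNamespace


def collect_stream(stream):
    """Join streamed deltas into one string and capture usage, if present."""
    parts = []
    usage = None
    for chunk in stream:
        if chunk.choices:
            # delta.content can be None, for example on a role-only chunk
            parts.append(chunk.choices[0].delta.content or "")
        # getattr guards against chunk objects that lack a usage attribute
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
    return "".join(parts), usage


def fake_chunk(text=None, usage=None):
    """Build a stand-in chunk (hypothetical, for illustration only)."""
    choices = [] if text is None else [
        SimpleNamespace(delta=SimpleNamespace(content=text))
    ]
    return SimpleNamespace(choices=choices, usage=usage)


fake_stream = [
    fake_chunk("Why did "),
    fake_chunk("the chicken..."),
    fake_chunk(usage={"total_tokens": 12}),  # usage-only chunk, no choices
]
text, usage = collect_stream(fake_stream)
print(text)
print(usage)
```

With a real request, you could pass the `stream` object returned by `client.chat.completions.create(..., stream=True)` to the same helper. Printing each piece as it arrives, as in the loop above, remains the better choice when the goal is a responsive display.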

curl https://api.inference.wandb.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer [YOUR-API-KEY]" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "messages": [
      { "role": "user", "content": "Tell me a rambling joke" }
    ],
    "stream": true
  }'
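
The curl request prints the raw stream rather than parsed objects. The sketch below shows one way to extract the text from that output. It assumes the endpoint follows the OpenAI-style server-sent events format, where each event is a `data: ` line holding a JSON chunk and the stream ends with `data: [DONE]`. The source does not state this format, so treat it as an assumption and check it against the actual output. `iter_sse_text` is a hypothetical helper name, and the sample lines are hand-written for illustration.

```python
import json


def iter_sse_text(lines):
    """Yield content strings from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel in the assumed format
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                yield content


# Hand-written sample lines in the assumed format, not captured server output
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    '',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    '',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    '',
    'data: {"choices": [], "usage": {"total_tokens": 5}}',
    '',
    'data: [DONE]',
]
result = "".join(iter_sse_text(sample))
print(result)
```

When watching the stream in a terminal, curl's `-N` (`--no-buffer`) flag turns off curl's output buffering so that chunks appear as they arrive.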
