Setting the stream option to true returns the model’s response incrementally as a stream of chunks, so you can display output as it arrives instead of waiting for the complete response. This is useful when a model takes a long time to generate output. All hosted models support streaming. We recommend streaming for reasoning models, because a non-streaming request can time out if the model takes a long time to start producing output. The following example enables streaming for a chat completion request:
import openai

client = openai.OpenAI(
    base_url='https://api.inference.wandb.ai/v1',
    api_key="[YOUR-API-KEY]",  # Create an API key at https://wandb.ai/settings
)

stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Tell me a rambling joke"}
    ],
    stream=True,
)

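for chunk in stream:
    if chunk.choices:
        # delta.content can be None, so fall back to an empty string
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    else:
        # A chunk with no choices: print it to show the CompletionUsage object
        print(chunk)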
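
If you need the full response text after streaming finishes, for example to log or store it, you can accumulate the chunks as you iterate. The following is a minimal sketch, not part of the API: collect_stream and fake_chunk are hypothetical helper names, and the stand-in chunks only mimic the attributes used in the example above (choices[0].delta.content and usage), with a plain dict in place of the real usage object so the sketch runs without a network call.

```python
from types import SimpleNamespace


def collect_stream(stream):
    """Join the streamed delta text and capture the usage data, if any.

    Hypothetical helper: accepts any iterable of chunk-like objects.
    """
    parts = []
    usage = None
    for chunk in stream:
        if chunk.choices:
            # delta.content can be None, so fall back to an empty string
            parts.append(chunk.choices[0].delta.content or "")
        elif getattr(chunk, "usage", None) is not None:
            # A chunk with no choices is assumed to carry the usage data
            usage = chunk.usage
    return "".join(parts), usage


def fake_chunk(text=None, usage=None):
    """Build a stand-in chunk for illustration only (not a real API object)."""
    if usage is not None:
        return SimpleNamespace(choices=[], usage=usage)
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)], usage=None)


demo_stream = [
    fake_chunk("Hello"),
    fake_chunk(None),  # a chunk whose delta has no content
    fake_chunk(", world"),
    fake_chunk(usage={"total_tokens": 7}),  # illustrative value only
]

text, usage = collect_stream(demo_stream)
print(text)
print(usage)
```

With a real request, you would pass the stream returned by client.chat.completions.create(..., stream=True) to collect_stream instead of demo_stream. Keep in mind that this waits for the whole stream before returning, so pair it with incremental printing if you also want to display output as it arrives.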