Stream Responses
For real-time interactions, NodeLLM supports streaming responses via standard JavaScript AsyncIterators. This allows you to display text to the user as it’s being generated, reducing perceived latency.
Basic Streaming
Use the stream() method on a chat instance to get an async iterator.
import { NodeLLM } from '@node-llm/core';

const chat = NodeLLM.chat("gpt-4o");

process.stdout.write("Assistant: ");

for await (const chunk of chat.stream("Write a haiku about code.")) {
  // Most chunks carry a fragment of the response text
  if (chunk.content) {
    process.stdout.write(chunk.content);
  }
}
// => Code flows like water
// Logic builds a new world now
// Bugs swim in the stream
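If you also need the complete text once the stream finishes, accumulate the fragments as you iterate. A minimal sketch reusing the chat instance from above:

let full = "";

for await (const chunk of chat.stream("Write a haiku about code.")) {
  full += chunk.content || "";
  process.stdout.write(chunk.content || "");
}

console.log("\nFull response:", full);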
Understanding Chunks
Each chunk passed to your loop contains partial information about the response.
- content: The text fragment for this specific chunk. May be empty for some chunks.
- role: Usually "assistant".
- model: The model ID.
- usage: (Optional) Token usage stats, usually present only in the final chunk (provider-dependent).
for await (const chunk of chat.stream("Hello")) {
console.log(chunk);
// { content: "He", role: "assistant", ... }
// { content: "llo", role: "assistant", ... }
}
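If you need the token usage mentioned above, watch for the optional usage field while iterating and keep the last value you see. A minimal sketch; the exact shape of the usage object is provider-dependent:

let usage;

for await (const chunk of chat.stream("Hello")) {
  process.stdout.write(chunk.content || "");
  if (chunk.usage) {
    usage = chunk.usage; // typically only the final chunk carries usage
  }
}

console.log("\n", usage); // e.g. prompt/completion token counts (varies by provider)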
Streaming with Tools ✨
NEW: Tools now work seamlessly with streaming! When a model decides to call a tool during streaming, NodeLLM automatically:
- Executes the tool with the provided arguments
- Adds the result to the conversation history
- Continues streaming the model’s final response
This all happens transparently: you just iterate over chunks as usual!
class WeatherTool extends Tool {
  name = "get_weather";
  description = "Get current weather";
  schema = z.object({
    location: z.string().describe("The city, e.g. Paris")
  });

  async execute({ location }) {
    return { location, temp: 22, condition: "sunny" };
  }
}
const chat = NodeLLM.chat("gpt-4o").withTool(WeatherTool);
// Tool is automatically executed during streaming!
for await (const chunk of chat.stream("What's the weather in Paris?")) {
process.stdout.write(chunk.content || "");
}
// Output: "The weather in Paris is currently 22°C and sunny."
Tool Events in Streaming
You can also listen to tool execution events:
const chat = NodeLLM.chat("gpt-4o")
.withTool(WeatherTool)
.onToolCall((call) => {
console.log(`\n[Tool Called: ${call.function.name}]`);
})
.onToolResult((result) => {
console.log(`[Tool Result: ${JSON.stringify(result)}]\n`);
});
for await (const chunk of chat.stream("Weather in Tokyo?")) {
process.stdout.write(chunk.content || "");
}
Supported Providers: OpenAI, Anthropic, Gemini, DeepSeek
Error Handling
Stream interruptions (network failures, rate limits) throw an error inside the for await loop. Always wrap streaming code in a try/catch block.
try {
  for await (const chunk of chat.stream("Generate a long story...")) {
    process.stdout.write(chunk.content || "");
  }
} catch (error) {
  console.error("\n[Stream Error]", error.message);
}
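If you want to keep whatever text arrived before the interruption (to show it to the user, log it, or retry), accumulate the chunks as you go. A minimal sketch building on the example above:

let partial = "";

try {
  for await (const chunk of chat.stream("Generate a long story...")) {
    partial += chunk.content || "";
    process.stdout.write(chunk.content || "");
  }
} catch (error) {
  // partial holds everything received before the failure
  console.error(`\n[Stream Error after ${partial.length} characters]`, error.message);
}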
Web Application Integration
Streaming is essential for modern web apps. Here is a simple example using Express:
import express from 'express';
import { NodeLLM } from '@node-llm/core';

const app = express();

app.get('/chat', async (req, res) => {
  // Set headers for streaming text
  res.setHeader('Content-Type', 'text/plain; charset=utf-8');
  res.setHeader('Transfer-Encoding', 'chunked');

  const chat = NodeLLM.chat("gpt-4o-mini");

  try {
    for await (const chunk of chat.stream(req.query.q as string)) {
      if (chunk.content) {
        res.write(chunk.content);
      }
    }
    res.end();
  } catch (error) {
    res.write(`\nError: ${error.message}`);
    res.end();
  }
});
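On the client, the chunked response can be read incrementally with the standard fetch API. A sketch of a browser-side consumer for the /chat route above; the #output element is a hypothetical placeholder for wherever you render the text:

const output = document.querySelector('#output'); // hypothetical element in your page
const res = await fetch('/chat?q=' + encodeURIComponent('Tell me a joke'));
const reader = res.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Decode and append each fragment as soon as it arrives
  output.textContent += decoder.decode(value, { stream: true });
}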