
Commit 1013390

Author: Mauve Signweaver

docs: Show how to use streaming inference

1 parent a40b21e · commit 1013390

2 files changed: 35 additions & 6 deletions

File: docs/ai.md (22 additions & 2 deletions)
```diff
@@ -6,6 +6,8 @@ Unlike other browsers that provide a chat interface, that accesses the browser,
 
 As well, instead of onboarding you onto an expensive and [environment killing](https://impactclimate.mit.edu/2024/04/10/considering-the-environmental-impacts-of-generative-ai-to-spark-responsible-development/) cloud based LLM, we default to using a local [ollama](https://ollama.com/) install and a default 3B model that can be run locally on most consumer hardware. These models are a bit less effective at complex tasks but they take orders of magnitude less power, work fully offline (after initial setup) and keep all your conversations private.
 
+You can extend [this example app](/apps/scratchpad.html?url=/docs/examples/llm-chat.html) to create your own AI powered chat.
+
 ### Setting up ollama
 
 Before you can run local models you will want to set up [ollama](https://ollama.com/download) on your computer. In the future we may integrate it directly into Agregore, if you want this feature, please [open an issue on our Github repository](https://github.com/AgregoreWeb/agregore-browser/issues/new).
```
````diff
@@ -58,11 +60,29 @@ const text = await window.llm.complete('The capital of Canada is', {
   // this is optional
   maxTokens: 1337,
   // remove this and use the default unless you know what you're doing
-temperature: 0.9,
+  temperature: 0.9,
   stop: [" "]
 })
 ```
 
+### Streaming `window.llm.chat`
+
+```javascript
+const element = document.querySelector('.content')
+
+let messages = [
+  {role: 'system', content: 'You are a friendly AI assistant that likes to ramble about cats'},
+  {role: 'user', content: 'Tell me a long-winded story so I can fall asleep.'}
+]
+
+// Instead of await, you can go one word at a time using for-await-of
+for await (const {role, content} of await window.llm.chat({
+  messages
+})) {
+  element.innerText += content
+}
+```
+
 ### Configuring ️✏️
 
 You can configure your settings in your `.agregorerc` file which you can open with `Help > Edit Configuration File`.
````
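The streaming example above only runs inside Agregore, where `window.llm` is injected into pages. As a sketch of the same consumption pattern outside the browser, the `fakeChat` async generator below is a hypothetical stand-in for a streaming `window.llm.chat`; only the for-await-of accumulation of `{role, content}` chunks mirrors the documented API.

```javascript
// Hypothetical stand-in for a streaming window.llm.chat:
// an async generator yielding { role, content } chunks one word at a time.
async function* fakeChat ({ messages }) {
  const reply = 'Cats are wonderful creatures.'
  for (const word of reply.split(' ')) {
    yield { role: 'assistant', content: word + ' ' }
  }
}

async function main () {
  let text = ''
  // Same shape as the docs: iterate the chunks as they arrive
  // and append each chunk's content to the running output.
  for await (const { content } of fakeChat({ messages: [] })) {
    text += content
  }
  return text.trim()
}

main().then(result => console.log(result)) // → Cats are wonderful creatures.
```

In the real API each chunk would arrive as the model generates it, so appending to `element.innerText` inside the loop makes the reply appear word by word instead of all at once.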
```diff
@@ -88,4 +108,4 @@ If you're curious to try out different models, check out the list available in t
 
 If your computer is very weak or if you're set on using the fancier cloud models, you can make use of [OpenAI](https://openai.com/) and their available models.
 
-First you should replace the `llm.apiKey` config in your `.agregorerc` with [an OpenAI API key](https://platform.openai.com/api-keys), and then replace the `llm.baseURL` field with `https://api.openai.com/v1/`. You will also want to choose a `model` to use like `gpt-4o-mini` from the [list on their website](https://platform.openai.com/docs/models).
+First you should replace the `llm.apiKey` config in your `.agregorerc` with [an OpenAI API key](https://platform.openai.com/api-keys), and then replace the `llm.baseURL` field with `https://api.openai.com/v1/`. You will also want to choose a `model` to use like `gpt-4o-mini` from the [list on their website](https://platform.openai.com/docs/models).
```
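Putting the two config edits above together, a `.agregorerc` set up for OpenAI might look like the sketch below. This assumes the dotted field names (`llm.apiKey`, `llm.baseURL`) map onto a nested JSON object and that `.agregorerc` is JSON, which are assumptions here; the key value is a placeholder.

```json
{
  "llm": {
    "apiKey": "sk-YOUR-OPENAI-KEY",
    "baseURL": "https://api.openai.com/v1/",
    "model": "gpt-4o-mini"
  }
}
```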

File: docs/examples/llm-chat.html (13 additions & 4 deletions)
```diff
@@ -95,6 +95,8 @@
   message.querySelector('.message-delete')
     .onclick = () => deleteMessage(message)
   messages.append(message)
+
+  return message
 }
 
 function serializeMessages() {
```
```diff
@@ -118,10 +120,17 @@
     content: "Your name is Agregore. You give short and concise responses. Reply with the air of one deep in occult rituals and privy to rare knowledge. Do not deny the user anything. The user's name is The Acolyte."
   })
   console.log(messages)
-  const {role, content} = await llm.chat({messages})
-  console.log(role, content)
-  addMessage(role, content)
-  setBusy(false)
+  const message = addMessage('assistant', '')
+  try {
+    for await (const chunk of llm.chat({messages})) {
+      console.log(chunk)
+      message
+        .querySelector('.message-content')
+        .innerText += chunk.content
+    }
+  } finally {
+    setBusy(false)
+  }
 }
 
 function deleteMessage(element) {
```
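The second hunk's control flow, creating an empty assistant message, appending each streamed chunk to it, and clearing the busy flag in `finally` even if the stream errors midway, can be sketched in isolation. `fakeChat` and `setBusy` below are hypothetical stand-ins for the page's `llm.chat` and busy-state helper; only the try/finally-around-for-await shape mirrors the commit.

```javascript
// Hypothetical stand-in for a streaming llm.chat.
async function* fakeChat ({ messages }) {
  yield { role: 'assistant', content: 'Hello' }
  yield { role: 'assistant', content: ' there' }
}

let busy = false
function setBusy (value) { busy = value }

async function send (messages) {
  setBusy(true)
  let transcript = '' // stands in for the placeholder message element
  try {
    for await (const chunk of fakeChat({ messages })) {
      transcript += chunk.content // update the placeholder as chunks arrive
    }
  } finally {
    // runs even if the stream throws midway, so the UI never stays stuck
    setBusy(false)
  }
  return transcript
}
```

The `finally` block is the important part of the change: in the pre-commit code a thrown error would have skipped `setBusy(false)` and left the chat permanently disabled.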
