Model name
gemma3
Command run
// as simple as possible:
let model = try await CoreAILanguageModel(resourcesAt: modelUrl!)
let session = LanguageModelSession(model: model)
let response = try await session.respond(to: "What is the capital city of America?")
print(response)
// The next one fails:
let response2 = try await session.respond(to: "What is the capital city of Canada?")
print(response2)
macOS / iOS target
macOS 27.0 beta 1
Xcode version
Xcode 27.0 beta 1
Python / uv version
uv 0.11.19, python 3.14.5
Full error output
== RUNNABLE ERROR:
CrashReportError: Fatal Error in CoreAIPipelinedEngine.swift
Application crashed due to fatalError in CoreAIPipelinedEngine.swift at line 151.
Engine not returned after drain() — tokenSequence Task stuck?
Process: CoreAIChat [38979]
Path: <none>
Date/Time: 2026-06-14 17:20:32 +0000
Anything else?
Steps to reproduce:
- In a SwiftUI app or Xcode project, load a dynamically shaped Core AI language model (e.g. Gemma-3-4b-it-4bit-dynamic).
- Create one LanguageModelSession with the model.
- Call session.respond() and wait for the first response to complete.
- Immediately call session.respond() again using the same session.
- The first response completes normally after emitting EOS.
- The second call waits in CoreAIPipelinedEngine.reset() / drain().
- After approximately five seconds, the process crashes with:
Fatal error: Engine not returned after drain() — tokenSequence Task stuck?
Why does this fail?
I did some digging in the repository and found the following:
CoreAIPipelinedEngine.generate() starts an independent producer task that keeps generating tokens up to maxTokens while holding exclusive ownership of the engine.
When respondVanilla() detects EOS, it records .eos and stops consuming the stream. However, this does not cancel or await the producer task. The producer therefore continues running in the background while engineInUse remains true.
The second respond() call invokes reset(), which waits in drain() for the previous producer to release the engine. If it does not finish within approximately five seconds, drain() terminates the process with fatalError.
A single Task.yield() at the end of respondVanilla() does not guarantee that the producer has completed or released the engine.
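To illustrate the lifecycle described above, here is a minimal, self-contained sketch of one possible fix: the consumer cancels the producer task when it sees EOS and then awaits it, so the engine is released before the next call. Everything in it is hypothetical (ToyEngine, generate(on:maxTokens:), respond(on:maxTokens:), and the convention that token 0 is EOS). It does not use the real CoreAIPipelinedEngine or LanguageModelSession code and only models the producer/consumer pattern.

```swift
// Hypothetical stand-in for the engine; NOT the real CoreAIPipelinedEngine.
actor ToyEngine {
    private(set) var inUse = false

    func acquire() {
        // Mirrors the exclusive-ownership rule: only one producer at a time.
        precondition(!inUse, "engine still owned by a previous producer")
        inUse = true
    }

    func release() { inUse = false }

    // Pretend the model emits EOS (token 0) at step 3 and keeps going afterwards.
    func token(atStep step: Int) -> Int { step == 3 ? 0 : step }
}

// Starts an independent producer task that owns the engine while it runs.
func generate(
    on engine: ToyEngine,
    maxTokens: Int
) -> (stream: AsyncStream<Int>, producer: Task<Void, Never>) {
    let (stream, continuation) = AsyncStream<Int>.makeStream()
    let producer = Task {
        await engine.acquire()
        for step in 1...maxTokens {
            // Cooperative cancellation: stop as soon as the consumer is done.
            if Task.isCancelled { break }
            let token = await engine.token(atStep: step)
            continuation.yield(token)
        }
        continuation.finish()
        await engine.release()
    }
    return (stream, producer)
}

// Consumes tokens until EOS, then cancels AND awaits the producer.
func respond(on engine: ToyEngine, maxTokens: Int) async -> [Int] {
    let (stream, producer) = generate(on: engine, maxTokens: maxTokens)
    var tokens: [Int] = []
    for await token in stream {
        if token == 0 { break }  // EOS: stop consuming
        tokens.append(token)
    }
    producer.cancel()      // ask the producer to stop generating
    await producer.value   // wait until it has released the engine
    return tokens
}
```

With this shape, a second respond(on:maxTokens:) call on the same ToyEngine cannot find the engine still in use, because the first call does not return until its producer has finished. Whether the real engine can be cancelled this cheaply in the middle of generation is an open question that this sketch does not answer.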
Intended Behavior
I am not 100% sure whether consumers are intentionally expected to drain the remaining token stream before reusing the engine, or whether early termination should automatically cancel and await the producer task.
If applications are expected to handle this themselves, that lifecycle requirement should probably be documented. However, draining the stream would still let the engine generate and discard all remaining tokens up to maxTokens, wasting GPU time and the associated resources.
Thanks in advance👍