`CoreAIPipelinedEngine` producer remains active after EOS, causing the next response to crash in `drain()` #41

### Model name

gemma3

### Command run

```swift
// as simple as possible:
let model = try await CoreAILanguageModel(resourcesAt: modelUrl!)
let session = LanguageModelSession(model: model)
        
let response = try await session.respond(to: "What is the capital city of America?")
print(response)

// The next one fails:
let response2 = try await session.respond(to: "What is the capital city of Canada?")
print(response2)
```

### macOS / iOS target

macOS 27.0 beta 1

### Xcode version

Xcode 27.0 beta 1

### Python / uv version

uv 0.11.19, python 3.14.5

### Full error output

```shell
== RUNNABLE ERROR:

    CrashReportError: Fatal Error in CoreAIPipelinedEngine.swift
    
    Application crashed due to fatalError in CoreAIPipelinedEngine.swift at line 151.
    
    Engine not returned after drain() — tokenSequence Task stuck?
    
    Process:             CoreAIChat [38979]
    Path:                <none>
    
    Date/Time:           2026-06-14 17:20:32 +0000
```

### Anything else?

## Steps to reproduce:
1. In a SwiftUI app or Xcode project, load a dynamically shaped Core AI language model (e.g. Gemma-3-4b-it-4bit-dynamic).
2. Create one `LanguageModelSession` with the model.
3. Call `session.respond()` and wait for the first response to complete.
4. Immediately call `session.respond()` again using the same session.
5. The first response completes normally after emitting EOS.
6. The second call waits in `CoreAIPipelinedEngine.reset()` / `drain()`.
7. After approximately five seconds, the process crashes with: `Fatal error: Engine not returned after drain() — tokenSequence Task stuck?`

<img width="1418" height="814" alt="Image" src="https://github.com/user-attachments/assets/277b835f-9373-4272-aac1-ed781af4ddd7" />

## Why does this fail?
I looked through the repository and found the following:
`CoreAIPipelinedEngine.generate()` starts an independent producer task that keeps generating tokens up to `maxTokens` while holding exclusive ownership of the engine.
When `respondVanilla()` detects EOS, it records `.eos` and stops consuming the stream. However, this neither cancels nor awaits the producer task, so the producer continues running in the background while `engineInUse` remains `true`.

The second `respond()` call invokes `reset()`, which waits in `drain()` for the previous producer to release the engine. If the producer does not release it within approximately five seconds, `drain()` terminates the process with `fatalError`.
A single `Task.yield()` at the end of `respondVanilla()` does not guarantee that the producer has completed or released the engine.
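
To make the sequence above concrete, here is a minimal model of it using only the Swift standard library. It is a sketch, not the real implementation: `ModelEngine`, `startProducer`, and this `drain(engine:timeout:)` are hypothetical stand-ins, the timings are arbitrary, and the model returns `false` at the point where the real `drain()` is reported to call `fatalError`.

```swift
// Hypothetical model of the lifecycle described above. None of these names
// come from the real CoreAIPipelinedEngine source.
actor ModelEngine {
    private(set) var inUse = false
    func acquire() { inUse = true }
    func release() { inUse = false }
}

/// Producer that is never told the consumer stopped: it generates
/// `maxTokens` tokens and releases the engine only after the loop ends.
func startProducer(
    engine: ModelEngine,
    maxTokens: Int,
    continuation: AsyncStream<Int>.Continuation
) -> Task<Void, Never> {
    Task {
        for token in 0..<maxTokens {
            try? await Task.sleep(for: .milliseconds(5))  // stand-in for one decode step
            continuation.yield(token)
        }
        continuation.finish()
        await engine.release()
    }
}

/// Polls until the engine is free or the timeout expires.
/// Returns false instead of crashing when the engine is still held.
func drain(engine: ModelEngine, timeout: Duration) async -> Bool {
    let clock = ContinuousClock()
    let deadline = clock.now.advanced(by: timeout)
    while clock.now < deadline {
        let busy = await engine.inUse
        if !busy { return true }
        try? await Task.sleep(for: .milliseconds(10))
    }
    let busy = await engine.inUse
    return !busy
}

let engine = ModelEngine()
let (stream, continuation) = AsyncStream<Int>.makeStream()
await engine.acquire()
_ = startProducer(engine: engine, maxTokens: 200, continuation: continuation)

let eosToken = 3
for await token in stream {
    if token == eosToken { break }  // consumer stops here; the producer keeps going
}
await Task.yield()                  // a single yield does not end the producer

// The producer needs at least 200 * 5 ms to finish, so a 100 ms drain gives up first.
let released = await drain(engine: engine, timeout: .milliseconds(100))
print("engine released before timeout:", released)
```

In this model the second request would reach `drain` while the first producer still holds the engine, which matches the crash observed in step 7.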

## Intended Behavior
I am not sure whether consumers are expected to drain the remaining token stream before reusing the engine, or whether early termination should automatically cancel and await the producer task.

If applications are expected to handle this themselves, that lifecycle requirement should probably be documented. However, draining the stream would still let the engine generate and discard all remaining tokens up to `maxTokens`, wasting GPU time and the resources that go with it.
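
For comparison, the second option (cancel and await the producer on early termination) could look roughly like the sketch below. Everything here is an assumption for illustration: `EngineLease` and this `respond(lease:eosToken:maxTokens:)` are hypothetical and only mimic the ownership pattern with plain Swift concurrency; they are not the Core AI API.

```swift
// Hypothetical sketch; the names are illustrative, not the real Core AI API.
actor EngineLease {
    private(set) var inUse = false
    private(set) var tokensGenerated = 0
    func acquire() { inUse = true }
    func release() { inUse = false }
    func countToken() { tokensGenerated += 1 }
}

/// Consumes tokens until EOS, then cancels the producer and waits for it
/// to hand the engine back before returning.
func respond(lease: EngineLease, eosToken: Int, maxTokens: Int) async -> [Int] {
    let (stream, continuation) = AsyncStream<Int>.makeStream()
    await lease.acquire()

    let producer = Task {
        for token in 0..<maxTokens {
            if Task.isCancelled { break }   // cooperative cancellation check
            await lease.countToken()
            continuation.yield(token)
            await Task.yield()
        }
        continuation.finish()
        await lease.release()               // runs on completion and on cancellation
    }

    var output: [Int] = []
    for await token in stream {
        if token == eosToken { break }
        output.append(token)
    }

    producer.cancel()      // stop generating past EOS
    await producer.value   // do not return until the lease has been released
    return output
}

let lease = EngineLease()
let first = await respond(lease: lease, eosToken: 3, maxTokens: 100_000)
let second = await respond(lease: lease, eosToken: 5, maxTokens: 100_000)
let stillHeld = await lease.inUse

print(first, second)   // [0, 1, 2] [0, 1, 2, 3, 4]
print(stillHeld)       // false
```

With this shape the second call never has to wait on a stale producer, and the producer stops shortly after EOS instead of running to `maxTokens`. Whether the real engine can be interrupted mid-generation this cheaply is something only the maintainers can confirm.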

Thanks in advance👍 
