[Feature]: Save streaming response and continue generation if worker node fails for RL #411

### Is your feature request related to a problem?

When using spot instances for RL rollouts, worker nodes can fail without notice. To avoid regenerating from the very beginning, we need to save the portion of each streaming response generated so far and dispatch the request to a healthy node to continue generation.

### Describe the Solution you'd like

1. Add a mode for RL + spot instances.
2. In this mode, all generations should use streaming, and the scheduler needs to save the streamed response for each in-flight request.
3. If a worker fails, dispatch its in-flight requests to healthy nodes together with the outputs generated so far, so generation continues instead of restarting.
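
To make the proposal concrete, here is a minimal sketch of the failover loop described in the steps above. It is only an illustration under stated assumptions: `RequestState`, `WorkerFailure`, `run_with_failover`, and `make_worker` are hypothetical names, not part of the existing scheduler API, and workers are simulated as Python generators rather than real streaming endpoints.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional


class WorkerFailure(Exception):
    """Hypothetical signal that a worker died mid-stream (e.g. spot reclaim)."""


@dataclass
class RequestState:
    """Per-request state kept by the scheduler (hypothetical structure)."""
    prompt: List[str]
    max_new_tokens: int
    # Tokens streamed back so far; this is what survives a worker failure.
    generated: List[str] = field(default_factory=list)


# A worker receives the prompt, the tokens already generated, and the token
# budget, and streams only the remaining tokens.
Worker = Callable[[List[str], List[str], int], Iterator[str]]


def run_with_failover(state: RequestState, workers: List[Worker]) -> List[str]:
    """Stream from workers in order, resuming from saved tokens on failure."""
    for worker in workers:
        try:
            for token in worker(state.prompt, state.generated, state.max_new_tokens):
                state.generated.append(token)  # save each token as it arrives
            return state.generated  # stream finished normally
        except WorkerFailure:
            continue  # keep state.generated and try the next worker
    raise RuntimeError("no healthy worker left to continue the request")


def make_worker(fail_after: Optional[int] = None) -> Worker:
    """Build a simulated worker that optionally fails after N emitted tokens."""
    def worker(prompt: List[str], generated: List[str], max_new_tokens: int) -> Iterator[str]:
        emitted = 0
        # Continue from the position implied by the already generated tokens.
        for i in range(len(generated), max_new_tokens):
            if fail_after is not None and emitted == fail_after:
                raise WorkerFailure("simulated spot instance reclaim")
            yield f"tok{i}"
            emitted += 1
    return worker


if __name__ == "__main__":
    state = RequestState(prompt=["hello"], max_new_tokens=5)
    # The first worker dies after two tokens; the second resumes at the third.
    print(run_with_failover(state, [make_worker(fail_after=2), make_worker()]))
```

In a real deployment the continuation would presumably resend the prompt plus the saved tokens as a prefix to the healthy node; how that prefix is passed, and how worker failure is detected, are open design questions this sketch does not settle.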

### Alternatives Considered (Optional)

_No response_

### Additional Context (Optional)

_No response_
