

@miguelmartin75
Contributor

@miguelmartin75 miguelmartin75 commented Dec 17, 2025

What does this PR do?

This PR adds Cosmos Predict2.5 Base. It has been tested with the 2B model checkpoint (the official HF checkpoint is here). The converted checkpoints have not yet been uploaded to HF.

This change is largely based on the previous Predict1/Predict2 support added by @a-r-r-o-w.

Testing:

  • Aside from the unit tests, the inference pipeline has been checked against the examples in this README from Cosmos Predict2.5 and produces similar outputs.
  • These examples are included in the pipeline's docstring.

Additions

Pipelines:

  • A base pipeline which handles all modes for Predict2.5 Base checkpoint: Text2World, Image2World, Video2World
    • This pipeline loads the Reason1 checkpoint via Qwen2_5_VLForConditionalGeneration
    • The pipeline also supports image output when num_frames=1 is passed
    • Limitations: the pipeline assumes batch_size == 1
  • Three derived pipelines, one per official mode (Text2World, Image2World, Video2World), are built on this base pipeline
  • Unit tests are added, based on the existing Cosmos Predict2 tests
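A single pipeline that covers all three modes has to decide the mode from its inputs. The sketch below shows one way to do that, assuming the mode is inferred from which conditioning input (image or video) is passed and that num_frames=1 means image output; the function name resolve_mode and its signature are hypothetical, not the diffusers API.

```python
from typing import Any, Optional, Sequence, Tuple, Union


def resolve_mode(
    prompt: Union[str, Sequence[str]],
    num_frames: int,
    image: Optional[Any] = None,
    video: Optional[Any] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: return (mode, output_kind) for a single request."""
    # Mirror the stated limitation: only batch_size == 1 is supported.
    prompts = [prompt] if isinstance(prompt, str) else list(prompt)
    if len(prompts) != 1:
        raise ValueError(f"only batch_size == 1 is supported, got {len(prompts)} prompts")
    if image is not None and video is not None:
        raise ValueError("pass at most one of `image` or `video`")
    if num_frames < 1:
        raise ValueError("num_frames must be >= 1")

    # The conditioning input decides the mode; no conditioning means Text2World.
    if video is not None:
        mode = "video2world"
    elif image is not None:
        mode = "image2world"
    else:
        mode = "text2world"

    # A single generated frame is returned as an image rather than a video.
    output_kind = "image" if num_frames == 1 else "video"
    return mode, output_kind


print(resolve_mode("a robot arm stacking cubes", num_frames=1))  # ('text2world', 'image')
```

Keeping mode selection in one place is what makes a unified pipeline practical: the denoising loop stays shared and only the conditioning setup branches on the mode.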

Scheduler:

  • FlowUniPCMultistepScheduler: a new scheduler that applies the UniPC algorithm to the EDM noise schedule (Karras sigmas) in a flow-matching setting, since Predict2.5 uses flow matching. The name can be changed.
  • The same behaviour is integrated into the existing UniPCMultistepScheduler by supporting use_karras_sigmas=True together with use_flow_sigmas=True
  • This is done to match the Predict2.5 codebase
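To make the scheduler bullets concrete, here is a sketch of building an EDM (Karras) sigma schedule and mapping it onto flow-matching time. The conversion t = sigma / (1 + sigma) follows from rescaling x = x0 + sigma * eps by 1 / (1 + sigma); that the schedulers in this PR map sigmas exactly this way is an assumption. The sigma_min, sigma_max and rho values used below are the EDM paper defaults, not necessarily the Predict2.5 settings, and the UniPC update itself is not shown.

```python
from typing import List


def karras_sigmas(num_steps: int, sigma_min: float, sigma_max: float, rho: float = 7.0) -> List[float]:
    """EDM noise schedule (Karras et al., 2022), from sigma_max down to sigma_min."""
    if num_steps < 2:
        raise ValueError("num_steps must be >= 2")
    min_inv = sigma_min ** (1.0 / rho)
    max_inv = sigma_max ** (1.0 / rho)
    # Interpolate linearly in sigma^(1/rho) space, then map back with ** rho.
    return [(max_inv + i / (num_steps - 1) * (min_inv - max_inv)) ** rho for i in range(num_steps)]


def edm_sigma_to_flow_t(sigma: float) -> float:
    """Map an EDM noise level to flow-matching time.

    x = x0 + sigma * eps, scaled by 1 / (1 + sigma), equals
    (1 - t) * x0 + t * eps with t = sigma / (1 + sigma).
    """
    return sigma / (1.0 + sigma)


# Assumed example values (EDM defaults), for illustration only.
sigmas = karras_sigmas(num_steps=8, sigma_min=0.002, sigma_max=80.0)
for s in sigmas:
    print(f"sigma={s:9.4f}  t={edm_sigma_to_flow_t(s):.4f}")
```

Because the mapping is monotonic, the Karras spacing (dense near low noise) carries over to flow time, which is why the two options can be combined rather than being mutually exclusive.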

Model changes:

  • Modified CosmosTransformer to accept an optional cross-attention projection layer (used for text embeddings from Reason1)
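As an illustration of an optional cross-attention projection, here is a minimal PyTorch sketch in which text embeddings are projected to the key/value width only when a projection is configured. The class and argument names (CrossAttentionWithOptionalProj, text_proj_in_dim) are hypothetical, and nn.MultiheadAttention stands in for the Cosmos attention processor; the actual CosmosTransformer layer may be structured differently.

```python
from typing import Optional

import torch
from torch import nn


class CrossAttentionWithOptionalProj(nn.Module):
    """Hypothetical cross-attention block with an optional text-embedding projection."""

    def __init__(self, hidden_dim: int, text_dim: int, num_heads: int, text_proj_in_dim: Optional[int] = None):
        super().__init__()
        # Only built when the text encoder's feature width differs from text_dim.
        self.text_proj = nn.Linear(text_proj_in_dim, text_dim) if text_proj_in_dim is not None else None
        # Queries come from the video latents; keys/values come from the text embeddings.
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, kdim=text_dim, vdim=text_dim, batch_first=True)

    def forward(self, hidden_states: torch.Tensor, encoder_hidden_states: torch.Tensor) -> torch.Tensor:
        if self.text_proj is not None:
            encoder_hidden_states = self.text_proj(encoder_hidden_states)
        out, _ = self.attn(hidden_states, encoder_hidden_states, encoder_hidden_states, need_weights=False)
        return out


block = CrossAttentionWithOptionalProj(hidden_dim=64, text_dim=32, num_heads=4, text_proj_in_dim=128)
latents = torch.randn(1, 16, 64)  # (batch, latent tokens, hidden_dim)
text = torch.randn(1, 8, 128)     # (batch, text tokens, raw text-encoder width)
print(block(latents, text).shape)  # torch.Size([1, 16, 64])
```

Making the projection optional keeps existing checkpoints (which feed text embeddings of the right width directly) loadable, while new ones can add the extra layer.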

Scripts:

  • Extended scripts/convert_cosmos_to_diffusers.py to support Predict2.5
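Checkpoint conversion of this kind mostly means renaming state-dict keys from the original layout to the diffusers layout. The sketch below shows that pattern with ordered regex rules; the rules and key names are hypothetical placeholders, not the actual mapping used in scripts/convert_cosmos_to_diffusers.py.

```python
import re
from typing import Dict, List, Tuple, TypeVar

V = TypeVar("V")

# Hypothetical rename rules (pattern, replacement), applied in order to every key.
RENAME_RULES: List[Tuple[str, str]] = [
    (r"^net\.", ""),                                    # drop a wrapper-module prefix
    (r"^blocks\.(\d+)\.", r"transformer_blocks.\1."),  # rename the block container
]


def convert_state_dict(state_dict: Dict[str, V], rules: List[Tuple[str, str]]) -> Dict[str, V]:
    converted: Dict[str, V] = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, replacement in rules:
            new_key = re.sub(pattern, replacement, new_key)
        # Two source keys collapsing onto one target key signals a bad rule set.
        if new_key in converted:
            raise KeyError(f"duplicate converted key: {new_key}")
        converted[new_key] = value
    return converted


example = {"net.blocks.0.self_attn.q_proj.weight": 0, "net.x_embedder.proj.weight": 1}
print(convert_state_dict(example, RENAME_RULES))
# {'transformer_blocks.0.self_attn.q_proj.weight': 0, 'x_embedder.proj.weight': 1}
```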

Who can review?

@miguelmartin75 miguelmartin75 changed the title Cosmos/predict2.5 base pr ready Cosmos Predict2.5 Base Model Dec 17, 2025
@miguelmartin75 miguelmartin75 changed the title Cosmos Predict2.5 Base Model Cosmos Predict2.5 Base: inference pipeline, checkpoint conversion & scheduler Dec 17, 2025
@miguelmartin75 miguelmartin75 changed the title Cosmos Predict2.5 Base: inference pipeline, checkpoint conversion & scheduler Cosmos Predict2.5 Base: inference pipeline, scheduler & chkpt conversion Dec 17, 2025
scheduler = FlowUniPCMultistepScheduler()

# NOTE: using Qwen2 VL instead for tests (reason1 is based on 2.5)
text_encoder = Qwen2VLForConditionalGeneration.from_pretrained(
Contributor Author

Is there an internal Qwen2_5_VL model to test with?

Member

Would something like this work?

text_encoder = Qwen2_5_VLForConditionalGeneration(config)

Copy link
Member

@sayakpaul sayakpaul left a comment

Thanks for the PR!

I left some comments. My main comments are about separating the pipelines from one another instead of having them inherit from one another.

Let's also add docs?

video = self.vae.decode(latents.to(self.vae.dtype), return_dict=False)[0]

assert self.safety_checker is not None
self.safety_checker.to(device)
Member

We don't have to do it in this PR, but we could have a small utility like run_safety_checker() inside the pipelines and copy it across all the Cosmos pipelines that require it (much like encode_prompt(), for example).

But this is not merge-blocking.
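The suggested shared utility could look roughly like the sketch below. It only assumes that the checker object has a .to(device) method and can be called on the decoded video; the name run_safety_checker comes from the comment above, but its signature, the offload_device parameter and DummyChecker are hypothetical. Raising ValueError instead of using assert keeps the check active when Python runs with -O.

```python
from typing import Any, List, Optional, Protocol


class SafetyChecker(Protocol):
    # Hypothetical interface: only `.to()` and calling on a video are assumed.
    def to(self, device: str) -> Any: ...
    def __call__(self, video: Any) -> Any: ...


def run_safety_checker(
    safety_checker: Optional[SafetyChecker],
    video: Any,
    device: str,
    offload_device: str = "cpu",
) -> Any:
    """Move the checker to `device`, run it on `video`, then offload it again."""
    if safety_checker is None:
        # Unlike `assert`, this check is not stripped when Python runs with -O.
        raise ValueError("a safety checker is required to run this pipeline")
    safety_checker.to(device)
    try:
        return safety_checker(video)
    finally:
        # Free accelerator memory once the check is done, even if it raised.
        safety_checker.to(offload_device)


class DummyChecker:
    """Stand-in checker that records device moves and passes the video through."""

    def __init__(self) -> None:
        self.devices: List[str] = []

    def to(self, device: str) -> "DummyChecker":
        self.devices.append(device)
        return self

    def __call__(self, video: Any) -> Any:
        return video


checker = DummyChecker()
result = run_safety_checker(checker, video=[0, 1, 2], device="cuda")
print(result, checker.devices)  # [0, 1, 2] ['cuda', 'cpu']
```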

@sayakpaul sayakpaul requested review from DN6 and yiyixuxu December 17, 2025 04:06
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@yiyixuxu yiyixuxu left a comment

Thanks for the PR! I left some feedback. Mainly:

  1. Can we look into supporting this scheduler through the existing UniPCMultistepScheduler with its flow-matching options (use_flow_sigmas, prediction_type="flow_prediction")? I'm OK with adding a new scheduler if that requires a lot of changes or simply doesn't make sense, but I wanted to check first.

  2. For the pipeline, can we combine the 3 pipelines into one Cosmos2_5PredictPipeline that inherits directly from DiffusionPipeline? The current design isn't how we typically structure pipelines in diffusers. A single pipeline isn't ideal either, since our pipelines are normally task-based (text2image, image2video, etc.), but I have to admit it's getting increasingly difficult to keep that pattern without a large amount of duplicated code. I think a single unified pipeline is reasonable here.

@miguelmartin75 miguelmartin75 force-pushed the cosmos/predict2.5-base-pr-ready branch 2 times, most recently from 4133c68 to 48f373b December 18, 2025 03:24
@miguelmartin75 miguelmartin75 force-pushed the cosmos/predict2.5-base-pr-ready branch from 48f373b to c14a3da December 18, 2025 03:25
@miguelmartin75 miguelmartin75 force-pushed the cosmos/predict2.5-base-pr-ready branch from c14a3da to b76f9f2 December 18, 2025 03:26
@miguelmartin75
Contributor Author

Updated the PR to address the comments. The docs should appear here once they are rebuilt: https://moon-ci-docs.huggingface.co/docs/diffusers/pr_12852/en/api/pipelines/cosmos. I can update the main example to the latest model once the converted checkpoint has been uploaded to Hugging Face.

@sayakpaul
Member

@bot /style

@github-actions
Contributor

github-actions bot commented Dec 18, 2025

Style bot fixed some files and pushed the changes.

@sayakpaul
Member

Could you run make fix-copies?

@miguelmartin75 miguelmartin75 force-pushed the cosmos/predict2.5-base-pr-ready branch from 5f41bc1 to 735fb0e December 18, 2025 04:03
Collaborator

@yiyixuxu yiyixuxu left a comment

thanks!

@sayakpaul sayakpaul merged commit b530968 into huggingface:main Dec 19, 2025
10 of 11 checks passed