@leisuzz
Contributor

@leisuzz leisuzz commented Dec 18, 2025

What does this PR do?

  1. I got the error:
    raise ValueError(f"Expected image_latents to be a list, got {type(image_latents)}.")
    (1) cond_model_input_list is passed to "_prepare_image_ids" as a list of the form [[1, cond_model_input[0], cond_model_input[1], cond_model_input[2]], ...].
    (2) Because "_prepare_image_ids" in the pipeline calls torch.cat(image_latent_ids, dim=0), the shapes no longer match in the training step at model_input_ids = torch.cat([model_input_ids, cond_model_input_ids], dim=1): cond_model_input_ids.shape[0] is 1, but model_input_ids.shape[0] is the batch size. The cond_model_input_ids.view call reshapes the tensor to the required shape.
    So this change also works when the batch size is greater than 1.

  2. When I only changed cond_model_input to a list, the training loss was abnormal (it started at ~1.7, which is too high). So I fixed the model prediction to follow the pipeline code, and the loss became reasonable (it starts at ~0.4).
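
The shape mismatch from item 1 can be reproduced with a minimal sketch. The sizes below are made up, and the stand-in tensors only mimic the shapes described above (ids concatenated along dim 0 and given a leading dimension of 1). The exact `view` arguments used in the PR may differ; this only illustrates why a reshape to the batch size is needed before concatenating along dim 1.

```python
import torch

# Hypothetical sizes for illustration only.
batch_size, seq_len = 2, 6

# Stand-in for the ids of the noisy latents: one row of ids per sample.
model_input_ids = torch.zeros(batch_size, seq_len, 4, dtype=torch.long)

# Stand-in for the conditioning ids: per-sample ids are concatenated along
# dim 0 and then given a leading dimension of 1, so shape[0] == 1.
per_sample_ids = [torch.ones(seq_len, 4, dtype=torch.long) for _ in range(batch_size)]
cond_model_input_ids = torch.cat(per_sample_ids, dim=0).unsqueeze(0)  # (1, B * L, 4)

# Concatenating along dim 1 requires every other dimension to match,
# so a batch size above 1 fails here.
try:
    torch.cat([model_input_ids, cond_model_input_ids], dim=1)
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

# Reshape so the leading dimension is the batch size again.
cond_model_input_ids = cond_model_input_ids.view(batch_size, -1, model_input_ids.shape[-1])
joined_ids = torch.cat([model_input_ids, cond_model_input_ids], dim=1)  # (B, 2 * L, 4)
```

With `batch_size = 1` the first concatenation would succeed, which is presumably why the problem only shows up for larger batches.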

With the code:

model_pred = model_pred[:, :noisy_seq_len, :]
model_input_ids = model_input_ids[:, :noisy_seq_len, :]

The training loss is:
Steps:   0%|          | 1/5000 [00:29<40:20:41, 29.05s/it]
Steps:   0%|          | 1/5000 [00:29<40:20:41, 29.05s/it, loss=0.328, lr=1e-5]
Steps:   0%|          | 2/5000 [01:00<42:12:33, 30.40s/it, loss=0.328, lr=1e-5]
Steps:   0%|          | 2/5000 [01:00<42:12:33, 30.40s/it, loss=0.835, lr=1e-5]
Steps:   0%|          | 3/5000 [01:29<41:20:34, 29.78s/it, loss=0.835, lr=1e-5]
Steps:   0%|          | 3/5000 [01:29<41:20:34, 29.78s/it, loss=0.254, lr=1e-5]
Steps:   0%|          | 4/5000 [01:58<40:54:41, 29.48s/it, loss=0.254, lr=1e-5]
Steps:   0%|          | 4/5000 [01:58<40:54:41, 29.48s/it, loss=0.405, lr=1e-5]
Steps:   0%|          | 5/5000 [02:27<40:43:31, 29.35s/it, loss=0.405, lr=1e-5]
Steps:   0%|          | 5/5000 [02:27<40:43:31, 29.35s/it, loss=1.03, lr=1e-5] 
Steps:   0%|          | 6/5000 [02:53<39:12:51, 28.27s/it, loss=1.03, lr=1e-5]
Steps:   0%|          | 6/5000 [02:53<39:12:51, 28.27s/it, loss=0.574, lr=1e-5]
Steps:   0%|          | 7/5000 [03:20<38:17:51, 27.61s/it, loss=0.574, lr=1e-5]
Steps:   0%|          | 7/5000 [03:20<38:17:51, 27.61s/it, loss=0.29, lr=1e-5] 
Steps:   0%|          | 8/5000 [03:49<38:54:26, 28.06s/it, loss=0.29, lr=1e-5]
Steps:   0%|          | 8/5000 [03:49<38:54:26, 28.06s/it, loss=0.393, lr=1e-5]

With the original code:

model_pred = model_pred[:, : packed_noisy_model_input.size(1) :]
model_pred = Flux2Pipeline._unpack_latents_with_ids(model_pred, model_input_ids)

The training loss is:

Steps:   0%|          | 1/5000 [00:46<64:57:32, 46.78s/it]
Steps:   0%|          | 1/5000 [00:46<64:57:32, 46.78s/it, loss=2.01, lr=1e-5]
Steps:   0%|          | 2/5000 [01:15<50:29:04, 36.36s/it, loss=2.01, lr=1e-5]
Steps:   0%|          | 2/5000 [01:15<50:29:04, 36.36s/it, loss=2.08, lr=1e-5]
Steps:   0%|          | 3/5000 [01:47<47:31:01, 34.23s/it, loss=2.08, lr=1e-5]
Steps:   0%|          | 3/5000 [01:47<47:31:01, 34.23s/it, loss=1.83, lr=1e-5]
Steps:   0%|          | 4/5000 [02:18<45:54:39, 33.08s/it, loss=1.83, lr=1e-5]
Steps:   0%|          | 4/5000 [02:18<45:54:39, 33.08s/it, loss=1.99, lr=1e-5]
Steps:   0%|          | 5/5000 [02:47<43:39:23, 31.46s/it, loss=1.99, lr=1e-5]
Steps:   0%|          | 5/5000 [02:47<43:39:23, 31.46s/it, loss=2.02, lr=1e-5]
Steps:   0%|          | 6/5000 [03:16<42:28:13, 30.62s/it, loss=2.02, lr=1e-5]
Steps:   0%|          | 6/5000 [03:16<42:28:13, 30.62s/it, loss=2.01, lr=1e-5]
Steps:   0%|          | 7/5000 [03:42<40:32:24, 29.23s/it, loss=2.01, lr=1e-5]
Steps:   0%|          | 7/5000 [03:42<40:32:24, 29.23s/it, loss=1.83, lr=1e-5]
Steps:   0%|          | 8/5000 [04:12<40:37:29, 29.30s/it, loss=1.83, lr=1e-5]
Steps:   0%|          | 8/5000 [04:12<40:37:29, 29.30s/it, loss=1.92, lr=1e-5]
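
For reference, the trimming step from the two variants above can be sketched with stand-in tensors. The sizes are made up, and the sketch assumes that `noisy_seq_len` equals `packed_noisy_model_input.size(1)`, which is not confirmed by the snippets. Under that assumption both slice expressions select the same tokens from the prediction, while only the first variant also trims `model_input_ids` to the same length.

```python
import torch

# Hypothetical sizes for illustration only.
batch_size, noisy_seq_len, cond_seq_len, channels = 2, 6, 6, 8

# Stand-in for the transformer output over [noisy tokens | conditioning tokens].
model_pred = torch.randn(batch_size, noisy_seq_len + cond_seq_len, channels)
# Stand-in for the position ids covering the same joint sequence.
model_input_ids = torch.zeros(batch_size, noisy_seq_len + cond_seq_len, 4, dtype=torch.long)

# First variant: keep only the noisy-latent tokens in both tensors.
trimmed_pred = model_pred[:, :noisy_seq_len, :]
trimmed_ids = model_input_ids[:, :noisy_seq_len, :]

# Slice expression of the original variant: `[:, :n:]` is a slice on dim 1
# with start=None, stop=n, step=None, so it picks the same tokens.
original_style_pred = model_pred[:, :noisy_seq_len:]
same_tokens = torch.equal(original_style_pred, trimmed_pred)
```

After trimming, the prediction and the ids have the same sequence length, so they stay aligned for any later step that pairs each token with its id.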

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@leisuzz leisuzz changed the title from "Bugfix for dreambooth flux2 img2img2" to "Bugfix for flux2 img2img2 prediction" Dec 18, 2025
@leisuzz
Contributor Author

leisuzz commented Dec 18, 2025

@sayakpaul Please take a look at this PR. Thank you for your help!

@sayakpaul
Member

Do you have a reproducer?

@leisuzz
Contributor Author

leisuzz commented Dec 18, 2025

@sayakpaul I've updated the result in the description, thanks :)

2 participants