gainsborourse/NYCU_DL-Final-Project

AdaFreeU: Adaptive FreeU Parameter Prediction for Text-to-Image Diffusion Models

[Figure: AdaFreeU overview]

We propose AdaFreeU, an adaptive extension of FreeU for text-to-image diffusion models such as Stable Diffusion (SD). FreeU improves generation at inference time by re-weighting U-Net backbone and skip features with two scaling factors, but its fixed parameters may not generalize across prompts, seeds, and visual styles. We address this limitation by predicting FreeU parameters adaptively, using a Gaussian policy, REINFORCE, or DPO. Experimental results show that AdaFreeU improves over both standard SD and default FreeU, with DPO achieving the highest mean ImageReward. DPO's mean ImageReward gain over SD is 2.1x that of default FreeU in constant mode and 8.0x in spatial mode.

Yi-Hsiang Ho*, Ting-Wei Chou*, Yi-Cheng Lai* — National Yang Ming Chiao Tung University (* Equal contribution)


Methodology

Gaussian Policy

[Figure: Gaussian policy pipeline]

Frozen CLIP encoders embed the prompt and a baseline SD image; their concatenated features feed a policy network that outputs a mean and log-variance over the 8 FreeU parameters (b and s for each of the four up-sampling layers). The policy is trained on (prompt, FreeU parameters, reward) samples with a reward-weighted Gaussian negative log-likelihood, which pulls the predicted distribution toward high-reward parameter choices. At inference, FreeU parameters are sampled directly from the predicted Gaussian.
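A minimal sketch of the reward-weighted Gaussian NLL for a single prompt, in plain Python. It assumes the weights are softmax(reward_delta / tau) over that prompt's candidates, mirroring the `--reward-temperature` option of `train policy`; the function names are hypothetical, and the repository's actual loss may differ in normalization or batching.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_nll(x, mean, log_var):
    """Negative log-likelihood of x under a diagonal Gaussian, summed over dimensions."""
    return sum(
        0.5 * (lv + (xi - mu) ** 2 / math.exp(lv) + math.log(2 * math.pi))
        for xi, mu, lv in zip(x, mean, log_var)
    )


def reward_weighted_nll(candidates, reward_deltas, mean, log_var, tau=0.5):
    """Hypothetical loss: candidates with a larger reward_delta get a larger weight.

    candidates: normalized FreeU parameter vectors tried for one prompt
    reward_deltas: ImageReward(candidate) - ImageReward(SD baseline), one per candidate
    mean, log_var: policy outputs for that prompt (same length as each candidate)
    """
    weights = softmax([r / tau for r in reward_deltas])
    return sum(w * gaussian_nll(x, mean, log_var) for w, x in zip(weights, candidates))


# Two 1-D candidates: the first improved ImageReward, the second hurt it.
candidates = [[1.0], [0.0]]
deltas = [1.0, -1.0]
near_good = reward_weighted_nll(candidates, deltas, mean=[1.0], log_var=[0.0])
near_bad = reward_weighted_nll(candidates, deltas, mean=[0.0], log_var=[0.0])
# A mean close to the high-reward candidate receives the lower loss.
```

Lowering tau sharpens the weights toward the single best candidate; raising it spreads credit across all candidates that beat the baseline.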

REINFORCE

[Figure: REINFORCE pipeline]

The same CLIP encoders and policy network produce a mean; an action is sampled around it, its log-probability is recorded, and the action is denormalized into FreeU parameters. SD generates an image with these parameters, ImageReward scores it, and that score minus the baseline image's reward forms the advantage. The policy is updated with the REINFORCE objective, reinforcing parameter choices that raise ImageReward above the baseline.
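The update can be sketched for one sample. This assumes a diagonal Gaussian policy with a fixed, hypothetical standard deviation `SIGMA` (the text only mentions a predicted mean), actions normalized to [-1, 1], and hypothetical per-parameter ranges for denormalization. The gradient is written analytically so the example runs without an autodiff library.

```python
import math
import random

SIGMA = 0.1  # hypothetical fixed exploration std in the normalized action space


def sample_action(mean, rng):
    """Sample a ~ N(mean, SIGMA^2) per dimension and return (action, log_prob)."""
    action = [rng.gauss(mu, SIGMA) for mu in mean]
    log_prob = sum(
        -0.5 * ((a - mu) / SIGMA) ** 2 - math.log(SIGMA * math.sqrt(2 * math.pi))
        for a, mu in zip(action, mean)
    )
    return action, log_prob


def denormalize(action, low, high):
    """Map each normalized value in [-1, 1] to its (hypothetical) [low, high] range."""
    return [lo + (a + 1.0) / 2.0 * (hi - lo) for a, lo, hi in zip(action, low, high)]


def reinforce_loss(log_prob, reward, baseline_reward):
    """REINFORCE with the SD image's ImageReward as baseline: -(R - R_b) * log pi(a)."""
    return -(reward - baseline_reward) * log_prob


def reinforce_grad_wrt_mean(action, mean, reward, baseline_reward):
    """Analytic d(loss)/d(mean) for the fixed-std Gaussian policy above."""
    advantage = reward - baseline_reward
    return [-advantage * (a - mu) / SIGMA ** 2 for a, mu in zip(action, mean)]


rng = random.Random(0)
mean = [0.0, 0.0]
action, log_prob = sample_action(mean, rng)
params = denormalize(action, low=[1.0, 0.2], high=[1.6, 1.0])  # e.g. one (b, s) pair
grad = reinforce_grad_wrt_mean(action, mean, reward=0.30, baseline_reward=0.10)
new_mean = [mu - 0.01 * g for mu, g in zip(mean, grad)]  # one gradient-descent step
```

With a positive advantage, the step moves the mean toward the sampled action; with a negative advantage, it moves away, which is how the baseline comparison steers the policy.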

DPO

[Figure: DPO pipeline]

The policy network outputs a base parameter vector, which is Gaussian-perturbed into two candidates. SD renders an image for each candidate, and ImageReward compares them to identify the preferred and non-preferred outputs. The policy is optimized with a DPO-style loss that pulls the base parameters toward the preferred candidate's parameters and away from the non-preferred one.
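The text does not spell out the loss, so the sketch below shows one plausible DPO-style instantiation, and it is an assumption throughout: the policy is treated as a Gaussian with a fixed std centered at the base parameters theta, a frozen copy theta_ref serves as the reference, and the standard DPO logistic loss is applied to the log-likelihood ratios of the preferred and rejected candidates. Function names and `beta`/`sigma` values are hypothetical.

```python
import math


def gaussian_log_prob(x, center, sigma):
    """log N(x; center, sigma^2 I)."""
    return sum(
        -0.5 * ((xi - ci) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
        for xi, ci in zip(x, center)
    )


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(theta, theta_ref, preferred, rejected, beta=1.0, sigma=0.1):
    """-log sigmoid(beta * margin), margin built from policy/reference log-ratios."""
    margin = (
        gaussian_log_prob(preferred, theta, sigma)
        - gaussian_log_prob(preferred, theta_ref, sigma)
    ) - (
        gaussian_log_prob(rejected, theta, sigma)
        - gaussian_log_prob(rejected, theta_ref, sigma)
    )
    return softplus(-beta * margin)  # -log sigmoid(x) == softplus(-x)


theta_ref = [0.0]  # frozen reference base parameters
preferred, rejected = [0.1], [-0.1]  # two perturbations, ranked by ImageReward
print(dpo_loss([0.0], theta_ref, preferred, rejected))   # log(2) ~= 0.693 at the reference
print(dpo_loss([0.05], theta_ref, preferred, rejected))  # ~0.313: moving toward "preferred" lowers it
```

Because both densities share the same sigma, the normalizing constants cancel and the margin depends only on squared distances, so minimizing the loss moves theta toward the preferred candidate and away from the rejected one.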

Experimental Setup & Dataset

Our experiments are conducted on Stable Diffusion 1.5, based on the latent diffusion framework, with a custom FreeU implementation that enables per-layer control over the four U-Net upsampling layers. We compare standard Stable Diffusion, default FreeU, and adaptive FreeU variants that predict FreeU parameters from the prompt and baseline SD image. Large-scale evaluation is performed on the MJHQ-30K dataset, using prompts and category metadata to assess performance across diverse image types. ImageReward-v1.0 is used as the primary evaluation metric, measuring prompt-image quality according to learned human preference.

Results

We evaluate each method on 5,000 MJHQ-30K prompts sampled uniformly across categories, with ImageReward as the main metric. In both modes, all three adaptive FreeU methods achieve a higher mean ImageReward than standard SD and default FreeU, and DPO achieves the highest mean ImageReward, 0.0764, under constant prediction.

Method        | Constant ImageReward | Spatial ImageReward
SD            | -0.0216 ± 0.0137     | -0.0216 ± 0.0137
Default FreeU |  0.0245 ± 0.0136     | -0.0155 ± 0.0138
Gaussian      |  0.0304 ± 0.0137     | -0.0088 ± 0.0138
REINFORCE     |  0.0435 ± 0.0137     |  0.0132 ± 0.0138
DPO           |  0.0764 ± 0.0136     |  0.0271 ± 0.0138
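The relative-gain figures quoted in the summary can be recomputed from these means. SD does not use FreeU, so its mean is shared across both columns. The sketch below subtracts SD's mean from each method's mean and divides DPO's gain by default FreeU's gain; since default FreeU's spatial gain is small (about 0.006), that ratio is sensitive to the four-decimal rounding in the table.

```python
# Mean ImageReward per method from the table: (constant mode, spatial mode).
MEANS = {
    "SD": (-0.0216, -0.0216),
    "Default FreeU": (0.0245, -0.0155),
    "Gaussian": (0.0304, -0.0088),
    "REINFORCE": (0.0435, 0.0132),
    "DPO": (0.0764, 0.0271),
}
MODES = ("constant", "spatial")


def gain_over_sd(method, mode):
    """Mean ImageReward improvement of `method` over plain SD in the given mode."""
    i = MODES.index(mode)
    return MEANS[method][i] - MEANS["SD"][i]


def gain_ratio(method, mode, reference="Default FreeU"):
    """How many times larger `method`'s gain over SD is than the reference's gain."""
    return gain_over_sd(method, mode) / gain_over_sd(reference, mode)


for mode in MODES:
    print(f"{mode}: DPO gain is {gain_ratio('DPO', mode):.1f}x the default FreeU gain")
# constant: DPO gain is 2.1x the default FreeU gain
# spatial: DPO gain is 8.0x the default FreeU gain
```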

A user study further shows that participants preferred the Gaussian adaptive method over both SD and default FreeU (share of 630 votes per mode):

Mode     | SD              | Default FreeU   | Gaussian
Constant | 30.0% (189/630) | 23.7% (149/630) | 46.3% (292/630)
Spatial  | 14.8% (93/630)  | 23.5% (148/630) | 61.7% (389/630)

References

  1. Si, C., Huang, Z., Jiang, Y., & Liu, Z. (2024). FreeU: Free lunch in diffusion U-Net. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  2. Xu, J., Liu, X., Wu, Y., Tong, Y., Li, Q., Ding, M., Tang, J., & Dong, Y. (2023). ImageReward: Learning and evaluating human preferences for text-to-image generation. In Advances in Neural Information Processing Systems (NeurIPS).
  3. Playground AI. (2024). MJHQ-30K benchmark [Data set]. Hugging Face.
  4. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  5. Rafailov, R., Sharma, A., Mitchell, E., Ermon, S., Manning, C. D., & Finn, C. (2023). Direct preference optimization: Your language model is secretly a reward model. In Advances in Neural Information Processing Systems (NeurIPS).

Code

Project Layout

src/
  cli.py                  # `uv run freeu` entrypoint
  adaptive/               # dataset build/upload, b/s predictor training, inference
  configs/adaptive/       # Adaptive FreeU dataset configs
  configs/sd15/base.json  # SD1.5 defaults and task directory
  configs/sd15/tasks/     # one experiment task per JSON file
  core/                   # FreeU algorithm and diffusion pipeline wrapper
  experiments/            # experiment runners: compare, sweep, seed search, ablation
  utils/                  # config loading, figure rendering, metadata IO
outputs/                  # generated images and metadata, ignored by git
docs/freeu/               # official FreeU reference repo copy
docs/                     # proposal/paper PDFs

outputs/ holds runtime output and can be deleted before reruns. src/configs/ holds the experiment configs, which are versioned and edited alongside the code.

Setup

uv sync
# Stable Diffusion model access may require: uv run hf auth login

FreeU Reproduction

Run configured tasks (use --dry-run to preview without loading Stable Diffusion, or --task <name> for a single task):

uv run freeu                                          # run all configured tasks
uv run freeu --dry-run
uv run freeu --task teddy_snowstorm_feature_maps --device cuda --dtype float16 --seed 12
uv run freeu --task teddy_snowstorm_feature_maps --freeu-mode spatial

Tasks live in src/configs/sd15/tasks/, with shared defaults in src/configs/sd15/base.json. FreeU parameters are per-layer:

{ "index": 0, "name": "up_block_0_lowest_resolution", "b": 1.5, "s": 0.9, "enabled": true }

This lets us compare single layers and combinations such as L1, L2, L3, L4, L1+L2, L2+L3, and L3+L4.
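A small sketch of how the single-layer and paired-layer variants could be derived from one list of per-layer entries. It assumes L1 to L4 correspond to indices 0 to 3 (so L1 is `up_block_0_lowest_resolution`); the `COMBOS` table, the helper names, and the placeholder names for layers 1 to 3 are hypothetical, not taken from the task files.

```python
import copy
import json

# Hypothetical mapping from ablation labels to layer indices (L1 = index 0).
COMBOS = {
    "L1": {0}, "L2": {1}, "L3": {2}, "L4": {3},
    "L1+L2": {0, 1}, "L2+L3": {1, 2}, "L3+L4": {2, 3},
}


def enable_only(layers, indices):
    """Copy per-layer FreeU entries, enabling only layers whose index is in `indices`."""
    variant = copy.deepcopy(layers)
    for layer in variant:
        layer["enabled"] = layer["index"] in indices
    return variant


def ablation_variants(layers):
    """One layer list per ablation label; b and s values are left unchanged."""
    return {label: enable_only(layers, idx) for label, idx in COMBOS.items()}


entry = json.loads(
    '{ "index": 0, "name": "up_block_0_lowest_resolution", "b": 1.5, "s": 0.9, "enabled": true }'
)
# Placeholder entries for layers 1-3; real names and b/s values come from the config.
layers = [dict(entry, index=i, name=f"layer_{i}") if i else entry for i in range(4)]
variants = ablation_variants(layers)
```

Keeping b and s fixed while toggling only `enabled` isolates each layer's contribution, which is what the single-layer versus paired-layer comparison needs.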

  • --freeu-mode constant applies fixed backbone scaling plus skip Fourier filtering.
  • --freeu-mode spatial uses the paper-style normalized feature map for backbone scaling.
  • FreeU is reimplemented directly rather than enabled through diffusers' enable_freeu().
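For reference, the two modes can be sketched with NumPy for a single up-sampling layer, with feature maps of shape (batch, channels, height, width). The sketch follows the style of the official FreeU code: it scales the first half of the backbone channels and scales a centered low-frequency box of the skip features in the Fourier domain (threshold 1). Whether this repository uses the same channel split, threshold, and normalization is an assumption, and the function names are hypothetical.

```python
import numpy as np


def fourier_filter(x, threshold, scale):
    """Multiply a centered low-frequency box of each (H, W) map by `scale`."""
    freq = np.fft.fftshift(np.fft.fft2(x, axes=(-2, -1)), axes=(-2, -1))
    height, width = x.shape[-2:]
    crow, ccol = height // 2, width // 2
    mask = np.ones((height, width))
    mask[crow - threshold:crow + threshold, ccol - threshold:ccol + threshold] = scale
    freq = freq * mask  # broadcasts over the batch and channel axes
    return np.fft.ifft2(np.fft.ifftshift(freq, axes=(-2, -1)), axes=(-2, -1)).real


def backbone_scale(h, b, mode="constant"):
    """Scale the first half of the backbone channels of h, shape (B, C, H, W)."""
    h = h.copy()
    half = h.shape[1] // 2
    if mode == "constant":
        factor = b  # one scalar for the whole feature map
    elif mode == "spatial":
        # Channel-mean map, min-max normalized per sample: strongly activated
        # locations are scaled by about b, weakly activated ones stay near 1.
        mean_map = h.mean(axis=1, keepdims=True)  # (B, 1, H, W)
        lo = mean_map.min(axis=(2, 3), keepdims=True)
        hi = mean_map.max(axis=(2, 3), keepdims=True)
        norm = (mean_map - lo) / (hi - lo + 1e-8)  # epsilon guards flat maps
        factor = (b - 1.0) * norm + 1.0
    else:
        raise ValueError(f"unknown mode: {mode}")
    h[:, :half] = h[:, :half] * factor
    return h


def freeu_layer(h, skip, b, s, mode="constant", threshold=1):
    """Apply FreeU to one up-sampling layer: scale backbone h, filter skip features."""
    return backbone_scale(h, b, mode), fourier_filter(skip, threshold, s)


rng = np.random.default_rng(0)
h = rng.standard_normal((1, 8, 16, 16))
skip = rng.standard_normal((1, 8, 16, 16))
h_out, skip_out = freeu_layer(h, skip, b=1.5, s=0.9, mode="spatial")
```

With b = 1 and s = 1 both functions return their input unchanged, which makes a convenient sanity check when comparing against SD without FreeU.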

Adaptive FreeU Pipeline

1. Build datasets — generates SD1.5 baselines, FreeU candidates, and ImageReward labels:

uv run freeu dataset build --config src/configs/adaptive/freeu_constant_sd15.json
uv run freeu dataset build --config src/configs/adaptive/freeu_spatial_sd15.json --output-dir outputs/datasets/freeu_spatial_sd15

# smoke test
uv run freeu dataset build --dry-run --max-prompts 2 --candidate-count 2

Optionally upload a dataset to Hugging Face (use upload-large-folder to preserve the dataset name in the repo):

uv run hf upload-large-folder gainsborouo/NYCU_DL-Final-Project outputs/datasets --repo-type dataset --include "freeu_constant_sd15/**"
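The record structure that step 1 produces can be sketched as follows: one SD baseline score per prompt and seed, several sampled FreeU candidates, and each candidate's reward delta against the baseline. The uniform sampling, the `B_RANGE`/`S_RANGE` bounds, the `CandidateRecord` fields, and the stand-in scorer are all hypothetical; the real builder renders images with SD1.5 and scores them with ImageReward.

```python
import random
from dataclasses import dataclass

# Hypothetical sampling ranges; the real ones come from the dataset config.
B_RANGE = (1.0, 1.6)
S_RANGE = (0.2, 1.0)
NUM_LAYERS = 4  # the four SD1.5 up-sampling layers


@dataclass
class CandidateRecord:
    prompt: str
    seed: int
    params: list  # one (b, s) pair per up-sampling layer
    reward: float  # ImageReward of the FreeU candidate image
    reward_delta: float  # reward minus the SD baseline's ImageReward


def sample_params(rng):
    """Draw one candidate: independent uniform b and s for each layer."""
    return [(rng.uniform(*B_RANGE), rng.uniform(*S_RANGE)) for _ in range(NUM_LAYERS)]


def build_records(prompt, seed, candidate_count, score_fn, rng):
    """score_fn(prompt, seed, params) -> ImageReward; params=None means plain SD."""
    baseline = score_fn(prompt, seed, None)
    records = []
    for _ in range(candidate_count):
        params = sample_params(rng)
        reward = score_fn(prompt, seed, params)
        records.append(CandidateRecord(prompt, seed, params, reward, reward - baseline))
    return records


# Stand-in scorer for illustration only.
def fake_score(prompt, seed, params):
    return 0.1 if params is None else 0.1 + 0.01 * sum(b for b, _ in params)


records = build_records("a red fox in snow", seed=42, candidate_count=2,
                        score_fn=fake_score, rng=random.Random(0))
```

Storing the delta rather than the raw reward makes candidates comparable across prompts whose baseline quality differs, which is what the reward-weighted training objectives consume.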

2. Train a policy — three interchangeable objectives, all using a frozen-CLIP + policy network and the same cached-feature pipeline (the first run builds a CLIP feature cache under the dataset directory):

# supervised b/s predictor (per-layer b/s for L1-L4, no enable flags)
uv run freeu train predictor --dataset outputs/datasets/freeu_constant_sd15 \
  --output-dir outputs/checkpoints/predictor_freeu_constant_sd15 --epochs 20 --batch-size 4

# reward-weighted Gaussian policy: p(b/s | prompt, SD baseline) via softmax(reward_delta / tau) weighting
uv run freeu train policy --dataset outputs/datasets/freeu_constant_sd15 \
  --output-dir outputs/checkpoints/freeu_constant_sd15_policy --epochs 20 --batch-size 16 --reward-temperature 0.5

# reward-weighted mixture-density policy (same pipeline, mixture of Gaussians)
uv run freeu train mdn-policy --dataset outputs/datasets/freeu_constant_sd15 \
  --output-dir outputs/checkpoints/freeu_constant_sd15_mdn_policy --epochs 20 --batch-size 16 \
  --mixture-count 4 --reward-temperature 0.5
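The mixture-density variant replaces the single Gaussian with K weighted components (`--mixture-count`). Below is a minimal sketch of the per-candidate mixture NLL, assuming diagonal components and mixing logits normalized with a log-softmax; the reward weighting would then be applied as in the single-Gaussian case. Function names are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_normal_diag(x, mean, log_var):
    """log N(x; mean, diag(exp(log_var)))."""
    return -0.5 * sum(
        lv + (xi - mu) ** 2 / math.exp(lv) + math.log(2 * math.pi)
        for xi, mu, lv in zip(x, mean, log_var)
    )


def mdn_nll(x, logits, means, log_vars):
    """NLL of x under a K-component diagonal Gaussian mixture.

    logits: K unnormalized mixing scores; means/log_vars: K vectors, each len(x).
    """
    norm = logsumexp(logits)
    component_terms = [
        (logit - norm) + log_normal_diag(x, mu, lv)  # log pi_k + log N_k(x)
        for logit, mu, lv in zip(logits, means, log_vars)
    ]
    return -logsumexp(component_terms)


x = [0.3, -0.2]
single = mdn_nll(x, [0.0], [[0.0, 0.0]], [[0.0, 0.0]])
# Two identical, equally weighted components give the same density as one.
duplicated = mdn_nll(x, [0.0, 0.0], [[0.0, 0.0]] * 2, [[0.0, 0.0]] * 2)
```

A mixture lets the policy keep several distinct good (b, s) regions for the same prompt, where a single Gaussian would average them into one possibly worse mean.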

3. Benchmark on MJHQ-30K: for each prompt, generate images with SD, SD+default FreeU, and SD+adaptive FreeU and score them with ImageReward. Add --compute-fid to also report FID against MJHQ real images; FID is a set-level metric, so it needs a realistic sample size rather than a smoke-test run:

# evaluate one or more checkpoints (predicted parameters used directly at inference)
uv run freeu benchmark mjhq --checkpoint outputs/checkpoints/predictor_freeu_constant_sd15 --checkpoint-name v1 \
  --checkpoint outputs/checkpoints/freeu_constant_sd15_policy --checkpoint-name v2 \
  --output-dir outputs/benchmarks/mjhq_constant_sd15_compare --max-samples 100 --split test --compute-fid

# compare fixed FreeU baselines without any checkpoint, with category-balanced prompt sampling
uv run freeu benchmark mjhq --output-dir outputs/benchmarks/mjhq_freeu_modes \
  --freeu-baseline-mode spatial --freeu-baseline-mode backbone_fourier --freeu-baseline-mode wavelet \
  --max-samples 100 --prompt-sample-strategy category_uniform --prompt-sample-seed 42

--freeu-baseline-mode selects a baseline preset and can be repeated (backbone_fourier and wavelet use mode-specific defaults). Results are written to the output directory as mjhq_results.csv, summary.json, and benchmark_comparison.svg.
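`--prompt-sample-strategy category_uniform` is described only by name. One plausible reading is round-robin sampling across MJHQ-30K categories after a seeded shuffle, sketched below; the algorithm and the function name are assumptions, not the repository's implementation.

```python
import random
from collections import defaultdict


def category_uniform_sample(items, max_samples, seed):
    """Pick up to max_samples (prompt, category) pairs, cycling through categories.

    items: iterable of (prompt, category) pairs, e.g. from MJHQ-30K metadata.
    Categories that run out are skipped, so the result stays as balanced as possible.
    """
    by_category = defaultdict(list)
    for prompt, category in items:
        by_category[category].append(prompt)
    rng = random.Random(seed)
    categories = sorted(by_category)
    for category in categories:
        rng.shuffle(by_category[category])  # seeded, so runs are reproducible

    picked = []
    while len(picked) < max_samples and any(by_category[c] for c in categories):
        for category in categories:
            if len(picked) == max_samples:
                break
            if by_category[category]:
                picked.append((by_category[category].pop(), category))
    return picked
```

Sorting the category names before shuffling keeps the result independent of the order in which the metadata lists categories, so the same `--prompt-sample-seed` always yields the same prompts.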

4. Predict & compare — predict b/s from a prompt + SD baseline image and render a default-vs-adaptive comparison:

uv run freeu predict --checkpoint outputs/checkpoints/predictor_freeu_constant_sd15 \
  --prompt "A red fox sitting in fresh snow, frosted pine forest background, close-up portrait, soft morning light, realistic wildlife photography, sharp focus" \
  --image outputs/datasets/freeu_constant_sd15/images/sample_00000_seed_42/baseline_sd.png --seed 42

About

NYCU Deep Learning final project (academic year 114, second semester)
