Sayan Deb Sarkar¹ · Sinisa Stekovic² · Vincent Lepetit² · Iro Armeni¹
Neural Information Processing Systems (NeurIPS) 2025
¹ Stanford University · ² ENPC, IP Paris
Transferring appearance to 3D assets using different representations of the appearance object—such as images or text—has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry of the input and appearance objects differs significantly. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results. Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two types of guidance: part-aware losses for appearance and self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively. We also show that traditional metrics are not suitable for evaluating this task, due to their inability to focus on local details and to compare dissimilar inputs in the absence of ground-truth data. We thus evaluate appearance transfer quality with a GPT-based system that objectively ranks outputs, ensuring robust and human-like assessment, as further confirmed by our user study. Beyond the showcased scenarios, our method is general and could be extended to different types of diffusion models and guidance functions.
Check out our Project Page for more examples and interactive demos!
- [2026-03] Reference code and interactive Viser demo released — see Installation & Usage below.
- [2025-09] 🎉🥳 GuideFlow3D accepted to NeurIPS 2025! See you in San Diego 🔥✨
Tested on Ubuntu 22.04.05 LTS, CUDA 12.8, PyTorch 2.7.1.
1. Clone the repo and install conda.
Install Miniconda or Anaconda so conda is on your PATH. Linux x86_64 one-liner (silent install to ~/miniconda3, then hook your shell):
```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
bash miniconda.sh -b -p "$HOME/miniconda3"
rm miniconda.sh
"$HOME/miniconda3/bin/conda" init
```

2. Create and activate the conda environment.

```bash
conda create -n guideflow3d python=3.11 -y
conda activate guideflow3d
```

3. Run `setup.sh`.
From the repo root, run `bash setup.sh`. This installs PyTorch and all dependencies, and builds compiled packages (flash-attention, nvdiffrast, etc.). The install is slow; `sudo` may be needed for system packages.
4. Install Blender 3.0.1 (Linux x64).
Multiview rendering uses the TRELLIS Blender script. Download and extract the 3.0.1 tarball, then point `BLENDER_HOME` at the binary:
```bash
export BLENDER_INSTALLATION_PATH="$HOME/Downloads"
export BLENDER_HOME="$BLENDER_INSTALLATION_PATH/blender-3.0.1-linux-x64/blender"
```

Add these to your `.bashrc`/`.zshrc` or export them in the same session before running the pipeline.
5. Download PartField weights.
Download the Objaverse pretrained model from PartField ("Pretrained Model" in their README) and place `model_objaverse.ckpt` in the `weights/` directory at the repository root (gitignored). The path must match `continue_ckpt` in `third_party/PartField/config.yaml`.
Troubleshooting
- If `conda activate` does not take effect after `setup.sh`, open a new terminal and run `conda activate guideflow3d`.
- If Blender is missing at runtime, verify `BLENDER_HOME` points to the extracted `blender-3.0.1-linux-x64/blender` binary. Set `BLENDER_INSTALLATION_PATH` if you did not use `~/Downloads`.
- If PartField inference fails, confirm `weights/model_objaverse.ckpt` exists and matches `third_party/PartField/config.yaml`.
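The environment checks above can be collected into a small preflight helper. This is a sketch, not part of the repository; the function name is ours, and the paths follow the installation steps above:

```shell
# Hypothetical preflight helper mirroring the troubleshooting checks:
# Blender binary reachable, PartField checkpoint in place.
preflight() {
  local ok=0
  # BLENDER_HOME must point to an executable Blender binary.
  if [ ! -x "${BLENDER_HOME:-}" ]; then
    echo "BLENDER_HOME does not point to an executable Blender binary" >&2
    ok=1
  fi
  # The PartField checkpoint must sit at the path expected by config.yaml.
  if [ ! -f "weights/model_objaverse.ckpt" ]; then
    echo "weights/model_objaverse.ckpt is missing" >&2
    ok=1
  fi
  return $ok
}
```

Run it from the repository root before launching the pipeline; a non-zero exit code means one of the prerequisites is missing.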
```
guideflow3d/
├── run.py                      # main CLI entry point
├── config/default.yaml         # default hyperparameters
├── gui/app.py                  # Viser interactive demo
├── bash/run.sh                 # example batch runs
├── lib/
│   ├── opt/                    # guidance optimization
│   │   ├── appearance.py       # appearance (part-aware) guidance
│   │   └── self_similarity.py  # self-similarity guidance
│   └── util/                   # pipeline utilities
│       ├── generation.py       # DINOv2 feature extraction, SLAT encoding/decoding
│       ├── render.py           # Blender multiview rendering
│       ├── pointcloud.py       # mesh voxelization
│       └── partfield.py        # PartField feature sampling & co-segmentation
├── third_party/                # vendored TRELLIS and PartField
├── examples/                   # sample meshes and images
└── weights/                    # PartField checkpoint (gitignored)
```
Work from the repository root with the `guideflow3d` env active. Steps 4–5 of Installation (Blender + PartField weights) are required before running anything below. We provide sample meshes and images under `examples/`.
To start the web-based interactive demo:
```bash
python gui/app.py
```

Open http://localhost:8080 in your browser (Viser). Outputs default to `outputs/gui_run_<id>/`; use Toggle Structure / Output to compare the output mesh against the input structure mesh.
`run.py` is the main command-line entry point. It runs the full pipeline end-to-end: Blender rendering → voxelization → PartField co-segmentation → (appearance only) DINOv2 feature extraction → SLAT encoding → guided flow sampling → SLAT decoding.
| Argument | Required | Description |
|---|---|---|
| `--guidance_mode` | Yes | `appearance` or `similarity` |
| `--structure_mesh` | Yes | Structure mesh (`.glb`) |
| `--output_dir` | Yes | Outputs, renders, checkpoints |
| `--convert_yup_to_zup` | No | Convert Y-up → Z-up |
| `--appearance_mesh` | Appearance mode | Appearance mesh (`.glb`) |
| `--appearance_image` | No* | Reference image |
| `--appearance_text` | No* | Text prompt (similarity mode) |

\* Similarity mode: pass `--appearance_text` or `--appearance_image`, not both. Appearance mode: `--appearance_mesh` is required; without `--appearance_image`, a reference image is rendered from the mesh.
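These argument rules can be sketched as a small shell helper (hypothetical; `run.py` performs the real validation internally). An empty string stands for a flag that was not passed:

```shell
# Hypothetical sketch of the argument rules above.
# Usage: validate_args <mode> <mesh> <image> <text>
validate_args() {
  local mode="$1" mesh="$2" image="$3" text="$4"
  case "$mode" in
    similarity)
      # Exactly one of --appearance_text / --appearance_image.
      if [ -n "$text" ] && [ -n "$image" ]; then
        echo "error: pass --appearance_text or --appearance_image, not both" >&2
        return 1
      fi
      if [ -z "$text" ] && [ -z "$image" ]; then
        echo "error: similarity mode needs --appearance_text or --appearance_image" >&2
        return 1
      fi
      ;;
    appearance)
      # --appearance_mesh is required; the image is optional
      # (rendered from the mesh if absent).
      if [ -z "$mesh" ]; then
        echo "error: appearance mode requires --appearance_mesh" >&2
        return 1
      fi
      ;;
    *)
      echo "error: --guidance_mode must be appearance or similarity" >&2
      return 1
      ;;
  esac
}
```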
```bash
python run.py --guidance_mode similarity \
    --structure_mesh examples/example1.glb \
    --output_dir outputs/my_run \
    --appearance_text "a wooden chair"
```

**Outputs.** Each run writes the following under `--output_dir`:
| File | Description |
|---|---|
| `out_app.glb` / `out_sim.glb` | Final textured mesh (appearance / similarity mode) |
| `out_gaussian_app.mp4` / `out_gaussian_sim.mp4` | Gaussian splatting turntable video |
| `struct_renders/`, `app_renders/` | Blender multiview renders |
| `voxels/` | Voxelized structure (and appearance) meshes |
| `features/`, `latents/` | DINOv2 features and SLAT latents (appearance mode) |
| `partfield/` | PartField feature planes |
Expected runtime: ~2–4 minutes on a single RTX 4090 (appearance mode, 300 steps). Similarity mode is faster since it skips DINOv2 extraction and SLAT encoding.
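For appearance mode, an invocation following the argument table above would look like the sketch below. The appearance mesh filename is hypothetical; substitute one of the meshes provided under `examples/`:

```shell
# Appearance mode: --appearance_mesh is required; without --appearance_image,
# a reference image is rendered from the mesh (filename below is illustrative).
python run.py --guidance_mode appearance \
    --structure_mesh examples/example1.glb \
    --appearance_mesh examples/appearance.glb \
    --output_dir outputs/my_appearance_run
```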
`bash/run.sh` shows a handful of example commands, running several `run.py` jobs on files in `examples/`.

```bash
bash bash/run.sh
```

- 🧊 TRELLIS — structured 3D latents, encoders, rendering.
- 🔧 PartField — part-aware 3D feature fields.
- 🎛️ SpaceControl — Viser GUI ideas.
```bibtex
@inproceedings{sdsarkar_guideflow3d_2025,
  author    = {Deb Sarkar, Sayan and Stekovic, Sinisa and Lepetit, Vincent and Armeni, Iro},
  title     = {GuideFlow3D: Optimization-Guided Rectified Flow For 3D Appearance Transfer},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2025},
}
```