This repository is the official implementation of DaLGM, a depth-aware extension of the Large Multi-View Gaussian Model (LGM) for feed-forward 3D object reconstruction. Given 9 input views, the model predicts a 3D Gaussian Splatting (3DGS) representation and renders high-quality novel views, while depth supervision and Gaussian pruning improve geometric fidelity and training efficiency.
The pipeline takes multi-view RGB images of an object as input, predicts a set of 3D Gaussians via a UNet, and renders novel views using a differentiable Gaussian rasterizer. Key extensions over the original LGM include:
- Adaptive input views — input views are sampled randomly from fixed azimuth bands during training, improving robustness
- Pixel-aligned Gaussians — each Gaussian is placed along a camera ray at a learned depth, giving better geometric grounding and direct depth map extraction
- Depth supervision — pixel-aligned depth is supervised against ground-truth depth maps using L1/L2/Huber/BerHu losses with depth-aware RANKING loss
- Gaussian pruning — voxel-grid clustering removes duplicate/low-opacity Gaussians before rendering
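As an illustration of one of the depth losses named above, the following is a minimal sketch of the BerHu (reverse Huber) loss in plain Python. It uses the common formulation with the threshold c set to 20% of the largest residual; the function name, the `c_ratio` parameter, and the flat-list inputs are assumptions for this example, and the loss used in this repository may differ in detail.

```python
def berhu_loss(pred, target, mask=None, c_ratio=0.2):
    """Mean BerHu (reverse Huber) loss over flat lists of depth values.

    Illustrative sketch only: c_ratio and the masking convention are
    assumptions, not values taken from this repository.
    """
    if mask is None:
        mask = [True] * len(pred)
    # Absolute residuals on valid pixels only.
    residuals = [abs(p - t) for p, t, m in zip(pred, target, mask) if m]
    if not residuals:
        return 0.0
    # Threshold between the L1 region and the quadratic region.
    c = c_ratio * max(residuals)
    if c == 0.0:
        return 0.0  # all residuals are zero
    total = 0.0
    for r in residuals:
        # L1 for small residuals, scaled L2 for large ones.
        total += r if r <= c else (r * r + c * c) / (2.0 * c)
    return total / len(residuals)


if __name__ == "__main__":
    print(berhu_loss([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

Unlike the Huber loss, BerHu is linear near zero and quadratic for large residuals, so large depth errors are penalized more strongly.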
Recommended setup for reproducibility:
- CUDA version: 13.0 (tested with CUDA 13.0; other versions may work but have not been verified)
- GPU type: NVIDIA RTX 5880 Ada
- Num GPUs: 2
- Min available space: 190 GB
Clone the repository:
git clone https://github.com/Lhhiep-maxcode/DaLGM.git
cd DaLGM
Create and activate a Conda environment:
conda create -n dalgm python=3.12 -y
conda activate dalgm
Install all dependencies (replace 13.0 with your CUDA version, e.g., 12.8). Currently, the installation script has been verified to work with CUDA 13.0.
bash setup.sh 13.0
This will install PyTorch, xFormers, diff-gaussian-rasterization, nvdiffrast, and all Python requirements, then download the pretrained checkpoint.
The training dataset is constructed from 10,000 objects sampled from Objaverse. We preprocess the original 3D assets into multi-view RGB images and corresponding depth maps following our rendering pipeline. After the dependency installation step above completes successfully, the dataset should follow the structure below:
dataset_root/
├── archive_001/
│   └── object_name/
│       ├── rgb/
│       │   ├── 000.png  (elev: 0, azim: 0)
│       │   ├── 001.png  (elev: 0, azim: 5.625)
│       │   ├── ...
│       │   ├── 063.png  (elev: 0, azim: 354.375)
│       │   └── 064.png  (elev: 90, azim: 180)
│       └── depth/
│           ├── 000.npz
│           └── ...
- 000–063: Side views with 0° elevation and azimuth angles uniformly sampled from 0° to 354.375° (step size: 5.625°).
- 064: Top-down view with 90° elevation and 180° azimuth.
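The naming convention above can be expressed as a small helper that maps a frame index to its camera angles. This is a sketch for sanity-checking a rendered dataset; the function name `view_angles` is hypothetical and not part of the repository.

```python
NUM_SIDE_VIEWS = 64
AZIMUTH_STEP_DEG = 360.0 / NUM_SIDE_VIEWS  # 5.625 degrees


def view_angles(index):
    """Return (elevation, azimuth) in degrees for a training frame index."""
    if 0 <= index < NUM_SIDE_VIEWS:
        # Frames 000-063: side views at 0 degrees elevation.
        return 0.0, index * AZIMUTH_STEP_DEG
    if index == NUM_SIDE_VIEWS:
        # Frame 064: the single top-down view.
        return 90.0, 180.0
    raise ValueError(f"frame index out of range: {index}")


if __name__ == "__main__":
    for i in (0, 1, 63, 64):
        print(f"{i:03d}.png ->", view_angles(i))
```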
Review the train.sh script and modify it if necessary. For reproducibility, you only need to update the following variables to match your local environment:
- data_path
- depth1_path
- wandb_project_name
- wandb_experiment_id (can be set to None)
- wandb_experiment_name
- wandb_key
Optional: to experiment with different thresholds for the proposed pruning algorithm or a different weight for the depth ranking loss, adjust alpha_threshold, distance_threshold, scale_threshold, rot_threshold, rgb_threshold, and lambda_depth_rank.
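To give a feel for what an opacity threshold combined with voxel-grid clustering does, here is a minimal sketch in plain Python. It is not the repository's pruning algorithm: it only drops Gaussians below an opacity threshold and keeps the most opaque Gaussian per voxel. The function name, the `voxel_size` parameter and its default are assumptions for illustration, and the distance, scale, rotation, and RGB criteria are not modeled.

```python
import math


def prune_gaussians(positions, opacities, voxel_size=0.05, alpha_threshold=0.004):
    """Return sorted indices of the Gaussians that survive pruning.

    positions: list of (x, y, z) tuples; opacities: list of floats.
    voxel_size is a hypothetical parameter chosen for this sketch.
    """
    best = {}  # voxel key -> index of the most opaque Gaussian seen so far
    for i, (pos, alpha) in enumerate(zip(positions, opacities)):
        if alpha < alpha_threshold:
            continue  # nearly transparent: contributes little to the render
        # Quantize the position to an integer voxel coordinate.
        key = tuple(math.floor(c / voxel_size) for c in pos)
        j = best.get(key)
        if j is None or alpha > opacities[j]:
            best[key] = i
    return sorted(best.values())


if __name__ == "__main__":
    positions = [(0.01, 0.01, 0.01), (0.02, 0.02, 0.02), (0.5, 0.5, 0.5), (0.9, 0.9, 0.9)]
    opacities = [0.5, 0.9, 0.001, 0.3]
    print(prune_gaussians(positions, opacities))
```

In this toy input, the first two Gaussians share a voxel, so only the more opaque one is kept, and the third is removed by the opacity threshold.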
Once the configuration is ready, start training with:
bash train.sh
Or manually:
accelerate launch --config_file accelerate_configs/gpu2.yaml main.py big \
--resume best_phase1/best_phase1_model.safetensors --fine_tune \
--workspace workspace \
--data_path 10k-dataset-9-views \
--depth1_path 10k-dataset-9-views \
--lambda_depth 0.5 --lambda_grad -1 --lambda_opacity -1 --lambda_depth_rank 0.3 --depth_loss_type l1 \
--num_workers 4 --batch_size 6 --mixed_precision fp16 --input_size 160 --splat_size 160 --pixel_align \
--output_size 512 --num_epochs 50 --train_size 0.8 --num_views_input 9 --num_views_output 9 \
--alpha_threshold 0.004 --distance_threshold -1 --scale_threshold -1 --rot_threshold -1 --rgb_threshold -1 \
--lr 1e-4 --gradient_accumulation_steps 4 --warmup_steps 2500 \
--wandb_project_name YOUR_PROJECT_NAME \
--wandb_experiment_id None \
--wandb_experiment_name YOUR_EXPERIMENT_NAME \
--wandb_key YOUR_WANDB_KEY \
> train.log 2>&1 &
We evaluate on two benchmarks: GSO and ABO. The pipeline has two evaluation levels.
pip install kaggle
kaggle datasets download memaybeo12/best-depthloss-depth-ranking-2 -p checkpoints --unzip
# RGB input views
kaggle datasets download laihoanghiep/100-gso-rgba-input -p data/gso/rgb --unzip
# Novel views
kaggle datasets download laihoanghiep/100-gso-16-views-for-eval -p data/gso/eval --unzip
# Ground-truth meshes
kaggle datasets download laihoanghiep/100-gso-mesh-gt -p data/gso/mesh_gt --unzip
# RGB input views
kaggle datasets download laihoanghiep/100-abo-rgb-input -p data/abo/rgb --unzip
# Novel views
kaggle datasets download laihoanghiep/100-abo-16-views-for-eval -p data/abo/eval --unzip
# Ground-truth meshes
kaggle datasets download laihoanghiep/100-abo-mesh-gt -p data/abo/mesh_gt --unzip
- The input dataset for evaluation (100-abo-rgb-input, 100-gso-rgb-input) follows the same convention as the training dataset.
- The 16-view dataset for evaluation (100-abo-16-views-for-eval, 100-gso-16-views-for-eval) follows this convention:
  - 000-007: Views with 30° elevation and azimuth angles uniformly sampled from 0° to 315° (step size: 45°)
  - 008-015: Views with 60° elevation and azimuth angles uniformly sampled from 0° to 315° (step size: 45°)
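The 16-view evaluation convention can be written as a short lookup, which may help when matching rendered novel views to ground-truth frames. The function name `eval_view_angles` is hypothetical and only illustrates the index layout described above.

```python
def eval_view_angles(index):
    """Return (elevation, azimuth) in degrees for an evaluation frame index."""
    if not 0 <= index < 16:
        raise ValueError(f"frame index out of range: {index}")
    # Frames 000-007 sit at 30 degrees elevation, 008-015 at 60 degrees.
    elevation = 30.0 if index < 8 else 60.0
    # Within each ring, azimuth advances in 45-degree steps.
    azimuth = (index % 8) * 45.0
    return elevation, azimuth


if __name__ == "__main__":
    for i in range(16):
        print(f"{i:03d} ->", eval_view_angles(i))
```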
Make sure you are using the dalgm environment:
conda activate dalgm
Convert the exported Gaussians to meshes, then compute geometric metrics. Replace <benchmark> with gso or abo depending on which benchmark you want to evaluate on.
# Convert Gaussians to meshes
python export_lgm_gaussians.py \
--config big \
--resume checkpoints/model.safetensors \
--fine-tune \
--data-path data/<benchmark>/rgb \
--eval-path data/<benchmark>/eval \
--outdir workspace/lgm_mesh_assets_<benchmark> \
--val-size 1 \
--input-size 160 \
--splat-size 160 \
--output-size 512 \
--num-views-input 9 \
--num-views-output 16 \
--pixel-align \
--batch-size 2 \
--num-workers 4 \
--mixed-precision fp16 \
--convert \
--nerf-iters 512 \
--mesh-iters 1024 \
--uv-iters 0
# Compute mesh metrics
python eval_lgm_mesh.py \
--data-path data/<benchmark>/rgb \
--eval-path data/<benchmark>/eval \
--mesh-path workspace/lgm_mesh_assets_<benchmark>/meshes \
--gt-mesh-path data/<benchmark>/mesh_gt \
--outdir workspace/lgm_mesh_eval_<benchmark> \
--val-size 1 \
--input-size 160 \
--splat-size 160 \
--output-size 512 \
--depth-render-size 512 \
--num-views-input 9 \
--num-views-output 16 \
--pixel-align \
--batch-size 1 \
--flip-uv-y
python 3Dreconstruct_infer.py big \
--resume checkpoints/best-depthloss-depth-ranking-2/model.safetensors --fine_tune \
--workspace output/ \
--pixel_align --input_size 160 --splat_size 160
Edit the path variable at the bottom of 3Dreconstruct_infer.py to point to your image folder. Expected folder layout: rgb/000.png, rgb/001.png, etc.
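Before running inference, it can be useful to confirm that the image folder matches the expected layout. The sketch below lists any missing frames; it assumes 9 input views numbered from 000, in line with the 9-view setup described earlier, and the helper name `missing_frames` is hypothetical. Which frames the inference script actually reads is not verified here.

```python
from pathlib import Path


def missing_frames(folder, num_views=9):
    """Return the expected rgb/NNN.png file names that are absent from folder.

    num_views=9 is an assumption based on the 9-view input setup.
    """
    rgb_dir = Path(folder) / "rgb"
    expected = [f"{i:03d}.png" for i in range(num_views)]
    return [name for name in expected if not (rgb_dir / name).is_file()]


if __name__ == "__main__":
    import sys

    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_frames(folder)
    print("missing frames:", missing if missing else "none")
```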
If you find this work useful in your research, please cite:
@article{dalgm2026,
title={DaLGM: Depth-Aware Geometry Supervision and Efficient Gaussian Pruning for Feed-Forward 3D Reconstruction},
author={Hoang Hiep Lai and Duy Thanh Tran and Thanh Long Vu and Thi Chau Ma},
journal={The Visual Computer},
year={2026},
doi={10.5281/zenodo.20615413},
note={Under review}
}