This repository contains a modified and optimized version of the LTX-2 Video Generation Model, designed specifically to run on consumer hardware with as little as 8GB of VRAM.
It includes a fully-featured Gradio Web Interface to make generating videos, managing presets, and applying LoRAs easy without needing to remember complex command-line arguments.
- Added start/last frame (optional)

```
python music_maker_ui.py     # distilled 2 step (fast)
python music_maker_ui_v2.py  # 2 step (slow)
```

- 8GB VRAM Optimization: Runs locally on cards like the RTX 3070/4060 Ti using FP8 quantization and memory-management tweaks (see the sketch after this list).
- Windows 11 support! You can run it on Windows, which the original model does not support.
- User-Friendly Web UI: Control everything from your browser.
- Smart "Safe Mode": The UI automatically limits the frame count based on selected resolution to prevent Out-Of-Memory (OOM) errors. (If you do not have 8GB of free VRAM, try decreasing the frame count.)
- Real-time Logging: View the generation progress and console output directly in the web interface.
- Advanced Features:
  - Image Conditioning: Upload reference images.
  - LoRA Support: Checkbox selection for Camera Control.
  - Seed Control: Reproducible generations.
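
The FP8 optimization above is essentially weight-only 8-bit storage: transformer weights live in `float8_e4m3fn` (half the memory of FP16) and are upcast only when a layer actually runs. Below is a minimal sketch of the idea, not this repository's actual code path (the distilled FP8 checkpoint already ships pre-cast weights, and production FP8 inference also tracks per-tensor scales):

```python
import torch
import torch.nn as nn

class FP8Linear(nn.Module):
    """Store Linear weights in FP8 and upcast to bfloat16 just before the matmul."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.register_buffer("weight_fp8", linear.weight.data.to(torch.float8_e4m3fn))
        self.register_buffer("bias", None if linear.bias is None
                             else linear.bias.data.to(torch.bfloat16))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight_fp8.to(torch.bfloat16)  # upcast on the fly; weights stay FP8 in VRAM
        return nn.functional.linear(x.to(torch.bfloat16), w, self.bias)

def cast_linears_to_fp8(model: nn.Module) -> nn.Module:
    """Swap every nn.Linear in a model for the FP8 wrapper above (recursive)."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, FP8Linear(child))
        else:
            cast_linears_to_fp8(child)
    return model
```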
To run this, you need to download the specific FP8 distilled checkpoints and the Text Encoder.
1. Create a models directory in the root folder:

```
mkdir models
mkdir models/loras
```

2. Download the models:
- ltx-2-19b-distilled-fp8.safetensors
- ltx-2-spatial-upscaler-x2-1.0.safetensors
- Gemma 3 text encoder

Expected layout:

```
./models/
    ltx-2-19b-distilled-fp8.safetensors
    ltx-2-spatial-upscaler-x2-1.0.safetensors
./models/gemma3/
    <gemma-3 files>
./models/loras/
    <LoRA files here>
```
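
If you prefer to script the downloads, `huggingface_hub` can place the files straight into this layout. The repo IDs and filenames below are assumptions; verify them on the actual HuggingFace download pages before running:

```python
from huggingface_hub import hf_hub_download, snapshot_download

# Assumed repo IDs / filenames -- check the real HuggingFace pages first.
hf_hub_download(repo_id="Lightricks/LTX-2",
                filename="ltx-2-19b-distilled-fp8.safetensors",
                local_dir="models")
hf_hub_download(repo_id="Lightricks/LTX-2",
                filename="ltx-2-spatial-upscaler-x2-1.0.safetensors",
                local_dir="models")
snapshot_download(repo_id="google/gemma-3-12b-it",  # Gemma 3 text encoder
                  local_dir="models/gemma3")
```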
3. Install the required modules:

```
pip install -e packages/ltx-pipelines
pip install -e packages/ltx-core
```
Tested environment:

```
Python 3.12.8
accelerate==1.10.1
torch==2.8.0+cu128
torchaudio==2.8.0+cu128
torchvision==0.23.0+cu128
xformers==0.0.32.post2
...
```
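
A quick sanity check that the installed PyTorch build sees your GPU and enough VRAM (plain PyTorch calls, nothing repo-specific):

```python
import torch

print(torch.__version__)          # expect something like 2.8.0+cu128
print(torch.cuda.is_available())  # must be True for GPU inference
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name, f"{props.total_memory / 2**30:.1f} GiB VRAM")  # ~8 GiB or more
```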
🖥️ Usage

Run the web interface with a single command:

```
python web_ui_v2.py
```

or

```
python web_ui_v4.py
```

📊 Performance & Presets (8GB VRAM)
- The Web UI includes an "8GB VRAM Safe Mode" checkbox. When enabled, it enforces the limits below so the GPU does not run out of memory. Estimated inference time on an RTX 3070 Ti laptop GPU is roughly 300 sec for all presets.
| Resolution  | Max Frames (i2v) | Max Frames (t2v) | Est. Time (RTX 3070 Ti laptop, 8 GB VRAM) |
| :---------- | :--------------- | :--------------- | :---------------------------------------- |
| 1280 x 704  | 177              | 257              | ~300-400 sec                               |
| 1536 x 1024 | 121              | 185              | ~300-400 sec                               |
| 1920 x 1088 | 81               | 121              | ~300-400 sec                               |
| 2560 x 1408 | 49               | 65               | ~300-400 sec                               |
| 3840 x 2176 | 17               | 25               | ~300-400 sec                               |
* Add ~60 sec for prompt encoding (if the prompt is not empty and not cached).
* The stage 1 preview appears after 80-150 sec.
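
To make the Safe Mode behaviour concrete, here is a sketch that mirrors the caps in the table above; the names `MAX_FRAMES` and `clamp_frames` are illustrative, not the web UI's actual code:

```python
# (width, height) -> (max frames for image-to-video, max frames for text-to-video)
MAX_FRAMES = {
    (1280, 704):  (177, 257),
    (1536, 1024): (121, 185),
    (1920, 1088): (81, 121),
    (2560, 1408): (49, 65),
    (3840, 2176): (17, 25),
}

def clamp_frames(width: int, height: int, frames: int, image_conditioned: bool) -> int:
    """Clamp the requested frame count to the 8GB-VRAM-safe limit for this resolution."""
    i2v_cap, t2v_cap = MAX_FRAMES[(width, height)]
    return min(frames, i2v_cap if image_conditioned else t2v_cap)
```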
- UPD: Optimized the transformer code and increased max frames by 40% for text-to-video; generation time 300-315 -> 385-415 sec (1280x704 at 11 sec / 24 fps, 1920x1088 at 5 sec / 24 fps).
- UPD2: Added web UI v4 with a stage 1 video preview, task queue, prompt constructor, and a disable-audio option (10-30% faster inference).
- UPD3: The "update" branch is synced with the original repo (21 Feb 2026), but it requires the older transformers 4.52 library and adds 20+ sec of prompt processing, so it has not been merged to main (and will not be until that is resolved). If you want to try it, just switch to that branch.
- UPD4: On the "update_v2_3" branch you can run web_ui_v4.py with LTX-2.3; download the safetensors file from https://huggingface.co/nalexand/LTX-2.3-distilled-fp8-cast
Credits
- Original Model: Lightricks (LTX-2)
- Optimization: nalexand
- Web UI: Created for the community to make this powerful model accessible.
Original Model:
- Links to all model files and LoRAs are listed below.
LTX-2 is the first DiT-based audio-video foundation model that contains all core capabilities of modern video generation in one model: synchronized audio and video, high fidelity, multiple performance modes, production-ready outputs, API access, and open access.
(Demo video: ltx-2.mp4)
```
# Clone the repository
git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2

# Set up the environment
uv sync --frozen
source .venv/bin/activate
```

Download the following models from the LTX-2 HuggingFace repository:
- LTX-2 Model Checkpoint (choose and download one of the following)
- Spatial Upscaler - required for the current two-stage pipeline implementations in this repository
- Temporal Upscaler - supported by the model and will be required for future pipeline implementations
- Distilled LoRA - required for the current two-stage pipeline implementations in this repository (except DistilledPipeline and ICLoraPipeline)
- Gemma Text Encoder (download all assets from the repository)
LoRAs:

- LTX-2-19b-IC-LoRA-Canny-Control
- LTX-2-19b-IC-LoRA-Depth-Control
- LTX-2-19b-IC-LoRA-Detailer
- LTX-2-19b-IC-LoRA-Pose-Control
- LTX-2-19b-LoRA-Camera-Control-Dolly-In
- LTX-2-19b-LoRA-Camera-Control-Dolly-Left
- LTX-2-19b-LoRA-Camera-Control-Dolly-Out
- LTX-2-19b-LoRA-Camera-Control-Dolly-Right
- LTX-2-19b-LoRA-Camera-Control-Jib-Down
- LTX-2-19b-LoRA-Camera-Control-Jib-Up
- LTX-2-19b-LoRA-Camera-Control-Static
- TI2VidTwoStagesPipeline - Production-quality text/image-to-video with 2x upsampling (recommended)
- TI2VidOneStagePipeline - Single-stage generation for quick prototyping
- DistilledPipeline - Fastest inference with 8 predefined sigmas
- ICLoraPipeline - Video-to-video and image-to-video transformations
- KeyframeInterpolationPipeline - Interpolate between keyframe images
- Use DistilledPipeline - Fastest inference with only 8 predefined sigmas (8 steps stage 1, 4 steps stage 2)
- Enable FP8 transformer - Lower memory footprint: `--enable-fp8` (CLI) or `fp8transformer=True` (Python)
- Install attention optimizations - Use xFormers (`uv sync --extra xformers`) or Flash Attention 3 for Hopper GPUs
- Use gradient estimation - Reduce inference steps from 40 to 20-30 while maintaining quality (see pipeline documentation)
- Skip memory cleanup - If you have sufficient VRAM, disable automatic memory cleanup between stages for faster processing
- Choose a single-stage pipeline - Use `TI2VidOneStagePipeline` for faster generation when high resolution isn't required
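
For reference, the "memory cleanup between stages" mentioned above usually amounts to something like the following generic PyTorch sketch (the pipelines' exact cleanup may differ); skipping it trades a little VRAM headroom for speed:

```python
import gc
import torch

def free_vram_between_stages() -> None:
    """Release cached CUDA memory so the next stage starts with maximum headroom."""
    gc.collect()                      # drop Python references to freed tensors
    if torch.cuda.is_available():
        torch.cuda.synchronize()      # make sure pending kernels are done
        torch.cuda.empty_cache()      # return cached blocks to the allocator/driver
```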
When writing prompts, focus on detailed, chronological descriptions of actions and scenes. Include specific movements, appearances, camera angles, and environmental details - all in a single flowing paragraph. Start directly with the action, and keep descriptions literal and precise. Think like a cinematographer describing a shot list. Keep within 200 words. For best results, build your prompts using this structure:
- Start with main action in a single sentence
- Add specific details about movements and gestures
- Describe character/object appearances precisely
- Include background and environment details
- Specify camera angles and movements
- Describe lighting and colors
- Note any changes or sudden events
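
If you want to script that structure, a small helper along these lines works; `build_prompt` is a hypothetical convenience function, not the web UI's built-in prompt constructor:

```python
def build_prompt(action: str, movements: str, appearance: str, environment: str,
                 camera: str, lighting: str, events: str = "") -> str:
    """Join the sections above into one flowing paragraph, skipping empty parts."""
    parts = [action, movements, appearance, environment, camera, lighting, events]
    return " ".join(p.strip().rstrip(".") + "." for p in parts if p.strip())

prompt = build_prompt(
    action="A woman in a red coat walks across a rain-soaked plaza",
    movements="she pulls her hood up and quickens her pace as thunder rolls",
    appearance="mid-30s, shoulder-length dark hair, carrying a leather satchel",
    environment="neon shop signs reflect in shallow puddles as the evening crowd thins out",
    camera="slow dolly-in from a low angle with shallow depth of field",
    lighting="cool blue ambient light with warm neon highlights",
)
```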
For additional guidance on writing prompts, please refer to https://ltx.video/blog/how-to-prompt-for-ltx-2
LTX-2 pipelines support automatic prompt enhancement via an enhance_prompt parameter.
To use our model with ComfyUI, please follow the instructions at https://github.com/Lightricks/ComfyUI-LTXVideo/.
This repository is organized as a monorepo with three main packages:
- ltx-core - Core model implementation, inference stack, and utilities
- ltx-pipelines - High-level pipeline implementations for text-to-video, image-to-video, and other generation modes
- ltx-trainer - Training and fine-tuning tools for LoRA, full fine-tuning, and IC-LoRA
Each package has its own README and documentation. See the Documentation section below.
Each package includes comprehensive documentation:
- LTX-Core README - Core model implementation, inference stack, and utilities
- LTX-Pipelines README - High-level pipeline implementations and usage guides
- LTX-Trainer README - Training and fine-tuning documentation with detailed guides