nalexand/LTX-2-OPTIMIZED

LTX-2 Optimized (8GB of VRAM Edition) + Web UI

This repository contains a modified and optimized version of the LTX-2 Video Generation Model, designed specifically to run on consumer hardware with as little as 8GB of VRAM.

It includes a fully-featured Gradio Web Interface to make generating videos, managing presets, and applying LoRAs easy without needing to remember complex command-line arguments.

Web UI v2


Web UI v4


CinemaMaker UI


Music to Video UI (S2V, Lip Sync)

python music_maker_ui.py # distilled 2 step (fast)
python music_maker_ui_v2.py # 2 step (slow)

🚀 Features

  • 8GB VRAM Optimization: Runs locally on cards like the RTX 3070/4060Ti using FP8 quantization and memory management tweaks.
  • Windows 11 Support: Runs on Windows, which the original model does not support.
  • User-Friendly Web UI: Control everything from your browser.
  • Smart "Safe Mode": The UI automatically limits the frame count based on selected resolution to prevent Out-Of-Memory (OOM) errors. (If you do not have 8GB of free VRAM, try decreasing the frame count.)
  • Real-time Logging: View the generation progress and console output directly in the web interface.
  • Advanced Features:
    • Image Conditioning: Upload reference images.
    • LoRA Support: Checkbox selection for Camera Control.
    • Seed Control: Reproducible generations.

📥 Model Download & Setup

To run this, you need to download the specific FP8 distilled checkpoints and the Text Encoder.

1. Create a models directory in the root folder:

mkdir models
mkdir models/loras

2. Download the models:

./models/
    ltx-2-19b-distilled-fp8.safetensors	
    ltx-2-spatial-upscaler-x2-1.0.safetensors

./models/gemma3/
    gemma-3 files

./models/loras/
    LoRA files here
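Before launching the UI, it can help to verify the layout above is in place. The sketch below is an illustrative check, not part of the repository; the file names come from the directory listing above, and the gemma3 folder is only checked for existence since its exact contents depend on the download.

```python
# Sanity-check the expected model layout before launching the UI.
# File names match the directory listing above.
from pathlib import Path

EXPECTED_FILES = [
    "models/ltx-2-19b-distilled-fp8.safetensors",
    "models/ltx-2-spatial-upscaler-x2-1.0.safetensors",
]

def check_models(root: str = ".") -> list[str]:
    """Return the list of missing model paths (empty means all present)."""
    root_path = Path(root)
    missing = [p for p in EXPECTED_FILES if not (root_path / p).is_file()]
    if not (root_path / "models" / "gemma3").is_dir():
        missing.append("models/gemma3/")
    return missing

if __name__ == "__main__":
    missing = check_models()
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("All model files found.")
```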

3. Install all required modules:

# required modules
pip install -e packages/ltx-pipelines
pip install -e packages/ltx-core

Tested environment (pinned versions):

Python 3.12.8
accelerate==1.10.1
torch==2.8.0+cu128
torchaudio==2.8.0+cu128
torchvision==0.23.0+cu128
xformers==0.0.32.post2
...

🖥️ Usage

Run the web interface with a single command:

python web_ui_v2.py

or

python web_ui_v4.py

📊 Performance & Presets (8GB of VRAM)

  • The Web UI includes an "8GB VRAM Safe Mode" checkbox. When enabled, it enforces the following limits so you don't run out of VRAM. Estimated inference time on an RTX 3070 Ti laptop GPU is ~300 sec for all presets.
| Resolution  | Max Frames (i2v) | Max Frames (t2v) | Est. Time (3070 Ti laptop, 8GB VRAM) |
| :---------- | :--------------- | :--------------- | :----------------------------------- |
| 1280 x 704  | 177              | 257              | ~300–400 sec                         |
| 1536 x 1024 | 121              | 185              | ~300–400 sec                         |
| 1920 x 1088 | 81               | 121              | ~300–400 sec                         |
| 2560 x 1408 | 49               | 65               | ~300–400 sec                         |
| 3840 x 2176 | 17               | 25               | ~300–400 sec                         |
* Add ~60 sec for prompt processing (if the prompt is not empty and not cached)
* Time to the stage-1 preview: 80–150 sec
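The safe-mode limits can be expressed as a small lookup table. The function below is an illustrative sketch (not the UI's actual code) that clamps a requested frame count to the limits above; note that every limit in the table is of the form 8n + 1, so the sketch also snaps requests down to that grid.

```python
# Illustrative sketch of the "8GB VRAM Safe Mode" frame clamp, using the
# limits from the table above. Not the UI's actual implementation.
SAFE_LIMITS = {  # (width, height) -> {"i2v": max_frames, "t2v": max_frames}
    (1280, 704):  {"i2v": 177, "t2v": 257},
    (1536, 1024): {"i2v": 121, "t2v": 185},
    (1920, 1088): {"i2v": 81,  "t2v": 121},
    (2560, 1408): {"i2v": 49,  "t2v": 65},
    (3840, 2176): {"i2v": 17,  "t2v": 25},
}

def clamp_frames(width: int, height: int, frames: int, mode: str = "t2v") -> int:
    """Clamp a requested frame count to the 8GB safe limit for the resolution,
    then snap down to the 8n+1 grid the table values follow."""
    limit = SAFE_LIMITS[(width, height)][mode]
    frames = min(frames, limit)
    return ((frames - 1) // 8) * 8 + 1  # snap to 8n+1

print(clamp_frames(1280, 704, 300))          # 257 (t2v cap)
print(clamp_frames(1920, 1088, 300, "i2v"))  # 81  (i2v cap)
```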
  • UPD: Optimized the transformer code and increased the max frame count for text-to-video by 40%; generation time went from 300–315 sec to 385–415 sec (1280x704: 11 sec at 24 fps; 1920x1088: 5 sec at 24 fps).

  • UPD2: Added Web UI v4 with stage-1 video preview, task queue, prompt constructor, and a disable-audio option (10–30% faster inference).

  • UPD3: The "update" branch is synced with the original repo (21 Feb 2026 version), but it requires the old transformers 4.52 library and takes 20+ sec for prompt processing, so it has not been merged to main (and will not be until that is fixed). If you want to try it, just switch branches.

  • UPD4: On the "update_v2_3" branch you can run web_ui_v4.py with LTX-2.3. Download the safetensors file from: https://huggingface.co/nalexand/LTX-2.3-distilled-fp8-cast

Credits

  • Original Model: Lightricks (LTX-2)
  • Optimization: nalexand
  • Web UI: Created for the community to make this powerful model accessible.

Original Model:

  • (you can find links to all model files and LoRAs below)

LTX-2

Website | Model | Demo | Paper | Discord

LTX-2 is the first DiT-based audio-video foundation model that contains all core capabilities of modern video generation in one model: synchronized audio and video, high fidelity, multiple performance modes, production-ready outputs, API access, and open access.


🚀 Quick Start

# Clone the repository
git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2

# Set up the environment
uv sync --frozen
source .venv/bin/activate

Required Models

Download the following models from the LTX-2 HuggingFace repository:

LTX-2 Model Checkpoint (choose and download one of the following)

Spatial Upscaler - Required for current two-stage pipeline implementations in this repository

Temporal Upscaler - Supported by the model and will be required for future pipeline implementations

Distilled LoRA - Required for current two-stage pipeline implementations in this repository (except DistilledPipeline and ICLoraPipeline)

Gemma Text Encoder (download all assets from the repository)

LoRAs

Available Pipelines

⚡ Optimization Tips

  • Use DistilledPipeline - Fastest inference with only 8 predefined sigmas (8 steps stage 1, 4 steps stage 2)
  • Enable FP8 transformer - Lower memory footprint: --enable-fp8 (CLI) or fp8transformer=True (Python)
  • Install attention optimizations - Use xFormers (uv sync --extra xformers) or Flash Attention 3 for Hopper GPUs
  • Use gradient estimation - Reduce inference steps from 40 to 20-30 while maintaining quality (see pipeline documentation)
  • Skip memory cleanup - If you have sufficient VRAM, disable automatic memory cleanup between stages for faster processing
  • Choose single-stage pipeline - Use TI2VidOneStagePipeline for faster generation when high resolution isn't required
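The `--enable-fp8` flag mentioned above can be mirrored with a small argparse sketch. Only the flag name comes from the tips; `--steps` and the rest of the scaffolding are this example's own, not the repository's actual CLI:

```python
# Minimal argparse sketch mirroring the --enable-fp8 CLI flag from the tips
# above. The --steps option is illustrative (gradient estimation allows
# reducing steps from 40 to 20-30); this is not the repository's real CLI.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="LTX-2 inference (sketch)")
    parser.add_argument("--enable-fp8", action="store_true",
                        help="load the transformer in FP8 to cut VRAM use")
    parser.add_argument("--steps", type=int, default=40,
                        help="inference steps; gradient estimation allows 20-30")
    return parser

args = build_parser().parse_args(["--enable-fp8", "--steps", "24"])
print(args.enable_fp8, args.steps)  # True 24
```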

✍️ Prompting for LTX-2

When writing prompts, focus on detailed, chronological descriptions of actions and scenes. Include specific movements, appearances, camera angles, and environmental details - all in a single flowing paragraph. Start directly with the action, and keep descriptions literal and precise. Think like a cinematographer describing a shot list. Keep within 200 words. For best results, build your prompts using this structure:

  • Start with main action in a single sentence
  • Add specific details about movements and gestures
  • Describe character/object appearances precisely
  • Include background and environment details
  • Specify camera angles and movements
  • Describe lighting and colors
  • Note any changes or sudden events
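The structure above can be sketched as a small helper that assembles the fields into one flowing paragraph and enforces the 200-word budget. The field names are this example's own, not part of the LTX-2 API:

```python
# Illustrative prompt builder following the structure above: action first,
# one flowing paragraph, capped at 200 words. Field names are this example's
# own, not part of the LTX-2 API.
def build_prompt(action: str, details: str, appearance: str,
                 environment: str, camera: str, lighting: str,
                 events: str = "") -> str:
    parts = [action, details, appearance, environment, camera, lighting, events]
    paragraph = " ".join(p.strip() for p in parts if p.strip())
    words = paragraph.split()
    if len(words) > 200:  # keep within the recommended 200-word budget
        paragraph = " ".join(words[:200])
    return paragraph

prompt = build_prompt(
    action="A woman in a red coat crosses a rain-slicked street.",
    details="She clutches an umbrella that flips inside out in a gust of wind.",
    appearance="She has short dark hair and a navy scarf knotted at her neck.",
    environment="Neon signs reflect in the puddles; taxis pass behind her.",
    camera="Handheld tracking shot at eye level, slowly pushing in.",
    lighting="Cool blue twilight with warm neon highlights.",
)
print(prompt)
```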

For additional guidance on writing a prompt please refer to https://ltx.video/blog/how-to-prompt-for-ltx-2

Automatic Prompt Enhancement

LTX-2 pipelines support automatic prompt enhancement via an enhance_prompt parameter.

🔌 ComfyUI Integration

To use our model with ComfyUI, please follow the instructions at https://github.com/Lightricks/ComfyUI-LTXVideo/.

📦 Packages

This repository is organized as a monorepo with three main packages:

  • ltx-core - Core model implementation, inference stack, and utilities
  • ltx-pipelines - High-level pipeline implementations for text-to-video, image-to-video, and other generation modes
  • ltx-trainer - Training and fine-tuning tools for LoRA, full fine-tuning, and IC-LoRA

Each package has its own README and documentation. See the Documentation section below.

📚 Documentation

Each package includes comprehensive documentation in its own README.

About

LTX-2 audio–video generative model optimized for inference on 8GB of VRAM, plus a Web UI. Original model created by Lightricks.
