Video Frame Enhancer

A GAN-based latent diffusion architecture for enhancing low-bitrate, low-resolution video frames into high-fidelity, temporally stable video sequences.

🎯 Overview

alen-vfe combines the power of Latent Diffusion Models, Adversarial Training, and Temporal Frame Interpolation to deliver state-of-the-art video enhancement.

Key Features

  • 🚀 Fast Inference: few-step diffusion sampling with DDIM
  • 💾 Memory Efficient: LoRA fine-tuning keeps trainable parameters and VRAM usage low
  • 🎨 High Quality: Combined loss (MSE + LPIPS + Adversarial) ensures sharp, realistic results
  • 🎬 Temporal Stability: RIFE integration eliminates flickering

🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Input: Low-Res Video                     │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│              Generator (Stable Diffusion v1.5)              │
│                     + LoRA Fine-tuning                      │
│                  (few-step DDIM inference)                  │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                  Discriminator (PatchGAN)                   │
│              Evaluates realism of enhancements              │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                   Smoothing Layer (RIFE)                    │
│                Temporal Frame Interpolation                 │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                   Output: High-Res Video                    │
└─────────────────────────────────────────────────────────────┘

Components

  1. Generator: Lightweight Latent Diffusion Model (Stable Diffusion v1.5)

    • Fine-tuned with LoRA (Low-Rank Adaptation)
    • Optimized inference using DDIM
  2. Discriminator: Pre-trained PatchGAN

    • Evaluates high-frequency detail realism
    • Provides adversarial feedback during training
  3. Smoothing Layer: RIFE (Real-Time Intermediate Flow Estimation)

    • Optical flow-based frame interpolation
    • Ensures temporal consistency
    • Eliminates flickering between frames
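
The three components compose into a simple inference path. The sketch below is illustrative only: `generate` stands in for the LoRA-tuned diffusion generator and `interpolate` for RIFE (the real implementations live in `models/`), and the discriminator is omitted because it is only used for adversarial feedback during training.

```python
def enhance_sequence(frames, generate, interpolate):
    """Enhance each frame, then insert RIFE-style in-between frames.

    `generate` maps one low-res frame to a high-res frame;
    `interpolate` maps two consecutive enhanced frames to an
    intermediate frame, which doubles the output frame rate.
    """
    enhanced = [generate(f) for f in frames]
    smoothed = []
    for prev, nxt in zip(enhanced, enhanced[1:]):
        smoothed.append(prev)
        smoothed.append(interpolate(prev, nxt))  # in-between frame
    smoothed.append(enhanced[-1])  # keep the final frame
    return smoothed
```

With toy numeric "frames", `enhance_sequence([1, 2, 3], lambda f: f * 10, lambda a, b: (a + b) // 2)` illustrates the flow: three inputs become five outputs, matching a 2x frame-rate multiplier.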

📦 Installation

Prerequisites

  • Python 3.8+
  • CUDA 11.7+ (for NVIDIA GPUs) or an Apple Silicon Mac with MPS support
  • FFmpeg (for video processing)

Setup

# Clone the repository
git clone https://github.com/yourusername/alen-vfe.git
cd alen-vfe

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download RIFE pretrained model
python scripts/download_rife.py

🚀 Quick Start

Inference (Enhance a Video)

from inference.enhancer import VideoEnhancer
from omegaconf import OmegaConf

# Load configuration
config = OmegaConf.load("config/inference_config.yaml")

# Initialize enhancer
enhancer = VideoEnhancer(config)

# Enhance video
enhancer.enhance_video(
    input_path="input_video.mp4",
    output_path="enhanced_video.mp4",
    scale_factor=4
)

Command Line

python inference/enhance.py \
    --input input_video.mp4 \
    --output enhanced_video.mp4 \
    --checkpoint checkpoints/best_model.pth \
    --scale 4 \
    --enable-rife

🎓 Training

Dataset Preparation

We use the Vimeo-90K dataset for training:

# Download dataset
python data/download.py --dataset vimeo90k --output ./data

# Prepare training data
python data/prepare_dataset.py \
    --dataset vimeo90k \
    --downscale-factor 4 \
    --output ./data/processed
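
Conceptually, preparation builds LR-HR training pairs by downscaling each high-resolution frame by the chosen factor. A minimal nearest-neighbor sketch of that idea on a toy 2-D pixel grid (the actual `prepare_dataset.py` resampling kernel may differ):

```python
def downscale(frame, factor=4):
    """Nearest-neighbor downsample of a 2-D pixel grid by `factor`.

    `frame` is a list of rows; the real pipeline operates on image
    tensors, but the LR-HR pairing idea is the same.
    """
    return [row[::factor] for row in frame[::factor]]

hr = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 "frame"
lr = downscale(hr, factor=2)                            # 2x2 low-res pair
```

Each (lr, hr) pair then becomes one training sample: the generator sees `lr` and is supervised against `hr`.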

Training on Kaggle

  1. Upload the project to Kaggle
  2. Open notebooks/train_kaggle.ipynb
  3. Ensure GPU accelerator is enabled (T4 recommended)
  4. Run all cells

Training Locally

python training/train.py \
    --config config/training_config.yaml \
    --output-dir ./checkpoints

📊 Dataset

DIV2K

  • Size: ~7GB (perfect for a quick start!)
  • Images: 800 training + 100 validation
  • Resolution: Up to 2K high-quality images
  • Download: Official Link
  • Why: Much smaller than Vimeo-90K, faster downloads, great for testing

Vimeo-90K (Optional - for Production)

  • Size: ~82GB
  • Sequences: 89,800 triplets (3 frames each)
  • Resolution: 448×256
  • Download: Official Link
  • Why: Video-specific data, more data for production models
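
Vimeo-90K's 3-frame structure maps naturally onto training samples, e.g. with the middle frame as a temporal-consistency target for the two outer frames. A minimal sketch of grouping an ordered frame list into triplets (illustrative only; the dataset itself ships pre-grouped):

```python
def make_triplets(frame_paths):
    """Group an ordered list of frame paths into consecutive,
    non-overlapping 3-frame triplets, dropping any leftover
    frames at the end."""
    return [frame_paths[i:i + 3]
            for i in range(0, len(frame_paths) - 2, 3)]
```

For a 7-frame clip this yields two triplets and discards the trailing frame.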

🔧 Configuration

Training Configuration

Edit config/training_config.yaml:

model:
  generator:
    lora_rank: 8          # Higher = more capacity, more VRAM
    inference_steps: 4     # 1-4 steps for fast inference

training:
  batch_size: 8
  num_epochs: 100
  learning_rate:
    generator: 1.0e-5
    discriminator: 4.0e-4

loss:
  weights:
    mse: 1.0
    lpips: 0.5
    adversarial: 0.1
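
On the `lora_rank` trade-off: LoRA replaces a full update of a d_out x d_in weight with two rank-r factors, so the trainable-parameter count (and hence VRAM) grows linearly with the rank. A back-of-envelope helper, illustrative only:

```python
def lora_param_count(d_out, d_in, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight:
    factor A is (rank x d_in) and factor B is (d_out x rank)."""
    return rank * d_in + d_out * rank

# e.g. a 320x320 projection at rank 8 adds 5,120 trainable parameters,
# versus 102,400 for fully fine-tuning that weight.
```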

Inference Configuration

Edit config/inference_config.yaml:

enhancement:
  scale_factor: 4
  enable_rife: true
  target_fps_multiplier: 2

video:
  batch_size: 10
  output_codec: "libx264"
  output_crf: 18
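
The `output_codec` and `output_crf` settings map directly onto an FFmpeg encode of the enhanced frames. A sketch of building that command (the frame-naming pattern is a hypothetical example; see `inference/video_utils.py` for the actual writer):

```python
def ffmpeg_encode_cmd(frames_dir, output_path, fps=30,
                      codec="libx264", crf=18):
    """Assemble an FFmpeg command that encodes numbered PNG frames
    into a video; CRF 18 is visually near-lossless for x264."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", f"{frames_dir}/frame_%06d.png",  # hypothetical naming
        "-c:v", codec,
        "-crf", str(crf),
        "-pix_fmt", "yuv420p",  # broad player compatibility
        output_path,
    ]
```

Lower CRF means higher quality and larger files; 18-23 is the commonly recommended x264 range.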

📁 Project Structure

alen-vfe/
├── config/                 # Configuration files
│   ├── training_config.yaml
│   └── inference_config.yaml
├── data/                   # Dataset utilities
│   ├── __init__.py
│   ├── dataset.py
│   ├── download.py
│   └── prepare_dataset.py
├── models/                 # Model architectures
│   ├── __init__.py
│   ├── generator.py        # Stable Diffusion + LoRA
│   ├── discriminator.py    # PatchGAN
│   └── rife.py            # RIFE integration
├── training/               # Training infrastructure
│   ├── __init__.py
│   ├── losses.py
│   ├── trainer.py
│   └── utils.py
├── inference/              # Inference pipeline
│   ├── __init__.py
│   ├── enhancer.py
│   ├── enhance.py         # CLI script
│   └── video_utils.py
├── notebooks/              # Jupyter notebooks for experiments and training
├── runs/                   # TensorBoard event logs for training visualization
├── outputs/                # Enhanced video outputs and sample results
├── checkpoints/            # Model checkpoints (ignored by git)
├── dataset/                # Training datasets (ignored by git)
├── requirements.txt
└── README.md

📂 Folders

  • notebooks/: Contains Jupyter notebooks for exploratory data analysis, experimental training runs, and Kaggle-specific setup.
  • runs/: Stores TensorBoard event files. You can visualize training progress by running tensorboard --logdir runs/.
  • outputs/: This is where all enhanced videos, preview images, and test results are stored.
  • checkpoints/: Directory for saving model weights during training.
  • dataset/: Local storage for training data like DIV2K or Vimeo-90K.

🧪 Testing

# Run unit tests
pytest tests/

# Test inference pipeline
python tests/test_pipeline.py --checkpoint checkpoints/best_model.pth

# Benchmark performance
python tests/benchmark.py --device cuda

📝 Loss Functions

The model uses a combined loss function:

L_total = λ₁·L_MSE + λ₂·L_LPIPS + λ₃·L_ADV
  • L_MSE: Pixel-wise Mean Squared Error (structural accuracy)
  • L_LPIPS: Learned Perceptual Image Patch Similarity (perceptual quality)
  • L_ADV: Adversarial Loss (realism)

Default weights: λ₁=1.0, λ₂=0.5, λ₃=0.1
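
With scalar loss values, the combination above reduces to a weighted sum with the default weights from config/training_config.yaml. A minimal sketch (the real code in `training/losses.py` computes each term from tensors):

```python
def combined_loss(l_mse, l_lpips, l_adv,
                  w_mse=1.0, w_lpips=0.5, w_adv=0.1):
    """L_total = w1*L_MSE + w2*L_LPIPS + w3*L_ADV, with default
    weights matching config/training_config.yaml."""
    return w_mse * l_mse + w_lpips * l_lpips + w_adv * l_adv
```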

🧪 Experimental Status & Results

Important

This project is currently in an experimental phase.

  • Fine-tuning: We are experimenting with adapting a text-to-image model for video fine-tuning, which is a non-optimal approach and may produce unexpected results.
  • Resources: Due to limited training resources (GPU time and memory), the current model outputs may not yet reach production-grade quality.
  • Outputs: The latest experimental outputs are shared in the outputs/ folder for review.

Sample Result

Latest enhancement preview: see the outputs/ folder for full video results.


📄 License

MIT License - see LICENSE file for details

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

📧 Contact

For questions or issues, please open a GitHub issue.
