Implementation of DeiT - Data-Efficient Image Transformer

Overview

This project implements and evaluates the Data-Efficient Image Transformer (DeiT), an improved version of the Vision Transformer (ViT) that boosts performance through knowledge distillation. The goal is to compare DeiT with ViT and analyze its effectiveness for image classification on smaller datasets.

Features

  • DeiT vs ViT: Comparative analysis of performance.
  • Distillation Token: Adds a learnable distillation token whose output is trained against a ResNet-50 teacher model.
  • Data Augmentation: Implements techniques like CutMix, MixUp, Horizontal Flip, and Random Erasing.
  • Performance Metrics: Evaluates models using Accuracy, AUC, F1 Score, Precision, and Recall.
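The distillation token above is trained with the hard-label distillation objective from the DeiT paper: the class token is supervised by the ground-truth labels, the distillation token by the teacher's argmax predictions, and the two cross-entropy terms are averaged. A minimal sketch (the function name and equal 0.5/0.5 weighting are illustrative, not taken from this repository's code):

```python
import torch
import torch.nn.functional as F

def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, labels):
    """Hard-label distillation as in DeiT: average of
    (a) cross-entropy between the class-token logits and the true labels, and
    (b) cross-entropy between the distillation-token logits and the
        teacher's predicted (argmax) labels."""
    ce_cls = F.cross_entropy(cls_logits, labels)
    teacher_labels = teacher_logits.argmax(dim=1)   # teacher's hard predictions
    ce_dist = F.cross_entropy(dist_logits, teacher_labels)
    return 0.5 * ce_cls + 0.5 * ce_dist
```

At inference time, DeiT averages the class-token and distillation-token predictions.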

Dataset

  • CIFAR-10: 50,000 training images and 10,000 test images (32x32 resolution).
  • Chosen for its clean, well-labeled structure and the availability of pre-trained models.

Installation

Prerequisites

Ensure you have Python 3.8+ installed. Install dependencies using:

pip install torch torchvision timm transformers numpy matplotlib

Training the Model

Default Training

Train DeiT on CIFAR-10 using:

python train.py --dataset cifar10 --epochs 20 --batch_size 64 --lr 0.001

Using Knowledge Distillation

To train with a ResNet-50 teacher model:

python train.py --dataset cifar10 --distillation --teacher_model resnet50

Evaluation

Evaluate the trained model:

python evaluate.py --model deit --dataset cifar10
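The metrics listed under Features (Accuracy, AUC, F1, Precision, Recall) can be computed from the model's per-class probabilities with scikit-learn. A sketch under the assumption of macro averaging and one-vs-rest AUC for the 10-class problem (the helper name is hypothetical):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, roc_auc_score, f1_score,
                             precision_score, recall_score)

def classification_metrics(y_true, y_prob):
    """Compute the evaluation metrics from true labels and per-class
    probabilities of shape [n_samples, n_classes]."""
    y_pred = y_prob.argmax(axis=1)  # hard predictions for label-based metrics
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_prob, multi_class="ovr"),
        "f1": f1_score(y_true, y_pred, average="macro"),
        "precision": precision_score(y_true, y_pred, average="macro"),
        "recall": recall_score(y_true, y_pred, average="macro"),
    }
```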

Results and Analysis

  • ViT vs DeiT: DeiT outperforms vanilla ViT on CIFAR-10.
  • Distillation Boost: Adding a teacher model improves F1 Score and AUC.
  • Data Augmentation: Enhances accuracy and reduces overfitting.

Challenges and Solutions

  • High validation loss: Addressed using data augmentation.
  • Slow CutMix and MixUp: Reduced their application probability to cut per-batch computation time.

Future Enhancements

  • Extend to object detection tasks.
  • Implement DeiT using TensorFlow/Keras.
  • Improve interpretability with attention visualizations.

Authors

  • Vignesh Ram Ramesh Kutti
  • Aravind Balaji Srinivasan

References

  1. Touvron, H., et al. (2020). "Training data-efficient image transformers & distillation through attention." arXiv:2012.12877
  2. Dosovitskiy, A., et al. (2020). "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale." arXiv:2010.11929
  3. Vaswani, A., et al. (2017). "Attention is all you need." arXiv:1706.03762
