Skip to content

djthegr8/CS771-Project-2

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

12 Commits
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Continual Learning & Unsupervised Domain Adaptation

CS771 - Introduction to Machine Learning (Autumn 2024) - Mini-Project 2

Course Problem Dataset


🎯 Project Overview

This project addresses the continual learning and unsupervised domain adaptation challenges posed in CS771 Mini-Project 2. We tackle the problem of learning from sequential datasets while preventing catastrophic forgetting, using Learning with Prototypes (LwP) as the base classifier with novel prototype update mechanisms.

Problem Setting

  • 20 sequential datasets derived from CIFAR-10
  • Task 1: Datasets D₁-D₁₀ from the same distribution p(x)
  • Task 2: Datasets D₁₁-Dβ‚‚β‚€ from different but related distributions
  • Constraint: Only D₁ is labeled; all others are unlabeled
  • Goal: Maintain performance on previous datasets while adapting to new ones

Team Members

  • Akshat Sharma (230101)
  • Dweep Joshipura (230395)
  • Kanak Khandelwal (230520)
  • Praneel B Satare (230774)

πŸš€ Key Innovations & Technical Contributions

1. Weighted Prototype Update Rule (Task 1)

For sequential datasets from the same distribution, we developed a mathematically principled update mechanism:

$$\mu^{(n+1)}_c := \frac{N\alpha\mu^{(n)}_c + \sum\limits_{y^{(n+1)}_i = c}{x^{(n+1)}_i}}{N\alpha+n^{(n+1)}_c}$$

Key Characteristics:

  • Ξ± = 0.2 (optimally tuned): Balances old knowledge retention vs. new data adaptation
  • Prevents catastrophic forgetting: Maintains 98%+ accuracy across all previous datasets
  • Interpretable: Ξ± β†’ ∞ preserves old prototypes, Ξ± β†’ 0 uses only new data

2. Clustering-Based Domain Adaptation (Task 2)

For datasets with distribution shifts, we introduced an unsupervised adaptation method:

$$\mu^{(n+1)}_c := \frac{\beta\mu^{(n)}_c + M^{(n+1)}_c}{\beta+1}$$

Novel Approach:

  • Class-aware K-means: Initialize cluster centers with previous prototypes
  • Automatic adaptation: Clusters adjust to new data distributions
  • Balanced update: Ξ² = 1 equally weighs old prototypes and new centroids

πŸ”§ Architecture & Methodology

Feature Extraction Pipeline

Given the constraint of not using CIFAR-trained models, we explored ImageNet pre-trained extractors:

Model Feature Dim Accuracy on D₁ Selected
ResNet 2048 84.12% ❌
MobileNetv3 960 83.72% ❌
CaiT-M36 768 94.20% ❌
ViT-Base 768 96.52% ❌
Eva02-Base 768 96.88% ❌
BEiT-Large 1024 98.72% βœ…

Baseline Comparisons

Initial experiments without feature extraction showed the necessity of our approach:

Method Training Accuracy
LwP (Euclidean, Raw) 29.04%
LwP (Mahalanobis, Raw) 9.52%
LwP (Euclidean, PCA-50) 28.56%
LwP (Mahalanobis, PCA-50) 41.20%

πŸ“Š Experimental Results

Task 1: Sequential Learning (Same Distribution)

Performance Matrix: Models f₁ to f₁₀ on held-out datasets D̂₁ to D̂₁₀

Model D̂₁ DΜ‚β‚‚ D̂₃ DΜ‚β‚„ DΜ‚β‚… D̂₆ D̂₇ DΜ‚β‚ˆ D̂₉ D̂₁₀
f₁ 98.32% β€” β€” β€” β€” β€” β€” β€” β€” β€”
fβ‚‚ 98.36% 97.84% β€” β€” β€” β€” β€” β€” β€” β€”
f₃ 98.16% 97.76% 98.16% β€” β€” β€” β€” β€” β€” β€”
fβ‚„ 98.16% 97.76% 98.04% 97.92% β€” β€” β€” β€” β€” β€”
fβ‚… 98.20% 97.68% 97.96% 98.00% 97.92% β€” β€” β€” β€” β€”
f₆ 98.12% 97.84% 98.00% 97.92% 97.96% 98.40% β€” β€” β€” β€”
f₇ 98.16% 97.72% 97.92% 97.92% 98.00% 98.36% 97.40% β€” β€” β€”
fβ‚ˆ 98.00% 97.76% 97.80% 97.84% 97.88% 98.36% 97.36% 97.56% β€” β€”
f₉ 98.08% 97.72% 97.84% 97.96% 97.84% 98.28% 97.36% 97.60% 97.68% β€”
f₁₀ 98.04% 97.76% 97.88% 97.96% 97.84% 98.28% 97.36% 97.60% 97.64% 97.88%

Key Achievement: βœ… No catastrophic forgetting - consistent ~98% accuracy across all datasets

Task 2: Domain Adaptation (Distribution Shifts)

Performance Matrix: Models f₁₁ to fβ‚‚β‚€ on held-out datasets D̂₁₁ to DΜ‚β‚‚β‚€

Model D̂₁₁ D̂₁₂ D̂₁₃ D̂₁₄ D̂₁₅ D̂₁₆ D̂₁₇ DΜ‚β‚β‚ˆ D̂₁₉ DΜ‚β‚‚β‚€
f₁₁ 90.36% β€” β€” β€” β€” β€” β€” β€” β€” β€”
f₁₂ 90.36% 75.92% β€” β€” β€” β€” β€” β€” β€” β€”
f₁₃ 90.36% 75.92% 93.56% β€” β€” β€” β€” β€” β€” β€”
f₁₄ 90.36% 75.92% 93.56% 97.28% β€” β€” β€” β€” β€” β€”
f₁₅ 90.36% 75.92% 93.56% 97.28% 97.92% β€” β€” β€” β€” β€”
f₁₆ 90.36% 75.92% 93.56% 97.28% 97.92% 94.56% β€” β€” β€” β€”
f₁₇ 90.36% 75.92% 93.56% 97.28% 97.92% 94.56% 94.56% β€” β€” β€”
fβ‚β‚ˆ 90.36% 75.92% 93.56% 97.28% 97.92% 94.56% 94.56% 91.32% β€” β€”
f₁₉ 90.36% 75.92% 93.56% 97.28% 97.92% 94.56% 94.56% 91.32% 76.48% β€”
fβ‚‚β‚€ 90.36% 75.92% 93.56% 97.28% 97.92% 94.56% 94.56% 91.32% 76.48% 97.24%

Key Achievement: βœ… Successful domain adaptation - ~5% average improvement through clustering-based updates


πŸ› οΈ Implementation Details

Computational Requirements

  • Hardware: Kaggle P100 GPU
  • Feature Extraction Time: 2h 38m for all datasets
  • Memory: Efficient prototype storage (~10KB per model)

Hyperparameter Optimization

  • Ξ± tuning: Grid search over [0.1, 0.2, 0.3, ..., 2.0]
  • Optimal Ξ±: 0.2 (maximizes f₁₀ accuracy on D̂₁)
  • Ξ² selection: Set to 1.0 based on equal weighting heuristic

Key Constraints Addressed

βœ… No CIFAR-trained models: Used ImageNet pre-trained BEiT-Large
βœ… Same model size: Consistent prototype dimensions across updates
βœ… No labeled data: Only D₁ labels used; rest are pseudo-labels
βœ… LwP requirement: Base classifier remains Learning with Prototypes


πŸ“ Repository Structure

CS771-Project-2/
β”œβ”€β”€ notebooks/
β”‚   β”œβ”€β”€ task1_sequential_learning.ipynb    # Task 1 implementation
β”‚   β”œβ”€β”€ task2_domain_adaptation.ipynb      # Task 2 implementation
β”‚   └── feature_extraction.ipynb           # BEiT feature extraction                
β”œβ”€β”€ docs/
β”‚   β”œβ”€β”€ report.pdf                         # LaTeX project report
└── README.md

🎬 Paper Review

As required by the project, we presented a detailed review of the below paper in a YouTube video.

  • Deja Vu: Continual Model Generalization for Unseen Domains (ICLR 2023)

πŸ”¬ Technical Analysis

Why Our Approach Works

  1. Weighted Updates: The Ξ± parameter creates a principled balance between stability and plasticity
  2. Feature Quality: BEiT-Large provides rich, transferable representations
  3. Class-Aware Clustering: Initializing with prototypes maintains class structure
  4. Unsupervised Adaptation: No need for labeled data in new domains

Limitations & Future Work

  • Dataset D₁₇ Challenge: Poor cluster-class correlation affects performance
  • Ξ² Selection: Currently heuristic; could benefit from adaptive methods
  • Scalability: Limited to prototype-based methods per project constraints

πŸ“Š Key Metrics Summary

Metric Task 1 Task 2
Average Accuracy 97.8% 89.4%
Catastrophic Forgetting ❌ Prevented βœ… Minimal
Domain Adaptation N/A +5% improvement
Computational Efficiency βœ… Prototype-based βœ… K-means clustering

πŸ† Project Achievements

βœ… Successfully prevented catastrophic forgetting in sequential learning
βœ… Developed novel weighted prototype updates with theoretical foundation
βœ… Achieved effective domain adaptation without labeled target data
βœ… Maintained consistent model size across all updates
βœ… Comprehensive evaluation with detailed accuracy matrices
βœ… Efficient implementation suitable for resource-constrained environments


πŸ“š References & Citations

  1. Course: CS771 - Introduction to Machine Learning, IIT Kanpur, Autumn 2024
  2. Problem Statement: Mini-Project 2 - Continual Learning with LwP
  3. BEiT: Bao, H., Dong, L., Piao, S., & Wei, F. (2022). BEiT: BERT Pre-training of Image Transformers
  4. Domain Adaptation: Fernando, B., et al. (2014). Subspace alignment for domain adaptation
  5. Clustering Methods: Dridi, J., et al. (2024). Unsupervised clustering-based domain adaptation

πŸ“ž Contact

For questions about this implementation or the CS771 course project:


This project demonstrates practical solutions to continual learning challenges while adhering to the constraints and requirements of CS771 Mini-Project 2. The proposed methods show promise for real-world applications where models must adapt to new data distributions without forgetting previous knowledge.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors