ShermeenZiauddin/HybridCourseRecommenderSystem

Hybrid Course Recommender System

Content-Based + Knowledge-Based Filtering with Penalty-Based Re-Ranking

A metadata-only course recommendation engine built on the Coursera Course Dataset 2023 — no user interaction history required.


Overview

With thousands of online courses on Coursera, learners struggle to find the ones that match their skill goals, background, and preferences. This project addresses that with a Hybrid Course Recommender System that operates entirely on course-level metadata and needs no historical user interaction data.

The system combines three techniques in a pipeline:

  1. Content-Based Filtering (CBF) — TF-IDF vectorization of course text with cosine similarity scoring against the user's skill-interest query.
  2. Knowledge-Based Filtering (KBF) — Hard constraint satisfaction on user-specified requirements (difficulty, certificate type, organization).
  3. Penalty-Based Re-Ranking (novel contribution) — A multiplicative penalty function that demotes courses with poor ratings or very low enrollment.

Every recommendation is fully explainable: CBF score, hybrid score, penalty factor, and penalty reasons are all shown to the user.


Dataset

Property      Value
Name          Coursera Course Dataset 2023
Source        Kaggle: tianyimasf/coursera-course-dataset
Records       993 courses
Key Fields    course_title, course_description, course_skills, course_difficulty, course_rating, course_students_enrolled, course_certificate_type, course_organization
User Data     None (metadata-only)

System Architecture

1. Content-Based Filtering (TF-IDF + Cosine Similarity)

Each course is represented as a TF-IDF vector built from its title, description, and skill tags. The vectorizer uses:

  • Unigrams and bigrams (ngram_range=(1,2))
  • Sublinear TF scaling (log(1+tf))
  • Top 10,000 features with English stop-word removal

A user query (e.g., "machine learning, Python, data analysis") is vectorized in the same space, and cosine similarity produces a relevance score in [0, 1] for each course.
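The scoring step described above can be sketched with scikit-learn. This is a minimal illustration, not the repository's actual code: it assumes scikit-learn is installed, and the helper names (build_vectorizer, score_courses) and the three sample course texts are made up for the example. Note that scikit-learn's sublinear_tf option applies 1 + log(tf), which differs slightly from the log(1+tf) form quoted in the list above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_vectorizer():
    """Vectorizer mirroring the settings listed in the README (hypothetical helper)."""
    return TfidfVectorizer(
        ngram_range=(1, 2),      # unigrams and bigrams
        sublinear_tf=True,       # scikit-learn uses 1 + log(tf)
        max_features=10_000,     # keep the top 10,000 features
        stop_words="english",    # English stop-word removal
    )


def score_courses(query, course_texts):
    """Return one cosine-similarity score per course for the given query."""
    vectorizer = build_vectorizer()
    course_matrix = vectorizer.fit_transform(course_texts)
    query_vector = vectorizer.transform([query])  # same vector space as the courses
    return cosine_similarity(query_vector, course_matrix).ravel()


# Made-up course texts standing in for title + description + skill tags.
courses = [
    "Machine Learning with Python: regression, classification, data analysis",
    "Introduction to Renaissance Art History",
    "Financial Accounting Fundamentals",
]
scores = score_courses("machine learning, Python, data analysis", courses)
```

Because TF-IDF weights are non-negative, the cosine similarity stays within [0, 1], which is why the scores can be used directly as relevance values.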

Structured Token Enrichment: The system automatically appends structured tokens (e.g., skill_python, level_beginner) to the query to prioritize categorical matches over fuzzy text.
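One way the enrichment step might look is sketched below. The function names (to_token, enrich_query) and the slug rules are assumptions for illustration; the README only gives the token shapes skill_python and level_beginner. For the tokens to have any effect, the same tokens would presumably also need to be present in the course text that is vectorized.

```python
import re


def to_token(prefix, value):
    """Build a structured token such as 'skill_python' (hypothetical format rules)."""
    slug = re.sub(r"[^a-z0-9]+", "_", value.lower()).strip("_")
    return f"{prefix}_{slug}"


def enrich_query(query, skills=(), level=None):
    """Append structured skill and level tokens to the free-text query."""
    tokens = [to_token("skill", skill) for skill in skills]
    if level:
        tokens.append(to_token("level", level))
    return " ".join([query.strip(), *tokens]).strip()


enriched = enrich_query("machine learning", skills=["Python"], level="Beginner")
```

Underscore-joined tokens survive common word tokenizers as a single term, so skill_python is matched as one categorical feature rather than being split into "skill" and "python".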

2. Knowledge-Based Filtering (Constraint Satisfaction)

Users specify hard constraints at query time:

  • Difficulty level (Beginner / Intermediate / Advanced / Mixed)
  • Certificate type
  • Preferred organization (optional)

Courses failing any constraint are excluded entirely before scoring — fully rule-based and explainable.
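The pre-filter can be illustrated with a few lines of plain Python. This is a sketch under assumptions: the function names and the sample catalog values are invented, and only the field names (course_difficulty, course_certificate_type, course_organization) come from the dataset description above. A constraint left as None is treated as "not specified".

```python
def satisfies_constraints(course, difficulty=None, certificate_type=None, organization=None):
    """True when the course matches every constraint the user actually set."""
    checks = {
        "course_difficulty": difficulty,
        "course_certificate_type": certificate_type,
        "course_organization": organization,
    }
    return all(
        wanted is None or str(course.get(field, "")).lower() == wanted.lower()
        for field, wanted in checks.items()
    )


def filter_courses(courses, **constraints):
    """Drop every course that fails at least one hard constraint."""
    return [c for c in courses if satisfies_constraints(c, **constraints)]


# Made-up catalog rows for illustration only.
catalog = [
    {"course_title": "Intro to Python", "course_difficulty": "Beginner",
     "course_certificate_type": "Course", "course_organization": "Example University"},
    {"course_title": "Deep Learning", "course_difficulty": "Advanced",
     "course_certificate_type": "Specialization", "course_organization": "Example Institute"},
]
eligible = filter_courses(catalog, difficulty="Beginner")
```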

3. Hybrid Score Fusion

KBF acts as a hard pre-filter, so only courses that satisfy every constraint are scored. The weights used to fuse the remaining signals then adjust to the maturity of the user's profile:

  • Cold Start: Prioritizes CBF and Popularity signals.
  • Active User: Gradually shifts weight toward Collaborative Filtering (up to 50%) as the user provides more star ratings.
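A possible shape for this weighting logic is sketched below. Only the 50% cap on the collaborative weight comes from the README; the linear ramp, the saturation point of 20 ratings, and the 80/20 split between CBF and popularity are assumptions chosen to make the example concrete.

```python
MAX_CF_WEIGHT = 0.5          # cap stated in the README
RATINGS_TO_SATURATE = 20     # assumed: ratings needed to reach the cap
CBF_SHARE = 0.8              # assumed: CBF share of the non-CF weight


def fusion_weights(num_ratings):
    """Weights for CBF, popularity, and CF given how many ratings the user has."""
    progress = min(max(num_ratings, 0) / RATINGS_TO_SATURATE, 1.0)
    cf = MAX_CF_WEIGHT * progress
    remaining = 1.0 - cf
    return {
        "cbf": remaining * CBF_SHARE,
        "popularity": remaining * (1.0 - CBF_SHARE),
        "cf": cf,
    }


def hybrid_score(cbf, popularity, cf, num_ratings):
    """Weighted sum of the three signals, each assumed to lie in [0, 1]."""
    w = fusion_weights(num_ratings)
    return w["cbf"] * cbf + w["popularity"] * popularity + w["cf"] * cf


cold_start = fusion_weights(0)    # no CF contribution yet
active_user = fusion_weights(40)  # CF weight capped at 0.5
```

The weights always sum to 1, so the hybrid score stays on the same [0, 1] scale regardless of how many ratings the user has provided.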

4. Penalty-Based Re-Ranking (Novel Contribution)

A multiplicative penalty is applied after hybrid scoring to prevent low-quality but highly similar content from ranking first. Each penalty is shown to the user with a plain-language explanation (e.g., "Rating (3.1) is below quality threshold (3.5)").
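The re-ranking step might be implemented along the following lines. The rating threshold of 3.5 is taken from the example message above; the enrollment threshold, both penalty factors, and all function names are assumptions, since the README does not state them.

```python
RATING_THRESHOLD = 3.5     # from the README's example message
MIN_ENROLLMENT = 1_000     # assumed threshold for "very low enrollment"
RATING_PENALTY = 0.8       # assumed multiplier
ENROLLMENT_PENALTY = 0.9   # assumed multiplier


def penalty_for(rating, enrolled):
    """Return the combined multiplicative penalty and its plain-language reasons."""
    factor, reasons = 1.0, []
    if rating < RATING_THRESHOLD:
        factor *= RATING_PENALTY
        reasons.append(f"Rating ({rating}) is below quality threshold ({RATING_THRESHOLD})")
    if enrolled < MIN_ENROLLMENT:
        factor *= ENROLLMENT_PENALTY
        reasons.append(f"Enrollment ({enrolled}) is below minimum ({MIN_ENROLLMENT})")
    return factor, reasons


def rerank(candidates):
    """Scale each hybrid score by its penalty factor and sort best-first."""
    ranked = []
    for course in candidates:
        factor, reasons = penalty_for(course["course_rating"], course["course_students_enrolled"])
        ranked.append({**course, "penalty_factor": factor, "penalty_reasons": reasons,
                       "final_score": course["hybrid_score"] * factor})
    return sorted(ranked, key=lambda c: c["final_score"], reverse=True)


# Made-up candidates: A is more similar but poorly rated, B is slightly less similar.
results = rerank([
    {"course_title": "A", "hybrid_score": 0.9, "course_rating": 3.1, "course_students_enrolled": 50_000},
    {"course_title": "B", "hybrid_score": 0.8, "course_rating": 4.6, "course_students_enrolled": 50_000},
])
```

With these assumed factors, course A drops from 0.9 to roughly 0.72 and falls behind course B, which keeps its score of 0.8.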

5. Collaborative Filtering Layer (Extended Feature)

When logged in, users can star/rate courses (1–5 stars). A tunable CF weight slider (0–0.5) controls how strongly community ratings influence the final ranking.
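How the slider value could enter the final ranking is sketched below. The slider range of 0 to 0.5 and the 1 to 5 star scale come from the README; the linear blend, the star normalization, and the function names are assumptions.

```python
def clamp_cf_weight(weight):
    """Keep the slider value inside the README's 0 to 0.5 range."""
    return min(max(weight, 0.0), 0.5)


def normalize_stars(avg_stars):
    """Map a 1-5 star average onto [0, 1] (assumed normalization)."""
    return (avg_stars - 1.0) / 4.0


def blend_with_community(hybrid_score, avg_stars, cf_weight):
    """Mix the hybrid score with the community rating; skip CF when nobody has rated."""
    if avg_stars is None:
        return hybrid_score
    w = clamp_cf_weight(cf_weight)
    return (1.0 - w) * hybrid_score + w * normalize_stars(avg_stars)


unrated = blend_with_community(0.6, None, 0.5)   # falls back to the hybrid score
boosted = blend_with_community(0.6, 5.0, 0.9)    # slider value is clamped to 0.5
```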


Getting Started

Prerequisites

pip install -r requirements.txt

Running the App

python app.py

Then open your browser at http://localhost:5000.

Basic Usage

  1. Enter a skill-interest query (e.g., "machine learning python")
  2. Optionally set hard constraints: difficulty level, certificate type, organization
  3. Browse results — each card shows CBF score, hybrid score, penalty factor, and penalty reason

Project Structure

├── app.py                  # Main application entry point
├── recommender/
│   ├── cbf.py              # Content-Based Filtering (TF-IDF + cosine similarity)
│   ├── kbf.py              # Knowledge-Based Filtering (constraint satisfaction)
│   ├── hybrid.py           # Hybrid score fusion
│   ├── penalty.py          # Penalty-based re-ranking
│   └── collaborative.py    # Collaborative filtering layer
├── data/
│   └── coursera_courses.csv
├── evaluation/
│   └── metrics.py          # Precision, Recall, NDCG, ILD, Coverage
├── requirements.txt
└── README.md
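As a rough idea of what evaluation/metrics.py might compute, the sketch below implements three of the listed metrics in their standard textbook forms. The function names and signatures are assumptions, not the repository's actual API.

```python
import math


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for item in recommended[:k] if item in relevant) / k


def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for item in recommended[:k] if item in relevant) / len(relevant)


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k graded relevances."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here relevances is the list of relevance grades in the order the system ranked the items, so a ranking that is already sorted best-first yields an NDCG of 1.0.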
