Hybrid Course Recommender System

Content-Based + Knowledge-Based Filtering with Penalty-Based Re-Ranking

A metadata-only course recommendation engine built on the Coursera Course Dataset 2023 — no user interaction history required.

Overview

With thousands of online courses on Coursera, learners struggle to find courses that match their skill goals, background, and preferences. This project addresses that with a Hybrid Course Recommender System whose core pipeline operates entirely on course-level metadata, so no historical user interaction data is needed. An optional collaborative-filtering layer can additionally use star ratings from logged-in users.

The system combines three techniques in a pipeline:

Content-Based Filtering (CBF) — TF-IDF vectorization of course text with cosine similarity scoring against the user's skill-interest query.
Knowledge-Based Filtering (KBF) — Hard constraint satisfaction on user-specified requirements (difficulty, certificate type, organization).
Penalty-Based Re-Ranking (novel contribution) — A multiplicative penalty function that demotes courses with poor ratings or very low enrollment.

Every recommendation is fully explainable: CBF score, hybrid score, penalty factor, and penalty reasons are all shown to the user.

Dataset

Property	Value
Name	Coursera Course Dataset 2023
Source	Kaggle — `tianyimasf/coursera-course-dataset`
Records	993 courses
Key Fields	`course_title`, `course_description`, `course_skills`, `course_difficulty`, `course_rating`, `course_students_enrolled`, `course_certificate_type`, `course_organization`
User Data	None (metadata-only)

System Architecture

1. Content-Based Filtering (TF-IDF + Cosine Similarity)

Each course is represented as a TF-IDF vector built from its title, description, and skill tags. The vectorizer uses:

Unigrams and bigrams (ngram_range=(1,2))
Sublinear TF scaling (tf replaced by 1 + log(tf))
Top 10,000 features with English stop-word removal

A user query (e.g., "machine learning, Python, data analysis") is vectorized in the same space, and cosine similarity produces a relevance score in [0, 1] for each course.
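The vectorizer settings above map directly onto scikit-learn's `TfidfVectorizer` parameters. The sketch below assumes that library (the README does not name it) and uses three made-up course strings in place of the real title + description + skills text; `build_cbf` and `cbf_scores` are illustrative names.

```python
# Minimal CBF sketch, assuming scikit-learn. The toy course texts are
# hypothetical stand-ins for title + description + skill tags.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_cbf(course_texts):
    """Fit a TF-IDF space over the course documents."""
    vectorizer = TfidfVectorizer(
        ngram_range=(1, 2),    # unigrams and bigrams
        sublinear_tf=True,     # tf -> 1 + log(tf)
        max_features=10_000,   # keep the 10,000 most frequent terms
        stop_words="english",  # drop common English stop words
    )
    course_matrix = vectorizer.fit_transform(course_texts)
    return vectorizer, course_matrix


def cbf_scores(vectorizer, course_matrix, query):
    """Cosine similarity between the query and every course (non-negative for TF-IDF)."""
    query_vec = vectorizer.transform([query])
    return cosine_similarity(query_vec, course_matrix).ravel()


courses = [
    "Machine Learning with Python. Build models in Python. python machine learning",
    "Introduction to Watercolor Painting. Learn brush techniques. painting art",
    "Data Analysis with Python and pandas. Analyze data. python data analysis",
]
vec, mat = build_cbf(courses)
scores = cbf_scores(vec, mat, "machine learning, Python, data analysis")
ranking = scores.argsort()[::-1]  # course indices, best match first
print([round(float(s), 3) for s in scores], ranking.tolist())
```

Because the query is transformed with the already-fitted vectorizer, it lives in exactly the same feature space as the courses; terms unseen at fit time are simply ignored.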

Structured Token Enrichment: The system automatically appends structured tokens (e.g., skill_python, level_beginner) to the query to prioritize categorical matches over fuzzy text.
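A hypothetical sketch of the enrichment step. The `skill_` and `level_` prefixes follow the README's examples; `enrich_query`, `to_token`, and the slug rules are illustrative assumptions, not the project's code.

```python
# Hypothetical sketch of structured token enrichment. Token prefixes follow
# the README's examples; function names and slug rules are assumptions.
import re


def to_token(prefix, value):
    """Turn e.g. 'Machine Learning' into 'skill_machine_learning'."""
    slug = re.sub(r"[^a-z0-9]+", "_", value.strip().lower()).strip("_")
    return f"{prefix}_{slug}"


def enrich_query(query, skills=(), difficulty=None):
    """Append categorical tokens so exact skill/level matches add TF-IDF overlap."""
    tokens = [to_token("skill", s) for s in skills if s.strip()]
    if difficulty:
        tokens.append(to_token("level", difficulty))
    return " ".join([query.strip(), *tokens]).strip()


print(enrich_query("machine learning python", skills=["Python"], difficulty="Beginner"))
# machine learning python skill_python level_beginner
```

With scikit-learn's default token pattern (`(?u)\b\w\w+\b`), the underscore counts as a word character, so `skill_python` stays a single token. Presumably the course documents carry matching tokens as well; otherwise these terms would never overlap.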

2. Knowledge-Based Filtering (Constraint Satisfaction)

Users specify hard constraints at query time:

Difficulty level (Beginner / Intermediate / Advanced / Mixed)
Certificate type
Preferred organization (optional)

Courses failing any constraint are excluded entirely before scoring — fully rule-based and explainable.
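A minimal sketch of the pre-filter, assuming each course is a dict keyed by the dataset's column names and that constraints are compared as case-insensitive exact matches. `Constraints`, `kbf_filter`, and the toy rows are hypothetical.

```python
# Hypothetical sketch of KBF as a hard pre-filter. Field names come from the
# dataset table; Constraints, kbf_filter, and the toy rows are illustrative.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Constraints:
    difficulty: Optional[str] = None        # e.g. "Beginner"
    certificate_type: Optional[str] = None  # user's required certificate type
    organization: Optional[str] = None      # optional preference


def _matches(value, required):
    """An unset constraint always passes; otherwise require a case-insensitive exact match."""
    return required is None or str(value).strip().lower() == required.strip().lower()


def kbf_filter(courses, constraints):
    """Drop every course that fails any constraint the user set."""
    return [
        course
        for course in courses
        if _matches(course["course_difficulty"], constraints.difficulty)
        and _matches(course["course_certificate_type"], constraints.certificate_type)
        and _matches(course["course_organization"], constraints.organization)
    ]


courses = [
    {"course_title": "Course A", "course_difficulty": "Beginner",
     "course_certificate_type": "Course", "course_organization": "Org X"},
    {"course_title": "Course B", "course_difficulty": "Advanced",
     "course_certificate_type": "Course", "course_organization": "Org X"},
    {"course_title": "Course C", "course_difficulty": "Beginner",
     "course_certificate_type": "Specialization", "course_organization": "Org Y"},
]
kept = kbf_filter(courses, Constraints(difficulty="beginner"))
print([c["course_title"] for c in kept])  # ['Course A', 'Course C']
```

Filtering before scoring means CBF similarity is only computed for courses that can actually be shown, and every exclusion traces back to one named rule.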

3. Hybrid Score Fusion

KBF acts as a hard pre-filter. The remaining courses are scored by combining several signals, with weights that adapt to the maturity of the user's profile:

Cold Start: Prioritizes CBF and Popularity signals.
Active User: Gradually shifts weight toward Collaborative Filtering (up to 50%) as the user provides more star ratings.
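One way to realize this schedule is sketched below. Only the 50% CF cap comes from the README; the 0.8/0.2 CBF-to-popularity split, the linear ramp over 10 ratings, and the function names are illustrative assumptions.

```python
# Hypothetical sketch of dynamic hybrid weighting. Only the 50% CF cap comes
# from the README; the 0.8/0.2 CBF-to-popularity split and the 10-rating ramp
# are illustrative assumptions.
def hybrid_weights(n_ratings, cf_cap=0.5, ramp=10, cbf_share=0.8):
    """Return (w_cbf, w_pop, w_cf); the three weights always sum to 1."""
    progress = max(0, min(n_ratings, ramp)) / ramp  # 0 at cold start, 1 once mature
    w_cf = cf_cap * progress
    rest = 1.0 - w_cf  # weight left for the metadata-based signals
    return rest * cbf_share, rest * (1.0 - cbf_share), w_cf


def hybrid_score(cbf, popularity, cf, n_ratings):
    """Weighted sum of the three signals, each assumed normalized to [0, 1]."""
    w_cbf, w_pop, w_cf = hybrid_weights(n_ratings)
    return w_cbf * cbf + w_pop * popularity + w_cf * cf


for n in (0, 5, 20):
    print(n, [round(w, 2) for w in hybrid_weights(n)])
# 0 [0.8, 0.2, 0.0]
# 5 [0.6, 0.15, 0.25]
# 20 [0.4, 0.1, 0.5]
```

Scaling CBF and popularity by the same `rest` keeps their relative balance fixed while CF takes over, so a cold-start user gets pure metadata ranking and no user ever exceeds the 50% CF cap.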

4. Penalty-Based Re-Ranking (Novel Contribution)

A multiplicative penalty is applied after hybrid scoring to prevent low-quality but highly similar content from ranking first. Each penalty is shown to the user with a plain-language explanation (e.g., "Rating (3.1) is below quality threshold (3.5)").
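A sketch of the re-ranking step follows. The 3.5 rating threshold and the reason wording come from the example above; the enrollment cut-off, both multipliers, and the assumption that rating and enrollment are already parsed into numbers are illustrative.

```python
# Hypothetical sketch of multiplicative penalty re-ranking. The 3.5 threshold
# and reason wording follow the README's example; the enrollment cut-off and
# the multipliers are assumptions. Inputs are assumed already numeric.
RATING_THRESHOLD = 3.5
MIN_ENROLLMENT = 1_000     # assumed cut-off for "very low enrollment"
RATING_PENALTY = 0.7       # assumed multiplier for a poor rating
ENROLLMENT_PENALTY = 0.8   # assumed multiplier for low enrollment


def penalty(rating, enrolled):
    """Return (factor, reasons); factor is the product of triggered multipliers."""
    factor, reasons = 1.0, []
    if rating is not None and rating < RATING_THRESHOLD:
        factor *= RATING_PENALTY
        reasons.append(f"Rating ({rating}) is below quality threshold ({RATING_THRESHOLD})")
    if enrolled is not None and enrolled < MIN_ENROLLMENT:
        factor *= ENROLLMENT_PENALTY
        reasons.append(f"Enrollment ({enrolled}) is below minimum ({MIN_ENROLLMENT})")
    return factor, reasons


def rerank(candidates):
    """Attach penalty details to each candidate and sort by the penalized score."""
    out = []
    for c in candidates:
        factor, reasons = penalty(c["rating"], c["enrolled"])
        out.append({**c, "penalty_factor": factor, "penalty_reasons": reasons,
                    "final": c["hybrid"] * factor})
    return sorted(out, key=lambda c: c["final"], reverse=True)


ranked = rerank([
    {"title": "Similar but weak", "hybrid": 0.9, "rating": 3.1, "enrolled": 200},
    {"title": "Solid match", "hybrid": 0.7, "rating": 4.6, "enrolled": 50_000},
])
print([(c["title"], round(c["final"], 3)) for c in ranked])
# [('Solid match', 0.7), ('Similar but weak', 0.504)]
```

Because the penalty multiplies rather than subtracts, a penalized course keeps its relative position among other penalized courses, and an unpenalized course is left untouched (factor 1.0).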

5. Collaborative Filtering Layer (Extended Feature)

When logged in, users can star/rate courses (1–5 stars). A tunable CF weight slider (0–0.5) controls how strongly community ratings influence the final ranking.
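The blend can be sketched as follows. The 1 to 5 star scale and the 0 to 0.5 slider range come from the README; the `{user: {course_id: stars}}` layout, the per-course mean as the community signal, and the linear blend are assumptions. Note that a per-course mean is the simplest community-rating signal, not a full user-user or item-item CF model.

```python
# Hypothetical sketch of the CF blend. Star scale and slider range follow the
# README; the rating layout, mean-rating signal, and linear blend are assumed.
from collections import defaultdict


def cf_scores(ratings):
    """ratings: {user: {course_id: stars}} -> {course_id: score in [0, 1]}."""
    totals, counts = defaultdict(float), defaultdict(int)
    for user_ratings in ratings.values():
        for course_id, stars in user_ratings.items():
            totals[course_id] += stars
            counts[course_id] += 1
    # Map each course's mean star rating from [1, 5] onto [0, 1].
    return {cid: (totals[cid] / counts[cid] - 1) / 4 for cid in totals}


def blend(score, cf, cf_weight):
    """Linear blend; the slider value is clamped to [0, 0.5]."""
    w = min(max(cf_weight, 0.0), 0.5)
    return (1 - w) * score + w * cf


ratings = {"alice": {"c1": 5, "c2": 2}, "bob": {"c1": 3}}
cf = cf_scores(ratings)  # {'c1': 0.75, 'c2': 0.25}
final = blend(0.6, cf.get("c1", 0.0), cf_weight=0.25)
```

Using `cf.get(course_id, 0.0)` for unrated courses is one design choice; it slightly favors courses the community has rated, so a neutral fallback such as 0.5 may suit some catalogs better.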

Getting Started

Prerequisites

pip install -r requirements.txt

Running the App

python app.py

Then open your browser at http://localhost:5000.

Basic Usage

Enter a skill-interest query (e.g., "machine learning python")
Optionally set hard constraints: difficulty level, certificate type, organization
Browse results — each card shows CBF score, hybrid score, penalty factor, and penalty reason

Project Structure

├── app.py                  # Main application entry point
├── recommender/
│   ├── cbf.py              # Content-Based Filtering (TF-IDF + cosine similarity)
│   ├── kbf.py              # Knowledge-Based Filtering (constraint satisfaction)
│   ├── hybrid.py           # Hybrid score fusion
│   ├── penalty.py          # Penalty-based re-ranking
│   └── collaborative.py    # Collaborative filtering layer
├── data/
│   └── coursera_courses.csv
├── evaluation/
│   └── metrics.py          # Precision, Recall, NDCG, ILD, Coverage
├── requirements.txt
└── README.md
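For orientation, here is a sketch of four of the metrics listed for `metrics.py`, using binary relevance. ILD is omitted because it needs a pairwise similarity function. These function names are hypothetical and not the contents of the project's evaluation code.

```python
# Hypothetical sketch of ranking metrics with binary relevance; not the
# project's metrics.py. `ranked` is a best-first list of course ids.
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for c in ranked[:k] if c in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant courses that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for c in ranked[:k] if c in relevant) / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """DCG of the top k divided by the DCG of an ideal ordering."""
    dcg = sum(1 / math.log2(i + 2) for i, c in enumerate(ranked[:k]) if c in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0


def catalog_coverage(recommendation_lists, catalog_size):
    """Share of the catalog that appears in at least one recommendation list."""
    seen = {c for recs in recommendation_lists for c in recs}
    return len(seen) / catalog_size


ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
print(precision_at_k(ranked, relevant, 3), recall_at_k(ranked, relevant, 3))
```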
