Distributed Recommendation Engine

A distributed content-based recommendation system built using Hadoop HDFS, Apache Spark, and Apache HBase.

The project demonstrates scalable data storage, processing, and recommendation generation using big data technologies on the Amazon Product Sales dataset.

Features

- Distributed data storage using Hadoop HDFS
- Data cleaning and transformation with Apache Spark
- Data persistence and retrieval via Apache HBase
- Analytical insights using Spark (pricing, ratings, category analysis)
- Content-based recommendation system using cosine similarity
- Feature engineering with:
  - StringIndexer
  - OneHotEncoder
  - VectorAssembler
- Spark ML pipeline for scalable preprocessing
- Recommendation storage in HBase for fast lookup
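
The feature-engineering and cosine-similarity steps listed above can be illustrated without a cluster. The following is a minimal plain-Python sketch, not the project's Spark code: it one-hot encodes a categorical field and appends a scaled numeric field (roughly the kind of vector that StringIndexer, OneHotEncoder, and VectorAssembler produce together), then ranks items by cosine similarity. The sample products, the field names `category` and `rating`, and the helpers `build_vectors` and `recommend` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical sample rows; the real dataset's columns may differ.
PRODUCTS = [
    {"id": "p1", "category": "cables", "rating": 4.2},
    {"id": "p2", "category": "cables", "rating": 4.0},
    {"id": "p3", "category": "headphones", "rating": 3.9},
    {"id": "p4", "category": "headphones", "rating": 4.5},
]


def build_vectors(products):
    """One-hot encode `category` and append the rating scaled to [0, 1]."""
    categories = sorted({p["category"] for p in products})
    index = {name: i for i, name in enumerate(categories)}  # label -> position
    vectors = {}
    for p in products:
        one_hot = [0.0] * len(categories)
        one_hot[index[p["category"]]] = 1.0
        vectors[p["id"]] = one_hot + [p["rating"] / 5.0]
    return vectors


def cosine_similarity(a, b):
    """cos(a, b) = a.b / (|a| |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(product_id, vectors, top_k=2):
    """Return the top_k most similar other products as (id, score) pairs."""
    target = vectors[product_id]
    scores = [
        (other_id, cosine_similarity(target, vec))
        for other_id, vec in vectors.items()
        if other_id != product_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


if __name__ == "__main__":
    vectors = build_vectors(PRODUCTS)
    for other_id, score in recommend("p1", vectors):
        print(other_id, round(score, 3))
```

Comparing every pair of items this way is quadratic in the number of products, so it only suits small samples; at dataset scale the same computation would need to be expressed as distributed Spark operations.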

Requirements

- Hadoop (HDFS)
- Apache Spark
- Apache HBase
- Python 3.9+
- HappyBase
- pandas
- matplotlib
- seaborn
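
HappyBase connects to HBase through the HBase Thrift server, so that service needs to be running alongside HBase. As a rough sketch of how recommendations might be stored for fast lookup, the example below encodes one product's ranked recommendations as a single HBase row. The table name `recommendations`, the column family `rec`, the `item_id|score` value format, and the host `localhost` are assumptions for illustration, not the project's actual schema.

```python
def to_hbase_row(product_id, recommendations, family="rec"):
    """Encode ranked (item_id, score) pairs as one HBase row.

    Columns are named <family>:<rank> with zero-padded ranks so they sort
    in rank order; each value is "item_id|score".
    """
    row_key = product_id.encode("utf-8")
    columns = {}
    for rank, (item_id, score) in enumerate(recommendations, start=1):
        qualifier = f"{family}:{rank:03d}".encode("utf-8")
        columns[qualifier] = f"{item_id}|{score:.4f}".encode("utf-8")
    return row_key, columns


def from_hbase_row(columns):
    """Decode a column dict back into ranked (item_id, score) pairs."""
    pairs = []
    for qualifier in sorted(columns):
        item_id, score = columns[qualifier].decode("utf-8").rsplit("|", 1)
        pairs.append((item_id, float(score)))
    return pairs


def store_recommendations(product_id, recommendations,
                          host="localhost", table_name="recommendations"):
    """Write one product's recommendations via the HBase Thrift server."""
    import happybase  # imported lazily so the helpers above work without it

    connection = happybase.Connection(host)  # default Thrift port is 9090
    try:
        row_key, columns = to_hbase_row(product_id, recommendations)
        connection.table(table_name).put(row_key, columns)
    finally:
        connection.close()
```

Keying the row by product ID means a lookup is a single `table.row(product_id)` call, whose result can be passed to `from_hbase_row`. The table and column family must already exist before `store_recommendations` is called.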
