Course page for DS251 (Data Analytics and Visualization) and the lab course DS252, taught at IIT Bhilai, India, in the Winter Semester of 2026.
Course Instructor: Dr. Gagan Raj Gupta
The purpose of this course is to introduce students to algorithms and techniques for data analysis and visualization. We will emphasize working with real-world datasets and develop skills in visualizing and analyzing complex datasets, such as large network data.
Textbook: Data Analytics and Visualization, Gagan Raj Gupta and Naresh Nagwani, Prentice Hall of India, 2025. https://www.amazon.in/ANALYTICS-VISUALIZATION-Textbook-NARESH-NAGWANI/dp/9354439314
Grading Scheme
A. The lecture course (DS251) gives the most weight to exams (30, 50) and surprise quizzes (10). We will delve into the mathematics and algorithms of data analysis. I would like to give practice problems and bonus questions regularly to build critical thinking and problem-solving skills (10).
B. The lab course (DS252) will have regular programming and data analysis tests without internet access (50). Each student will get an individual project (40), in which they collect, clean, and analyse data and present the insights as a story. There will be regular practice problem sets and lab exercises (10).
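The weights above add up to 100 for each course. As a minimal sketch of how a final mark could be combined from them, assuming that "(30, 50)" refers to two separate exams and that each component is scored as a fraction between 0 and 1 (the component names and the sample scores are hypothetical, not part of the official scheme):

```python
# Component weights as listed in the grading scheme (each course sums to 100).
# The component names are hypothetical labels chosen for this sketch.
LECTURE_WEIGHTS = {
    "exam_1": 30,
    "exam_2": 50,
    "surprise_quizzes": 10,
    "practice_and_bonus": 10,
}
LAB_WEIGHTS = {
    "lab_tests": 50,
    "project": 40,
    "practice_and_exercises": 10,
}


def weighted_total(scores, weights):
    """Combine per-component scores (fractions in [0, 1]) into a mark out of 100."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same components")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical student: these fractions are made-up illustration values.
lecture_scores = {
    "exam_1": 0.8,
    "exam_2": 0.7,
    "surprise_quizzes": 0.9,
    "practice_and_bonus": 1.0,
}
print(weighted_total(lecture_scores, LECTURE_WEIGHTS))
```

The actual conversion from marks to grades is decided by the instructor; this only illustrates the weighting arithmetic.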
- Useful insights can be obtained from data that can help people: agriculture, healthcare, industry, governments, science
- Getting data is becoming easier day by day, but the data is often complex and difficult for people to understand
- Data has errors of various types (missing, incorrect, etc.), is incomplete, and is hard to clean (e.g. user reviews/ratings, distorted images)
- Data usually has complex correlations, and i.i.d. assumptions often do not hold (e.g. graph data, time-series data)
- Data visualization is critical for engaging a more diverse audience in the process of analytical thinking
In this course, we want to learn how that is being done and solve real-life problems that interest us.
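To make the point about missing and incorrect values concrete, here is a minimal sketch in plain Python. It assumes a 1-5 rating scale and repairs bad entries with the median of the valid ones. The data, the `clean_ratings` helper, and the choice of median imputation are all illustrative assumptions, not a method prescribed by the course.

```python
from statistics import median

# Made-up user ratings on a 1-5 scale; None marks a missing value, and
# 11 and -2 stand in for incorrectly recorded entries.
raw_ratings = [4, None, 5, 11, 3, -2, 4, None, 2]


def clean_ratings(ratings, low=1, high=5):
    """Replace missing or out-of-range ratings with the median of the valid ones.

    Returns the cleaned list and the number of entries that were repaired.
    """
    def is_valid(r):
        return r is not None and low <= r <= high

    valid = [r for r in ratings if is_valid(r)]
    if not valid:
        raise ValueError("no valid ratings to impute from")
    fill = median(valid)
    cleaned = [r if is_valid(r) else fill for r in ratings]
    return cleaned, len(ratings) - len(valid)


cleaned, n_repaired = clean_ratings(raw_ratings)
print(cleaned, n_repaired)
```

Median imputation is only one simple option. Dropping bad rows or modeling the missing values may be more appropriate depending on the dataset.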
- Attend classes regularly and take notes.
- Review your notes regularly (at least weekly) and think of possible applications and questions
- Ask questions in class and present your viewpoints (seek clarifications from the instructor/TAs)
- Practice solving real-world problems
- Motivate and demonstrate the benefits and uses of data science
- Impart the skills needed by a data scientist: acquire, clean, model, visualize data
- Teach fundamental algorithms for handling basic and complex datasets including recommendation, basket analysis, streaming algorithms
- Study techniques for creating effective visualizations based on principles from graphic design, perceptual psychology, and cognitive science
- Teach basic techniques of (unsupervised) machine learning, which is an important way to model relationships in data
- Provide hands-on experience to students in analyzing datasets in diverse fields (NLP, Image/Video, Graphs, Networks, Bio-informatics, Finance)
- Basic knowledge of Python (most assignments will be based on Python)
- Knowledge of basic computer science principles and skills
- Math
  - Linear Algebra (Matrix factorization, Eigenvalues, Column and row spaces, Norms)
  - Probability theory (Conditional, Bayes Rule, Concentration Inequalities, Distributions, Gaussian, Multivariate)
  - Basic Data Structures, Algorithms and Asymptotic Analysis (graphs, heaps, lists, dynamic programming)
  - Calculus (Multivariate)
- Web Technologies
  - HTML
  - JS
  - CSS
If you don't meet one or more pre-requisites, be prepared to spend more time before or during the course in learning them.
- DAV: Data Analytics and Visualization, Gagan Raj Gupta and Naresh Nagwani, Prentice Hall of India, 2025
- MML: Mathematics of Machine Learning
- LFD: Linear Algebra and Learning from Data, Gilbert Strang
- IVB: Interactive Data Visualization for the Web, Scott Murray
- ISL: Introduction to Statistical Learning
- PML: Probabilistic Machine Learning, Kevin P. Murphy
- DSC: Data Science from Scratch, Joel Grus
- GA: Graph Algorithms: Practical Examples in Apache Spark and Neo4j, Mark Needham, Amy Hodler
- Introduction to the data science workflow
- Data Collection and Exploratory Analysis: Automated methods for data collection, Data and Visualization Models, Data wrangling and cleaning, Exploratory data analysis
- Building Models for: Classification, Clustering, Regression
- Model evaluation: Statistical tests for significance of predictors
- Time-series Analysis: Characteristics, Regression, Exploratory data analysis, ARIMA Models
- Visualization Design: Introduction, Abstractions, Validation, Marks and Channels
- Visualization of Different Data Types: Tabular Data, Multidimensional Data, Spatial Data, Graphs, Text Data
- Assorted Topics: Graphical Perception, Interaction dynamics for Visual Analysis, Using Space Effectively, Stacked Graphs, Geometry & Aesthetics
Google drive (for IIT Bhilai students): GDrive
| # | Week | Topics planned in this week | Textbook Reference |
|---|---|---|---|
| 1 | Dec 31 | Data Collection, Data Exploration | Ch 1, 2 |
| 2 | Jan 6 | Linear Algebra and Regression and Classification | Ch 3 |
| 3 | Jan 13 | Probability, Statistics and Optimization | Ch 4, 5 |
| 4 | Jan 20 | Data Visualization Theory | Ch 6 |
| 5 | Jan 27 | Choosing the right visualization for your data | Ch 7 |
| 6 | Feb 3 | Clustering, Dimensionality reduction | Ch 8 |
| 7 | Feb 10 | High Dimensional Data Analysis, Dimensionality Reduction | Ch 9 |
Both data scientists and software developers are good at designing and building complex systems with many interconnected parts using different tools and frameworks. In general, software developers design systems consisting of many well-defined components, whereas data scientists work with systems in which at least one component isn’t well defined before it is built. That component is usually closely involved with data processing or analysis. Data scientists specialize in creating systems that rely on probabilistic statements about data and results. Well-known examples are the Google search engine (“These are probably the most relevant pages”) and product recommendations on Amazon.com (“We think you’ll probably like these things”).
We will cover the skills a data scientist needs for cleaning, modeling, and visualizing data. We will also learn the skills developers need to design interactive dashboards and applications.
D3 (website)
D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS. D3’s emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.
To make life easy, we will be using the tutorials and notebook environment provided by Observable.
Check these examples: Sankey, Choropleth, Hexbin, Fisheye
The project and lab component of this course will equip students with a modern software toolkit to develop their own data analysis and interactive visualization applications (web/Android) and to better appreciate the data science process.