gagan-iitb/DataAnalyticsAndVisualization

Data Analytics And Visualization

Course page for DS251 (Data Analytics and Visualization) and its lab course DS252, taught at IIT Bhilai, India, in the Winter Semester of 2026.


Course Instructor: Dr. Gagan Raj Gupta

This course introduces students to algorithms and techniques for data analysis and visualization. We will emphasize working with real-world datasets and develop skills in visualizing and analyzing complex data sets, such as large network data.

Textbook: Data Analytics and Visualization, Gagan Raj Gupta and Naresh Nagwani, Prentice Hall of India, 2025. https://www.amazon.in/ANALYTICS-VISUALIZATION-Textbook-NARESH-NAGWANI/dp/9354439314

Grading Scheme

A. The lecture course (DS251) gives the most weight to exams (30, 50), plus surprise quizzes (10). We will delve into the mathematics and algorithms of data analysis. Practice problems and bonus questions will be given regularly to build critical thinking and problem-solving skills (10).

B. The lab course (DS252) will have regular programming and data analysis tests conducted without internet access (50). Each student will get an individual project (40), in which they collect, clean and analyse data and present the insights as a story. There will be regular practice problem sets and lab exercises (10).

Motivation

  • Data can yield insights that help people in agriculture, healthcare, industry, government and science
  • Getting data is becoming easier day by day, but the data itself is complex and difficult for people to understand
  • Data has errors of various types (missing, incorrect, etc.), is incomplete and is hard to clean (e.g. user reviews/ratings, distorted images)
  • Data usually has complex correlations, and i.i.d. assumptions often do not hold (e.g. graph data, time-series data)
  • Data visualization is critical for engaging a more diverse audience in the process of analytical thinking

In this course, we will learn how such insights are extracted from data and apply these methods to real-life problems that interest us.

UNDERSTAND DATA ---> HYPOTHESIS ---> MODELS ---> INSIGHTS
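The four stages above can be sketched on a toy problem. This is only a minimal illustration using the Python standard library: the dataset (hours studied versus exam score), the helper name `fit_line`, and the choice of a least-squares line as the model are all assumptions made for this example, not material taken from the course.

```python
from statistics import mean

# Hypothetical raw data: (hours_studied, exam_score); None marks a missing value.
raw = [(1, 52), (2, 55), (None, 60), (4, 65), (5, None), (6, 71), (8, 80)]

# UNDERSTAND DATA: inspect the records and drop those with missing fields.
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# HYPOTHESIS: the score grows roughly linearly with the hours studied.


def fit_line(points):
    """MODELS: fit y = intercept + slope * x by ordinary least squares."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = fit_line(clean)

# INSIGHTS: state the result in plain language.
print(f"Kept {len(clean)} of {len(raw)} records.")
print(f"Each extra hour is associated with about {slope:.1f} more marks.")
```

In practice each stage is far richer (and would typically use libraries rather than hand-written formulas), but the order of the steps is the same.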

Tips for students

  • Attend classes regularly and take notes.
  • Review your notes regularly (at least weekly) and think of possible applications and questions.
  • Ask questions in class and present your viewpoints (seek clarifications from the instructor/TAs).
  • Practice solving real-world problems.

Course Objectives

  • Motivate and demonstrate the benefits and uses of data science
  • Impart the skills needed by a data scientist: acquire, clean, model, visualize data
  • Teach fundamental algorithms for handling basic and complex datasets, including recommendation, basket analysis and streaming algorithms
  • Study techniques for creating effective visualizations based on principles from graphic design, perceptual psychology, and cognitive science
  • Teach basic techniques of (unsupervised) machine learning, which are an important way to model relationships in data
  • Provide hands-on experience to students in analyzing datasets in diverse fields (NLP, Image/Video, Graphs, Networks, Bio-informatics, Finance)

Pre-requisites

  • Basic knowledge of Python (most assignments will be based on Python)
  • Knowledge of basic computer science principles and skills
  • Math
    • Linear Algebra (Matrix factorization, Eigenvalues, Column and row spaces, Norms)
    • Probability theory (Conditional probability, Bayes' rule, Concentration inequalities, Distributions, Gaussian, Multivariate)
    • Basic Data Structures, Algorithms and Asymptotic Analysis (graphs, heaps, lists, dynamic programming)
    • Calculus (Multi-variate)
  • Web Technologies
    • HTML
    • JS
    • CSS

If you don't meet one or more pre-requisites, be prepared to spend more time before or during the course in learning them.

Reference Books

  • DAV: Data Analytics and Visualization, Gagan Raj Gupta and Naresh Nagwani, Prentice Hall of India, 2025
  • MML: Mathematics for Machine Learning
  • LFD: Learning From Data, Gilbert Strang
  • IVB: Interactive Data Visualization for the Web, Scott Murray
  • ISL: An Introduction to Statistical Learning
  • PML: Probabilistic Machine Learning, Kevin P. Murphy
  • DSC: Data Science from Scratch, Joel Grus
  • GA: Graph Algorithms: Practical Examples in Apache Spark and Neo4j, Mark Needham, Amy Hodler

Syllabus for the course

  • Introduction to the data science workflow
  • Data Collection and Exploratory Analysis: Automated methods for data collection, Data and Visualization Models, Data wrangling and cleaning, Exploratory data analysis
  • Building Models for: Classification, Clustering, Regression
  • Model evaluation: Statistical tests for significance of predictors
  • Time-series Analysis: Characteristics, Regression, Exploratory data analysis, ARIMA Models
  • Visualization Design: Introduction, Abstractions, Validation, Marks and Channels
  • Visualization of Different Data Types: Tabular Data, Multidimensional Data, Spatial Data, Graphs, Text Data
  • Assorted Topics: Graphical Perception, Interaction dynamics for Visual Analysis, Using Space Effectively, Stacked Graphs, Geometry & Aesthetics

Class Materials

Google Drive (for IIT Bhilai students): GDrive

| # | Week | Topics planned in this week | Text Book Reference |
|---|------|-----------------------------|---------------------|
| 1 | Dec 31 | Data Collection, Data Exploration | Ch 1, 2 |
| 2 | Jan 6 | Linear Algebra, Regression and Classification | Ch 3 |
| 3 | Jan 13 | Probability, Statistics and Optimization | Ch 4, 5 |
| 4 | Jan 20 | Data Visualization Theory | Ch 6 |
| 5 | Jan 27 | Choosing the right visualization for your data | Ch 7 |
| 6 | Feb 3 | Clustering, Dimensionality Reduction | Ch 8 |
| 7 | Feb 10 | High Dimensional Data Analysis, Dimensionality Reduction | Ch 9 |

Data Scientist vs. Software Developer (Engineer)

Both data scientists and software developers are good at designing and building complex systems with many interconnected parts, using different tools and frameworks. In general, software developers design systems consisting of many well-defined components, whereas data scientists work with systems in which at least one component is not well defined before it is built. That component is usually closely involved with data processing or analysis. Data scientists specialize in creating systems that rely on probabilistic statements about data and results. Well-known examples are the Google search engine (“These are probably the most relevant pages”) and product recommendations on Amazon.com (“We think you’ll probably like these things”).
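To make the idea of a "probabilistic statement" concrete, here is a minimal sketch of a co-occurrence recommender. It estimates the conditional probability that an item appears in a basket given that another item does, and recommends the items with the highest estimates. The baskets and the function names `confidence` and `recommend` are hypothetical, chosen only for illustration; this is not a description of how any real recommendation system works.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets (illustrative data only).
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "tea"},
    {"bread", "butter", "milk"},
]

# Count how often each item, and each unordered pair of items, appears.
item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    pair_counts.update(frozenset(pair) for pair in combinations(sorted(basket), 2))


def confidence(given, other):
    """Estimate P(other in basket | given in basket) from the counts."""
    if item_counts[given] == 0:
        return 0.0
    return pair_counts[frozenset((given, other))] / item_counts[given]


def recommend(given, top_k=2):
    """Return the top_k items most likely to co-occur with `given`."""
    candidates = [item for item in item_counts if item != given]
    # Sort by descending probability; break ties alphabetically.
    ranked = sorted(candidates, key=lambda item: (-confidence(given, item), item))
    return [(item, confidence(given, item)) for item in ranked[:top_k]]


for item, prob in recommend("bread"):
    print(f"Customers who buy bread probably also like {item} (p = {prob:.2f})")
```

The output is a ranked guess with an attached probability rather than a guaranteed answer, which is the kind of component the paragraph above describes as not being well defined before it is built.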

Skills for a data scientist

We will cover the skills a data scientist needs for cleaning, modeling and visualizing data. We will also learn the skills developers need for designing interactive dashboards and applications.

D3 (website)

D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS. D3’s emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.

To make life easy, we will be using the tutorials and notebook environment provided by Observable.

Check these examples: Sankey, Choropleth, Hexbin, Fisheye

Labs and Projects

The project and lab components of this course will equip students with a modern software toolkit for developing their own data analysis and interactive visualization applications (web/Android), so that they better appreciate the data science process.
