
Final Data Science Course (Practicals)

The MTS-555 MJP: Data Science course offers a comprehensive introduction to data science, covering fundamental concepts, statistical analysis, and predictive modeling. Students will gain hands-on experience using Python for data pre-processing, analysis, and model evaluation through lab sessions. By the end of the course, participants will be equipped to apply data science principles to real-world scenarios.

Uploaded by

Om Bachhav

MTS-555 MJP: Data Science [2P credits]

Course Objectives:

1. Introduction to Data Science is a comprehensive course that introduces students to the
fundamental concepts, techniques, and tools used in the field of data science.
2. The course explores the role of data science in the era of big data and provides a strong
foundation in statistical analysis and predictive modelling.
3. Students will gain hands-on experience through lab sessions using Python Programming
and learn how to effectively pre-process and analyse data, build predictive models, and
evaluate their performance.
4. By the end of the course, students will have a solid understanding of the key principles of
data science and be able to apply them to real-world scenarios.

Course Outcomes:
1. Understand the need, benefits, and applications of data science in the context of big data.
2. Recognize the importance of mathematics and statistics as foundational disciplines for data
science.
3. Develop skills in data pre-processing, including handling missing values, data wrangling,
and data visualization.
4. Learn various supervised and unsupervised machine learning techniques for predictive
modelling.
5. Gain proficiency in evaluating and selecting appropriate evaluation metrics for machine
learning models.
6. Apply the concepts and techniques learned in the course to practical scenarios through lab
sessions.

Course Contents:

Pre-requisite: Programming in Python

Unit 1: Data Science in a big data world [12 Hours]


1.1 Need, benefits and uses of data science and big data
1.2 Overview of the data science process
1.3 The big data ecosystem and data science
1.4 Challenges in big data world
1.5 Importance of Mathematics and Statistics in data science

Unit 2: Statistical Foundation for Data Science [18 Hours]


2.1 Analysis of Variance
2.2 Data and data representation Techniques
2.3 Measures of Central Tendency and Variability
2.4 Exploratory Data Analysis
2.5 Introduction to probability and probability distributions
2.6 Methods of Estimation
2.7 Testing of Hypothesis

Unit 3: Data Pre-processing [15 Hours]


3.1 Data and data quality
3.2 Missing Value Analysis and Data wrangling
3.3 Label encoding and feature selection
3.4 Data Visualization techniques
3.5 Data integration and reshaping
3.6 Graph mining methods
3.7 Text mining techniques

Unit 4: Predictive Modelling [15 Hours]


4.1 Supervised Learning.
4.1.1 Regression Analysis: Linear, Non-linear and correlation
4.1.2 Time Series Analysis: ARIMA, SARIMA, VARMAX
4.1.3 Classification Techniques: Logistic regression, Decision trees, Random forest,
Support Vector Machine
4.2 Unsupervised Learning.
4.2.1 Clustering: K-means, Hierarchical clustering, density-based clustering
4.2.2 Dimensionality reduction using PCA and t-SNE
4.2.3 Association rules mining
4.3 Evaluation metrics for Machine Learning models.

Lab Sessions:

Practical 1. Probability Theory


Problems based on
a. Probability
b. Conditional Probability
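A minimal sketch of the kind of problem Practical 1 covers: computing a probability and a conditional probability by counting equally likely outcomes. The two-dice example and the helper name probability are illustrative assumptions, not part of the syllabus.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of rolling two fair dice (illustrative example).
sample_space = list(product(range(1, 7), repeat=2))


def probability(event, given=lambda outcome: True):
    """Return P(event | given) by counting equally likely outcomes."""
    conditioned = [o for o in sample_space if given(o)]
    favourable = [o for o in conditioned if event(o)]
    return Fraction(len(favourable), len(conditioned))


def sum_is_8(outcome):
    return outcome[0] + outcome[1] == 8


def first_is_even(outcome):
    return outcome[0] % 2 == 0


p_a = probability(sum_is_8)                         # P(A)
p_a_given_b = probability(sum_is_8, first_is_even)  # P(A | B)
print(p_a, p_a_given_b)
```

Using Fraction keeps the results exact, which makes it easy to compare them with hand calculations.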
Practical 2. Probability Distributions of Discrete Random Variables
Problems on the following discrete random variables
a. Binomial Random Variable
b. Poisson Random Variable
c. Hypergeometric Random Variable
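The three discrete distributions above have closed-form probability mass functions that can be written directly with the standard library. This is a sketch under assumed function and parameter names (binomial_pmf, poisson_pmf, hypergeometric_pmf); the example values are hypothetical.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**k / factorial(k)


def hypergeometric_pmf(k, population, successes, draws):
    """P(X = k) successes in `draws` draws without replacement from a
    population of size `population` that contains `successes` successes."""
    return (
        comb(successes, k)
        * comb(population - successes, draws - k)
        / comb(population, draws)
    )


print(binomial_pmf(2, 5, 0.5))
print(poisson_pmf(0, 2.0))
print(hypergeometric_pmf(1, 10, 3, 2))
```

A quick sanity check for any of these functions is that the probabilities over the whole support sum to 1.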
Practical 3. Probability Distributions of Continuous Random Variables
Problems on the following continuous random variables
a. Uniform Random Variable
b. Exponential Random Variable
c. Normal Random Variable
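For the continuous distributions above, probabilities come from the cumulative distribution function. The sketch below writes the uniform and exponential CDFs by hand and uses statistics.NormalDist from the standard library for the normal case; the helper names and the example values are assumptions chosen for illustration.

```python
from math import exp
from statistics import NormalDist


def uniform_cdf(x, a, b):
    """P(X <= x) for X uniform on the interval [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def exponential_cdf(x, rate):
    """P(X <= x) for X exponential with the given rate (mean 1 / rate)."""
    return 1.0 - exp(-rate * x) if x > 0 else 0.0


standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability that a standard normal variable lies within one standard deviation of 0.
p_within_one_sd = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)

print(uniform_cdf(2.5, 0, 10), exponential_cdf(1.0, 2.0), p_within_one_sd)
```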

Practical 4. Methods of Estimation


Problems based on methods of estimation of different population parameters.
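As one small illustration of point estimation, the sketch below estimates the mean and variance of a population from a sample. The data are hypothetical, and the normal model is an assumption: under it, the sample mean and the divisor-n variance are the maximum likelihood estimates, while the divisor-(n - 1) variance is the usual unbiased estimate.

```python
from statistics import fmean, pvariance, variance

# Hypothetical sample of measurements (not from the syllabus).
sample = [4.8, 5.1, 5.0, 4.9, 5.2]

mean_hat = fmean(sample)         # estimate of the population mean
var_mle = pvariance(sample)      # divisor n: ML estimate under a normal model
var_unbiased = variance(sample)  # divisor n - 1: unbiased estimate

print(mean_hat, var_mle, var_unbiased)
```

The two variance estimates differ by the factor n / (n - 1), so the gap between them shrinks as the sample grows.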

Practical 5. Testing of Hypothesis


Problems on constructing confidence intervals at different confidence levels and on testing
hypotheses at different levels of significance, for both small and large samples.
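A sketch of the large-sample case, assuming the normal approximation is acceptable: a z-based confidence interval for a mean and a two-sided z-test. The function names and the data are hypothetical. For small samples, a critical value from the t-distribution with n - 1 degrees of freedom would replace the z value used here.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def z_confidence_interval(sample, confidence=0.95):
    """Large-sample confidence interval for the population mean."""
    n = len(sample)
    centre = fmean(sample)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    margin = z * stdev(sample) / sqrt(n)
    return centre - margin, centre + margin


def z_test_p_value(sample, mu0):
    """Two-sided p-value for H0: population mean equals mu0 (large-sample z-test)."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical sample of 70 observations centred on 50.
sample = [50 + (i % 7) - 3 for i in range(70)]

low, high = z_confidence_interval(sample, confidence=0.95)
p_value = z_test_p_value(sample, mu0=52)
print(low, high, p_value)
```

Raising the confidence level widens the interval, and H0 is rejected at significance level alpha when the p-value falls below alpha.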

(For the above practicals, refer to S. C. Gupta, Fundamentals of Statistics, Himalaya Publishing House.)
For further practicals, choose any one of the following free datasets on Kaggle.

Datasets:

1. Jobs and Salaries in Data Science (kaggle.com)
2. NHANES datasets (from 2017-2018): National Health & Nutrition Exam Survey 2017-2018 (kaggle.com)
3. Bitcoin Historical Data (kaggle.com)
4. COVID-19 Open Research Dataset Challenge (CORD-19) (kaggle.com)
5. 52,000 Animation Movie Details (2024) (kaggle.com)
6. New York Housing Market (kaggle.com)
7. Apple Quality (kaggle.com)
8. Palmer Archipelago (Antarctica) penguin data (kaggle.com)
9. Netflix Movies and TV Shows (kaggle.com)
10. Most Streamed Spotify Songs 2023 (kaggle.com)
11. Supermarket sales (kaggle.com)
12. Students Performance in Examination: Students Performance in Exams (kaggle.com)
13. Amazon Sales Dataset (kaggle.com)
14. Sleep Health and Lifestyle Dataset (kaggle.com)
15. Diabetes Dataset (kaggle.com)
16. IBM HR Analytics Employee Attrition & Performance (kaggle.com)
17. Fashion MNIST (kaggle.com)
18. Google Play Store Apps (kaggle.com)
19. Customer Personality Analysis (kaggle.com)
20. Brain-Spectrograms (kaggle.com)

Practical 6. Exploratory Data Analysis


Perform exploratory data analysis on the chosen dataset to understand its structure and contents.
Practical 7. Data Visualisation
Apply different data visualisation techniques to understand the chosen dataset.
Practical 8. Data Pre-processing
Perform missing value analysis, data wrangling and label encoding. Apply whichever
mining methods are necessary to clean the chosen dataset.
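A minimal sketch of two of the steps named in Practical 8, written with the standard library only: median imputation of a numeric column and label encoding of a categorical column. The records and column names are hypothetical. In practice these steps are often done with a data-frame library; plain Python is used here only to keep the logic visible.

```python
from statistics import median

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 25, "city": "Pune"},
    {"age": None, "city": "Nashik"},
    {"age": 31, "city": "Pune"},
    {"age": 40, "city": None},
]

# Missing value analysis: count the missing entries per column.
missing_counts = {
    column: sum(1 for r in rows if r[column] is None) for column in ("age", "city")
}

# Imputation: numeric gaps get the median of the observed values,
# categorical gaps get an explicit "Unknown" category.
age_median = median(r["age"] for r in rows if r["age"] is not None)
for r in rows:
    if r["age"] is None:
        r["age"] = age_median
    if r["city"] is None:
        r["city"] = "Unknown"

# Label encoding: map each distinct category to an integer code
# (sorted so that the mapping is reproducible).
city_codes = {name: code for code, name in enumerate(sorted({r["city"] for r in rows}))}
encoded = [{"age": r["age"], "city": city_codes[r["city"]]} for r in rows]
print(missing_counts, city_codes, encoded)
```

Label encoding imposes an arbitrary order on the categories, so it suits tree-based models better than models that treat the codes as magnitudes.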
Practical 9. Regression Analysis
Apply different regression techniques to build a prediction model and to understand the
correlation between the different features present in the chosen dataset.
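The simplest instance of this practical is a straight-line fit between two features. The sketch below computes the ordinary least squares slope and intercept and the Pearson correlation coefficient from their definitions; the function names and the data are hypothetical.

```python
from math import sqrt
from statistics import fmean


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical feature and target values.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
r = pearson_r(x, y)
print(slope, intercept, r)
```

The slope has the same sign as r, and r squared gives the share of the variance in y explained by the fitted line.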
Practical 10. Dimensionality Reduction Techniques
Apply different dimensionality reduction techniques to identify the important features
in your chosen dataset.
Practical 11. Classification Techniques
Apply different classification techniques to build a prediction model and to understand
the relation between the different features present in the chosen dataset. Decide which
classification technique is best for your chosen dataset.
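Deciding which classifier is best requires a common set of evaluation metrics. The sketch below computes accuracy, precision, recall and F1 for a binary problem from true and predicted labels; the function name and the label vectors are hypothetical, and each classifier under comparison would be scored on the same held-out data.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical true labels and predictions from one classifier.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can mislead on imbalanced classes, which is why precision, recall and F1 are reported alongside it.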
Practical 12. Clustering Algorithms
Apply different clustering algorithms to your dataset to understand the relations
among the different features under study.
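As an illustration of one of the clustering algorithms named in Unit 4, the sketch below implements K-means (Lloyd's algorithm) in plain Python. The function name, the naive choice of the first k points as initial centroids, and the sample points are all assumptions made to keep the example deterministic; library implementations use better initialisation.

```python
from math import dist
from statistics import fmean


def k_means(points, k, iterations=20):
    """Cluster points with Lloyd's algorithm; returns (centroids, clusters)."""
    centroids = list(points[:k])  # naive initialisation (assumption): first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(fmean(axis) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical two-dimensional points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

The result depends on the initial centroids and on k, so it is usual to try several values of k and several starting points before interpreting the clusters.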
Recommended Books:

1. Foster Provost and Tom Fawcett, Data Science for Business, O'Reilly Media, 2013.

(Unit 1: Chapter 1, Unit 3: Chapters 6, 7, 8, 10 and Unit 4: Chapters 3, 4, 5.)

2. Peter Bruce, Andrew Bruce & Peter Gedeck, Practical Statistics for Data Scientists,
2nd edition.

(Unit 2: Chapters 1, 2, 3, 4 and Unit 4: Chapters 5, 6, 7.)

Reference Books:

1. Peter Bruce, Andrew Bruce & Peter Gedeck, Practical Statistics for Data Scientists,
2nd edition
2. Jiawei Han, Micheline Kamber & Jian Pei, Data Mining, Concepts and Techniques,
3rd edition
3. Ethem Alpaydin, Introduction to Machine Learning, Edition 2, The MIT Press.
4. S. C. Gupta, Fundamentals of Statistics, Himalaya Publishing House
