Data Science 3
MACHINE LEARNING
B. TECH
II YEAR – II SEM (Sec-A & B)
Academic Year 2022-23
Pre-requisite:
Database Management Systems, Data Structures
Course Objectives:
This course will enable students to:
• Know about the fundamental concepts and technologies of Data Science.
• Explore the various Data collection and storage methods.
• Understand the Data Analysis, statistics, and various machine learning algorithms.
• Investigate the visualization of data and apply coding techniques to secure the data.
• Study the applications of Data Science, technologies for visualization, and handling of
variables using Python.
Textbooks:
1. Cathy O’Neil, Rachel Schutt, Doing Data Science, Straight Talk from the Frontline. O’Reilly,
2013.
2. Jure Leskovec, Anand Rajaraman, Jeffrey Ullman, Mining of Massive Datasets. v 2.1,
Cambridge University Press, 2014.
Reference Books:
1. Joel Grus, “Data Science from Scratch”, O'Reilly, 2015.
2. Gupta, S.C. and Kapoor, V.K., “Fundamentals of Mathematical Statistics”, Sultan
Chand & Sons, New Delhi, 11th Ed, 2002.
3. Hastie, Trevor, et al., “The Elements of Statistical Learning”, Springer, 2009.
4. Wes McKinney, “Python for Data Analysis”, O'Reilly Media, 2012.
Course Outcomes:
The student will be able to
• Identify the basic concepts of data science and identify the types of data.
• Analyse how to collect, manage, explore, and store data.
• Implement the basic measures of central tendency and classify data using SVM and
naive Bayes.
• Interpret the visualization of data and apply coding techniques to secure the data.
• Analyse the various concepts of data science and handle simple applications of data
science using Python.
UNIT – III
• Intro to Data Analysis
• Basics of Terminology and Concepts
Topics Covered
• Descriptive analysis
o Ratio, proportion, percentage, and rate
o Median, mean, and trend
• Selection of the appropriate chart
Data Analysis: Key Concepts
Data Analysis
Statistical terms
• Ratio
• Proportion
• Percentage
• Rate
• Mean
• Median
• Trend
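The first four terms are easy to confuse, so a minimal Python sketch may help: it computes each one from the same small set of counts. All counts and variable names below are hypothetical values chosen for illustration; they are not data from this course.

```python
# Hypothetical counts, for illustration only.
female_clients = 60
male_clients = 40
total_clients = female_clients + male_clients

# Ratio: compares two quantities with each other (part to part).
ratio = female_clients / male_clients

# Proportion: a part divided by the whole (always between 0 and 1).
proportion = female_clients / total_clients

# Percentage: a proportion multiplied by 100.
percentage = proportion * 100

# Rate: events per unit of population over a period
# (here: new cases per 1,000 people in one year).
new_cases_in_year = 25       # hypothetical
population_at_risk = 5000    # hypothetical
rate_per_1000 = new_cases_in_year / population_at_risk * 1000

print(f"ratio = {ratio:.2f}, proportion = {proportion:.2f}")
print(f"percentage = {percentage:.1f}%, rate = {rate_per_1000:.1f} per 1,000")
```

A ratio compares one part with another part, whereas a proportion and a percentage compare a part with the whole; a rate additionally ties the count to a population and a time period.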
Central Tendency
Example:
Mean = (22 + 18 + 30 + 19 + 37 + 33) ÷ 6 = 159 ÷ 6 = 26.5
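The same calculation can be sketched in Python. The variable names are illustrative; the six values are the ones from the example above.

```python
# The six values from the example above.
values = [22, 18, 30, 19, 37, 33]

total = sum(values)          # add up all values
mean = total / len(values)   # divide by how many values there are

print(total, mean)  # 159 26.5
```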
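Client 1: 2
Client 2: 134
Client 3: 67
Client 4: 10
Client 5: 221
Sorted values for clients 1–5: 2, 10, 67, 134, 221
Median of clients 1–5 = 67 (the middle value)
Sorted values for clients 1–4: 2, 10, 67, 134
Median of clients 1–4 = (10 + 67) ÷ 2 = 38.5 (the average of the two middle values)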
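A short sketch of the median rule using Python's standard `statistics` module: sort the values, then take the middle one (odd count) or the average of the two middle ones (even count). The first list holds the values for clients 1–5; the second reuses the six values from the mean example to show the even-count case.

```python
from statistics import median

# Odd number of values: the median is the middle value after sorting.
client_values = [2, 134, 67, 10, 221]      # clients 1-5
print(sorted(client_values))               # [2, 10, 67, 134, 221]
median_odd = median(client_values)
print(median_odd)                          # 67

# Even number of values: the median is the average of the two middle values.
six_values = [22, 18, 30, 19, 37, 33]      # values from the mean example
print(sorted(six_values))                  # [18, 19, 22, 30, 33, 37]
median_even = median(six_values)
print(median_even)                         # 26.0
```

Note that the values must be sorted before the middle is picked; `statistics.median` does this internally.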
Mean vs. Median: When to Use One or the Other?
Facility 2   22
Facility 3   26
Facility 4   29
Facility 5   34
Facility 6   38
Facility 7   39
Mean = 29.7
Median = 29
Mean vs. Median: When to Use One or the Other?
Facility 2   38
Facility 3   39
Facility 4   40
Facility 5   45
Facility 6   46
Facility 7   140
Mean = 50.8
Median = 40
Use the Mean or the Median?
Client      CD4 count
Client 1    9
Client 2    11
Client 3    92
Client 4    92
Client 5    95
Client 6    100
Client 7    100
Client 8    101
Client 9    104
Client 10   206
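One way to approach the question is to compute both measures and compare them. The sketch below does this for the ten CD4 counts listed above; the variable names are illustrative.

```python
from statistics import median

# CD4 counts for clients 1-10, as listed above.
cd4_counts = [9, 11, 92, 92, 95, 100, 100, 101, 104, 206]

cd4_mean = sum(cd4_counts) / len(cd4_counts)
cd4_median = median(cd4_counts)

print(cd4_mean)    # 91.0
print(cd4_median)  # 97.5
```

Three values (9, 11, and 206) lie far from the rest. The mean uses every value and is therefore pulled by such extremes, while the median depends only on the middle of the sorted list, which is why the median is usually the safer summary when a data set contains outliers.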
Trend
[Figure: line chart of # adults on ART and # children on ART, 2008–2011]
Calculating Trends
[Figure: line chart of # adults on ART and # children on ART by month, Jan–Dec]
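The chart values are not reproduced here, so the following is only a sketch of one common way to quantify a trend: the change from one period to the next, in absolute terms and as a percentage. The yearly counts and the helper name `year_over_year` are hypothetical and chosen purely for illustration.

```python
def year_over_year(series):
    """Return (start, end, absolute_change, percent_change) for each consecutive pair of periods."""
    periods = sorted(series)
    changes = []
    for prev, curr in zip(periods, periods[1:]):
        diff = series[curr] - series[prev]
        pct = diff / series[prev] * 100
        changes.append((prev, curr, diff, pct))
    return changes

# Hypothetical counts of adults on ART per year (not read from the chart).
adults_on_art = {2008: 40, 2009: 60, 2010: 90, 2011: 135}

for prev, curr, diff, pct in year_over_year(adults_on_art):
    print(f"{prev}-{curr}: {diff:+d} clients ({pct:+.1f}%)")
```

In this made-up series the absolute increase grows each year (+20, +30, +45) while the percentage increase stays at 50%, which shows why it helps to state which kind of change a trend statement refers to.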
Key Messages
SELECT THE RIGHT CHART
Types of Charts
5 QUESTIONS TO ASK YOURSELF
WHEN CHOOSING A CHART
Line
• A line chart reveals trends or progress over time.
• Can be used to show many different categories of data.
• Use a line chart to show a continuous data set.
[Figure: Number of clinicians working in each clinic in Years 1–4]
Examples of Charts to Choose
When Analyzing Data
Dual axis
• Used with 2–3 data sets, at least one of which is based on a continuous set of data, and
another of which is better suited to being grouped by category.
• Should be used to visualize a correlation, or the lack thereof, between these data sets.
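A chart shows a correlation visually; as a complement, the strength of a linear relationship between two data sets can be summarized with the Pearson correlation coefficient. The sketch below implements the textbook formula in plain Python. The function name `pearson_r` and both data sets are hypothetical and serve only as an illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical monthly figures for one clinic.
monthly_visits = [120, 135, 150, 160, 180, 210]
staff_on_duty = [4, 4, 5, 5, 6, 7]

r = pearson_r(monthly_visits, staff_on_duty)
print(f"r = {r:.2f}")
```

Values of r near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little or none. The function assumes neither sequence is constant, since a constant sequence would make the denominator zero.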
Pie
• Represents percentages, with the segments totaling 100%.
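Before drawing a pie chart, raw counts are typically converted to percentages of the total. A minimal sketch of that step follows; the category names and counts are hypothetical.

```python
# Hypothetical client counts per clinic (illustrative values only).
segment_counts = {"Clinic A": 90, "Clinic B": 60, "Clinic C": 30, "Clinic D": 20}

total = sum(segment_counts.values())

# Each segment's share of the whole, expressed as a percentage.
percentages = {name: count * 100 / total for name, count in segment_counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}%")

print(f"Sum of segments: {sum(percentages.values()):.1f}%")
```

Because every segment is a share of the same total, the percentages add up to 100; if they do not, the categories overlap or some data is missing, and a pie chart is not an appropriate choice.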
Scatter plot
[Figure: Customer happiness, by response time]
UNIT – III
• Descriptive Statistics
• Central Tendency, Variance, Mean, Median, etc.: Concepts
UNIT – III
• Distribution Properties and Arithmetic
UNIT – III
• Intro to ML
• Basic ML Algorithms
UNIT – III
• Linear Regression