M2. Understanding A Data Set II

This document discusses variance, standard deviation, and other statistical measures of variability and distribution of data. It defines variance and standard deviation, and how to calculate them. It explains how variance and standard deviation can quantify how dispersed a data set is compared to the mean. The document also discusses the normal distribution and z-scores, how to transform raw scores to z-scores, and what percentages of scores fall within certain standard deviations of the mean in a normal distribution. Finally, it defines skewness and kurtosis as other measures of the shape of a distribution, and how to calculate and interpret them.

Uploaded by

MYo Oo

M2.

Understanding a data set (II)

Presented by
Aung Kay Tu, MBBS, DTM&H, MCTM, PhD
Variance and Standard Deviation
Variance: a measure of how far data points lie from the mean
• Data Set 1: 3, 5, 7, 10, 10
• Data Set 2: 7, 7, 7, 7, 7

What are the mean and median of each data set?

Data Set 1: mean = 7, median = 7

Data Set 2: mean = 7, median = 7

The two data sets are clearly not identical, yet their means and medians match. The variance shows how they differ.

We want a way to express this difference in spread numerically.

How to Calculate?
• If we think of the spread of a distribution as the extent to which the values differ from the mean and from each other, then a reasonable measure of spread might be the average deviation, or difference, of the values from the mean:

$$\frac{\sum (x - \bar{X})}{N}$$

• The positive and negative deviations always cancel, so this average is zero for every data set. We could drop the negative signs, which is mathematically the same as taking absolute values; the result is known as the mean deviation.
• Alternatively, we can square each deviation. The average of the squared deviations about the mean is called the variance.

For population variance (μ is the population mean):

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

For sample variance (X̄ is the sample mean):

$$s^2 = \frac{\sum (x - \bar{X})^2}{n - 1}$$
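As a rough illustration of why the plain average deviation is not a usable measure of spread, the sketch below applies it to the two data sets from the earlier slide. The function names are illustrative and do not come from the slides.

```python
from statistics import mean, pvariance

data_set_1 = [3, 5, 7, 10, 10]
data_set_2 = [7, 7, 7, 7, 7]

def average_signed_deviation(values):
    """Average of (x - mean); the signs cancel, so this is 0 up to rounding."""
    m = mean(values)
    return sum(x - m for x in values) / len(values)

def mean_absolute_deviation(values):
    """Average of |x - mean|, the 'mean deviation' described above."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)

for name, values in [("Data Set 1", data_set_1), ("Data Set 2", data_set_2)]:
    print(name)
    print("  average signed deviation:", average_signed_deviation(values))
    print("  mean absolute deviation: ", mean_absolute_deviation(values))
    print("  population variance:     ", pvariance(values))
```

Both data sets give an average signed deviation of zero, while the mean absolute deviation and the variance are zero only for Data Set 2, where every value equals the mean.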
The scores in Data Set 1 total 35, so the mean is 35/5 = 7.

Score   X    X − X̄         (X − X̄)²
1       3    3 − 7 = −4    16
2       5    5 − 7 = −2    4
3       7    7 − 7 = 0     0
4       10   10 − 7 = 3    9
5       10   10 − 7 = 3    9
Totals  35                 38

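Dividing the sum of squared deviations by N = 5 treats the five scores as a complete population:

$$\sigma^2 = \frac{\sum (X - \bar{X})^2}{N} = \frac{38}{5} = 7.6$$

If the same five scores are treated as a sample, the divisor is n − 1 = 4:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{38}{4} = 9.5$$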
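The worked table can be reproduced step by step. This is a minimal sketch using only built-in Python; the variable names are illustrative.

```python
scores = [3, 5, 7, 10, 10]

n = len(scores)
mean = sum(scores) / n                      # 35 / 5
deviations = [x - mean for x in scores]     # the X - mean column
squared = [d ** 2 for d in deviations]      # the squared-deviation column
sum_of_squares = sum(squared)

population_variance = sum_of_squares / n    # divide by N
sample_variance = sum_of_squares / (n - 1)  # divide by n - 1

# Print one row per score, mirroring the table layout.
for x, d, sq in zip(scores, deviations, squared):
    print(f"{x:>3} {d:>6.1f} {sq:>6.1f}")
print("sum of squared deviations:", sum_of_squares)
print("population variance:", population_variance)
print("sample variance:", sample_variance)
```

The two divisors give different answers for the same sum of squares, so it is worth stating which one is being used.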
Example 2

No.   Data Set A   Data Set B
1     28           27
2     22           27
3     21           28
4     26           6
5     18           27

Find the mean, median, mode, and range.

          Data Set A   Data Set B
mean      23           23
median    22           27
mode      none         27
range     10           22

What can be said about this data?

Because of the outlier (6) in Data Set B, the median is more typical of overall performance than the mean.

Which data set is more consistent?

Standard deviation: a measure of the variation of scores about the mean

• A higher standard deviation indicates more spread, less consistency, and less clustering around the mean.

• Sample standard deviation:

$$s = \sqrt{\frac{\sum (x - \bar{X})^2}{n - 1}}$$

• Population standard deviation:

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
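One way to answer the consistency question from Example 2 is to compute both standard deviations. The sketch below uses Python's standard `statistics` module; treating the five values as a sample (divisor n − 1) is an assumption, and the population version is shown alongside for comparison.

```python
from statistics import pstdev, stdev

data_set_a = [28, 22, 21, 26, 18]
data_set_b = [27, 27, 28, 6, 27]

# Sample standard deviation (divides by n - 1).
sd_a = stdev(data_set_a)
sd_b = stdev(data_set_b)

# Population standard deviation (divides by N).
psd_a = pstdev(data_set_a)
psd_b = pstdev(data_set_b)

print(f"Data Set A: s = {sd_a:.2f}, sigma = {psd_a:.2f}")
print(f"Data Set B: s = {sd_b:.2f}, sigma = {psd_b:.2f}")
```

Both versions give a smaller value for Data Set A, so by this measure A is the more consistent set even though the two means are equal.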
Find the area
• Length = 10 cm, breadth = 3.5 cm
• Length = 10 cm, breadth = 10 cm

Find the area under the curve
Bell-shaped curve
• ≈ 68% of all scores fall within 1 standard deviation of the mean
• ≈ 95% of all scores fall within 2 standard deviations of the mean
• ≈ 99.7% of all scores fall within 3 standard deviations of the mean
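These percentages are areas under the normal curve. As a sketch, the share of a normal distribution within k standard deviations of the mean equals erf(k/√2), which can be evaluated with the standard `math` module. The helper name `share_within` is illustrative.

```python
import math

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```

The results round to the 68%, 95%, and 99.7% quoted above.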
z Scores and the Normal Curve
• z distribution: the normal distribution of standardized scores
• So what are z scores?
• The number of standard deviations a particular score lies from the mean
• Can be positive or negative
• Positive = above the mean
• Negative = below the mean

$$z = \frac{X - \mu}{\sigma}$$
The z Distribution
Transforming Raw Scores to z Scores
• Step 1: Subtract the mean of the population from the raw score
• Step 2: Divide by the standard deviation of the population
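The two steps can be sketched as a small function. Reusing the five scores 3, 5, 7, 10, 10 and treating them as the whole population is an assumption made for illustration only.

```python
from statistics import mean, pstdev

def z_scores(raw_scores, mu, sigma):
    """Step 1: subtract the population mean; step 2: divide by the population SD."""
    return [(x - mu) / sigma for x in raw_scores]

scores = [3, 5, 7, 10, 10]
mu = mean(scores)        # population mean
sigma = pstdev(scores)   # population standard deviation

z = z_scores(scores, mu, sigma)
for x, value in zip(scores, z):
    print(f"raw {x:>2} -> z = {value:+.3f}")
```

Scores below the mean get negative z scores and scores above it get positive ones; the standardized scores themselves have mean 0 and standard deviation 1.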
[Figure: normal curve with z = 1.96 marked]

The normal distribution is symmetric.
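The slide gives the value 1.96 without comment. Assuming it refers to the usual two-sided 95% cutoff of the standard normal distribution, the sketch below checks both that cutoff and the symmetry property using `statistics.NormalDist` (Python 3.8 or later).

```python
from statistics import NormalDist

z_dist = NormalDist(mu=0, sigma=1)  # the standard normal (z) distribution

upper = z_dist.cdf(1.96)    # area to the left of z = +1.96
lower = z_dist.cdf(-1.96)   # area to the left of z = -1.96
central = upper - lower     # area between -1.96 and +1.96

print(f"area below -1.96: {lower:.4f}")
print(f"area above +1.96: {1 - upper:.4f}")
print(f"area between:     {central:.4f}")
print(f"z cutting off the top 2.5%: {z_dist.inv_cdf(0.975):.3f}")
```

Symmetry shows up as equal areas in the two tails, which is why a single positive cutoff is enough to describe a two-sided interval.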

Skewness and Kurtosis
Skewness
• Skewness describes how the shape of a sample's distribution differs from a symmetrical distribution.
• A normal distribution has a skewness of 0; a right-skewed distribution has skewness greater than 0 and a left-skewed distribution has skewness less than 0.
• In a normal distribution, where skewness is 0, the mean, median, and mode are equal.
• In a negatively skewed distribution, typically mode > median > mean.
• Positively skewed distributions occur when most of the scores are toward the low end of the distribution.
• In a positively skewed distribution, typically mode < median < mean.
When the distribution is symmetric, the value of skewness should be zero.
Karl Pearson defined the coefficient of skewness as:

$$Sk = \frac{\text{Mean} - \text{Mode}}{SD}$$

Since the mode does not exist in some cases, we can use the empirical relation Mean − Mode ≈ 3(Mean − Median) and write:

$$Sk = \frac{3(\text{Mean} - \text{Median})}{SD}$$

(This second coefficient ranges between −3 and +3.)
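A minimal sketch of the median-based coefficient, applied to the data sets from the earlier examples. Using the sample standard deviation for SD is an assumption, since the slides do not say which version is meant.

```python
from statistics import mean, median, stdev

def pearson_median_skewness(values):
    """3 * (mean - median) / SD, using the sample SD (an assumption).

    Not defined when every value is identical, because SD is then 0.
    """
    return 3 * (mean(values) - median(values)) / stdev(values)

data_set_1 = [3, 5, 7, 10, 10]
data_set_a = [28, 22, 21, 26, 18]
data_set_b = [27, 27, 28, 6, 27]

for name, values in [("Data Set 1", data_set_1),
                     ("Data Set A", data_set_a),
                     ("Data Set B", data_set_b)]:
    print(f"{name}: Sk = {pearson_median_skewness(values):+.3f}")
```

Data Set B, whose low outlier pulls the mean below the median, comes out negative, which matches the rule that the mean is the smallest of the three averages in a negatively skewed distribution.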
Application
• Skewed data can distort the results of statistical methods that assume a roughly symmetric distribution.
• One common remedy for positively skewed data with positive values is to apply a log transformation to the whole set of values, which can reveal patterns in the data and make it more suitable for the statistical model.
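To see the effect of a log transformation, the sketch below compares skewness before and after. The data values are hypothetical, and skewness is measured here with the moment coefficient (third central moment divided by the 1.5 power of the second), which is a different measure from Pearson's coefficients above.

```python
import math

def moment_skewness(values):
    """Third central moment divided by the 1.5 power of the second."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical, strongly right-skewed values (all positive, as logs require).
raw = [1, 10, 100, 1000, 10000]
logged = [math.log10(x) for x in raw]

print(f"skewness of raw values:    {moment_skewness(raw):.3f}")
print(f"skewness of logged values: {moment_skewness(logged):.3f}")
```

In this constructed example the logged values are evenly spaced, so the skewness drops to essentially zero; real data will rarely be corrected this cleanly.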
Karl Pearson (1857-1936)
Kurtosis
• Karl Pearson introduced the term kurtosis (literally "the amount of hump") for the degree of peakedness or flatness of a unimodal curve.
• When the peak of a curve is relatively high, the curve is called leptokurtic.
• When the curve is flat-topped, it is called platykurtic.
• Since the normal curve is neither very peaked nor very flat-topped, it is taken as the basis for comparison. The normal curve is called mesokurtic.

• For a normal distribution, kurtosis is equal to 3.
• When kurtosis is greater than 3, the curve is more sharply peaked and has heavier tails than the normal curve and is said to be leptokurtic.
• When it is less than 3, the curve has a flatter top and lighter tails than the normal curve and is said to be platykurtic.
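The slides do not give a formula for this kurtosis value. A common definition, and the one for which a normal distribution gives 3, is the fourth central moment divided by the square of the second. The sketch below assumes that definition and uses two small hypothetical data sets.

```python
def moment_kurtosis(values):
    """Fourth central moment divided by the squared second central moment."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return m4 / m2 ** 2

def classify(kurtosis):
    """Label a kurtosis value relative to the normal benchmark of 3."""
    if kurtosis > 3:
        return "leptokurtic"
    if kurtosis < 3:
        return "platykurtic"
    return "mesokurtic"

flat = [1, 2, 3, 4, 5]                       # evenly spread, no peak
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]   # tight center, two extreme values

for name, values in [("flat", flat), ("peaked", peaked)]:
    k = moment_kurtosis(values)
    print(f"{name}: kurtosis = {k:.2f} ({classify(k)})")
```

The evenly spread values fall below 3, while the set with a tight center and two extreme values rises above it.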
Another measure of kurtosis, known as the percentile coefficient of kurtosis, is:

$$\text{Kurt} = \frac{Q.D.}{P_{90} - P_{10}}$$

where
Q.D. is the semi-interquartile range, (Q3 − Q1)/2
P90 = 90th percentile
P10 = 10th percentile
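As a reference point, the percentile coefficient can be evaluated for the normal distribution itself. The sketch below takes the theoretical quartiles and percentiles from `statistics.NormalDist` (Python 3.8 or later); the function name is illustrative.

```python
from statistics import NormalDist

def percentile_kurtosis(q1, q3, p10, p90):
    """Semi-interquartile range divided by the 10th-to-90th percentile range."""
    quartile_deviation = (q3 - q1) / 2
    return quartile_deviation / (p90 - p10)

z_dist = NormalDist()  # standard normal distribution

kappa_normal = percentile_kurtosis(
    q1=z_dist.inv_cdf(0.25),
    q3=z_dist.inv_cdf(0.75),
    p10=z_dist.inv_cdf(0.10),
    p90=z_dist.inv_cdf(0.90),
)
print(f"percentile coefficient of kurtosis for a normal curve: {kappa_normal:.3f}")
```

The normal curve gives a value of about 0.263, so on this scale the benchmark for comparison is 0.263 rather than 3.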
Application
• We can evaluate the normality of a variable by using skewness and kurtosis.
• As a rule of thumb, a kurtosis with an absolute value greater than 10 is considered problematic.
• Kurtosis is a useful measure of whether there is a problem with outliers in a data set.
• Larger kurtosis indicates a more serious outlier problem, and alternative statistical methods may be needed.
