Probability Theory

This document discusses probability theory and measures of dispersion. It defines probability as the occurrence of discrete or continuous events. Dispersion is the spread of data around the central location or mean. Measures of dispersion include variance, standard deviation, and coefficient of variation. These quantify how well the mean represents the data and the degree of dispersion relative to the mean. Skewness and kurtosis describe other properties of frequency distributions such as symmetry and pointiness. Box and whisker plots provide a visual representation of these dispersion properties.

Uploaded by

shiva
Copyright
© All Rights Reserved

Probability Theory

What is probability?
 Probability of occurrence of
◦ an event (discrete data)
◦ a value in continuous data
 Examples:
◦ Air quality
 What is the probability that today's air quality index is unhealthy?
 What is the probability that today's air quality index is 161.2?
◦ Grades
 What is the probability that I will not qualify in the exam?
 What is the probability that I will be among the first 10 in a class of 40 with marks = 75?
 Probability = n/N, where n is the number of outcomes of interest and N is the total number of possible outcomes
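The ratio n/N can be sketched for the air quality question as follows. The readings are made-up illustrative values, and the "unhealthy" band of 151 to 200 is an assumption, not something stated in the slides.

```python
# Hypothetical AQI readings for 10 days (illustrative values, not from the slides).
aqi_readings = [42, 88, 161, 155, 97, 120, 180, 60, 75, 152]

# Assumption: "unhealthy" means an index between 151 and 200 inclusive.
UNHEALTHY_LOW, UNHEALTHY_HIGH = 151, 200


def empirical_probability(values, low, high):
    """Return n/N: the share of values that fall in [low, high]."""
    n = sum(1 for v in values if low <= v <= high)
    return n / len(values)


p_unhealthy = empirical_probability(aqi_readings, UNHEALTHY_LOW, UNHEALTHY_HIGH)
print(f"P(unhealthy) = {p_unhealthy:.2f}")
```

Here n counts the days inside the band and N is the number of days observed, so the result is only an estimate based on this small sample.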
Taking the grades example
 If the mean grade of the class is 50, how likely is it that I will be among the top 10 in a class of 40 with a grade of 80?
 To answer this, you need a measure of dispersion and z-scores.
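A minimal sketch of the z-score idea for the grades question. The slides give the mean (50), the grade (80) and the class size (40) but no standard deviation, so the value of 15 below is purely an assumption, as is the choice to treat grades as normally distributed.

```python
import math


def z_score(x, mean, sd):
    """Number of standard deviations by which x lies above the mean."""
    return (x - mean) / sd


def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


class_mean = 50   # from the slide
my_grade = 80     # from the slide
class_size = 40   # from the slide
assumed_sd = 15   # hypothetical: the slides do not give a standard deviation

z = z_score(my_grade, class_mean, assumed_sd)
p_above = normal_upper_tail(z)
expected_above = class_size * p_above

print(f"z = {z:.2f}")
print(f"Share of class expected above {my_grade}: {p_above:.4f}")
print(f"Expected number of classmates above {my_grade}: {expected_above:.2f}")
```

Under these assumptions fewer than one classmate is expected to score higher, so a grade of 80 would very likely fall within the top 10. A larger standard deviation would weaken that conclusion, which is why the mean alone is not enough.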
MEASURES OF DISPERSION
What is dispersion?
Data arranged in ascending order: 1, 1, 2, 3, 3, 3, 4, 4, 5, 6, 7, 9
Average = 3.9
Is it the true measure?

What is dispersion?
[Figure: two plots of the data; horizontal axis from 0 to 12]
Dispersion
 Describes the spread of data around the central location
 The mean is a hypothetical value
 The mean is a model created to summarize our data
Deviance
 It is the difference between the observed value and the modelled value
 It can be considered as the error for each observation
 This gives information on each observation
 What if we add up the deviances of all the observations?
 Total error = sum of all the deviances = ?
Dispersion
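 Total error = sum of deviations from the mean
 Deviance = $\sum_{i=1}^{N} (x_i - \bar{x})$
 So, the total error is zero, because positive and negative deviations cancel out
 But there are errors
 Calculate the sum of squared errors
 Sum of squared errors = sum of the squared deviations
 SS = $\sum_{i=1}^{N} (x_i - \bar{x})^2$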
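The cancellation of the raw deviations, and the way squaring avoids it, can be checked on a small dataset. The five values below are hypothetical and chosen only so the arithmetic is easy to follow.

```python
# Hypothetical observations (not from the slides); their mean is exactly 5.
observations = [2, 4, 4, 6, 9]

mean = sum(observations) / len(observations)

# Deviance of each observation: observed value minus the modelled value (the mean).
deviations = [x - mean for x in observations]

# Adding the raw deviations: positive and negative errors cancel.
total_error = sum(deviations)

# Squaring first makes every error non-negative, so nothing cancels.
sum_of_squares = sum(d ** 2 for d in deviations)

print("deviations:", deviations)
print("total error:", total_error)
print("SS:", sum_of_squares)
```

For these values the deviations are -3, -1, -1, 1 and 4, so the total error is 0 while SS is 28.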
Dispersion
 Variance ($s^2$) = average of the squared errors (SS)
$s^2 = \frac{SS}{N - 1} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$
 Standard deviation ($s$) = $\sqrt{s^2}$
◦ Measures how well the mean represents the data, in the same units as the observations $x_i$
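A short sketch of variance and standard deviation using Python's standard `statistics` module. It assumes the sample convention suggested by the $s$ notation (dividing SS by N - 1) and shows the population version (dividing by N) for comparison. The data values are hypothetical.

```python
import math
import statistics

# Hypothetical observations (not from the slides).
observations = [2, 4, 4, 6, 9]
n_obs = len(observations)

mean = statistics.mean(observations)
ss = sum((x - mean) ** 2 for x in observations)  # sum of squared errors

# Sample variance: SS averaged over N - 1.
sample_variance = ss / (n_obs - 1)
sample_sd = math.sqrt(sample_variance)

# Population variance: SS averaged over N, shown only for comparison.
population_variance = ss / n_obs

# The library functions follow the same two conventions.
lib_sample_variance = statistics.variance(observations)
lib_population_variance = statistics.pvariance(observations)

print(f"SS = {ss}, s^2 = {sample_variance}, s = {sample_sd:.3f}")
print(f"population variance = {population_variance}")
```

Because the standard deviation is the square root of the variance, it is expressed in the same units as the observations, which makes it easier to compare with the mean.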
Coefficient of variation
 Standard deviation relative to the mean, also known as relative variability
 For example, a standard deviation of 2 has a different interpretation for a mean of 6 than for a mean of 60.
 Coefficient of variation is the degree of dispersion relative to the mean of the data
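The comparison in the example above can be made concrete by taking the coefficient of variation as the standard deviation divided by the mean. The helper name below is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a fraction of the mean."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a mean of 0")
    return sd / mean


# The same standard deviation of 2, judged against two different means.
cv_mean_6 = coefficient_of_variation(2, 6)
cv_mean_60 = coefficient_of_variation(2, 60)

print(f"mean 6:  CV = {cv_mean_6:.1%}")
print(f"mean 60: CV = {cv_mean_60:.1%}")
```

A spread of 2 is about a third of a mean of 6 but only about 3% of a mean of 60, so the first dataset is far more variable relative to its centre.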
Example
Data 1 Data 2
3 1
3 4
6 10
7 20
8 2
9 4
2 6
3 8
1 14
4 16

Compute the mean, median, and standard deviation for both datasets.
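One way to check the exercise is with Python's standard `statistics` module. The values are the two columns of the table; the sketch assumes the sample standard deviation (N - 1 in the denominator) is the one intended.

```python
import statistics

data_1 = [3, 3, 6, 7, 8, 9, 2, 3, 1, 4]
data_2 = [1, 4, 10, 20, 2, 4, 6, 8, 14, 16]


def summarize(data):
    """Return the mean, median and sample standard deviation of a dataset."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "sd": statistics.stdev(data),  # sample SD, an assumption
    }


summary_1 = summarize(data_1)
summary_2 = summarize(data_2)

for name, summary in (("Data 1", summary_1), ("Data 2", summary_2)):
    print(f"{name}: mean = {summary['mean']:.2f}, "
          f"median = {summary['median']:.2f}, sd = {summary['sd']:.2f}")
```

Data 2 has both the larger mean and the larger standard deviation, and in both datasets the mean lies above the median.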


Properties of frequency distribution
 Lack of symmetry (skewness)
◦ Skewed distributions are not symmetrical
◦ Frequent scores (tall bars in the histogram) are clustered at one end
 Pointiness (kurtosis)
◦ Platykurtic: flatter than a normal distribution, with lighter tails
◦ Leptokurtic: more pointed around the centre, with heavier tails than a normal distribution
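Both properties can be quantified from the central moments of the data. The sketch below uses the simple moment-based definitions (skewness as the third standardized moment, excess kurtosis as the fourth standardized moment minus 3); these are one common convention and the slides do not say which they intend. The datasets are hypothetical.

```python
def central_moment(data, k):
    """Average of (x - mean)**k over the data."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    """Third standardized moment: 0 for symmetric data, > 0 for a long right tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Fourth standardized moment minus 3, so a normal distribution scores 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]        # hypothetical, evenly spread
right_tailed = [1, 1, 2, 2, 3, 10]  # hypothetical, one large value

print(f"symmetric:    skew = {skewness(symmetric):.2f}, "
      f"excess kurtosis = {excess_kurtosis(symmetric):.2f}")
print(f"right-tailed: skew = {skewness(right_tailed):.2f}, "
      f"excess kurtosis = {excess_kurtosis(right_tailed):.2f}")
```

Negative excess kurtosis corresponds to a platykurtic shape and positive excess kurtosis to a leptokurtic one. Library routines may apply small-sample corrections and so give slightly different numbers.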
Skewness
 How does the mean compare with the median?
 Symmetrical:
◦ Mean = median
 Positively skewed:
◦ Mean > median
 Negatively skewed:
◦ Mean < median
[Figure: a positively skewed and a negatively skewed distribution]
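These relationships can be checked on three small hypothetical datasets. Extreme values pull the mean towards the tail while the median stays near the bulk of the data.

```python
import statistics

# Hypothetical datasets chosen to show each case.
symmetric = [1, 2, 3, 4, 5]
positively_skewed = [1, 2, 2, 3, 3, 4, 15]      # long tail to the right
negatively_skewed = [1, 12, 13, 13, 14, 14, 15]  # long tail to the left


def mean_and_median(data):
    """Return (mean, median) for a dataset."""
    return statistics.mean(data), statistics.median(data)


for label, data in (
    ("symmetric", symmetric),
    ("positively skewed", positively_skewed),
    ("negatively skewed", negatively_skewed),
):
    mean, median = mean_and_median(data)
    print(f"{label}: mean = {mean:.2f}, median = {median}")
```

This pattern is the usual rule of thumb for unimodal data rather than a guarantee for every possible dataset.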
Kurtosis
 What is the difference in standard deviation?
Box and Whisker plots
Measures of shape of distribution
 If the standard deviation
◦ is small, then the data are clustered around the mean
◦ is large, then the mean is a poor representation of the data
