Probability Theory
What is probability?
The probability of occurrence of
◦ an event (discrete data)
◦ a value in continuous data
Examples:
◦ Air Quality
What is the probability that today's air quality index is unhealthy?
What is the probability that today's air quality index is 161.2?
◦ Grades
What is the probability that I will not qualify in the exam?
What is the probability that I will be among the top 10 in a class of 40 with marks = 75?
Probability = n/N, where n is the number of outcomes in which the event occurs and N is the total number of outcomes
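As a minimal sketch of the n/N rule, the snippet below counts how often an event occurs in a sample and divides by the sample size. The AQI readings, the cutoff of 151 for "unhealthy", and the helper name `empirical_probability` are assumptions made for illustration; none of them come from these slides.

```python
# Hypothetical daily AQI readings (illustrative values only).
aqi_readings = [42, 88, 155, 161, 97, 172, 60, 130, 158, 45]

# ASSUMPTION: readings at or above this value count as "unhealthy".
UNHEALTHY_THRESHOLD = 151


def empirical_probability(values, is_event):
    """Return n/N: the fraction of observations for which is_event is true."""
    n = sum(1 for v in values if is_event(v))  # outcomes where the event occurs
    N = len(values)                            # total number of outcomes
    return n / N


p_unhealthy = empirical_probability(
    aqi_readings, lambda v: v >= UNHEALTHY_THRESHOLD
)
print(p_unhealthy)
```

Counting works for an event such as "unhealthy", which covers a range of values. For continuous data, the probability of one exact value such as 161.2 is zero, so such questions are usually asked about an interval instead.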
Taking the grades example:
If the mean grade of the class is 50, what is the likelihood that I will be among the top 10 in a class of 40 with a grade of 80?
To answer this, you need a measure of dispersion and z-scores.
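The grades question can be sketched with a z-score once a standard deviation is available. The slides give the mean (50) and the grade (80) but no standard deviation, so the value of 15 below is an assumption, as is the idea that grades are roughly normally distributed. `NormalDist` is from Python's standard `statistics` module.

```python
from statistics import NormalDist

class_mean = 50.0  # from the grades example
my_grade = 80.0    # from the grades example
class_sd = 15.0    # ASSUMPTION: the slides do not state a standard deviation

# z-score: how many standard deviations the grade lies above the mean.
z = (my_grade - class_mean) / class_sd

# ASSUMPTION: grades are roughly normally distributed.
# Fraction of the class expected to score above my_grade.
fraction_above = 1.0 - NormalDist(mu=class_mean, sigma=class_sd).cdf(my_grade)

# Top 10 of 40 corresponds to the top 25% of the class.
likely_in_top_10 = fraction_above < 10 / 40

print(z, fraction_above, likely_in_top_10)
```

With a different assumed spread the answer changes: a larger standard deviation lowers the z-score and puts more of the class above the same grade, which is why the mean alone cannot answer the question.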
MEASURES OF DISPERSION
What is dispersion?
Data arranged in ascending order:
1, 1, 2, 3, 3, 3, 4, 4, 5, 6, 7, 9
Average = 3.9
Is it the true measure?
What is dispersion?
[Figure: two plots of the data, each with a horizontal axis running from 0 to 12]
Dispersion
Describes the spread of the data around the central location
The mean is a hypothetical value
The mean is a model created to summarize our data
Deviance
It is the difference between the observed
value and the modelled value
It can be considered as the error for each
observation
This gives information on each
observation
What if we add up the deviances of all the observations?
Total = sum of all the deviances = ???
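Dispersion
Total error = sum of deviations from the mean
Total deviance = $\sum_{i=1}^{N} (x_i - \bar{x})$, where $x_i$ is an observation and $\bar{x}$ is the mean
So, the total error is zero: positive and negative deviations cancel
But the individual observations still have errors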
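Why the deviations from the mean always cancel can be seen in one line. Here $x_i$ denotes the i-th observation, $N$ the number of observations, and $\bar{x}$ the mean; this notation is introduced for the sketch.

```latex
\sum_{i=1}^{N} (x_i - \bar{x})
  = \sum_{i=1}^{N} x_i - N\bar{x}
  = N\bar{x} - N\bar{x}
  = 0,
\qquad \text{since } \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i .
```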
Calculate the sum of squared errors
Sum of squared errors = sum of the squared deviations
SS = $\sum_{i=1}^{N} (x_i - \bar{x})^2$
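The steps above can be sketched in a few lines: compute each deviation from the mean, confirm that the deviations sum to zero, then square them before summing. The sample values are hypothetical and chosen only so that the arithmetic is easy to follow; they are not the data from the slides.

```python
# Hypothetical sample (illustrative values only).
data = [2, 4, 4, 5, 7, 8]

mean = sum(data) / len(data)

# Deviance of each observation: observed value minus the modelled value (the mean).
deviations = [x - mean for x in data]

# Positive and negative deviations cancel, so the total error is zero.
total_error = sum(deviations)

# Squaring removes the sign, so the errors no longer cancel.
ss = sum(d ** 2 for d in deviations)

print(mean, total_error, ss)
```

For this sample the deviations are -3, -1, -1, 0, 2 and 3, so the total error is 0 while the sum of squared errors is 24.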
Dispersion
Variance ($s^2$) = average of the sum of squared errors (SS)
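The slide says "average" without stating what SS is divided by. The sketch below shows both common conventions on a hypothetical sample: dividing by the number of observations (population variance) and dividing by one less than that (sample variance, which is what the symbol s² usually denotes). Which one the slides intend is an assumption left open here. The results are cross-checked against `pvariance` and `variance` from Python's standard `statistics` module.

```python
from statistics import pvariance, variance

data = [2, 4, 4, 5, 7, 8]  # hypothetical sample (illustrative values only)

n_obs = len(data)
mean = sum(data) / n_obs
ss = sum((x - mean) ** 2 for x in data)  # sum of squared errors

population_variance = ss / n_obs      # divide by the number of observations
sample_variance = ss / (n_obs - 1)    # divide by one less than that

# Cross-check the hand-computed values against the standard library.
assert abs(population_variance - pvariance(data)) < 1e-12
assert abs(sample_variance - variance(data)) < 1e-12

print(population_variance, sample_variance)
```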
Where does the mean lie with respect to the median?
Symmetrical:
◦ Mean = median
Positively skewed:
◦ Mean > median
Negatively skewed:
◦ Mean < median
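To see how skew moves the mean relative to the median, the sketch below compares the two on three small hypothetical samples. The values are made up for illustration: one extreme value is placed on the right or on the left to create the long tail.

```python
from statistics import mean, median

# Hypothetical samples (illustrative values only).
symmetric = [1, 2, 3, 4, 5]
positively_skewed = [1, 2, 3, 4, 20]   # long tail to the right
negatively_skewed = [-14, 2, 3, 4, 5]  # long tail to the left

for name, sample in [
    ("symmetric", symmetric),
    ("positively skewed", positively_skewed),
    ("negatively skewed", negatively_skewed),
]:
    # The median stays at the middle value; the mean is pulled toward the tail.
    print(name, "mean =", mean(sample), "median =", median(sample))
```

In all three samples the median is 3, while the mean is pulled toward the extreme value. This is the usual pattern for single-peaked data rather than a strict rule.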