Statistics and Probability Notes Part 1
Outline
1. Arithmetic Mean
Let $x_1, x_2, \ldots, x_n$ be an array of $n$ measurements of a variable $X$. The arithmetic mean is denoted $\bar{x}$ and given by
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (1)$$
Suppose that these $n$ measurements can be arranged into $k$ categories, and let $f_i$, $i \in \{1, 2, \ldots, k\}$, represent the frequency in each of the $k$ categories. Then the arithmetic mean of the measurements is given by
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i, \qquad (2)$$
where $n = \sum_{i=1}^{k} f_i$ is the sample size and the $x_i$ correspond to the observed values.
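As an illustrative sketch (not part of the original notes), equations (1) and (2) can be checked against each other in Python. The ten weights are borrowed from the exercises in these notes; the function names are hypothetical.

```python
from collections import Counter

def arithmetic_mean(values):
    """Equation (1): sum of the measurements divided by n."""
    return sum(values) / len(values)

def grouped_mean(frequencies):
    """Equation (2): `frequencies` maps each distinct value x_i to its count f_i."""
    n = sum(frequencies.values())  # n is the sum of the f_i
    return sum(f * x for x, f in frequencies.items()) / n

weights = [60, 55, 40, 50, 45, 45, 50, 55, 60, 50]
frequencies = Counter(weights)  # counts how often each weight occurs

print(arithmetic_mean(weights))   # raw-data form
print(grouped_mean(frequencies))  # frequency form, same result
```

Both calls give the same mean because the frequency form only regroups the terms of the raw sum.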
2. Geometric Mean
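For a set of positive numbers $x_1, x_2, \ldots, x_n$, the geometric mean is the principal $n$th root of the product of the $n$ numbers:
$$\bar{x}_G = \sqrt[n]{\prod_{i=1}^{n} x_i}, \qquad (3)$$
and, for data grouped into $k$ categories with frequencies $f_i$,
$$\bar{x}_G = \sqrt[n]{\prod_{i=1}^{k} x_i^{f_i}}. \qquad (4)$$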
3. Harmonic Mean
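The harmonic mean of a set of data $x_1, x_2, \ldots, x_n$ is the reciprocal of the arithmetic mean of the reciprocals of the data:
$$\bar{x}_H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}, \qquad (5)$$
and, for data grouped into $k$ categories with frequencies $f_i$,
$$\bar{x}_H = \frac{n}{\sum_{i=1}^{k} \frac{f_i}{x_i}}. \qquad (6)$$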
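A minimal sketch of equations (3) and (5), assuming a small made-up data set (the values 2, 4, 8 are hypothetical and chosen so the results are easy to verify by hand). The geometric mean is computed through logarithms, which is equivalent to the $n$th root of the product for positive data.

```python
import math

def geometric_mean(values):
    """Equation (3): n-th root of the product, via the mean of the logs."""
    return math.exp(sum(math.log(x) for x in values) / len(values))

def harmonic_mean(values):
    """Equation (5): n divided by the sum of the reciprocals."""
    return len(values) / sum(1 / x for x in values)

data = [2, 4, 8]  # hypothetical positive measurements

print(geometric_mean(data))  # cube root of 2*4*8 = 64
print(harmonic_mean(data))   # 3 / (1/2 + 1/4 + 1/8)
```

For positive data the harmonic mean never exceeds the geometric mean, which in turn never exceeds the arithmetic mean.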
In summary,
4. Median
5. Mode
The mode is the value that occurs with the greatest frequency. Consequently, it only really makes sense to calculate or use it with discrete data, or with continuous data that have small grouping intervals and large sample sizes. It follows from this definition that a distribution may have more than one mode.
Measures of variation
Measures of variation describe the spread (variability) of the data values. These measures include the range, variance and standard deviation.
1. Range
Range is the difference between the largest and smallest observations. Since it depends only on two observations, the lowest and the highest, it gives a misleading idea of dispersion if these values are outliers.
$$R = \text{highest value} - \text{lowest value}$$
2. Variance
The variance is the average of the squared distances of each value from the mean. Let $x_1, x_2, \ldots, x_n$ be an array of $n$ measurements of a variable $X$. The sample variance is given by
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad (8)$$
or, equivalently,
$$S^2 = \frac{1}{n-1}\left\{\left(\sum_{i=1}^{n} x_i^2\right) - n\bar{x}^2\right\} \qquad (9)$$
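The two forms of the sample variance, (8) and (9), can be compared numerically. The sketch below uses the five weights from the exercises in these notes; the function names are illustrative assumptions.

```python
def sample_variance(values):
    """Equation (8): squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Equation (9): sum of squares minus n * mean^2, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return (sum(x * x for x in values) - n * mean ** 2) / (n - 1)

weights = [40, 45, 50, 55, 60]

print(sample_variance(weights))           # deviations form
print(sample_variance_shortcut(weights))  # shortcut form, same value
```

The shortcut form avoids computing each deviation, but in floating-point arithmetic it can lose precision when the mean is large relative to the spread, so the deviations form is usually the safer choice in code.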
3. Standard Deviation
The sample standard deviation is the square root of the sample variance and is denoted by $S = \sqrt{S^2}$.
The population standard deviation is the square root of the population variance and is denoted and calculated by $\sigma = \sqrt{\sigma^2}$.
• The variances and standard deviations can be used to determine the spread of the data. If the variance or standard deviation is large, the data are more dispersed. This information is useful in comparing two (or more) data sets to determine which is more (most) variable. If the data all lie close to the mean, the standard deviation will be small, whereas if the data are spread out over a large range of values, the standard deviation will be large. In particular, outliers increase the standard deviation.
• The measures of variance and standard deviation are used to determine the
consistency of a variable.
• The variance and standard deviation are used to determine the number of
data values that fall within a specified interval in a distribution.
• Finally, the variance and standard deviation are used quite often in inferential
statistics.
4. Coefficient of variation (CV)
It is equal to the standard deviation divided by the mean, times 100%:
$$CV = \frac{S}{\bar{x}} \times 100. \qquad (12)$$
The result is expressed as a percentage. This coefficient is used when you need to
compare standard deviations where the units are different.
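A short sketch of equation (12) using Python's standard `statistics` module. The weights come from the exercises in these notes; the heights are hypothetical values added only to show a comparison across different units.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Equation (12): sample standard deviation over the mean, in percent."""
    return stdev(values) / mean(values) * 100

weights_kg = [40, 45, 50, 55, 60]
heights_cm = [150, 160, 165, 170, 180]  # hypothetical data for comparison

print(coefficient_of_variation(weights_kg))
print(coefficient_of_variation(heights_cm))
```

Because the result is unit-free, the two percentages can be compared directly even though one data set is in kilograms and the other in centimetres.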
1. Find the mean, median, mode, range, Standard deviation and variance for the
following five weight measurements in Kg: 40, 45, 50, 55, 60
2. Find the mean, median, mode, range, Standard deviation and variance for the
following ten weight measurements in Kg: 60, 55, 40, 50, 45, 45, 50, 55, 60, 50
3. Select ten students randomly from your class and ask their age in completed
years, or their weight in Kg or their height in cm, then
a. Describe how you selected the ten students from your class
b. Present the information you collected using an appropriate graph and
frequency distribution table.
c. Find the mean, median, mode, range, Standard deviation and variance
for your data
d. State the most likely distribution for your data
e. Find the skewness and kurtosis value for your data and interpret the
result.
Assume that an experiment can be repeated many times, with each repetition called a trial, and that one or more outcomes can result from each trial. The probability of a given outcome is then the number of times that outcome occurs divided by the total number of trials. If an outcome is sure to occur, it has a probability of 1; if an outcome cannot occur, its probability is 0. In other words, the probability is equal to the number of ways of achieving success divided by the total number of possible outcomes.
Example: The probability of flipping a fair coin and getting tails is 0.50, or 50%. If a coin is flipped 10 times, there is no guarantee that exactly 5 tails will be observed; the proportion of tails can range from 0 to 1.
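The coin example can be explored with a small simulation. This is only a sketch: the function name and the seed are arbitrary choices, and the printed proportions depend on the random number generator.

```python
import random

def tail_proportion(n_flips, rng):
    """Flip a fair coin n_flips times and return the proportion of tails."""
    tails = sum(rng.random() < 0.5 for _ in range(n_flips))
    return tails / n_flips

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (10, 100, 10_000):
    # With few flips the proportion can be far from 0.5;
    # with many flips it tends to settle near 0.5.
    print(n, tail_proportion(n, rng))
```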
In the above examples, the sample spaces were found by observation and
reasoning; however, another way to find all possible outcomes of a probability
experiment is to use a tree diagram.
Use a tree diagram to find the sample space for the gender of three children in a family, as in Example 2.
Solution Since there are two possibilities (boy or girl) for the first child,
draw two branches from a starting point and label one B and the other G.
Then if the first child is a boy, there are two possibilities for the second child
(boy or girl), so draw two branches from B and label one B and the other G.
Do the same if the first child is a girl. Follow the same procedure for the third
child. The completed tree diagram is shown in Figure 1. To find the outcomes
for the sample space, trace through all the possible branches, beginning at the
starting point for each one.
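Tracing every branch of the tree is equivalent to forming all ordered combinations of B and G across the three children. A minimal sketch with `itertools.product`:

```python
from itertools import product

# Each child is B or G; repeat=3 follows the three levels of the tree.
sample_space = ["".join(path) for path in product("BG", repeat=3)]

print(sample_space)
print(len(sample_space))  # 2 * 2 * 2 branches
```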
5. An event is defined to be any subset of the sample space, and events are
usually denoted by capital letters, A, B and so forth.
An event can be one outcome or more than one outcome. For example, if a
die is rolled and a 6 shows, this result is called an outcome, since it is a result
of a single trial. An event with one outcome is called a simple event. The
event of getting an odd number when a die is rolled is called a compound
event, since it consists of three outcomes or three simple events. In general,
a compound event consists of two or more outcomes or simple events.
6. A conditional probability is the probability of one event given that another
event has occurred.
7. In probability, OR means the union, that is, either event can occur, and AND means the intersection, that is, both events must occur. Two events are mutually exclusive if they cannot occur simultaneously.
Formula used to calculate the probability of an event. The probability of any event $A$ is
$$\frac{\text{Number of outcomes in } A}{\text{Total number of outcomes in the sample space}} \qquad (13)$$
This probability is denoted by
$$P(A) = \frac{n(A)}{n(S)} \qquad (14)$$
6. For disjoint events, A ∩ B = ∅, and the addition rule takes the simple form
P (A ∪ B) = P (A) + P (B).
9. If $A \subseteq B$, then $A \cap B = A$, so $P(A|B) = \frac{P(A)}{P(B)}$.
10. The complement of an event $A$ is the set of outcomes in the sample space that are not included in the outcomes of event $A$. The complement of $A$ is denoted $A^c$; $P(A^c) = 1 - P(A)$ and $P(A) = 1 - P(A^c)$, therefore
$P(A) + P(A^c) = 1$.
The multiplication rule for probabilities when events are not independent can
be used to derive one form of an important formula called Bayes' theorem.
Since P (A ∩ B) is the same as P (B ∩ A), then
Example 3. If a family has three children, find the probability that two of the
three children are girls.
Solution:
The sample space for the gender of the children for a family that has three children has eight outcomes, that is,
$$S = \{BBB, BBG, BGB, GBB, GGG, GGB, GBG, BGG\}.$$
Since there are three ways to have two girls, namely GGB, GBG and BGG,
$$P(\text{two girls}) = \frac{3}{8}.$$
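The count in Example 3 can be reproduced by enumerating the sample space and applying $P(A) = n(A)/n(S)$. This is a sketch with illustrative variable names.

```python
from fractions import Fraction
from itertools import product

sample_space = ["".join(p) for p in product("BG", repeat=3)]
# Event A: exactly two of the three children are girls.
two_girls = [outcome for outcome in sample_space if outcome.count("G") == 2]

p_two_girls = Fraction(len(two_girls), len(sample_space))  # n(A) / n(S)
print(two_girls, p_two_girls)
```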
Example 4. On a college campus, suppose that 2600 of the 4000 undergraduate students are men, and that 800 of the 2000 undergraduates under the age of 25 are men. If one student is selected at random from this population of undergraduate students, what is the probability that the student is either a man or under the age of 25?
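One way to approach Example 4 is the general addition rule $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, with $A$ = "the student is a man" and $B$ = "the student is under 25". The sketch below assumes the counts are read as stated in the example (2000 students under 25, of whom 800 are men); the variable names are illustrative.

```python
from fractions import Fraction

total = 4000
men = 2600
under_25 = 2000
men_under_25 = 800  # counted in both groups, so subtracted once

p_man_or_under_25 = (
    Fraction(men, total)
    + Fraction(under_25, total)
    - Fraction(men_under_25, total)
)
print(p_man_or_under_25, float(p_man_or_under_25))
```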
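Solution:
Let $A$ denote the event that a rolled die shows an even number and $B$ the event that it shows a six. Here, the sample space is given by
$$S = \{1, 2, 3, 4, 5, 6\},$$
so
$$P(A) = \frac{3}{6} = \frac{1}{2} \quad \text{and} \quad P(B) = \frac{1}{6}.$$
Clearly $B \subseteq A$, so $A \cap B = B$ and
$$P(B|A) = \frac{P(B)}{P(A)} = \frac{1/6}{1/2} = \frac{1}{3} \approx 0.33.$$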
Example 5. For events $A$ and $B$, you are given that $P(A) = \frac{2}{3}$, $P(B) = \frac{2}{5}$ and $P(A \cup B) = \frac{3}{4}$. Find $P(A^c)$, $P(B^c)$, $P(A \cap B)$.
Solution
$$P(A^c) = 1 - P(A) = 1 - \frac{2}{3} = \frac{1}{3}, \qquad P(B^c) = 1 - P(B) = 1 - \frac{2}{5} = \frac{3}{5}$$
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
$$\Leftrightarrow P(A \cap B) = P(A) + P(B) - P(A \cup B)$$
$$\Leftrightarrow P(A \cap B) = \frac{2}{3} + \frac{2}{5} - \frac{3}{4} = \frac{19}{60}$$
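The arithmetic in Example 5 can be verified with exact fractions, taking $P(A) = 2/3$, $P(B) = 2/5$ and $P(A \cup B) = 3/4$ as in the solution. A minimal sketch:

```python
from fractions import Fraction

p_a = Fraction(2, 3)
p_b = Fraction(2, 5)
p_a_or_b = Fraction(3, 4)

p_not_a = 1 - p_a                # complement rule
p_not_b = 1 - p_b
p_a_and_b = p_a + p_b - p_a_or_b  # addition rule rearranged

print(p_not_a, p_not_b, p_a_and_b)
```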
Example 6. A coin is flipped and a die is rolled. Find the probability of getting a head on the coin and a 4 on the die.
Solution
$$P(\text{head and } 4) = P(\text{head}) \cdot P(4) = \frac{1}{2} \cdot \frac{1}{6} = \frac{1}{12}$$
Note that the sample space for the coin is $\{H, T\}$, and for the die it is $\{1, 2, 3, 4, 5, 6\}$.
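The product rule in Example 6 agrees with a direct count over the joint sample space of the coin and the die. A sketch, assuming all 12 coin-die pairs are equally likely:

```python
from fractions import Fraction
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]
joint_space = list(product(coin, die))  # 2 * 6 equally likely pairs

favourable = [(c, d) for c, d in joint_space if c == "H" and d == 4]
p_head_and_4 = Fraction(len(favourable), len(joint_space))
print(p_head_and_4)
```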
Example 7. A recent survey asked 100 people if they thought women in the armed forces should be permitted to participate in combat. The results of the survey are shown below.
Gender   Yes   No   Total
Male     32    18   50
Female   8     42   50
Total    40    60   100
Find the probability that
a. the respondent answered yes, given that the respondent was a female;
b. the respondent was a male, given that the respondent answered no.
Solution:
Let
M: respondent was a male; Y: respondent answered yes
F: respondent was a female; N: respondent answered no
a. The problem is to find $P(Y|F)$. The rule states
$$P(Y|F) = \frac{P(F \text{ and } Y)}{P(F)}$$
The probability $P(F \text{ and } Y)$ is the number of females who responded yes, divided by the total number of respondents:
$$P(F \text{ and } Y) = \frac{8}{100}$$
The probability $P(F)$ is the probability of selecting a female:
$$P(F) = \frac{50}{100}$$
Then
$$P(Y|F) = \frac{P(F \text{ and } Y)}{P(F)} = \frac{8}{100} \Big/ \frac{50}{100} = \frac{4}{25}$$
b. The problem is to find $P(M|N)$:
$$P(M|N) = \frac{P(M \text{ and } N)}{P(N)} = \frac{18}{100} \Big/ \frac{60}{100} = \frac{3}{10}$$
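Both conditional probabilities in Example 7 can be read straight off the survey table. The sketch below stores the four cell counts in a dictionary (a layout chosen here for illustration) and applies $P(X|Y) = P(X \text{ and } Y) / P(Y)$.

```python
from fractions import Fraction

# (gender, answer) -> number of respondents, from the survey table
counts = {("M", "Y"): 32, ("M", "N"): 18, ("F", "Y"): 8, ("F", "N"): 42}
total = sum(counts.values())

p_f = Fraction(counts[("F", "Y")] + counts[("F", "N")], total)  # P(F)
p_n = Fraction(counts[("M", "N")] + counts[("F", "N")], total)  # P(N)

p_yes_given_female = Fraction(counts[("F", "Y")], total) / p_f
p_male_given_no = Fraction(counts[("M", "N")], total) / p_n

print(p_yes_given_female, p_male_given_no)
```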
3. A discrete random variable can take only a finite or countable number of values, e.g. sex, parity, race, etc.
4. A continuous random variable can take any value within a specified interval, e.g. blood pressure, weight, etc.
Solution
Since the sample space is $S = \{1, 2, 3, 4, 5, 6\}$ and each outcome has a probability of $\frac{1}{6}$, the distribution is as shown.
Outcome X          1     2     3     4     5     6
Probability P(X)   1/6   1/6   1/6   1/6   1/6   1/6
Example 2:
Construct a probability distribution for tossing a coin three times, where $X$ is the random variable for the number of heads.
Solution
The sample space for tossing a coin three times is given by
$$S = \{TTT, TTH, THT, HTT, HHT, HTH, THH, HHH\}.$$
Hence, the probability of getting no heads is $\frac{1}{8}$, one head is $\frac{3}{8}$, two heads is $\frac{3}{8}$, and three heads is $\frac{1}{8}$. From these values, a probability distribution can be constructed by listing the outcomes and assigning the probability of each outcome, as shown here.
Number of heads X   0     1     2     3
Probability P(X)    1/8   3/8   3/8   1/8
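The distribution of the number of heads can be built by counting heads in each of the eight equally likely outcomes. A minimal sketch:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))  # the 8 outcomes of three tosses
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = (number of outcomes with x heads) / (total outcomes)
distribution = {
    x: Fraction(count, len(outcomes)) for x, count in sorted(heads_counts.items())
}
print(distribution)
```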
Example 3: Consider the data below, which show the frequency distribution of the number of babies that women have had. Let $X$ be the discrete random variable representing the number of children a woman has had: if the study child was the first child then $X = 1$, if he/she was the second then $X = 2$, and so on. Table 1 shows the probability distribution of $X$, i.e. $P(X)$, which is the proportion of women with 1, 2, ... children (frequency distribution). We observe all possible outcomes of $X$, so the probabilities add up to 1 (exhaustive trial).
Suppose we want to know the probability that a particular mother's child is her 3rd child; from the table, $P(X = 3) = 0.14$.
Question
What is the probability that a chosen mother's child is her 4th or 5th?
• The sum of the probabilities of all the events in the sample space must equal 1; that is, $\sum_{i=1}^{n} P(X = x_i) = 1$.
• The probability of each event in the sample space must be between or equal to 0 and 1. That is,
$$0 \le P(X) \le 1. \qquad (20)$$
Example
Determine whether each distribution is a probability distribution.
1.
X      0     5     10    15    25
P(X)   1/5   1/5   1/5   1/5   1/5
2.
X      0     2     4     6
P(X)   −1    1.5   0.3   0.2
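The two requirements (each probability between 0 and 1, and all probabilities summing to 1) can be checked mechanically. The function name and tolerance below are illustrative assumptions.

```python
import math
from fractions import Fraction

def is_probability_distribution(probs, tol=1e-9):
    """Return True only if every value is in [0, 1] and the values sum to 1."""
    in_range = all(0 <= p <= 1 for p in probs)
    sums_to_one = math.isclose(float(sum(probs)), 1.0, abs_tol=tol)
    return in_range and sums_to_one

dist_1 = [Fraction(1, 5)] * 5      # first table
dist_2 = [-1, 1.5, 0.3, 0.2]       # second table

print(is_probability_distribution(dist_1))
print(is_probability_distribution(dist_2))
```

The second table is a useful reminder that both conditions matter: its values do add up to 1, but a probability cannot be negative or greater than 1.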
Note
If the frequency groups or categories become numerous, it will be difficult to use a frequency distribution as in the above example. Instead, one appeals to known theoretical probability distributions. Most measurements in real life take the form of known theoretical distributions.
Example: Weight and Age have roughly normal distributions
Example:
Suppose that the error in the reaction temperature, in degrees Celsius, for a controlled laboratory experiment is a continuous random variable $X$ having the probability density function
$$f(x) = \begin{cases} \dfrac{x^2}{3}, & -1 < x < 2 \\ 0, & \text{elsewhere} \end{cases}$$
a. Obviously, $f(x) \ge 0$.
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-1}^{2} \frac{x^2}{3}\,dx = 1.$$
Therefore, $f(x)$ is a density function.
b.
$$P(0 \le X \le 1) = \int_{0}^{1} \frac{x^2}{3}\,dx = \frac{1}{9}$$
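Both integrals can be approximated numerically as a sanity check. The sketch below uses a simple midpoint rule written for this illustration (it is not a library routine), with a step count chosen arbitrarily.

```python
def pdf(x):
    """Density from the example: x^2 / 3 on (-1, 2), zero elsewhere."""
    return x * x / 3 if -1 < x < 2 else 0.0

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

total_area = integrate(pdf, -1, 2)   # should be close to 1
p_0_to_1 = integrate(pdf, 0, 1)      # should be close to 1/9

print(total_area, p_0_to_1)
```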
10.3.3 Expectation and Variance
Let X be a random variable with probability distribution f (x).
1. If X is discrete random variable, then
Second, Variance of X
Solution:
4. $\operatorname{Var}(a) = 0$
5. $\operatorname{Var}(aX) = a^2 \operatorname{Var}(X)$
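Properties 4 and 5 can be checked on a fair die using the standard definitions $E(X) = \sum x\,P(x)$ and $\operatorname{Var}(X) = \sum (x - \mu)^2 P(x)$ for a discrete random variable. The constant $a = 3$ and the helper names are arbitrary choices for this sketch.

```python
from fractions import Fraction

def expectation(dist):
    """E(X) for a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = sum of (x - mu)^2 * P(x)."""
    mu = expectation(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}
a = 3  # arbitrary constant
scaled_die = {a * x: p for x, p in die.items()}  # distribution of aX
constant = {a: Fraction(1)}                       # a "random" variable that is always a

print(variance(constant))                        # property 4
print(variance(scaled_die), a**2 * variance(die))  # property 5, both sides
```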
1. Bernoulli distribution
P (X = 1) = p (21)
P (X = 0) = 1 − p (22)
Many types of probability problems have only two outcomes or can be reduced to two outcomes. For example, when a coin is tossed, it can land heads or tails. When a baby is born, it will be either male or female. In a basketball game, a team either wins or loses. A true/false item can be answered in only two ways, true or false. Other situations can be reduced to two outcomes. For example, a medical treatment can be classified as effective or ineffective, depending on the results. A person can be classified as having normal or abnormal blood pressure, depending on the measure of the blood pressure gauge. A multiple-choice question, even though there are four or five answer choices, can be classified as correct or incorrect. Situations like these are called binomial experiments.
A binomial experiment and its results give rise to a special probability distribution called the binomial distribution. The binomial distribution is used when there are only two outcomes for an experiment, there is a fixed number of trials, $n$, the probability of success is the same for each trial, and the outcomes are independent of one another. Its mean and variance are
$$\mu = np \qquad (24)$$
$$\sigma^2 = npq \qquad (25)$$
where $p$ is the probability of success on a single trial and $q = 1 - p$.
Examples
1. The probability that a patient recovers from a rare blood disease is 0.4. If 15
people are known to have contracted this disease, what is the probability that
a. at least 10 survive,
b. from 3 to 8 survive, and
c. exactly 5 survive?
Solution
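a. $n = 15$ and $p = 0.4$. "At least 10 survive" means that 10, 11, 12, 13, 14 or 15 survive. That is,
$$P(X = 10) = \frac{15!}{10!\,(15-10)!}\,0.4^{10} \times 0.6^{15-10} = 0.024.$$
$$P(X = 11) = \frac{15!}{11!\,(15-11)!}\,0.4^{11} \times 0.6^{4} = 0.007.$$
$$P(X = 12) = \frac{15!}{12!\,3!}\,0.4^{12} \times 0.6^{3} = 0.002.$$
$$P(X = 13) = \frac{15!}{13!\,2!}\,0.4^{13} \times 0.6^{2} \approx 0.$$
$$P(X = 14) = \frac{15!}{14!\,1!}\,0.4^{14} \times 0.6^{1} \approx 0.$$
$$P(X = 15) = \frac{15!}{15!\,0!}\,0.4^{15} \times 0.6^{0} \approx 0.$$
Therefore, the probability that at least 10 survive is the sum of these six probabilities, approximately 0.0338.
b. The probability that from 3 to 8 survive is
$$P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6) + P(X = 7) + P(X = 8) = 0.8779.$$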
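The three parts of this example can be computed with the binomial probability formula $P(X = x) = \frac{n!}{x!\,(n-x)!}\,p^x q^{n-x}$, which is the form used in the hand calculations above. The function name is an illustrative choice; `math.comb` supplies the binomial coefficient.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 15, 0.4

at_least_10 = sum(binomial_pmf(x, n, p) for x in range(10, n + 1))  # part a
from_3_to_8 = sum(binomial_pmf(x, n, p) for x in range(3, 9))       # part b
exactly_5 = binomial_pmf(5, n, p)                                   # part c

print(round(at_least_10, 4), round(from_3_to_8, 4), round(exactly_5, 4))
```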
2. A coin is tossed 3 times. Find the probability of getting exactly two heads.
Solution
Here $n = 3$, $x = 2$, and the probability of a success (heads) is $\frac{1}{2}$ in each case.
$$P(X = 2) = \frac{3!}{2!\,1!}\,0.5^{2} \times 0.5^{1} = 0.375.$$
3. A die is rolled 360 times. Find the mean, variance, and standard deviation of the number of 4s that will be rolled.
Solution
This is a binomial experiment since getting a 4 is a success and not getting a 4 is considered a failure. Hence $n = 360$, $p = \frac{1}{6}$, $q = 1 - \frac{1}{6} = \frac{5}{6}$,
$$\mu = np = 360 \times \frac{1}{6} = 60$$
$$\sigma^2 = npq = 360 \times \frac{1}{6} \times \frac{5}{6} = 50$$
$$\sigma = \sqrt{\sigma^2} = 7.07$$
4. The Statistical Bulletin published by Metropolitan Life Insurance Co. reported that 2% of all American births result in twins. If a random sample of 8000 births is taken, find the mean, variance, and standard deviation of the number of births that would result in twins. (Answer: $\mu = 160$, $\sigma^2 = 156.8$, $\sigma = 12.5$)
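The die and twins examples both apply $\mu = np$ and $\sigma^2 = npq$. A small helper (its name is an assumption made for this sketch) reproduces both sets of answers.

```python
from math import sqrt

def binomial_moments(n, p):
    """Return (mean, variance, standard deviation) of a binomial(n, p) variable."""
    q = 1 - p
    mean = n * p
    variance = n * p * q
    return mean, variance, sqrt(variance)

die_mean, die_var, die_sd = binomial_moments(360, 1 / 6)      # number of 4s
twin_mean, twin_var, twin_sd = binomial_moments(8000, 0.02)   # twin births

print(round(die_mean, 2), round(die_var, 2), round(die_sd, 2))
print(round(twin_mean, 2), round(twin_var, 2), round(twin_sd, 2))
```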
3. Poisson distribution
A discrete probability distribution that is useful when the number of trials, $n$, is large and the probability of success, $p$, is small, and when the independent variables occur over a period of time, is called the Poisson distribution. It gives the probability that an outcome occurs a specified number of times. The probability of $X$ occurrences in an interval of time, volume, area, etc., for a variable where $\lambda$ (Greek letter lambda) is the mean number of occurrences per unit (time, volume, area, etc.) is
$$P(X = x, \lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad \text{where } x = 0, 1, 2, \ldots \qquad (26)$$
The letter $e$ is a constant approximately equal to 2.7183. The mean and the variance of the Poisson distribution are the same and are given by
$$\mu = \sigma^2 = \lambda = np \qquad (27)$$
Examples
$$P(X = 6, 4) = \frac{e^{-4}\,4^{6}}{6!} = 0.1042$$
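The value above follows directly from equation (26) with $x = 6$ and $\lambda = 4$. A minimal sketch, with a function name chosen for illustration:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Equation (26): probability of exactly x occurrences when the mean is lam."""
    return exp(-lam) * lam**x / factorial(x)

p_six = poisson_pmf(6, 4)
print(round(p_six, 4))
```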
Figure 3: Using Poisson table
3. Ten is the average number of oil tankers arriving each day at a certain port.
The facilities at the port can handle at most 15 tankers per day. What is the
probability that on a given day tankers have to be turned away? ( answer:
0.0487)
4. Suppose we are interested in the number of people who visit the clinic in city "X" in a given year, out of a total population of, say, 5,000,000, and let the probability that someone in the city visits the clinic be 0.00001. The mean number of visitors is then $np = 5000000 \times 0.00001 = 50$, which is also the variance. For this example calculate:
1. The probability that no one in this population visits the clinic in a given year
2. The probability that fewer than 5 people visit the clinic in a given year
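The oil tanker and clinic exercises can both be approached by summing Poisson probabilities instead of reading a table. In the sketch below, tankers are turned away when more than 15 arrive ($\lambda = 10$), and the clinic questions use $\lambda = 50$; the helper names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x), summing the individual probabilities from 0 to x."""
    return sum(poisson_pmf(i, lam) for i in range(x + 1))

# Tankers: turned away means more than 15 arrive in a day.
p_turned_away = 1 - poisson_cdf(15, 10)

# Clinic: no visitors, and fewer than 5 visitors (that is, at most 4).
p_no_visits = poisson_pmf(0, 50)
p_fewer_than_5 = poisson_cdf(4, 50)

print(round(p_turned_away, 4))
print(p_no_visits, p_fewer_than_5)
```

With a mean of 50 visitors, both clinic probabilities are extremely small, so they are easier to read in scientific notation than as rounded decimals.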
(a) It can take on any value (not just integers, as do the binomial and Poisson
distribution)
(b) A normal distribution curve is bell-shaped.
(c) The mean, median, and mode are equal and are located at the center of
the distribution.
(d) A normal distribution curve is unimodal (i.e., it has only one mode).
(e) The curve is symmetric about the mean, which is equivalent to saying
that its shape is the same on both sides of a vertical line passing through
the center.
Figure 4: Normal curve
(f) The curve is continuous; that is, there are no gaps or holes. For each
value of X, there is a corresponding value of Y.
(g) The curve never touches the x axis. Theoretically, no matter how far in
either direction the curve extends, it never meets the x axis—but it gets
increasingly closer.
(h) The total area under a normal distribution curve is equal to 1.00, or
100%. This fact may seem unusual, since the curve never touches the x
axis, but one can prove it mathematically by using calculus. (The proof
is beyond the scope of this textbook.)
(i) The area under the part of a normal distribution curve that lies within 1 standard deviation of the mean is approximately 0.68, or 68%; within 2 standard deviations, about 0.95, or 95%; and within 3 standard deviations, about 0.997, or 99.7%. See Figure 5, which also shows the area in each region.
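The areas quoted in property (i) can be reproduced with `statistics.NormalDist` from the Python standard library. Because the areas depend only on the number of standard deviations from the mean, the sketch uses the standard normal curve.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```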
Standard normal distribution
The Standard Normal distribution follows a normal distribution and has mean
0 and standard deviation 1.
Solution
(a) The area in Figure 7(a) to the right of $z = 1.84$ is equal to 1 minus the area in the table of the standard normal distribution to the left of $z = 1.84$, namely, $1 - 0.9671 = 0.0329$. Alternatively, this sub-question can be written as
$$P(Z > 1.84) = 1 - P(Z \le 1.84) = 1 - 0.9671 = 0.0329.$$
(b) The area in Figure 7(b) between z = −1.97 and z = 0.86 is equal to
the area to the left of z = 0.86 minus the area to the left of z = −1.97.
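Both areas can be computed from the standard normal cumulative distribution function rather than a printed table. A sketch using `statistics.NormalDist`; small differences from table values are due to rounding in the table.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# (a) area to the right of z = 1.84
area_right = 1 - z.cdf(1.84)

# (b) area between z = -1.97 and z = 0.86
area_between = z.cdf(0.86) - z.cdf(-1.97)

print(round(area_right, 4), round(area_between, 4))
```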
Figure 7: Area under the curves for example 1.
Solution
(a) In Figure 8(a), we see that the k value leaving an area of 0.3015 to
the right must then leave an area of 0.6985 to the left. From Table of
standard normal distribution it follows that k = 0.52. Alternatively,
P (Z > k) = 1 − P (Z ≤ k)
⇒ P (Z ≤ k) = 1 − P (Z > k) = 1 − 0.3015 = 0.6985
(b) From the table of the standard normal distribution, we note that the total area to the left of $-0.18$ is equal to 0.4286. In Figure 8(b), we see that the area between $k$ and $-0.18$ is 0.4197, so the area to the left of $k$ must be $0.4286 - 0.4197 = 0.0089$. Hence, from the table of the standard normal distribution, we have $k = -2.37$.
Alternatively,
$$P(Z \le -0.18) - P(Z \le k) = 0.4197$$
$$\Rightarrow P(Z \le k) = P(Z \le -0.18) - 0.4197 = 0.4286 - 0.4197 = 0.0089$$
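Finding $k$ from a given area is the inverse of a table lookup, which `NormalDist.inv_cdf` performs directly. The sketch below recovers both values of $k$ from the left-tail areas 0.6985 and 0.0089; note that an area below 0.5 to the left of $k$ means $k$ is negative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal

# (a) area to the left of k is 1 - 0.3015 = 0.6985
k_a = z.inv_cdf(1 - 0.3015)

# (b) area to the left of k is P(Z <= -0.18) - 0.4197
k_b = z.inv_cdf(z.cdf(-0.18) - 0.4197)

print(round(k_a, 2), round(k_b, 2))
```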
$P(-0.5 < Z < 1.2)$ is shown by the area of the shaded region in Figure 9. This area may be found by subtracting the area to the left of the ordinate $z = -0.5$ from the entire area to the left of $z = 1.2$. Using the standard normal distribution table, we have
$$P(45 < X < 62) = P(-0.5 < Z < 1.2) = P(Z < 1.2) - P(Z < -0.5)$$
$$\Rightarrow P(45 < X < 62) = 0.8849 - 0.3085 = 0.5764.$$
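The subtraction of table areas can be checked numerically. The mean and standard deviation of $X$ are not stated in this part of the notes, so the values $\mu = 50$ and $\sigma = 10$ below are an assumption, chosen because they map 45 and 62 to $z = -0.5$ and $z = 1.2$.

```python
from statistics import NormalDist

z = NormalDist()
p_from_z = z.cdf(1.2) - z.cdf(-0.5)  # P(-0.5 < Z < 1.2)

# Assumed parameters: (45 - 50) / 10 = -0.5 and (62 - 50) / 10 = 1.2
x = NormalDist(mu=50, sigma=10)
p_from_x = x.cdf(62) - x.cdf(45)     # P(45 < X < 62)

print(round(p_from_z, 4), round(p_from_x, 4))
```

Working on the original scale or on the standardized scale gives the same probability, which is the reason for converting to $Z$ before using the table.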