Stat Int Note1
Objectives:
Remark: statistics (plural) are aggregates of facts. Single and isolated figures are not statistics, as they cannot be
compared and are unrelated.
Secondly, statistics as a singular noun refers to the subject that deals with the methods of collecting, organizing, presenting,
analyzing and interpreting statistical data.
Classification of Statistics
Statistics may be divided into two main branches:
1) Descriptive Statistics
2) Inferential Statistics
Descriptive statistics:
Includes statistical methods involving the collection, presentation, and characterization of a set of data in
order to describe the various features of the data.
Methods of descriptive statistics include graphic methods (bar chart, pie chart, etc.) and numeric
measures (mean, median, variance, etc.).
Descriptive statistics do not allow us to make conclusions beyond the data we have analyzed.
Meaningful and pertinent information cannot be realized from raw data unless summarized by the tools
of descriptive statistics.
Descriptive statistics, therefore, allow us to present the data in a more meaningful way, which makes
the data easier to interpret.
Inferential statistics:
Includes statistical methods which facilitate estimating the characteristics of a population or making
decisions concerning a population on the basis of sample results.
In this regard, methods like estimation and hypothesis testing are examples of inferential statistics.
For example, a biologist collected blood samples from 10 students of the biology department to study blood types.
Accordingly, the following data were obtained:
O A O AB A O O B A O
A summary measure such as the proportion of students with blood type O in the sample (50%) is an
example of descriptive statistics. We can also describe the data using bar or pie charts.
However, if he/she wants information on the proportion of students with blood type O in the entire class,
he/she may use the sample proportion (50%) as an estimate of the corresponding value for the entire class. This
is an example of inferential statistics.
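The distinction can be sketched in a few lines of Python. This is only an illustration using the ten blood types listed above; the variable names are our own choice.

```python
from collections import Counter

# Blood types of the 10 sampled students (data from the example above).
sample = "O A O AB A O O B A O".split()

# Descriptive step: summarize the sample itself.
counts = Counter(sample)
proportion_o = counts["O"] / len(sample)

# Inferential step: the sample proportion is used as a point estimate
# of the unknown proportion of type O in the entire class.
estimate_for_class = proportion_o

print(dict(counts))
print(f"Sample proportion of type O: {proportion_o:.0%}")
```

The computation is the same in both steps; what changes is the claim being made, about the sample only (descriptive) or about the whole class (inferential).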
1.2 Stages in statistical investigation
A statistical study might involve the following stages: collection of data, organizing and presenting the
collected data, analyzing and interpreting the result.
Stage 1: DATA COLLECTION: this stage involves acquiring data related with the problem at hand.
Stage 2: ORGANIZING AND PRESENTING DATA: this stage involves classifying or sorting the
collected data based on some characteristics or attributes such as age, sex, marital status, etc. Further, we may
use tables, graphs, charts and so on to present the data.
Stage 3: DATA ANALYSIS: a systematic analysis of the data is necessary in order to reach conclusions or
provide answers to a problem. The analysis might require simple or sophisticated statistical tools depending on
the type of answers that may have to be provided.
Stage 4: INTERPRETATION OF THE RESULT: logically a statistical analysis has to be followed by
conclusions in order to be able to make a decision. The technical terminology used to describe this last process
of a statistical study is referred to as interpretation.
Ordinal scale:
This measurement scale is similar to the nominal scale but the levels or categories can be ranked or
ordered.
That is, we can compare levels or categories of the scale.
Therefore, this scale of measurement gives better information on the quantities being measured as
compared to the nominal scale. For example, the living standard of a family can be poor, medium or high.
These categories can be ordered, as poor is less than medium and medium is less than high.
However, the distance or magnitude between the levels, say between poor and medium, is not clearly
known.
Interval scale:
This measurement scale shares the ordering or ranking and labeling properties of ordinal scale of
measurement. Besides, the distance or magnitude between two values is clearly known (meaningful).
However, it lacks a true zero point (i.e., zero point is not meaningful). For example, temperature in
degrees centigrade or Fahrenheit of an object. If the temperature of an object is zero degrees centigrade, it
doesn’t mean that the object lacks heat. Hence zero is an arbitrary point on the scale. It doesn’t make sense
to say that 80° F is twice as hot as 40° F; in centigrade the ratio would be 6; neither ratio is meaningful.
We can do subtraction and addition on interval level data, but multiplication and division do not give meaningful results.
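A small Python check makes the point about ratios concrete. It is a sketch only; the helper name `fahrenheit_to_celsius` is our own.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees centigrade."""
    return (f - 32) * 5 / 9

hot_f, cool_f = 80.0, 40.0
hot_c = fahrenheit_to_celsius(hot_f)
cool_c = fahrenheit_to_celsius(cool_f)

# The "ratio" of the same two temperatures depends on the unit used,
# so it carries no physical meaning on an interval scale.
ratio_f = hot_f / cool_f
ratio_c = hot_c / cool_c

# Differences, on the other hand, convert consistently between the two units.
diff_f = hot_f - cool_f
diff_c = hot_c - cool_c

print(ratio_f, round(ratio_c, 2))
print(diff_f, round(diff_c, 2))
```

The ratio changes from 2 to 6 when the unit changes, whereas the difference in Fahrenheit is always 9/5 of the difference in centigrade.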
Ratio scale:
It is the highest level of measurement scale.
It shares the ordering, labeling and meaningful distance properties of interval scale.
In addition, it has a true or meaningful zero point. The existence of a true zero makes the ratio of two
measures meaningful. For instance, if your salary is 1000 birr and your wife’s is 2000 birr, we can say that
your wife earns twice as much as you. If you don’t have any source of income, your income is zero on this scale
and it is a meaningful assignment. Other examples include weight, height, etc.
We can do subtraction, addition, multiplication and division on ratio level data.
The most precise variable is the ratio variable and the least precise is the nominal variable. Ratio and interval level
data are classified as quantitative variables, and nominal and ordinal level data are classified as qualitative
variables.
UNIT TWO: METHODS OF DATA COLLECTION AND PRESENTATION
Objectives:
After completing this unit you should be able to
organize data using frequency distribution.
present data using suitable graphs or diagrams.
2.1 Methods of data collection
Depending on the source, data can be primary or secondary.
Primary data refers to the statistical data which the investigator originates for the purpose of inquiry.
Secondary data refers to data which is not originated by the investigator himself, but which he obtains from
someone else’s records. Secondary data can be obtained from published or unpublished documents: reports,
journals, magazines, articles, etc.
Primary methods of data collection: These include data collection using observation, personal interview, self-administered
questionnaire, mailed questionnaire, etc.
3. Higher ordered tables: results when we have more than two characteristics of classification. For instance,
we can classify the students who took introduction to statistics in 2014 by age, gender and faculty.
For this example, the first class is ‘31-40’. Lower limit of this class = 31; upper limit = 40. The lower class
boundary = 30.5; upper class boundary = 40.5. The width of the class = upper class boundary - lower class
boundary = 40.5-30.5 = 10. The class mark (class mid-point) of this class is (31+40)/2 = 35.5. The values 36,
39, 38 are included in this class. Therefore, the frequency of this class is 3.
Guidelines for constructing a frequency distribution
Find the range of the data
Range =R=Maximum value – Minimum value
Determine the number of classes.
Let K be the number of classes and n be the number of observations to be classified. There are two
alternatives to determine K.
1. Choose K to be between 5-15;
2. Use the following formula, known as Sturges' formula, given by:
$K = 1 + 3.322 \log_{10} n$
and round K to the nearest integer.
Determine the class width, W, by using
W = Range/K
Tally the observations, count and assign frequencies to the classes.
Example 2.4: The following data are on the number of minutes to travel from home to work for a group of
automobile workers: 28 25 48 37 41 19 32 26 16 23 23 29 36 31 26 21 32 25 31 43 35 42 38 33
28. Construct a frequency distribution for this data.
Solution:
Range = 48 – 16 = 32
$K = 1 + 3.322 \log_{10} 25 = 5.64 \approx 6$
W = 32/6 = 5.33, rounded up to the nearest integer, i.e. W = 6.
Let the lower limit of the first class be 16 then the frequency distribution is as follows:
Class limit   Class boundaries   Tally        Frequency
16-21         15.5-21.5          \\\          3
22-27         21.5-27.5          \\\\\ \      6
28-33         27.5-33.5          \\\\\ \\\    8
34-39         33.5-39.5          \\\\         4
40-45         39.5-45.5          \\\          3
46-51         45.5-51.5          \            1
Total                                         25
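The construction steps of Example 2.4 can be sketched in Python as follows. This is a minimal illustration that assumes integer data, takes the smallest observation as the lower limit of the first class, and rounds the class width up; the variable names are our own.

```python
import math

# Travel times (in minutes) from Example 2.4.
times = [28, 25, 48, 37, 41, 19, 32, 26, 16, 23, 23, 29, 36,
         31, 26, 21, 32, 25, 31, 43, 35, 42, 38, 33, 28]

n = len(times)
data_range = max(times) - min(times)      # maximum value - minimum value
k = round(1 + 3.322 * math.log10(n))      # Sturges' formula, rounded to an integer
width = math.ceil(data_range / k)         # class width, rounded up

lower = min(times)                        # lower limit of the first class
table = []
for i in range(k):
    lo = lower + i * width                # lower class limit
    hi = lo + width - 1                   # upper class limit (integer data)
    freq = sum(lo <= t <= hi for t in times)
    table.append((lo, hi, lo - 0.5, hi + 0.5, freq))

for lo, hi, lb, ub, freq in table:
    print(f"{lo}-{hi}   {lb}-{ub}   {freq}")
```

Each row holds the class limits, the class boundaries (limits shifted by 0.5) and the class frequency.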
The final frequency distribution is shown in table below.
Definition 2.1: A relative frequency distribution is a distribution which specifies the frequency of a class
relative to the total frequency.
Example 2.5: Convert the above absolute frequency distribution in example 2.4 to a relative frequency
distribution.
Solution: First we find the relative frequency of each class. The relative frequency of a class is the frequency of
the class divided by the total number of observations. For instance the relative frequency of the first class is
3/25=0.12, the relative frequency of the second class is 6/25=0.24, and so on. Thus, the relative frequency
distribution is shown in the table below.
Definition 2.2: Cumulative frequency refers to the number of observations that are below a specified value or
that are above a specified value.
Note: Class boundaries are mostly used to obtain cumulative frequencies. Based on whether the observations
are bounded from above or from below, we can have a cumulative less than or a cumulative more than
frequency distributions, respectively.
Example 2.6: Convert the absolute frequency distribution in example 2.4 into:
i) a cumulative less than frequency distribution.
ii) a cumulative more than frequency distribution.
Solution:
i) We use the class boundaries to form cumulative frequencies. For instance, there is no observation
which is less than 15.5, 3 observations are less than 21.5, 9 observations are less than 27.5 and so on.
Thus, the following less than cumulative frequency distribution is obtained.
Table: More than cumulative frequency distribution of times
Time (in minute) Cumulative frequency
More than 15.5 25
More than 21.5 22
More than 27.5 16
More than 33.5 8
More than 39.5 4
More than 45.5 1
More than 51.5 0
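Both cumulative distributions of Example 2.6 can be obtained from the class frequencies with a running sum. The sketch below uses the class boundaries and frequencies of Example 2.4; the variable names are our own.

```python
from itertools import accumulate

# Class boundaries and class frequencies from Example 2.4.
boundaries = [15.5, 21.5, 27.5, 33.5, 39.5, 45.5, 51.5]
frequencies = [3, 6, 8, 4, 3, 1]
total = sum(frequencies)

# Number of observations below each boundary (running sum, starting at 0).
less_than = [0] + list(accumulate(frequencies))
# Number of observations above each boundary.
more_than = [total - c for c in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"Less than {b}: {lt:2d}    More than {b}: {mt:2d}")
```

At every boundary the two cumulative frequencies add up to the total number of observations.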
Note: If class limits are used instead of class boundaries the phrases ‘or more’ and ‘or less’ are used in place of
‘more than’ and ‘less than’, respectively to obtain cumulative frequencies.
Ungrouped frequency distributions (Single-value grouping)
In the previous examples each class that we used for grouping data represented a range of possible values. In
some cases, however, using classes that each represents a single value is more appropriate.
Example 2.7: A demographer is interested in the number of children a family may have. He took a random
sample of 30 families. The following data is the number of children in a sample of 30 families.
4 2 4 3 2 8 3 4 4 2 2 8 5 3 4
4 5 4 3 5 2 7 3 3 6 7 3 8 4 5
To group these data, we will use classes based on the single numerical value.
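As an illustration, single-value grouping of the data in Example 2.7 can be done by counting how often each value occurs. The sketch below uses Python's `collections.Counter`; the variable names are our own.

```python
from collections import Counter

# Number of children in the 30 sampled families (Example 2.7).
children = [4, 2, 4, 3, 2, 8, 3, 4, 4, 2, 2, 8, 5, 3, 4,
            4, 5, 4, 3, 5, 2, 7, 3, 3, 6, 7, 3, 8, 4, 5]

# Each distinct value forms its own class.
counts = Counter(children)

print("Number of children   Frequency")
for value in sorted(counts):
    print(f"{value:>18}   {counts[value]:>9}")
print(f"{'Total':>18}   {sum(counts.values()):>9}")
```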
The classes for grouping are ‘Democratic’, ‘Republican’ and ‘Other’.
Frequency polygon: is a graphic form of a frequency distribution. It can be constructed by plotting the class
frequencies against class marks and joining them by a set of line segments.
Note: we should add two classes with zero frequencies at the two ends of the frequency distribution to complete
the polygon.
Example 2.10: Construct a frequency polygon for the frequency distribution of the time spent by the
automobile workers that we have seen in example 2.4.
Figure: Distribution of number of minutes spent by the automobile workers
Example 2.11: Draw a bar chart for the following coffee production data.
Table: Coffee productions from 1990 to 1995.
[Figure: bar chart of coffee production by production year, 1990 to 1995; vertical axis from 0 to 120]
Pie-chart: it is a circle divided by radial lines into sections or sectors so that the area of each sector is
proportional to the size of the figure represented.
Pie-chart construction:
Calculate the percentage frequency of each component. It is $\frac{\text{frequency of the component}}{\text{total frequency}} \times 100$.
Example 2.14: The following data are the blood types of 50 volunteers at a blood plasma donation clinic:
O A O AB A A O O B A O A AB B O O O A B A A O A A B O B A O AB A O O
A B A A A O B O O A O A B O AB A O
a) Organize this data using a categorical frequency distribution
b) Present the data using both a pie and a bar chart.
Solution
a) The classes of the frequency distribution are A, B, O, AB. Count the number of donors for each of the
blood types.
Blood type   Frequency   Percent
A            19          38.0
B            8           16.0
O            19          38.0
AB           4           8.0
Total        50          100.0
b) Pie chart
Find the percentage of donors for each blood type. In order to find the angle of the sector for each blood
type, multiply the corresponding percentage by 360° and divide by 100.
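The percentages and sector angles for Example 2.14 can be checked with a short Python sketch. It starts from the frequencies counted in part (a); the dictionary names are our own.

```python
# Frequencies of the blood types from part (a) of Example 2.14.
counts = {"A": 19, "B": 8, "O": 19, "AB": 4}
total = sum(counts.values())

# Percentage frequency of each blood type.
percent = {t: 100 * f / total for t, f in counts.items()}
# Angle of each sector: percentage x 360 / 100.
angle = {t: percent[t] * 360 / 100 for t in counts}

for t in counts:
    print(f"{t:>2}: {counts[t]:2d} donors, {percent[t]:4.1f}%, {angle[t]:5.1f} degrees")
```

The angles add up to 360 degrees, which is a quick check on the arithmetic.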
[Figure: pie chart and bar chart of the donors' blood types (A, B, O, AB)]
UNIT THREE: MEASURES OF CENTRAL TENDENCY
Objectives:
Having studied this unit, you should be able to:
understand the role of descriptive statistics in summarization, description and interpretation of data.
use several numerical methods belonging to measures of central tendency to describe the characteristics
of a data set.
Note that the individual values of the distribution must have a tendency to cluster around an average. In view of
this requirement an average is also referred to as a measure of central tendency.
An average (a measure of central tendency) is considered satisfactory if it possesses all or most of the following
properties. An average should be
based on all the observed values.
simple to understand and easy to interpret.
calculable with reasonable ease and rapidity.
easily manipulated algebraically.
little affected by fluctuations of sampling.
not unduly influenced by extreme values.
rigidly defined, which means that it should have a definite value.
Objectives of measuring central tendency:
To get one single value that describes the characteristics of the entire group.
To facilitate comparison between different data sets.
The sum of n values $x_1, x_2, \dots, x_n$ is written as $\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$, where 1 = lower limit and n = upper limit. When no confusion can result we
shall often denote the above sum simply by $\sum x_i$.
Rules of summation
and
Definition 3.2:
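i) Let $x_1, x_2, \dots, x_n$ be the values of the variable X. The simple arithmetic mean, denoted by $\bar{x}$, is the
sum of these observations of X divided by the number of values: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$.
ii) If the numbers $x_1, x_2, \dots, x_k$ occur with frequencies $f_1, f_2, \dots, f_k$, respectively, then the mean can be
computed as $\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$.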
Note that if the data refers to a population data the mean is denoted by the Greek letter µ (read as mu).
Arithmetic mean for raw data (ungrouped data)
Example 3.1: The following data is the weight (in Kg) of eight youths: 32,37,41,39,36,43,48 and 36. Calculate
the arithmetic mean of their weight.
Solution: $\bar{x} = \frac{32+37+41+39+36+43+48+36}{8} = \frac{312}{8} = 39$ kg
Example 3.2: The ages of a random sample of patients in a given hospital in Ethiopia are given below:
Age 10 12 14 16 18 20 22
Number of patients 3 6 10 14 11 5 4
Calculate the average age of these patients.
Solution:
Age (xi)   Number of patients (fi)   fi·xi
10         3                         30
12         6                         72
14         10                        140
16         14                        224
18         11                        198
20         5                         100
22         4                         88
Total      53                        852
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{852}{53} \approx 16.08$ years
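The same calculation can be sketched in Python. The lists hold the ages and frequencies of Example 3.2; the variable names are our own.

```python
# Ages (x_i) and number of patients (f_i) from Example 3.2.
ages = [10, 12, 14, 16, 18, 20, 22]
patients = [3, 6, 10, 14, 11, 5, 4]

total_f = sum(patients)                                  # sum of f_i
total_fx = sum(f * x for f, x in zip(patients, ages))    # sum of f_i * x_i

# Mean of a frequency distribution: sum(f_i * x_i) / sum(f_i).
mean_age = total_fx / total_f

print(total_f, total_fx, round(mean_age, 2))
```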
Example 3.3: The GPA or CGPA of a student is a good example of a weighted arithmetic mean. Suppose that
Solomon obtained the following grades in the first semester of the freshman program at AASTU in 2006.
Course Credit hour (wi) Grade
Math101 4 A=4
Stat2091 3 C=2
Chem101 3 B=3
Phys101 4 B=3
Flen101 3 C=2
Find the GPA of Solomon.
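As a worked sketch, taking the credit hours as the weights $w_i$ and the grade points as the values $x_i$, the weighted arithmetic mean $\frac{\sum w_i x_i}{\sum w_i}$ gives:

```latex
\text{GPA} = \frac{\sum w_i x_i}{\sum w_i}
           = \frac{4(4) + 3(2) + 3(3) + 4(3) + 3(2)}{4 + 3 + 3 + 4 + 3}
           = \frac{49}{17} \approx 2.88
```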
Example 3.4: In a vacancy for a position of botanist in an organization, the criteria of selection were work experience,
entrance exam, and, interview result. The relative importance of these criteria was regarded to be different. The
weights of these criteria and the scores obtained by 3 candidates (out of 100 in each criterion) are given in the following
table. In addition, the selection of a candidate is based on average result on these criteria.
Criterion          Weight   Candidates
                            Tesfaye   Gutema   Kedir
Work experience    4        70        89       85
Entrance exam      3        78        83       89
Interview result   2        90        92       90
Who is the appropriate candidate for the position based on the criteria?
Solution: We use the weighted mean since the relative importance of these criteria is different.
Criterion          Weight   Tesfaye        Gutema         Kedir
                            xi     xiwi    xi     xiwi    xi     xiwi
Work experience    4        70     280     89     356     85     340
Entrance exam      3        78     234     83     249     89     267
Interview result   2        90     180     92     184     90     180
Total              9        238    694     264    789     264    787
The weighted mean and the simple arithmetic mean for the applicants are as follows:
Applicant                Tesfaye        Gutema         Kedir
Weighted mean            694/9=77.11    789/9=87.67    787/9=87.44
Simple arithmetic mean   238/3=79.33    264/3=88       264/3=88
If we use the simple arithmetic mean of the scores, both Gutema and Kedir have got equal chances to be
recruited. However, the relative importance of the criteria is different. So we have to use the weighted mean for
discriminating among the candidates. The weighted mean of the scores obtained by Gutema is larger than the
others. So Gutema should be recruited for the job.
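The comparison in Example 3.4 can be reproduced with a small Python sketch. The helper `weighted_mean` and the dictionary names are our own.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Weights of the criteria: work experience, entrance exam, interview result.
weights = [4, 3, 2]
scores = {
    "Tesfaye": [70, 78, 90],
    "Gutema": [89, 83, 92],
    "Kedir": [85, 89, 90],
}

weighted = {name: weighted_mean(s, weights) for name, s in scores.items()}
simple = {name: sum(s) / len(s) for name, s in scores.items()}

# Candidate with the largest weighted mean.
best = max(weighted, key=weighted.get)

for name in scores:
    print(f"{name}: weighted = {weighted[name]:.2f}, simple = {simple[name]:.2f}")
print("Selected:", best)
```

The simple mean cannot separate Gutema and Kedir, while the weighted mean does, because work experience carries the largest weight.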
Properties of arithmetic mean
i. It can be computed for any set of numerical data; it always exists and is unique.
ii. It depends on all observations.
iii. The sum of deviations of the observations about the mean is zero, i.e. $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
b)
Day    Number of cases
1st    12
2nd    24 = 2 × 12
3rd    48 = 2² × 12
4th    96 = 2³ × 12
5th    192 = 2⁴ × 12
Example 3.7: A company’s year-to-year changes in fuel consumption expenditures were 5, 10, 20, 40 and 60
percent. Determine the average yearly percent change in expenditure.
Solution: The 1st, 2nd, 3rd, 4th and 5th growth rates are 105%, 110%, 120%, 140% and 160%, respectively.
Average growth rate = $\sqrt[5]{105 \times 110 \times 120 \times 140 \times 160} \approx 125.43\%$
The average percentage change = 125.43% - 100% = 25.43%
Example 3.8: The rate of increase in population of a country during the last three decades is 5 percent, 8
percent, and 12 percent. Find the average rate of growth during the last three decades.
Solution: Since the data are given in terms of percentages, the geometric mean is a more appropriate
measure. The calculations of the geometric mean are shown in the following table.
Decade   Rate of increase in population (in %)   Population at the end of decade (x), taking preceding decade as 100
1        5                                       105
2        8                                       108
3        12                                      112
Average growth rate = $\sqrt[3]{105 \times 108 \times 112} \approx 108.3$
Hence the average rate of increase in population over the last three decades is 108.3 - 100 = 8.3 percent.
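The geometric means in Examples 3.7 and 3.8 can be checked numerically. The sketch below computes the n-th root of the product through logarithms; the function name `geometric_mean` is our own.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Example 3.7: yearly growth factors of fuel expenditure, in percent.
fuel = [105, 110, 120, 140, 160]
gm_fuel = geometric_mean(fuel)

# Example 3.8: population at the end of each decade, preceding decade = 100.
population = [105, 108, 112]
gm_population = geometric_mean(population)

print(f"Fuel: average growth rate {gm_fuel:.2f}%, average change {gm_fuel - 100:.2f}%")
print(f"Population: average growth rate {gm_population:.2f}, average increase {gm_population - 100:.2f}%")
```

Working with logarithms avoids forming a very large product when there are many values.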
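i. If the number of observations, say n, is odd then the median is equal to the $\left(\frac{n+1}{2}\right)^{\text{th}}$ observation of
the array.
ii. If the number of observations n is even then the median is equal to half the sum of the $\left(\frac{n}{2}\right)^{\text{th}}$ and $\left(\frac{n}{2}+1\right)^{\text{th}}$ observations of the array.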
Notation: If X is the variable under consideration, then $\tilde{x}$ is used to denote the median.
Example 3.9: Find the median for the following sets of data:
i. 10 5 7 9 6 5 4
Solution: First arrange the data in the form of an array.
4 5 5 6 7 9 10
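Here we have n=7 which is odd, so the median is the $\left(\frac{7+1}{2}\right)^{\text{th}} = 4^{\text{th}}$ observation of the array, which is 6.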
ii. 10 5 7 9 6 5 4 8
Solution: Arrange the data in ascending order.
4 5 5 6 7 8 9 10
Here n=8 which is even.
Therefore, the median is half the sum of the 4th and 5th observations: (6 + 7)/2 = 6.5.
iii. A shop keeper (sales person) recorded the number of video cassette recorders (VCRs) sold per
month over a two year period. Find the median number VCRs sold.
Number of sets sold Frequency ( months) Cumulative frequency
1 3 3
2 8 11
3 5 16
4 4 20
5 2 22
6 1 23
7 1 24
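One way to find the median from such a frequency table is to write out the ordered array that the table represents and then apply the odd/even rule. The Python sketch below does this; the function name `median_from_frequencies` is our own.

```python
def median_from_frequencies(values, freqs):
    """Median of the data in which each value occurs with the given frequency."""
    # Expand the table into the ordered array it represents.
    ordered = [v for v, f in sorted(zip(values, freqs)) for _ in range(f)]
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the ((n + 1) / 2)-th observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of the two middle observations

# Part iii: number of VCR sets sold per month over 24 months.
sets_sold = [1, 2, 3, 4, 5, 6, 7]
months = [3, 8, 5, 4, 2, 1, 1]
median_vcr = median_from_frequencies(sets_sold, months)
print(median_vcr)

# Parts i and ii: raw data, every value with frequency 1.
data_i = [10, 5, 7, 9, 6, 5, 4]
data_ii = [10, 5, 7, 9, 6, 5, 4, 8]
median_i = median_from_frequencies(data_i, [1] * len(data_i))
median_ii = median_from_frequencies(data_ii, [1] * len(data_ii))
print(median_i, median_ii)
```

For part iii, n = 24 is even and both the 12th and 13th observations equal 3 (the cumulative frequency first exceeds 12 at the value 3), so the median is 3 sets per month.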
Properties of median
It is an average of position.
It is affected more by the number of observations than by extreme values.
The sum of the deviations about the median, signs ignored, is less than the sum of deviations taken from
any other value or specific average.
3.3.4 The mode
Definition 3.6: The mode (modal value) of an observed set of data is the value that occurs the largest number of
times.
ii. 1 2 3 4 8 2 5 4 6. In this data set, the modal values are 2 and 4 since both 2 and 4 appear most
frequently and they occur an equal number of times. Such distributions are called bimodal
distributions.
iii. 1 2 4 3 5 6 8 7. In this data set, all values appear an equal number of times so there is no modal
value.
Note:
If a distribution has more than two modal values then we call the distribution multimodal.
If in a set of observed values, all values occur once or equal number of times, there is no mode.
The mode is also useful in finding the most typical case when the data are nominal or categorical.
Example 3.11: A survey showed the following distribution for the number of students enrolled in each field. Find the
mode.
Field Number of students
Civil 850
Electrical 825
Computer sciences 645
Mechanical 478
Chemical 100
Solution: Since the category with the highest frequency is Civil, the most typical case is Civil.
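The rules above (one mode, several modes, or no mode) can be sketched in Python. The function name `modes` is our own, and the convention "no mode when several distinct values all occur equally often" follows the note above.

```python
from collections import Counter

def modes(data):
    """Return the list of modal values; an empty list means there is no mode."""
    counts = Counter(data)
    highest = max(counts.values())
    # If several distinct values all occur equally often, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

bimodal = modes([1, 2, 3, 4, 8, 2, 5, 4, 6])
no_mode = modes([1, 2, 4, 3, 5, 6, 8, 7])
print(bimodal, no_mode)

# For categorical data given as a frequency table (Example 3.11),
# the mode is the category with the highest frequency.
enrolment = {"Civil": 850, "Electrical": 825, "Computer sciences": 645,
             "Mechanical": 478, "Chemical": 100}
modal_field = max(enrolment, key=enrolment.get)
print(modal_field)
```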
Properties of modal value
It is easy to calculate and understand.
It is not affected by extreme values.
It is not based on all observations.
It is not used in further analysis of data.
The mean, median, and mode of grouped data
The mean for grouped data can be found by assuming that the values in each interval are centered at the mid-point
of the interval.
Example 3.12: Consider the frequency distribution of the time spent by the automobile workers. Find the mean
time spent by these workers from this frequency distribution.
Time (in minute)   Class mark (xi)   Number of workers (fi)   fi×xi
15.5-21.5          18.5              3                        55.5
21.5-27.5          24.5              6                        147
27.5-33.5          30.5              8                        244
33.5-39.5          36.5              4                        146
39.5-45.5          42.5              3                        127.5
45.5-51.5          48.5              1                        48.5
Total                                25                       768.5
Solution: $\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{768.5}{25} = 30.74$ minutes
Note: In case of grouped data if any class interval is open, arithmetic mean cannot be calculated.
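The median for grouped data can be approximated by the following formula:
$\text{Median} = L + \frac{\frac{n}{2} - F}{f_m} \times w$
where L is the lower class boundary of the median class, n is the total number of observations, F is the cumulative frequency of the class preceding the median class, $f_m$ is the frequency of the median class and w is the class width.
The modal value is denoted by $\hat{x}$. For grouped data the mode can be estimated by the following formula:
$\hat{x} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times w$
where L is the lower class boundary of the modal class, $\Delta_1$ is the difference between the frequency of the modal class and that of the preceding class, $\Delta_2$ is the difference between the frequency of the modal class and that of the following class, and w is the class width.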
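As an illustration, the grouped mean, median and mode of the travel-time distribution can be computed as below. This is a sketch only: it assumes the usual interpolation formulas for grouped data, median = L + ((n/2 - F)/f) × w and mode = L + (Δ1/(Δ1 + Δ2)) × w, with L the lower boundary of the relevant class; the variable names are our own.

```python
# Class boundaries and frequencies of the travel-time distribution.
classes = [(15.5, 21.5), (21.5, 27.5), (27.5, 33.5),
           (33.5, 39.5), (39.5, 45.5), (45.5, 51.5)]
freqs = [3, 6, 8, 4, 3, 1]
n = sum(freqs)

# Mean: every observation in a class is represented by the class mark.
marks = [(lo + hi) / 2 for lo, hi in classes]
mean = sum(f * m for f, m in zip(freqs, marks)) / n

# Median: locate the class containing the (n/2)-th observation, then interpolate.
cumulative = 0   # cumulative frequency of the classes before the current one
for (lo, hi), f in zip(classes, freqs):
    if cumulative + f >= n / 2:
        median = lo + (n / 2 - cumulative) / f * (hi - lo)
        break
    cumulative += f

# Mode: interpolate inside the class with the highest frequency.
i = freqs.index(max(freqs))
before = freqs[i - 1] if i > 0 else 0
after = freqs[i + 1] if i < len(freqs) - 1 else 0
d1 = freqs[i] - before   # excess over the preceding class
d2 = freqs[i] - after    # excess over the following class
lo, hi = classes[i]
mode = lo + d1 / (d1 + d2) * (hi - lo)

print(round(mean, 2), round(median, 3), round(mode, 2))
```

Here the median class and the modal class are both 27.5-33.5, with frequency 8, preceded by a class of frequency 6 and followed by one of frequency 4.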
Deciles are nine points which divide an array into 10 parts in such a way that each part contains equal number
of elements. The 1st, 2nd,…, and the 9th points are known as the 1st, 2nd,…, and the 9th deciles and are usually
denoted by D1,D2,…,D9, respectively.
Percentiles are 99 points which divide an array into 100 parts in such a way that each part consists of equal
number of elements. The 1st, 2nd… and the 99th points are known as the 1st, 2nd… and the 99th percentiles and are
usually denoted by P1, P2… P99, respectively.
Note: The array should be in ascending order in order to get the quantiles.
i. Quantile points for raw data
First form an array in an ascending order and then apply the following procedure.
Example 3.15: The following data relate to sizes of shoes sold at a stock during a week. Find the quartiles, the
seventh decile and the 90th percentile.
Size of shoes 5 5.5 6 6.5 7 7.5 8 8.5 9 9.5
Number of pairs 2 5 15 30 60 40 23 11 4 1
Solution: The total number of observations is 191.
Q1 = P25
Q2 = P50 = D5 = median
Q3 = P75
D1 = P10; D2 = P20; …; D9 = P90.
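The quantiles of Example 3.15 can be sketched in Python. Several position rules are in use for quantiles; the sketch below assumes the rule that places the i-th of k quantile points at position i(n + 1)/k of the ordered array, with linear interpolation between neighbouring observations. The function name `quantile` is our own.

```python
import math

# Shoe sizes and number of pairs sold (Example 3.15).
sizes = [5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5]
pairs = [2, 5, 15, 30, 60, 40, 23, 11, 4, 1]

# Ordered array of all observations.
ordered = [s for s, f in zip(sizes, pairs) for _ in range(f)]

def quantile(ordered, i, k):
    """i-th of the k-quantiles, assuming the position rule i * (n + 1) / k."""
    n = len(ordered)
    pos = i * (n + 1) / k                  # 1-based position in the array
    lower = math.floor(pos)
    frac = pos - lower
    lower = min(max(lower, 1), n)          # keep the position inside the array
    upper = min(lower + 1, n)
    return ordered[lower - 1] + frac * (ordered[upper - 1] - ordered[lower - 1])

q1, q2, q3 = (quantile(ordered, i, 4) for i in (1, 2, 3))
d7 = quantile(ordered, 7, 10)
p90 = quantile(ordered, 90, 100)
print(len(ordered), q1, q2, q3, d7, p90)
```

With n = 191 the quartiles fall at positions 48, 96 and 144, the seventh decile at position 134.4 and the 90th percentile at position 172.8.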
UNIT FOUR: MEASURES OF VARIATION
Objectives:
Having studied this unit, you should be able to
understand the importance of measuring the variability (dispersion) in a data set.
measure the scatter or dispersion in a data set.
understand ‘moments’ as a convenient and unifying method for summarizing several descriptive
statistical measures.
measure the extent to which the distribution of values in a data set deviate from symmetry.
4.1 Introduction and objectives of measuring variation
We have seen that averages are representatives of a frequency distribution. But they fail to give a complete
picture of the distribution. They do not tell anything about the spread or dispersion of observations within the
distribution. Suppose that we have the distribution of yield (kg per plot) of two rice varieties from 5 plots each.
Variety 1: 45 42 42 41 40
Variety 2: 54 48 42 33 30
The mean yields of the two varieties are nearly equal (42 kg for variety 1 and 41.4 kg for variety 2). The mean yield of variety 1 is close to the values in this variety. On
the other hand, the mean yield of variety 2 is not close to the values in variety 2. The mean doesn’t tell us how
the observations are close to each other. This example suggests that a measure of central tendency alone is not
sufficient to describe a frequency distribution. Therefore, we should have a measure of spreads of observations.
There are different measures of dispersion. In this chapter we shall discuss the most commonly used measures of
dispersion or variation, such as the range, quartile deviation, standard deviation and coefficient of variation, and
measures of shape such as skewness and kurtosis.
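The rice yield example can be checked with a few lines of Python. The sketch compares the two varieties by their mean and by the simplest measure of spread, the range (maximum minus minimum); the variable names are our own.

```python
# Yield (kg per plot) of the two rice varieties, 5 plots each.
variety1 = [45, 42, 42, 41, 40]
variety2 = [54, 48, 42, 33, 30]

mean1 = sum(variety1) / len(variety1)
mean2 = sum(variety2) / len(variety2)

# Range = maximum value - minimum value.
range1 = max(variety1) - min(variety1)
range2 = max(variety2) - min(variety2)

print(f"Variety 1: mean = {mean1}, range = {range1}")
print(f"Variety 2: mean = {mean2}, range = {range2}")
```

The means are close to each other, but the yields of variety 2 are spread over a much wider interval, which the mean alone does not reveal.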
Objectives of measuring variation
To describe dispersion (variability) in a data.
To compare the spread in two or more distributions.
To determine the reliability of an average.
Note: The desirable properties of a good measure of variation are almost identical to those of a good measure of
central tendency.
4.2 Absolute and relative measures
Measures of variation may be either absolute or relative. Absolute measures of variation are expressed in the
same unit of measurement in which the original data are given. These values may be used to compare the
variation in two distributions provided that the variables are in the same units and of the same average size.
In case the two sets of data are expressed in different units, however, such as quintals of sugar versus tonnes of
sugarcane or if the average sizes are very different such as manager’s salary versus worker’s salary, the absolute
measures of dispersion are not comparable. In such cases measures of relative dispersion should be used. A
measure of relative dispersion is the ratio of a measure of absolute dispersion to an appropriate measure of
central tendency. It is a unitless measure.
4.3 Types of measures of variation
The range and relative range
Definition 4.1: Range is defined as the difference between the maximum and minimum observations in a set of
data.
Range is the crudest absolute measure of variation. It is widely used in the construction of quality control
charts and description of daily temperature.
Definition 4.2: Relative range (RR) is defined as $RR = \frac{\text{Maximum} - \text{Minimum}}{\text{Maximum} + \text{Minimum}}$
Definition 4.3: The variance is the average of the squared distances of the values from the mean. The
symbol for the population variance is σ² (σ is the Greek lower case letter sigma). Let x1,x2,…,xN be the
measurements on N population units; then the population variance is given by the formula:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Definition 4.4: The standard deviation is the square root of the variance. The symbol for the population standard
deviation is σ. The corresponding formula for the standard deviation is
$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
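Example 4.1: The heights of the members of a certain committee were measured in inches and the data are presented
below.
Height (x):   69   66   67   69   64   63   65   68   72
x - μ:         2   -1    0    2   -3   -4   -2    1    5
(x - μ)²:      4    1    0    4    9   16    4    1   25
Here μ = 603/9 = 67 inches and $\sum (x_i - \mu)^2 = 64$, so $\sigma^2 = \frac{64}{9} \approx 7.11$ and $\sigma = \sqrt{7.11} \approx 2.67$ inches.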
Definition 4.5: The sample variance is denoted by S², and its formula is $S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Definition 4.6: The sample standard deviation, denoted by S, is the square root of the sample variance: $S = \sqrt{S^2}$
Example 4.2: For a newly created position, a manager interviewed the following numbers of applicants each
day over a five-day period: 16, 19, 15, 15, and 14. Find the variance and standard deviation.
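Solution: $\bar{x} = \frac{16+19+15+15+14}{5} = 15.8$, $S^2 = \frac{(16-15.8)^2 + (19-15.8)^2 + (15-15.8)^2 + (15-15.8)^2 + (14-15.8)^2}{5-1} = \frac{14.8}{4} = 3.7$ and $S = \sqrt{3.7} \approx 1.92$.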
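The two definitions differ only in the divisor (N for a population, n - 1 for a sample). The Python sketch below applies them to the data of Examples 4.1 and 4.2; the function names are our own.

```python
import math

def population_variance(xs):
    """Average squared deviation from the mean, dividing by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)

# Example 4.1: heights (inches) of all committee members, treated as a population.
heights = [69, 66, 67, 69, 64, 63, 65, 68, 72]
var_heights = population_variance(heights)
sd_heights = math.sqrt(var_heights)

# Example 4.2: applicants interviewed per day, treated as a sample.
applicants = [16, 19, 15, 15, 14]
var_applicants = sample_variance(applicants)
sd_applicants = math.sqrt(var_applicants)

print(round(var_heights, 2), round(sd_heights, 2))
print(round(var_applicants, 2), round(sd_applicants, 2))
```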
Note that the procedure for finding the variance and standard deviation for grouped data is similar to that for
finding the mean for grouped data, and it uses the mid-points of each class.
Properties of variance
The unit of measurement of the variance is the square of the unit of measurement of the observed
values. It is one of its limitations.
The variance gives more weight to extreme values as compared to those which are near to mean value,
because the difference is squared in variance.
It is based on all observations in the data set.
Properties of standard deviation
Standard deviation is considered to be the best measure of dispersion and is used widely.
There is, however, one difficulty with it. If the unit of measurement of variables of two series is not the
same, then their variability cannot be compared by comparing the values of standard deviation.
Uses of the variance and standard deviation
The variance and standard deviations can be used to determine the spread of data, consistency of a
variable and the proportion of data values that fall within a specified interval in a distribution.
If the variance or standard deviation is large, the data is more dispersed. This information is useful in
comparing two or more data sets to determine which is more (most) variable.
Finally, the variance and standard deviation are used quite often in inferential statistics.
Coefficient of variation (CV)
The standard deviation is an absolute measure of dispersion. The corresponding relative measure is known as
the coefficient of variation (CV).
Coefficient of variation is used in problems where we want to compare the variability of two or more
different series. Coefficient of variation is the ratio of the standard deviation to the arithmetic mean, usually
expressed in percent: $CV = \frac{S}{\bar{x}} \times 100\%$
Since the CV of Computer science Department students is greater than that of Earth science Department
students, we can say that there is more dispersion in the distribution of Computer science students’ scores
compared with that of Earth science.
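Example 4.4: The mean weight of 20 children was found to be 30 kg with a variance of 16 kg² and their mean
height was 150 cm with a variance of 25 cm². Compare the variability of weight and height of these children.
Solution: The standard deviations are $\sqrt{16} = 4$ kg and $\sqrt{25} = 5$ cm, so
CV (weight) = $\frac{4}{30} \times 100\% = 13.33\%$ and CV (height) = $\frac{5}{150} \times 100\% = 3.33\%$.
The weight of the children is more variable than their height.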
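The comparison in Example 4.4 can be verified with a short Python sketch. The helper name `coefficient_of_variation` is our own; it takes the mean and the variance, since those are the quantities given in the example.

```python
import math

def coefficient_of_variation(mean, variance):
    """Standard deviation divided by the mean, expressed in percent."""
    return math.sqrt(variance) / mean * 100

cv_weight = coefficient_of_variation(mean=30, variance=16)    # kg, kg^2
cv_height = coefficient_of_variation(mean=150, variance=25)   # cm, cm^2

print(f"CV of weight: {cv_weight:.2f}%")
print(f"CV of height: {cv_height:.2f}%")
```

Because the CV is unitless, the two values can be compared directly even though weight and height are measured in different units.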
2.2.1 Standard score
A standard score is a measure that describes the relative position of a single score in the entire distribution of
scores in terms of the mean and standard deviation. It also gives us the number of standard deviations a
particular observation lies above or below the mean.
Population standard score: $Z = \frac{x - \mu}{\sigma}$, where x is the value of the observation, and μ and σ are the mean and
standard deviation of the population, respectively.
Sample standard score: $Z = \frac{x - \bar{x}}{S}$, where x is the value of the observation, and $\bar{x}$ and S are the mean and standard
deviation of the sample, respectively.
Interpretation:
Example 4.5: Two sections were given an exam in a course. The average score was 72 with standard deviation
of 6 for section 1 and 85 with standard deviation of 5 for section 2. Student A from section 1 scored 84 and
student B from section 2 scored 90. Who performed better relative to his/her group?
Solution: Section 1: $\bar{x}_1 = 72$, $S_1 = 6$ and score of student A from section 1: $x_A = 84$
Section 2: $\bar{x}_2 = 85$, $S_2 = 5$ and score of student B from section 2: $x_B = 90$
Z-score of student A: $Z_A = \frac{84 - 72}{6} = 2$
Z-score of student B: $Z_B = \frac{90 - 85}{5} = 1$
From these two standard scores, we can conclude that student A has performed better relative to his/her section
because his/her score is two standard deviations above the mean score of section 1, while the score of
student B is only one standard deviation above the mean score of section 2.
Example 4.6: A student scored 65 on a calculus test that had a mean of 50 and a standard deviation of 10; she
scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare her relative positions on
each test.
Solution: First, find the z-scores.
For calculus the z-score is $Z = \frac{65 - 50}{10} = 1.5$.
For history the z-score is $Z = \frac{30 - 25}{5} = 1.0$.
Since the z-score for calculus is larger, her relative position in the calculus class is higher than her relative
position in the history class.
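The standard scores in Examples 4.5 and 4.6 follow from one small function. This is a sketch; the function name `z_score` is our own.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Example 4.5: two students from different sections.
z_a = z_score(84, mean=72, sd=6)
z_b = z_score(90, mean=85, sd=5)

# Example 4.6: one student, two tests.
z_calculus = z_score(65, mean=50, sd=10)
z_history = z_score(30, mean=25, sd=5)

print(z_a, z_b)
print(z_calculus, z_history)
```

A larger z-score means a better position relative to the group, regardless of the raw score: student B has the higher raw score (90 against 84) but the lower standard score.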
4.4 Moments, skewness and kurtosis
Moments
Definition 4.7: A moment is the average of the deviations of the observations of a distribution from an arbitrary
origin, raised to an integral power. Let x1,x2,…,xn be observations; we define the r-th moment about A as:
$\frac{\sum_{i=1}^{n} (x_i - A)^r}{n}$
The most known moments are moments about the mean also known as the central moments and the moments
about zero (also known as moments about the origin.)
The rth moment about the mean, µr, is given by:
$\mu_r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^r}{n}$
Special cases: $\mu_0 = 1$, $\mu_1 = 0$, $\mu_2 = \sigma^2$
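As an illustration, moments about an arbitrary origin and about the mean can be computed with one function. The sketch below reuses the committee heights of Example 4.1; the function name `moment` and its `about` parameter are our own.

```python
def moment(xs, r, about=None):
    """r-th moment: average of (x - A)**r; A defaults to the mean (central moment)."""
    n = len(xs)
    origin = sum(xs) / n if about is None else about
    return sum((x - origin) ** r for x in xs) / n

heights = [69, 66, 67, 69, 64, 63, 65, 68, 72]

mu1 = moment(heights, 1)             # first central moment
mu2 = moment(heights, 2)             # second central moment
m1_origin = moment(heights, 1, about=0)   # first moment about zero

print(mu1, round(mu2, 2), m1_origin)
```

The first moment about zero is the mean itself, the first central moment is always zero, and the second central moment equals the variance computed with divisor n.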
Measures of skewness
i) Pearsonian coefficient of skewness (Pcsk) defined as: $Pcsk = \frac{\text{mean} - \text{mode}}{\text{standard deviation}}$
Interpretation:
Note: in a negatively skewed distribution larger values are more frequent than smaller values. In a positively
skewed distribution smaller values are more frequent than larger values.
Example 4.7: The mean, mode and standard deviation of a frequency distribution are 70.2, 73.6 and 6.4, respectively. What
can one state about its skewness?
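A quick calculation for Example 4.7, as a sketch. It assumes that the Pearsonian coefficient is computed as (mean - mode) / standard deviation, and that a negative value indicates negative skewness, a positive value positive skewness and zero a symmetric distribution.

```python
def pearson_skewness(mean, mode, sd):
    """Pearsonian coefficient of skewness, assumed here to be (mean - mode) / sd."""
    return (mean - mode) / sd

pcsk = pearson_skewness(mean=70.2, mode=73.6, sd=6.4)

if pcsk < 0:
    shape = "negatively skewed"
elif pcsk > 0:
    shape = "positively skewed"
else:
    shape = "symmetric"

print(round(pcsk, 2), shape)
```

Since the mean is smaller than the mode, the coefficient is negative and the distribution is negatively skewed.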
Kurtosis: it refers to the degree of peakedness of a distribution.
When the values of a distribution are closely bunched around the mode in such a way that the peak of the
distribution becomes relatively high, the distribution is said to be leptokurtic. If it is flat topped we call it
platykurtic. A distribution which is neither highly peaked nor flat topped is known as a meso-kurtic distribution
(normal).
Measures of kurtosis
where
Interpretation:
UNIT FIVE: ELEMENTARY PROBABILITY
Objectives:
Having studied this unit, you should be able to
understand the elements of probability
calculate some probabilities of events associated with random experiments
apply the concept of probability in some biological phenomena
5.1 Introduction
Why is it that science is not always certain? Nature is complex and full of unexplained variability. In addition,
almost all methods of observation and experiment are imperfect. Observers are subject to human bias and error.
Science is a continuing story; subjects vary; measurements fluctuate. Biomedical science, in particular, contains
controversy and disagreement; with the best of intentions, biomedical data, medical histories, physical
examinations, interpretations of clinical tests, descriptions of symptoms and diseases are somewhat inexact. But
most important of all, we always have to deal with incomplete information: It is either impossible, or too costly,
or too time consuming, to study the entire population; we often have to rely on information gained from a
sample, that is, a subgroup of the population under investigation. So some uncertainty almost always prevails.
Science and scientists cope with uncertainty by using the concept of probability. By calculating probabilities,
they are able to describe what has happened and predict what should happen in the future under similar
conditions. In short, the quantities we are interested in are, more often than not, unpredictable in advance and exhibit an inherent variation. Probability and statistics are concerned with the quantification of such quantities (or random phenomena).
Definition 5.2:
Sample point (outcome): The individual result of a random experiment.
Sample space: The set containing all possible sample points (outcomes) of the random experiment. The
sample space is often called the universe and is denoted by S.
Event: The collection of outcomes or simply a subset of the sample space. We denote events with capital
letters, A, B, C, etc.
Review of set theory
Concepts of set theory are important in understanding probability. Given A, B and C are events associated with
a sample space S and ω represents an elementary event (outcome) in S, then the following are some useful
definitions and results in set theory.
Definitions 5.3:
1. Union: The union of A and B, A ∪ B, is the event containing all sample points in either A or B or both. Sometimes we use A or B for union.
2. Intersection: The intersection of A and B, A ∩ B, is the event containing all sample points that are in both A and B. Sometimes we use AB or A and B for intersection.
3. Subset: If ω ∈ A implies ω ∈ B for every ω, then A ⊆ B.
4. Empty set: If a set A contains no points, it will be called the null set, or empty set, and denoted by ∅.
5. Complement: The complement of a set A, denoted by Ac, is the set of all ω ∈ S such that ω ∉ A.
6. Mutually Exclusive Events: Two events are said to be mutually exclusive (or disjoint) if their intersection is empty (i.e. A ∩ B = ∅). Subsets A1, A2, … are defined to be mutually exclusive if Ai ∩ Aj = ∅ for every i ≠ j.
In short, to assign probabilities for an event, we might need to enumerate the possible outcomes of a random
experiment and need to know the number of possible outcomes favoring the event. The following principles
will help us in determining the number of possible outcomes favoring a given event.
Example 5.3: Suppose one wants to purchase a certain commodity and that this commodity is on sale in 5
government owned shops, 6 public shops and 10 private shops. How many alternatives are there for the person
to purchase this commodity?
Solution: Total number of ways =5+6+10=21 ways
Example 5.4: If we can go from Addis Ababa to Rome in 2 ways and from Rome to Washington D.C. in 3
ways then the number of ways in which we can go from Addis Ababa to Rome to Washington D.C. is 2x3
ways or 6 ways. We may illustrate the situation by using a tree diagram below:
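Figure: Tree diagram with two branches from A (Addis Ababa) to R (Rome) and three branches from each R to W (Washington D.C.), giving 2x3 = 6 routes.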
Example 5.5: If a test consists of 10 multiple choice questions, with each permitting 4 possible answers, how
many ways are there in which a student gives his/her answers?
Solution: There are 10 steps required to complete the test.
First step: To give answer to question number one. He/she has 4 alternatives.
Second step: To give answer to question number two, he/she has 4 alternatives……
Last step: To give answer to last question, he/she has 4 alternatives.
Therefore, he/she has 4x4x4x…x4 = 4^10 ways, or 1,048,576 ways, of completing the exam. Note that there is only
one way in which he/she can give correct answers to all questions and that there are 3^10 ways in which all the
answers will be incorrect.
Example 5.6: A manufactured item must pass through three control stations. At each station the item is
inspected for a particular characteristic and marked accordingly. At the first station, three ratings are possible
while at the last two stations four ratings are possible. Hence there are 3x4x4 = 48 ways in which the item may be
marked.
Example 5.7: Suppose that a car plate has three letters followed by three digits. How many possible car plates are
there if each plate begins with an H or an F?
Solution: There are 2x26x26x10x10x10 or 1,352,000 different plates.
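The multiplication principle used in Examples 5.5 to 5.7 amounts to multiplying the number of alternatives available at each step. A minimal Python sketch, with a helper name (count_ways) chosen here for illustration:

```python
from math import prod

def count_ways(choices_per_step):
    """Multiplication principle: multiply the number of choices at each step."""
    return prod(choices_per_step)

print(count_ways([4] * 10))                 # 1048576 (Example 5.5)
print(count_ways([3] * 10))                 # 59049 ways to get every answer wrong
print(count_ways([3, 4, 4]))                # 48 (Example 5.6)
print(count_ways([2, 26, 26, 10, 10, 10]))  # 1352000 (Example 5.7)
```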
Definition 5.4: If n is a positive integer, we define n!= n(n-1)(n-2)…1 and call it n-factorial and 0!=1.
Permutations
Suppose that we have n different objects. In how many ways, say nPn, may these objects be arranged
(permuted)? For example, if we have objects a, b and c we can consider the following arrangements: abc, acb,
bac, bca, cab, and cba. Thus the answer is 6. The following theorem gives general result on the number of such
arrangements.
Example 5.8: Suppose that we have four letters a, b, c, d.
i) What is the number of possible arrangements of these letters taken all at a time?
ii) What is the number of possible arrangements of these letters if we use only three of the letters at a time?
Solution:
i) Using (i) of theorem 5.4, we have 4! ways of arranging the 4 letters, i.e. we have 24 possible
arrangements.
ii) Using (ii) of theorem 5.4, we have 4P3 ways of arranging 3 letters taken from the four letters, i.e. we have
24 possible arrangements.
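The two counts in Example 5.8 can be confirmed by listing the arrangements explicitly. The sketch below uses Python's standard itertools.permutations and math.perm; the variable names are illustrative.

```python
from itertools import permutations
from math import perm

letters = "abcd"

all_at_a_time = list(permutations(letters))       # arrangements of all 4 letters
three_at_a_time = list(permutations(letters, 3))  # arrangements of 3 of the 4 letters

print(len(all_at_a_time), perm(4, 4))    # 24 24
print(len(three_at_a_time), perm(4, 3))  # 24 24
```

Both counts are 24 because 4P3 = 4!/(4-3)! = 4!/1! = 4!.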
Example 5.9: In a class with 8 boys and 8 girls
i) In how many ways can the children line up if they alternate girl-boy-girl-boy-... ?
ii) In how many ways can the children line up so that no two of the same sex are next to each other?
Solution:
i) The 8 girls can line-up in 8! ways, and likewise the 8 boys can line-up in 8! ways. For any single
arrangement of the girls, all possible arrangements of the boys are possible, thus by multiplication
principle we have 8!x 8! ways to arrange the children in girl-boy lines.
ii) Now we must include the case of boy-girl. So we have 2x8!x 8! ways of arranging.
Example 5.10: If I have 5 different books on my shelf, in how many ways can I arrange these books?
Solution: We can arrange the books in 5! different ways or 5x4x3x2x1 ways or 120 ways.
Remarks
i) The number of permutations of n distinct objects arranged in a circle is (n-1)!.
This is because we consider two permutations the same if one is a rotation of the other. For n objects arranged
around a circle, there are n rotations that give the same permutation. Dividing n! by n gives (n - 1)!. The two
circular permutations below are considered the same; their order is a, b, c, d, e.
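The (n-1)! rule can be checked by enumeration for the five objects a, b, c, d, e. In this sketch, rotations are removed by always keeping the first object in a fixed seat; the function name is hypothetical.

```python
from itertools import permutations
from math import factorial

def circular_arrangements(items):
    """Arrangements around a circle, treating rotations as the same.

    Fixing the first item in place removes the n equivalent rotations.
    """
    first, rest = items[0], items[1:]
    return [(first,) + p for p in permutations(rest)]

arrangements = circular_arrangements(("a", "b", "c", "d", "e"))
print(len(arrangements), factorial(5 - 1))  # 24 24
```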
Combinations
Consider n different objects. This time we are concerned with counting the number of ways we may choose r
out of these n objects without regard to order. For example, we have the objects a, b, c and d, and r=2; we wish
to count ab, ac, ad, bc, bd, and cd. In other words, we do not count ab and ba since the same objects are
involved and only the order differs.
There are many problems in which we are interested in determining the number of ways in which r objects can
be selected from n distinct objects without regard to the order in which they are selected. Such selections are
called combinations or r-sets. It may help to think of combinations as committees. The key here is without
regard for order.
To obtain the general result we recall the formula derived above: the number of ways of choosing r objects out
of n and permuting the chosen r equals n!/(n-r)!. Let C be the number of ways of choosing r out of n,
disregarding order. C is the number required. Note that once the r items have been chosen, there are r! ways of
permuting them. Hence applying the multiplication principle again, together with the above result, we obtain
C.r! = n!/(n-r)!. Therefore, C = n!/[r!(n-r)!]. This number arises in many contexts in mathematics and hence a
special symbol is used for it. We shall write nCr = n!/[r!(n-r)!].
Example 5.12: How many different committees of 3 can be formed from Hawa, Segenet, Nigisty and Lensa?
Solution: The question can be restated in terms of subsets: from a set of 4 objects, how many subsets of 3 elements
are there? In terms of combinations the question becomes: what is the number of combinations of 4 distinct
objects taken 3 at a time? The list of committees is {H,S,N}, {H,S,L}, {H,N,L}, {S,N,L}. Therefore, we have 4C3
or 4 possible committees.
Example 5.13:
(i) A committee of 3 is to be formed from a group of 20 people. How many different committees are possible?
(ii) From a group of 5 men and 7 women, how many different committees consisting of 2 men and 3 women can
be formed?
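Solution: (i) There are 20C3 = 20!/(3!17!) = 1140 possible committees.
(ii) The 2 men can be chosen in 5C2 = 10 ways and the 3 women in 7C3 = 35 ways, so by the multiplication principle there are 10x35 = 350 possible committees.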
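The committee counts in Examples 5.12 and 5.13 can be reproduced with Python's standard itertools.combinations and math.comb. This is a sketch only; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

# Example 5.12: committees of 3 from 4 people, listed explicitly.
people = ["Hawa", "Segenet", "Nigisty", "Lensa"]
committees = list(combinations(people, 3))
for committee in committees:
    print(committee)
print(len(committees), comb(4, 3))  # 4 4

# Example 5.13: (i) 3 out of 20 people; (ii) 2 of 5 men together with 3 of 7 women.
print(comb(20, 3))               # 1140
print(comb(5, 2) * comb(7, 3))   # 350
```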
5.4 Probability of an event
Definition 5.5: The Axioms of Probability
Probabilities are real numbers assigned to events (or subsets) of a sample space. We can think of the
assignment of probabilities to events, or probability measure, as a function between the collection of
subsets of the sample space and the real numbers. Mathematically, a probability measure P for a random
experiment is a real-valued function defined on the collection of events that satisfies the following axioms:
Axiom 1: The probability of an event is a nonnegative real number; that is, P(A) ≥ 0 for any subset A of S.
Axiom 2: P(S) = 1
Axiom 3: If A1, A2, A3, ... is a finite or infinite sequence of mutually exclusive
events of S, then P(A1 ∪ A2 ∪ A3 ∪ ...) = P(A1) + P(A2) + P(A3) + ... = Σ P(Ai).
It is rather surprising that with only these three axioms, we can construct the "entire" theory of probability! The
next theorems and definitions help in assigning probabilities of events.
Theorem 5.6: If A is an event in a discrete sample space S, then P(A) equals the sum of the probabilities of
the individual outcomes comprising A.
Theorem 5.7: Suppose that we have a random experiment with sample space S and probability function P
and A and B are events. Then we have the following results:
i) P(∅) = 0
ii) P(Ac) = 1 − P(A)
iii) P(B ∩ Ac) = P(B) − P(A ∩ B)
iv) If A ⊆ B, then P(A) ≤ P(B).
Definition 5.6: The classical definition of probability
If an experiment can result in any one of N equally likely and mutually exclusive outcomes, and if n of
these outcomes constitute the event A, then the probability of event A is P(A) = n/N.
Example 5.14: Consider the experiment of tossing a fair die. A fair die means that all six numbers are equally
likely to appear. Calculate the probabilities of the following events:
a) A=One will occur ={1}
b) B=Even number will occur ={2, 4, 6}
c) C=Odd number will occur ={1, 3, 5}
d) D=A number less than 3 will occur ={1,2}
Solution:
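a) Since the die is fair, each of the six outcomes has probability 1/6, so P(A) = 1/6.
b) P(B) = 3/6 = 1/2
c) P(C) = 3/6 = 1/2
d) P(D) = 2/6 = 1/3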
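The classical definition, P(A) = n/N, translates directly into code for the die events of Example 5.14. The sketch below uses exact fractions; the helper name classical_probability is an assumption made for illustration.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(event) = favourable outcomes / all outcomes, for equally likely outcomes."""
    favourable = len(set(event) & set(sample_space))
    return Fraction(favourable, len(sample_space))

S = {1, 2, 3, 4, 5, 6}  # sample space of a fair die
events = {"A": {1}, "B": {2, 4, 6}, "C": {1, 3, 5}, "D": {1, 2}}

probabilities = {name: classical_probability(ev, S) for name, ev in events.items()}
for name, p in probabilities.items():
    print(name, p)  # A 1/6, B 1/2, C 1/2, D 1/3
```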
Example 5.15: Suppose that we toss two coins, and assume that each of the four outcomes in the sample space
S = {(H,H),(H, T ), (T ,H), (T , T )} are equally likely and hence has probability ¼. Let A = {(H, H),(H, T )} and
B = {(H,H), (T ,H)} that is, A is the event that the first coin falls heads, and B is the event that the second coin
falls heads. Then, calculate the probabilities of A, B, Ac, Bc, and Sc. The event that none of the outcomes will
occur is the same as Sc.
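Solution: P(A) = 2/4 = 1/2, P(B) = 2/4 = 1/2, P(Ac) = 1 − P(A) = 1/2, P(Bc) = 1 − P(B) = 1/2, and P(Sc) = 1 − P(S) = 0.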
Example 5.16: From a group of 5 men and 7 women, it is required to form a committee of 5 persons. If the
selection is made randomly, then
i) what is the probability that 2 men and 3 women will be in the committee?
ii) what is the probability that all members of the committee will be men?
iii) what is the probability that at least three members will be women?
Solution: The total number of possible committees is 12C5 = 792, i.e. the number of possible outcomes
in the sample space is 792.
i) Let A be the event that the committee will consist of 2 men and 3 women. We need to know the
number of possible outcomes favoring this event. The number of ways we can select 2 men from 5
men is 5C2 = 10 and the number of ways of selecting 3 women out of 7 women is 7C3 = 35. By the multiplication principle, 10x35 = 350 outcomes favor A. Hence P(A) = 350/792 ≈ 0.442.
ii) Let B be the event that all members of the committee will be men. Only 5C5 = 1 outcome favors B. Hence P(B) = 1/792 ≈ 0.0013.
iii) Let C be the event that at least three of the committee members will be women.
Basically, three different compositions of committee members can be formed in terms of sex: 3
women and 2 men, 4 women and 1 man, and 5 women. Hence the number of possible outcomes
favoring event C, using the principle of combination together with the addition principle, is
7C3x5C2 + 7C4x5C1 + 7C5x5C0 = 350 + 175 + 21 = 546.
Therefore, P(C) = 546/792 ≈ 0.689.
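The three probabilities of Example 5.16 can be computed by counting committees with math.comb. This is a sketch; the constant and function names are chosen here for illustration.

```python
from fractions import Fraction
from math import comb

MEN, WOMEN, SIZE = 5, 7, 5
total = comb(MEN + WOMEN, SIZE)  # all possible committees of 5

def ways(men, women):
    """Number of committees with the given numbers of men and women."""
    return comb(MEN, men) * comb(WOMEN, women)

p_a = Fraction(ways(2, 3), total)  # 2 men and 3 women
p_b = Fraction(ways(5, 0), total)  # all men
p_c = Fraction(sum(ways(SIZE - w, w) for w in range(3, SIZE + 1)), total)  # at least 3 women

print(total)                 # 792
print(round(float(p_a), 3))  # 0.442
print(round(float(p_b), 4))  # 0.0013
print(round(float(p_c), 3))  # 0.689
```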
of A is P(A) ≈ nA/n.
The above definition of probability is based on empirical data accumulated through time or on
observations made from a large number of repeated experiments.
5.5 Some probability rules
Theorem 5.8: If A and B are any two events in S, then P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Example 5.18: Sixty percent of the families in a certain community own their own car, thirty percent own their
own home, and twenty percent own both their own car and their own home. If a family is randomly chosen,
a) what is the probability that this family does not own a car?
b) what is the probability that this family owns a car or a house?
c) what is the probability that this family owns a car or a house but not both?
d) what is the probability that this family owns only a house?
e) what is the probability that this family neither owns a car nor a house?
Solution: Let A represent the event that the family owns a car and B the event that the family owns a house. Given
information: P(A) = 0.6, P(B) = 0.3, and P(A∩B) = 0.2.
a) Required: P(Ac) = ?
P(Ac) = 1 − P(A) = 1 − 0.6 = 0.4
b) Required: P(A∪B) = ?
P(A∪B) = P(A) + P(B) − P(A∩B) = 0.6 + 0.3 − 0.2 = 0.7
c) Required: P((A∩Bc)∪(Ac∩B)) = ?
P((A∩Bc)∪(Ac∩B)) = P(A∩Bc) + P(Ac∩B) = [P(A) − P(A∩B)] + [P(B) − P(A∩B)]
= [0.6 − 0.2] + [0.3 − 0.2] = 0.5
d) Required: P(Ac∩B) = ?
P(Ac∩B) = P(B) − P(A∩B) = 0.3 − 0.2 = 0.1
e) Required: P(Ac∩Bc) = ?
P(Ac∩Bc) = 1 − P(A∪B) = 1 − 0.7 = 0.3
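All five parts of Example 5.18 follow from the complement rule and the addition rule. The sketch below works with exact fractions to avoid rounding noise; the variable names are illustrative.

```python
from fractions import Fraction

p_car = Fraction(6, 10)   # P(A): owns a car
p_home = Fraction(3, 10)  # P(B): owns a home
p_both = Fraction(2, 10)  # P(A and B)

p_no_car = 1 - p_car                                  # a) complement rule
p_car_or_home = p_car + p_home - p_both               # b) addition rule
p_exactly_one = (p_car - p_both) + (p_home - p_both)  # c) one but not both
p_home_only = p_home - p_both                         # d) home but no car
p_neither = 1 - p_car_or_home                         # e) complement of the union

for label, p in [("a", p_no_car), ("b", p_car_or_home), ("c", p_exactly_one),
                 ("d", p_home_only), ("e", p_neither)]:
    print(label, float(p))  # a 0.4, b 0.7, c 0.5, d 0.1, e 0.3
```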
5.6 Conditional probability and independence
Conditional Probability
Conditional probability provides us with a way to reason about the outcome of an experiment, based on partial
information. Here are some examples of situations we may have in our mind:
(a) What is the probability that a person will be HIV-Positive given he has tuberculosis?
(d) A spot shows up on a radar screen. How likely is it that it corresponds to an aircraft?
In more precise terms, given an experiment, a corresponding sample space, and a probability law, suppose that
we know that the outcome is within some given event B. We wish to quantify the likelihood that the outcome
also belongs to some other given event A. We thus seek to construct a new probability law, which takes into
account this knowledge and which, for any event A, gives us the conditional probability of A given B, denoted
by P(A|B).
Definition 5.8: If P(B) > 0, the conditional probability of A given B, denoted by P(A|B), is P(A|B) = P(A ∩ B)/P(B).
Example 5.19: Suppose cards numbered one through ten are placed in a hat, mixed up, and then one of the
cards is drawn at random. If we are told that the number on the drawn card is at least five, then what is the
conditional probability that it is ten?
Solution: Let A denote the event that the number on the drawn card is ten, and B be the event that it is at least
five. The desired probability is P(A|B). Since A ∩ B = A, P(A|B) = P(A ∩ B)/P(B) = (1/10)/(6/10) = 1/6.
Example 5.20: A family has two children. What is the conditional probability that both are boys given that at
least one of them is a boy? Assume that the sample space S is given by S = {(b, b), (b, g), (g, b), (g, g)}, and all
outcomes are equally likely. (b, g) means, for instance, that the older child is a boy and the younger child is a
girl.
Solution: Letting A denote the event that both children are boys, and B the event that at least one of them is a
boy, then the desired probability is given by P(A|B) = P(A ∩ B)/P(B) = P({(b, b)})/P({(b, b), (b, g), (g, b)}) = (1/4)/(3/4) = 1/3.
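When all outcomes are equally likely, P(A|B) is simply the share of the outcomes in B that also lie in A. The sketch below applies this to Examples 5.19 and 5.20; the function name and the use of predicates to describe events are choices made here for illustration.

```python
from fractions import Fraction
from itertools import product

def conditional_probability(a, b, sample_space):
    """P(A|B) for equally likely outcomes; a and b are predicates on outcomes."""
    outcomes_b = [w for w in sample_space if b(w)]
    outcomes_ab = [w for w in outcomes_b if a(w)]
    return Fraction(len(outcomes_ab), len(outcomes_b))

# Example 5.19: cards numbered 1 to 10; P(card is ten | card is at least five).
cards = range(1, 11)
p_ten = conditional_probability(lambda c: c == 10, lambda c: c >= 5, cards)

# Example 5.20: two children; P(both boys | at least one boy).
children = list(product("bg", repeat=2))
p_two_boys = conditional_probability(lambda w: w == ("b", "b"),
                                     lambda w: "b" in w, children)

print(p_ten, p_two_boys)  # 1/6 1/3
```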
Law of Multiplication
The defining equation for conditional probability may also be written as:
P(A∩B) = P(B) P(A|B)
This formula is useful when the information given to us in a problem is P(B) and P(A|B) and we are asked to
find P(A∩B). An example illustrates the use of this formula. Suppose that 5 good fuses and 2 defective ones
have been mixed up. To find the defective fuses, we test them one by one, at random and without replacement.
What is the probability that we are lucky and find both of the defective fuses in the first two tests? Letting B be the event that the first fuse tested is defective and A the event that the second fuse tested is defective, P(A∩B) = P(B) P(A|B) = (2/7)(1/6) = 1/21.
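The fuse question can be answered with the multiplication law and then cross-checked by listing every ordered pair of fuses that could be tested first and second. In this sketch the fuse labels D1, D2, G1, ... are hypothetical names introduced only for the enumeration.

```python
from fractions import Fraction
from itertools import permutations

# Multiplication law: P(first defective) x P(second defective | first defective).
p_first_defective = Fraction(2, 7)
p_second_given_first = Fraction(1, 6)
p_both = p_first_defective * p_second_given_first

# Cross-check by enumerating all ordered pairs of distinct fuses.
fuses = ["D1", "D2", "G1", "G2", "G3", "G4", "G5"]
orders = list(permutations(fuses, 2))
lucky = [o for o in orders if o[0].startswith("D") and o[1].startswith("D")]
p_check = Fraction(len(lucky), len(orders))

print(p_both, p_check)  # 1/21 1/21
```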
Example 5.21: Suppose an urn contains seven black balls and five white balls. We draw two balls from the urn
without replacement. Assuming that each ball in the urn is equally likely to be drawn, what is the probability
that both drawn balls are black?
Solution: Let A and B denote, respectively, the events that the first and second balls drawn are black. Now,
given that the first ball selected is black, there are six remaining black balls and five white balls, and so P(B|A)
= 6/11. As P(A) is clearly 7/12, our desired probability is P(A ∩ B) = P(A) P(B|A) = (7/12)(6/11) = 42/132 = 7/22.
Independence
We have introduced the conditional probability P(A|B) to capture the partial information that event B provides
about event A. An interesting and important special case arises when the occurrence of B provides no
information and does not alter the probability that A has occurred, i.e., P(A|B) = P(A). When the above equality
holds, we say that A is independent of B. Note that by the definition P(A|B) = P(A ∩ B)/P(B), this is equivalent
to P(A ∩ B) = P(A)P(B).
Example 5.22: A basket contains 2 black and 2 white balls. Consider selecting two balls at random in two
ways:
a) selecting a second ball without replacing first selected ball in the basket
b) selecting a second ball after replacing first selected ball in the basket
Let A and B represent the events that a black ball is selected in the first and second selection, respectively. In which of the
two ways are A and B independent?
Solution:
First way (a):
S = {B1B2, B2B1, B1W1, B1W2 , B2W1, B2W2, W1W2 , W1B1, W2B1 , W1B2, W2B2,W2W1}
A = {B1B2, B2B1, B1W1, B1W2 , B2W1, B2W2 }
B = {B1B2, B2B1, W1B1, W2B1 , W1B2, W2B2 }
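A ∩ B = {B1B2, B2B1}
Hence P(A) = 6/12 = 1/2, P(B) = 6/12 = 1/2 and P(A ∩ B) = 2/12 = 1/6. Since P(A ∩ B) ≠ P(A)P(B) = 1/4, A and B are not independent when the first ball is not replaced.
Second way (b): With replacement, the sample space has 4x4 = 16 equally likely outcomes, of which 8 have a black ball first, 8 have a black ball second, and 4 have a black ball in both selections. Hence P(A ∩ B) = 4/16 = 1/4 = P(A)P(B), so A and B are independent when the first ball is replaced.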
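The independence check in Example 5.22 can be carried out mechanically by building both sample spaces and comparing P(A ∩ B) with P(A)P(B). A sketch, with a hypothetical helper name is_independent:

```python
from fractions import Fraction
from itertools import permutations, product

balls = ["B1", "B2", "W1", "W2"]

def is_independent(sample_space):
    """Check P(A and B) == P(A) P(B), where A = first ball black, B = second ball black."""
    n = len(sample_space)
    p_a = Fraction(sum(first.startswith("B") for first, _ in sample_space), n)
    p_b = Fraction(sum(second.startswith("B") for _, second in sample_space), n)
    p_ab = Fraction(sum(f.startswith("B") and s.startswith("B")
                        for f, s in sample_space), n)
    return p_ab == p_a * p_b

without_replacement = list(permutations(balls, 2))  # 12 ordered pairs
with_replacement = list(product(balls, repeat=2))   # 16 ordered pairs

print(is_independent(without_replacement))  # False
print(is_independent(with_replacement))     # True
```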
UNIT SIX: PROBABILITY DISTRIBUTIONS
Objectives:
Having studied this unit, you should be able to
compute probabilities of events using the concept of probability distributions.
compute expected values and variances of random variables.
apply the concepts of probability distributions to real-life problems.
Introduction
In many applications, the outcomes of probabilistic experiments are numbers or have some numbers associated
with them, which we can use to obtain important information, beyond what we have seen so far. We can, for
instance, describe in various ways how large or small these numbers are likely to be and compute likely
averages and measures of spread. For example, in 3 tosses of a coin, the number of heads obtained can range
from 0 to 3, and there is one of these numbers associated with each possible outcome. Informally, the quantity
“number of heads” is called a random variable, and the numbers 0 to 3 its possible values. The value of a
random variable is determined by the outcome of the experiment. Thus, we may assign probabilities to the
possible values of the random variable.
Example 6.1: Consider an experiment of tossing two fair coins. Letting X denote the number of heads
appearing on the top face, then X is a random variable taking on one of the values 0, 1, 2 . The random variable
X assigns a 0 value for the outcome (T,T), 1 for outcomes (T ,H) and (H, T ), and 2 for the outcome (H,H).
Thus, we can calculate the probability that X can take specific value/s as follows:
P(X = 0) = P({(T , T )}) = ¼
P(X = 1) = P({(T ,H),(H, T )}) = 2/4,
P(X = 2) = P({(H,H)}) = ¼
The table below shows the probability mass function of X.
x        0      1      2
PX(x)    ¼      2/4    ¼
We can verify that PX(x) is a probability mass function.
PX(x)≥0 for x=0,1,2 and
P(X = 0) + P(X = 1)+ P(X = 2) = ¼ + 2/4 + ¼=1
Suppose we are interested to calculate the probability that X≥1. The values of X which are greater than or equal
to 1 are 1 and 2. Thus, the probability that X is greater than or equal to 1, denoted P(X≥1), is found as P(X≥1)
= P(X = 1) + P(X = 2)=3/4.
We can use the probability density function to calculate probabilities of events expressed in terms of the random
variable X. For instance, if we are interested in the probability that X lies between two points, say a and b, we
can find it by integrating fX(x) over the interval [a,b], i.e. P(a ≤ X ≤ b) = ∫ fX(x) dx, with the integral taken from a to b.
where c is a constant.
iii) The probability that a continuous random variable X will assume a value in a closed interval is the same
as the probability that it will assume a value in the corresponding open or half-open interval, i.e., P(a≤X≤b) = P(a<X<b) =
P(a≤X<b) = P(a<X≤b), P(X≤c) = P(X<c), P(X≥c) = P(X>c), where a, b, and c are constants.
It is useful to view the mean of X as a “representative” value of X, which lies somewhere in the middle of its
range. We can make this statement more precise, by viewing the mean as the center of gravity of the
distribution.
Variance
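Definition 6.4: The variance of a random variable X, denoted V(X) or σ², is defined as V(X) = E[(X − μ)²] =
E(X²) − μ².
i) if X is discrete, V(X) = Σ (x − μ)² PX(x), where the sum is over all possible values x of X;
ii) if X is continuous, V(X) = ∫ (x − μ)² fX(x) dx, where the integral is over the whole real line.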
The variance provides a measure of dispersion of X around its mean. Another measure of dispersion is the
standard deviation of X, which is defined as the square root of the variance and is denoted by σ.
Example 6.2: Calculate the mean and variance of the random variable X in example 6.1.
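Example 6.2 can be worked through numerically from the probability mass function of Example 6.1. The sketch below assumes the usual definition of the mean of a discrete random variable, E(X) = Σ x PX(x), together with the variance as the expected squared deviation from the mean; the function names are illustrative.

```python
from fractions import Fraction

# Probability mass function of X = number of heads in two tosses of a fair coin.
pmf = {0: Fraction(1, 4), 1: Fraction(2, 4), 2: Fraction(1, 4)}

def mean(pmf):
    """E(X): sum of x * P(X = x) over all values x."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V(X) = E[(X - mu)^2]."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

mu = mean(pmf)
var = variance(pmf)
print(mu, var)  # 1 1/2

# The shortcut V(X) = E(X^2) - mu^2 gives the same value.
e_x2 = sum(x ** 2 * p for x, p in pmf.items())
print(e_x2 - mu ** 2)  # 1/2
```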
The mean and variance of the binomial distribution are np and np(1-p), respectively. Note that the binomial
distributions are used to model situations where there are just two possible outcomes, success and failure. The
following conditions also have to be satisfied.
i) There must be a fixed number of trials called n
ii) The probability of success (called p) must be the same for each trial.
iii) The trials must be independent
Example 6.3: A fair coin is flipped 4 times. Let X be the number of heads appearing out of the four trials.
Calculate the following probabilities:
i) 2 heads will appear
ii) No head will appear
iii) At least two heads will appear
iv) Less than two heads will appear
v) At most 2 heads will appear
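Solution: We can consider the outcomes of the trials to be independent of each other. In addition, the
probability that a head will appear is the same in each trial. Thus, X has a binomial distribution with number of
trials 4 and probability of success (the occurrence of a head in a trial) ½. The probability mass function of X is
given by
P(X = x) = nCx p^x (1−p)^(n−x) = 4Cx (½)^x (½)^(4−x) = 4Cx/16, x = 0, 1, 2, 3, 4. Note that n = 4 and p = 1/2.
i) P(X = 2) = 4C2/16 = 6/16 = 0.375
ii) P(X = 0) = 4C0/16 = 1/16 = 0.0625
iii) P(X ≥ 2) = P(X = 2) + P(X = 3) + P(X = 4) = (6 + 4 + 1)/16 = 11/16 = 0.6875
iv) P(X < 2) = P(X = 0) + P(X = 1) = (1 + 4)/16 = 5/16 = 0.3125
v) P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = (1 + 4 + 6)/16 = 11/16 = 0.6875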
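The five probabilities of Example 6.3 can be obtained from the binomial probability mass function, assumed here in its standard form P(X = x) = nCx p^x (1−p)^(n−x). A sketch with illustrative names:

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 4, Fraction(1, 2)
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

p_exactly_two = pmf[2]
p_none = pmf[0]
p_at_least_two = sum(pmf[2:])
p_less_than_two = sum(pmf[:2])
p_at_most_two = sum(pmf[:3])

print(p_exactly_two, p_none, p_at_least_two, p_less_than_two, p_at_most_two)
# 3/8 1/16 11/16 5/16 11/16
```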
Example 6.4: Suppose that a particular trait of a person (such as eye color or left handedness) is classified on
the basis of one pair of genes and suppose that d represents a dominant gene and r a recessive gene. Thus a
person with dd genes is pure dominance, one with rr is pure recessive, and one with rd is hybrid. The pure
dominance and the hybrid are alike in appearance. Children receive one gene from each parent. If, with respect
to a particular trait, two hybrid parents have a total of four children, what is the probability that exactly three of
the four children have the outward appearance of the dominant gene?
Solution: If we assume that each child is equally likely to inherit either of two genes from each parent, the
probabilities that the child of two hybrid parents will have dd, rr, or rd pairs of genes are, respectively, ¼, ¼,
½. Hence, because an offspring will have the outward appearance of the dominant gene if its gene pair is either
dd or rd, it follows that the number of such children ,say X, is binomially distributed with parameters n equals
4 and p equals ¾. Thus the desired probability is P(X = 3) = 4C3 (¾)³ (¼) = 108/256 = 27/64 ≈ 0.42.
Example 6.5: Suppose it is known that the probability of recovery for a certain disease is 0.4. If random sample
of 10 people who are stricken with the disease are selected, what is the probability that:
(a) exactly 5 of them will recover?
(b) at most 9 of them will recover?
Solution: Let X be the number of persons who will recover from the disease. We can assume that the selection
process will not affect the probability of success (0.4) for each trial by assuming a large diseased population
size. Hence, X will have a binomial distribution with number of trials equal to 10 and probability of success
equal to 0.4.
(a) P(X = 5) = 10C5 (0.4)^5 (0.6)^5 ≈ 0.2007
(b) P(X ≤ 9) = 1 − P(X = 10) = 1 − (0.4)^10 ≈ 0.9999
The Poisson Random Variable
A random variable X, taking on one of the values 0, 1, 2, . . . , is said to have a Poisson distribution if its
probability mass function is given by
P(X = x) = e^(−λ) λ^x / x!, x = 0, 1, 2, . . .
λ is the parameter of this distribution. The mean and variance of the Poisson distribution are both equal to λ. Note that the Poisson distribution is used to model situations where the random variable X is
the number of occurrences of a particular event over a given period of time (or space). In addition, the
following conditions must be fulfilled: events are independent of each other, events occur singly, and
events occur at a constant rate (in other words, for a given time interval the mean number of occurrences is
proportional to the length of the interval).
The Poisson distribution is used as a distribution of rare events such as telephone calls made to a switchboard in
a given minute, the number of misprints per page in a book, road accidents on a particular motorway in one day,
etc. The processes that give rise to such events are called Poisson processes.
Example 6.6: Suppose that the number of typographical errors on a single page of this lecture note has a
Poisson distribution with parameter λ = 1. If we randomly select a page in this lecture note, calculate the
probability that
a) no error will occur.
b) exactly three errors will occur.
c) less than 2 errors will occur.
d) there is at least one error.
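Solution: Let X = number of errors per page.
a) Required: P(X = 0) = ?
P(X = 0) = e^(−1) ≈ 0.3679
b) Required: P(X = 3) = ?
P(X = 3) = e^(−1)/3! ≈ 0.0613
c) Required: P(X < 2) = ?
P(X < 2) = P(X = 0) + P(X = 1) = 2e^(−1) ≈ 0.7358
d) Required: P(X ≥ 1) = ?
P(X ≥ 1) = 1 − P(X = 0) = 1 − e^(−1) ≈ 0.6321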
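The four probabilities asked for in Example 6.6 can be evaluated from the Poisson probability mass function, assumed here in its standard form P(X = x) = e^(−λ) λ^x / x!. The names in this sketch are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with parameter lam."""
    return exp(-lam) * lam ** x / factorial(x)

lam = 1  # mean number of errors per page

p_no_error = poisson_pmf(0, lam)                            # a) P(X = 0)
p_three = poisson_pmf(3, lam)                               # b) P(X = 3)
p_less_than_two = poisson_pmf(0, lam) + poisson_pmf(1, lam) # c) P(X < 2)
p_at_least_one = 1 - p_no_error                             # d) P(X >= 1)

for label, p in [("a", p_no_error), ("b", p_three),
                 ("c", p_less_than_two), ("d", p_at_least_one)]:
    print(label, round(p, 4))
```

Rounded to four decimal places, the values are approximately 0.3679, 0.0613, 0.7358 and 0.6321.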
Example 6.7: If the number of accidents occurring on a highway each day is a Poisson random variable with
parameter λ = 3, what is the probability that no accidents will occur on a randomly selected day in the future?
Solution: Let X= number of accidents per day
Required: P(X = 0) = ?
P(X = 0) = e^(−3) ≈ 0.0498
Note: The Poisson random variable has a wide range of applications in a diverse number of areas. An important
property of the Poisson random variable is that it may be used to approximate a binomial random variable when
the binomial parameter n is large and p is small. The probability that X will be k can be approximated by
replacing λ by np in the Poisson distribution, i.e. P(X = k) ≈ e^(−np) (np)^k / k!.
6.4 Common continuous probability distributions
Normal distribution
The normal distribution plays an important role in statistical inference because many real-life distributions are
approximately normal; many other distributions can be made nearly normal by appropriate data transformations
(e.g., taking the log); and, as the sample size increases, the distribution of the means of samples drawn from a population with any
distribution approaches the normal distribution.
A continuous random variable X is said to follow a normal distribution if and only if its probability density function is
f(x) = [1/(σ√(2π))] exp[−(x − μ)²/(2σ²)], −∞ < x < ∞,
where μ is the mean and σ is the standard deviation. There are
infinitely many normal distributions since different values of μ and σ define different normal distributions. For
instance, when μ = 0 and σ = 1, the above density has the form f(z) = [1/√(2π)] exp(−z²/2). This
particular distribution is called the standard normal distribution, sometimes known as the Z-distribution. The
random variable corresponding to this distribution is usually denoted by Z. If X has a normal distribution with
mean μ and variance σ², we denote it as X ~ N(μ, σ²).
Properties of normal distribution
i) The normal distribution curve is bell shaped, symmetrical about μ and mesokurtic. The p.d.f. attains its
maximum value at x = μ.
ii) Since the line x = μ divides the area under the normal curve into two equal parts, μ is the mean, the median and
the mode of the distribution.
iii) The mean and variance of the normal distribution are μ and σ², respectively.
iv) The total area under the curve and bounded from below by the horizontal axis is 1, i.e. ∫ f(x) dx = 1, with the integral taken from −∞ to ∞.
standard normal table which gives area values bounded by two points. Areas under the standard normal
distribution curve are tabulated in various ways. The most common tables give areas bounded between Z=0 and
a positive value of Z. In addition to the standard normal table, the properties of normal distribution and the
following theorem are useful to make probability calculations very easy for any normal distribution.
ii)
Example 6.8: Let Z be the standard normal random variable. Calculate the following probabilities using the
standard normal distribution table: a) P(0<Z<1.2) b) P(0<Z<1.43) c) P(Z≤0) d) P(-1.2<Z<0) e) P(Z≤-1.43)
f) P(-1.43≤Z<1.2) g) P(Z≥1.52) h)P(Z≥-1.52)
Solution:
a) The probability that Z lies between 0 and 1.2 can be directly found from the standard normal table as
follows: look for the value 1.2 from z column ( first column) and then move horizontally until you find
the value of 0.00 in the first row. The point of intersection made by the horizontal and vertical movements
will give the desired area (probability). Hence P(0<Z<1.2) = 0.3849. Refer to the table below as a guide to
find this probability.
Figure: The area to the left and the right of 0 for z-distribution
d) P(-1.2<Z<0)=P(0<Z<1.2)= 0.3849 due to symmetry
e) P(Z<-1.43) = 1 − P(Z ≥ -1.43) (using the probability of the complement event)
= 1 − [P(-1.43<Z<0) + P(Z≥0)] (since a region can be broken down into non-overlapping regions)
= 1 − [P(0<Z<1.43) + P(Z≥0)] (by symmetry)
= 1 − [0.4236 + 0.5]
= 1 − 0.9236 = 0.0764
Figure: P(Z<-1.43) is the shaded region
f) P(-1.43≤Z<1.2) = P(-1.43≤Z<0) + P(0≤Z<1.2)=P(0<Z≤1.43) + 0.3849= 0.4236 + 0.3849 =0.8085
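Table look-ups such as those in Example 6.8 can be cross-checked numerically. The sketch below builds the standard normal cumulative distribution function from the error function in Python's math module, using the identity Φ(z) = ½[1 + erf(z/√2)]; the function name phi is chosen here for illustration.

```python
from math import erf, sqrt

def phi(z):
    """Cumulative distribution function of the standard normal: P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p_a = phi(1.2) - phi(0)      # a) P(0 < Z < 1.2)
p_d = phi(0) - phi(-1.2)     # d) P(-1.2 < Z < 0)
p_e = phi(-1.43)             # e) P(Z < -1.43)
p_f = phi(1.2) - phi(-1.43)  # f) P(-1.43 <= Z < 1.2)

for label, p in [("a", p_a), ("d", p_d), ("e", p_e), ("f", p_f)]:
    print(label, round(p, 4))
```

The computed values agree with the table-based answers to about four decimal places; small differences in the last digit can arise because the table entries are themselves rounded.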
P(Z > z*) = 0.1446 implies that P(0<Z≤z*) = 0.5 -0.1446=0.3554.
Hence we can look for the value of z* satisfying the above condition from the standard normal table. Thus z*
= 1.06.
b) If P(Z>z*) = 0.8554, then z* lies to the left of zero, because
P(Z>0) = 0.5 is less than P(Z>z*). That is, z* is a negative number.
a)
b)
c)
Example 6.10: Assume that the test scores for a large class are normally distributed with a mean of 74 and a
standard deviation of 10.
(a) Suppose that you receive a score of 88. What percent of the class received scores higher than yours?
(b) Suppose that the teacher wants to limit the number of A grades in the class to no more than 20%. What
would be the lowest score for an A?
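Solution: Let X be the score of a randomly picked student; then X ~ N(74, 10²).
a) P(X > 88) = P(Z > (88 − 74)/10) = P(Z > 1.4) = 0.5 − P(0<Z<1.4) = 0.5 − 0.4192 = 0.0808
Hence 8.08 percent of the students scored higher than you did.
b) Let XA be the lowest mark to get letter grade A. We are given that P(X ≥ XA) = 0.20, so P(0 < Z < (XA − 74)/10) = 0.5 − 0.20 = 0.30. From the standard normal table, (XA − 74)/10 ≈ 0.84, which gives XA ≈ 74 + 0.84x10 = 82.4.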
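Both parts of Example 6.10 can be checked without a table using NormalDist from Python's standard statistics module. This is a sketch; the variable names are illustrative.

```python
from statistics import NormalDist

scores = NormalDist(mu=74, sigma=10)  # test scores: mean 74, standard deviation 10

# (a) Share of the class scoring above 88.
share_above_88 = 1 - scores.cdf(88)

# (b) Lowest score for an A if at most 20% of the class may receive one:
#     the score below which 80% of the class falls.
lowest_a = scores.inv_cdf(0.80)

print(round(share_above_88, 4))
print(round(lowest_a, 1))
```

The first value is close to 0.0808, matching the 8.08 percent quoted in the example, and the second is close to 82.4.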
The chi-square distribution has one parameter called the degrees of freedom, n. Depending on the values of n,
we can have many different chi-square distributions. The mean and the variance of chi-square distribution are
n, and 2n, respectively.
Because of its importance, the chi-square distribution is tabulated for various values of the parameter n (refer to the
table). Thus we may find in the table the value, denoted by χ²α(n), satisfying P(χ² ≥ χ²α(n)) = α. The
example below shows how to read chi-square distribution values.
Example 6.11: To read the chi-square value with 2 degrees of freedom where the area to the right of this value
is 0.005, look for the degrees of freedom, 2, in the first column (df column) and then move horizontally until you
reach the column headed by the value of α, 0.005, in the first row. The point of intersection made by the horizontal and vertical
movement will give the desired chi-square value, 10.597. This value satisfies the following:
P(χ² ≥ 10.597) = 0.005. In a similar way, the chi-square value with 100 degrees of freedom where the area to
the right of this value is 0.975 is 74.222.
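For 2 degrees of freedom the table entry can be verified by hand, because the chi-square distribution with 2 degrees of freedom is an exponential distribution with mean 2, so its right-tail area has the closed form P(χ² ≥ x) = e^(−x/2). This shortcut holds only for 2 degrees of freedom; other rows of the table need numerical methods not shown here. The function name below is illustrative.

```python
from math import exp

def chi2_right_tail_df2(x):
    """Right-tail area P(chi-square >= x) for 2 degrees of freedom only."""
    return exp(-x / 2)

tail = chi2_right_tail_df2(10.597)
print(round(tail, 4))  # 0.005
```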
The t distribution
The t distribution is an important distribution useful in inference concerning population mean/means. This
distribution has one parameter called the degrees of freedom. Depending on the values of the degrees of
freedom, we may have different t distributions. The degrees of freedom is usually denoted by n. In inference on
the population mean, the degrees of freedom is related to sample size. As the sample size or degrees of
freedom increases, the t distribution approaches the standard normal distribution.
The t distribution shares some characteristics of the standard normal distribution and differs from it in others. The t
distribution is similar to the standard normal distribution in the following ways.
i) It is bell-shaped.
ii) It is symmetrical about the mean.
iii) The mean, median, and mode are equal to 0 and are located at the center of the distribution.
iv) The curve never touches the x-axis.
The t distribution differs from the standard normal distribution in the following ways.
i) The variance is greater than 1.
ii) The t distribution is actually a family of curves, one for each number of degrees of freedom.
UNIT SEVEN: SAMPLING AND SAMPLING DISTRIBUTION OF SAMPLE MEAN
Objectives:
After a successful completion of this unit, students will be able to:
Differentiate the two major sampling techniques: probabilistic and non-probabilistic
Apply simple random sampling technique to select sample
Define sampling distribution of the sample mean
For example, you may use the lottery method to draw a random sample by using a set of N tickets, numbered
1 to N, if there are N units in the population. After shuffling the tickets thoroughly, a sample of the
required size, say n, is selected by picking n tickets.
The best method of drawing a simple random sample is to use a table of random numbers. After assigning
consecutive numbers to the units of the population, the researcher starts at any point in the table of random
numbers and reads consecutive numbers in any direction: horizontally, vertically or diagonally. If a number
read from the table corresponds to the number assigned to a unit, then that unit is chosen for the sample.
Suppose that a sample of 6 study centers is to be selected at random from a serially numbered population of 60
study centers. The following table is a portion of a random numbers table used to select the sample.
Row \ Column     1      2      3      4      5    …..     N
 1             2315   7548   5901   8372   5993   …..   6744
 2             0554   5550   4310   5374   3508   …..   1343
 3             1487   1603   5032   4043   6223   …..   0834
 4             3897   6749   5094   0517   5853   …..   1695
 5             9731   2617   1899   7553   0870   …..   0510
 6             1174   2693   8144   3393   0862   …..   6850
 7             4336   1288   5911   0164   5623   …..   4036
 8             9380   6204   7833   2680   4491   …..   2571
 9             4954   0131   8108   4298   4187   …..   9527
10             3676   8726   3337   9482   1569   …..   3880
11             …..    …..    …..    …..    …..    …..   …..
12             …..    …..    …..    …..    …..    …..   …..
13             …..    …..    …..    …..    …..    …..   …..
14             …..    …..    …..    …..    …..    …..   …..
15             …..    …..    …..    …..    …..    …..   …..
 N             3914   5218   3587   4855   4888   …..   8042
If you start in the first row and first column and read the first two digits of each number down the column,
centers numbered 23, 05, 14, … will be selected. However, numbers above the population size (60) are not
included in the sample. In addition, if any number is repeated in the table, it is replaced by the next number
from the same column. Besides, you can start at any point in the table. If you choose column 4 and row 1, the
number to start with is 83. In this way you can read down this column, starting with 83, until 6 valid numbers
are obtained.
The numbers read, then, are as follows:
83, 53, 40, 05, 75, 33, 01, 26
Since 83 and 75 exceed the population size, they are discarded. Hence, the study centers numbered 53, 40, 05,
33, 01 and 26 will be in the sample.
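The reading procedure described above can be sketched in a few lines of Python. This is only an illustration: the list holds the entries of column 4 from the table, the function name `select_units` is hypothetical, and a repeated number is simply skipped in favour of the next entry in the column.

```python
# Column 4 of the random number table above, read from row 1 downwards.
column_4 = ["8372", "5374", "4043", "0517", "7553",
            "3393", "0164", "2680", "4298", "9482"]

def select_units(entries, population_size, sample_size):
    """Use the first two digits of each entry; skip numbers outside
    1..population_size and numbers that were already selected."""
    chosen = []
    for entry in entries:
        unit = int(entry[:2])
        if 1 <= unit <= population_size and unit not in chosen:
            chosen.append(unit)
        if len(chosen) == sample_size:
            break
    return chosen

sample = select_units(column_4, population_size=60, sample_size=6)
print(sample)  # [53, 40, 5, 33, 1, 26]
```

When no printed table is required, `random.sample(range(1, 61), 6)` from the standard library draws a simple random sample of 6 centers directly.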
Simple random sampling gives every unit of the population an equal chance of being selected and so guards
against selection bias. However, from a practical point of view, a list of all the units of a population is often
not possible to obtain. Even if it is possible, it may involve a very high cost which a researcher or an
organization may not be able to afford. In addition, it may result in an unrepresentative sample by chance.
Stratified sampling
Stratified random sampling divides the main population into a number of sub-populations (strata), each of
which is homogeneous with respect to one or more characteristics. The required number of units is then
selected at random from each sub-population, for example by simple random sampling. When the strata are
internally homogeneous, it usually gives more precise results than simple random sampling.
Systematic sampling
In this method, units are selected at equal intervals from the listing of the elements, starting from a randomly
chosen unit. When the listing is in no particular order, this method provides a sample about as good as a simple
random sample, and the sample is comparatively easier to draw. For instance, to study the average monthly
expenditure of households in a city, you may select every fourth household from the household listing,
starting from a household chosen at random among the first four.
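A minimal sketch of systematic selection, assuming a hypothetical listing of 20 households and a sampling interval of 4 (neither the listing nor the function name comes from the original notes):

```python
import random

def systematic_sample(units, interval, rng=random):
    """Pick a random start among the first `interval` units,
    then take every `interval`-th unit after it."""
    start = rng.randrange(interval)
    return units[start::interval]

# Hypothetical household listing, numbered 1 to 20.
households = list(range(1, 21))
sample = systematic_sample(households, interval=4, rng=random.Random(1))
print(sample)
```

Only the starting point is random; once it is fixed, the rest of the sample is determined by the interval.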
Cluster sampling
Cluster sampling is used when a sampling frame is difficult to construct or when other sampling techniques
(such as simple random sampling) are infeasible or costly. For instance, when the units are geographically
scattered it is difficult to apply simple random sampling. The population of elementary units is divided into
groups or clusters that serve as primary sampling units. A selection of the clusters is then made to
form the sample. The precision of estimates based on samples taken using this method is relatively low.
Non-probability sampling techniques
In non-probability sampling, the sample is not based on chance. It is rather determined by personal judgment.
This method is cost effective; however, we cannot make objective statistical inferences. Depending on the
technique used, non-probability samples are classified into quota, judgment (purposive) and convenience
samples.
The problem with using the sample mean to make inferences about the population mean is that the sample
mean will probably differ from the population mean. This error is measured by the standard deviation of the
sampling distribution of the sample mean, which is known as the standard error, $\sigma/\sqrt{n}$. The standard error is
the typical amount of sampling error that arises from taking a sample rather than the whole population.
As the sample size increases, the standard error decreases.
7.3 Central Limit Theorem
If X1, X2, …, Xn is a random sample from a population with mean μ and variance σ², then as n goes to infinity
the distribution of the sample mean, $\bar{X}$, approaches a normal distribution with mean μ and variance σ²/n. That
is, as n gets large, $\bar{X} \sim N(\mu, \sigma^2/n)$ approximately, and its standardized form is
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
Note: The central limit theorem is useful for approximating the distribution of the sample mean based on a large
sample size when the population distribution is non-normal; however, if the population is normal, then the
sampling distribution of the sample mean will be normal regardless of the sample size.
Example 7.2: The uric acid values in normal adult males are normally distributed with mean 5.7 mg and
standard deviation 1 mg. Find the probability that
a) a sample of size 4 will yield a mean less than 5
b) a sample of size 9 will yield a mean greater than 6
Solution: Let X be the amount of uric acid in normal adult males, with mean 5.7 and variance 1.
a) If a sample of size 4 is taken, then $\bar{X}$ ~ N (5.7, 0.25) since the population is normally distributed. Hence
$P(\bar{X} < 5) = P\left(Z < \frac{5 - 5.7}{0.5}\right) = P(Z < -1.4) = 0.0808$
b) If a sample of size 9 is taken, then $\bar{X}$ ~ N (5.7, 1/9) since the population is normally distributed. Hence
$P(\bar{X} > 6) = P\left(Z > \frac{6 - 5.7}{1/3}\right) = P(Z > 0.9) = 1 - 0.8159 = 0.1841$
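The probabilities in Example 7.2 follow from the sampling distribution N(μ, σ²/n) of the sample mean. A minimal sketch using the standard library; the helper name `sample_mean_dist` is illustrative:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 5.7, 1.0  # population mean and standard deviation from Example 7.2

def sample_mean_dist(n):
    """Sampling distribution of the mean of n observations: N(mu, sigma^2 / n)."""
    return NormalDist(mu=mu, sigma=sigma / sqrt(n))

p_a = sample_mean_dist(4).cdf(5)       # P(sample mean < 5) for n = 4
p_b = 1 - sample_mean_dist(9).cdf(6)   # P(sample mean > 6) for n = 9

print(f"(a) {p_a:.4f}")
print(f"(b) {p_b:.4f}")
```

Note that `NormalDist` takes the standard deviation, so the standard error σ/√n is passed rather than the variance σ²/n.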
formulate hypothesis about a population mean
determine an appropriate sample size for estimation
Introduction
We now assume that we have collected, organized and summarized a random sample of data and are trying to
use that sample to estimate a population parameter. Statistical inference is a procedure whereby inferences
about a population are made on the basis of the results obtained from a sample. Statistical inference can be
divided into two main areas: estimation and hypothesis testing. Estimation is concerned with estimating the
values of specific population parameters; hypothesis testing is concerned with testing whether the value of a
population parameter is equal to some specific value.
8.1 Point and interval estimation of the mean
Point estimate: In point estimation, a single sample statistic (such as $\bar{X}$) is calculated from the sample
to provide an estimate of the true value of the corresponding population parameter (such as $\mu$). Such
a single statistic is termed a point estimator, and the specific value of the statistic is termed a point estimate.
For example, the sample mean $\bar{X}$ is an estimator of the population mean $\mu$, and $\bar{x}$ = 10 is an estimate, which is one
of the possible values of $\bar{X}$.
Interval estimate: In most practical problems, a point estimate does not provide information about ‘how close
is the estimate’ to the population parameter unless accompanied by a statement of possible sampling errors
involved based on the sampling distribution of the statistic. Hence, an interval estimate of a population
parameter is a confidence interval with a statement of confidence that the interval contains the parameter value.
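An interval estimate of the population parameter consists of a lower and an upper bound within which the
parameter is expected to lie.
Consider the case where $\sigma$ is known; we can derive a confidence interval for the population mean $\mu$.
Let $z_{\alpha/2}$ be a point on the standard normal curve that cuts an area of $\alpha/2$ to the right, i.e. $P(Z > z_{\alpha/2}) = \alpha/2$. By
the symmetric property of the normal distribution, $P(Z < -z_{\alpha/2}) = \alpha/2$ (see the diagram below).
From the standard normal distribution, we know that
$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$
To obtain the limits of the interval estimate, we use the standardized form of $\bar{X}$ in the above probability:
$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$
which becomes
$P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
We can assert with probability $1 - \alpha$ that the interval $\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$ contains the population
mean we are estimating.
The end points of the interval, $\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ and $\bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, are called confidence limits and the probability
$1 - \alpha$ is called the degree of confidence.
In a similar way a $100(1 - \alpha)\%$ confidence interval for the population mean with unknown variance is
given by
$\bar{X} \pm t_{\alpha/2}(n - 1)\frac{S}{\sqrt{n}}$
where $t_{\alpha/2}(n - 1)$ is the critical value of the t-test statistic providing an area $\alpha/2$ in the right tail of the t-distribution with
n - 1 degrees of freedom. Moreover, as the sample size increases, t is approximately the same as the standard
normal.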
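We use the central limit theorem to approximate the distribution of the sample mean based on a large sample
(n > 30). A large sample size is needed to use the normal distribution when the population is not normal.
Hence, $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ ~ N(0,1) approximately. If $\sigma$ is unknown we can replace it by its sample estimate S.
The resulting $100(1 - \alpha)\%$ confidence interval of $\mu$ becomes $\bar{X} \pm z_{\alpha/2}\frac{S}{\sqrt{n}}$.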
Example 8.1: A drug company is testing a new drug which is supposed to reduce blood pressure. From the six
people who are used as subjects, it is found that the average drop in blood pressure is 2.28 millimeter of
mercury (mmHg) with a standard deviation of 0.95 mmHg. What is the 95% confidence interval for the mean
change in blood pressure? (Assume that the population is normal).
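Solution: Given: n = 6, $\bar{x}$ = 2.28, S = 0.95
Since the population is normal and $\sigma$ is unknown, the 95% confidence interval is $\bar{x} \pm t_{0.025}(5)\frac{S}{\sqrt{n}}$ with $t_{0.025}(5) = 2.571$:
$2.28 \pm 2.571\left(\frac{0.95}{\sqrt{6}}\right) = 2.28 \pm 0.997$
(2.28-0.997, 2.28+0.997)
(1.28, 3.28)
We are 95% confident that the mean drop in blood pressure lies between 1.28 mmHg and 3.28 mmHg for the
sampled population.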
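The interval in Example 8.1 can be reproduced with a short calculation. This is a sketch only: Python's standard library has no t distribution, so the critical value 2.571 (area 0.025 in the right tail, 5 degrees of freedom) is entered by hand from a t table.

```python
from math import sqrt

n, x_bar, s = 6, 2.28, 0.95
t_crit = 2.571  # tabulated t value: right-tail area 0.025, n - 1 = 5 df

margin = t_crit * s / sqrt(n)          # half-width of the interval
lower, upper = x_bar - margin, x_bar + margin

print(f"margin = {margin:.3f}, interval = ({lower:.2f}, {upper:.2f})")
```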
Example 8.2: Punctuality of patients in keeping appointment is of interest to a research team. In a study of
patients flow through the office of general practitioners, it was found that a sample of 35 patients were 17.2
minutes late for appointments, on the average. Previous research had shown the standard deviation to be about 8
minutes. The population distribution was felt to be not normal. What is the 90 percent confidence interval for
the true mean amount of time late for appointment?
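Solution: Given: n = 35, $\bar{x}$ = 17.2, $\sigma$ = 8
Since the sample size is fairly large (n > 30), and since the population standard deviation is known, according to
the central limit theorem, the sampling distribution of the sample mean is approximately normal. Thus, a 90%
confidence interval of the population mean is given by:
$\bar{x} \pm z_{0.05}\frac{\sigma}{\sqrt{n}} = 17.2 \pm 1.645\left(\frac{8}{\sqrt{35}}\right) = 17.2 \pm 2.22$, that is, (14.98, 19.42).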
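A minimal sketch of the large-sample interval for Example 8.2, using the standard normal quantile from the standard library instead of a printed table (the variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

n, x_bar, sigma = 35, 17.2, 8.0
confidence = 0.90

# z value leaving (1 - confidence) / 2 in each tail; about 1.645 for 90%.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * sigma / sqrt(n)

print(f"z = {z:.3f}, interval = ({x_bar - margin:.2f}, {x_bar + margin:.2f})")
```

Changing `confidence` to 0.95 or 0.99 gives the corresponding wider intervals for the same data.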
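iii. For $H_1: \mu < \mu_0$ (left-tailed test) reject $H_0$ if the calculated test statistic is less than the negative critical value, e.g. $Z_{cal} < -z_{\alpha}$.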
Reject if
Reject if
Step 5: Interpret the result.
Errors in Hypothesis Testing
Ideally the hypothesis testing procedure should lead to the rejection of the null hypothesis $H_0$ when it is false
and non-rejection of $H_0$ when it is true. However, the correct decision is not always possible. Since the decision
to reject or not reject a hypothesis is based on sample data, there is a possibility of making an incorrect
decision, or error. Hence, a decision-maker may commit one of two types of errors while testing a null
hypothesis. These errors are summarized as follows:
                 Null Hypothesis ($H_0$)
Decision         True                      False
Reject $H_0$     Type I error ($\alpha$)   Correct decision
Accept $H_0$     Correct decision          Type II error ($\beta$)
Type I error is committed if we reject the null hypothesis when it is true. The probability of committing a type I
error, denoted by $\alpha$, is called the level of significance. The probability level of this error is decided by the
decision-maker before the hypothesis test is performed. Type II error is committed if we do not reject the null
hypothesis when it is false. The probability of committing a type II error is denoted by $\beta$ (Greek letter beta).
For a fixed sample size, as the probability of a type I error increases the probability of a type II error decreases,
and vice versa. Hence we cannot reduce both errors simultaneously without changing the sample size. As the
sample size increases, both error probabilities can be made smaller.
Example 8.3: The life expectancy of people in the year 1999 in a country is expected to be 50 years. A survey
was conducted in eleven regions of the country and the data obtained, in years, are given below:
Life expectancy (years): 54.2, 50.4, 44.2, 49.7, 55.4, 47.0, 58.2, 56.6, 61.9, 57.5, and 53.4.
Do the data confirm the expected view? (Assuming normal population) Use 5% level of significance.
Solution: Let $\mu$ be the mean life expectancy of people in the year 1999 in a country.
1. $H_0: \mu = 50$ (The life expectancy of people in the year 1999 in a country is 50 years)
   $H_1: \mu \ne 50$ (The life expectancy of people in the year 1999 in a country is different from 50 years)
2. Level of significance, α = 0.05.
3. Since $\sigma$ is unknown and the population is normal, the t-test statistic is appropriate.
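Given: n = 11; and we need to compute $\bar{x}$ and S.
$\bar{x} = \frac{\sum x_i}{n} = \frac{588.5}{11} = 53.5$ and $S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{275.16}{10}} \approx 5.25$
$t_{cal} = \frac{\bar{x} - \mu_0}{S/\sqrt{n}} = \frac{53.5 - 50}{5.25/\sqrt{11}} \approx 2.21$
4. For α = 0.05 and two-tailed test, the critical (table) value is: $t_{0.025}(10) = 2.228$
Since $|t_{cal}| = 2.21 < 2.228$, do not reject the null hypothesis $H_0$. That is, the calculated t value lies
just outside the rejection region (the shaded region).
5. Conclusion: At the 5% level of significance, the data do not provide sufficient evidence that the life
expectancy is different from 50 years.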
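The sample mean, the sample standard deviation and the t statistic for the data in Example 8.3 can be computed as follows. This is a minimal sketch with illustrative variable names; the resulting statistic is then compared with the tabulated critical value for n − 1 = 10 degrees of freedom in step 4.

```python
from math import sqrt
from statistics import mean, stdev

life_expectancy = [54.2, 50.4, 44.2, 49.7, 55.4, 47.0,
                   58.2, 56.6, 61.9, 57.5, 53.4]
mu_0 = 50  # hypothesised mean under H0

n = len(life_expectancy)
x_bar = mean(life_expectancy)
s = stdev(life_expectancy)               # sample standard deviation (divisor n - 1)
t_cal = (x_bar - mu_0) / (s / sqrt(n))   # one-sample t statistic

print(f"n = {n}, mean = {x_bar:.2f}, S = {s:.3f}, t = {t_cal:.3f}")
```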
Example 8.4: Suppose that we want to test the hypothesis with a significance level of .05 that the climate has
changed since industrialization. Suppose that the mean temperature throughout history is 50 degrees. During
the last 40 years, the mean temperature has been 51 degrees and the population standard deviation is 2 degrees.
What can we conclude?
Solution:
Let $\mu$ be the mean temperature.
1. $H_0: \mu = 50$ (There is no change in temperature since industrialization)
   $H_1: \mu \ne 50$ (There is a change in temperature since industrialization)
2. Level of significance, α = 0.05.
3. Since n = 40 is large, the Z-test statistic is appropriate.
Given: n = 40; $\sigma$ = 2; $\bar{x}$ = 51; $\mu_0$ = 50
$Z_{cal} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{51 - 50}{2/\sqrt{40}}$ = 3.16
4. For α = 0.05 and two-tailed test, the critical (table) value is: $z_{\alpha/2} = z_{0.025} = 1.96$
Since $|Z_{cal}| = 3.16 > 1.96$, reject the null hypothesis $H_0$. That is, the calculated Z value
lies in the rejection region (the shaded region).
5. Conclusion: There has been a change in temperature since industrialization, at 5% level of significance.
Example 8.5: A study was conducted to describe the menopausal status, menopausal symptoms, energy
expenditure and aerobic fitness of healthy midlife women and to determine the relationship among these factors.
Among the variables measured was maximum oxygen uptake (Vo2max). The mean Vo2max score for a sample of
242 women was 33.3 with a standard deviation of 12.14. On the basis of these data, can we conclude that the
mean score for a population of such women is greater than 30? Use 5% level of significance.
Solution:
Let $\mu$ be the mean Vo2max score for a population of healthy midlife women.
1. $H_0: \mu = 30$ (The mean score for a population of healthy midlife women is 30)
   $H_1: \mu > 30$ (The mean score for a population of healthy midlife women is greater than 30).
2. Level of significance, α = 0.05.
3. Since n = 242 is large, the Z-test statistic is appropriate.
Given: n = 242; S = 12.14; $\bar{x}$ = 33.3; $\mu_0$ = 30
$Z_{cal} = \frac{\bar{x} - \mu_0}{S/\sqrt{n}} = \frac{33.3 - 30}{12.14/\sqrt{242}}$ = 4.23
4. For α = 0.05 and right-tailed test, the critical (table) value is: $z_{\alpha} = z_{0.05} = 1.645$
Since $Z_{cal} = 4.23 > 1.645$, reject the null hypothesis $H_0$. That is, the calculated Z value lies in the
rejection region (the shaded region).
5. Conclusion: The mean Vo2max score for the sampled population of healthy midlife women is greater
than 30 at 5% level of significance.
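The Z statistics of Examples 8.4 and 8.5 and the corresponding decisions can be sketched with one small helper. The function name `z_statistic` is illustrative, and the critical values are taken from the standard normal quantile function rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(x_bar, mu_0, sd, n):
    """Standardised distance of the sample mean from the hypothesised mean."""
    return (x_bar - mu_0) / (sd / sqrt(n))

std_normal = NormalDist()
alpha = 0.05

# Example 8.4: two-tailed test, reject H0 when |Z| exceeds z_{alpha/2}.
z_84 = z_statistic(x_bar=51, mu_0=50, sd=2, n=40)
reject_84 = abs(z_84) > std_normal.inv_cdf(1 - alpha / 2)

# Example 8.5: right-tailed test, reject H0 when Z exceeds z_{alpha}.
z_85 = z_statistic(x_bar=33.3, mu_0=30, sd=12.14, n=242)
reject_85 = z_85 > std_normal.inv_cdf(1 - alpha)

print(f"Example 8.4: Z = {z_84:.2f}, reject H0: {reject_84}")
print(f"Example 8.5: Z = {z_85:.2f}, reject H0: {reject_85}")
```

The only difference between the two cases is the critical value: α is split between both tails in the two-tailed test and placed entirely in the right tail in the right-tailed test.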
8.3 Test of Association (Independence)
We often encounter nominal-scale data. The test of association is useful for determining whether
any relationship or association exists between two nominal variables. For instance, we might be
interested in the relationship between HIV status and sex, lung cancer and smoking habit, political affiliation
and sex, etc.
When observations are classified according to two variables or attributes and arranged in a table, the display is
called a contingency table as shown below:
The test of association or independence uses the contingency table format. Here the variables A and B have
been classified into mutually exclusive categories. The value Oij in row i and column j of the table shows the
observed frequency falling in the joint category i and j. The row and column totals are the sums of their
corresponding frequencies. The sum of the row totals (or of the column totals) gives the grand total n, which
represents the sample size. The procedure to test the association between two variables is summarized as
follows:
Step 1: State the null and alternative hypotheses.
$H_0$: There is no association or relationship between the two variables, that is, the two variables are
independent.
$H_1$: There is an association or relationship between the two variables, that is, the two variables are dependent.
Step 2: State the level of significance, α.
Step 3: Calculate the expected frequencies, Eij, corresponding to the observed frequency in row i and column j.
The expected frequencies in each cell are calculated as:
$E_{ij} = \frac{(\text{total of row } i)(\text{total of column } j)}{n}$
Step 4: Calculate the value of the test statistic:
$\chi^2_{cal} = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
where Oij is the observed frequency of row i and column j and Eij is the expected frequency of row i and
column j.
Step 5: Find the critical (table) value of $\chi^2$ (from Appendix..). The value of $\chi^2_{\alpha}(df)$ corresponds to an area α
in the right tail of the distribution,
where df = (Number of rows – 1)(Number of columns – 1) = (r – 1)(c – 1)
Step 6: Compare the calculated and table values of $\chi^2$. Decide whether the variables are independent or not,
using the following decision rule:
Reject $H_0$ if $\chi^2_{cal}$ is greater than $\chi^2_{\alpha}(df)$. Otherwise do not reject $H_0$.
Example 8.6: The following data on the colour of eye and hair for 6800 individuals were obtained from a
source:
                          Hair colour
Eye colour     Fair    Brown    Black    Red    Total
Blue           1768      808      190     47     2813
Green           946     1387      746     43     3122
Brown           115      444      288     18      865
Total          2829     2639     1224    108     6800
Test the hypothesis that hair colour and eye colour are independently distributed (there is no association
between colour of eye and colour of hair) at the level of α = 0.01.
Solution:
1. $H_0$: There is no association between hair colour and eye colour.
   $H_1$: There is association between hair colour and eye colour.
2. α = 0.01.
3. Calculate the expected frequencies, Eij
………………..
…………………..
5. Critical value
df = (r – 1) (c – 1) = (3 – 1) (4 – 1) = (2) (3) = 6
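The expected frequencies and the chi-square statistic for the table in Example 8.6 can be computed with a short script. This is a sketch under stated assumptions: the variable names are illustrative, and the critical value 16.812 (α = 0.01, 6 degrees of freedom) is typed in from a chi-square table because the standard library has no chi-square distribution.

```python
# Observed frequencies from Example 8.6: rows are eye colours
# (blue, green, brown), columns are hair colours (fair, brown, black, red).
observed = [
    [1768, 808, 190, 47],
    [946, 1387, 746, 43],
    [115, 444, 288, 18],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency under independence: (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over all cells of (O - E)^2 / E.
chi2_cal = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(row_totals) - 1) * (len(col_totals) - 1)
chi2_table = 16.812  # tabulated value for alpha = 0.01 and 6 df

print(f"df = {df}, chi-square = {chi2_cal:.1f}, reject H0: {chi2_cal > chi2_table}")
```

The first cell alone contributes far more than the critical value, so the calculated statistic lies well inside the rejection region.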
Objectives:
Having studied this unit, you should be able to:
formulate a simple linear regression model.
express quantitatively the magnitude and direction of the association between two variables
Introduction
The statistical methods discussed so far are used to analyze the data involving only one variable. Often an
analysis of data concerning two or more variables is needed to look for any statistical relationship or association
between them. Thus, regression and correlation analysis are helpful in ascertaining the probable form of the
relationship between variables and the strength of the relationship.
The simple linear regression model is $Y = \alpha + \beta X + \varepsilon$,
where $\alpha$ = y-intercept that represents the mean value of the dependent variable $Y$ when the independent
variable $X$ is zero; $\beta$ = slope of the regression line that represents the change in the mean of $Y$ for a unit
change in the value of $X$; $\varepsilon$ = error term.
The population parameters $\alpha$ and $\beta$ can be estimated from sample data using the least squares technique. The
estimators of $\alpha$ and $\beta$ are usually denoted by a and b, respectively. The resulting regression line is
$\hat{y} = a + bx$ and the equation is known as the fitted regression line. The estimated values of $Y$ are denoted by $\hat{y}$. The
observed values of $Y$ are denoted by y. The difference between the observed and the estimated values, $y - \hat{y}$,
is known as error or residual, and is denoted by $e$. The residual can be positive, negative or zero.
A best fitting line is the one for which the sum of squares of the residuals, $\sum e^2$, has the minimum value. This is
called the method of least squares. According to this method, one would select a and b such that $\sum e^2 = \sum (y - a - bx)^2$
is minimum. The solution of this minimization problem using partial differentiation is as follows:
$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$ and $a = \bar{y} - b\bar{x}$
Example 9.1: A researcher wants to find out if there is any relationship between the height of a son and that of
his father. He took a random sample of 6 fathers and their sons. The heights in inches are given in the table below:
Height of father (X)   63 65 64 65 67 68
Height of son (Y)      66 68 65 67 69 70
i) Draw the scatter diagram and comment on the type of relationship.
ii) Fit the regression line of Y on X.
iii) Predict the height of the son if his father’s height is 66 inches.
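Solution:
i) [Scatter diagram of son's height (Y) against father's height (X)]
From the scatter plot one can see that the points lie roughly on a straight line.
ii) n = 6, $\sum x = 392$, $\sum y = 405$, $\sum xy = 26476$, $\sum x^2 = 25628$
$b = \frac{6(26476) - (392)(405)}{6(25628) - (392)^2} = \frac{96}{104}$ = 0.923 and $a = \bar{y} - b\bar{x} = 67.5 - 0.923(65.33)$ = 7.2
Hence the fitted regression line is $\hat{y} = 7.2 + 0.923x$.
iii) For x = 66, $\hat{y} = 7.2 + 0.923(66) \approx 68.1$ inches.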
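The least squares estimates for Example 9.1 can be verified directly from the data. A minimal sketch that uses the usual sum formulas for the slope b and the intercept a; the variable names are illustrative.

```python
fathers = [63, 65, 64, 65, 67, 68]   # X, heights in inches
sons = [66, 68, 65, 67, 69, 70]      # Y, heights in inches

n = len(fathers)
sum_x, sum_y = sum(fathers), sum(sons)
sum_xy = sum(x * y for x, y in zip(fathers, sons))
sum_x2 = sum(x * x for x in fathers)

# Least squares slope and intercept.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

predicted = a + b * 66  # predicted son's height for a father of 66 inches
print(f"b = {b:.3f}, a = {a:.2f}, prediction = {predicted:.1f}")
```

Because the intercept is computed from the unrounded slope, it prints as 7.19 rather than the rounded 7.2 used in hand calculation; the prediction is unaffected to one decimal place.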
o r close to 0 indicates a weak linear relationship between the two variables
Examples of correlation coefficients:
Example 9.2: In some locations, there is strong association between concentrations of two different pollutants.
An article reports the accompanying data on ozone concentration x (ppm) and secondary carbon concentration y
(
X 0.066 0.088 0.120 0.050 0.162 0.186 0.057 0.100
Y 4.6 11.6 9.5 6.3 13.8 15.4 2.5 11.8
a. Calculate the correlation coefficient and comment on the strength and direction of the relationship
between the two variables.
Solution: The summary quantities are
The value of r = 0.716 indicates that there is a moderately strong, positive linear relationship between ozone
concentration and secondary carbon concentration.
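A minimal sketch of the calculation, assuming the usual product-moment formula r = Sxy / sqrt(Sxx · Syy). It uses only the eight (x, y) pairs printed in the table above. Since the summary quantities are not reproduced here, the value it returns for these eight pairs is not guaranteed to coincide with the reported coefficient, which may have been computed from a longer data set. The function name `pearson_r` is illustrative.

```python
from math import sqrt

ozone = [0.066, 0.088, 0.120, 0.050, 0.162, 0.186, 0.057, 0.100]   # x (ppm)
carbon = [4.6, 11.6, 9.5, 6.3, 13.8, 15.4, 2.5, 11.8]              # y

def pearson_r(xs, ys):
    """Pearson correlation coefficient from the corrected sums of squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

r = pearson_r(ozone, carbon)
print(f"r = {r:.3f}")
```

A positive value of r means that higher ozone concentrations tend to go with higher secondary carbon concentrations, and the closer r is to 1, the stronger the linear relationship.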