Statistics Notes 2022
STATISTICS NOTES
ii) Continuous variables: These are variables which can assume any value within a specific
range. They are always measured, e.g. height, temperature, weight, radius etc.
iii) Finance
Finance managers, in discharging their finance functions efficiently, depend heavily on
statistical analysis of facts and figures.
Financial forecasting, break even analysis and investment decisions under uncertainty are
part of their activities.
The area of security analysis is also highly quantitative.
iv) Banking
Banks need to gather and analyze information on general economic conditions.
Banks’ reserves are highly influenced by money markets which are not only local but also
international.
The credit department performs statistical analysis to determine how much credit to
extend to various customers.
v) Purchase
The purchasing department makes use of statistical data to frame suitable purchase
policies such as where to buy, how to buy, at what time to buy and at what price to buy.
vi) Accounting
The auditing function makes frequent applications of statistical sampling and estimation
procedures.
The accountant collects data on historical costs in the course of auditing a company’s
financial records and may use regression analysis to analyze the costs.
vii) Personnel
The personnel department frames policies based on facts.
It makes statistical studies of wage rates, incentive plans, cost of living, labor turnover
rates, employment trends, accident rates, employment grievances, performance appraisal,
training programs etc.
Such studies help the personnel department in the process of manpower planning.
viii) Investment
Statistics greatly assists investors in making clear judgments in their investment decisions,
in particular in selecting securities which are safe and which have the best prospects of
yielding a good income.
Numbers are often assigned to the various categories for the purpose of identification. E.g.
for the variable marital status we can assign 1 = married, 2 = single, 3 = divorced, 4 =
widowed, 5 = separated.
The numbers assigned to the various categories do not represent quantity or order and
therefore performing mathematical operations on these numbers would yield meaningless
values.
The counting of members in each group is the only possible arithmetic operation when a
nominal scale is employed. Accordingly we are restricted to using the mode as the measure of
central tendency. There is no measure of dispersion used for nominal scales.
b) Ordinal scale
Items are not only grouped into categories but they are also ranked into some order.
Therefore in an ordinal scale, numerals are used to represent relative position or order among
the values of the variables.
The use of ordinal scale implies a statement of ‘greater than’ or ‘less than’ (equality is also
acceptable) without being able to state how much greater or less. The real difference between
ranks 1 and 2 may be more or less than the difference between ranks 5 and 6.
Since the numbers of this scale have only a rank meaning, the appropriate measure of central
tendency is the median. A percentile or quartile measure is used for measuring dispersion.
Correlations are restricted to various rank order methods. Measures of statistical significance
are restricted to non-parametric methods.
c) Interval scale
Numerals assigned to each measure are ranked in order and the intervals between them are
equal. Hence numerals used represent quantity and some mathematical operations would
yield meaningful values.
However, the zero point is not meaningful, i.e. interval scales have an arbitrary zero and it is
not possible to determine for them what may be called an absolute zero or the unique origin.
The primary limitation of the interval scale is the lack of a true zero; it does not have the
capacity to measure the complete absence of a trait or characteristic.
The Fahrenheit scale is an example of an interval scale. One can say that an increase in
temperature from 30° to 40° involves the same increase in temperature as an increase from
60° to 70°, but one cannot say that a temperature of 60° is twice as warm as a temperature
of 30°, because both numbers depend on where the zero of the scale is set, and that zero is
arbitrary: it does not mark the complete absence of heat. The ratio of the two temperatures,
30° and 60°, means nothing because zero is an arbitrary point.
Interval scales provide more powerful measurement than ordinal scales since the interval
scale incorporates the concept of equality of interval.
As such more powerful statistical measures can be used with interval scales. Mean is the
appropriate measure of central tendency, while standard deviation is the most widely used
measure of dispersion.
Product moment correlation techniques are appropriate and the generally used tests for
statistical significance are the ‘t’ test and the ‘F’ test.
d) Ratio scale
Ratio scales have an absolute or true zero of measurement. E.G: the zero point on a
centimeter scale indicates the complete absence of length or height. But an absolute zero of
temperature is theoretically unattainable and it remains a concept existing only in the
scientist’s mind.
Ratio scales represent the actual amounts of variables. Measures of physical dimensions such
as weight, height, distance, etc. are examples.
All statistical techniques are usable with ratio scales and all mathematical operations
(including multiplication and division) can be used.
Geometric and harmonic means can be used as measures of central tendency and coefficients
of variation may also be calculated.
Secondary data are data that have been gathered earlier for some other purpose. In contrast,
the data that are collected first hand by someone specifically for the purpose of facilitating
the study are known as primary data.
E.G: the demographic statistics collected every ten years are the primary data with the
registrar of persons but the same statistics used by anyone else would be secondary data
with that individual.
Advantages of secondary data
i) It is far more economical as the cost of collecting original data is saved.
ii) Use of secondary data is time saving.
Disadvantages of secondary data
i) One does not always know how accurate the secondary data are.
ii) The secondary data might be outdated.
In mail surveys, questionnaires are mailed to respondents who are supposed to fill them and
return. They are appropriate where the field of investigation is very vast and the informants
are spread over a wide geographical area.
Telephone interviews are similar to personal interviews except that communication between
interviewer and respondents is on telephone instead of direct personal contact.
The largest value is 73 and the smallest is 65. Hence, the range is 73 – 65 = 8 inches.
Frequency Distribution
Ungrouped data
In forming an array a value is repeated as many times as it appears. The number of times a
value appears in the listing is referred to as its frequency. In giving the frequency of a value,
we answer the question, “How frequently does the value occur in the listing?”
When the data is arranged in tabular form by giving its frequencies, the table is called a
frequency table. The arrangement itself is called a frequency distribution.
Quite often it is useful to give relative frequencies instead of actual frequencies. The relative
frequency of any observation is obtained by dividing the actual frequency of the observation
by the total frequency (sum of all frequencies).
If the relative frequencies are multiplied by 100 and expressed as a percentage, we get the
percentage frequency distribution.
An advantage of expressing frequencies as percentages is that one can then compare
frequency distributions of two sets of data.
Example:
The following data were obtained when a die was tossed 30 times. Construct a frequency
table.
1 2 4 2 2 6 3 5 6 3
3 1 3 1 3 4 5 3 5 3
5 1 6 3 1 2 4 2 4 4
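The frequency table for this example can be built by simple counting. The following is a minimal sketch in Python, not part of the original notes; the variable names are illustrative. It prints the frequency, relative frequency and percentage frequency of each face.

```python
from collections import Counter

# Outcomes of the 30 tosses, copied from the example above.
tosses = [1, 2, 4, 2, 2, 6, 3, 5, 6, 3,
          3, 1, 3, 1, 3, 4, 5, 3, 5, 3,
          5, 1, 6, 3, 1, 2, 4, 2, 4, 4]

frequency = Counter(tosses)          # value -> number of times it occurs
total = sum(frequency.values())      # total frequency (30 tosses)

print("Value  Frequency  Relative  Percent")
for value in sorted(frequency):
    f = frequency[value]
    # Relative frequency = actual frequency / total frequency.
    print(f"{value:>5}  {f:>9}  {f / total:>8.3f}  {100 * f / total:>6.1f}%")
```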
Grouped Data
When dealing with a huge mass of data and when the observed values consist of too many
distinct values, it is preferable to divide the entire range of values and group the data into
classes.
E.G: If we are interested in the distribution of ages of people, we could form the classes
0 – 19, 20 – 39, 40 – 59, 60 – 79 and 80 – 99. A class such as 40 – 59 represents all the
people with ages between 40 and 59 years inclusive.
When data are arranged in this way, they are called grouped data. The number of individuals
in a class is called the class frequency.
The following steps are suggested for forming a frequency distribution from the raw data:
i) Range
Scan through the raw data and find the smallest and the largest value. The largest
value minus the smallest value gives the range.
ii) Number of classes
Decide on a suitable number of classes. This could be anywhere from six to twenty.
A point that represents the halfway or dividing point between successive classes is called a
class boundary. If d is the difference between the upper class limit of a given class and the
lower class limit of the succeeding class, then
Upper Class Boundary (UCB) = Upper Class Limit (UCL) + \(\frac{1}{2}d\)
Lower Class Boundary (LCB) = Lower Class Limit (LCL) - \(\frac{1}{2}d\)
The class mark is defined as the midpoint of a class interval. It is computed by adding the
lower and upper class limits of a class and then dividing by 2.
\[ \text{Midpoint} = \frac{1}{2}(\text{UCL} + \text{LCL}) = \frac{1}{2}(\text{UCB} + \text{LCB}) \]
Example
Class      L.C.L   U.C.L   L.B    U.B    Class mark (Midpoint)
10 – 19    10      19      9.5    19.5   14.5
20 – 29    20      29      19.5   29.5   24.5
30 – 39    30      39      29.5   39.5   34.5
40 – 49    40      49      39.5   49.5   44.5
50 – 59    50      59      49.5   59.5   54.5
NB: The upper boundary of one class is the lower boundary of the next.
When the data is grouped, the cumulative frequency distribution gives the total frequency of
all the values less than the upper boundary of a given class.
Example
Find the cumulative frequency distribution for the grouped data given below:
Class Frequency Cumulative frequency (cf)
5 – 19 4 4
20 – 34 12 16
35 – 49 15 31
50 – 64 16 47
65 – 79 22 69
80 – 94 11 80
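The cumulative frequency column is just a running total of the class frequencies. A short Python sketch (the variable names are illustrative and not from the notes) reproduces the table above:

```python
from itertools import accumulate

classes = ["5 - 19", "20 - 34", "35 - 49", "50 - 64", "65 - 79", "80 - 94"]
frequencies = [4, 12, 15, 16, 22, 11]

# Running total: each entry is the frequency of all values below the
# upper boundary of the corresponding class.
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label:<8} {f:>3} {cf:>4}")
```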
2.5 Exercise
A random sample of 50 auto drivers insured with a company and having similar auto
insurance policies was selected. The following data shows monthly auto insurance premium
(in Kshs.000) paid by them.
54 40 45 20 60 30 35 40 55 70 20 15
45 60 45 25 15 30 25 18 35 25 45 56
59 25 27 39 50 56 20 25 30 30 41 25
56 48 45 25 35 60 55 48 38 34 60 60
60 64
i) Group the above data starting with the class 10 -20 exclusive
ii) Represent the data using a Histogram and an Ogive.
3.1 Introduction
A measure of central tendency, also called a measure of location or an average, is a single
value within the range of data that is used to represent all the values in the series.
b) Median
c) Mode
d) Geometric mean
e) Harmonic mean
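Indirect method
\[ \bar{X} = A + \frac{\sum d}{n} \]
where \(A\) = provisional mean, \(d = X - A\) = Deviations from P.M, \(\sum d\) = the sum of deviations from P.M
Grouped series
Direct method
\[ \bar{X} = \frac{\sum fX}{N} \]
Where \(f\) = frequencies, \(N = \sum f\) = number of items
Indirect method
\[ \bar{X} = A + \frac{\sum fd}{N} \]
NB: For a grouped frequency distribution the value of X is taken as the midpoint of each class.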
Examples
1. The monthly sales of ABC stores for the period of 6 months were as follows:
37,000, 48,000, 84,000, 73,000, 35,000, 53,000.
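As a sketch of how this example could be worked, the Python snippet below computes the arithmetic mean of the six monthly sales figures directly (sum divided by the number of items) and indirectly from a provisional mean. The provisional mean of 50,000 is an arbitrary assumption chosen for illustration; any value gives the same result.

```python
sales = [37_000, 48_000, 84_000, 73_000, 35_000, 53_000]
n = len(sales)

# Direct method: sum of the items divided by the number of items.
direct_mean = sum(sales) / n

# Indirect method: provisional mean plus the average deviation from it.
provisional_mean = 50_000                      # assumed value, for illustration only
deviations = [x - provisional_mean for x in sales]
indirect_mean = provisional_mean + sum(deviations) / n

print(direct_mean, indirect_mean)
```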
\[ \bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2} \]
Examples
1. There are two branches of a company employing 100 and 80 employees respectively. The
arithmetic means of the monthly salaries paid by the two branches are $4570 and $6750
respectively. Find the arithmetic mean of the salaries of the employees of the company put
together.
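The combined mean weights each branch mean by its number of employees. A minimal Python sketch of the calculation for this example (variable names are illustrative):

```python
n1, mean1 = 100, 4570   # first branch: employees, mean monthly salary ($)
n2, mean2 = 80, 6750    # second branch: employees, mean monthly salary ($)

# Combined mean = (N1 * X1 + N2 * X2) / (N1 + N2)
combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
print(round(combined_mean, 2))
```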
2. Find \(\frac{N}{2}\), where \(N = \sum f\)
3. Check the cumulative frequency just greater than \(\frac{N}{2}\)
4. The corresponding value of the variable is the median
Interpolation Formula
Steps
1. Construct the less than cumulative frequency distribution
2. Find \(\frac{N}{2}\), where \(N = \sum f\)
3. Check the cumulative frequency just greater than \(\frac{N}{2}\)
4. The corresponding class contains the median and is called the median class.
The median has to be interpolated in the class interval containing the median using the
formula:-
\[ \text{median} = L + \frac{h}{f}\left(\frac{N}{2} - C\right) \]
where \(L\) = Lower class boundary of the median class
\(h\) = Length of the classes
\(f\) = Frequency of the median class
\(N\) = Total frequency
\(C\) = Cumulative frequency of the class preceding the median class.
Examples
1. Find the median of the data below:
a) 5, 5, 4, 7, 0, 7, 8
b) 20, 15, 30, 45, 60, 10
2. Determine the median of the data below.
Grade A B C D E
No of students (f) 10 15 67 50 21
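A possible way to check these examples in Python is sketched below. For the raw data the standard-library `median` function is used; for the grade table the steps above are followed (find N/2, then the first cumulative frequency greater than it). The sketch assumes the grades are ordered A to E as listed.

```python
from itertools import accumulate
from statistics import median

# Example 1: raw data (median sorts the values internally).
median_a = median([5, 5, 4, 7, 0, 7, 8])       # odd number of items
median_b = median([20, 15, 30, 45, 60, 10])    # even number: mean of the two middle items

# Example 2: frequency table, assuming the order A, B, C, D, E.
grades = ["A", "B", "C", "D", "E"]
students = [10, 15, 67, 50, 21]
half = sum(students) / 2                       # N / 2
cumulative = list(accumulate(students))
# First grade whose cumulative frequency is just greater than N / 2.
median_grade = next(g for g, cf in zip(grades, cumulative) if cf > half)

print(median_a, median_b, median_grade)
```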
where \(L_{Q_i}\) = Lower class boundary of the ith quartile class
\(h\) = Length of the classes
\(f_{Q_i}\) = Frequency of the ith quartile class
\(N\) = Total frequency
\(C\) = Cumulative frequency of the class preceding the ith quartile class.
Computation of the Deciles
\[ D_i = L_{D_i} + \frac{h}{f_{D_i}}\left(\frac{iN}{10} - C\right) \]
where \(L_{D_i}\) = Lower class boundary of the ith decile class
\(h\) = Length of the classes
\(f_{D_i}\) = Frequency of the ith decile class
\(N\) = Total frequency
\(C\) = Cumulative frequency of the class preceding the ith decile class.
Computation of the Percentiles
\[ P_i = L_{P_i} + \frac{h}{f_{P_i}}\left(\frac{iN}{100} - C\right) \]
where \(L_{P_i}\) = Lower class boundary of the ith percentile class
\(h\) = Length of the classes
\(f_{P_i}\) = Frequency of the ith percentile class
\(N\) = Total frequency
\(C\) = Cumulative frequency of the class preceding the ith percentile class.
NB: Analogous to the graphical method of estimating the median, the quartiles, deciles and
percentiles of a grouped frequency distribution can be estimated using the cumulative
frequency curve (ogive curve).
Examples
1. Find the 1st , 2nd and 3rd quartiles for the following data
13, 9, 18, 15, 14, 21, 7, 10, 11, 20, 5, 18, 25, 16, 17
2. Given below is the number of families in a locality according to their monthly expenditure
Monthly expenditure   No. of families
140 - 150 17
150 - 160 29
160 - 170 42
170 - 180 72
180 - 190 84
190 – 200 107
200 – 210 49
210 – 220 34
220 – 230 31
230 – 240 16
240 – 250 12
Calculate:
i) All the quartiles
ii)7th decile
iii) 90th percentile
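The quartile, decile and percentile formulas share one pattern: locate the class whose cumulative frequency is just greater than iN/k, then interpolate inside it. The Python sketch below applies that pattern to the expenditure table above; the function name `grouped_quantile` is illustrative, and the class limits are treated as class boundaries because the classes are contiguous.

```python
from itertools import accumulate

# Lower class boundaries and frequencies from the expenditure table.
lower_boundaries = [140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240]
frequencies = [17, 29, 42, 72, 84, 107, 49, 34, 31, 16, 12]
h = 10  # length of each class


def grouped_quantile(i, parts):
    """i-th quantile when the data are split into `parts` equal parts
    (parts = 4 for quartiles, 10 for deciles, 100 for percentiles)."""
    total = sum(frequencies)            # N
    target = i * total / parts          # iN / parts
    preceding = 0                       # C: cumulative frequency before the class
    for lower, f, cf in zip(lower_boundaries, frequencies, accumulate(frequencies)):
        if cf > target:                 # cumulative frequency just greater than target
            return lower + (h / f) * (target - preceding)
        preceding = cf
    raise ValueError("i must satisfy 0 < i < parts")


q1, q2, q3 = (grouped_quantile(i, 4) for i in (1, 2, 3))
d7 = grouped_quantile(7, 10)
p90 = grouped_quantile(90, 100)
print(round(q1, 2), round(q2, 2), round(q3, 2), round(d7, 2), round(p90, 2))
```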
Interpolation Formula
\[ \text{Mode} = L + \frac{h\,(f_m - f_1)}{2f_m - f_1 - f_2} = L + \frac{D_1}{D_1 + D_2}\, i \]
Where \(L\) = Lower class boundary of the modal class
\(h\) = Length of the classes
\(f_m\) = Frequency of the modal class
Examples
1. Find the mode for the data below
a) 1, 2, 3, 4, 5, 6; Solution: The mode does not exist
b) 7, 8, 3, 8, 6, 10, 8 Solution: Mode = 8; This is a uni-modal distribution
c) 29,30,60,13,30,7,2,7 Solution: Modes are 30 and 7; This is a bi-modal distribution
d)
X 4 5 6 7 8 9 10
F 2 5 21 18 9 2 1
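The ungrouped cases can be checked by counting how often each value occurs. The helper below is a sketch, not part of the notes: the name `modes` is illustrative, and it adopts the convention used in example (a) that no mode exists when every value occurs equally often.

```python
from collections import Counter


def modes(data):
    """Return the list of modes; `data` may be raw values or a
    {value: frequency} mapping. An empty list means no mode exists."""
    counts = Counter(data)
    top = max(counts.values())
    # If all values are equally frequent, the mode does not exist.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


mode_a = modes([1, 2, 3, 4, 5, 6])
mode_b = modes([7, 8, 3, 8, 6, 10, 8])
mode_c = modes([29, 30, 60, 13, 30, 7, 2, 7])
# Example (d): frequency table given as value -> frequency.
mode_d = modes(dict(zip([4, 5, 6, 7, 8, 9, 10], [2, 5, 21, 18, 9, 2, 1])))
print(mode_a, mode_b, mode_c, mode_d)
```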
It represents the most typical value of the distribution and it should coincide with existing
items
It is not affected by the presence of extremely large or small items
Advantages of the Mode
It is easy to understand
Extreme items do not affect its value
It possesses the merit of simplicity
In either case the median will be about one third as far away from the mean as the mode is. This
means that
Mode = mean – 3(mean – median)
= 3(median) – 2(mean)
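\[ \log \text{G.M} = \frac{1}{n}\left(\log x_1 + \log x_2 + \dots + \log x_n\right) = \frac{1}{n}\sum \log x_i \]
\[ \text{G.M} = \left(x_1^{f_1}\, x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N} \]
\[ \log \text{G.M} = \frac{1}{N}\left(f_1 \log x_1 + f_2 \log x_2 + \dots + f_n \log x_n\right) = \frac{1}{N}\sum f_i \log x_i \]
where \(N = \sum f\)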
Examples
1. The weekly incomes (‘000) of 10 families are given below. Find the geometric mean?
50, 80, 45, 70, 15, 75, 85, 40, 36, 25
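The logarithmic form of the geometric mean translates directly into code. The sketch below (illustrative variable names) averages the base-10 logarithms of the ten incomes and takes the antilog; the standard-library `geometric_mean` is used only as a cross-check.

```python
import math
from statistics import geometric_mean

incomes = [50, 80, 45, 70, 15, 75, 85, 40, 36, 25]  # weekly incomes ('000)

# log G.M = (1/n) * sum(log x_i), then take the antilog.
log_gm = sum(math.log10(x) for x in incomes) / len(incomes)
gm = 10 ** log_gm

print(round(gm, 2), round(geometric_mean(incomes), 2))
```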
X 15 20 25 30 35 40 45 50
F 2 22 29 24 7 8 6 2
Grouped data
\[ \text{H.M} = \frac{\sum f}{\sum \frac{f}{x}} = \frac{f_1 + f_2 + \dots + f_n}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \dots + \frac{f_n}{x_n}} \]
Examples
1. Calculate the Harmonic mean of the following data
11, 13, 15, 16, 19, 22, 13, 20
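For raw data every frequency equals 1, so the grouped formula reduces to n divided by the sum of reciprocals. A minimal Python sketch for this example, with the standard-library `harmonic_mean` as a cross-check (the function name `weighted_hm` is illustrative):

```python
from statistics import harmonic_mean

data = [11, 13, 15, 16, 19, 22, 13, 20]


def weighted_hm(values, freqs):
    """H.M = sum(f) / sum(f / x), the grouped-data formula."""
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))


# Ungrouped data: every value has frequency 1.
hm = weighted_hm(data, [1] * len(data))
print(round(hm, 2), round(harmonic_mean(data), 2))
```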
3.5 Exercise
1. What are the requirements of a good average? Compare the mean, the median and the mode
in the light of these requirements.
2. Find the mean, median and mode for the following set of data
i) 3, 5, 2, 6, 5, 9, 5, 2, 8 and 6
ii) 51.6, 48.7, 50.3, 49.5 and 48.9
3. The following data pertain to marks obtained by 120 students in their final examination in
mathematics:
Marks Number of Students
30 -39 1
40 – 49 3
50 – 59 11
60 – 69 21
70 – 79 43
80 -89 32
90 - 99 9
Total 120
Mean deviation
Standard deviation / Variance
Limitations
It is not based on each and every value of the distribution
It is subject to fluctuations of considerable magnitude from sample to sample
It cannot be computed in case of open-ended distributions
It does not explain or indicate anything about the character of the distribution within the
two extreme observations.
Finding the difference between two values e.g. wages earned by different employees.
mode, ignoring the signs of deviation. If \(x_1, x_2, \dots, x_n\) are n observations then the mean deviation
about the mean is calculated as:
For ungrouped data: \[ M.D = \frac{\sum |x - \bar{x}|}{n} \]
For grouped data: \[ M.D = \frac{\sum f\,|x - \bar{x}|}{\sum f} \]
Examples
1. Calculate the mean deviation of the following values
3000, 4000, 4200, 4400, 4600, 4800, 5800
2. Calculate the average deviation from the mean for the following
Sales (thousands) 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60
No. of days (f) 3 6 11 3 2
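Both examples can be handled by one function, since ungrouped data are simply the case where every frequency is 1. The sketch below uses class midpoints for the grouped data; the function name `mean_deviation` is illustrative.

```python
def mean_deviation(values, freqs=None):
    """Mean deviation about the mean: sum(f * |x - mean|) / sum(f)."""
    if freqs is None:
        freqs = [1] * len(values)          # ungrouped data
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n


# Example 1: ungrouped values.
md1 = mean_deviation([3000, 4000, 4200, 4400, 4600, 4800, 5800])

# Example 2: grouped data, using the class midpoints 15, 25, ..., 55.
md2 = mean_deviation([15, 25, 35, 45, 55], [3, 6, 11, 3, 2])

print(round(md1, 2), round(md2, 2))
```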
Demerits
1. Ignores algebraic signs while taking the deviations
2. Cannot be computed for distributions with open-ended class
3. Rarely used in sociological studies
Standard deviation is the square root of the variance. It is denoted by \(s\) for sample data and
\(\sigma\) for population data.
\[ \sigma^2 = \frac{\sum (x - \bar{x})^2}{n} \]
where \(\sum (x - \bar{x})^2\) = sum of squares of the deviations from arithmetic mean
Variance for grouped data
\[ \sigma^2 = \frac{\sum f\,(x - \bar{x})^2}{\sum f} \]
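Computing the standard deviation
Standard deviation for ungrouped data
\[ \sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \]
Standard deviation for grouped data
\[ \sigma = \sqrt{\frac{\sum f\,(x - \bar{x})^2}{\sum f}} \]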
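NB: The computations of \(\sigma^2\) can be simplified by using the following version of the formula
\[ \sigma^2 = \frac{\sum x^2}{n} - \bar{x}^2 \quad \text{(ungrouped data)} \]
\[ \sigma^2 = \frac{\sum f x^2}{\sum f} - \bar{x}^2 \quad \text{(grouped data)} \]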
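observations is given by
\[ \bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2} \]
\[ \sigma_{12} = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}} \]
where \(d_1 = \bar{X}_1 - \bar{X}_{12}\); \(d_2 = \bar{X}_2 - \bar{X}_{12}\)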
NB: The above formula can be extended to find out the standard deviation of three or more
groups. For example, combined standard deviation of three groups would be
\[ \sigma_{123} = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_3\sigma_3^2 + N_1 d_1^2 + N_2 d_2^2 + N_3 d_3^2}{N_1 + N_2 + N_3}} \]
Example
1. The number of workers employed, the mean wage per week and the standard deviation in each
branch of a company are given below. Calculate the mean wages and standard deviation of all
workers taken together for the factory.
Coefficient of Variation
The measures of dispersion which are expressed in terms of the original units of the
observations are termed as absolute measures. Such measures are not suitable for comparing
the variability of two distributions which are not expressed in the same units of measurements.
Therefore it is better to use relative measures of dispersion, which are obtained as ratios or
percentages and are thus pure numbers independent of the unit of measurement.
Standard deviation is an absolute measure of dispersion and a relative measure based on the
standard deviation is called the coefficient of variation. It is a pure number and suitable for
comparing the variability, homogeneity or uniformity of two or more distributions. It is given as
a percentage and calculated as
\[ \text{Coefficient of variation (CV)} = \frac{\sigma}{\text{Mean}} \times 100 \]
The lower the C.V, the less the variability and hence the more consistent or stable the distribution.
Example
Over a period of 3 months the daily number of components produced by two comparable
machines was measured, giving the following statistics
Machine A: mean = 242.8; Standard deviation = 20.5
Machine B: mean = 281.3; Standard deviation = 23.0
Which machine has less variability in its performance?
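A short Python sketch of how the comparison could be carried out, using the coefficient of variation formula above (variable names are illustrative):

```python
def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: (standard deviation / mean) * 100."""
    return std_dev / mean * 100


cv_a = coefficient_of_variation(20.5, 242.8)   # Machine A
cv_b = coefficient_of_variation(23.0, 281.3)   # Machine B

print(f"CV(A) = {cv_a:.2f}%, CV(B) = {cv_b:.2f}%")
# The machine with the smaller CV has less relative variability.
less_variable = "A" if cv_a < cv_b else "B"
print("Less variable machine:", less_variable)
```

Although Machine B has the larger standard deviation, its larger mean gives it the smaller coefficient of variation.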
In a symmetrical distribution the values of mean, median and mode are alike. If the value of
mean is greater than the mode, skewness is said to be positive. If the value of mode is greater
than mean, skewness is said to be negative.
Karl Pearson’s coefficient of skewness is frequently used for measuring skewness and it is
calculated as
\[ SK_p = \frac{\text{Mean} - \text{Mode}}{\sigma} \]
But \(\text{Mean} - \text{Mode} = 3(\text{Mean} - \text{Median})\). Thus the formula for calculating the coefficient of
skewness can be written as
\[ SK_p = \frac{3(\text{Mean} - \text{Median})}{\sigma} \]
Kurtosis refers to the degree of flatness or peakedness of a frequency curve. The degree of
peakedness of a distribution is measured relative to the peakedness of the normal
distribution.
If a distribution is more peaked than the normal curve, it is called Leptokurtic; if it is more
flat-topped than the normal curve, it is called Platykurtic. The normal curve
is itself known as Mesokurtic.
[Figure: frequency curves comparing a leptokurtic curve and a platykurtic curve]
4.6 Activities
1. The following table indicates the marks obtained by students in a statistics test.
Marks Number of students
0 – 20 5
20 – 40 7
40 – 60 -
60 – 80 8
80 – 100 7
The arithmetic mean for the class was 52.5 marks. You are required to determine the value
of:
i) The missing frequency
ii) The median mark
iii) The modal mark
iv) The standard deviation
v) The coefficient of skewness
2. From the prices of the shares X and Y given below, state which share is more stable in value,
which one you would invest in, and why.
X: 55 54 52 53 56 58 52 50 51 49
Y: 108 107 105 105 106 107 104 103 104 101
3. An analysis of the monthly wages paid to workers of two firms A and B belonging to the same
industry gives the following results:
Firm A Firm B
No. of wage earners 586 648
Average monthly wage 52.5 47.5
Standard deviation 10 11
Compute the combined standard deviation.
EXAMPLE
Let the set of all outcomes (sample space) in the experiment of tossing two coins be
S = {HH, HT, TH, TT}. Then
A = {HT, TH} is the event of getting just one head/tail
B = {HH, HT, TH} is the event of getting at least one head
∅ = { } is the impossible event
S = {HH, HT, TH, TT} is the sure event
An elementary event or simple event is the event containing only one point of the sample
space. E.g. in the toss of two coins, the following are elementary events:
{HH}, {HT}, {TH}, {TT}.
A random variable is a function which assigns a numerical value to each simple event in a
sample space.
Example
Suppose that three students are selected at random from a class and each is asked whether he
smokes (S) or he does not (N). Then the sample space of this experiment is given by
S = {SSS, SSN, SNS, SNN, NSS, NSN, NNS, NNN}
Let X denote the number of smokers among the three students chosen. Then:
The probability distribution of a random variable can be described by listing all the values that
the random variable can assume together with the corresponding probabilities. Such a listing is
called a probability distribution or probability mass function of the random variable.
Example
Suppose X represents the number of heads in a random experiment of tossing three coins.
The sample space is:
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
The probability distribution of the random variable X defined as the “number of heads” is
x    P(X = x)
0    1/8
1    3/8
2    3/8
3    1/8
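The table can be derived mechanically by listing the eight equally likely outcomes and counting the heads in each. The following Python sketch (illustrative names, not part of the notes) does this with exact fractions:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2 * 2 * 2 = 8 equally likely outcomes of tossing three coins.
sample_space = list(product("HT", repeat=3))

# X = number of heads in each outcome.
counts = Counter(outcome.count("H") for outcome in sample_space)

# P(X = x) = (number of outcomes with x heads) / (total outcomes).
distribution = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
```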
In general, suppose X is a random variable that assumes the values x1, x2, …, xk. If we
represent the probability that X assumes the value xi by P(X=xi), then the probability function
can be given in the form of a table as
X P(x)
x1 p(x1)
x2 P(x2)
. .
. .
. .
xk P(xk)
Sum = 1
The sum of the probabilities, i.e. \(\sum_{i=1}^{k} p(x_i) = p(x_1) + p(x_2) + \dots + p(x_k)\), is one.
ii) The sum of all probabilities is equal to one, i.e. \(\sum_{i=1}^{k} p(x_i) = 1\)
Example
The number of telephone calls received in an office between 9 – 10 am has the probability
distribution as shown below:
probability and summing the terms. That is, if \(x_1, x_2, \dots, x_n\) are the values assumed by a random
variable with respective probabilities \(p(x_1), p(x_2), \dots, p(x_n)\), then its mean (also called the
expected value) is given by
\[ \mu = x_1 p(x_1) + x_2 p(x_2) + \dots + x_n p(x_n) = \sum_{i=1}^{n} x_i\, p(x_i) \]
The positive square root of the variance is called the standard deviation of the random
variable. The variance is commonly denoted as \(\sigma^2\), hence the standard deviation equals \(\sigma\).
Example
Suppose we are given the following data relating to the breakdown of a machine in a certain
company during a given week, where x represents the number of breakdowns of the machine and
P(x) represents the probability value of x.
x 0 1 2 3 4
P(x) 0.12 0.20 0.25 0.30 0.13
Find the mean and the variance of the number of breakdowns per week for this machine
NB: The computations of \(\sigma^2\) can be simplified by using the following version of the formula:
\[ \sigma^2 = \sum x^2\, P(x) - \mu^2 \]
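As a sketch of the breakdown example, the Python snippet below computes the mean as the probability-weighted sum of the values and the variance in two ways: from the definition (weighted squared deviations from the mean) and from the simplified version above. Variable names are illustrative.

```python
x_values = [0, 1, 2, 3, 4]
probs = [0.12, 0.20, 0.25, 0.30, 0.13]

# Mean (expected value): sum of x * P(x).
mean = sum(x * p for x, p in zip(x_values, probs))

# Variance from the definition: sum of (x - mean)^2 * P(x).
variance_def = sum((x - mean) ** 2 * p for x, p in zip(x_values, probs))

# Variance from the simplified formula: sum of x^2 * P(x) minus mean^2.
variance_short = sum(x ** 2 * p for x, p in zip(x_values, probs)) - mean ** 2

print(round(mean, 2), round(variance_def, 4), round(variance_short, 4))
```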
\[ p(n, x) = \binom{n}{x} p^x q^{n-x} \]
The mean of a Binomial distribution = \(np\)
Example
There are five flights daily from Moi International airport to Jomo Kenyatta International airport.
Suppose the probability that any flight arrives late is 0.2. What is the probability that: -
i) None of the flights are late today?
ii) Exactly one of the flights is late today?
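The two probabilities follow from the binomial formula with n = 5 and p = 0.2. A minimal Python sketch, assuming the flights arrive late independently of one another (the function name `binomial_pmf` is illustrative):

```python
from math import comb


def binomial_pmf(n, x, p):
    """P(X = x) = C(n, x) * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)


n, p = 5, 0.2                      # five daily flights, P(late) = 0.2
p_none_late = binomial_pmf(n, 0, p)
p_one_late = binomial_pmf(n, 1, p)

print(round(p_none_late, 4), round(p_one_late, 4))
```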
E.G: To find the area between z = 0 and z = 1.73, we go to 1.7 in the column and 0.03 in the
row and read the corresponding entry as 0.4582. Hence the area between 0 and 1.73 is
0.4582 and \(P(0 < z < 1.73) = 0.4582\)
NB:
i) The curve is symmetrical w.r.t the vertical axis through zero
ii) It is strongly recommended that we sketch the curves and identify the areas under the
curve and the values along the horizontal axis.
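Table entries such as 0.4582 can be cross-checked numerically. The sketch below is not part of the notes: it computes the standard normal cumulative probability from the error function in Python's `math` module, and the helper name `phi` is illustrative.

```python
import math


def phi(z):
    """Cumulative probability P(Z <= z) for a standard normal variable."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Area under the standard normal curve between z = 0 and z = 1.73.
area = phi(1.73) - phi(0)
print(round(area, 4))

# Symmetry about zero: the area between -1.73 and 0 is the same.
mirror_area = phi(0) - phi(-1.73)
```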
EXAMPLES
1. If \(P(0 < z < c) = 0.3944\), find c.
2. Find \(P(-2.42 < z < 0.8)\)
3. Find a) \(P(1.8 < z < 2.8)\) b) \(P(-2.8 < z < -1.8)\)
5. Suppose z is a standard normal variable. In each of the following cases find c for which
P z c 0.1151
a)
P z c 0.8238
b)
P 1 z c 0.1525
c)
P c z c 0.8164
d)
Having considered areas under the standard normal curve, we now consider the general case
of a normal distribution with any mean \(\mu\) and any standard deviation \(\sigma\), where \(\sigma > 0\).
If X is a normal random variable with mean \(\mu\) and standard deviation \(\sigma\), then X can be
converted into a standard normal variable z by setting
\[ z = \frac{X - \mu}{\sigma} \]
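The conversion can be illustrated with a small Python sketch using `statistics.NormalDist`. The values \(\mu = 30\), \(\sigma = 4\) and the cut-off x = 36 are assumptions chosen purely for illustration; the point is that the probability computed on the original scale equals the one computed on the z scale.

```python
from statistics import NormalDist

mu, sigma = 30, 4      # assumed mean and standard deviation (illustrative)
x = 36                 # hypothetical cut-off value

# Standardize: subtract the mean and divide by the standard deviation.
z = (x - mu) / sigma

# P(X < x) on the original scale and P(Z < z) on the standard scale agree.
p_original = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist().cdf(z)

print(z, round(p_original, 4), round(p_standard, 4))
```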
EXAMPLE 6
Suppose X has a normal distribution with \(\mu\) = 30 and \(\sigma\) = 4. Find
5.6 Activities
1. A salesman who sells cars for General Motors claims that he sells the largest number of cars
on Saturday. He has the following probability distribution for the number of cars he expects
to sell on a particular Saturday.
No. of cars (x) Probability P(x)
0 .1
1 .2
2 .3
3 .3
4 .1
Total 1.0
i) On a typical Saturday, how many cars does the salesman expect to sell?
ii) What is the variance of the distribution?
2. In a recent survey, 90% of the homes in a city were found to have colored TV’s. In a sample
of nine homes, what is the probability that:
i. All nine have colored TV’s?
ii. Less than five have colored TV’s?
iii. More than five have colored TV’s?
iv. At least seven homes have colored TV’s?
3. The life times of electric components manufactured by Raman Industries Ltd are normally
distributed with mean of 2500 hours and standard deviation of 600 hours. If the daily
production is 500 components, how many are expected to have a life time of:
i) Less than 2600 hours
ii) Between 2350 hours and 2580 hours
iii) More than 2380 hours
6.1 Introduction
The field of inferential or inductive statistics is concerned with studying facts about populations.
Specifically, the interest is in learning about the population parameters. This is accomplished by
picking a sample and computing the values of the appropriate statistics.
A parameter is a numerical descriptive measure of a population. Because it is based on all the
observations in the population, its value is almost always unknown.
A sample statistic is a numerical descriptive measure of a sample. It is calculated from the
observations in the sample.
NB: The term statistic refers to sample quantity and the term parameter refers to a population
quantity.
Sampling is the process of selecting a sample from a population.
b) Non-probability sampling
It is used when a researcher is not interested in selecting a sample that is representative of the
population.
i) Purposive Sampling
It allows the researcher to use cases that have the required information with respect to the
objectives of his or her study e.g. educational level, age group, religious sect etc.
ii) Quota Sampling
The researcher purposively selects subjects to fit the quotas identified, e.g. Gender: Male or
Female; Class Level: Graduate or Undergraduate; Religion: Muslim, Protestant, Catholic,
Jewish; Socio-economic class: Upper, middle or lower.
iii) Snowball sampling
It is used when the population that possesses the characteristics under study is not well
known and can be best located through referral networks. Initial subjects are identified who
in turn identify others. It is commonly used in studies of drug cultures, teenage gang activities,
the Mungiki sect, insider trading, the Mau Mau etc.
iv) Convenience or Accidental Sampling
Involves selecting cases or units of observation as they become available to the researcher
e.g. asking a question to the radio listeners, roommates or neighbours.
i) Economy: Directly observing only a portion of the population requires fewer resources than a
census.
ii) The time factor: A sample may provide an investigator with needed information quickly.
iii) Very large populations: Many populations about which inferences must be made are quite
large and sample evidence may be the only way to obtain information.
iv) Partly inaccessible populations: Some populations contain elementary units so difficult to
observe that they are in a sense inaccessible e.g. in determining consumer attitudes not all of the
users of a product can be queried.
v) The destructive nature of the observation: Sometimes the very act of observing the desired
characteristics of the elementary unit destroys it for the use intended. Classical examples of this
occur in quality control.
vi) Accuracy and sampling: A sample may be more accurate than a census. A sloppily conducted
census can provide less reliable information than a carefully obtained sample.
Sampling error: It comprises the differences between the sample and the population that are due
solely to the particular elementary units that happen to have been selected.
There are two basic causes for sampling error.
One is chance: bad luck may result in untypical choices. Unusual elementary units do
exist, and there is always a possibility that an abnormally large number of them will be
chosen. The main protection against this type of error is to use a large enough sample.
Another cause of sampling error is sampling bias. This is the tendency to favor the selection
of elementary units that have particular characteristics. Sampling bias is usually the result of
a poor sampling plan.
Non sampling error
The other main cause of unrepresentative samples is non sampling error. This type can occur
whether a census or a sample is being used.
A non-sampling error is an error that results solely from the manner in which the observations
are made. The simplest example of non-sampling error is inaccurate physical measurement due to
faulty instruments or poor procedures. Consider the observation of human weights: no two
answers will be of equal reliability.
\(\sigma^2\), the mean and variance of the sampling distribution of \(\bar{x}\) are given by \(\mu_{\bar{x}} = \mu\) and \(\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}\).
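When random samples of size n are drawn without replacement from a finite population of size
N that has a mean \(\mu\) and a variance \(\sigma^2\), the mean and the variance of the sampling distribution
of \(\bar{x}\) are given by
\[ \mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1} \]
If the population size is large compared to the sample size, \(\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}\), approximately.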
The standard deviation of the sampling distribution of \(\bar{x}\) is commonly known as the standard
error of the mean. It is \(\frac{\sigma}{\sqrt{n}}\) when sampling with replacement. For a sample drawn without
replacement from a finite population of size N, the standard error of the mean is \(\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}\).
In the latter case it is approximately \(\frac{\sigma}{\sqrt{n}}\) if the population is very large compared to the sample
size. In our discussion, we shall assume that the population is large enough that \(\frac{\sigma^2}{n}\) can be taken
as the variance of \(\bar{x}\).
The standard error of the mean then depends on two quantities, \(\sigma^2\) and n. It will be large if
\(\sigma^2\) is large, i.e. if the scatter in the parent population is large. On the other hand, the standard error
will be small if the sample size n is large, since with a larger sample we can get more
information about the population mean and consequently less scatter of the sample mean \(\bar{x}\)
about \(\mu\).
The variance of the parent population is usually not under the experimenter’s control. Therefore
one sure way of reducing the standard error of the mean is by picking a large sample – the larger
the better.
So far we have concerned ourselves with two parameters of the sampling distribution of
\(\bar{x}\): \(\mu_{\bar{x}}\) and \(\sigma_{\bar{x}}^2\). We now turn our attention to the distribution itself.
The probability distribution of $\bar{x}$ will very much depend on the distribution of the sampled
population.
Note that if n, the sample size, is large, the distribution of $\bar{x}$ is close to a normal distribution,
of course with mean $\mu$ and variance $\sigma^2/n$. The statement of this result is contained in the central
limit theorem.
The distribution of the sample mean $\bar{x}$ of a random sample drawn from practically any
distribution with mean $\mu$ and variance $\sigma^2$ is approximately normal with mean $\mu$ and
variance $\sigma^2/n$, provided the sample size is large.
The central limit theorem tells us that the shape of the distribution is approximately normal. We
already know that if the population has mean $\mu$ and variance $\sigma^2$, then $\mu_{\bar{x}} = \mu$ and
$\sigma_{\bar{x}}^2 = \sigma^2/n$.
Converting to the z scale, we can give an alternate version of the central limit theorem.
When the sample size is large, the distribution of $\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ is close to that of a standard
normal variable z.
(Recall that to convert to the z scale the rule is: subtract the mean and divide by the standard
deviation of the r.v in question)
Since the central limit theorem applies if the sample size is large, a natural question is, how large
is large enough?
This will depend on the nature of the sampled population.
If the parent population is normally distributed, then the distribution of $\bar{x}$ is normal for
any sample size.
If the parent population has a symmetric distribution, the approximation to the normal
distribution will be reached for a moderately small sample size, as low as 10.
In most instances, the tendency towards normality is so strong that the approximation is fairly
satisfactory with a sample size of about 30.
Example 1
The records of the Dept of health, education and welfare show that the mean expenditure
incurred by a student during 2010 was $5000 and the standard deviation of the expenditure was
$800. Find the approximate probability that the mean expenditure of 64 students picked at
random was
a) More than $4820
Example 2
The length of life (in hours) of a certain type of electric bulb is a random variable with a mean
life of 500 hours and a standard deviation of 35 hours.
What is the approximate probability that a random sample of 49 bulbs will have a mean life
between 488 and 505 hours?
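Both examples above apply the central limit theorem in the same way: treat the sample mean as normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. A minimal sketch using Python's standard library follows; the helper name `prob_mean_between` is an illustrative choice, not part of these notes.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_between(mu, sigma, n, low=None, high=None):
    """Approximate P(low < sample mean < high) using the central limit theorem."""
    sampling_dist = NormalDist(mu, sigma / sqrt(n))  # distribution of the sample mean
    lower = 0.0 if low is None else sampling_dist.cdf(low)
    upper = 1.0 if high is None else sampling_dist.cdf(high)
    return upper - lower

# Example 1 (a): mean expenditure of 64 students exceeds $4820
p_expenditure = prob_mean_between(5000, 800, 64, low=4820)

# Example 2: mean life of 49 bulbs lies between 488 and 505 hours
p_bulbs = prob_mean_between(500, 35, 49, low=488, high=505)

print(round(p_expenditure, 4), round(p_bulbs, 4))
```

In Example 1 the standard error is 800/8 = 100, so 4820 corresponds to z = -1.8; in Example 2 the standard error is 35/7 = 5, so the limits correspond to z = -2.4 and z = 1.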
(not very close to 0 or 1) and if n is large, then the distribution of the sample proportion $x/n$
is approximately normal with mean p and variance $pq/n$, where $p + q = 1$.
Converting to the z scale, it follows that $\frac{x/n - p}{\sqrt{pq/n}}$ has a distribution that is very close to the
standard normal distribution provided n is large. This leads to the conclusion that $\frac{x - np}{\sqrt{npq}}$ is
distributed approximately as a standard normal variable.
Example 1
Suppose 10% of the tubes produced by a Machine are defective. If a sample of 100 tubes is
inspected at random
a) Find the expected proportion of defectives in the sample
b) Find the variance of the proportion of defective in the sample
c) Find the approximate distribution of the sample proportion
d) Find the probability that the proportion of defective will exceed 0.16
Example 2
If 60% of the population feels that the president is doing a satisfactory job, find the approximate
probability that in a sample of 900 people interviewed at random, the proportion who share this
view will
a) Exceed 0.65
b) Be less than 0.56
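The two proportion examples can be worked with the normal approximation described above. This is a sketch only; the helper name `sample_proportion_dist` is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def sample_proportion_dist(p, n):
    """Approximate normal distribution of the sample proportion x/n."""
    q = 1 - p
    return NormalDist(p, sqrt(p * q / n))

# Example 1: 10% defective tubes, sample of 100
tubes = sample_proportion_dist(0.10, 100)
print(tubes.mean, tubes.variance)   # (a) expected proportion, (b) variance pq/n
print(1 - tubes.cdf(0.16))          # (d) P(sample proportion > 0.16)

# Example 2: 60% approval, sample of 900
poll = sample_proportion_dist(0.60, 900)
print(1 - poll.cdf(0.65))           # (a) P(proportion > 0.65)
print(poll.cdf(0.56))               # (b) P(proportion < 0.56)
```

For the tubes, the standard deviation is $\sqrt{0.1 \times 0.9 / 100} = 0.03$, so 0.16 lies two standard deviations above the mean.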
Similarly, $S^2$ is an estimator of $\sigma^2$ and $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$ is its estimate computed from a
set of data $x_1, x_2, \dots, x_n$. Also, if X represents the number of successes in a sample of n, then
$X/n$ is an estimator of P, and if in a particular sample there are x successes, then $x/n$ is an
estimate of P.
The major limitation of a point estimate is that it fails to indicate how close it is to the
quantity it is supposed to estimate. In other words, a point estimate does not give any idea
about the reliability or precision of the method of estimation used.
Interval Estimation
Another method of estimating parameters is called the method of Interval Estimation or
Confidence Interval.
It involves computing two points and constructing an interval within which the parameter lies
with a specified degree of confidence. In constructing the end points of the interval, all of the
factors, namely, the point estimate, the population variance, and the sample size, are brought
into play.
When we find a point estimate, we certainly do not expect that it will be exactly equal to the
parameter value. Also, if we take two samples from the same population, we do not
expect the two estimates computed from these samples to be exactly equal. This is due to the
sampling error involved. Thus, the method of point estimation has some drawbacks.
7.4 Confidence Intervals for Population Mean when the Population Variance is
Known.
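If the population has a normal distribution and $\sigma$ is known, then a $(1-\alpha)100$ percent
confidence interval for $\mu$ is given by
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$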
Example 1:
A gas station sold a total of 8019 gallons of gas on 9 randomly picked days. Suppose the amount
sold on a day is normally distributed with a standard deviation of 90 gallons. Construct
confidence intervals for the true mean amount sold on a day with the following confidence
levels:
a) 98%
b) 80%
Example 2:
A random sample of 16 fully grown turkeys had a mean weight of 20.8 kg. If we can assume
from past experience that $\sigma = 2.8$ kg, construct confidence intervals for $\mu$, the true mean weight,
with the following confidence coefficients.
a) 90%
b) 95%
c) 98%
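The intervals asked for in these two examples follow directly from the formula for known $\sigma$. A minimal sketch, assuming the standard normal quantile is obtained from Python's `statistics.NormalDist` rather than from printed tables; the function name `z_confidence_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Example 1: 8019 gallons over 9 days, sigma = 90
gas_mean = 8019 / 9
for level in (0.98, 0.80):
    print(level, z_confidence_interval(gas_mean, 90, 9, level))

# Example 2: 16 turkeys, mean 20.8 kg, sigma = 2.8 kg
for level in (0.90, 0.95, 0.98):
    print(level, z_confidence_interval(20.8, 2.8, 16, level))
```

A higher confidence level uses a larger $z_{\alpha/2}$ and therefore gives a wider interval for the same data.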
differ from $\mu$ by more than a pre-assigned quantity e is $n = \frac{z_{\alpha/2}^2\,\sigma^2}{e^2}$.
Example
A population has a normal distribution with variance 225. Find how large a sample must be
drawn in order to be 95% confident that the sample mean will not differ from the population
mean by more than 2 units.
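The sample size in this example can be computed from $n = z_{\alpha/2}^2 \sigma^2 / e^2$, rounding up to the next whole number. A short sketch, where the function name `required_sample_size` is an assumption for illustration:

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(variance, e, confidence):
    """Smallest n such that the sample mean is within e of the population
    mean with the given confidence (normal population, known variance)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    return ceil(z**2 * variance / e**2)

# Variance 225, maximum error 2 units, 95% confidence
n_needed = required_sample_size(225, 2, 0.95)
print(n_needed)
```

With $z_{0.025} \approx 1.96$ the formula gives about 216.1, so 217 observations are needed.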
7.6 Confidence Interval for Population Mean When the Population Variance is
Unknown
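A $(1-\alpha)100$ percent confidence interval for $\mu$ when the population is normally distributed and
$\sigma$ is not known is given by
$$\bar{x} - t_{n-1,\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{x} + t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}.$$
Note that $t_{n-1,\alpha/2}$ will be very close to $z_{\alpha/2}$ if n is 30 or more. In that case, the above confidence
interval for $\mu$ becomes, approximately,
$$\bar{x} - z_{\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{S}{\sqrt{n}}.$$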
Example 1
When 16 cigarettes of a particular brand were tested in a laboratory for the amount of nicotine
content, it was found that their mean content was 18.3 mg with S =1.8mg.
Set a 90 percent confidence interval for the mean nicotine content in the population of
cigarettes of this brand. (Assume that the amount of nicotine in the cigarette is normally
distributed).
Example 2
In order to estimate the amount of time in minutes that a teller spends on a customer, a bank
manager decided to observe 64 customers picked at random. The amount of time the teller spent
on each customer was recorded. It was found that the sample mean was 3.2 minutes with
$S^2 = 1.44$. Find a 98% confidence interval for the mean amount of time $\mu$.
Example 3
The following data represent the amount of sugar consumed (in pounds) in a household during
five randomly picked weeks: 3.8, 4.5, 5.2, 4.0 and 5.5. Construct a 90% confidence interval for
the true mean consumption $\mu$. (Assume a normal distribution for the amount of sugar consumed.)
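Example 3 can be worked as follows. This is a sketch under one assumption: Python's standard library has no t distribution, so the critical value $t_{4,\,0.05} = 2.132$ is read from a t table and passed in by hand. The function name `t_confidence_interval` is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def t_confidence_interval(data, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is t_{n-1, alpha/2}, looked up in a t table by the caller.
    """
    n = len(data)
    xbar = mean(data)
    s = stdev(data)                 # sample standard deviation, divisor n - 1
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

sugar = [3.8, 4.5, 5.2, 4.0, 5.5]
T_4_005 = 2.132                     # table value: 4 d.f., 5% in each tail
low, high = t_confidence_interval(sugar, T_4_005)
print(round(low, 3), round(high, 3))
```

Here the sample mean is 4.6 and $S^2 = 0.545$, so the interval is 4.6 plus or minus roughly 0.70.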
hypothesis denoted as $H_0$. It should be stated in such a way that it contains the equality sign.
The hypothesis against which the null hypothesis is tested is called the Alternative
does not support the claim under $H_0$, we will reject $H_0$ and conclude that $H_0$ is false.
This error of accepting $H_0$ when it is false is called a type II error or an acceptance error.
a) H A : 0
b) H A : 0
$H_A: \mu > \mu_0$ with $\mu_0 = 450$
where n is the sample size, $\sigma$ is the population standard deviation, which is assumed known, and
$Z_\alpha$ is the value on the z scale such that the area in the right tail is $\alpha$.
It is the one-sided nature of the alternative hypothesis (greater than, >) that prompts the rejection
of $H_0$ if the value of the statistic falls in the right tail of its distribution. The test is therefore
called a one-tailed test, specifically, a right-tailed test.
b) A left-tailed test
Suppose the null and alternative hypotheses are given as
$H_0: \mu = \mu_0$
$H_A: \mu < \mu_0$
Once again, the alternative hypothesis is one sided (less than, <). We reject $H_0$ for smaller
values of $\bar{x}$, leading to the rejection of $H_0$ if the value falls in the left tail of the distribution of
$\bar{x}$ as shown below. This gives a one-tailed test that is specifically a left-tailed test.
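[Figure: left-tailed test. Distribution of $\bar{x}$ centred at $\mu_0$, with the actions on $H_0$ marked on either side of $-Z_\alpha$.]
The critical value C is given by $C = \mu_0 - Z_\alpha \frac{\sigma}{\sqrt{n}}$.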
c) A Two-Tailed test
A test leads to a two-tailed test if the alternative hypothesis is two sided.
Consider the following example:
E.g. Suppose a machine is adjusted to manufacture bolts to the specification of 1-inch diameter,
and we state the null and alternative hypotheses as
$H_0: \mu = 1$
$H_A: \mu \neq 1$
If the sample mean of the diameters was too far off on either side of 1, we would favor rejecting
$H_0$.
The rejection region with $\alpha = 0.05$ has been distributed as $\alpha/2 = 0.025$ at each tail.
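[Figure: two-tailed test. Distribution of $\bar{x}$ centred at $\mu_0$ with an area of $\alpha/2$ in each tail; on the z scale the cut-offs are $-Z_{\alpha/2}$ and $Z_{\alpha/2}$, with the actions on $H_0$ marked for the three regions.]
than $-Z_{\alpha/2}$ or greater than $Z_{\alpha/2}$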
4. Stipulate the value of $\alpha$, the probability of rejecting $H_0$ wrongly. It is the value of $\alpha$ that will
determine the critical point(s). Together with step 2, formulate the decision rule, i.e. determine
the values of the test statistic that will lead to the rejection of $H_0$ (the critical region).
5. Take a random sample and compute the value of the test statistic.
6. The final step consists of making the decision in light of the decision rule formulated in step 4.
It is important to interpret the conclusions in non-statistical language for the benefit of the
uninitiated.
30. The relevant statistic in this case is $\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.
A summary of the test criteria to test $H_0: \mu = \mu_0$ against the three forms of alternative
hypotheses is given below.
Alternative hypothesis    The decision rule is to reject $H_0$ if the computed value is
$\mu > \mu_0$             Greater than $Z_\alpha$
$\mu < \mu_0$             Less than $-Z_\alpha$
$\mu \neq \mu_0$          Less than $-Z_{\alpha/2}$ or greater than $Z_{\alpha/2}$
Example 1
After taking a refresher course, a salesman found that his sales (in dollars) on 9 random days
were 1280, 1250, 990, 1100, 880, 1300, 1100, 950 and 1050. Does the sample indicate that the
refresher course had the desired effect, in that his mean sale is now more than 1000 dollars?
Assume $\sigma = 100$, and the probability of erroneously saying that the refresher course is beneficial
should not exceed 0.01. Also assume that the sales are normally distributed.
Example 2
An IQ test was administered to 9 students and their mean IQ was found to be 95. Assuming the
population variance is 144, is it true that the mean IQ in the population is less than 100?
Use $\alpha = 0.15$, and assume that IQ is normally distributed.
Example 3
A machine can be adjusted so that when under control, the mean amount of sugar filled in a bag
is 5kgs. From past experience, the standard deviation of the amount filled is known to be
0.15kgs.
To check if the machine is in control, a random sample of 16 bags was weighed and the mean
weight was found to be 5.1kgs. At the 5% level of significance, is there evidence to believe that
the adjustment is out of control? [Assume a normal distribution of the amount of sugar filled in a
bag.]
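The three examples above correspond to the three forms of the alternative hypothesis (right-tailed, left-tailed and two-tailed). A minimal sketch of the decision rule follows; the function name `z_test` and its `alternative` argument are illustrative choices, and the critical values come from `statistics.NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test(xbar, mu0, sigma, n, alpha, alternative):
    """Test H0: mu = mu0 when sigma is known.

    alternative: 'greater', 'less' or 'two-sided'.
    Returns (z, reject_H0).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":
        reject = z < -std_normal.inv_cdf(1 - alpha)
    else:
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    return z, reject

# Example 1: sales after the refresher course, H_A: mu > 1000
sales = [1280, 1250, 990, 1100, 880, 1300, 1100, 950, 1050]
print(z_test(mean(sales), 1000, 100, len(sales), 0.01, "greater"))

# Example 2: IQ scores, H_A: mu < 100, variance 144
print(z_test(95, 100, sqrt(144), 9, 0.15, "less"))

# Example 3: sugar bags, H_A: mu != 5
print(z_test(5.1, 5, 0.15, 16, 0.05, "two-sided"))
```

The computed values are z = 3 for the sales data, z = -1.25 for the IQ data and about z = 2.67 for the sugar bags; in each case the value falls in the critical region at the stated level.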
8.6.2 Test of Hypothesis for the Population Mean when the Population Variance is Unknown
and the Sample is Small
In the case where $\sigma$ was known, we used the test statistic $\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.
Since $\sigma$ is not known, we will use its estimator S. Hence the appropriate test statistic is
$$T = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}.$$
At this point we need the added assumption that the population is normally
distributed, especially if n is small. Since, under this assumption, the statistic T has student’s t
distribution with n – 1 d.f, we get the decision rules given in the following table, depending upon
the particular alternative hypothesis
Alternative hypothesis    The decision rule is to reject $H_0$ if the computed value of T is
1. $\mu > \mu_0$          Greater than $t_{n-1,\alpha}$
2. $\mu < \mu_0$          Less than $-t_{n-1,\alpha}$
3. $\mu \neq \mu_0$       Less than $-t_{n-1,\alpha/2}$ or greater than $t_{n-1,\alpha/2}$
Example 4
A car salesman claims that a particular make of car would give a mean mileage of greater than
20 miles per litre. To test the claim, a field experiment was conducted where 10 cars were each
run on one litre of petrol. The results (in miles) were 23, 18, 22, 19, 19, 22, 18, 18, 24, 22.
Do the data corroborate the salesman’s claim? Use $\alpha = 0.05$ and assume a normal distribution
for mileage per litre.
Example 5
A home economist claims that if a person is put on a certain diet, it will lead to a reduction of his
or her weight. The following data record the weights (in pounds) of five people, before and
after the diet. Do the data support the claim at the 5% level of significance?
Person number 1 2 3 4 5
Before the diet 175 168 140 130 150
After the diet 170 169 133 132 143
Example 6
An auto dealer believes that his new model will give mean trouble-free service of at least 12,000
miles. In a simulated test with 4 cars, the following numbers of trouble-free miles were
obtained: 11,000, 12,000, 11,800 and 11,200
Do these data refute the dealer’s claim? Use $\alpha = 0.05$. [Assume a normal distribution.]
Example 7
A machine can be adjusted so that when under control, the mean amount of sugar filled in a bag
is 5 kg. To check if the machine is in control, six bags were picked at random and their weights
were found to be 5.3, 5.2, 4.8, 5.2, 4.8 and 5.3.
At the 5% level of significance, is there evidence to believe that the machine is not in control?
[Assume a normal distribution for the weight of a bag]
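The statistic $T = (\bar{x} - \mu_0)/(S/\sqrt{n})$ for Examples 4 and 5 can be computed as sketched below. Two assumptions are made here: the critical values 1.833 ($t_{9,\,0.05}$) and 2.132 ($t_{4,\,0.05}$) are read from a t table, and Example 5 is treated as a test on the five before-minus-after differences, which is one common way to handle paired data. The function name `t_statistic` is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(data, mu0):
    """T = (xbar - mu0) / (S / sqrt(n)) with n - 1 degrees of freedom."""
    n = len(data)
    return (mean(data) - mu0) / (stdev(data) / sqrt(n))

# Example 4: H_A: mu > 20, right-tailed, 9 d.f.
mileage = [23, 18, 22, 19, 19, 22, 18, 18, 24, 22]
t_mileage = t_statistic(mileage, 20)
T_9_005 = 1.833                       # table value (assumed lookup)
print(t_mileage, t_mileage > T_9_005)

# Example 5: weight reductions (before - after), H_A: mean reduction > 0, 4 d.f.
before = [175, 168, 140, 130, 150]
after = [170, 169, 133, 132, 143]
reductions = [b - a for b, a in zip(before, after)]
t_diet = t_statistic(reductions, 0)
T_4_005 = 2.132                       # table value (assumed lookup)
print(t_diet, t_diet > T_4_005)
```

In both cases the computed T (about 0.68 and about 1.63) is below the table value, so under these assumptions $H_0$ is not rejected at the 5% level.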
We shall specifically consider the problem of testing the null hypothesis $H_0: p = p_0$.
For example, we might be interested in the proportion of defective items produced by a machine and
wish to test $p = 0.2$ against $p > 0.2$; or $p = 0.2$ against $p < 0.2$; or $p = 0.2$ against $p \neq 0.2$.
To carry out a test of hypothesis regarding the population proportion, we pick a sample of n
independent observations and use the sample proportion as the statistic on which the test is
based. If p is the proportion in the population then the sample proportion $x/n$ has a sampling
distribution with mean p and standard deviation $\sqrt{p(1-p)/n}$.
Furthermore, if the sample is large, the shape of the distribution of $x/n$ is approximately
normal. Consequently, under the null hypothesis, which postulates that the population
proportion is $p_0$, $x/n$ has a distribution that is approximately normal with mean $p_0$ and
standard deviation $\sqrt{p_0(1-p_0)/n}$, provided n is large.
We now have a situation analogous to the one where we tested hypotheses regarding the
population mean. The role of $\bar{x}$ is played by $x/n$, that of $\mu_0$ by $p_0$ and that of $\sigma/\sqrt{n}$ by $\sqrt{p_0(1-p_0)/n}$.
The table below gives the 3 cases based on the nature of the alternative hypothesis.
Alternative hypothesis    The decision rule is to reject $H_0$ if the computed value of $\frac{x/n - p_0}{\sqrt{p_0(1-p_0)/n}}$ is
$p > p_0$                 Greater than $Z_\alpha$
$p < p_0$                 Less than $-Z_\alpha$
$p \neq p_0$              Less than $-Z_{\alpha/2}$ or greater than $Z_{\alpha/2}$
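The test statistic in the table can be computed as in the sketch below, using the figures quoted in the two examples that follow (22 defectives in 100 against $p_0 = 0.30$, and 15 travellers in 100 against $p_0 = 0.20$). The function name `proportion_z` is an illustrative choice; the sketch only computes the statistic and leaves the comparison with the critical value to the reader.

```python
from math import sqrt

def proportion_z(x, n, p0):
    """z statistic for H0: p = p0, based on x successes in n trials."""
    return (x / n - p0) / sqrt(p0 * (1 - p0) / n)

z_tubes = proportion_z(22, 100, 0.30)    # defective tubes after repair
z_travel = proportion_z(15, 100, 0.20)   # people intending to travel abroad
print(round(z_tubes, 4), round(z_travel, 4))
```

A negative value indicates a sample proportion below the hypothesized $p_0$; whether it is significant depends on the alternative hypothesis and the chosen $\alpha$.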
Example 1
A machine is known to produce 30% defective tubes. After repairing the machine, it was found
that it produced 22 defective tubes in the first run of 100. Is it true that after the repaired the
Example 2
The proportion of Kenyans who traveled abroad last year was 20%. To find the attitude of
people on foreign travel this year, 100 people were interviewed. Of these 15 said they would
travel and the remaining 85 said they would not. Is there any basis to believe that the attitude has
$$\frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}}$$
The decision rules for various forms of alternative hypothesis are given in the table below.
Alternative hypothesis    The decision rule is to reject $H_0$ if the computed value is
$\mu_1 > \mu_2$           Greater than $Z_\alpha$
$\mu_1 < \mu_2$           Less than $-Z_\alpha$
$\mu_1 \neq \mu_2$        Less than $-Z_{\alpha/2}$ or greater than $Z_{\alpha/2}$
Example 1
For a sample of 15 adult Kenyans picked at random, the mean weight was $\bar{x} = 154$ pounds,
whereas for a sample of 18 people in the U.S, the mean weight was $\bar{y} = 162$ pounds. From past
surveys it is known that the variance of weight in Kenya is $\sigma_1^2 = 100$ and in the U.S it is
$\sigma_2^2 = 169$.
Is it true that there is significant difference between mean weights in the two places? Use
$\alpha = 0.05$. [Assume that the weights are normally distributed]
Example 2
In order to compare two brands of cigarettes, brand A and brand B, for their nicotine content, a
sample of 60 was inspected from brand A and a sample of 40 from brand B. The results of the
tests were summarized as follows.
Brand A    $\bar{x} = 15.4$    $S_1^2 = 3$
Brand B    $\bar{y} = 16.8$    $S_2^2 = 4$
At the 5% level of significance, do the two brands differ in their mean nicotine content?
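Both examples use the two-sample statistic $(\bar{X} - \bar{Y})/\sqrt{\sigma_1^2/m + \sigma_2^2/n}$ with a two-sided alternative. The sketch below is illustrative; the function name `two_sample_z` is an assumption, and in Example 2 the sample variances are used in place of the population variances on the assumption that samples of 60 and 40 are large enough for this.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(xbar, ybar, var1, var2, m, n):
    """z statistic for H0: mu1 = mu2 with (assumed) known variances."""
    return (xbar - ybar) / sqrt(var1 / m + var2 / n)

z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)     # two-tailed, alpha = 0.05

z_weights = two_sample_z(154, 162, 100, 169, 15, 18)   # Example 1
z_nicotine = two_sample_z(15.4, 16.8, 3, 4, 60, 40)    # Example 2

for z in (z_weights, z_nicotine):
    print(round(z, 4), abs(z) > z_crit)
```

The weights example gives z of about -2.00, only just beyond the critical value of 1.96, while the nicotine example gives about -3.61, well inside the rejection region.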
8.7.2 Difference in Population Means when the Variances are unknown but are assumed
equal
The following test procedure is particularly suited for the case when small independent
samples are drawn from normally distributed populations both having the same variance.
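But we are given that the variances are equal. So suppose $\sigma_1^2 = \sigma_2^2$ and let $\sigma^2$ represent the
common value. The above test statistic then reduces to
$$\frac{\bar{X} - \bar{Y}}{\sigma\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}}$$
Since $\sigma$ is not known, we shall use its pooled estimator $S_p$ where
$$S_p^2 = \frac{(m-1)S_1^2 + (n-1)S_2^2}{m+n-2}$$
Therefore, the test statistic appropriate for carrying out the test of $H_0$ is
$$T = \frac{\bar{X} - \bar{Y}}{S_p\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}}$$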
The test procedures for the various forms of the alternative hypothesis are given in the table below.
Alternative hypothesis    The decision rule is to reject $H_0$ if the computed value of T is
4. $\mu_1 > \mu_2$        Greater than $t_{m+n-2,\alpha}$
5. $\mu_1 < \mu_2$        Less than $-t_{m+n-2,\alpha}$
6. $\mu_1 \neq \mu_2$     Less than $-t_{m+n-2,\alpha/2}$ or greater than $t_{m+n-2,\alpha/2}$
Example 3
A nitrogen fertilizer was used on 10 plots and the mean yield per plot was found to be $\bar{x} = 82.5$ kg,
with an estimate $S_1$ of the population standard deviation of yield per plot equal to 10 kg. On the
other hand, 15 plots treated with phosphate fertilizer gave a mean yield $\bar{y} = 90.5$ kg per plot, with
an estimate $S_2$ of the standard deviation of yield per plot equal to 20 kg. At the 5% level of
significance, are the two fertilizers significantly different?
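The pooled test for this example can be sketched as follows. The function name `pooled_t` is illustrative, and the critical value $t_{23,\,0.025} = 2.069$ is an assumed table lookup, since the standard library has no t distribution.

```python
from math import sqrt

def pooled_t(xbar, ybar, s1, s2, m, n):
    """T statistic for H0: mu1 = mu2 with unknown but equal variances."""
    # pooled variance S_p^2 = ((m-1) S1^2 + (n-1) S2^2) / (m + n - 2)
    sp2 = ((m - 1) * s1**2 + (n - 1) * s2**2) / (m + n - 2)
    return (xbar - ybar) / (sqrt(sp2) * sqrt(1 / m + 1 / n))

t_fert = pooled_t(82.5, 90.5, 10, 20, 10, 15)
T_23_0025 = 2.069                 # table value: 23 d.f., two-tailed 5%
print(round(t_fert, 4), abs(t_fert) > T_23_0025)
```

Here $S_p^2 = 6500/23 \approx 282.6$ and T is about -1.17, which lies inside the acceptance region, so the data do not show a significant difference at the 5% level.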
If n items are picked independently from such a population, this leads to the binomial
distribution.
A generalization of this is when the population can be broken into more than two mutually
exclusive categories. For example, a coin could land heads, tails or on edge; when a die is
rolled it could land showing up one of the six faces; a person might be a Democrat, a
Republican, or an independent; a person might be an A, B, O or AB blood type, and so on.
If n independent observations are made from such a population, we get a generalized concept
of the binomial distribution called the Multinomial distribution.
With our background of the last section, we are equipped to test the following null hypothesis
Ho: The Proportion of Democrats in the U.S is 0.60 (implying the proportion of non-
Democrats is 0.40)
In this section we consider how to test a null hypothesis of the following type.
Ho: In the U.S, the proportion of Democrats is 0.55, the proportion of Republicans is 0.35,
and the proportion of independents is 0.10.
To test the above hypothesis, suppose we interview 1000 people picked at random. On the
basis of the stipulated null hypothesis, we would expect 550 Democrats, 350 Republicans
and 100 independents.
If we actually observe 568 Democrats, 342 Republicans and 90 independents in this sample,
we might be quite willing to go along with the null hypothesis.
On the other hand, if the sample yields 460 Democrats, 400 Republicans and 140
independents, we would be reluctant to accept Ho.
Thus in the final analysis, the statistical test will have to be based on how good a fit or
closeness there is between the observed numbers and the numbers that one would expect
from the hypothesized distribution.
Tests of this type which determine whether the sample data are in conformity with the
hypothesized distribution are called tests of goodness of fit, since they literally test how
good the fit is.
The test criterion is provided by a statistic $\chi^2$ whose value for any sample is given as a
number defined by
$$\chi^2 = \sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ represents the observed frequency of the face marked i on the die and $E_i$ the
corresponding expected frequency obtained by assuming that the null hypothesis is true.
Example:
It is believed that the proportions of people with A, B ,O and AB blood types in the population
are, respectively. 0.4, 0.2, 0.3 and 0.1. When 400 randomly picked people were examined, the
observed numbers of each type were 148, 96,106 and 50.
At the 5% level of significance, test the hypothesis that these data bear out the stated belief.
Summary:
1. The population is divided into K categories (classes) C1, C2,…, Ck
2. The null hypothesis stipulates that the probability that an individual belongs to category C1 is
P1, that it belongs to category C2 is P2, and so on.
4. If the null hypothesis is true, then the expected frequencies E1, E2,…,Ek are obtained as
follows: E1 = nP1, E2 = nP2, …, Ek = nPk, where n is the sample size.
5. The departure of the observed frequencies from those expected is measured by means of a
statistic
$$\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \dots + \frac{(O_k - E_k)^2}{E_k}$$
6. If none of the expected frequencies is less than 5, the distribution of $\chi^2$ can be approximated
very closely by a chi-square distribution. Since there are K categories, the number of d.f
associated with the chi-square is K – 1.
7. The critical region for a given level of significance will therefore consist of the right tail of
the chi-square distribution with K – 1 d.f.
Note:
The distribution of the statistic $\chi^2$ employed here is only approximately chi-square. It should not
be used if one or more of the expected frequencies is less than 5.
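The steps in the summary can be sketched in a few lines, applied here to the blood type example above. The function name `chi_square_gof` is illustrative, and the critical value 7.815 (chi-square, 3 d.f., 5% level) is an assumed table lookup.

```python
def chi_square_gof(observed, probabilities):
    """Chi-square goodness-of-fit statistic and its degrees of freedom."""
    n = sum(observed)
    expected = [n * p for p in probabilities]     # E_i = n * P_i
    if min(expected) < 5:
        raise ValueError("an expected frequency is below 5; "
                         "the chi-square approximation should not be used")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1                # K - 1 degrees of freedom

# Blood types A, B, O, AB: hypothesized proportions and observed counts
chi2_blood, df_blood = chi_square_gof([148, 96, 106, 50], [0.4, 0.2, 0.3, 0.1])
CHI2_3_005 = 7.815                                # table value, 3 d.f., alpha = 0.05
print(round(chi2_blood, 4), df_blood, chi2_blood > CHI2_3_005)
```

The expected frequencies are 160, 80, 120 and 40, giving a statistic of about 8.23, which exceeds the table value, so the stated proportions are rejected at the 5% level.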
Here we are interested in observing more than one variable on each individual and finding if
there exists a relationship between these variables. For example: for each person we might
observe both blood type and eye color and investigate if these characteristics are related in
any way.
In short, our goal is to test whether two attributes observed on members of a population are
independent.
As a first step, we pick a sample of size n and classify the data in a two way table on the
basis of the two variables. Such a table is called a contingency table, since it alludes to
whether the distribution according to one variable is contingent on the distribution of the
other. If there are r rows and c columns, it is referred to as an “r by c” contingency table.
The test statistic is given by $\chi^2 = \sum \frac{(O - E)^2}{E}$ with (r-1)(c-1) d.f. The decision rule for an
$\alpha$ level of significance is: Reject Ho if the computed value is greater than the table
value $\chi^2_{(r-1)(c-1),\,\alpha}$.
Example:
In a certain community, 360 randomly picked people were classified according to their age group
and political leaning. The data is presented below:
Political                 Age group
leaning         20-35    36-50    Over 50    Total
Conservative       10       40         10       60
Moderate           80       85         45      210
Liberal            30       25         35       90
Total             120      150         90      360
Test the hypothesis that a person’s age and political leaning are not related. Use $\alpha$ = 0.05.
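For a contingency table, each expected frequency is (row total × column total) / n. The sketch below applies this to the table above; the function name `chi_square_independence` is illustrative, and the critical value 9.488 (chi-square, 4 d.f., 5% level) is an assumed table lookup.

```python
def chi_square_independence(table):
    """Chi-square statistic and d.f. for an r-by-c contingency table.

    table: list of rows of observed counts, without the totals.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

leaning_by_age = [
    [10, 40, 10],   # Conservative
    [80, 85, 45],   # Moderate
    [30, 25, 35],   # Liberal
]
chi2_age, df_age = chi_square_independence(leaning_by_age)
CHI2_4_005 = 9.488                  # table value, 4 d.f., alpha = 0.05
print(round(chi2_age, 3), df_age, chi2_age > CHI2_4_005)
```

The statistic is about 29.35 on 4 d.f., far above the table value, so the hypothesis that age and political leaning are unrelated is rejected.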
same. In short, what we are interested in is whether the three states are homogeneous with
respect to the party affiliations of their residents. Tests that deal with problems of this type
are called tests of Homogeneity:
Once again, the measure of departure from homogeneity is provided by a statistic $\chi^2$ whose
value is given by
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
The distribution of the statistic is approximately chi-square with (r-1)(c-1) d.f, where r
represents the number of rows and c the number of columns. The approximation is
satisfactory if none of the expected frequencies is less than 5.
Example:
In order to investigate whether the distribution of the blood types in Europe is the same as in the
U.S , information was collected on 200 randomly picked people in Europe and 300 people in the
U.S. From the data provided below, is it true that the distributions of blood types in Europe and
the U.S are significantly different?
                  Location
Blood type    Europe    U.S    Total
A                 95    125      220
B                 50     70      120
O                 45     90      135
AB                10     15       25
Total            200    300      500
10.1 Introduction
2) The populations from which the samples are drawn have identical means and variances
i.e.
1 2 3 ... n
In case we are not able to make these assumptions in a particular problem, the analysis of
variance technique should not be used. In such cases, we should consider using a “non-
parametric (distribution-free) technique”.
The procedure followed in the analysis of variance would be explained separately for
1) One-way classification
2) Two-way classification
is divided into two parts – the sum of squares between the samples and the sum of squares
within the samples.
Individual observations in the same treatment samples, however, can differ from each other
only because of chance variation, since each individual within the group receives exactly the
same treatment.
In other words, in one-way classification, the data are classified according to only one
criterion.
$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
i.e. the arithmetic means of the population from which the k samples are randomly drawn
are equal to one another.
The steps involved in carrying out the analysis are:
The variance (sum of squares) between samples reflects the contribution of both different
treatments and chance to inter-sample variability.
Sum of squares is a measure of variability. The sum of squares between samples is denoted
by SSB.
For calculating variance between samples, we take the total of the squares of the variations of
the means of various samples from the grand mean and divide this total by the degrees of
freedom.
$$\bar{\bar{X}} = \frac{\sum X_1 + \sum X_2 + \dots + \sum X_k}{n_1 + n_2 + \dots + n_k}$$
iii) Take the difference between the means of the various samples and the grand mean.
iv) Square the deviations and obtain the total which will give the sum of squares between the
samples; and
The degrees of freedom will be one less than the number of samples, i.e. if there are 4 samples,
then the degrees of freedom will be 4 – 1 = 3. In general v = k – 1 where k = number of
samples.
The variance (sum of squares) within samples measures those inter-sample differences
that arise due to chance only.
It is denoted by SSW. For calculating the variance within the samples we take the total
of the sum of squares of the deviation of various items from the mean values of the
respective samples and divide this total by the degrees of freedom
Thus the steps in calculating variance within the samples will be:
ii) Take the deviations of the various observations in a sample from the mean values of
the respective samples
iii) Square these deviations and obtain the total which gives the sum of squares within
the samples.
iv) Divide the total obtained in step (iii) by the degrees of freedom. The d.f is obtained by
deducting the number of samples from the total number of observations,
i.e. v = n – k, where n refers to the total number of all the observations and k to the number of samples.
F is always computed with the variance between the sample means as the numerator and
the variance within the samples as the denominator.
The denominator is computed by combining the variance within the k samples into single
measures.
Compare the calculated value of F with the table value of F for the given d.f at a certain
critical level (generally we take 5% level of significance).
If the calculated value of F is greater than the table value of F, it indicates that the difference
in sample means is significant,
i.e. it could not have arisen due to fluctuations of random sampling or, in other words, the
samples do not come from the same population.
On the other hand, if the calculated value of F is less than the table value, the difference is
not significant and hence could have arisen due to fluctuations of random sampling.
Example
As head of a department of a consumers’ research organization, you have the responsibility for
testing and comparing lifetimes of four brands of electric bulbs. Suppose you test the lifetime of
three electric bulbs of each of the four brands.
The data is shown below, each entry representing the lifetime of an electric bulb, measured in
hundreds of hours.
Brand
A B C D
20 25 24 23
19 23 20 20
21 21 22 20
Can we infer that the mean lifetime of the four brands of electric bulbs are equal?
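The one-way analysis for this example can be sketched directly from the definitions of the between-samples and within-samples sums of squares. The function name `one_way_anova` is illustrative, and the critical value 4.07 (F with 3 and 8 d.f. at the 5% level) is an assumed table lookup.

```python
from statistics import mean

def one_way_anova(samples):
    """Return (SSB, SSW, F) for a one-way classification."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    # between samples: squared deviations of sample means from the grand mean
    ssb = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # within samples: squared deviations of items from their own sample mean
    ssw = sum((x - mean(s)) ** 2 for s in samples for x in s)
    variance_between = ssb / (k - 1)         # d.f. = k - 1
    variance_within = ssw / (n_total - k)    # d.f. = n - k
    return ssb, ssw, variance_between / variance_within

bulbs = [
    [20, 19, 21],   # brand A
    [25, 23, 21],   # brand B
    [24, 20, 22],   # brand C
    [23, 20, 20],   # brand D
]
ssb, ssw, f_ratio = one_way_anova(bulbs)
F_3_8_005 = 4.07                             # table value, (3, 8) d.f., 5% level
print(ssb, ssw, round(f_ratio, 4), f_ratio > F_3_8_005)
```

Here SSB = 15 on 3 d.f. and SSW = 24 on 8 d.f., so F = 5/3, about 1.67, which is below the table value; the differences between the brand means are not significant at the 5% level.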
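To use the ANOVA table, it is convenient to use the following short-cut computational formulas:
Between samples sum of squares = SSB = $\sum_{j=1}^{k} \frac{T_j^2}{n_j} - \frac{T^2}{N}$
Within samples sum of squares = SSW = $\sum_{j=1}^{k}\sum_{i=1}^{n_j} X_{ij}^2 - \sum_{j=1}^{k} \frac{T_j^2}{n_j}$
Total sum of squares = SST = $\sum_{j=1}^{k}\sum_{i=1}^{n_j} X_{ij}^2 - \frac{T^2}{N}$
Here $T_j$ denotes the total of the j-th sample, T the grand total and N the total number of observations.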
The format for the ANOVA table using the computational formulas is shown below:
Example
Consider the above example.
In order to use the computational formulas the following four quantities must be computed:
$\sum_{j=1}^{k}\sum_{i=1}^{n_j} X_{ij}^2$, $T_j$, $\sum_{j=1}^{k} \frac{T_j^2}{n_j}$, and $\frac{T^2}{N}$.
11.1 Introduction
Correlation analysis is a statistical tool used to ascertain the association between two
variables while regression analysis is used to determine the nature and extent of relationship
between variables. This lesson explains the methods used in studying correlation and
regression.
When three or more variables are studied it is a problem of either multiple or partial
correlation.
In multiple correlation three or more variables are studied simultaneously. In partial
correlation, there are more than two variables but only two variables that are influencing each
other are considered, the effect of other influencing variables being kept constant.
Scatter Diagram
It helps to illustrate diagrammatically any relationships that may exist between two variables.
The following diagrams indicate various degrees of correlation.
Diagram to be drawn
Examples
1. Draw a scatter diagram from the following data
Supply (x) 4 5 8 9 10 12 15
Demand (y) 3 4 6 5 7 8 11
Example
The following data refers to exam marks vs hours of study for a sample of 8 candidates that sat a
statistics exam
4. The closer r is to +1 or to –1, the closer the relationship between the variables and the closer r
is to 0, the less close the relationship.
Advantage
It summarizes in one figure the degree of correlation and whether it is positive or negative.
Limitations
It assumes linear relationship regardless of the fact whether that assumption is true or not.
The coefficient can be misinterpreted.
The value of the coefficient is unduly affected by the extreme values.
It is time consuming.
Example
Two managers are asked to rank a group of employees in order of potential to eventually become
top managers. The rankings are as follows:
I 7 7
J 9 10
Calculate the coefficient of rank correlation and comment on the value.
Example
Calculate the rank correlation Coefficient for the following data of marks of 2 tests given to
candidates for a clerical job
Preliminary Test 92 89 87 86 83 77 71 63 53 50
Final test 86 83 91 77 68 85 52 82 37 57
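The adjustment consists of adding $\frac{1}{12}(m^3 - m)$ to the value of $\sum d^2$, where m stands for the number of items
whose ranks are common.
The formula can thus be written as
$$r = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \dots\right]}{n(n^2 - 1)}$$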
Example
An examination of eight applicants for a clerical post was taken by a firm. From the marks
obtained by the applicants in the accounting and statistics papers, compute the Rank coefficient
of correlation.
Applicant A B C D E F G H
Marks in accounting 15 20 28 12 40 60 20 80
Marks in statistics 40 30 50 30 20 10 30 60
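The rank method, including the adjustment for common ranks, can be sketched as below. Two assumptions are made: rank 1 is given to the highest mark, and tied marks share the average of the ranks they would otherwise occupy. The function names `average_ranks` and `rank_correlation` are illustrative. The sketch is applied to the clerical applicants above and to the preliminary and final test marks from the earlier example, which contain no ties.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = largest value; tied values share the average of their ranks."""
    ordered = sorted(values, reverse=True)
    first_position = {}
    for position, value in enumerate(ordered, start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]

def rank_correlation(x, y):
    """Rank correlation coefficient with the adjustment for tied ranks."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # add (m^3 - m) / 12 for every group of m items sharing a common rank
    adjustment = sum((m**3 - m) / 12
                     for counts in (Counter(x), Counter(y))
                     for m in counts.values() if m > 1)
    return 1 - 6 * (sum_d2 + adjustment) / (n * (n**2 - 1))

accounting = [15, 20, 28, 12, 40, 60, 20, 80]
stats_marks = [40, 30, 50, 30, 20, 10, 30, 60]
r_applicants = rank_correlation(accounting, stats_marks)

preliminary = [92, 89, 87, 86, 83, 77, 71, 63, 53, 50]
final = [86, 83, 91, 77, 68, 85, 52, 82, 37, 57]
r_tests = rank_correlation(preliminary, final)

print(round(r_applicants, 4), round(r_tests, 4))
```

For the applicants, $\sum d^2 = 81.5$ and the adjustment is 0.5 + 2 = 2.5, which gives r = 0. For the two tests, $\sum d^2 = 44$ and r is about 0.73.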
Merits of the Rank method
It is simpler to understand and easier to apply compared to the Karl Pearson’s method.
Where the data are of qualitative nature like honesty, efficiency, intelligence etc, the method
can be used with great advantage.
It is the only method that can be used where we are given the ranks and not the actual values.
Limitations
The method cannot be used for finding out correlation in a grouped frequency distribution.
Where the number of observations exceeds 30, the calculations become quite tedious and
require a lot of time.
is $H_0: \rho = 0$, meaning that there is no relationship between the two variables in the
population.
The test statistic to carry out the test is $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$.
If $H_0$ is true, then this statistic has the Student’s t distribution with n-2 degrees of freedom.
Example
Consider the previous example on Exam marks Vs hours of study where we obtained r = 0.88
and r2 = 0.77 based on a sample with n = 10. Test the hypothesis that the population correlation
coefficient is zero at the 5% level.
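The statistic for this example can be computed as sketched below, taking r = 0.88 and n = 10 as quoted in the example. The function name `correlation_t` is illustrative, and the two-tailed critical value $t_{8,\,0.025} = 2.306$ is an assumed table lookup.

```python
from math import sqrt

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r**2)

t_corr = correlation_t(0.88, 10)
T_8_0025 = 2.306                  # table value: 8 d.f., two-tailed 5%
print(round(t_corr, 3), abs(t_corr) > T_8_0025)
```

The computed value is about 5.24, well beyond the table value, so the hypothesis of zero correlation is rejected at the 5% level.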
Types of Regression
Simple linear regression: Involves a relationship between two variables only.
Multiple regression: Analyses or considers the relationship between three or more variables.
In regression analysis, an attempt is made to determine a line (curve) which best fits the given
pairs of data. In case of a linear relationship, a line with the equation $Y = a + bX$, where a and b are
constants to be determined, is fitted. The constants a and b are determined such that
$$S = \sum (Y - a - bX)^2$$
is a minimum.
With the use of differential calculus, S is minimized for a and b which satisfy the following two
normal equations
$$\sum Y = na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$
Solving these gives
$$\hat{b} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$
$$\hat{a} = \frac{1}{n}\left(\sum Y - \hat{b}\sum X\right) = \bar{Y} - \hat{b}\bar{X}$$
The constant b in the equation is called the regression coefficient of Y on X. It measures the
linear relationship between the two variables X and Y. X is called the independent variable, also
known as the regressor or predictor. Y is called the dependent variable, also known as the
regressed or explained variable.
Example
The following data give the observations on weekly income and expenditure on food for five
households.
Weekly Income (£) 240 270 300 330 360
Expenditure on food(£) 200 220 240 245 250
a) Plot the data on a scatter diagram
b) Determine the least squares regression line of expenditure on weekly income.
c) Using the equation in (b), estimate the expenditure on food for someone having a weekly
income of £380.
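Parts (b) and (c) can be sketched in Python using the formulas for the regression constants. This is an illustrative sketch only. It assumes the fourth income figure is 330 (the printed value 30 breaks the otherwise even steps of 30 between incomes and looks like a dropped digit), and the function name is hypothetical.

```python
def least_squares(x, y):
    """Return (a, b) for the fitted line Y = a + bX."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
    a = (sum_y - b * sum_x) / n
    return a, b

income = [240, 270, 300, 330, 360]  # 330 is an assumption, see the note above
food = [200, 220, 240, 245, 250]

a, b = least_squares(income, food)
estimate = a + b * 380  # part (c): weekly income of 380

print(f"Y = {a:.2f} + {b:.4f}X, estimate at 380 = {estimate:.2f}")
```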
11.8 Activities
1. For the following results showing marks obtained by 15 students, calculate the Rank
correlation
Marks in Maths   50 50 40 39 38 37 36 35 34 33 32 31 30 29 28
Marks in English 50 49 51 52 43 47 42 40 44 40 30 41 32 33 31
2. The following data gives the aptitude test scores and productivity indices of 10 workers
selected at random.
LINEAR PROGRAMMING
ADVANTAGES OF LP
LIMITATIONS OF L.P
1. Each problem has to be modelled according to its own constraints and
requirements. This requires great experience and ingenuity.
2. The number of state variables has to be kept low to prevent complicated
calculations.
3. It treats all relationships as linear, i.e. if the direct cost of producing 10 units
is sh. 100, then the cost of 20 units is assumed to be sh. 200. This may not
always be the case in practice.
4. All the parameters in the linear programming model are assumed to be
known with certainty, which is rarely possible in real situations.
a) Graphical methods
b) Simplex method
Whichever method is adopted, the first step is to formulate the linear
programming problem using the following steps:
Identify all the limitations or constraints in the given problem and then
express them as linear inequalities.
Identify the objective/criterion which is to be optimized (maximized or
minimized) and express it as a linear function of the defined decision
variables.
Example 1:
A manufacturer has two products P1 and P2 both of which are produced in two
steps by machines M1 and M2. The process times per hundred for the
products on the machines are:
P1 4 5 10
P2 5 2 5
Solutions:
[Graph: feasible region plotted with X1 on the horizontal axis and X2 on the vertical axis; corner points A(0,0), B(25,0), C(10,12) and D(0,16).]
Product P1 = 25
Product P2 = 0
Z = 10x1 + 5x2
5x1 + 2x2 + S2 = 80
Initial simplex tableau:
      X1    X2    S1    S2    Solution
S1    4     5     1     0     100
S2    5     2     0     1     80
Z     10    5     0     0     0
Identify the biggest number in the Z row (10). This gives the column of
interest.
Divide the solution quantities by the corresponding elements in the identified column:
100/4 = 25
80/5 = 16
The smallest of the answers obtained is 16, which identifies the row of
interest.
The point where the identified column and the row meet gives the pivot
element (5).
Step 5: Make the pivot element 1 (by dividing the row containing the pivot
element by the value of the pivot element) and give the identified row
a new identity (the identity of the identified column). Then redraw the
initial simplex tableau.
Old row: S2 5 2 0 1 80
New row: X1 1 0.4 0 0.2 16
S1 4 5 1 0 100
X1 1 0.4 0 0.2 16
Z 10 5 0 0 0
Row operations are then done to make the other elements in the identified column zero, while the pivot
element MUST remain one (1). Each operation involves
two rows, one of which is the row containing the pivot element, i.e.
Old row: S1 4 5 1 0 100
Less 4 × X1 row: 4 1.6 0 0.8 64
Old row: Z 10 5 0 0 0
Less 10 × X1 row: 10 4 0 2 160
S1 0 3.4 1 -0.8 36
X1 1 0.4 0 0.2 16
Z 0 1 0 -2 -160
Since not all the elements in the Z row are negative or zero, the optimal
solution has not been reached. Go to step 8.
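a) Pivot element
Column identified = X2
36/3.4 = 10.6
16/0.4 = 40
The smallest ratio is 10.6, so the S1 row is the row of interest and the pivot element is 3.4.
X1 1 0.4 0 0.2 16
Z 0 1 0 -2 -160
New row (S1 row divided by 3.4): X2 0 1 0.29 -0.24 10.59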
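The tableau steps used in this example (largest positive entry in the Z row, smallest ratio, pivot, repeat) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a general solver: it assumes a maximisation problem with ≤ constraints and non-negative right hand sides, and the function and variable names are hypothetical.

```python
def pivot(tableau, row, col):
    """Scale the pivot row so the pivot element is 1, then clear the rest of the column."""
    p = tableau[row][col]
    tableau[row] = [v / p for v in tableau[row]]
    for i in range(len(tableau)):
        if i != row:
            factor = tableau[i][col]
            tableau[i] = [a - factor * b for a, b in zip(tableau[i], tableau[row])]

def simplex_max(tableau, basis):
    """tableau: constraint rows followed by the Z row; last column = solution quantities."""
    while True:
        z_row = tableau[-1][:-1]
        col = max(range(len(z_row)), key=lambda j: z_row[j])
        if z_row[col] <= 1e-9:  # no positive entry left in the Z row: optimal
            return tableau, basis
        ratios = [(r[-1] / r[col], i) for i, r in enumerate(tableau[:-1]) if r[col] > 1e-9]
        if not ratios:
            raise ValueError("problem is unbounded")
        _, row = min(ratios)  # smallest ratio identifies the row of interest
        pivot(tableau, row, col)
        basis[row] = col  # the row takes the identity of the pivot column

names = ["X1", "X2", "S1", "S2"]
tableau = [[4, 5, 1, 0, 100],
           [5, 2, 0, 1, 80],
           [10, 5, 0, 0, 0]]
tableau, basis = simplex_max(tableau, [2, 3])  # S1 and S2 start in the basis

solution = {names[j]: tableau[i][-1] for i, j in enumerate(basis)}
z = -tableau[-1][-1]  # the Z row carries the negative of the objective value
print(solution, round(z, 2))
```

The first pass of the loop reproduces the pivot on element 5 (X1 replaces S2) and the second reproduces the pivot on 3.4 (X2 replaces S1), after which no positive entry remains in the Z row.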
b) Less than or equal to (≤) constraints in the primal become greater than or equal to (≥) constraints in the dual, and
vice versa.
d) The right hand sides of the dual constraint inequalities are the objective
coefficients in the primal program, and vice versa.
Example 1:
Given primal program:
Max, Z = 4 x1 + 2x2 +5x3
Subject to: x1 + 2x2 - x3 ≤ 20 …………………….y1
4 x1 + 8x2 +11x3 ≤ 28 ……………….y2
6 x1 + x2 + 8x3 ≤ 32 ………………....y3
And x1, x2, x3 ≥ 0
Required:
Obtain the dual program
Solution:
Constraints coefficient matrix
1 2 -1
4 8 11
6 1 8
Transposing the above matrix:
1 4 6
2 8 1
-1 11 8
Dual program;
Min, Z = 20 y1 + 28y2 +32y3
Subject to: y1 + 4y2 + 6y3 ≥ 4
2y1 + 8y2 +y3 ≥ 2
-1y1 + 11y2 + 8y3≥5
And y1, y2, y3 ≥ 0
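The conversion in Example 1 can be sketched in Python. The function name is hypothetical, and the sketch assumes a maximisation primal in which every constraint is of the ≤ type and every variable is non-negative.

```python
def dual_of_max_primal(c, A, b):
    """Primal: max c.x subject to A x <= b, x >= 0.
    Dual:   min b.y subject to A^T y >= c, y >= 0.
    Returns (dual objective coefficients, dual constraint rows, dual right hand sides)."""
    A_transposed = [list(column) for column in zip(*A)]
    return b, A_transposed, c

c = [4, 2, 5]                              # primal objective coefficients
A = [[1, 2, -1], [4, 8, 11], [6, 1, 8]]    # primal constraint coefficients
b = [20, 28, 32]                           # primal right hand sides

objective, rows, rhs = dual_of_max_primal(c, A, b)

print("Min Z =", " + ".join(f"{k}y{i + 1}" for i, k in enumerate(objective)))
for row, limit in zip(rows, rhs):
    print(" + ".join(f"{k}y{i + 1}" for i, k in enumerate(row)), ">=", limit)
```

The right hand sides of the primal become the dual objective coefficients, and each column of the primal matrix becomes one dual constraint.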
Example 2:
Given primal program:
Min, Z = 5x1 + 8x2
Subject to: 2x1 + 3x2 ≥ 5 …………………….y1
4 x1 + 10x2 ≥ 19… ……………….y2
x1 + 12x2 ≥ 24… ……………….y3
And x1, x2 ≥ 0
Required:
Obtain the dual program
Solution:
Transposing the constraints coefficient matrix:
2 4 1
3 10 12
Dual program:
Max, Z = 5 y1 + 19y2 + 24y3
Subject to: 2y1 + 4y2 + y3 ≤ 5
3y1 + 10y2 +12y3 ≤ 8
And y1, y2, y3 ≥ 0
NOTE:
The solution to the dual can be deduced from the solution to the primal obtained using the
simplex method. The procedure involves associating the values in the Z-row of
the optimal primal tableau with the dual variables, where the first slack
variable is associated with the first dual variable, the second slack variable
with the second dual variable and so on.
Example 3:
Suppose you have primal program as:
Max, Z = 2x1 + 3x2
Subject to: 2x1 + x2 ≤ 4
x1 + 2x2 ≤ 5
And x1, x2 ≥ 0
After performing all steps involved in simplex method, the optimal (last)
tableau is:
SENSITIVITY ANALYSIS
This involves determining the effect that various changes to the primal
programme would have on the current solution to the program. It is also
called post-optimality analysis.
The various changes that can occur in a linear programming problem include:
a) Changes in the coefficients of the objective function.
Example 4:
Suppose we have a formulated linear program model as:
Max, Z = 2x1 + 3x2
Subject to: 2x1 + x2 ≤ 4 ……………………………R1
x1 + 2x2 ≤ 5 ……………………………R2
And x1, x2 ≥ 0
Also suppose we are given the optimal solution (after solving using the simplex or
graphical method) as: x1 = 1, x2 = 2 and Z = 8
a) Supposing the right hand side of the 1st constraint (R1) increases by 20% and that of the 2nd constraint (R2)
increases by 10%, perform the sensitivity analysis to find the new solution
and check whether it is a feasible solution.
Solution:
The new solution is given as:
Current basic variables = (inverse of the constraints coefficient matrix) ×
(new right hand side)
But inverse of matrix = (1/Determinant) × Adjoint
The matrix of constraint coefficients for the problem above is
2 1
1 2
Determinant = (2 × 2) – (1 × 1) = 4 – 1 = 3
Adjoint = transpose of the cofactor matrix
But cofactor matrix =
2 -1
-1 2
Transposing the cofactor matrix gives the adjoint =
2 -1
-1 2
Inverse = (1/3) ×
2 -1
-1 2
New right hand side:
New R1 = 4 + (20/100 × 4) = 4.8
New R2 = 5 + (10/100 × 5) = 5.5
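The remaining multiplication (inverse matrix × new right hand side) can be sketched in Python with exact fractions. This sketch assumes that both constraints stay binding after the change, so that x1 and x2 remain the basic variables; the variable names are hypothetical.

```python
from fractions import Fraction

A = [[2, 1], [1, 2]]  # constraint coefficients of R1 and R2

# Inverse of a 2 x 2 matrix = (1 / determinant) x adjoint
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
adjoint = [[A[1][1], -A[0][1]], [-A[1][0], A[0][0]]]
inverse = [[Fraction(v, det) for v in row] for row in adjoint]

# New right hand sides: R1 up by 20%, R2 up by 10%
new_rhs = [Fraction(4) * Fraction(120, 100), Fraction(5) * Fraction(110, 100)]

# New values of the basic variables x1 and x2
x = [sum(a * b for a, b in zip(row, new_rhs)) for row in inverse]
feasible = all(v >= 0 for v in x)
z = 2 * x[0] + 3 * x[1]

print([float(v) for v in x], feasible, float(z))
```

Under that assumption both new values are non-negative, so the new solution is feasible.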
REVISION EXERCISE:
1) Using the information given in Example 4 above, determine the new
optimal solution by performing the sensitivity analysis when:
c) Deduce the solution to the primal program from the dual program
INDEX NUMBERS
An index number is a number which indicates the level of a certain
phenomenon at any given date compared with the level of the same phenomenon at
some standard date.
It provides an opportunity for measuring the relative change of a variable
where measurement of its actual change is inconvenient or impossible. It is
also a series of numbers by which changes in the magnitudes of a
phenomenon are measured from time to time or from place to place. An
index is constructed by selecting a base year as a starting point. The price or
quantity of base year is represented by 100 and those of other years
measured against it.
Uses of index numbers;
a) Price index numbers are used to measure changes in a particular group
of prices and help in comparing the movement of one commodity with
another.
c) The quantity index numbers show the rise or fall in the volume of
production, volume of exports and imports.
d) The import and export price indices are used to measure the
changes in the terms of trade of a country.
Such an index shows that the value of money is fluctuating, i.e. appreciating or
depreciating, according to whether index numbers of prices are falling or rising. A
rise in the index number of prices signifies a deterioration in the value of
money and vice versa.
Simple index numbers;
These are cases where construction of index numbers involves a single
commodity. Methods used in constructing simple index numbers are;
a. Fixed base method
Here, the base period is fixed and prices of subsequent years are expressed
as relatives of the price of the base year. A price relative is the price of an item
in one year relative to its price in another year, i.e.
Price relative = P1/P0 × 100
Where; P1 = price of current year
P0 = price of base year
Example:
From the following data, compute price index number by taking 2002 as base
year.
Year 2002 2003 2004 2005 2006 2007
Price of sugar/Kg 8 10 12.5 18 22 25
Solution
Year   Price of sugar/Kg   Price index (P1/P0 × 100)
2002   8                   8/8 × 100 = 100
2003   10                  10/8 × 100 = 125
2004   12.5                12.5/8 × 100 = 156.25
2005   18                  18/8 × 100 = 225
2006   22                  22/8 × 100 = 275
2007   25                  25/8 × 100 = 312.5
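The fixed base calculation for the sugar prices can be sketched in Python as follows (the function name is hypothetical):

```python
def fixed_base_index(prices, base_year):
    """Price relative of every year against one fixed base year: P1 / P0 x 100."""
    p0 = prices[base_year]
    return {year: price / p0 * 100 for year, price in prices.items()}

sugar = {2002: 8, 2003: 10, 2004: 12.5, 2005: 18, 2006: 22, 2007: 25}
index = fixed_base_index(sugar, base_year=2002)

for year, value in index.items():
    print(year, round(value, 2))
```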
b. Chain base method
In this method, the base is not fixed and it changes from year to year. The
price of the previous period is taken as the base. This method shows
whether the rate of change is rising, falling or constant as well as the extent
of change from year to year.
Price index number = (price of the current year)/(price of the previous
year) × 100
Example;
Construct the chain base index numbers from the following data.
Year          2002  2003  2004  2005  2006  2007
Price (Shs)   120   125   140   150   135   160
Solution
Year   Price (Shs)   Chain base index number
2002   120           -
2003   125           125/120 × 100 = 104.17
2004   140           140/125 × 100 = 112.00
2005   150           150/140 × 100 = 107.14
2006   135           135/150 × 100 = 90.00
2007   160           160/135 × 100 = 118.52
Weighted index number;
If all commodities selected do not have equal importance for consumers
then a weighted system is adopted. Appropriate weights are assigned to
different commodities. An index is called a Weighted Aggregate index
when it is constructed for an aggregate of items (prices) that have
been weighted in some way (by
corresponding quantities produced, consumed or sold), so as to reflect their
importance.
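For the chain base example above (prices of Shs 120, 125, 140, 150, 135 and 160 for 2002 to 2007), the year-on-year indices can be sketched in Python. The function name is hypothetical.

```python
def chain_base_index(prices):
    """Index of each year with the previous year as base: P(current) / P(previous) x 100."""
    years = sorted(prices)
    return {
        current: prices[current] / prices[previous] * 100
        for previous, current in zip(years, years[1:])
    }

prices = {2002: 120, 2003: 125, 2004: 140, 2005: 150, 2006: 135, 2007: 160}
chain = chain_base_index(prices)

for year, value in chain.items():
    print(year, round(value, 2))
```

The first year has no index because there is no previous year to serve as its base.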
The important formulae of constructing weighted index numbers include;
i) Laspeyres Method (L) - The base year quantities/prices are taken
as weights. The method tries to answer the question “what is the
change in aggregate value of the base period list of goods when
valued at given period prices?”
b) Paasche
c) Fishers
            2012                            2013
Commodity   Price (Shs)   Quantity (bags)   Price (Shs)   Quantity (bags)
Maize       65            20                135           30
Wheat       95            8                 160           7
Beans       150           5                 320           8
Solution:
            2012        2013
Commodity   P0    q0    P1    q1    P1q0   P0q0   P1q1   P0q1
Maize       65    20    135   30    2700   1300   4050   1950
Wheat       95    8     160   7     1280   760    1120   665
Beans       150   5     320   8     1600   750    2560   1200
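The column totals and the three weighted indices can be sketched in Python. The notes name the Paasche and Fisher methods without stating their formulas, so the sketch assumes the standard definitions: Laspeyres = ΣP1q0/ΣP0q0 × 100, Paasche = ΣP1q1/ΣP0q1 × 100, and Fisher = the square root of Laspeyres × Paasche.

```python
import math

# (P0, q0, P1, q1) for each commodity, taken from the table above
data = {
    "Maize": (65, 20, 135, 30),
    "Wheat": (95, 8, 160, 7),
    "Beans": (150, 5, 320, 8),
}

sum_p1q0 = sum(p1 * q0 for p0, q0, p1, q1 in data.values())
sum_p0q0 = sum(p0 * q0 for p0, q0, p1, q1 in data.values())
sum_p1q1 = sum(p1 * q1 for p0, q0, p1, q1 in data.values())
sum_p0q1 = sum(p0 * q1 for p0, q0, p1, q1 in data.values())

laspeyres = sum_p1q0 / sum_p0q0 * 100  # base year quantities as weights
paasche = sum_p1q1 / sum_p0q1 * 100    # current year quantities as weights
fisher = math.sqrt(laspeyres * paasche)

print(round(laspeyres, 2), round(paasche, 2), round(fisher, 2))
```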
REVISION QUESTIONS:
1) Explain uses and limitations of index numbers
TIME
PRODUCT 2012 2013
Quantity Price Quantity Price
(Kg) (shs) (Kg) (shs)
Bread 5 5 7 6.5
Eggs 6 7.75 10 8.8
Soap 4 9.63 6 10.75
Sugar 9 12.5 9 12.75
Calculate:
a) Laspeyre’s price index
DECISION THEORY
Decision making is at the core of business and of each person's life.
Some decisions are major and made rarely, while others are minor and
made often. Success in business or in life depends on the decisions made.
Therefore, understanding what is involved in good decision making is crucial. Decision
theory is an analytical and systematic approach to the study of decision
making.
It’s important to distinguish between a good decision and a bad decision. A
good decision:
Is based on logic
Is made after considering all available data and alternatives
Applies appropriate quantitative techniques
Table 1: Payoff table (matrix) showing conditional values for a manufacturer
                           State of nature
Strategy or alternative    Favourable market   Unfavourable market
Construct large plant      200,000             -180,000
Construct small plant      100,000             -20,000
Do nothing                 0                   0
Decision Making Environment for managers:
Managers make decisions in environments which can be grouped into four
states:
Certainty
Risk
Uncertainty
Conflict / Game theory
Both decision theory and game theory have the objective of assisting the
decision maker by providing a structure for the evaluation of
information on the relative likelihood of different outcomes so that the best
course of action can be identified.
a) Environment of Certainty
In either case, use the formula: Expected value = Σ (Real value ×
corresponding probability)
E (X) = Σ X P (X)
Example:
James M is a manager who is contemplating putting up a plant, which could
be large or small. He has the following data to interpret: the market demand is
likely to be either favourable or unfavourable. If James constructs a large
plant and the market is favourable, he is likely to get a profit of 200,000, but if
the market demand is unfavourable he makes a loss of 180,000. If he
constructs a small plant and the market is favourable, he gets a profit of
100,000, but if the market is unfavourable he makes a loss of 20,000. Further,
James believes the favourable and unfavourable markets are equally likely.
Represent the above information in a decision table and advise the
management on what plant to put up based on expected monetary value and
opportunity loss.
Solution:
Decision table:
                           State of nature
Strategy or alternative    Favourable market (0.5)   Unfavourable market (0.5)
Construct large plant      200,000                   -180,000
Construct small plant      100,000                   -20,000
No plant                   0                         0
Maximise expected monetary value:
Large plant: 200,000 (0.5) + (-180,000) (0.5) = 100,000 – 90,000 = 10,000
Small plant: 100,000 (0.5) + (-20,000) (0.5) = 50,000 – 10,000 = 40,000
No plant: 0 (0.5) + 0 (0.5) = 0
The decision is to put up the small plant, as it maximises the expected
monetary value.
Opportunity loss:
This is the amount one would lose by not taking the best alternative. It is
also called the amount of regret. To obtain the regret table, for each state of
nature we get the difference between the payoff of the best possible alternative
and the payoff of each alternative.
Opportunity loss table/ regret table:
Options       Favourable market               Unfavourable market
Large plant   200,000 – 200,000 = 0           0 – (-180,000) = 180,000
Small plant   200,000 – 100,000 = 100,000     0 – (-20,000) = 20,000
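Both criteria for James's problem can be sketched in Python. The expected opportunity loss step (weighting each regret by its probability and choosing the smallest total) is the usual way of completing the analysis and is an assumption here, since the notes break off after the regret table.

```python
payoffs = {
    "Large plant": (200_000, -180_000),
    "Small plant": (100_000, -20_000),
    "No plant": (0, 0),
}
probabilities = (0.5, 0.5)  # favourable, unfavourable

# Expected monetary value: sum of payoff x probability for each alternative
emv = {name: sum(p * v for p, v in zip(probabilities, row))
       for name, row in payoffs.items()}
best_by_emv = max(emv, key=emv.get)

# Regret = best payoff in that state of nature minus the payoff obtained
best_per_state = [max(column) for column in zip(*payoffs.values())]
regret = {name: [best - v for best, v in zip(best_per_state, row)]
          for name, row in payoffs.items()}

# Expected opportunity loss: sum of regret x probability
eol = {name: sum(p * r for p, r in zip(probabilities, row))
       for name, row in regret.items()}
best_by_eol = min(eol, key=eol.get)

print(emv, best_by_emv)
print(regret, best_by_eol)
```

Under these assumptions both criteria point to the same alternative, the small plant.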
These refer to situations where more than one outcome can result from any
single decision. Several methods are used to make decisions in circumstances
where only the payoffs are known and the likelihood of each state of nature
is not known.
a) Maximin Method
This criterion is based on the 'conservative approach' of assuming that the worst
possible outcome is going to happen. The decision maker considers each strategy and
locates the minimum payoff for each, and then selects the alternative which
maximizes the minimum payoff.
Illustration
Rank the products A, B and C applying the Maximin rule using the following
payoff table showing potential profits and losses which are expected to arise
from launching these three products in three market conditions.
Table 1
Ranking the MAXIMIN rule = BAC
b) MAXIMAX method
This method is based on 'extreme optimism'. The decision maker selects the
particular strategy which corresponds to the maximum of the maximum
payoffs for each strategy.
Illustration
Using the above example
Max. profits (row maxima)
Product A +8
Product B +12
Product C +16
Illustration
Regret table in £ 000's
            Boom condition   Steady state   Recession   Row maxima (maximum regret)
Product A   8                5              22          22
Product B   18               0              0           18
Product C   0                6              38          38
A regret table (table 2) is constructed based on the payoff table. The regret is
the 'opportunity loss' from taking one decision given that a certain
contingency occurs, in our example whether there is a boom, a steady state or
a recession.
The ranking using MINIMAX regret method = BAC
Example
A manager has a choice between
i. A risky contract promising shs 7 million with probability 0.6 and shs 4
million with probability 0.4 and
ii. A diversified portfolio consisting of two contracts with independent
outcomes, each promising shs 3.5 million with probability 0.6 and shs 2
million with probability 0.4
Can you arrive at the decision using EMV method?
Solution
The conditional payoff table for the problem may be constructed as below.
(Shillings in millions)
                          Conditional payoffs         Expected payoffs
Event Ei   Probability    Contract     Portfolio      Contract      Portfolio
           P(Ei) (i)      (ii)         (iii)          (i) × (ii)    (i) × (iii)
E1         0.6            7            3.5            4.2           2.1
Using the EMV method, the manager should choose the risky contract, which
will yield a higher expected monetary value of shs 5.8 million.
Example
A company is considering investing in one of three investment opportunities
A, B and C under certain economic conditions. The payoff matrix for this
situation is:
                           Economic condition
Investment opportunity     1 (£)    2 (£)    3 (£)
A 5000 7000 3000
B -2000 10000 6000
C 4000 4000 4000
Solution
                           Economic condition
Investment opportunity     1 (£)    2 (£)    3 (£)    Minimum (£)   Maximum (£)
A                          5000     7000     3000     3000          7000
B                          -2000    10000    6000     -2000         10000
Regret table (£):
  1 2 3 Maximum regret
A 0 3000 3000 3000
B 7000 0 0 7000
C 1000 6000 2000 6000
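The three decision rules for the investment example (maximin, maximax and minimax regret) can be sketched in Python. The variable names are hypothetical.

```python
payoffs = {
    "A": [5000, 7000, 3000],
    "B": [-2000, 10000, 6000],
    "C": [4000, 4000, 4000],
}

# Maximin: pick the alternative whose worst payoff is largest
maximin = max(payoffs, key=lambda k: min(payoffs[k]))

# Maximax: pick the alternative whose best payoff is largest
maximax = max(payoffs, key=lambda k: max(payoffs[k]))

# Regret = best payoff under that economic condition minus the payoff obtained
best_per_condition = [max(column) for column in zip(*payoffs.values())]
regret = {k: [best - v for best, v in zip(best_per_condition, row)]
          for k, row in payoffs.items()}

# Minimax regret: pick the alternative whose largest regret is smallest
minimax_regret = min(regret, key=lambda k: max(regret[k]))

print(maximin, maximax, minimax_regret)
print(regret)
```

Each rule can favour a different investment, which is why the choice of criterion matters under uncertainty.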
GAME THEORY
Game theory is used to determine the optimum strategy in a competitive
situation.
When two or more competitors are engaged in making decisions, it may
involve a conflict of interest.
In such a case the outcome depends not only upon an individual's action
but also upon the actions of others.
Both competing sides face a similar problem. Hence game theory is a
science of conflict.
Game theory does not concern itself with finding an optimum strategy but it
helps to improve the decision process.
Game theory has been used in business and industry to develop:
bidding tactics,
pricing policies,
advertising strategies,
timing of the introduction of new models in the market, etc.
iii. Each of these participants has available to him a finite set of
courses of action, i.e. choices.
iv. The rules governing these choices are specified and known to all
players.
v. While playing, each player chooses a course of action from a list of
choices available to him.
vi. The outcome of the game is affected by the choices made by all of the
players. The choices are to be made simultaneously so that no
competitor knows his opponent's choice until he is already committed
to his own.
vii. The outcome for all specific choices by all the players is known in
advance and numerically defined.
NOTE: When a competitive situation meets all the criteria above we call it a
game. Game theory can be applied in only a few real life competitive situations
because it is difficult for all the rules to hold at the same time in a given situation.
DEFINITION OF TERMS:
Game: It is an activity between two or more persons involving actions by
each one of them according to a set of rules which results in some gain for
each. If in a game the actions are determined by skills, it is called game of
strategy but if they are determined by chance it is termed as a game of
chance.
under every possible future contingency that might occur during the play of
the game. Two types of strategies are:
a) Pure strategy – A situation where each player in the game adopts
a single strategy as an optimal strategy. Here the value of the game
is the same for both players.
b) Mixed strategy – A player adopts a mixture of strategies if the game
is played many times. In this case the players use a combination of
strategies and each player is always kept guessing as to which course of
action will be selected by the other player on a particular occasion.
Thus, there is a probabilistic situation and the objective of each player is to
maximize expected gains or to minimize expected losses.
Example
Two players X and Y have two alternatives each. They show their choices by
pressing one of two buttons in front of them, but they cannot see the
opponent's move. It is assumed that both players have equal intelligence and
both intend to win the game.
This sort of simple game can be illustrated in tabular form as follows:
Player Y
Player X Button r Button t
Button m X wins 2 points X wins 3 points
Button n Y wins 2 points X wins 1 point
The game is biased against Y because if player X presses button ‘m’ he will
always win. Hence Y will be forced to press button r to cut down his losses
Alternative example
Player Y
Player X Button r Button t
Button m X wins 3 points Y wins 4 points
Button n Y wins 2 points X wins 1 point
In this case X will not be able to press button 'm' (or
button 'n') all the time in order to win. Similarly, Y will not be able to press button 'r' or button 't' all the
time in order to win. In such a situation each player will exercise his choice for
part of the time based on probability.
3, -4, -2, 1 are the known payoffs, and here the game has been represented in
the form of a matrix. When games are expressed in this fashion the
resulting matrix is commonly known as a PAYOFF MATRIX.
STRATEGY:
It refers to the total pattern of choices employed by any player. A strategy could be
pure or mixed.
In a pure strategy, player X will play one row all of the time, or player Y
will play one of his columns all of the time.
In a mixed strategy, player X will play each of his rows a certain
portion of the time and player Y will play each of his columns a certain
portion of the time.
X's strategy is to assume the worst and act accordingly. If X plays first with his row one,
then Y will play his 2nd column to win 1 point. Similarly, if X plays his 2nd
row, then Y will play his 3rd column to win 7 points, and if X plays his 3rd row,
then Y will play his fourth column to win 9 points.
In this game X cannot win, so he should adopt the first row strategy in order to
minimize losses.
This decision rule is known as the 'maximin strategy', i.e. X chooses the highest of
these minimum payoffs.
If Y plays his 4th column, then X will play his 1st row to win 2 points.
Thus player Y will make the best of the situation by playing his 2nd column,
which is a 'minimax strategy'.
This game is also a game of pure strategy and the value of the game is –1 (a win
of 1 point per game to Y). Using matrix notation, the solution is shown below.
                       Player Y                    Row minimum
                  3     -1      4      2           -1
Player X         -1     -3     -7      0           -7
                  4     -7      3     -9           -9
Column maximum    4     -1      4      2
Saddle point also gives the value of such a game. In a game having a saddle
point, the optimum strategy for both players is to play the row or column
containing the saddle point.
Note: if in a game there is no saddle point the players will resort to what is
known as mixed strategies.
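The search for a saddle point (an entry that is both the minimum of its row and the maximum of its column) can be sketched in Python for the 3 × 4 game above. The function name is hypothetical, and payoffs are taken to be the winnings of player X.

```python
def saddle_points(matrix):
    """Return (row, column, value) for every entry that is the minimum of its
    row and the maximum of its column (indices start at 0)."""
    row_min = [min(row) for row in matrix]
    col_max = [max(column) for column in zip(*matrix)]
    return [
        (i, j, value)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value == row_min[i] and value == col_max[j]
    ]

game = [
    [3, -1, 4, 2],
    [-1, -3, -7, 0],
    [4, -7, 3, -9],
]
print(saddle_points(game))
```

An empty list would mean that the game has no saddle point and that mixed strategies are needed.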
b) Mixed Strategies
Example
Find the optimum strategies and the value of the game from the following pay
o昀昀 matrix concerning two person game
Player Y
1 4
Player X
5 3
In this game there is no saddle point.
Let Q be the proportion of time player X spends playing his 1st row and 1-Q be
the proportion of time player X spends playing his 2nd row.
Similarly, let R be the proportion of time player Y spends playing his 1st column and 1-R
be the proportion of time player Y spends playing his 2nd column.
The following matrix shows this strategy:
                      Player Y
                      R       1-R
Player X    Q         1       4
            1-Q       5       3
X’s strategy
X would like to divide his play between his rows in such a way that his expected
winnings or losses when Y plays the 1st column are equal to his expected
winnings or losses when Y plays the 2nd column.
Column 1
Points Proportion played Expected winnings
1 Q Q
5 1-Q 5(1-Q)
Player Y
1 4
Player X
5 3
Step I
Subtract the smaller payoff in each row from the larger one, and the smaller
payoff in each column from the larger one.
1           4           4 - 1 = 3
5           3           5 - 3 = 2
5 - 1 = 4   4 - 3 = 1
Step II
Interchange each of these pairs of subtracted numbers found in step I.
1     4     2
5     3     3
1     4
Thus player X plays his two rows in the ratio 2: 3
And player Y plays his columns in the ratio 1:4
This is the same result as calculated before
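The same 2 × 2 result can be checked with a short Python sketch. The closed-form expressions below come from equating expected winnings across the opponent's two choices; they assume the game has no saddle point, and the function name is hypothetical.

```python
from fractions import Fraction

def solve_2x2(a, b, c, d):
    """Payoff matrix [[a, b], [c, d]] to the row player X, with no saddle point.
    Returns (Q, R, value): Q = share of time X plays row 1,
    R = share of time Y plays column 1."""
    denominator = a + d - b - c
    q = Fraction(d - c, denominator)   # from aQ + c(1 - Q) = bQ + d(1 - Q)
    r = Fraction(d - b, denominator)   # from aR + b(1 - R) = cR + d(1 - R)
    value = Fraction(a * d - b * c, denominator)
    return q, r, value

q, r, value = solve_2x2(1, 4, 5, 3)
print(f"X plays rows in the ratio {q.numerator}:{(1 - q).numerator}")
print(f"Y plays columns in the ratio {r.numerator}:{(1 - r).numerator}")
print("Value of the game:", float(value))
```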
DOMINANCE
Dominance is useful for reducing the size of the payoff table.
Rule of dominance
i. If all the elements in a column are greater than or equal to the
corresponding elements in another column, then the column is
dominated.
ii. Similarly if all the elements in a row are less than or equal to the
corresponding elements in another row, then the row is dominated.
Dominated rows and columns may be deleted which reduces the size of the
game to a 2 by 2 game.
N.B. Always look for dominance and then saddle points first when solving
a game problem.
Example:
Determine the optimum strategies and the value of the game from the
following 2 × m payoff matrix game for X and Y
Y
6 3 1 0 3
X
3 2 4 2 1
In this game, columns I, II and IV are dominated by columns III and V, hence Y will not
play these columns.
So the game is reduced to a 2×2 matrix, and it can be solved using the
methods already discussed.
Y
1 3
X
4 1