Statistics Notes 2022

Introduction to probability and statistics (University of Nairobi)
STATISTICS NOTES
1.0 Purpose of the Course

The main purpose of the course is to prepare students to describe, gather and analyze
business data, and to use statistical and management science tools to make effective business
decisions in operations, finance, marketing, management, and new product development.
2.0 Expected Learning Outcomes of the Course

At the end of the course students should be able to:
1. Enumerate the major concepts in statistics;
2. Analyse the fundamental decision making models;
3. Review the procedures used in sampling and hypothesis testing;
4. Apply correlation and regression models for decision making; and,
5. Determine the use of time series analysis in decision analysis.
3.0 Course Content
Introduction to statistics. Time Series Analysis and Index Numbers. Concepts of applied
probability. Fundamentals for decision making models. Risk based decision making, Correlation
and Regression, Sampling, Inference and Hypothesis Testing.
4.0 Course outline

4.1 INTRODUCTION (week 1)
4.1.1 Definition of Statistics
4.1.2 Types of Statistics
4.1.3 Population, Sample and Variables
4.1.4 Functions of Statistics
4.1.5 Limitations of Statistics
4.1.6 Levels of Measurement
4.2 DATA COLLECTION, ORGANIZATION AND PRESENTATION
4.2.1 Introduction
4.2.2 Organization and Presentation of Data
4.2.3 Graphical Representation of a Frequency Distribution
4.3 MEASURES OF CENTRAL TENDENCY

4.3.1 Introduction
4.3.2 Characteristics of a good average
4.3.3 Types of averages
4.3.4 Factors to consider in the choice of an average
4.3.5 Exercise
4.4 MEASURES OF DISPERSION

4.4.1 Introduction

4.4.2 Significance of measuring dispersion

4.4.3 Properties of a good measure of dispersion
4.4.4 Measures of dispersion
4.4.5 Skewness and Kurtosis
4.4.6 Exercises
4.5 PROBABILITY DISTRIBUTIONS

4.5.1 Introduction
4.5.2 Probability distribution function of a discrete random variable
4.5.3 Discrete Probability Distributions
4.5.4 Continuous Probability Distributions
4.5.5 Exercises
4.6 SAMPLING AND SAMPLING DISTRIBUTIONS

4.6.1 Introduction
4.6.2 Types of sampling Designs
4.6.3 Reasons for Sampling
4.6.4 Bias and Error in sampling
4.6.5 Sampling Distributions
4.7 ESTIMATION THEORY

4.7.1 Introduction
4.7.2 Point Estimation
4.7.3 Properties for a good estimator
4.7.4 Confidence Intervals for Population Mean when the Population Variance
is Known.
4.7.5 How Large a Sample?
4.7.6 Confidence Interval for Population Mean When the Population Variance is
Unknown
4.8 HYPOTHESIS TESTING

4.8.1 Introduction
4.8.2 The Null and Alternative Hypothesis
4.8.3 Type I and Type II errors
4.8.4 One-Tailed and Two-Tailed tests
4.8.5 Steps to be followed in testing a hypothesis
4.8.6 Test of Hypothesis (Single Population)
4.8.7 Test of Hypothesis (Two Populations)
4.9 CHI-SQUARE TESTS

4.9.1 Introduction
4.9.2 Test of Goodness of Fit
4.9.3 Test of Independence
4.9.4 Test of Homogeneity
4.9.5 Exercises

4.10 ANALYSIS OF VARIANCE

4.10.1 Introduction
4.10.2 Assumptions of Analysis of Variance
4.10.3 Computation of Analysis of Variance
4.10.4 One – Way Classification
4.10.5 Analysis of Variance Table
4.10.6 Exercises
4.11 REGRESSION AND CORRELATION ANALYSIS

4.11.1 Introduction
4.11.2 Correlation Analysis
4.11.3 Types of Correlation
4.11.4 Coefficient of Correlation
4.11.5 Methods of Studying Correlation
4.11.6 Test of Hypothesis Regarding Population Correlation Coefficient
4.11.7 Regression Analysis
4.11.8 Exercises
4.12 ADDITIONAL TOPICS
4.12.1 Linear programming
4.12.2 Introduction
4.12.3 Assumptions of linear programming
4.12.4 Methods of Solving Linear Programming Problems
4.12.5 Duality
4.12.6 Sensitivity Analysis
4.12.7 Exercises
4.12.8 Index Numbers
4.12.9 Introduction
4.12.10 Limitations of Index Numbers
4.12.11 Price index number
4.12.12 Decision Theory
4.12.13 Game Theory
5.0 Methods of Delivery

5.1 Lectures
5.2 Assigned readings
5.3 Discussions led by students given their experience in the industry
5.4 Case analysis and group discussions
5.5 Tutorials
5.0 Instructional Material and/ or Equipment

Overhead projector and LCD, whiteboard, Audio-visuals, computers,
pens and smart boards
6.0 Course Assessment
Continuous Assessment Tests x 2        20%
Term Paper/Assignments                 15%
Class Presentation/Participation        5%
Final Examination                      60%
Total                                 100%
7.0 Core Reading Materials for the Course

7.1 Sharma, J.K. (2008), Business Statistics, second edition, Pearson Publishers
7.2 Gupta S.C (2004), Fundamentals of Statistics, Himalaya Publishing
8.0 Recommended Reference Materials

8.1 Gupta, S.P. (2002), Statistical Methods, Sultan Chand and Sons
8.2 Aczel, B. & Sounderpandian, M. (2006), Complete Business Statistics, McGraw Hill
8.3 Anderson, Sweeney and Williams (2007), Statistics for Business and Economics, 9th edition, Thomson Publishing
8.4 Wisniewski, M. (2010), Quantitative Methods for Decision Makers, 5th edition, Prentice Hall
8.5 Curwin, J. (2001), Quantitative Methods for Business Decisions, 5th edition, Cengage Learning Business Press
8.6 Waters, D. (2007), Quantitative Methods for Business, 4th edition, Prentice Hall
8.7 Lucey, T. (2007), Quantitative Methods, 6th edition, BookPower

LESSON ONE: INTRODUCTION
1.1 Definition of Statistics

Statistics is a science that deals with the methods of collecting, organizing, presenting, analyzing
and interpreting numerical data to assist in making more effective decisions.
According to the above definition, there are five stages in a statistical investigation.
i) Collection
 It is the first step in a statistical investigation
 Data form the foundation of any statistical analysis and therefore should be collected
with utmost care.
 If data are faulty, the conclusions drawn can never be reliable.
ii) Organization
 The large mass of figures that are collected from surveys usually need organization.
 The first step in organizing a mass of data is editing so that omissions, inconsistencies
and irrelevant answers may be corrected.
 The next step is to classify some common characteristics possessed by the items
constituting the data.
 The last step in organization is tabulation. The objective of tabulation is to arrange the
data in columns and rows so that there is clarity.
iii) Presentation
 After the data have been collected and organized they are ready for presentation.
 Data presented in an orderly manner facilitate statistical analysis.
iv) Analysis
 It is a major step in any statistical investigation.
 Methods of analysis are numerous, ranging from simple observation of data to highly
mathematical techniques.
 We consider only the most common methods of statistical analysis.
v) Interpretation
 It entails drawing conclusions from the data collected and analyzed.
 Correct interpretation will lead to valid conclusions of the study and thus can aid in
decision making.

1.2 Types of Statistics

a) Descriptive statistics: It deals with processing data without attempting to draw any
inferences from it. It refers to the presentation of data in the form of tables and graphs and to
the description of some of its features such as averages.
b) Inferential/Inductive statistics: Refers to methods of using a sample to obtain information
about a population i.e. making conclusions about the population based on information from
the sample.
1.3 Population, Sample and Variables

 Population: is the totality of all the items or individuals whose characteristics we wish to
study. An example of a population is all the eligible voters in an election.
 Sample: is a subset or section of the population that is used to represent the whole
population.
 Parameter: is any quantitative measure that describes a characteristic of a population e.g.
population mean ($\mu$) or population variance ($\sigma^2$).
 Statistic: is a quantitative measure that describes a characteristic of a sample e.g. sample
mean ($\bar{x}$) or sample variance ($s^2$).
E.G. The mean height of the people in Kenya is a parameter, whereas the mean height of a
sample of 500 people is a statistic.
 Variable: is the characteristic that is being studied. It is represented by symbols X, Y, or Z.
Height of people, grades in a test etc are examples of variables.
 There are two kinds of variables:
a) Qualitative variables: Are variables that are non-numeric i.e. attributes e.g. Gender,
Religion, Color, State of birth etc.
b) Quantitative variables: are numeric variables e.g. the height of an individual when
expressed in feet or inches, etc. Quantitative variables are either discrete or continuous.
i) Discrete variables: Are variables which can only assume certain values, typically whole
numbers. They are always counted. E.G: number of children in a family, the number of
defective bulbs, etc.
ii) Continuous variables: Are variables which can assume any value within a specific
range. They are always measured, e.g. height, temperature, weight, radius etc.
1.4 Functions of Statistics

i) Definiteness i.e. statistics presents facts in a definite form:
 Statements or facts conveyed in exact quantitative terms are more convincing than vague
utterances.
 Statements like “the population of Kenya is growing at a very fast rate”, or “the prices of
various commodities are rising”, may not be very convincing as they don’t specify the
numerical dimensions involved.
ii) Condensation i.e. statistics simplifies a mass of figures
 Statistics helps in condensing a mass of figures into a few significant values e.g. mean,
mode, median, standard deviation, etc.
iii) Comparison:
 Statistics facilitates comparison.
 Unless figures are compared with others of the same kind they are often devoid of any
meaning.
iv) It helps in formulating and testing hypotheses:
 Statistical methods are useful in formulating and testing hypotheses and in developing new
theories.
v) Prediction and formulation of policies:
 Statistical methods provide useful means of forecasting future events.
 Knowledge of future trends is very helpful in framing suitable policies and plans.
1.5 Applications of Statistical Knowledge in Business Management

i) Marketing
 Statistical analyses are frequently used in providing information for marketing decisions.
 E.G: Analysis of data on population, purchasing power, habits of people, competition,
transportation costs etc should precede any attempt to establish a new market.
ii) Production
 The decision about what to produce, how to produce, when to produce, for whom to
produce is based largely on facts analyzed statistically.
iii) Finance
 The finance managers, in discharging their finance functions efficiently, depend heavily on
statistical analysis of facts and figures.
 Financial forecasting, break even analysis and investment decisions under uncertainty are
part of their activities.
 The area of security analysis is also highly quantitative.
iv) Banking
 Banks need to gather and analyze information on the general economic conditions.
 Banks’ reserves are highly influenced by money markets which are not only local but also
international.
 The credit department performs statistical analysis to determine how much credit to
extend to various customers.
v) Purchase
 The purchasing department makes use of statistical data to frame suitable purchase
policies such as where to buy, how to buy, at what time to buy and at what price to buy.
vi) Accounting
 The auditing function makes frequent applications of statistical sampling and estimation
procedures.
 The accountant collects data on historical costs in the course of auditing a company’s
financial records and may use regression analysis to analyze the costs.
vii) Personnel
 The personnel department frames policies based on facts.
 It makes statistical studies of wage rates, incentive plans, cost of living, labor turnover
rates, employment trends, accident rates, employment grievances, performance appraisal,
training programs etc.
 Such studies help the personnel department in the process of manpower planning.
viii) Investment
 Statistics greatly assists investors in making clear judgments in their investment decisions,
in selecting securities which are safe and which have the best prospects of yielding a
good income.

1.6 Limitations of Statistics

i) Statistics does not deal with isolated measurements
 Data are statistical when they relate to measurement of masses, not statistical when they
relate to an individual item or event as a separate entity.
 E.G: The wage earned by an individual worker at any one time taken by itself is not
statistical, but taken as a part of a mass of information, it becomes statistical data.
ii) Statistics deals only with quantitative characteristics
 Statistics are numerical statements of facts. Thus qualitative characteristics like
honesty, efficiency, intelligence etc cannot be studied directly.
iii) Statistical results are true only on an average
 The conclusions obtained statistically are not universally true; they are true only under
certain conditions
iv) Statistics is only a means
 Statistical methods furnish only one method of studying a problem.
 They may not provide the best solution under all circumstances.
 Very often it may be necessary to supplement the conclusions arrived at with the help of
statistics with other methods.
 In deciding a course of action, it may be necessary to take into account other factors like
the country’s culture, religion, philosophy, personal, political or other non-quantitative
considerations.
 Excessive dependence on statistics may lead to fallacious conclusions.
v) Statistics can be misused
 Statistics can be misused, i.e. wrongly interpreted. It requires experience and skill to draw
sensible conclusions from the data.
 E.G: Conclusions may be misleading if they are based on incomplete information or if there
is bias in sampling.
1.7 Levels of Measurement

There are four levels of measurement: nominal, ordinal, interval and ratio.
a) Nominal scale
 It is the lowest level of measurement.
 It merely groups observations into categories based on common characteristics e.g. gender,
race, marital status, religion etc.
 Numbers are often assigned to the various categories for the purpose of identification. E.G:
for the variable marital status we can assign 1 = married, 2 = single, 3 = divorced, 4 =
widowed, 5 = separated.
 The numbers assigned to the various categories do not represent quantity or order and
therefore performing mathematical operations on these numbers would yield meaningless
values.
 The counting of members in each group is the only possible arithmetic operation when a
nominal scale is employed. Accordingly we are restricted to use the mode as the measure of
central tendency. There is no measure of dispersion used for nominal scales.
 Chi-square test is the most common test of statistical significance.
b) Ordinal scale
 Items are not only grouped into categories but they are also ranked into some order.
Therefore in an ordinal scale, numerals are used to represent relative position or order among
the values of the variables.
 The use of ordinal scale implies a statement of ‘greater than’ or ‘less than’ (equality is also
acceptable) without being able to state how much greater or less. The real difference between
ranks 1 and 2 may be more or less than the difference between ranks 5 and 6.
 Since the numbers of this scale have only a rank meaning, the appropriate measure of central
tendency is the median. A percentile or quartile measure is used for measuring dispersion.
 Correlations are restricted to various rank order methods. Measures of statistical significance
are restricted to non-parametric methods.
c) Interval scale
 Numerals assigned to each measure are ranked in order and the intervals between them are
equal. Hence numerals used represent quantity and some mathematical operations would
yield meaningful values.
 However, the zero point is not meaningful, i.e. interval scales have an arbitrary zero and it is
not possible to determine for them what may be called an absolute zero or the unique origin.
 The primary limitation of the interval scale is the lack of a true zero; it does not have the
capacity to measure the complete absence of a trait or characteristic.
 The Fahrenheit scale is an example of an interval scale. One can say that an increase in
temperature from 30° to 40° involves the same increase in temperature as an increase from
60° to 70°, but one cannot say that a temperature of 60° is twice as warm as a temperature
of 30°, because both numbers depend on the fact that the zero on the scale is set
arbitrarily and does not mark the complete absence of heat. The ratio of the two temperatures,
30° and 60°, means nothing because zero is an arbitrary point.
 Interval scales provide more powerful measurement than ordinal scales since the interval
scale incorporates the concept of equality of interval.
 As such more powerful statistical measures can be used with interval scales. Mean is the
appropriate measure of central tendency, while standard deviation is the most widely used
measure of dispersion.
 Product moment correlation techniques are appropriate and the generally used tests for
statistical significance are the ‘t’ test and ‘F’ test.
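The point about differences and ratios on an interval scale can be checked numerically. The sketch below is an illustration only and is not part of the original notes: it assumes the standard Fahrenheit-to-Kelvin conversion, and the function name fahrenheit_to_kelvin is made up for this example.

```python
def fahrenheit_to_kelvin(temp_f):
    """Convert a Fahrenheit reading to Kelvin, a scale measured from absolute zero."""
    return (temp_f - 32.0) * 5.0 / 9.0 + 273.15

# Differences are meaningful on an interval scale: both steps are 10 degrees.
step_low = 40 - 30
step_high = 70 - 60

# Ratios are not: 60/30 = 2 on the Fahrenheit scale, but the same two
# temperatures measured from absolute zero have a ratio of only about 1.06.
naive_ratio = 60 / 30
kelvin_ratio = fahrenheit_to_kelvin(60) / fahrenheit_to_kelvin(30)

print(step_low == step_high)
print(naive_ratio, round(kelvin_ratio, 2))
```

The ratio changes when the zero point changes, which is why 60° cannot be called "twice as warm" as 30°.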
d) Ratio scale
 Ratio scales have an absolute or true zero of measurement. E.G: the zero point on a
centimeter scale indicates the complete absence of length or height. But an absolute zero of
temperature is theoretically unattainable and it remains a concept existing only in the
scientist’s mind.
 Ratio scale represents the actual amounts of variables. Measures of physical dimensions such
as weight, height, distance, etc. are examples.
 All statistical techniques are usable with ratio scale and all mathematical operations
(including multiplication and division) can be used.
 Geometric and harmonic means can be used as measures of central tendency and coefficients
of variation may also be calculated.
LESSON TWO: DATA COLLECTION, ORGANIZATION AND PRESENTATION
2.1 Introduction
 Data refers to any information or facts collected for reference or analysis.
 There are two types of data: secondary data and primary data.
Secondary Data
 These are data that have been gathered earlier for some other purpose. In contrast, the data that are
collected first hand by someone specifically for the purpose of facilitating the study are
known as primary data.
 E.G: the demographic statistics collected every ten years are primary data for the
registrar of persons, but the same statistics used by anyone else would be secondary data
for that individual.
Advantages of secondary data
i) It is far more economical as the cost of collecting original data is saved.
ii) Use of secondary data is time saving.
Disadvantages of secondary data
i) One does not always know how accurate the secondary data are.
ii) The secondary data might be outdated.
 Before using secondary data it is important to consider the following:

i) Whether the data are suitable for the purpose of investigation
 The suitability of the data can be judged in the light of the nature and scope of
investigation.
 E.G: if the object of inquiry is to study the wage levels including allowances of
workers and the data relate to basic wages alone, such data would not be
suitable for the immediate purpose
ii) Whether the data are adequate for the purpose of investigation
 Adequacy of the data is to be judged in the light of the requirements of the
study and the geographical area covered by the available data.
 E.G: if the object is to study wage rates of the workers in the sugar industry in
Kenya and if the available data cover only one region, it would not serve the
purpose.
 The question of adequacy may also be considered in the light of the time
period for which the data are available.
 E.G: For studying the trend of prices, data for the last 8-10 years may be required,
but if the data available from the known sources cover the last 5-6 years only,
this would not serve the purpose.
iii) Whether the data are reliable

 Reliability of the data has to do with the data collection procedures.
 To ensure reliability of the data one may need to determine the context in
which the data were collected, the procedure followed and the level of
accuracy exercised in the collection.
 Determination of the reliability of secondary data is perhaps the most
important and at the same time most difficult job.
Primary Data
 Primary data are measurements observed and recorded as part of an original study.
 The work of collecting primary data is usually limited by time, money and manpower
available.
 When the data to be collected are very large in volume, it is possible to draw reasonably
accurate conclusions from a sample.
 There are two methods of obtaining primary data:
a) Questioning
b) Observation
 Questions may be asked in person or in writing. A formal list of such questions is called a
questionnaire.
 When the data are collected by observation, the investigator asks no questions. Instead, he
observes and records the desired information.
 Of the two methods named above, the questionnaire method is more widely used for
collecting business data. Three different ways of communicating with questionnaires are
available:
i) Personal interview
ii) Mail
iii) Telephone interview
 Personal interviews are those in which an interviewer obtains information from respondents
in face-to-face meetings. The information obtained by this method is likely to be more
accurate because the interviewer can clear-up doubts, can cross-examine the informants and
thereby obtain correct information.
 In mail surveys, questionnaires are mailed to respondents who are supposed to fill them in and
return them. They are appropriate where the field of investigation is very vast and the informants
are spread over a wide geographical area.
 Telephone interviews are similar to personal interviews except that communication between
interviewer and respondents is by telephone instead of direct personal contact.
2.3 Organization and Presentation of Data

 Data collected in an investigation and not organized systematically is called raw data. The
arrangement of this data in ascending or descending order of magnitude is called an array.
 The difference between the largest and the smallest value is called the range.
 E.G: The table below records the heights, in inches, of eight students. Column I represents
the raw data and column II illustrates the arrangement in an array.
Raw Data Array

66 65
68 66
72 66
65 68
66 68
73 69
68 72
69 73
 The largest value is 73 and the smallest is 65. Hence, the range is 73 – 65 = 8 inches.
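As a minimal sketch (not part of the original notes), the array and the range for the eight heights above can be obtained in Python as follows. The variable names are illustrative.

```python
heights = [66, 68, 72, 65, 66, 73, 68, 69]  # raw data, in inches

# An array is the raw data arranged in ascending (or descending) order.
array = sorted(heights)

# The range is the largest value minus the smallest value.
value_range = max(heights) - min(heights)

print(array)        # [65, 66, 66, 68, 68, 69, 72, 73]
print(value_range)  # 8
```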
Frequency Distribution
Ungrouped data
 In forming an array a value is repeated as many times as it appears. The number of times a
value appears in the listing is referred to as its frequency. In giving the frequency of a value,
we answer the question, “How frequently does the value occur in the listing?”
 When the data is arranged in tabular form by giving its frequencies, the table is called a
frequency table. The arrangement itself is called a frequency distribution.
 Quite often it is useful to give relative frequencies instead of actual frequencies. The relative
frequency of any observation is obtained by dividing the actual frequency of the observation
by the total frequency (sum of all frequencies).
 If the relative frequencies are multiplied by 100 and expressed as a percentage, we get the
percentage frequency distribution.
 An advantage of expressing frequencies as percentages is that one can then compare
frequency distributions of two sets of data.
Example:
The following data were obtained when a die was tossed 30 times. Construct a frequency
table.
1 2 4 2 2 6 3 5 6 3
3 1 3 1 3 4 5 3 5 3
5 1 6 3 1 2 4 2 4 4
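One way to build the frequency table for this example is sketched below. It is an illustration rather than a prescribed method: it counts each face with Python's collections.Counter and also reports the relative and percentage frequencies described earlier.

```python
from collections import Counter

tosses = [
    1, 2, 4, 2, 2, 6, 3, 5, 6, 3,
    3, 1, 3, 1, 3, 4, 5, 3, 5, 3,
    5, 1, 6, 3, 1, 2, 4, 2, 4, 4,
]

frequency = Counter(tosses)        # value -> number of times it appears
total = sum(frequency.values())    # total frequency (30 tosses)

print("Value  Frequency  Relative  Percentage")
for value in sorted(frequency):
    f = frequency[value]
    relative = f / total           # relative frequency
    print(f"{value:>5}  {f:>9}  {relative:>8.3f}  {100 * relative:>9.1f}%")
```

The frequencies for the faces 1 to 6 come out as 5, 5, 8, 5, 4 and 3, which add up to the 30 tosses.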
Grouped Data
 When dealing with a huge mass of data and when the observed values consist of too many
distinct values, it is preferable to divide the entire range of values and group the data into
classes.
 E.G: If we are interested in the distribution of ages of people, we could form the classes
0 – 19, 20 – 39, 40 – 59, 60 – 79 and 80 – 99. A class such as 40 – 59 represents all the
people with ages between 40 and 59 years inclusive.
 When data are arranged in this way, they are called grouped data. The number of individuals
in a class is called the class frequency.
 The following steps are suggested to form a frequency distribution from the raw data:
i) Range
Scan through the raw data and find the smallest and the largest value. The largest
value minus the smallest value gives the range.
ii) Number of classes
Decide on a suitable number of classes. This could be anywhere from six to twenty.
iii) Class size

Divide the range by the number of classes. Round this figure to a convenient value to
obtain the class size and form the classes.
iv) Frequency
Find the number of observations in each class.
Example
The following data gives the amounts (in dollars) spent on groceries by 40 housewives during a
week.
22 12 9 8 33 32 30 33 8 11
21 16 12 15 37 30 16 22 12 24
18 25 37 16 25 28 25 18 9 28
25 28 26 15 12 35 38 16 24 31
Construct a frequency distribution using seven classes.
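The four steps (range, number of classes, class size, frequency) can be sketched in Python for this example. Two choices below are assumptions made for illustration, not requirements of the notes: the class size is rounded up to the next whole number, and the first class starts at a multiple of the class size, which gives the inclusive classes 5 – 9, 10 – 14, …, 35 – 39.

```python
import math

amounts = [
    22, 12, 9, 8, 33, 32, 30, 33, 8, 11,
    21, 16, 12, 15, 37, 30, 16, 22, 12, 24,
    18, 25, 37, 16, 25, 28, 25, 18, 9, 28,
    25, 28, 26, 15, 12, 35, 38, 16, 24, 31,
]
num_classes = 7  # step ii: given in the question

# Step i: range = largest value - smallest value (38 - 8 = 30)
data_range = max(amounts) - min(amounts)

# Step iii: class size, rounded up to a whole number (30 / 7 -> 5)
class_size = math.ceil(data_range / num_classes)

# Assumption: start the first class at a multiple of the class size (here 5)
start = (min(amounts) // class_size) * class_size

# Step iv: count the observations falling in each inclusive class
classes = []
for i in range(num_classes):
    lower = start + i * class_size
    upper = lower + class_size - 1
    count = sum(lower <= x <= upper for x in amounts)
    classes.append((lower, upper, count))

for lower, upper, count in classes:
    print(f"{lower:>2} - {upper:<2}  {count}")
```

With these choices the class frequencies are 4, 5, 8, 5, 8, 6 and 4, totalling 40. A different starting point or class size would give a different but equally valid table.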
Class Intervals, Class Marks and Class Boundaries

 The blocks 10 – 20, 20 – 30, 30 – 40, etc are called class intervals. The lower ends of the
class intervals are called lower limits and their upper ends are called upper limits.
 The number of values specified in a given interval is called its length or width or
magnitude.
E.G: The class 1 – 3 has values 1, 2, 3 thus its length is 3.
The class 5 – 9 has values 5, 6, 7, 8, 9; the length or magnitude is 5
 There are two types of classes
i) Inclusive type: These are of the type 5 – 9, 10 – 14, 15 – 19, … where both the
upper and lower class limits are included in a given class.
ii) Exclusive type: These are of the type 5 – 10, 10 – 15, 15 – 20, … where the upper
class limit of a given class is the lower class limit of the succeeding class.
The class 5 – 10 has values 5, 6, 7, 8, 9 and the class 10 – 15 has 10, 11, 12, 13, 14.
NB: The conversion of inclusive type of classes to exclusive type is useful in calculating
certain measures such as mode and median.
 A point that represents the halfway or dividing point between successive classes is called a
class boundary. If d is the difference between the upper class limit of a given class and the
lower class limit of the succeeding class, then
$$\text{Upper Class Boundary (UCB)} = \text{Upper Class Limit (UCL)} + \tfrac{1}{2}d$$
$$\text{Lower Class Boundary (LCB)} = \text{Lower Class Limit (LCL)} - \tfrac{1}{2}d$$
 The class mark is defined as the mid point of a class interval. It is computed by adding the
lower and upper class limits of a class and then dividing by 2.
$$\text{Mid point} = \tfrac{1}{2}(\text{UCL} + \text{LCL}) = \tfrac{1}{2}(\text{UCB} + \text{LCB})$$
Example
Class     L.C.L   U.C.L   L.B    U.B    Class mark (Midpoint)
10 – 19   10      19      9.5    19.5   14.5
20 – 29   20      29      19.5   29.5   24.5
30 – 39   30      39      29.5   39.5   34.5
40 – 49   40      49      39.5   49.5   44.5
50 – 59   50      59      49.5   59.5   54.5
NB: The upper boundary of one class is the lower boundary of the next.
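The boundaries and class marks in this example can be reproduced with a short sketch. It assumes inclusive classes with a constant gap d between the upper limit of one class and the lower limit of the next, and it takes the lower limit of the class 40 – 49 to be 40. The variable names are illustrative.

```python
# Inclusive class limits from the example: (lower limit, upper limit)
limits = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]

# d = lower limit of the next class minus upper limit of the current class
d = limits[1][0] - limits[0][1]

rows = []
for lcl, ucl in limits:
    lcb = lcl - d / 2             # lower class boundary
    ucb = ucl + d / 2             # upper class boundary
    class_mark = (lcl + ucl) / 2  # mid point of the class
    rows.append((lcl, ucl, lcb, ucb, class_mark))

for row in rows:
    print(row)
```

Each upper boundary equals the lower boundary of the next class, as the NB above states.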
Cumulative Frequency Distribution:

 If a frequency distribution is arranged in the “less than” form, it is called a cumulative
frequency distribution, which presents the accumulated frequencies.
 When the data is not grouped, a cumulative frequency distribution will show the number of
items less than or equal to a given value.
Example
The data below gives the weights of 30 people. Find the cumulative frequency distribution.
Weight Frequency Cumulative frequency (c.f)

140 3 3
150 5 8
160 6 14
170 7 21
180 6 27
190 3 30
 When the data is grouped, the cumulative frequency distribution gives the total frequency of
all the values less than the upper boundary of a given class.
Example
Find the cumulative frequency distribution for the grouped data given below:
Class Frequency Cumulative frequency (cf)
5 – 19 4 4
20 – 34 12 16
35 – 49 15 31
50 – 64 16 47
65 – 79 22 69
80 – 94 11 80
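A cumulative frequency column is a running total of the frequencies. The sketch below is an illustration using Python's itertools.accumulate; it reproduces the c.f. column for the grouped data above.

```python
from itertools import accumulate

classes = ["5 - 19", "20 - 34", "35 - 49", "50 - 64", "65 - 79", "80 - 94"]
frequencies = [4, 12, 15, 16, 22, 11]

# Each entry is the total frequency of all values below the upper boundary
# of that class, i.e. the running sum of the frequencies so far.
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label:<8} {f:>3} {cf:>3}")
```

The last cumulative frequency (80) equals the total number of observations, which is a quick check on the arithmetic.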
2.4 Graphical Representation of a Frequency Distribution

The following types of graphical representation are usually used for frequency distribution.
a) Histogram: It is a graph in which class boundaries are marked on the horizontal axis
and the class frequencies on the vertical axis. The class frequencies are represented by the
heights of the bars and the bars are drawn adjacent to each other.
b) Frequency polygons and Frequency Curve: A frequency polygon is a line graph where we
plot the class marks or midpoints along the horizontal axis and the corresponding frequencies
along the vertical axis. The class midpoints are connected with line segments.
If the classes are very many and the class widths are so small that the midpoints are close
together, the polygon can be formed by free hand to give a smooth curve known as a
frequency curve.
c) Cumulative Frequency Curve or the Ogive: An ogive is a line graph obtained by
representing the upper class boundaries along the horizontal axis and the corresponding
cumulative frequency along the vertical axis.
2.5 Exercise
A random sample of 50 auto drivers insured with a company and having similar auto
insurance policies was selected. The following data shows monthly auto insurance premium
(in Kshs. '000) paid by them.
54 40 45 20 60 30 35 40 55 70 20 15
45 60 45 25 15 30 25 18 35 25 45 56
59 25 27 39 50 56 20 25 30 30 41 25
56 48 45 25 35 60 55 48 38 34 60 60
60 64
i) Group the above data starting with the class 10 – 20 (exclusive type).
ii) Represent the data using a Histogram and an Ogive.
LESSON THREE: MEASURES OF CENTRAL TENDENCY
3.1 Introduction
 A measure of central tendency, also called a measure of location or an average, is a single
value within the range of data that is used to represent all the values in the series.
3.2 Characteristics of a good average

A good average should be:
 Rigidly defined
 Based on all values
 Easily understood and calculated
 Least affected by the fluctuations of sampling
 Capable of further algebraic treatment
 Least affected by extreme values
3.3 Types of averages

The measures of central tendency that are generally used in business are:
a) Arithmetic mean
b) Median
c) Mode
d) Geometric mean
e) Harmonic mean
3.3.1 The Arithmetic Mean

It is obtained by summing up the values of all the items of a series and dividing this sum by the
number of items.
Computation of the arithmetic mean for
Individual series:
Direct method
$$\bar{X} = \frac{\sum X}{N}$$
where $\bar{X}$ = arithmetic mean, $N$ = number of items
Indirect method
$$\bar{X} = A + \frac{\sum d}{N}$$
where $A$ = provisional mean (P.M), $d = X - A$ = deviations from P.M, $\sum d$ = the sum of deviations from P.M
Grouped series
Direct method
$$\bar{X} = \frac{\sum fX}{N}$$
where $f$ = frequencies, $N = \sum f$ = number of items
Indirect method
$$\bar{X} = A + \frac{\sum fd}{N}$$
NB: For a grouped frequency distribution the value of X is taken as the mid point of each class.
Examples
1. The monthly sales of ABC stores for a period of 6 months were as follows:
37,000, 48,000, 84,000, 73,000, 35,000, 53,000.
Calculate the mean monthly sales.
2. Calculate the mean of the following distribution

Number of vehicles serviced (x)   0   1   2    3   4   5
Number of days (f)                2   5   11   4   4   1
3. The following table gives the marks of 58 students in statistics.

Marks                0-10   10-20   20-30   30-40   40-50   50-60   60-70
Number of students   4      8       11      15      12      6       2
Calculate the mean mark.
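The three examples can be worked with a short sketch. It assumes the standard textbook forms of the two methods: the direct method divides the sum of f times X by the sum of f, and the indirect method adds the mean deviation from a provisional mean A back to A. The function names and the choice A = 35 are illustrative, not taken from the notes.

```python
def mean_direct(values, freqs=None):
    """Direct method: sum(f * x) / sum(f); f defaults to 1 for each value."""
    if freqs is None:
        freqs = [1] * len(values)
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def mean_indirect(values, provisional, freqs=None):
    """Indirect method: A + sum(f * d) / sum(f), where d = x - A."""
    if freqs is None:
        freqs = [1] * len(values)
    total_dev = sum(f * (x - provisional) for x, f in zip(values, freqs))
    return provisional + total_dev / sum(freqs)


# Example 1: individual series
sales = [37_000, 48_000, 84_000, 73_000, 35_000, 53_000]
mean_sales = mean_direct(sales)

# Example 2: discrete series with frequencies
vehicles = [0, 1, 2, 3, 4, 5]
days = [2, 5, 11, 4, 4, 1]
mean_vehicles = mean_direct(vehicles, days)

# Example 3: grouped series, X taken as the mid point of each class
midpoints = [5, 15, 25, 35, 45, 55, 65]
students = [4, 8, 11, 15, 12, 6, 2]
mean_mark = mean_direct(midpoints, students)
mean_mark_indirect = mean_indirect(midpoints, 35, students)  # A = 35 assumed

print(mean_sales)               # 55000.0
print(round(mean_vehicles, 2))  # 2.22
print(round(mean_mark, 2))      # 33.45
```

Both methods give the same mean mark. The indirect method only shifts the arithmetic to smaller numbers, which helps when working by hand.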
Advantages of the arithmetic mean

 Can be easily understood
 Takes into account all the items of the series
 It is not necessary to arrange the data before calculating the average
 It is capable of algebraic treatment
 It is a good method of comparison
 It is not indefinite
 It is used frequently.
Disadvantages of the arithmetic mean

 It is affected by extreme values to a great extent
 It may be a figure that does not exist in a series
 It cannot be calculated if all the items of a series are not known
 It cannot be used in case of qualitative data
Properties of Arithmetic Mean

1. The product of the arithmetic mean and the number of items is equal to the sum of all the
given values.
2. The algebraic sum of the deviations of the values from the arithmetic mean is equal to zero.
As such the mean may be characterized as a point of balance.
3. The sum of the squares of deviations from arithmetic mean is the least.
4. As the arithmetic mean is based on all the items in a series, a change in the value of any
item will lead to a change in the value of the arithmetic mean.
5. If we have the arithmetic means and numbers of observations of two or more groups,
we can compute the combined mean of these groups using the formula:
$$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$$
Examples
1. There are two branches of a company employing 100 and 80 employees respectively. The
arithmetic means of the monthly salaries paid by the two branches are $4570 and $6750
respectively. Find the arithmetic mean of the salaries of the employees of the company put
together.
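A minimal Python sketch of the combined-mean property applied to this example. The function name `combined_mean` is a hypothetical helper, not something defined in the notes.

```python
def combined_mean(sizes, means):
    """Weighted mean of group means: sum(N_i * mean_i) / sum(N_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Two branches with 100 and 80 employees and mean salaries 4570 and 6750
company_mean = combined_mean([100, 80], [4570, 6750])  # 997000 / 180
print(round(company_mean, 2))
```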
3.3.2 The Median

 It is the middle value when the data have been arranged in increasing or decreasing order of magnitude.
Computation of the Median for Ungrouped data

 If the number of observations is odd, the median is the middle value after the observations
have been arranged in some order
 If the number of observations is even, the median is the arithmetic mean of the two middle
observations after the data has been arranged in some order
Computation of the median in discrete series with Frequencies

Steps
1. Construct the less than cumulative frequency distribution
2. Find $\frac{N}{2}$, where $N = \sum f$
3. Check the cumulative frequency just greater than $\frac{N}{2}$
4. The corresponding value of the variable is the median
Computation of the Median in Grouped Data.

There are two approaches
1. Graphical method - Using the cumulative frequency curve (Ogive curve)
2. Interpolation formula
Interpolation Formula
Steps
1. Construct the less than cumulative frequency distribution
2. Find $\frac{N}{2}$, where $N = \sum f$
3. Check the cumulative frequency just greater than $\frac{N}{2}$
4. The corresponding class contains the median and is called the median class.
The median has to be interpolated in the class interval containing the median using the
formula:
$$\text{Median} = L + \frac{h}{f}\left(\frac{N}{2} - C\right)$$
where $L$ = Lower class boundary of the median class
$h$ = Length of the classes
$f$ = Frequency of the median class
$N$ = Total frequency
$C$ = Cumulative frequency of the class preceding the median class.
Examples
1. Find the median of the data below:
a) 5, 5, 4, 7, 0, 7, 8
b) 20, 15, 30, 45, 60, 10
2. Determine the median of the data below.
Grade                A    B    C    D    E
No of students (f)   10   15   67   50   21
3. Determine the median for the grouped data below.

Marks            20-29   30-39   40-49   50-59   60-69   70-79   80-89
No of students   2       7       15      30      20      4       1
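The rules for ungrouped data and the interpolation formula can be sketched in Python as follows. The function names are illustrative. For Example 3 the sketch assumes that the stated classes 20 - 29, 30 - 39, ... have class boundaries 19.5, 29.5, ... and a class length of 10.

```python
def median_ungrouped(values):
    """Middle value of the ordered data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def median_grouped(lower_boundaries, freqs, h):
    """Interpolation formula: L + (h / f) * (N / 2 - C)."""
    n_total = sum(freqs)
    cum = 0  # cumulative frequency of the classes before the current one (C)
    for lower, f in zip(lower_boundaries, freqs):
        if cum + f > n_total / 2:  # first cumulative frequency just greater than N/2
            return lower + (h / f) * (n_total / 2 - cum)
        cum += f
    raise ValueError("frequencies must be positive")


median_a = median_ungrouped([5, 5, 4, 7, 0, 7, 8])       # odd number of items
median_b = median_ungrouped([20, 15, 30, 45, 60, 10])    # even number of items

# Example 3: assumed class boundaries 19.5, 29.5, ..., 79.5
boundaries = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
median_marks = median_grouped(boundaries, [2, 7, 15, 30, 20, 4, 1], h=10)
print(median_a, median_b, round(median_marks, 2))
```

Here N = 79, so N/2 = 39.5 falls in the class 50 - 59, whose preceding cumulative frequency is 24.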
Properties of the Median

 It is a positional average and is influenced by the position of the items in the series and not
by the size of the items
 The sum of the absolute deviations of the items from the median is the least.
Advantages of the Median

 It is easy to calculate
 It is simple and is understood easily
 It is less affected by the value of extreme items
 It can be calculated by inspection in some cases
 It is useful in the study of phenomena which are of a qualitative nature
Disadvantages of the Median
 It is not a suitable representative of a series in most cases
 It is not suitable for further algebraic treatment
 It is not used frequently like arithmetic mean
Quartiles, deciles and percentiles

 Quartiles are the values of the items that divide the series into four equal parts.
 Deciles divide the series into 10 equal parts.
 Percentiles divide the series into 100 equal parts.
 The 2nd quartile, 5th decile and 50th percentile are equal to the median.
Computation of the Quartiles

First Quartile:
$$Q_1 = L_{Q_1} + \frac{h}{f_{Q_1}}\left(\frac{N}{4} - C\right)$$
Second Quartile:
$$Q_2 = L_{Q_2} + \frac{h}{f_{Q_2}}\left(\frac{2N}{4} - C\right)$$
Third Quartile:
$$Q_3 = L_{Q_3} + \frac{h}{f_{Q_3}}\left(\frac{3N}{4} - C\right)$$
In general, the three quartiles can be computed for grouped data by the formula
$$Q_i = L_{Q_i} + \frac{h}{f_{Q_i}}\left(\frac{iN}{4} - C\right)$$
where $L_{Q_i}$ = Lower class boundary of the ith quartile class
$h$ = Length of the classes
$f_{Q_i}$ = Frequency of the ith quartile class
$N$ = Total frequency
$C$ = Cumulative frequency of the class preceding the ith quartile class.
Computation of the Deciles
$$D_i = L_{D_i} + \frac{h}{f_{D_i}}\left(\frac{iN}{10} - C\right)$$
where $L_{D_i}$ = Lower class boundary of the ith decile class
$f_{D_i}$ = Frequency of the ith decile class
$N$ = Total frequency
$C$ = Cumulative frequency of the class preceding the ith decile class.
Computation of the Percentiles
$$P_i = L_{P_i} + \frac{h}{f_{P_i}}\left(\frac{iN}{100} - C\right)$$
where $L_{P_i}$ = Lower class boundary of the ith percentile class
$f_{P_i}$ = Frequency of the ith percentile class
$N$ = Total frequency
$C$ = Cumulative frequency of the class preceding the ith percentile class.
NB: Analogous to the graphical method of estimating the median, the quartiles, deciles and
percentiles of a grouped frequency distribution can be estimated using the cumulative
frequency curve (ogive curve).
Examples
1. Find the 1st , 2nd and 3rd quartiles for the following data
13, 9, 18, 15, 14, 21, 7, 10, 11, 20, 5, 18, 25, 16, 17
2. Given below is the number of families in a locality according to their monthly expenditure
Monthly expenditure   No. of families
140 - 150             17
150 - 160             29
160 - 170             42
170 - 180             72
180 - 190             84
190 - 200             107
200 - 210             49
210 - 220             34
220 - 230             31
230 - 240             16
240 - 250             12
Calculate:
i) All the quartiles
ii)7th decile
iii) 90th percentile
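Since the quartile, decile and percentile formulas differ only in the divisor (4, 10 or 100), one function can cover all three. The sketch below applies it to the monthly expenditure table. The name `grouped_quantile` and its parameter `parts` are assumptions made for illustration.

```python
def grouped_quantile(lower_boundaries, freqs, h, i, parts):
    """L + (h / f) * (i * N / parts - C) for the i-th of `parts` equal divisions."""
    n_total = sum(freqs)
    target = i * n_total / parts
    cum = 0  # cumulative frequency of the classes before the current one (C)
    for lower, f in zip(lower_boundaries, freqs):
        if cum + f > target:  # first cumulative frequency just greater than iN/parts
            return lower + (h / f) * (target - cum)
        cum += f
    raise ValueError("i must be smaller than parts")


lowers = list(range(140, 250, 10))                          # 140, 150, ..., 240
families = [17, 29, 42, 72, 84, 107, 49, 34, 31, 16, 12]    # N = 493

q1, q2, q3 = (grouped_quantile(lowers, families, 10, i, 4) for i in (1, 2, 3))
d7 = grouped_quantile(lowers, families, 10, 7, 10)
p90 = grouped_quantile(lowers, families, 10, 90, 100)
print(round(q1, 2), round(q2, 2), round(q3, 2), round(d7, 2), round(p90, 2))
```

The second quartile returned here is the same value the median interpolation formula would give.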
3.3.3 The Mode

 The mode is the value which occurs most often in the data. A distribution with one mode is
called unimodal, with two modes bimodal, and with many modes multimodal.
 There are two methods that can be used to estimate the mode of grouped data.
a) Graphically, using a histogram
b) Using an interpolation formula
Graphical determination of mode

Procedure
1. Construct a histogram for the data
2. Locate the highest cell in the histogram. Join the upper class boundary of the cell with the
upper boundary of the preceding cell; then join the lower class boundary of the highest cell
with the lower class boundary of the succeeding cell. Locate the intersection of the two lines.
3. Draw a vertical line from the intersection to the horizontal axis.
4. The value at which the vertical line meets the horizontal axis is the mode.
Interpolation Formula
$$\text{Mode} = L + \frac{h\,(f_m - f_1)}{2f_m - f_1 - f_2} = L + \frac{D_1}{D_1 + D_2} \times h$$
where $L$ = Lower class boundary of the modal class
$h$ = Length of the classes
$f_m$ = Frequency of the modal class
$f_1$ = Frequency of the class preceding the modal class
$f_2$ = Frequency of the class succeeding the modal class
$D_1 = f_m - f_1$, $D_2 = f_m - f_2$
Examples
1. Find the mode for the data below
a) 1, 2, 3, 4, 5, 6; Solution: The mode does not exist
b) 7, 8, 3, 8, 6, 10, 8 Solution: Mode = 8; This is a uni-modal distribution
c) 29,30,60,13,30,7,2,7 Solution: Modes are 30 and 7; This is a bi-modal distribution
d)
X   4   5   6    7    8   9   10
F   2   5   21   18   9   2   1
Solution: Mode = 6; it has the highest frequency.
2. Calculate the mode for the following data
Class (marks)   No of students
0 - 10          2
10 - 20         7
20 - 30         11
30 - 40         6
40 - 50         4
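The interpolation formula for the mode can be sketched in Python and applied to Example 2. The function name `grouped_mode` is illustrative. The sketch assumes a single modal class and treats a missing neighbouring class as having frequency 0.

```python
def grouped_mode(lower_boundaries, freqs, h):
    """L + h * (f_m - f_1) / (2 * f_m - f_1 - f_2) for the modal class."""
    m = freqs.index(max(freqs))                        # position of the modal class
    f_m = freqs[m]
    f_1 = freqs[m - 1] if m > 0 else 0                 # class preceding the modal class
    f_2 = freqs[m + 1] if m + 1 < len(freqs) else 0    # class succeeding the modal class
    return lower_boundaries[m] + h * (f_m - f_1) / (2 * f_m - f_1 - f_2)


# Example 2: the modal class is 20 - 30 with f_m = 11, f_1 = 7, f_2 = 6
mode_marks = grouped_mode([0, 10, 20, 30, 40], [2, 7, 11, 6, 4], h=10)
print(round(mode_marks, 2))
```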
Properties of the mode
 It represents the most typical value of the distribution and it should coincide with existing
items
 It is not affected by the presence of extremely large or small items
Advantages of the Mode
 It is easy to understand
 Extreme items do not affect its value
 It possesses the merit of simplicity
Disadvantages of the Mode

 It is often not clearly defined
 Exact location is often uncertain
 It is unsuitable for further algebraic treatment
 It does not take into account extreme values.
Relationship between the mean, median and mode

There usually exists a relationship among the mean, median and mode for moderately
asymmetrical distributions.
 If the distribution is symmetrical, the mean, median and mode will have identical values.
 If the distribution is moderately skewed, the mean, median and mode will pull apart. If the
distribution tails off towards higher values, the mean and the median will be greater than the
mode, i.e. when a distribution is skewed to the right, mean > median > mode. Generally,
income distribution is skewed to the right: a large number of families have relatively
low incomes and a small number of families have extremely high incomes. In such a case, the
mean is pulled up by the extremely high incomes.
If the distribution tails off towards lower values, the mode will be greater than either of the
other two measures, i.e. when a distribution is skewed to the left, mode > median > mean. This
is because the mean is pulled down below the median by extremely low values.
In either case the median will be about one third as far away from the mean as the mode is. This
means that
Mode = mean – 3(mean – median)
= 3(median) – 2(mean)
3.3.4 Geometric Mean

Geometric Mean (GM) is the nth root of the product of n values
For ungrouped data
$$G.M = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}$$
$$\log G.M = \frac{1}{n}\left(\log x_1 + \log x_2 + \cdots + \log x_n\right) = \frac{1}{n}\sum \log x_i$$
$$G.M = \text{Antilog}\left(\frac{\sum \log x_i}{n}\right)$$
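Grouped data
$$G.M = \sqrt[N]{x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}} = \left(x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}$$
$$\log G.M = \frac{1}{N}\left(f_1 \log x_1 + f_2 \log x_2 + \cdots + f_n \log x_n\right) = \frac{1}{N}\sum f_i \log x_i$$
$$G.M = \text{Antilog}\left(\frac{\sum f_i \log x_i}{N}\right)$$
where $N = \sum f$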
Examples
1. The weekly incomes (‘000) of 10 families are given below. Find the geometric mean.
50, 80, 45, 70, 15, 75, 85, 40, 36, 25
2. Calculate the geometric mean of the given data
X   15   20   25   30   35   40   45   50
F   2    22   29   24   7    8    6    2
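The logarithmic form of the geometric mean can be sketched in Python as below. The function name `geometric_mean` and the optional `fs` argument are assumptions for illustration: with no frequencies given, every item is counted once, which is the ungrouped case.

```python
import math


def geometric_mean(xs, fs=None):
    """Antilog of sum(f * log x) / sum(f), using base-10 logarithms."""
    if fs is None:
        fs = [1] * len(xs)  # ungrouped data: every value has frequency 1
    log_sum = sum(f * math.log10(x) for x, f in zip(xs, fs))
    return 10 ** (log_sum / sum(fs))


# Example 1: weekly incomes of 10 families
incomes = [50, 80, 45, 70, 15, 75, 85, 40, 36, 25]
gm_income = geometric_mean(incomes)

# Example 2: discrete frequency distribution with N = 100
gm_grouped = geometric_mean([15, 20, 25, 30, 35, 40, 45, 50],
                            [2, 22, 29, 24, 7, 8, 6, 2])
print(round(gm_income, 2), round(gm_grouped, 2))
```

Working with logarithms avoids forming the very large product of all the values directly.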
Merits of the Geometric mean

 It takes into account all the items in the data and condenses them into one representative
value.
 It gives more weight to smaller values than to large values.
 It is amenable to algebraic manipulations
Demerits
 It is difficult to use and compute
 It is determinate for positive values and cannot be used for negative values or zero.
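3.3.5 Harmonic Mean

It is the reciprocal of the arithmetic mean of the reciprocals of a series of observations.
Ungrouped data
$$H.M = \frac{n}{\sum \frac{1}{x}} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}$$
Grouped data
$$H.M = \frac{\sum f}{\sum \frac{f}{x}} = \frac{f_1 + f_2 + \cdots + f_n}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \cdots + \frac{f_n}{x_n}}$$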
Examples
1. Calculate the Harmonic mean of the following data
11, 13, 15, 16, 19, 22, 13, 20
2. Calculate the Harmonic mean of the following data

X   15   20   25   30   35   40   45   50
F   2    22   29   24   7    8    6    2
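A short Python sketch of the harmonic mean for both examples. The function name `harmonic_mean` and its optional `fs` argument are illustrative. All values are assumed to be positive.

```python
def harmonic_mean(xs, fs=None):
    """sum(f) / sum(f / x); fs defaults to 1 for every item (ungrouped data)."""
    if fs is None:
        fs = [1] * len(xs)
    return sum(fs) / sum(f / x for x, f in zip(xs, fs))


# Example 1: ungrouped data
hm_values = harmonic_mean([11, 13, 15, 16, 19, 22, 13, 20])

# Example 2: discrete frequency distribution with N = 100
hm_grouped = harmonic_mean([15, 20, 25, 30, 35, 40, 45, 50],
                           [2, 22, 29, 24, 7, 8, 6, 2])
print(round(hm_values, 2), round(hm_grouped, 2))
```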
Merits of the Harmonic mean

 It takes into account all the observations in the data
 It gives more weight to smaller items
 It is amenable to algebraic manipulations
 It measures the rates of change
Demerits
 It is difficult to compute when the number of items is large
 It assigns too much weight to smaller items.
3.4 Factors to consider in the choice of an average

 The purpose for which the average is being used
 The nature, characteristics and properties of the average
 The nature and characteristics of the data.
3.5 Exercise
1. What are the requirements of a good average? Compare the mean, the median and the mode
in the light of these requirements.
2. Find the mean, median and mode for the following set of data
i) 3, 5, 2, 6, 5, 9, 5, 2, 8 and 6
ii) 51.6, 48.7, 50.3, 49.5 and 48.9
3. The following data pertain to marks obtained by 120 students in their final examination in
mathematics:
Marks     Number of Students
30 - 39   1
40 - 49   3
50 - 59   11
60 - 69   21
70 - 79   43
80 - 89   32
90 - 99   9
Total     120
Calculate the mode and the median.

4. Suppose we are given the following series:
Class interval   0-10   10-20   20-30   30-40   40-50   50-60   60-70
Frequency        6      12      22      37      17      8       5
i) Draw the histogram and the Ogive from these data
ii) Estimate the median and the mode from the graphs in (i) above
5. The mean of marks in statistics of 100 students of a class was 72. The mean of marks of boys
was 75 while their number was 70. Find out the mean mark of girls in the class.
LESSON FOUR: MEASURES OF DISPERSION

4.1 Introduction
 Dispersion refers to the degree to which numerical data tends to spread about an average
value. It is the extent of the scatteredness of items around a measure of central tendency.
 The measures of dispersion are also referred to as measures of variation or measures of
spread.
4.2 Significance of measuring dispersion

 To determine the reliability of an average
 To serve as a basis for the control of the variability
 To compare two or more series with regard to their variability
 To facilitate the use of other statistical measures
4.3 Properties of a good measure of dispersion

It should be:
 Simple to understand
 Easy to compute
 Rigidly defined
 Based on each and every item in the distribution
 Amenable to further algebraic calculations
 Stable from sample to sample
 Not unduly affected by extreme values
NOTE:
The measures of dispersion which are expressed in terms of the original units of the observations
are termed absolute measures. Such measures are not suitable for comparing the variability of
two distributions which are not expressed in the same units of measurement. It is therefore
better to use relative measures of dispersion, which are obtained as ratios or percentages and are
thus pure numbers independent of the unit of measurement.
4.4 Measures of dispersion

 Range
 Interquartile Range and Quartile Deviation
 Mean deviation
 Standard deviation / Variance
4.4.1 The Range

It is the difference between the largest value and the smallest value of a series.
Example
The following are the prices of shares of a company from Monday to Saturday.
Day     Monday   Tuesday   Wednesday   Thursday   Friday   Saturday
Price   200      210       208         160        220      250
Calculate the range.

Solution: Range = L – S, where L is the largest value and S is the smallest value
= 250 – 160 = 90
NB:
In case of grouped frequency distribution the range is the difference between the upper class
boundary of the largest class and the lower class boundary of the smallest class.
Advantages of the Range

 It is the simplest to understand and compute
 It takes the minimum time to calculate the value of the range
Limitations
 It is not based on each and every value of the distribution
 It is subject to fluctuations of considerable magnitude from sample to sample
 It cannot be computed in case of open-ended distributions
 It does not explain or indicate anything about the character of the distribution within the
two extreme observations.
Uses of the range

 Quality control
 Fluctuations of prices
 Weather forecast
 Finding the difference between two values e.g. wages earned by different employees.
4.4.2 The Interquartile Range and Quartile Deviation

Interquartile range: it is the difference between the third quartile and the first quartile
i.e. Interquartile range = $Q_3 - Q_1$
Quartile Deviation: also called the semi-interquartile range. It is obtained by dividing the
interquartile range by 2,
i.e. $Q.D = \frac{Q_3 - Q_1}{2}$, where Q.D = Quartile Deviation
4.4.3 The Mean Deviation

It is the average amount of scatter of the items in the distribution from the mean, median or
mode, ignoring the signs of the deviations. If $x_1, x_2, \ldots, x_n$ are n observations then the mean deviation
about the mean is calculated as:
For ungrouped data: $M.D = \frac{\sum |x - \bar{x}|}{n}$
For grouped data: $M.D = \frac{\sum f|x - \bar{x}|}{\sum f}$
Examples
1. Calculate the mean deviation of the following values
3000, 4000, 4200, 4400, 4600, 4800, 5800
2. Calculate the average deviation from the mean for the following
Sales (thousands) 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60
No. of days (f) 3 6 11 3 2
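The mean deviation about the mean can be sketched in Python for both examples. The function name `mean_deviation` is illustrative. For the sales table the class midpoints 15, 25, ..., 55 are taken as the values of x.

```python
def mean_deviation(xs, fs=None):
    """sum(f * |x - mean|) / sum(f); fs defaults to 1 for ungrouped data."""
    if fs is None:
        fs = [1] * len(xs)
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * abs(x - mean) for x, f in zip(xs, fs)) / n


# Example 1: the mean is 4400 and the absolute deviations sum to 4000
md_values = mean_deviation([3000, 4000, 4200, 4400, 4600, 4800, 5800])

# Example 2: the mean is 33 and sum(f * |x - mean|) = 204 over N = 25 days
md_sales = mean_deviation([15, 25, 35, 45, 55], [3, 6, 11, 3, 2])
print(round(md_values, 2), round(md_sales, 2))
```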
Merits of Mean Deviation

1. It is easy to compute and understand
2. It uses all the data
3. It is less affected by the extreme values
4. Since deviations are taken from a central value, comparison about formation of different
distributions can easily be made.
5. It shows the significance of an average in the distribution
Demerits
1. Ignores algebraic signs while taking the deviations
2. Cannot be computed for distributions with open-ended class
3. Rarely used in sociological studies
4.4.4 The Variance and Standard Deviation

 The variance of a set of observations is the average of the squared deviations of the data points
from their mean, i.e. the mean square deviation. It is denoted by $s^2$ for sample data
and $\sigma^2$ for population data.
 Standard deviation is the square root of the variance. It is denoted by $s$ for sample data and
$\sigma$ for population data.
Computing the Variance

 Variance for ungrouped data
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$$
where $\sum (x - \bar{x})^2$ = sum of squares of the deviations from the arithmetic mean
 Variance for grouped data
$$\sigma^2 = \frac{\sum f(x - \bar{x})^2}{\sum f}$$
Computing the standard deviation
Standard deviation for ungrouped data
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
Standard deviation for grouped data
$$\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}}$$
NB: The computation of $\sigma^2$ can be simplified by using the following version of the formula
For ungrouped data: $\sigma^2 = \frac{\sum x^2}{n} - \bar{x}^2$
For grouped data: $\sigma^2 = \frac{\sum fx^2}{\sum f} - \bar{x}^2$

Examples
1. Find the standard deviation of the wages of the following ten workers working in a factory
Worker        A      B      C      D      E      F      G      H      I      J
Weekly wage   1320   1310   1315   1322   1326   1340   1325   1321   1320   1331
2. An analysis of production rejects resulted in the following figures:

No. of rejects per operator   21-25   26-30   31-35   36-40   41-45   46-50   51-55
No of operators (f)           5       15      28      42      15      12      3
Calculate the mean and standard deviation
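A Python sketch of the variance and standard deviation using the definitional formula (dividing by n, i.e. the population form used in these notes). The function names are illustrative. For Example 2 the sketch assumes the classes are 21-25, 26-30, ..., 51-55, so the midpoints are 23, 28, ..., 53.

```python
import math


def variance(xs, fs=None):
    """sum(f * (x - mean)^2) / sum(f); fs defaults to 1 for ungrouped data."""
    if fs is None:
        fs = [1] * len(xs)
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / n


def std_dev(xs, fs=None):
    """Standard deviation as the square root of the variance."""
    return math.sqrt(variance(xs, fs))


# Example 1: the mean is 1323 and the squared deviations sum to 622
wages = [1320, 1310, 1315, 1322, 1326, 1340, 1325, 1321, 1320, 1331]
sd_wages = std_dev(wages)

# Example 2: assumed class midpoints, N = 120 operators
midpoints = [23, 28, 33, 38, 43, 48, 53]
operators = [5, 15, 28, 42, 15, 12, 3]
mean_rejects = sum(f * x for x, f in zip(midpoints, operators)) / sum(operators)
sd_rejects = std_dev(midpoints, operators)
print(round(sd_wages, 2), round(mean_rejects, 2), round(sd_rejects, 2))
```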
Combined standard deviation

The combined arithmetic mean for two sets of data with arithmetic means $\bar{X}_1$, $\bar{X}_2$ and numbers of
observations $N_1$, $N_2$ is given by
$$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$$
The combined standard deviation of two series is given by
$$\sigma_{12} = \sqrt{\frac{N_1 \sigma_1^2 + N_2 \sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}}$$
where $\sigma_{12}$ = Combined standard deviation
$\sigma_1$ = standard deviation of the first group
$\sigma_2$ = standard deviation of the second group
$d_1 = \bar{X}_1 - \bar{X}_{12}$; $d_2 = \bar{X}_2 - \bar{X}_{12}$
NB: The above formula can be extended to find the standard deviation of three or more
groups. For example, the combined standard deviation of three groups would be
$$\sigma_{123} = \sqrt{\frac{N_1 \sigma_1^2 + N_2 \sigma_2^2 + N_3 \sigma_3^2 + N_1 d_1^2 + N_2 d_2^2 + N_3 d_3^2}{N_1 + N_2 + N_3}}$$
where $d_1 = \bar{X}_1 - \bar{X}_{123}$; $d_2 = \bar{X}_2 - \bar{X}_{123}$; $d_3 = \bar{X}_3 - \bar{X}_{123}$
Example
1. The number of workers employed, the mean wage per week and the standard deviation in each
branch of a company are given below. Calculate the mean wages and standard deviation of all
workers taken together for the factory.
Branch   No. of workers   Weekly mean wage   Standard deviation
A        50               1413               60
B        60               1420               70
C        90               1415               80
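The combined mean and combined standard deviation for any number of groups can be sketched in Python as follows. The function name `combined_mean_sd` is a hypothetical helper. It applies the formula with d_i taken as each group mean minus the combined mean.

```python
import math


def combined_mean_sd(sizes, means, sds):
    """Combined mean and standard deviation of several groups."""
    n = sum(sizes)
    mean = sum(k * m for k, m in zip(sizes, means)) / n
    # Each group contributes N_i * sigma_i^2 + N_i * d_i^2 with d_i = mean_i - combined mean
    total = sum(k * (s ** 2 + (m - mean) ** 2) for k, m, s in zip(sizes, means, sds))
    return mean, math.sqrt(total / n)


# Branches A, B and C of the company
mean_all, sd_all = combined_mean_sd([50, 60, 90], [1413, 1420, 1415], [60, 70, 80])
print(mean_all, round(sd_all, 2))
```

Here the combined mean is 283200 / 200 and the quantity under the square root is 1051500 / 200.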
Advantages of the standard deviation

 It is rigidly defined and is based on all the observations of the series
 It is applied or used in other statistical techniques like correlation and regression analysis
and sampling theory.
 It is possible to calculate the combined standard deviation of two or more groups.
Disadvantages of the standard deviation

 It cannot be used for comparing the dispersion of two or more series of observations given
in different units.
 It gives more weight to extreme values.
Coefficient of Variation
The measures of dispersion which are expressed in terms of the original units of the
observations are termed absolute measures. Such measures are not suitable for comparing
the variability of two distributions which are not expressed in the same units of measurement.
It is therefore better to use relative measures of dispersion, which are obtained as ratios or
percentages and are thus pure numbers independent of the unit of measurement.
Standard deviation is an absolute measure of dispersion, and the relative measure based on the
standard deviation is called the coefficient of variation. It is a pure number and is suitable for
comparing the variability, homogeneity or uniformity of two or more distributions. It is given as
a percentage and calculated as
$$\text{Coefficient of variation (CV)} = \frac{\sigma}{\text{Mean}} \times 100$$
The lower the C.V, the less the variability and hence the more consistent or stable the distribution.
Example
Over a period of 3 months the daily number of components produced by two comparable
machines was measured, giving the following statistics
Machine A: mean = 242.8; Standard deviation = 20.5
Machine B: mean = 281.3; Standard deviation = 23.0
Which machine has less variability in its performance?
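The comparison can be sketched in Python. The function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(mean, sd):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100


cv_a = coefficient_of_variation(242.8, 20.5)   # Machine A
cv_b = coefficient_of_variation(281.3, 23.0)   # Machine B

# The machine with the lower CV is the less variable one
less_variable = "A" if cv_a < cv_b else "B"
print(round(cv_a, 2), round(cv_b, 2), less_variable)
```

Machine B has the larger standard deviation in absolute terms but the smaller coefficient of variation, which is why a relative measure is needed for this comparison.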
4.5 Skewness and Kurtosis

 The term ‘skewness’ refers to lack of symmetry or departure from symmetry. When a
distribution is not symmetrical it is called a skewed distribution.
 In a symmetrical distribution the values of mean, median and mode are alike. If the value of the
mean is greater than the mode, skewness is said to be positive. If the value of the mode is greater
than the mean, skewness is said to be negative.
 Karl Pearson’s coefficient of skewness is frequently used for measuring skewness and is
calculated as
$$SK_p = \frac{\text{Mean} - \text{Mode}}{\sigma}$$
But Mean – Mode = 3(Mean – Median). Thus the formula for calculating the coefficient of
skewness can be written as
$$SK_p = \frac{3(\text{Mean} - \text{Median})}{\sigma}$$
 Kurtosis refers to the degree of flatness or peakedness of a frequency curve. The degree of
peakedness of a distribution is measured relative to the peakedness of the normal
distribution.
 If a distribution is more peaked than the normal curve, it is called Leptokurtic; if it is more
flat-topped than the normal curve, it is called platykurtic or flat-topped. The normal curve
is itself known as Mesokurtic.
[Figure: frequency curves showing the leptokurtic curve, the mesokurtic (normal) curve and the platykurtic curve]
4.6 Activities
1. The following table indicates the marks obtained by students in a statistics test.
Marks Number of students
0 – 20 5
20 – 40 7
40 – 60 -
60 – 80 8
80 – 100 7
The arithmetic mean for the class was 52.5 marks. You are required to determine the value
of:
i) The missing frequency
ii) The median mark
iii) The modal mark
iv) The standard deviation
v) The coefficient of skewness
2. From the prices of the shares X and Y given below, state which share is more stable in value,
which one you would invest in, and why.
X: 55 54 52 53 56 58 52 50 51 49
Y: 108 107 105 105 106 107 104 103 104 101
3. An analysis of the monthly wages paid to workers of two firms A and B belonging to the same
industry gives the following results:
Firm A Firm B
No. of wage earners 586 648
Average monthly wage 52.5 47.5
Standard deviation 10 11
Compute the combined standard deviation.
LESSON FIVE: PROBABILITY DISTRIBUTIONS

5.1 Introduction
 Probability is the likelihood or chance that a particular event will occur.
 In probability and statistics the term experiment refers to any procedure that gives rise to a
collection of outcomes which cannot be predetermined.
 In tossing a coin, the possible outcomes are as follows:
Tossing 1 coin: {H, T}
Tossing 2 coins: {HH, HT, TH, TT}
Tossing 3 coins: {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
 The set of all possible outcomes in an experiment is called a sample space.

 An event is a subset of the sample space.
EXAMPLE
Let the set of all outcomes (sample space) in the experiment of tossing two coins be
{HH, HT, TH, TT}. Then
A = {HT, TH} is the event of getting exactly one head (and one tail)
B = {HH, HT, TH} is the event of getting at least one head
∅ = { } is the impossible event
S = {HH, HT, TH, TT} is the sure event
 An elementary event or simple event is an event containing only one point of the sample
space, e.g. in the toss of two coins, the following are elementary events:
{HH}, {HT}, {TH}, {TT}.
 A random variable is a function which assigns a numerical value to each simple event in a
sample space.
Example
Suppose that three students are selected at random from a class and each is asked whether he
smokes (S) or does not (N). Then the sample space of this experiment is given by
S = {SSS, SSN, SNS, SNN, NSS, NSN, NNS, NNN}
 Let X denote the number of smokers among the three students chosen. Then:
Simple event in S Random variable X

SSS 3
SSN 2
SNS 2
SNN 1
NSS 2
NSN 1
NNS 1
NNN 0
Thus X is a random variable which takes the values 0, 1, 2, or 3.

 If a random variable can assume only a countable number of distinct values, it is called a
discrete random variable.
E.G: The number of children in a family, the number of telephone calls at a switchboard in a
ten-minute period etc.
 A continuous random variable is one that can assume any value within a given
interval.
E.G: Lifetime of an electric bulb, weight of a person etc.
5.3 Probability distribution function of a discrete random variable
 The probability distribution of a random variable can be described by listing all the values that
the random variable can take together with the corresponding probabilities. Such a listing is called a
probability distribution or probability mass function of the random variable.
Example
Suppose X represents the number of heads in a random experiment of tossing three coins.
The sample space is:
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
The probability distribution of the random variable X defined as the “number of heads” is
x   P(X = x)
0   1/8
1   3/8
2   3/8
3   1/8
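The same probability distribution can be obtained by enumerating the sample space in Python. This is a sketch that assumes the eight outcomes are equally likely, i.e. that the coins are fair. The variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 outcomes of tossing three coins, e.g. "HHT"
sample_space = ["".join(toss) for toss in product("HT", repeat=3)]

# X = number of heads in each outcome; count how many outcomes give each value
counts = Counter(outcome.count("H") for outcome in sample_space)

# Equally likely outcomes: P(X = x) = (outcomes with x heads) / (all outcomes)
pmf = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
```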
 In general, suppose X is a random variable that assumes the values x1, x2, …, xk. If we
represent the probability that X assumes the value xi by P(X = xi), then the probability function
can be given in the form of a table as
X     P(x)
x1    P(x1)
x2    P(x2)
.     .
.     .
.     .
xk    P(xk)
Sum = 1
 The sum of the probabilities is one, i.e.
$$\sum_{i=1}^{k} P(x_i) = P(x_1) + P(x_2) + \cdots + P(x_k) = 1$$
Conditions for a function to be a probability function

i) The probability that a random variable assumes a value xi is always between 0 and
1, i.e. $0 \le p(x_i) \le 1$
ii) The sum of all probabilities is equal to one, i.e. $\sum_{i=1}^{k} p(x_i) = 1$
Example
The number of telephone calls received in an office between 9 – 10 am has the probability
distribution as shown below:
Number of calls (X)   Probability, P(x)
1                     0.05
2                     0.20
3                     0.25
4                     0.20
5                     0.10
6                     0.15
7                     0.05
a) Verify that it is a probability function

b) Find the probability that there will be 3 or more calls
c) Find the probability that there will be an even number of calls
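The three parts can be sketched in Python. The sketch assumes that the seven rows of the table correspond to 1, 2, ..., 7 calls; if the last row stands for a different number of calls, part (c) changes accordingly. The variable names are illustrative.

```python
# Assumed reading of the table: number of calls -> probability
calls = {1: 0.05, 2: 0.20, 3: 0.25, 4: 0.20, 5: 0.10, 6: 0.15, 7: 0.05}

# a) every probability lies in [0, 1] and the probabilities sum to 1
is_probability_function = (
    all(0 <= p <= 1 for p in calls.values())
    and abs(sum(calls.values()) - 1) < 1e-9
)

# b) P(X >= 3)
p_three_or_more = sum(p for x, p in calls.items() if x >= 3)

# c) P(X is even)
p_even = sum(p for x, p in calls.items() if x % 2 == 0)

print(is_probability_function, round(p_three_or_more, 2), round(p_even, 2))
```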
The Mean or Expected Value of a Discrete Random Variable
 It is obtained by multiplying each possible value of the random variable by the corresponding
probability and summing the terms. That is, if $x_1, x_2, \ldots, x_n$ are the values assumed by a random
variable with respective probabilities $p(x_1), p(x_2), \ldots, p(x_n)$, then its mean $\mu$ (also called the
expected value) is given by
$$\mu = x_1 p(x_1) + x_2 p(x_2) + \cdots + x_n p(x_n) = \sum_{i=1}^{n} x_i \, p(x_i)$$
The mean, also referred to as the expected value, is denoted by $E(X)$.
The Variance of a Discrete Random Variable

 The variance of a discrete random variable is defined as
$$\text{Var}(X) = \sum_{i=1}^{n} (x_i - \mu)^2 \, p(x_i)$$
 The positive square root of the variance is called the standard deviation of the random
variable. The variance is commonly denoted by $\sigma^2$, hence the standard deviation equals $\sigma$.
Example
Suppose we are given the following data relating to the breakdown of a machine in a certain
company during a given week, where x represents the number of breakdowns of the machine and
P(x) represents the probability value of x.
x 0 1 2 3 4
P(x) 0.12 0.20 0.25 0.30 0.13
Find the mean and the variance of the number of breakdowns per week for this machine
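A Python sketch of this example. It computes the variance from the definition and also from the equivalent form "sum of x squared times P(x), minus the mean squared", so that the two can be compared. The variable names are illustrative.

```python
breakdowns = [0, 1, 2, 3, 4]
probs = [0.12, 0.20, 0.25, 0.30, 0.13]

# Mean (expected value): sum of x * P(x)
mu = sum(x * p for x, p in zip(breakdowns, probs))

# Variance from the definition: sum of (x - mu)^2 * P(x)
var_definition = sum((x - mu) ** 2 * p for x, p in zip(breakdowns, probs))

# Equivalent computational form: sum of x^2 * P(x), minus mu^2
var_shortcut = sum(x * x * p for x, p in zip(breakdowns, probs)) - mu ** 2

print(round(mu, 2), round(var_definition, 4), round(var_shortcut, 4))
```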
NB: The computation of $\sigma^2$ can be simplified by using the following version of the formula:
$$\sigma^2 = \sum x^2 P(x) - \mu^2$$
5.4 Discrete Probability Distributions

Binomial Probability Distribution
Characteristics
i) An outcome on each trial of an experiment is classified into one of two mutually exclusive
categories: a success or a failure.
ii) The probability of a success (p) remains the same from trial to trial and so does the
probability of a failure (q), where p + q = 1.
iii) The trials are independent i.e. the outcome of one trial does not affect the outcome of any
other trial.
 We are interested in the random variable x, where x is the number of successes in n trials.
 It is common to refer to each trial as a Bernoulli trial and to refer to the entire experiment as a
binomial experiment.
 Given a Bernoulli Process where the probability of success in any trial equals p and the
probability of a failure equals q, the probability of x successes in n trials is calculated as
$$p(n, x) = \binom{n}{x} p^x q^{n-x}$$
The mean of a Binomial distribution = $np$
The variance $\sigma^2 = npq$
Example
There are five flights daily from Moi International airport to Jomo Kenyatta International airport.
Suppose the probability that any flight arrives late is 0.2. What is the probability that: -
i) None of the flights are late today?
ii) Exactly one of the flights is late today?
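The flight example can be sketched in Python with n = 5 trials and p = 0.2, treating "a flight arrives late" as the success. The function name `binomial_pmf` is illustrative; `math.comb` supplies the number of ways of choosing x successes from n trials.

```python
from math import comb


def binomial_pmf(n, x, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)


p_none_late = binomial_pmf(5, 0, 0.2)   # 0.8 ** 5
p_one_late = binomial_pmf(5, 1, 0.2)    # 5 * 0.2 * 0.8 ** 4

mean_late = 5 * 0.2            # np
variance_late = 5 * 0.2 * 0.8  # npq

print(round(p_none_late, 4), round(p_one_late, 4))
```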
5.5 Continuous Probability Distributions

Normal Probability Distribution
Characteristics
 It is bell shaped and has a single peak at the center of the distribution
 The arithmetic mean, median and mode of the distribution are equal and located at the peak.
 Half of the area under the curve is above this center point and the other half is below it.
 It is symmetrical about its mean i.e. if it is cut vertically at the central value, the two halves
will be mirror images
 It is asymptotic i.e. the curve gets closer and closer to the x-axis but never actually touches
it.
 Since the normal distribution is a continuous distribution, the probabilities are given in
terms of appropriate areas, and the total area under the curve is equal to 1. Thus the
probability that a random variable X having a normal distribution will assume a value
between two numbers a and b is equal to the area under the curve between x = a and x = b,
as shown below:
The standard normal probability distribution ($\mu = 0$, $\sigma = 1$)

 The standard normal curve describes the distribution of a normal random variable with
mean zero and standard deviation 1. The random variable itself is called the standard normal
variable and is denoted by Z.
E.G: To find the area between z = 0 and z = 1.73, we go to 1.7 in the column and 0.03 in the
row and read the corresponding entry as 0.4582. Hence the area between 0 and 1.73 is
0.4582 and $P(0 \le z \le 1.73) = 0.4582$.
NB:
i) The curve is symmetrical w.r.t the vertical axis through zero
ii) It is strongly recommended that we sketch the curves and identify the areas under the
curve and the values along the horizontal axis.
EXAMPLES
1. If $P(0 \le z \le c) = 0.3944$, find c.
2. Find $P(-2.42 \le z \le 0.8)$
3. Find a) $P(1.8 \le z \le 2.8)$ b) $P(-2.8 \le z \le -1.8)$
4. Find a) P(z  −2.13) b) P(z  −1.81)
5. Suppose z is a standard normal variable. In each of the following cases find c for which
a) P(z  c) = 0.1151
b) P(z  c) = 0.8238
c) $P(1 \le z \le c) = 0.1525$
d) $P(-c \le z \le c) = 0.8164$
 Having considered areas under the standard normal curve, we now consider the general case
of a normal distribution with any mean $\mu$ and any standard deviation $\sigma$, where $\sigma > 0$.
 If X is a normal random variable with mean $\mu$ and standard deviation $\sigma$, then X can be
converted into a standard normal variable z by setting
$$z = \frac{X - \mu}{\sigma}$$
EXAMPLE 6
Suppose X has a normal distribution with $\mu = 30$ and $\sigma = 4$. Find
a) $P(30 \le X \le 35)$ b) P(X  40) c) P(X  22)
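Part (a) can be sketched in Python without a printed table by using the error function from the standard library, since the area to the left of z under the standard normal curve equals 0.5 * (1 + erf(z / sqrt(2))). The function names `phi` and `normal_prob_between` are illustrative. The sketch also reproduces the table entry for z = 1.73 quoted earlier.

```python
from math import erf, sqrt


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_prob_between(a, b, mu, sigma):
    """P(a <= X <= b) for a normal X, found by standardising both limits."""
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return phi(z_b) - phi(z_a)


# Area between z = 0 and z = 1.73, as read from the table
table_entry = phi(1.73) - phi(0)

# Part (a): with mu = 30 and sigma = 4, z runs from 0 to 1.25
p_a = normal_prob_between(30, 35, mu=30, sigma=4)

print(round(table_entry, 4), round(p_a, 4))
```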
5.6 Activities
1. A salesman who sells cars for General Motors claims that he sells the largest number of cars
on Saturday. He has the following probability distribution for the number of cars he expects
to sell on a particular Saturday.
No. of cars (x) Probability P(x)
0 .1
1 .2
2 .3
3 .3
4 .1
Total 1.0
i) On a typical Saturday, how many cars does the salesman expect to sell?
ii) What is the variance of the distribution?
2. In a recent survey, 90% of the homes in a city were found to have colored TV’s. In a sample
of nine homes, what is the probability that:
i. All nine have colored TV’s?
ii. Less than five have colored TV’s?
iii. More than five have colored TV’s?
iv. At least seven homes have colored TV’s?
3. The life times of electric components manufactured by Raman Industries Ltd are normally
distributed with mean of 2500 hours and standard deviation of 600 hours. If the daily
production is 500 components, how many are expected to have a life time of:
i) Less than 2600 hours
ii) Between 2350 hours and 2580 hours
iii) More than 2380 hours
LESSON SIX: SAMPLING AND SAMPLING DISTRIBUTIONS
6.1 Introduction
 The field of inferential or inductive statistics is concerned with studying facts about populations.
Specifically, the interest is in learning about the population parameters. This is accomplished by
picking a sample and computing the values of the appropriate statistics.
 A parameter is a numerical descriptive measure of a population. Because it is based on all the
observations in the population, its value is almost always unknown.
 A sample statistic is a numerical descriptive measure of a sample. It is calculated from the
observations in the sample.
NB: The term statistic refers to sample quantity and the term parameter refers to a population
quantity.
 Sampling is the process of selecting a sample from a population.
6.2 Types of sampling Designs

There are two major ways of selecting samples;
a) Probability sampling methods
b) Non - Probability sampling methods
a) Probability sampling methods

i) Simple random sampling
 Assumes that every member of the population has an equal chance of being independently
selected. All members of the population are labeled with a number and random numbers
should be used to select the sample.
 This is the best method of sampling as independence of sample members is assumed by
many statistical tests. Unfortunately all members of the population have to be available for
selection and this is rarely the case.
ii) Systematic sampling
 It is useful when the whole sampling frame is not available. The population is listed and
every nth member is included in the sample after the first has been selected randomly.
 Sampling from a production line may make use of this method.
iii) Stratified random sampling

 Useful when the population consists of a number of distinct subpopulations and there is more
difference between the subpopulations than within each of them.
 The population is split into these differing groups – strata. A random sub-sample is then
drawn from each, in proportion to the strata size.
iv) Cluster Sampling:
The population is divided into internally heterogeneous subgroups and some are randomly
selected for further study. It is used when it is not possible to obtain a sampling frame
because the population is either very large or scattered over a large geographical area.
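The four probability sampling designs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 numbered units, the two strata, the ten clusters and the fixed seed are all hypothetical.

```python
import random

rng = random.Random(2022)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: units labelled 1 to 100.
population = list(range(1, 101))

# i) Simple random sampling: every unit has the same chance of selection.
simple = rng.sample(population, 10)

# ii) Systematic sampling: random start, then every k-th unit.
k = len(population) // 10
start = rng.randrange(k)
systematic = population[start::k]

# iii) Stratified random sampling: sample each stratum in proportion to its size.
strata = {"A": population[:60], "B": population[60:]}  # hypothetical strata
fraction = 0.10
stratified = {
    name: rng.sample(units, round(len(units) * fraction))
    for name, units in strata.items()
}

# iv) Cluster sampling: randomly pick whole groups and study every unit in them.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
chosen_clusters = rng.sample(clusters, 2)

print(simple, systematic, stratified, chosen_clusters, sep="\n")
```

Each design yields ten units in total except cluster sampling, which here yields two whole clusters of ten.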
b) Non-probability sampling
It is used when a researcher is not interested in selecting a sample that is representative of the
population.
i) Purposive Sampling
It allows the researcher to use cases that have the required information with respect to the
objectives of his or her study e.g. educational level, age group, religious sect etc.
ii) Quota Sampling
The researcher purposively selects subjects to fit the quotas identified e.g. Gender: Male or
Female; Class Level: Graduate or Undergraduate; Religion: Muslim, Protestant, Catholic,
Jewish; Socio-economic class: Upper, middle or lower.
iii) Snowball sampling
It is used when the population that possesses the characteristics under study is not well
known and can be best located through referral networks. Initial subjects are identified who
in turn identify others. It is commonly used in studies of drug cultures, teenage gang activities,
the Mungiki sect, insider trading, Mau Mau etc.
iv) Convenience or Accidental Sampling
Involves selecting cases or units of observation as they become available to the researcher
e.g. asking a question to the radio listeners, roommates or neighbours.
6.3 Reasons for Sampling

We obtain a sample rather than a complete enumeration (a census) of the population for many
reasons. There are six main reasons for sampling in lieu of the census.
i) Economy: Directly observing only a portion of the population requires fewer resources than a
census.
ii) The time factor: A sample may provide an investigator with needed information quickly.
iii) Very large populations: Many populations about which inferences must be made are quite
large and sample evidence may be the only way to obtain information.
iv) Partly inaccessible populations: Some populations contain elementary units so difficult to
observe that they are in a sense inaccessible e.g. in determining consumer attitudes not all of the
users of a product can be queried.
v) The destructive nature of the observation: Sometimes the very act of observing the desired
characteristics of the elementary unit destroys it for the use intended. Classical examples of this
occur in quality control.
vi) Accuracy and sampling: A sample may be more accurate than a census. A sloppily conducted
census can provide less reliable information than a carefully obtained sample.
6.4 Bias and Error in sampling

A sample is expected to mirror the population from which it comes. However, there is no
guarantee that any sample will be precisely representative of the population. One of the things
that make a sample unrepresentative of its population is the sampling error.
Sampling error: It comprises the differences between the sample and the population that are due
solely to the particular elementary units that happen to have been selected.
There are two basic causes for sampling error.
 One is chance: Bad luck may result in untypical choices. Unusual elementary units do
exist, and there is always a possibility that an abnormally large number of them will be
chosen. The main protection against this type of error is to use a large enough sample.
 Another cause of sampling error is sampling bias. This is the tendency to favor the selection
of elementary units that have particular characteristics. Sampling bias is usually the result of
a poor sampling plan.
Non sampling error
 The other main cause of unrepresentative samples is non sampling error. This type can occur
whether a census or a sample is being used.
 A non-sampling error is an error that results solely from the manner in which the observations
are made. The simplest example of non-sampling error is inaccurate physical measurement due to
faulty instruments or poor procedures. Consider the observation of human weights: no two
answers will be of equal reliability.
6.5 Sampling Distributions

 By sampling distribution of a statistic we mean the theoretical probability distribution of the
statistic.
6.5.1 Sampling Distribution of the Mean
 If samples of size n are drawn with replacement from a population with mean $\mu$ and variance
$\sigma^2$, the mean and variance of the sampling distribution of $\bar{x}$ are given by
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$.
 When random samples of size n are drawn without replacement from a finite population of size
N that has a mean $\mu$ and a variance $\sigma^2$, the mean and the variance of the sampling distribution
of $\bar{x}$ are given by
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1}$
 If the population size is large compared to the sample size, then approximately
$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$.
 The standard deviation of the sampling distribution of $\bar{x}$ is commonly known as the standard
error of the mean. It is $\frac{\sigma}{\sqrt{n}}$ when sampling with replacement. For a sample drawn without
replacement from a finite population of size N, the standard error of the mean is
$\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$.
 In the latter case it is approximately $\frac{\sigma}{\sqrt{n}}$ if the population is very large compared to the sample
size. In our discussion, we shall assume that the population is large enough that $\frac{\sigma^2}{n}$ can be taken
as the value of $\sigma_{\bar{x}}^2$ even when sampling without replacement.
 The standard error of the mean then depends on two quantities, $\sigma^2$ and n. It will be large if $\sigma^2$
is large, i.e. if the scatter in the parent population is large. On the other hand, the standard error
will be small if the sample size n is large, since with a larger sample we get more
information about the population mean $\mu$ and consequently less scatter of the sample mean
about $\mu$.
 The variance of the parent population is usually not under the experimenter’s control. Therefore
one sure way of reducing the standard error of the mean is by picking a large sample – the larger
the better.
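The effect of the sample size and of the finite population factor on the standard error can be seen in a short sketch. The function name standard_error and the numerical values are assumptions chosen for illustration only.

```python
import math
from typing import Optional

def standard_error(sigma: float, n: int, N: Optional[int] = None) -> float:
    """Standard error of the mean.

    With N omitted: sigma / sqrt(n) (sampling with replacement, or a very
    large population). With N given: multiplied by sqrt((N - n) / (N - 1))
    for sampling without replacement from a finite population of size N.
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# Hypothetical population standard deviation of 10.
se_n25 = standard_error(10.0, 25)
se_n100 = standard_error(10.0, 100)
se_finite = standard_error(10.0, 25, N=100)
print(se_n25, se_n100, round(se_finite, 4))
```

Quadrupling the sample size halves the standard error, and the finite population factor only matters when n is a sizeable fraction of N.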
 So far we have concerned ourselves with two parameters of the sampling distribution of
$\bar{x}$, namely $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}^2$. We now turn our attention to the distribution itself.
 The probability distribution of $\bar{x}$ will very much depend on the distribution of the sampled
population.
 Note that if n, the sample size, is large, the distribution of $\bar{x}$ is close to a normal distribution
with mean $\mu$ and variance $\frac{\sigma^2}{n}$. The statement of this result is contained in the central
limit theorem.
Central Limit Theorem
The distribution of the sample mean $\bar{x}$ of a random sample drawn from practically any
population with mean $\mu$ and variance $\sigma^2$ can be approximated by means of a normal
distribution with mean $\mu$ and variance $\frac{\sigma^2}{n}$, provided the sample size is large.
 The central limit theorem tells us that the shape of the distribution is approximately normal. We
already know that if the population has mean $\mu$ and variance $\sigma^2$, then $\mu_{\bar{x}} = \mu$ and
$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$.
 Converting to the z scale, we can give an alternate version of the central limit theorem.
When the sample size is large, the distribution of $\frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ is close to that of a standard
normal variable z.
(Recall that to convert to the z scale the rule is: subtract the mean and divide by the standard
deviation of the r.v in question)
 Since the central limit theorem applies if the sample size is large, a natural question is, how large
is large enough?
This will depend on the nature of the sampled population
 If the parent population is normally distributed, then the distribution of x is normal for
any sample size,
 If the parent population has a symmetric distribution, the approximation to the normal
distribution will be reached for a moderately small sample size, as low as 10.
 In most instances, the tendency towards normality is so strong that the approximation is fairly
satisfactory with a sample size of about 30.
Example 1
The records of the Dept of health, education and welfare show that the mean expenditure
incurred by a student during 2010 was $5000 and the standard deviation of the expenditure was
$800. Find the approximate probability that the mean expenditure of 64 students picked at
random was
a) More than $4820
b) Between $4800 and $5120
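One way to set up Example 1 is sketched below, treating the sample mean as approximately normal with mean 5000 and standard error 800/√64, as the central limit theorem suggests. The variable names are assumptions made for this sketch, and the results are approximations.

```python
import math
from statistics import NormalDist

mu, sigma, n = 5000.0, 800.0, 64

# Standard error of the mean for samples of 64 students.
se = sigma / math.sqrt(n)

# Approximate sampling distribution of the sample mean (central limit theorem).
xbar_dist = NormalDist(mu, se)

# a) P(sample mean > 4820)
p_more_than_4820 = 1.0 - xbar_dist.cdf(4820.0)

# b) P(4800 < sample mean < 5120)
p_between = xbar_dist.cdf(5120.0) - xbar_dist.cdf(4800.0)

print(f"standard error = {se}")
print(f"a) about {p_more_than_4820:.3f}   b) about {p_between:.3f}")
```

On the z scale the limits are -1.8 for part a) and -2 and 1.2 for part b), so the same answers can be read from the standard normal table.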
Example 2
The length of life (in hours) of a certain type of electric bulb is a random variable with a mean
life of 500 hours and a standard deviation of 35 hours.
What is the approximate probability that a random sample of 49 bulbs will have a mean life
between 488 and 505 hours?
6.5.2 Sampling Distribution of the Proportion

 If n items are picked independently from a population where the probability of success is p
(not very close to 0 or 1) and if n is large, then the distribution of the sample proportion $\frac{x}{n}$
is approximately normal with mean p and variance $\frac{pq}{n}$, where $p + q = 1$.
 Converting to the z scale, it follows that $\frac{\frac{x}{n} - p}{\sqrt{pq / n}}$ has a distribution that is very close to the
standard normal distribution provided n is large. This leads to the conclusion that $\frac{x - np}{\sqrt{npq}}$ is
distributed approximately as a standard normal variable.
Example 1
Suppose 10% of the tubes produced by a Machine are defective. If a sample of 100 tubes is
inspected at random
a) Find the expected proportion of defectives in the sample
b) Find the variance of the proportion of defective in the sample
c) Find the approximate distribution of the sample proportion
d) Find the probability that the proportion of defective will exceed 0.16
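The four parts of Example 1 follow directly from the results above with p = 0.10 and n = 100. The sketch below is one way to lay out the arithmetic; the variable names are assumptions.

```python
import math
from statistics import NormalDist

p, n = 0.10, 100
q = 1.0 - p

expected_proportion = p            # a) mean of the sample proportion
variance_proportion = p * q / n    # b) variance pq/n
sd_proportion = math.sqrt(variance_proportion)

# c) Approximate distribution: normal with mean p and variance pq/n.
proportion_dist = NormalDist(expected_proportion, sd_proportion)

# d) P(sample proportion > 0.16)
p_exceeds = 1.0 - proportion_dist.cdf(0.16)

print(f"mean = {expected_proportion}, variance = {variance_proportion:.4f}")
print(f"P(proportion > 0.16) is about {p_exceeds:.4f}")
```

Here 0.16 lies two standard deviations (2 × 0.03) above the mean, so part d) is the area to the right of z = 2.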
Example 2
If 60% of the population feels that the president is doing a satisfactory job, find the approximate
probability that in a sample of 900 people interviewed at random, the proportion who share this
view will
a) Exceed 0.65
b) Be less than 0.56
LESSON SEVEN: ESTIMATION THEORY

7.1 Introduction
 The main objective of any statistical investigation is to acquire an understanding of the
population by studying the population parameters.
 The investigation of the entire population may not be feasible due to several reasons.
Thus there is a need to get an idea about the population parameters by studying the
corresponding sample statistics.
 There are two ways of giving an estimate of a parameter: Point estimate and Interval
estimate.
7.2 Point Estimation

 A numerical value of the estimator computed from a given set of sample values is called
an estimate of the parameter. Thus a point estimate is a single number, which is used to
estimate an unknown population parameter.
 An estimator of a parameter is a statistic relevant for estimating the parameter. An
estimator is thus a random variable; an estimate is its computed value from a given sample.
For instance $\bar{X}$ is an estimator of $\mu$. A particular value of $\bar{X}$ computed from a given sample
will be denoted by $\bar{x}$ and will represent an estimate of $\mu$.
Similarly, $S^2$ is an estimator of $\sigma^2$ and
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
is its estimate computed from a set of data $x_1, x_2, \ldots, x_n$. Also, if X represents the number of
successes in a sample of n, then $\frac{X}{n}$ is an estimator of P and if in a particular sample there are x
successes, then $\frac{x}{n}$ is an estimate of P.
 The major limitation of a point estimate is that it fails to indicate how close it is to the
quantity it is supposed to estimate. In other words, a point estimate does not give any idea
about the reliability or precision of the method of estimation used.
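The three point estimates described above can be computed as in the following sketch. The data values and the success count are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from statistics import mean, variance

# Hypothetical sample of five observations.
data = [2.0, 4.0, 4.0, 5.0, 10.0]

x_bar = mean(data)          # estimate of the population mean
s_squared = variance(data)  # estimate of the population variance (divisor n - 1)

# Hypothetical count: 7 successes in a sample of 20.
successes, n = 7, 20
p_hat = successes / n       # estimate of the population proportion

print(x_bar, s_squared, p_hat)
```

Note that statistics.variance uses the divisor n - 1, which matches the formula for $s^2$ in these notes.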
Interval Estimation
 Another method of estimating parameters is called the method of Interval Estimation or
Confidence Interval.
 It involves computing two points and constructing an interval within which the parameter lies
with a specified degree of confidence. In constructing the end points of the interval, all of the
factors, namely, the point estimate, the population variance, and the sample size, are brought
into play.
7.3 Properties of a good estimator

a) Unbiasedness: An unbiased estimator of a population parameter is an estimator whose
expected value is equal to that parameter i.e. if you were to take an infinite number of
samples, calculate the value of the estimator in each sample, and then average these values,
the average value would equal the parameter.
b) Consistency: An estimator is said to be consistent if the difference between the estimator
and the parameter grows smaller as the sample size grows large.
c) Efficiency: An efficient estimator should have the least variance or least standard error.
d) Sufficiency: An estimator is said to be sufficient if it extracts from the sample such an
amount of information as no other estimator does. This means that an estimator should be
such that it utilizes all the information contained in the sample for the purpose of
estimating a given parameter.
 When we find a point estimate, we certainly do not expect that it will be exactly equal to the
parameter value. Also, if we take two samples from the same population, we do not
expect the two estimates computed from these samples to be exactly equal. This is due to the
sampling error involved. Thus, the method of point estimation has some drawbacks.
7.4 Confidence Intervals for Population Mean when the Population Variance is
Known.

If the population has a normal distribution and $\sigma$ is known, then a $(1 - \alpha)100$ percent
confidence interval for $\mu$ is given by
$\bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$.
Example 1:
A gas station sold a total of 8019 gallons of gas on 9 randomly picked days. Suppose the amount
sold on a day is normally distributed with a standard deviation of $\sigma = 90$ gallons. Construct
confidence intervals for the true mean amount sold on a day with the following confidence
levels:
a) 98%
b) 80%
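A sketch of how Example 1 might be worked is given below. The function name z_interval is an assumption; the critical value $z_{\alpha/2}$ is obtained from the standard library rather than from a printed table, so the last decimal place may differ slightly from a table-based answer.

```python
import math
from statistics import NormalDist

def z_interval(xbar: float, sigma: float, n: int, confidence: float):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z value with area alpha/2 to its right
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

xbar = 8019 / 9  # sample mean: total gallons divided by number of days

ci98 = z_interval(xbar, 90.0, 9, 0.98)
ci80 = z_interval(xbar, 90.0, 9, 0.80)
print(f"mean = {xbar}")
print(f"98%: ({ci98[0]:.1f}, {ci98[1]:.1f})")
print(f"80%: ({ci80[0]:.1f}, {ci80[1]:.1f})")
```

The 98% interval is wider than the 80% interval: a higher confidence level requires a larger margin around the same sample mean.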
Example 2:
A random sample of 16 fully grown turkeys had a mean weight of 20.8kgs. If we can assume
from past experience that $\sigma = 2.8$ kgs, construct a confidence interval for $\mu$, the true mean weight,
with the following confidence coefficients.
a) 90%
b) 95%
c) 98%
7.5 How Large a Sample?

The sample size needed so as to be $(1 - \alpha)100$ percent confident that the estimate $\bar{x}$ does not
differ from $\mu$ by more than a pre-assigned quantity e is
$n = \left( \frac{z_{\alpha/2} \, \sigma}{e} \right)^2$.
Example
A population has a normal distribution with variance 225. Find how large a sample must be
drawn in order to be 95% confident that the sample mean will not differ from the population
mean by more than 2 units.
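The sample size formula can be applied to this example as sketched below. The function name required_sample_size is an assumption, and the result is rounded up because a sample size must be a whole number.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma: float, e: float, confidence: float) -> int:
    """Smallest n for which the margin z * sigma / sqrt(n) is at most e."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((z * sigma / e) ** 2)

sigma = math.sqrt(225.0)  # the example gives the variance, so take the square root
n_needed = required_sample_size(sigma, e=2.0, confidence=0.95)
print(n_needed)
```

With $z_{\alpha/2} \approx 1.96$ the formula gives $(1.96 \times 15 / 2)^2 \approx 216.1$, which is rounded up to the next whole number.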
7.6 Confidence Interval for Population Mean When the Population Variance is
Unknown
A1   
100 percent confidence interval for when the population is normally distributed and
 is not known is given by
S S
x  tn  1,    x  tn  1,
2 n 2 n
tn  1,  2
Note that 2 , will be very close to if n is 30 or more. In that case, the above confidence.
Interval for  becomes, approximately
S S
x   2    x   2
n n
Example 1
When 16 cigarettes of a particular brand were tested in a laboratory for the amount of nicotine
content, it was found that their mean content was 18.3 mg with S = 1.8 mg.
Set a 90 percent confidence interval for the mean nicotine content $\mu$ in the population of
cigarettes of this brand. (Assume that the amount of nicotine in the cigarette is normally
distributed).
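The cigarette example can be set up as in the sketch below. Python's standard library has no t-distribution quantile function, so the critical value $t_{15, 0.05} = 1.753$ is read from a t table and passed in by hand; the function name t_interval is an assumption.

```python
import math

def t_interval(xbar: float, s: float, n: int, t_crit: float):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is the tabulated value t_{n-1, alpha/2}, supplied by the caller.
    """
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# n = 16 gives 15 degrees of freedom; for 90% confidence alpha/2 = 0.05.
T_15_005 = 1.753  # value taken from a t table

lower, upper = t_interval(xbar=18.3, s=1.8, n=16, t_crit=T_15_005)
print(f"90% confidence interval: ({lower:.2f}, {upper:.2f}) mg")
```

Passing the critical value as an argument keeps the sketch dependency-free; a statistics package could supply it instead.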
Example 2
In order to estimate the amount of time in minutes that a teller spends on a customer, a bank
manager decided to observe 64 customers picked at random. The amount of time the teller spent
on each customer was recorded. It was found that the sample mean was 3.2 minutes with
$S^2 = 1.44$. Find a 98% confidence interval for the mean amount of time $\mu$.
Example 3
The following data represent the amount of sugar consumed (in pounds) in a household during
five randomly picked weeks: 3.8, 4.5, 5.2, 4.0 and 5.5. Construct a 90% confidence interval for
the true mean consumption $\mu$. (Assume a normal distribution for the amount of sugar consumed)
LESSON EIGHT: HYPOTHESIS TESTING

8.1 Introduction
 A statistical hypothesis is a statement, assertion or claim about the nature of a population.

Hypothesis testing is a procedure based on sample evidence and probability theory to
determine whether the hypothesis is a reasonable statement.
8.2 The Null and Alternative Hypothesis

 A hypothesis that is being tested for the purpose of possible rejection is called a null
hypothesis, denoted as $H_0$. It should be stated in such a way that it contains the equality sign.
 The hypothesis against which the null hypothesis is tested is called the alternative
hypothesis, denoted as $H_A$. This is the hypothesis that is accepted when the null hypothesis
is rejected. The null hypothesis denies the claim posed in the question.
 A test of statistical hypothesis is a rule or procedure that leads to a decision to accept or to
reject the hypothesis under consideration when the experimental sample values are obtained.
This rule is often referred to as a decision rule. If the evidence compiled from the sample
does not support the claim under H 0 , we will reject H 0 and conclude that H 0 is false.
8.3 Type I and Type II errors

 The error of rejecting the null hypothesis when it is in fact true is called a type I error or
rejection error. The probability of committing this error is denoted by the Greek letter $\alpha$
(alpha) and is referred to as the level of significance of the test.
 The error of accepting $H_0$ when it is false is called a type II error or an acceptance error.
The probability of this error is denoted by the Greek letter $\beta$ (beta).
8.4 One-Tailed and Two-Tailed tests

The nature of the critical region for a statistical test procedure depends on the alternative
hypothesis. We shall consider 3 cases of the alternative hypothesis.
a) $H_A: \mu > \mu_0$
b) $H_A: \mu < \mu_0$
c) $H_A: \mu \ne \mu_0$, where $\mu_0$ is a given specific value.
a) A right – tailed test

In discussing the engineer’s claim, we have considered the principle of testing the null
hypothesis $H_0: \mu = \mu_0$ against the alternative hypothesis $H_A: \mu > \mu_0$ with $\mu_0 = 450$.
It can be seen that for arbitrary $\mu_0$, the critical value C is given by
$C = \mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}}$
where n is the sample size, $\sigma$ is the population standard deviation, which is assumed known, and
$z_\alpha$ is the value on the z scale such that the area in the right tail is $\alpha$.
The decision rule with the level of significance $\alpha$ is then given by:
Reject $H_0$ if $\bar{x} > \mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}}$, or equivalently reject $H_0$ if $\frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} > z_\alpha$.
It is the one-sided nature of the alternative hypothesis (greater than, >) that prompts the rejection
of H 0 if the value of the statistic falls in the right tail of its distribution. The test is therefore
called a one-tailed test, specifically, a right-tailed test.
b) A left-tailed test
Suppose the null and alternative hypotheses are given as
$H_0: \mu = \mu_0$
$H_A: \mu < \mu_0$
Once again, the alternative hypothesis is one sided (less than, <). We reject $H_0$ for smaller
values of $\bar{x}$, leading to the rejection of $H_0$ if the value falls in the left tail of the distribution of
$\bar{x}$ as shown below. This gives a one-tailed test that is specifically a left-tailed test.
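[Figure: distribution of $\bar{x}$ centred at $\mu_0$, with the z scale marked at $-z_\alpha$. Actions: reject $H_0$ in the left tail; accept $H_0$ otherwise.]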

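The critical value C is given by
$C = \mu_0 - z_\alpha \frac{\sigma}{\sqrt{n}}$.
The decision rule is given as:
Reject $H_0$ if $\bar{x} < \mu_0 - z_\alpha \frac{\sigma}{\sqrt{n}}$, or equivalently reject $H_0$ if $\frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} < -z_\alpha$.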
c) A Two-Tailed test
A test leads to a two-tailed test if the alternative hypothesis is two sided.
Consider the following example:
E.g. Suppose a machine is adjusted to manufacture bolts to the specification of 1-inch diameter,
and we state the null and alternative hypotheses as
$H_0: \mu = 1$
$H_A: \mu \ne 1$
If the sample mean of the diameters was too far off on either side of 1, we would favor rejecting
$H_0$. If the value of $\bar{x}$ falls in either tail of the distribution of $\bar{X}$, we will reject $H_0$.
The rejection region with $\alpha = 0.05$ has been distributed as $\alpha/2 = 0.025$ at each tail.
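[Figure: distribution of $\bar{x}$ centred at $\mu_0$, with an area of $\alpha/2$ in each tail and the z scale marked at $-z_{\alpha/2}$ and $z_{\alpha/2}$. Actions: reject $H_0$ in either tail; accept $H_0$ in between.]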
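We have two critical values $C_1$ and $C_2$ and they are given by
$C_1 = \mu_0 - z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ and $C_2 = \mu_0 + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
The decision rule is formulated as follows:
Reject $H_0$ if $\bar{x} < \mu_0 - z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ or $\bar{x} > \mu_0 + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$, or equivalently reject $H_0$ if
$\frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$ is less than $-z_{\alpha/2}$ or greater than $z_{\alpha/2}$.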
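The critical values for the three kinds of test can be collected into one small function, sketched below. The function name critical_values, the tail labels and the bolt-machine figures ($\sigma = 0.02$, n = 16) are assumptions made for illustration; the notes give no data for that example.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def critical_values(mu0: float, sigma: float, n: int, alpha: float, tail: str):
    """Critical value(s) for the sample mean when sigma is known.

    tail is "right" (H_A: mu > mu0), "left" (H_A: mu < mu0)
    or "two" (H_A: mu != mu0).
    """
    se = sigma / math.sqrt(n)
    if tail == "right":
        return (mu0 + Z.inv_cdf(1.0 - alpha) * se,)
    if tail == "left":
        return (mu0 - Z.inv_cdf(1.0 - alpha) * se,)
    if tail == "two":
        z_half = Z.inv_cdf(1.0 - alpha / 2.0)
        return (mu0 - z_half * se, mu0 + z_half * se)
    raise ValueError("tail must be 'right', 'left' or 'two'")

# Hypothetical bolt-machine figures: mu0 = 1 inch, sigma = 0.02 inch, n = 16.
c1, c2 = critical_values(1.0, 0.02, 16, alpha=0.05, tail="two")
(c_right,) = critical_values(1.0, 0.02, 16, alpha=0.05, tail="right")
print(f"two-tailed: reject H0 if the sample mean is below {c1:.4f} or above {c2:.4f}")
print(f"right-tailed: reject H0 if the sample mean is above {c_right:.4f}")
```

For the same $\alpha$, the two-tailed critical values lie further from $\mu_0$ than the one-tailed value, because only $\alpha/2$ is placed in each tail.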
8.5 Steps to be followed in testing a hypothesis

1. State the null hypothesis. We treat here only the special case where H 0 stipulates that the
parameter value is equal to a specific number.
2. State the alternative hypothesis. There should be no overlap between the sets of parameter
values stipulated under H 0 and H A .

The alternative hypothesis is important in deciding whether the critical region is one-tailed or
two-tailed. Rejection of $H_0$ leads to the acceptance of $H_A$.

3. Pick an appropriate test statistic
4. Stipulate the value of $\alpha$, the probability of rejecting $H_0$ wrongly. It is the value of $\alpha$ that will
determine the critical point(s). Together with step 2, formulate the decision rule, i.e. determine
the values of the test statistic that will lead to the rejection of $H_0$ (the critical region).
5. Take a random sample and compute the value of the test statistic.
6. The final step consists of making the decision in light of the decision rule formulated in step 4.
It is important to interpret the conclusions in non-statistical language for the benefit of the
uninitiated.
8.6 Test of Hypothesis (Single Population)

8.6.1 Test of Hypothesis for the Population Mean When the Population Variance is known
A basic assumption about the population in this case is that it is normally distributed. In the
absence of a normally distributed population, we will require that the sample size be large
($n \ge 30$). The relevant statistic in this case is $\frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$.
A summary of the test criteria to test $H_0: \mu = \mu_0$ against the three forms of alternative
hypotheses is given below.
Alternative hypothesis    The decision rule is to reject $H_0$ if the computed value is
$\mu > \mu_0$             Greater than $z_\alpha$
$\mu < \mu_0$             Less than $-z_\alpha$
$\mu \ne \mu_0$           Less than $-z_{\alpha/2}$ or greater than $z_{\alpha/2}$
Example 1
After taking a refresher course, a salesman found that his sales (in dollars) on 9 random days
were 1280, 1250, 990, 1100, 880, 1300, 1100, 950 and 1050. Does the sample indicate that the
refresher course had the desired effect, in that his mean sale is now more than 1000 dollars?
Assume $\sigma = 100$, and the probability of erroneously saying that the refresher course is beneficial
should not exceed 0.01. Also assume that the sales are normally distributed.
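A possible layout of the computation for this example is sketched below, as a right-tailed test of $H_0: \mu = 1000$ against $H_A: \mu > 1000$ with $\sigma = 100$ and $\alpha = 0.01$. The variable names are assumptions.

```python
import math
from statistics import NormalDist, mean

sales = [1280, 1250, 990, 1100, 880, 1300, 1100, 950, 1050]
mu0, sigma, alpha = 1000.0, 100.0, 0.01

n = len(sales)
xbar = mean(sales)

# Test statistic: (sample mean - mu0) / (sigma / sqrt(n)).
z_computed = (xbar - mu0) / (sigma / math.sqrt(n))

# Right-tailed test: reject H0 if the computed value exceeds z_alpha.
z_alpha = NormalDist().inv_cdf(1.0 - alpha)
reject_h0 = z_computed > z_alpha

print(f"sample mean = {xbar}, z = {z_computed:.2f}, z_alpha = {z_alpha:.3f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Since the computed value falls in the right tail beyond $z_\alpha$, the sample supports the view that the mean sale now exceeds 1000 dollars.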
Example 2
An IQ test was administered to 9 students and their mean IQ was found to be 95. Assuming the
population variance is 144, is it true that the mean IQ in the population is less than 100?
Use $\alpha = 0.15$, and assume that IQ is normally distributed.
Example 3
A machine can be adjusted so that when under control, the mean amount of sugar filled in a bag
is 5kgs. From past experience, the standard deviation of the amount filled is known to be
0.15kgs.
To check if the machine is in control, a random sample of 16 bags was weighed and the mean
weight was found to be 5.1kgs. At the 5% level of significance, is there evidence to believe that
the adjustment is out of control? [Assume a normal distribution of the amount of sugar filled in a
bag]
8.6.2 Test of Hypothesis for the Population Mean when the Population Variance is Unknown
and the Sample is Small
In the case where $\sigma$ was known, we used the test statistic $\frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$.
Since $\sigma$ is not known, we will use its estimator S. Hence the appropriate test statistic is
$T = \frac{\bar{x} - \mu_0}{S / \sqrt{n}}$
At this point we need the added assumption that the population is normally
distributed, especially if n is small. Since, under this assumption, the statistic T has student’s t
distribution with n – 1 d.f, we get the decision rules given in the following table, depending upon
the particular alternative hypothesis.
Alternative hypothesis    The decision rule is to reject $H_0$ if the computed value of T is
1. $\mu > \mu_0$          Greater than $t_{n-1, \alpha}$
2. $\mu < \mu_0$          Less than $-t_{n-1, \alpha}$
3. $\mu \ne \mu_0$        Less than $-t_{n-1, \alpha/2}$ or greater than $t_{n-1, \alpha/2}$
Example 4
A car salesman claims that a particular make of car would give a mean mileage of greater than
20 miles per litre. To test the claim, a field experiment was conducted where 10 cars were each
run on one litre of petrol. The results (in miles) were 23, 18, 22, 19, 19, 22, 18, 18, 24, 22.
Do the data corroborate the salesman’s claim? Use $\alpha = 0.05$ and assume a normal distribution
for mileage per litre.
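The mileage example can be laid out as a right-tailed t test of $H_0: \mu = 20$ against $H_A: \mu > 20$, as sketched below. The critical value $t_{9, 0.05} = 1.833$ is read from a t table and entered by hand, since the standard library does not provide t quantiles; the variable names are assumptions.

```python
import math
from statistics import mean, stdev

miles = [23, 18, 22, 19, 19, 22, 18, 18, 24, 22]
mu0 = 20.0

n = len(miles)
xbar = mean(miles)
s = stdev(miles)  # sample standard deviation (divisor n - 1)

# Test statistic T = (sample mean - mu0) / (S / sqrt(n)), with n - 1 = 9 d.f.
t_computed = (xbar - mu0) / (s / math.sqrt(n))

T_9_005 = 1.833  # tabulated t value for 9 d.f. and right-tail area 0.05
reject_h0 = t_computed > T_9_005

print(f"sample mean = {xbar}, S = {s:.3f}, T = {t_computed:.3f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

The computed T is well below the tabulated value, so at the 5% level these data do not corroborate the claim.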
Example 5
A home economist claims that if a person is put on a certain diet, it will lead to a reduction of his
or her weight. The following data record the weights (in pounds) of five people, before and
after the diet. Do the data support the claim at the 5% level of significance?
Person number      1     2     3     4     5
Before the diet    175   168   140   130   150
After the diet     170   169   133   132   143
Example 6
An auto dealer believes that his new model will give mean trouble-free service of at least 12,000
miles. In a simulated test with 4 cars, the following numbers of trouble-free miles were
obtained: 11,000, 12,000, 11,800 and 11,200
Do these data refute the dealer’s claim? Use $\alpha = 0.05$ [assume a normal distribution]
Example 7
A machine can be adjusted so that when under control, the mean amount of sugar filled in a bag
is 5 kg. To check if the machine is in control, six bags were picked at random and their weights
were found to be 5.3, 5.2, 4.8, 5.2, 4.8 and 5.3.
At the 5% level of significance, is there evidence to believe that the machine is not in control?
[Assume a normal distribution for the weight of a bag]
8.6.3 The Population Proportion for Qualitative Data

 So far we have considered data where the observed variable can be measured on a numerical
scale. We now consider the case of a qualitative variable where the data is recorded as short -
tall, black - green, defective - non defective etc
 Our objective will be to test hypothesis regarding the proportion p of a certain attribute in the
population.
 We shall specifically consider the problem of testing the null hypothesis $H_0: p = p_0$, where
$p_0$ is a number between 0 and 1, against various alternative hypotheses.
e.g. we might be interested in the proportion of defective items produced by a machine and
wish to test: $p = 0.2$ against $p > 0.2$; or $p = 0.2$ against $p < 0.2$; or $p = 0.2$ against $p \ne 0.2$
 To carry out a test of hypothesis regarding the population proportion, we pick a sample of
independent observations and use the sample proportion as the statistic on which the test is
based. If p is the proportion in the population then the sample proportion $\frac{x}{n}$ has a sampling
distribution with mean p and standard deviation $\sqrt{\frac{p(1 - p)}{n}}$.
 Furthermore, if the sample is large, the shape of the distribution of $\frac{x}{n}$ is approximately
normal. Consequently, under the null hypothesis, which postulates that the population
proportion is $p_0$, $\frac{x}{n}$ has a distribution that is approximately normal with mean $p_0$ and
standard deviation $\sqrt{\frac{p_0(1 - p_0)}{n}}$, provided n is large.
 We now have a situation analogous to the one where we tested hypotheses regarding the
population mean when $\sigma$ was known.
The role of $\bar{x}$ is played by $\frac{x}{n}$, that of $\mu_0$ by $p_0$ and that of $\frac{\sigma^2}{n}$ by $\frac{p_0(1 - p_0)}{n}$.
The table below gives the 3 cases based on the nature of the alternative hypothesis.
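The test statistic is $\frac{\frac{x}{n} - p_0}{\sqrt{p_0(1 - p_0) / n}}$.
Alternative hypothesis    The decision rule is to reject $H_0$ if the computed value of the test statistic is
$p > p_0$                 Greater than $z_\alpha$
$p < p_0$                 Less than $-z_\alpha$
$p \ne p_0$               Less than $-z_{\alpha/2}$ or greater than $z_{\alpha/2}$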
Example 1
A machine is known to produce 30% defective tubes. After repairing the machine, it was found
that it produced 22 defective tubes in the first run of 100. Is it true that after the repair the
proportion of defective tubes is reduced? Use $\alpha = 0.01$.
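This example can be worked as a left-tailed test of $H_0: p = 0.3$ against $H_A: p < 0.3$ with $\alpha = 0.01$. The sketch below shows one way to lay out the arithmetic; the variable names are assumptions.

```python
import math
from statistics import NormalDist

p0, n, defectives, alpha = 0.30, 100, 22, 0.01

p_hat = defectives / n

# Test statistic: (x/n - p0) / sqrt(p0 * (1 - p0) / n).
z_computed = (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)

# Left-tailed test: reject H0 if the computed value is less than -z_alpha.
z_alpha = NormalDist().inv_cdf(1.0 - alpha)
reject_h0 = z_computed < -z_alpha

print(f"sample proportion = {p_hat}, z = {z_computed:.3f}, -z_alpha = {-z_alpha:.3f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

The computed value does not fall below $-z_\alpha$, so at the 1% level the sample does not establish that the proportion of defectives has been reduced.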
Example 2
The proportion of Kenyans who traveled abroad last year was 20%. To find the attitude of
people on foreign travel this year, 100 people were interviewed. Of these 15 said they would
travel and the remaining 85 said they would not. Is there any basis to believe that the attitude has
changed from last year? Use $\alpha = 0.10$.
8.7 Test of Hypothesis (Two Populations)

We now consider tests of hypothesis concerning the difference of means of two populations and
the difference of proportions of an attribute in two populations.
8.7.1 Difference in Population Mean When the Variances are Known

The null hypothesis under test is
$H_0: \mu_1 = \mu_2$, that is $\mu_1 - \mu_2 = 0$, and the test statistic appropriate for the purpose is
$\frac{\bar{X} - \bar{Y}}{\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}}$
The decision rules for various forms of alternative hypothesis are given in the table below.
Alternative hypothesis    The decision rule is to reject $H_0$ if the computed value is
$\mu_1 > \mu_2$           Greater than $z_\alpha$
$\mu_1 < \mu_2$           Less than $-z_\alpha$
$\mu_1 \ne \mu_2$         Less than $-z_{\alpha/2}$ or greater than $z_{\alpha/2}$
Example 1
For a sample of 15 adult Kenyans picked at random, the mean weight was $\bar{x} = 154$ pounds,
whereas for a sample of 18 people in the U.S, the mean weight was $\bar{y} = 162$ pounds. From past
surveys it is known that the variance of weight in Kenya is $\sigma_1^2 = 100$ and in the U.S it is
$\sigma_2^2 = 169$.
Is it true that there is significant difference between mean weights in the two places? Use
$\alpha = 0.05$. [Assume that the weights are normally distributed]
Example 2
In order to compare two brands of cigarettes, brand A and brand B, for their nicotine content, a
sample of 60 was inspected from brand A and a sample of 40 from brand B. The results of the
tests were summarized as follows.
Brand A    $\bar{x} = 15.4$    $S_1^2 = 3$
Brand B    $\bar{y} = 16.8$    $S_2^2 = 4$
At the 5% level of significance, do the two brands differ in their mean nicotine content?
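As a rough check on both examples, the statistic can be computed directly. This is a sketch under stated assumptions: `two_sample_z` is a hypothetical helper name, and in Example 2 the sample variances are used in place of the population variances on the grounds that both samples are large.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_x, mean_y, var_x, var_y, m, n):
    """Z statistic for H0: mu1 = mu2 with known (or large-sample) variances."""
    return (mean_x - mean_y) / sqrt(var_x / m + var_y / n)


z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)   # Z_{alpha/2} for alpha = 0.05

# Example 1: weights in Kenya and the U.S, population variances known
z_weights = two_sample_z(154, 162, 100, 169, 15, 18)

# Example 2: nicotine content, sample variances stand in for sigma^2 (assumption)
z_nicotine = two_sample_z(15.4, 16.8, 3, 4, 60, 40)

for label, z in [("weights", z_weights), ("nicotine", z_nicotine)]:
    print(f"{label}: z = {z:.3f}, reject H0: {abs(z) > z_crit}")
```

In Example 1 the statistic only narrowly exceeds the critical value, so the conclusion there is sensitive to rounding and to the chosen level of significance.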
8.7.2 Difference in Population Means when the Variances are unknown but are assumed equal
• The following test procedure is particularly suited for the case when small independent
samples are drawn from normally distributed populations both having the same variance.
• We are interested in testing the null hypothesis $H_0: \mu_1 = \mu_2$.
• When the variances are known, we used the statistic
$$\frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}}$$
• But we are given that the variances are equal. So suppose $\sigma_1^2 = \sigma_2^2$ and let $\sigma^2$ represent the
common value. The above test statistic then reduces to
$$\frac{\bar{X} - \bar{Y}}{\sigma\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}}$$
• Since $\sigma$ is not known, we shall use its pooled estimator $S_p$, where
$$S_p^2 = \frac{(m - 1)S_1^2 + (n - 1)S_2^2}{m + n - 2}$$
• Therefore, the test statistic appropriate for carrying out the test of $H_0$ is
$$t = \frac{\bar{X} - \bar{Y}}{S_p\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}}$$
The test procedures for the various forms of the alternative hypothesis are given in the table below.
Alternative hypothesis     Reject $H_0$ if the computed value of t is
$\mu_1 > \mu_2$            Greater than $t_{m+n-2,\alpha}$
$\mu_1 < \mu_2$            Less than $-t_{m+n-2,\alpha}$
$\mu_1 \neq \mu_2$         Less than $-t_{m+n-2,\alpha/2}$ or greater than $t_{m+n-2,\alpha/2}$
Example 3
A nitrogen fertilizer was used on 10 plots and the mean yield per plot was found to be $\bar{x} = 82.5$ kg,
with an estimate $S_1$ of the population standard deviation of yield per plot equal to 10 kg. On the
other hand, 15 plots treated with phosphate fertilizer gave a mean yield $\bar{y} = 90.5$ kg per plot with
an estimate $S_2$ of the standard deviation of yield per plot equal to 20 kg. At the 5% level of
significance are the two fertilizers significantly different?
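The pooled procedure can be sketched for the fertilizer data as follows. The helper name `pooled_t` is hypothetical, and the critical value is the usual two-tailed 5% entry for 23 d.f. from a standard t table, not a value computed here.

```python
from math import sqrt


def pooled_t(mean_x, mean_y, s1, s2, m, n):
    """Pooled-variance t statistic and its degrees of freedom."""
    sp2 = ((m - 1) * s1**2 + (n - 1) * s2**2) / (m + n - 2)   # pooled variance
    t = (mean_x - mean_y) / (sqrt(sp2) * sqrt(1 / m + 1 / n))
    return t, m + n - 2


t_stat, df = pooled_t(82.5, 90.5, 10, 20, 10, 15)
T_CRIT = 2.069   # t_{23, 0.025}, read from a standard t table

print(f"t = {t_stat:.3f} with {df} d.f.; reject H0: {abs(t_stat) > T_CRIT}")
```

With these figures the computed t lies inside the acceptance region, so the data do not show a significant difference at the 5% level, provided the equal-variance assumption is accepted.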
LESSON NINE: CHI-SQUARE TESTS

9.0 Introduction
 This lesson covers the tests of goodness of fit, tests of independence and tests of
homogeneity
9.2 Test of Goodness of Fit

 While discussing tests of hypothesis about a population proportion, the items that were
inspected were classified into one of two categories: for instance, a coin could land heads or
tails, a person could be a smoker or a non smoker, an item could be defective or non
defective, and so on.
 If n items are picked independently from such a population, this leads to the binomial
distribution.
 A generalization of this is when the population can be broken into more than two mutually
exclusive categories. For example, a coin could land heads, tails or on edge; when a die is
rolled it could land showing up one of the six faces; a person might be a Democrat, a
Republican, or an independent; a person might be an A, B, O or AB blood type, and so on.
 If n independent observations are made from such a population, we get a generalized concept
of the binomial distribution called the Multinomial distribution.
 With our background of the last section, we are equipped to test the following null hypothesis
Ho: The Proportion of Democrats in the U.S is 0.60 (implying the proportion of non-
Democrats is 0.40)
 In this section we consider how to test a null hypothesis of the following type.
Ho: In the U.S, the proportion of Democrats is 0.55, the proportion of Republicans is 0.35,
and the proportion of independents is 0.10.
 To test the above hypothesis, suppose we interview 1000 people picked at random. On the
basis of the stipulated null hypothesis, we would expect 550 Democrats, 350 Republicans
and 100 independents.
 If we actually observe 568 Democrats, 342 Republicans and 90 independents in this sample,
we might be quite willing to go along with the null hypothesis.
 On the other hand, if the sample yields 460 Democrats, 400 Republicans and 140
independents, we would be reluctant to accept Ho.
 Thus in the final analysis, the statistical test will have to be based on how good a fit or
closeness there is between the observed numbers and the numbers that one would expect
from the hypothesized distribution.
 Tests of this type which determine whether the sample data are in conformity with the
hypothesized distribution are called tests of goodness of fit, since they literally test how
good the fit is.
 The test criterion is provided by a statistic X whose value for any sample is given as a
number $\chi^2$ defined, in the case of a die with six faces, by
$$\chi^2 = \sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ represents the observed frequency of the face marked i on the die and $E_i$ the
corresponding expected frequency obtained by assuming that the null hypothesis is true.
Example:
It is believed that the proportions of people with A, B ,O and AB blood types in the population
are, respectively, 0.4, 0.2, 0.3 and 0.1. When 400 randomly picked people were examined, the
observed numbers of each type were 148, 96, 106 and 50.
At the 5% level of significance, test the hypothesis that these data bear out the stated belief.
Summary:
1. The population is divided into k categories (classes) $C_1, C_2, \dots, C_k$.
2. The null hypothesis stipulates that the probability that an individual belongs to category $C_1$ is
$P_1$, that it belongs to category $C_2$ is $P_2$, and so on.
3. To test this hypothesis, a random sample of n individuals is picked. The observed
frequencies of the categories are recorded as $O_1, O_2, \dots, O_k$.
4. If the null hypothesis is true, then the expected frequencies $E_1, E_2, \dots, E_k$ are obtained as
follows:
$$E_1 = nP_1,\quad E_2 = nP_2,\quad \dots,\quad E_k = nP_k$$
5. The departure of the observed frequencies from those expected is measured by means of a
statistic X whose value $\chi^2$ is given by
$$\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \dots + \frac{(O_k - E_k)^2}{E_k}$$
6. If none of the expected frequencies is less than 5, the distribution of X can be approximated
very closely by a chi-square distribution. Since there are k categories, the number of d.f
associated with the chi-square is k – 1.
7. The critical region for a given level of significance will therefore consist of the right tail of
the chi-square distribution with k – 1 d.f.
The decision rule is:
Reject Ho if the computed $\chi^2$ value is greater than the table value $\chi^2_{k-1,\alpha}$.
Note:
The distribution of the statistic X employed here is only approximately chi-square. It should not
be used if one or more of the expected frequencies is less than 5.
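The summary steps can be sketched for the blood-type example given earlier in this section. The function name `chi_square_gof` is an assumption of this sketch, and the critical value is the standard table entry for 3 d.f. at the 5% level.

```python
def chi_square_gof(observed, probabilities):
    """Chi-square goodness-of-fit statistic and degrees of freedom."""
    n = sum(observed)
    expected = [n * p for p in probabilities]          # E_i = n * P_i
    if min(expected) < 5:
        raise ValueError("an expected frequency is below 5")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1


# Blood types A, B, O, AB: observed counts and hypothesized proportions
chi2, df = chi_square_gof([148, 96, 106, 50], [0.4, 0.2, 0.3, 0.1])
CHI2_CRIT = 7.815   # chi-square table value for 3 d.f. at the 5% level

print(f"chi-square = {chi2:.3f} with {df} d.f.; reject Ho: {chi2 > CHI2_CRIT}")
```

Here the computed value exceeds the table value, so on these figures the stated belief about the proportions would be rejected at the 5% level.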
9.3 Test of Independence

 In the previous section, we have observed only one characteristic on any individual e.g. in
classifying an individual as A, B, O or AB blood type, we observed the characteristic “blood
type”.
 Here we are interested in observing more than one variable on each individual and finding if
there exists a relationship between these variables. For example: for each person we might
observe both blood type and eye color and investigate if these characteristics are related in
any way.
 In short, our goal is to test whether two attributes observed on members of a population are
independent.
 As a first step, we pick a sample of size n and classify the data in a two way table on the
basis of the two variables. Such a table is called a contingency table, since it alludes to
whether the distribution according to one variable is contingent on the distribution of the
other. If there are r rows and c columns, it is referred to as an “r by c” contingency table.
• Under the hypothesis of independence, the expected frequency E of each cell is (row total × column total)/n.
• The test statistic is given by
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
with (r-1)(c-1) d.f. The decision rule for an $\alpha$ level of significance is: Reject Ho if the
computed $\chi^2$ value is greater than the table value $\chi^2_{(r-1)(c-1),\alpha}$.
Example:
In a certain community, 360 randomly picked people were classified according to their age group
and political leaning. The data is presented below:
Political                   Age group
leaning         20-35    36-50    Over 50    Total
Conservative      10       40       10         60
Moderate          80       85       45        210
Liberal           30       25       35         90
Total            120      150       90        360
Test the hypothesis that a person’s age and political leaning are not related. Use $\alpha$ = 0.05.
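A minimal sketch of the computation for this table is given below. It assumes the expected frequency of each cell is (row total × column total)/n, uses a hypothetical helper name `chi_square_independence`, and takes the critical value from a standard chi-square table.

```python
def chi_square_independence(table):
    """Chi-square statistic and d.f. for an r-by-c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # E under independence
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: Conservative, Moderate, Liberal; columns: 20-35, 36-50, over 50
leaning_by_age = [[10, 40, 10], [80, 85, 45], [30, 25, 35]]
chi2, df = chi_square_independence(leaning_by_age)
CHI2_CRIT = 9.488   # chi-square table value for 4 d.f. at the 5% level

print(f"chi-square = {chi2:.3f} with {df} d.f.; reject Ho: {chi2 > CHI2_CRIT}")
```

The same computation applies to a test of homogeneity, since the statistic and the degrees of freedom take the same form.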
9.4 Test of Homogeneity

 Sometimes, one might want to compare the proportions of a characteristic in more than two
populations. For instance one might want to compare the proportions of Democrats in four
states such as New York, California, Indiana, and Florida.
• Also, if one considered three states, say New York, California and Indiana, we might want to
test whether in these three states, the proportions of Republicans are the same, whether the
proportions of Democrats are the same and whether the proportions of independents are the
same. In short, what we are interested in is whether the three states are homogeneous with
respect to the party affiliations of their residents. Tests that deal with problems of this type
are called tests of homogeneity.
• Once again, the measure of departure from homogeneity is provided by a statistic X whose
value for any sample is given by
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
 The distribution of the statistic is approximately chi-square with (r-1) (c-1) d.f, where r
represents the number of rows and c the number of columns. The approximation is
satisfactory if none of the expected frequencies is less than 5.
Example:
In order to investigate whether the distribution of the blood types in Europe is the same as in the
U.S, information was collected on 200 randomly picked people in Europe and 300 people in the
U.S. From the data provided below, is it true that the distributions of blood types in Europe and
the U.S are significantly different?
                   Location
Blood type    Europe    U.S    Total
A               95      125     220
B               50       70     120
O               45       90     135
AB              10       15      25
Total          200      300     500
LESSON TEN: ANALYSIS OF VARIANCE
10.1 Introduction
 Analysis of variance (ANOVA) is a technique used to test for the significance of

the difference between more than two sample means and to make inferences about
whether the samples are drawn from populations having the same mean.
 The ‘analysis of variance’ procedure or ‘F test’ is used in such problems, to test for
the significance of the difference among more than two sample means.
10.2 Assumptions of Analysis of Variance

The analysis of variance technique is based on the following assumptions:
1) Each sample is drawn from a normal population and the sample statistics tend to reflect
the characteristics of the population.
2) The populations from which the samples are drawn have identical means and variances
i.e.
$$\mu_1 = \mu_2 = \mu_3 = \dots = \mu_n$$
$$\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \dots = \sigma_n^2$$
In case we are not able to make these assumptions in a particular problem, the analysis of
variance technique should not be used. In such cases, we should consider using a “non-
parametric (distribution-free) technique”.
10.3 Computation of Analysis of Variance

 The null hypothesis taken while applying analysis of variance technique is that the means of
different samples do not differ significantly.
 The procedure followed in the analysis of variance would be explained separately for
1) One-way classification
2) Two-way classification
 However, irrespective of the type of classification, the analysis of variance is a technique of

partitioning the total sum of squared deviations of all sample values from the grand mean
into two parts: the sum of squares between the samples and the sum of squares
within the samples.
 Individual observations in the same treatment samples, however, can differ from each other
only because of chance variation, since each individual within the group receives exactly the
same treatment.
10.4 One – Way Classification

 The term ‘one-factor analysis of variance’ refers to the fact that a single variable or factor of
interest is controlled and its effect on the elementary units is observed.
 In other words, in one-way classification, the data are classified according to only one
criterion.
 Suppose we have k independent random samples of n1 , n2 , ..., nk observations from k

populations.
• The population means are denoted by $\mu_1, \mu_2, \mu_3, \dots, \mu_k$.
• The one-way analysis of variance is designed to test the null hypothesis:
$$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$$
i.e. the arithmetic means of the populations from which the k samples are randomly drawn
are equal to one another.
 The steps involved in carrying out the analysis are:
 Calculate the variance between the samples:
 The variance (sum of squares) between samples reflects the contribution of both different
treatments and chance to inter-sample variability.
 Sum of squares is a measure of variability. The sum of squares between samples is denoted
by SSB.
 For calculating variance between samples, we take the total of the squares of the variations of
the means of various samples from the grand mean and divide this total by the degrees of
freedom.
 Thus the steps in calculating variance between samples will be:
i) Calculate the mean of each sample i.e. $\bar{X}_1, \bar{X}_2, \dots, \bar{X}_k$.
ii) Calculate the grand mean $\bar{\bar{X}}$. Its value is obtained as
$$\bar{\bar{X}} = \frac{\sum X_1 + \sum X_2 + \dots + \sum X_k}{n_1 + n_2 + \dots + n_k}$$
where $\sum X_j$ is the total of the observations in the j-th sample.
iii) Take the difference between the mean of each sample and the grand mean.
iv) Square these deviations, multiply each by the size of its sample, and obtain the total,
which gives the sum of squares between the samples; and
v) Divide the total obtained in step (iv) by the degrees of freedom.
The degrees of freedom will be one less than the number of samples i.e. if there are 4 samples,
then the degrees of freedom will be 4 – 1 = 3. In general v = k – 1 where k = number of
samples.
 Calculating the variance within the samples:
• The variance (sum of squares) within samples measures those intra-sample differences
that arise due to chance only.
 It is denoted by SSW. For calculating the variance within the samples we take the total
of the sum of squares of the deviation of various items from the mean values of the
respective samples and divide this total by the degrees of freedom
 Thus the steps in calculating variance within the samples will be:
i) Calculate the mean of each sample i.e. $\bar{X}_1, \bar{X}_2, \dots, \bar{X}_k$.
ii) Take the deviations of the various observations in a sample from the mean value of
the respective sample.
iii) Square these deviations and obtain the total which gives the sum of squares within
the samples.
iv) Divide the total obtained in step (iii) by the degrees of freedom. The d.f is obtained by
deducting the number of samples from the total number of observations,
i.e. v = n – k, where n refers to the total number of all the observations and k to the number of samples.
 Calculate the F-Ratio
 Calculate the F – ratio as follows
$$F = \frac{\text{Variance between the samples}}{\text{Variance within the samples}}, \quad \text{i.e.} \quad F = \frac{S_1^2}{S_2^2}$$
• F is always computed with the variance between the sample means as the numerator and
the variance within the samples as the denominator.
• The denominator is computed by combining the variances within the k samples into a single
measure.
 Compare the computed value of F
 Compare the calculated value of F with the table value of F for the given d.f at a certain
critical level (generally we take 5% level of significance).
 If the calculated value of F is greater than the table value of F, it indicates that the difference
in sample means is significant,
i.e. it could not have arisen due to fluctuations of random sampling or, in other words, the
samples do not come from the same population.
 On the other hand, if the calculated value of F is less than the table value, the difference is
not significant and hence could have arisen due to fluctuations of random sampling.
Example
As head of a department of a consumers’ research organization, you have the responsibility for
testing and comparing lifetimes of four brands of electric bulbs. Suppose you test the lifetime of
three electric bulbs of each of the four brands.
The data is shown below, each entry representing the lifetime of an electric bulb, measured in
hundreds of hours.
        Brand
A     B     C     D
20    25    24    23
19    23    20    20
21    21    22    20
Can we infer that the mean lifetimes of the four brands of electric bulbs are equal?
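The steps of section 10.4 can be sketched for the bulb data as follows. This is an illustration, not a definitive solution: `one_way_anova` is a hypothetical helper name, the between-samples sum of squares weights each squared deviation by its sample size, and the critical value is the standard F table entry for (3, 8) d.f. at the 5% level.

```python
def one_way_anova(samples):
    """Return (SSB, SSW, F) for a list of samples."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]
    # Between samples: squared deviations of sample means, weighted by sample size
    ssb = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within samples: squared deviations of observations from their own sample mean
    ssw = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    f_ratio = (ssb / (k - 1)) / (ssw / (n - k))
    return ssb, ssw, f_ratio


bulbs = [[20, 19, 21], [25, 23, 21], [24, 20, 22], [23, 20, 20]]   # brands A to D
ssb, ssw, f_ratio = one_way_anova(bulbs)
F_CRIT = 4.07   # F table value for (3, 8) d.f. at the 5% level

print(f"SSB = {ssb}, SSW = {ssw}, F = {f_ratio:.3f}, reject H0: {f_ratio > F_CRIT}")
```

On these figures the computed F is below the table value, so the differences among the four sample means could have arisen from sampling fluctuations.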
10.5 Analysis of Variance Table

Since there are several steps involved in the computation of both the between and within sample
variances, the entire set of results may be organized into an analysis of variance (ANOVA) table.
This table is summarized as shown below:
Source of          Sum of     Degrees of    Mean squares            Variance
Variation          Squares    Freedom       MS                      Ratio, F
Between Samples    SSB        k – 1         MSB = SSB/(k – 1)       F = MSB/MSW
Within Samples     SSW        n – k         MSW = SSW/(n – k)
Total              SST        n – 1
To use the ANOVA table, it is convenient to use the following short-cut computational formulas:
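Between samples sum of squares:
$$SSB = \sum_{j=1}^{k} \frac{T_j^2}{n_j} - \frac{T^2}{N}$$
Within samples sum of squares:
$$SSW = \sum_{j=1}^{k}\sum_{i=1}^{n_j} X_{ij}^2 - \sum_{j=1}^{k} \frac{T_j^2}{n_j}$$
Total sum of squares:
$$SST = \sum_{j=1}^{k}\sum_{i=1}^{n_j} X_{ij}^2 - \frac{T^2}{N}$$
Here $T_j$ is the total of the $n_j$ observations in sample j, T is the grand total of all observations
and N is the total number of observations (written n in the table above).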
The format for the ANOVA table using the computational formulas is shown below:
Source of Variation    Sum of Squares                                                                           D.F      Mean squares (MS)      Variance Ratio, F
Between Samples        $SSB = \sum_{j=1}^{k} \frac{T_j^2}{n_j} - \frac{T^2}{N}$                                 k – 1    MSB = SSB/(k – 1)      F = MSB/MSW
Within Samples         $SSW = \sum_{j=1}^{k}\sum_{i=1}^{n_j} X_{ij}^2 - \sum_{j=1}^{k} \frac{T_j^2}{n_j}$       n – k    MSW = SSW/(n – k)
Total                  $SST = \sum_{j=1}^{k}\sum_{i=1}^{n_j} X_{ij}^2 - \frac{T^2}{N}$                          n – 1
Example
Consider the above example.
In order to use the computational formulas the following four quantities must be computed:
$T_j$, $\sum_{j=1}^{k}\sum_{i=1}^{n_j} X_{ij}^2$, $\sum_{j=1}^{k} \frac{T_j^2}{n_j}$, and $\frac{T^2}{N}$.
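For the bulb data, these four quantities and the resulting ANOVA table can be sketched as below. The variable names are choices made for this sketch, and the printed layout is only one possible rendering of the table.

```python
bulbs = {"A": [20, 19, 21], "B": [25, 23, 21], "C": [24, 20, 22], "D": [23, 20, 20]}

k = len(bulbs)                                                  # number of samples
N = sum(len(xs) for xs in bulbs.values())                       # total observations
totals = {brand: sum(xs) for brand, xs in bulbs.items()}        # T_j
sum_sq = sum(x * x for xs in bulbs.values() for x in xs)        # sum of X_ij^2
sum_tj = sum(totals[b] ** 2 / len(bulbs[b]) for b in bulbs)     # sum of T_j^2 / n_j
correction = sum(totals.values()) ** 2 / N                      # T^2 / N

ssb = sum_tj - correction
ssw = sum_sq - sum_tj
sst = sum_sq - correction
msb, msw = ssb / (k - 1), ssw / (N - k)

print(f"{'Source':<10}{'SS':>6}{'d.f.':>6}{'MS':>7}{'F':>7}")
print(f"{'Between':<10}{ssb:>6.1f}{k - 1:>6}{msb:>7.2f}{msb / msw:>7.2f}")
print(f"{'Within':<10}{ssw:>6.1f}{N - k:>6}{msw:>7.2f}")
print(f"{'Total':<10}{sst:>6.1f}{N - 1:>6}")
```

The between and within sums of squares add up to the total sum of squares, which gives a quick check on the arithmetic.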
LESSON ELEVEN: REGRESSION AND CORRELATION ANALYSIS
11.1 Introduction
 Correlation analysis is a statistical tool used to ascertain the association between two
variables while regression analysis is used to determine the nature and extent of relationship
between variables. This lesson explains the methods used in studying correlation and
regression.
11.2 Correlation Analysis

 If two quantities vary in such a way that movements in one are accompanied by movements
in the other, these quantities are said to be correlated. Thus correlation is the existence of
some definite relationship between two or more variables.
 E.G: There exists some relationship between family income and expenditure on luxury
items, price of a commodity and amount demanded, etc.
 Correlation analysis helps in determining the degree of relationship between two or more
variables – it does not tell us anything about cause-effect relationship.
11.3 Types of Correlation

Correlation may be classified in the following ways:-
(a) Positive and negative correlation.
 Whether correlation is positive (direct) or negative (inverse) would depend upon the
direction of change of the variable.
 If both the variables are varying in the same direction, i.e. if one variable is increasing the
other is also increasing or, if one variable is decreasing the other is also decreasing,
correlation is said to be positive.
 If, on the other hand, the variables are varying in opposite directions, i.e. as one variable is
increasing the other is decreasing and vice versa, correlation is said to be negative.
(b) Simple, partial and multiple correlation

 The distinction between simple, partial and multiple correlation is based upon the number of
variables studied.
 When only two variables are studied, it is a problem of simple correlation.
 When three or more variables are studied it is a problem of either multiple or partial
correlation.
 In multiple correlation three or more variables are studied simultaneously. In partial
correlation, there are more than two variables but only two variables that are influencing each
other are considered, the effect of other influencing variables being kept constant.
(c) Linear and Non-Linear correlation

 The distinction between linear and non-linear correlation is based upon the constancy of the
ratio of change.
 If the amount of change in one variable tends to bear a constant ratio to the amount of change
in the other variable, correlation is said to be linear.
 Correlation would be called non-linear or curvilinear if the amount of change in one variable
does not bear a constant ratio to the amount of change in the other variable.
11.4 Methods of Studying Correlation

1. Scatter diagram
2. Karl Pearson’s coefficient of correlation
3. Spearman’s rank correlation coefficient
Scatter Diagram
It helps to illustrate diagrammatically any relationships that may exist between two variables.
The following diagrams indicate various degrees of correlation.
Diagram to be drawn
Examples
1. Draw a scatter diagram from the following data
Supply (x) 4 5 8 9 10 12 15
Demand (y) 3 4 6 5 7 8 11
11.5 Coefficient of Correlation

Coefficient of correlation, denoted by r, is a unit free measure of the degree of linear relationship
between two or more variables. The square of the correlation coefficient, i.e. $r^2$, is called the
coefficient of determination. It measures the amount of variation in one variable that can be
accounted for in terms of variation in the other(s). For instance if r = 0.90 then $r^2$ = 0.81, which
implies that 81% of the variation in one variable can be attributed to variation in the other.
11.5.1 Karl Pearson’s coefficient of correlation (Product moment coefficient of correlation)

The coefficient of correlation (r) is a measure of strength of the linear relationship between two
variables. It is also referred to as the sample coefficient of correlation and is given by
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
Example
The following data refers to exam marks vs hours of study for a sample of 8 candidates that sat a
statistics exam
Exam mark (Y) 64 61 84 70 88 92 72 71

Hours of study (X) 20 16 34 23 27 32 18 22
a) Calculate the Pearson’s product moment coefficient of correlation
b) Calculate the coefficient of determination and give a comment about the correlation
between exam marks and hours of study.
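Parts (a) and (b) can be worked through with a short sketch that follows the raw-sums formula for r. The function name `pearson_r` is an assumption of this sketch.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product moment coefficient of correlation from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator


hours = [20, 16, 34, 23, 27, 32, 18, 22]   # X
marks = [64, 61, 84, 70, 88, 92, 72, 71]   # Y
r = pearson_r(hours, marks)

print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```

The result points to a strong positive linear association, with roughly three quarters of the variation in marks accounted for by variation in hours of study.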
Interpretation of the coefficient of correlation
1. When r = +1, there is a perfect positive correlation between the variables
2. When r = -1, there is a perfect negative correlation between the variables
3. When r = 0, there is no correlation between the variables
4. The closer r is to +1 or to –1, the closer the relationship between the variables and the closer r
is to 0, the less close the relationship.
Advantage
 It summarizes in one figure the degree of correlation and whether it is positive or negative.
Limitations
 It assumes linear relationship regardless of the fact whether that assumption is true or not.
 The coefficient can be misinterpreted.
 The value of the coefficient is unduly affected by the extreme values.
 It is time consuming.
11.5.2 Spearman’s Rank Correlation Coefficient

 This is a measure of the degree of linear relationship between variables which are given in
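terms of their ranks (positions) in the series.
• Spearman’s rank coefficient is denoted by r and is given by the formula
$$r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6\sum d^2}{n^3 - n}$$
where d is the difference between the two ranks of an item and n is the number of paired items.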
In rank correlation, there are two types of problems:-
i. Where actual ranks are given
ii. Where actual ranks are not given
Where actual ranks are given
Steps:
 Take the differences of the two ranks i.e. (R1-R2) and denote these differences by d.
 Square these differences and obtain the total
• Use the formula
$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
Example
Two managers are asked to rank a group of employees in order of potential to eventually become
top managers. The rankings are as follows:
Employee    Ranking by manager I    Ranking by manager II
A           10                      9
B           2                       4
C           1                       2
D           4                       3
E           3                       1
F           6                       5
G           5                       6
H           8                       8
I           7                       7
J           9                       10
Calculate the coefficient of rank correlation and comment on the value.
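The steps for the case where ranks are given can be sketched as follows. The function name `spearman_from_ranks` is hypothetical, and the sketch assumes there are no tied ranks.

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """Rank correlation r = 1 - 6*sum(d^2) / (n(n^2 - 1)) for untied ranks."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n**2 - 1))


# Employees A to J, in the order listed above
manager_1 = [10, 2, 1, 4, 3, 6, 5, 8, 7, 9]
manager_2 = [9, 4, 2, 3, 1, 5, 6, 8, 7, 10]
r = spearman_from_ranks(manager_1, manager_2)

print(f"rank correlation = {r:.3f}")
```

A value this close to +1 suggests the two managers largely agree in their rankings.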
Where ranks are not given

Ranks can be assigned by taking either the highest value as 1 or the lowest value as 1. The same
method should be followed in case of all the variables.
Example
Calculate the rank correlation Coefficient for the following data of marks of 2 tests given to
candidates for a clerical job
Preliminary Test 92 89 87 86 83 77 71 63 53 50
Final test 86 83 91 77 68 85 52 82 37 57
EQUAL RANKS OR TIE IN RANKS

 Where two or more individuals are to be ranked equal, the rank assigned for purposes of
calculating the coefficient of correlation is the average of the ranks which these individuals
would have got had they differed slightly from each other.
 Where equal ranks are assigned to some entries, an adjustment in the formula for
calculating the Rank coefficient of correlation is made.
• The adjustment consists of adding $\frac{1}{12}(m^3 - m)$ to the value of $\sum d^2$, where m stands for the number of items
whose ranks are common.
• The formula can thus be written as
$$r = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}\left(m_1^3 - m_1\right) + \frac{1}{12}\left(m_2^3 - m_2\right) + \dots\right]}{n\left(n^2 - 1\right)}$$
Example
An examination of eight applicants for a clerical post was taken by a firm. From the marks
obtained by the applicants in the accounting and statistics papers, compute the Rank coefficient
of correlation.
Applicant A B C D E F G H
Marks in accounting 15 20 28 12 40 60 20 80
Marks in statistics 40 30 50 30 20 10 30 60
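A sketch of the tie adjustment for this example is given below. It assumes the highest mark is ranked 1 and that tied marks share the average of the ranks they would otherwise occupy; the helper names `average_ranks` and `spearman_with_ties` are hypothetical. The same function is also applied to the two clerical tests from the earlier example, where no ties occur and the adjustment is zero.

```python
from collections import Counter


def average_ranks(values):
    """Rank values with the highest as 1; tied values share the average rank."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def spearman_with_ties(xs, ys):
    """Rank correlation with the (m^3 - m)/12 adjustment for each tied group."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjustment = sum(
        (m**3 - m) / 12
        for series in (xs, ys)
        for m in Counter(series).values()
        if m > 1
    )
    return 1 - 6 * (sum_d2 + adjustment) / (n * (n**2 - 1))


accounting = [15, 20, 28, 12, 40, 60, 20, 80]
statistics_marks = [40, 30, 50, 30, 20, 10, 30, 60]
r_tied = spearman_with_ties(accounting, statistics_marks)

# No ties here, so the adjustment term is zero
preliminary = [92, 89, 87, 86, 83, 77, 71, 63, 53, 50]
final = [86, 83, 91, 77, 68, 85, 52, 82, 37, 57]
r_untied = spearman_with_ties(preliminary, final)

print(f"tied example: r = {r_tied:.3f}; clerical tests: r = {r_untied:.3f}")
```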
Merits of the Rank method
 It is simpler to understand and easier to apply compared to the Karl Pearson’s method.
 Where the data are of qualitative nature like honesty, efficiency, intelligence etc, the method
can be used with great advantage.
 It is the only method that can be used where we are given the ranks and not the actual values.
Limitations
 The method cannot be used for finding out correlation in a grouped frequency distribution.
 Where the number of observations exceeds 30, the calculations become quite tedious and
require a lot of time.
11.6 Test of Hypothesis Regarding Population Correlation Coefficient

 The parameter that provides a measure of association between two variables in the
population analogous to the way r does in the sample is called the population correlation
coefficient and is denoted by the Greek letter $\rho$ (rho).
• Suppose we obtain a certain value of r from a given set of data. What is it suggesting about $\rho$?
We shall consider only the simple case where the null hypothesis in which we are interested
is $H_0: \rho = 0$, meaning that there is no relationship between the two variables in the
population.
• The test statistic to carry out the test is
$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$
• If $H_0$ is true, then this statistic has the Student’s t distribution with n-2 degrees of freedom.
Example
Consider the previous example on exam marks vs hours of study where we obtained r = 0.88
and $r^2$ = 0.77 based on a sample with n = 8. Test the hypothesis that the population correlation
coefficient is zero at the 5% level.
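The test can be sketched as below. It assumes n = 8, the number of candidates in the exam-marks data, uses the hypothetical helper name `correlation_t`, and takes the two-tailed 5% critical value for 6 d.f. from a standard t table.

```python
from math import sqrt


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r**2), n - 2


t_stat, df = correlation_t(0.88, 8)
T_CRIT = 2.447   # t_{6, 0.025}, read from a standard t table

print(f"t = {t_stat:.2f} with {df} d.f.; reject H0: {abs(t_stat) > T_CRIT}")
```

Under these assumptions the computed t exceeds the table value, so the hypothesis of zero population correlation would be rejected at the 5% level.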
11.7 Regression Analysis

Regression analysis is the statistical tool which helps to estimate or predict the unknown values
of one variable from known values of another variable.
Types of Regression
Simple linear regression: Involves a relationship between two variables only.
Multiple regression: Analyses or considers the relationship between three or more variables.
In regression analysis, an attempt is made to determine a line (curve) which best fits the given
pairs of data. In case of a linear relationship, a line with the equation Y = a + bX, where a and b are
constants to be determined, is fitted. The constants a and b are determined such that
$$S = \sum (Y - a - bX)^2$$
is a minimum.
With the use of differential calculus, S is minimized for a and b which satisfy the following two
normal equations
$$\sum Y = na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$
Solving for a and b simultaneously yields the formulas
$$\hat{b} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$
$$\hat{a} = \frac{1}{n}\left(\sum Y - \hat{b}\sum X\right) = \bar{Y} - \hat{b}\bar{X}$$
The constant b in the equation is called the regression coefficient of Y on X. It measures the
change in Y associated with a unit change in X. X is called the independent variable, also
known as the regressor or predictor. Y is called the dependent variable, also known as the
regressed or explained variable.
Example
The following data give the observations on weekly income and expenditure on food for five
households.
Weekly Income (£)          240   270   300   330   360
Expenditure on food (£)    200   220   240   245   250
a) Plot the data on a scatter diagram
b) Determine the least squares regression line of expenditure on weekly income.
c) Using the equation in (b), estimate the expenditure on food for someone having a weekly
income of £380.
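Parts (b) and (c) can be sketched with the least squares formulas for a and b. This is an illustration under a stated assumption: the income values are taken as 240, 270, 300, 330 and 360, on the assumption that the series is evenly spaced. The function name `least_squares` is hypothetical.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line Y = a + bX."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Assumption: the fourth income is taken as 330, in line with the even spacing
income = [240, 270, 300, 330, 360]
food = [200, 220, 240, 245, 250]
a, b = least_squares(income, food)
estimate = a + b * 380

print(f"Y = {a:.2f} + {b:.4f} X; estimated expenditure at 380: {estimate:.2f}")
```

Note that an income of £380 lies outside the observed range, so the estimate is an extrapolation and should be read with some caution.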
11.8 Activities
1. For the following results showing marks obtained by 15 students, calculate the Rank
correlation
Marks in Maths      50 50 40 39 38 37 36 35 34 33 32 31 30 29 28
Marks in English    50 49 51 52 43 47 42 40 44 40 30 41 32 33 31
2. The following data give the aptitude test scores and productivity indices of 10 workers
selected at random.
Aptitude scores (X)       60 62 65 70 72 48 53 73 65 82
Productivity index (Y)    68 60 62 80 85 40 52 62 60 81
i) Determine the regression equation of Y on X.
ii) Estimate the productivity index of a worker whose test score is 92

iii) Compute the coefficient of correlation and coefficient of determination and interpret their
values.
iv) Test the hypothesis that the population correlation coefficient is zero at the 5% level.
LINEAR PROGRAMMING
This is a mathematical technique that deals with the optimization of a linear
function of variables known as the objective function, subject to a set of linear
inequalities known as constraints. The objective function may be profit,
revenue, contribution or cost. The constraints may be imposed by different
resources such as labour, finance, materials, machines, market, technology
etc. By linearity is meant a mathematical expression in which all relationships
among the variables are linear (when plotted, they give a straight line).
A linear programming problem has two basic parts:
• The objective function, which describes the primary purpose of the
formulation - to maximize some return (profit) or to minimize some
cost.
• The constraint set, which is a system of inequalities under which
optimization is to be accomplished.
Assumptions of linear programming:
a) Linearity - costs, revenues or any physical properties which form the

basis of the problem vary in direct proportion (linearly) with the
quantities or number of components produced.
b) Divisibility - quantities, revenues and costs are infinitely divisible i.e.
any fraction or decimal answer is valid.
c) Certainty – the technique makes no allowance for uncertainty in the
estimate made, although the evaluation of dual values indicates the
sensitivity of the solution to marginal uncertainty in constraint values.
d) Positive solutions – non-negativity constraints are introduced to
ensure only positive values are considered.
e) Interdependence between demand for products is ignored;
products may be complementary or a substitute for one another.
f) Time factors are ignored. All production is assumed to be
instantaneous
SOME APPLICATIONS OF DYNAMIC PROGRAMMING
a) Production and distribution problems

b) Scheduling inventory control
c) Resource allocation
d) Replacement and maintenance problems
ADVANTAGES OF LP
1. In certain types of problems such as inventory control management,

Chemical Engineering design, dynamic programming may be the only
technique that can solve the problems.
2. It helps in attaining the optimum use of productive factors. Linear
programming indicates how a manager can utilize his productive factors
most effectively by a better selection and distribution of these elements.
E.g. more efficient use of manpower and machines can be obtained by use
of linear programming.
3. Most problems requiring multistage, multi period or sequential decision
process are solved using this type of programming.
4. Because of its wide range, it is applicable to linear or non-linear problems,
discrete or continuous variables, deterministic or stochastic problems.
5. The mathematical techniques used can be adapted to the computer.
6. Better and more successful decisions
LIMITATIONS OF L.P
1. Each problem has to be modelled according to its own constraints and
requirements. This requires great experience and ingenuity.
2. The number of state variables has to be kept low to prevent complicated
calculations.
3. It treats all relationships as linear. I.e. if direct cost of producing 10 units
is sh. 100 then on 20 units it is assumed to be sh. 200. This may not
always be the case in practice.
4. All the parameters in the linear programming model are assumed to be
known with certainty, which is rarely possible in real situations.
METHODS OF SOLVING LINEAR PROGRAMMING PROBLEMS:
The two methods used to solve linear programming problems are:
a) Graphical methods
b) Simplex method
Whichever method is adopted, the first step is to formulate the linear
programming problem using the following steps:
• Identify the decision variables to be determined and express them in
terms of algebraic symbols.
• Identify all the limitations or constraints in the given problem and then
express them as linear inequalities.
• Identify the objective/criterion which is to be optimized (maximized or
minimized) and express it as a linear function of the defined decision
variables.
Example 1:
A manufacturer has two products P1 and P2 both of which are produced in two
steps by machines M1 and M2. The process times (in hours) per hundred units
for the products on the machines are:
                   M1     M2     Contribution (per 100 units)
P1                 4      5      10
P2                 5      2      5
Available hours    100    80
The manufacturer is in a market upswing and can sell as much as he can
produce of both the products. Formulate the mathematical model and
determine the optimal product mix.
Solutions:
Using the graphical method
Formulate the linear programme:
Let product P1 be represented by x1 and P2 by x2
Objective function: Maximize Z = 10x1 + 5x2
Subject to: 4x1 + 5x2 ≤ 100 (M1 constraint)
5x1 + 2x2 ≤ 80 (M2 constraint)
And x1, x2 ≥ 0 (non-negativity condition)
Solving using the graphical method:
• Determine the coordinates:
M1 constraint: 4x1 + 5x2 = 100
When x1 = 0; x2 = 100/5 = 20 (0, 20)
When x2 = 0; x1 = 100/4 = 25 (25, 0)
M2 constraint: 5x1 + 2x2 = 80
When x1 = 0; x2 = 80/2 = 40 (0, 40)
When x2 = 0; x1 = 80/5 = 16 (16, 0)
• Plotting the graph:
[Graph: the M1 and M2 constraint lines plotted with x1 on the horizontal axis
and x2 on the vertical axis. The feasible region is the area in the first
quadrant that lies on or below both lines.]
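• Considering the corner points of the feasible region, their coordinates and
testing using the objective function. The corner where the two constraint
lines cross is found by solving 4x1 + 5x2 = 100 and 5x1 + 2x2 = 80
simultaneously, which gives x1 = 200/17 = 11.76 and x2 = 180/17 = 10.59.
The point (25, 0) is not a corner of the feasible region because it violates
the M2 constraint (5 × 25 = 125 > 80).
Point    Coordinates        Z = 10x1 + 5x2
A        (0, 0)             10(0) + 5(0) = 0
B        (16, 0)            10(16) + 5(0) = 160
C        (11.76, 10.59)     10(11.76) + 5(10.59) = 170.59
D        (0, 20)            10(0) + 5(20) = 100
Thus the product mix should be (in hundreds of units):
Product P1 = 11.76
Product P2 = 10.59
And maximum contribution will be 170.59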
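The corner-point test used in the graphical method can be sketched in code. This is a minimal illustration, not part of the original notes: the names CONSTRAINTS, intersection, is_feasible and best_corner are hypothetical, and the approach only works for two decision variables. It intersects every pair of constraint boundaries, discards points that violate any constraint, and evaluates Z at the remaining corners.

```python
from itertools import combinations

# Each constraint is (a1, a2, b), meaning a1*x1 + a2*x2 <= b.
# The last two rows encode x1 >= 0 and x2 >= 0 as -x1 <= 0 and -x2 <= 0.
CONSTRAINTS = [(4, 5, 100), (5, 2, 80), (-1, 0, 0), (0, -1, 0)]
OBJECTIVE = (10, 5)  # Z = 10*x1 + 5*x2


def intersection(c1, c2):
    """Solve the 2x2 system of two constraint boundaries by Cramer's rule."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None  # parallel boundaries never meet
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)


def is_feasible(point, tol=1e-9):
    """A point is feasible only if it satisfies every constraint."""
    x1, x2 = point
    return all(a * x1 + b * x2 <= rhs + tol for a, b, rhs in CONSTRAINTS)


def best_corner():
    """Return the feasible corner point with the largest objective value."""
    corners = []
    for c1, c2 in combinations(CONSTRAINTS, 2):
        p = intersection(c1, c2)
        if p is not None and is_feasible(p):
            corners.append(p)
    return max(corners, key=lambda p: OBJECTIVE[0] * p[0] + OBJECTIVE[1] * p[1])


x1, x2 = best_corner()
z = OBJECTIVE[0] * x1 + OBJECTIVE[1] * x2
print(round(x1, 2), round(x2, 2), round(z, 2))
```

The feasibility check matters: an intercept of one constraint line is a valid corner only if it also satisfies the other machine constraint.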
2) Using Simplex method
The simplex method is designed to solve any linear programme. It is an
iterative procedure in which the same computational steps are repeated a
number of times until the optimum is reached. In order to develop a general
solution method, the LP problem must be put in a common format, which we
call the standard form.
Step 1: Formulate the LP problem
Let product P1 be represented by x1 and P2 by x2
Objective function: Maximize Z = 10x1 + 5x2
Subject to: 4x1 + 5x2 ≤ 100 (M1 constraint)
5x1 + 2x2 ≤ 80 (M2 constraint)
And x1, x2 ≥ 0 (non-negativity condition)
Step 2: Convert the inequalities in the constraints into equalities.
This can be done by adding the slack variables S1, S2, ….
Z = 10x1 + 5x2
4x1 + 5x2 + S1 = 100
5x1 + 2x2 + S2 = 80
Step 3: Initial Simplex Tableau
Solution    Product variables    Slack variables    Quantity
variable    X1       X2          S1       S2        solution
S1          4        5           1        0         100
S2          5        2           0        1         80
Z           10       5           0        0         0
Step 4: Obtain the Pivot Element
• Identify the biggest number in the Z row (10). This gives the column of
interest (X1).
• Divide each quantity solution by the corresponding element in the
identified column:
100/4 = 25
80/5 = 16
• The smallest of the answers obtained is 16, which identifies the row of
interest (S2).
• The point where the identified column and row meet gives the pivot
element (5)
Step 5: Make the pivot element 1 (by dividing the row containing the
pivot element by the value of the pivot element) and give the row
a new identity (the identity of the identified column). Then redraw
the initial simplex tableau.
Old row: S2 5 2 0 1 80
New row: X1 5/5 2/5 0/5 1/5 80/5
X1 1 0.4 0 0.2 16
Initial Simplex Tableau reproduced
Solution    Product variables    Slack variables    Quantity
variable    X1       X2          S1       S2        solution
S1          4        5           1        0         100
X1          1        0.4         0        0.2       16
Z           10       5           0        0         0
Step 6: Row operations.
These are done to make the other elements in the identified column zero;
the pivot element MUST remain one (1). Each operation involves two rows,
one of which is the row containing the pivot element, i.e.
X1 (1 0.4 0 0.2 16) × 4
Old row: S1 4 5 1 0 100
4 × X1: 4 1.6 0 0.8 64
New row (old row – 4 × X1): S1 0 3.4 1 -0.8 36
X1 (1 0.4 0 0.2 16) × 10
Old row: Z 10 5 0 0 0
10 × X1: 10 4 0 2 160
New row (old row – 10 × X1): Z 0 1 0 -2 -160
Step 7: Second Simplex Tableau
Solution    Product variables    Slack variables    Quantity
variable    X1       X2          S1       S2        solution
S1          0        3.4         1        -0.8      36
X1          1        0.4         0        0.2       16
Z           0        1           0        -2        -160
Since not all the elements in the Z row are negative or zero (the X2 entry
is 1), the optimal solution has not been reached. Go to step 8.
Step 8: Repeat steps 4 to 7.
a) Pivot element
Column identified = X2
Dividing the quantity solutions by the elements in this column:
36/3.4 = 10.59
16/0.4 = 40
The smallest of the answers obtained (10.59) identifies the row (S1).
The element where the identified row and column meet is the pivot element (3.4).
b) Make the pivot element 1 and give the row its new identity.
Old row: S1 0 3.4 1 -0.8 36
New row: X2 0/3.4 3.4/3.4 1/3.4 -0.8/3.4 36/3.4
X2 0 1 0.294 -0.235 10.59
Second Simplex Tableau reproduced
Solution    Product variables    Slack variables      Quantity
variable    X1       X2          S1        S2         solution
X2          0        1           0.294     -0.235     10.59
X1          1        0.4         0         0.2        16
Z           0        1           0         -2         -160
Row operations
X2 (0 1 0.294 -0.235 10.59) × 0.4
Old row: X1 1 0.4 0 0.2 16
0.4 × X2: 0 0.4 0.118 -0.094 4.24
New row: X1 1 0 -0.118 0.294 11.76
Old row: Z 0 1 0 -2 -160
1 × X2: 0 1 0.294 -0.235 10.59
New row: Z 0 0 -0.294 -1.765 -170.59
c) Third simplex tableau
Third Simplex Tableau
Solution    Product variables    Slack variables      Quantity
variable    X1       X2          S1        S2         solution
X2          0        1           0.294     -0.235     10.59
X1          1        0           -0.118    0.294      11.76
Z           0        0           -0.294    -1.765     -170.59
All the elements in the Z row are now negative or zero, so the optimal
solution has been reached. Thus, the product mix should be:
Product P1 = 11.76
Product P2 = 10.59
Maximum contribution of 170.59
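The tableau steps above (choose the column with the largest Z-row entry, take the smallest ratio, scale the pivot row, clear the column) can be sketched as a short routine. This is an illustrative sketch only: the function name simplex_max is hypothetical, it assumes a maximisation problem whose constraints are all of the ≤ type with non-negative right-hand sides, and it uses exact fractions to avoid the rounding that creeps into hand-worked tableaux.

```python
from fractions import Fraction


def simplex_max(c, A, b):
    """Maximise c.x subject to A.x <= b and x >= 0 using tableau pivots."""
    m, n = len(A), len(c)
    # One row per constraint with slack columns appended; quantity is last.
    rows = [[Fraction(v) for v in A[i]]
            + [Fraction(int(i == j)) for j in range(m)]
            + [Fraction(b[i])] for i in range(m)]
    z = [Fraction(v) for v in c] + [Fraction(0)] * (m + 1)
    basis = [n + i for i in range(m)]  # slack variables start in the solution

    while max(z[:-1]) > 0:
        col = z[:-1].index(max(z[:-1]))  # column of interest
        ratios = [(rows[i][-1] / rows[i][col], i)
                  for i in range(m) if rows[i][col] > 0]
        if not ratios:
            raise ValueError("problem is unbounded")
        _, r = min(ratios)  # smallest ratio gives the row of interest
        pivot = rows[r][col]
        rows[r] = [v / pivot for v in rows[r]]  # make the pivot element 1
        for i in range(m):  # row operations on the other constraint rows
            if i != r:
                f = rows[i][col]
                rows[i] = [v - f * p for v, p in zip(rows[i], rows[r])]
        f = z[col]  # row operation on the Z row
        z = [v - f * p for v, p in zip(z, rows[r])]
        basis[r] = col  # the row takes the identity of the pivot column

    x = [Fraction(0)] * n
    for i, var in enumerate(basis):
        if var < n:
            x[var] = rows[i][-1]
    # Optimal value and the Z-row entries under the slack columns (sign flipped).
    return x, -z[-1], [-v for v in z[n:n + m]]


x, z_opt, slack_values = simplex_max([10, 5], [[4, 5], [5, 2]], [100, 80])
print([float(v) for v in x], float(z_opt))
```

For the two-machine example this performs two pivots, first on the X1 column and then on the X2 column, matching the hand calculation.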
DUALITY
Every linear program has an opposite program called the dual program. The
initially formulated programme is called the primal program. The relationship
between the primal and dual programs is that the optimal value of the
objective function is the same and the solution of one can be deduced from
the other.
Procedure for determining the dual program from the primal is:
a) Maximum primal implies minimum dual and vice versa
b) Less or equal to (≤) primal implies greater or equal to (≥) dual and
vice versa.
c) Number of variables in the dual program equals the number of constraints
in the primal and vice versa.
d) The right hand sides of the dual constraint inequalities are the objective
coefficients in the primal program and vice versa.
e) Constraint coefficients in the dual program are the transpose of the
matrix of constraint coefficients in the primal.
f) Non-negativity conditions do not change.
Example 1:
Given primal program:
Max, Z = 4x1 + 2x2 + 5x3
Subject to: x1 + 2x2 - x3 ≤ 20 ……y1
4x1 + 8x2 + 11x3 ≤ 28 ……y2
6x1 + x2 + 8x3 ≤ 32 ……y3
And x1, x2, x3 ≥ 0
Required:
Obtain the dual program
Solution:
Constraints coefficient matrix
| 1   2   -1 |
| 4   8   11 |
| 6   1    8 |
Transposing the above matrix:
|  1    4   6 |
|  2    8   1 |
| -1   11   8 |
Dual program:
Min, Z = 20y1 + 28y2 + 32y3
Subject to: y1 + 4y2 + 6y3 ≥ 4
2y1 + 8y2 + y3 ≥ 2
-y1 + 11y2 + 8y3 ≥ 5
And y1, y2, y3 ≥ 0
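The procedure for writing the dual of Example 1 can be sketched in a few lines. This is only an illustration under the assumption that the primal is a maximisation with ≤ constraints; the function name dual_of_max_primal is hypothetical. The dual objective takes the primal right-hand sides, the dual right-hand sides take the primal objective coefficients, and the constraint matrix is transposed.

```python
def dual_of_max_primal(c, A, b):
    """For max c.x s.t. A.x <= b, x >= 0, return the data of the dual
    min b.y s.t. A^T.y >= c, y >= 0 as (objective, constraint rows, rhs)."""
    A_t = [list(col) for col in zip(*A)]  # transpose of the constraint matrix
    return list(b), A_t, list(c)


objective, rows, rhs = dual_of_max_primal(
    c=[4, 2, 5],
    A=[[1, 2, -1], [4, 8, 11], [6, 1, 8]],
    b=[20, 28, 32],
)
print("Min Z =", " + ".join(f"{v}y{i + 1}" for i, v in enumerate(objective)))
for row, r in zip(rows, rhs):
    print(row, ">=", r)
```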
Example 2:
Given primal program:
Min, Z = 5x1 + 8x2
Subject to: 2x1 + 3x2 ≥ 5 ……y1
4x1 + 10x2 ≥ 19 ……y2
x1 + 12x2 ≥ 24 ……y3
And x1, x2 ≥ 0
Required:
Obtain the dual program
Solution:
Constraints coefficient matrix
| 2    3 |
| 4   10 |
| 1   12 |
Transposing the above matrix:
| 2    4    1 |
| 3   10   12 |
Dual program:
Max, Z = 5y1 + 19y2 + 24y3
Subject to: 2y1 + 4y2 + y3 ≤ 5
3y1 + 10y2 + 12y3 ≤ 8
And y1, y2, y3 ≥ 0
NOTE:
The solution to the dual can be deduced from the solution to the primal using
the simplex method. The procedure involves associating the values in the Z-row
of the optimal primal tableau with the dual variables, where the first slack
variable is associated with the first dual variable, the second slack variable
with the second dual variable and so on.
Example3:
Suppose you have primal program as:
Max, Z = 2x1 + 3x2
Subject to: 2x1 + x2 ≤ 4
x1 + 2x2 ≤ 5
And x1, x2 ≥ 0
After performing all the steps involved in the simplex method, the optimal
(last) tableau is:
Solution    Products         Slack variables     Quantity
variable    x1      x2       s1       s2         solution
x1          1       0        2/3      -1/3       1
x2          0       1        -1/3     2/3        2
Z           0       0        -1/3     -4/3       -8
Dual program would be:
Min, Z = 4y1 + 5y2
Subject to: 2y1 + y2 ≥ 2
y1 + 2y2 ≥ 3
And y1, y2 ≥ 0
The solution to the dual program is determined by taking the Z-row
values in the primal optimal tableau corresponding to the slack variables
(ignoring the negative sign).
That is:
y1 = 1/3 which corresponds to s1
y2 = 4/3 which corresponds to s2
Thus, the optimal solution for the dual is:
y1 = 1/3, y2 = 4/3 and Z = 4(1/3) + 5(4/3) = 8
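A quick check, sketched below, confirms that the dual values read from the tableau satisfy both dual constraints and give the same objective value as the primal. The variable names are illustrative only; exact fractions are used so the comparison is not affected by rounding.

```python
from fractions import Fraction as F

x = [F(1), F(2)]          # primal optimum: x1, x2
y = [F(1, 3), F(4, 3)]    # dual values read from the slack columns of the Z row

primal_value = 2 * x[0] + 3 * x[1]   # Max Z = 2x1 + 3x2
dual_value = 4 * y[0] + 5 * y[1]     # Min Z = 4y1 + 5y2

# Dual constraints: 2y1 + y2 >= 2 and y1 + 2y2 >= 3
dual_feasible = (2 * y[0] + y[1] >= 2) and (y[0] + 2 * y[1] >= 3)

print(primal_value, dual_value, dual_feasible)
```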
SENSITIVITY ANALYSIS
This involves determining the effect that various changes to the primal
programme would have on the current solution to the program. It is also
called Post-Optimality analysis.
The various changes that can occur in a linear programming problem include:
a) Changes in the coefficients of the objective function.
b) Changes in the availability of resources or the right hand side of the
inequalities.
c) Changes in the coefficients of the constraints.
d) Addition of new constraints.
Example 4:
Suppose we have a formulated linear program model as:
Max, Z = 2x1 + 3x2
Subject to: 2x1 + x2 ≤ 4 ……R1
x1 + 2x2 ≤ 5 ……R2
And x1, x2 ≥ 0
Also suppose we are given the optimal solution (after solving using the
simplex or graphical method) as: x1 = 1, x2 = 2 and Z = 8
a) Supposing the 1st constraint (R1) increases by 20% and the 2nd constraint
(R2) increases by 10%, perform the sensitivity analysis to find the new
solution and check whether it is a feasible solution.
Solution:
The new solution is given as:
Current basic variables = (inverse of the matrix of constraint coefficients) ×
(New right hand side)
But inverse of matrix = (1/Determinant) × Adjoint
Matrix of the coefficients of the constraints for the problem above is
| 2   1 |
| 1   2 |
Determinant = (2 × 2) – (1 × 1) = 4 – 1 = 3
Adjoint = Transpose of the cofactor matrix
But cofactor matrix =
|  2   -1 |
| -1    2 |
Transposing the cofactor matrix, Adjoint =
|  2   -1 |
| -1    2 |
Inverse = 1/3 ×
|  2   -1 |
| -1    2 |
New right hand side:
New R1 = 4 + (20/100 × 4) = 4.8
New R2 = 5 + (10/100 × 5) = 5.5
Thus, current basic variables:
x1 = 1/3 × [(2 × 4.8) + (-1 × 5.5)] = 4.1/3
x2 = 1/3 × [(-1 × 4.8) + (2 × 5.5)] = 6.2/3
Hence, x1 ≈ 1.4 and x2 ≈ 2.1
Finding the optimal solution, Z = 2(4.1/3) + 3(6.2/3) = 26.8/3 ≈ 8.9
Since the values for the basic variables are all positive, we can conclude
that the new solution is a feasible solution.
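The same calculation can be sketched in code for any change to the right-hand sides of Example 4. This is a minimal illustration, assuming x1 and x2 remain the basic variables after the change; the helper name resolve is hypothetical. It forms the inverse of the 2 × 2 coefficient matrix from the determinant and adjoint, multiplies by the new right-hand side, and reports whether the result is non-negative.

```python
from fractions import Fraction as F

B = [[F(2), F(1)], [F(1), F(2)]]  # constraint coefficients of x1 and x2
det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
# Inverse = (1/determinant) x adjoint, written out for the 2x2 case.
B_inv = [[B[1][1] / det, -B[0][1] / det],
         [-B[1][0] / det, B[0][0] / det]]


def resolve(new_rhs):
    """Return (x1, x2, Z, feasible) for a changed right-hand side."""
    x1 = B_inv[0][0] * new_rhs[0] + B_inv[0][1] * new_rhs[1]
    x2 = B_inv[1][0] * new_rhs[0] + B_inv[1][1] * new_rhs[1]
    return x1, x2, 2 * x1 + 3 * x2, x1 >= 0 and x2 >= 0


# R1 increases by 20% and R2 increases by 10%.
x1, x2, z, feasible = resolve([F(4) * F(120, 100), F(5) * F(110, 100)])
print(float(x1), float(x2), float(z), feasible)
```

The function can be reused with other right-hand sides, for example resolve([F(4), F(5)]) reproduces the original solution x1 = 1, x2 = 2, Z = 8.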
REVISION EXERCISE:
1) Using the information given in Example 4 above, determine the new
optimal solution by performing the sensitivity analysis when:
a) R1 increases by 10% and R2 decreases by 20%.
b) R1 remain 4 and R2 increases by 30%
c) R1 reduces by 2 units and R2 increase by 3 units.
2) Given primal program as:
Min, Z = 1000x1 + 800x2
Subject to: 6x1 + 2x2 ≤ 12
12x1 + 4x2 ≤ 24
And x1, x2 ≥ 0
a) Write a dual program
b) Solve the dual programme using simplex method
c) Deduce the solution to the primal program from the dual program
INDEX NUMBERS
An index number is a number which compares the level of a certain
phenomenon at any given date with the level of the same phenomenon at
some standard date.
It provides an opportunity for measuring the relative change of a variable
where measurement of its actual change is inconvenient or impossible. It is
also a series of numbers by which changes in the magnitudes of a
phenomenon are measured from time to time or from place to place. An
index is constructed by selecting a base year as a starting point. The price or
quantity of the base year is represented by 100 and those of other years are
measured against it.
Uses of index numbers;
a) Price index numbers are used to measure changes in a particular group
of prices and help in comparing the movement of one commodity with
another.
b) Index numbers of industrial production provide a measure of change in
the level of industrial production in a country.
c) The quantity index numbers show the rise or fall in the volume of
production, volume of exports and imports.
d) The import and export price indices are used to measure the
changes in the terms of trade of a country
e) Used to forecast business conditions of a country and to discover
seasonal fluctuations and business cycles
f) Used to measure enrolment changes and performance of students.
Limitations of index numbers;
a) It is not practicable to price all the goods and services as well as to
take into account all changes in quantity or product.
b) Can be affected by sampling error as we calculate index numbers using
samples.
c) In price index numbers the choice of a normal period is difficult as few
periods can be regarded as normal for all segments of the economy.
d) The results obtained by different methods of construction may not
quite agree
e) Comparisons of changes in variables over long periods are not reliable
Price index number:
Such an index shows that the value of money is fluctuating, i.e. depreciating
or appreciating according to whether the index numbers of prices are rising
or falling. A rise in the index number of prices signifies a deterioration in
the value of money and vice versa.
Simple index numbers;
These are cases where construction of index numbers involves a single
commodity. Methods used in constructing simple index numbers are;
a. Fixed base method
Here, the base period is fixed and prices of subsequent years are expressed
as relatives of the prices of the base year. A price relative is the price of an
item in one year relative to another year i.e.
Price relative = P1/P0 × 100
Where; P1 = price of current year
P0 = price of base year
Example:
From the following data, compute price index numbers by taking 2002 as the
base year.
Year                  2002   2003   2004   2005   2006   2007
Price of sugar/Kg     8      10     12.5   18     22     25
Solution
Year    Price of sugar/Kg    Price index (P1/P0 × 100)
2002    8                    8/8 × 100 = 100
2003    10                   10/8 × 100 = 125
2004    12.5                 12.5/8 × 100 = 156.25
2005    18                   18/8 × 100 = 225
2006    22                   22/8 × 100 = 275
2007    25                   25/8 × 100 = 312.5
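The fixed base calculation is the same division repeated for every year, so it can be sketched as a single expression. The variable names below are illustrative; the prices are those of the sugar example with 2002 as the base year.

```python
# Price of sugar per Kg by year, taken from the example above.
prices = {2002: 8, 2003: 10, 2004: 12.5, 2005: 18, 2006: 22, 2007: 25}

base_price = prices[2002]  # the base year price stays fixed throughout
fixed_base_index = {year: p / base_price * 100 for year, p in prices.items()}

for year, index in fixed_base_index.items():
    print(year, index)
```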
b. Chain base method
In this method, the base is not fixed and it changes from year to year. The
price of the previous period is taken as the base. This method shows
whether the rate of change is rising, falling or constant as well as the extent
of change from year to year.
Price index number = (price of the current year)/(price of the previous
year) × 100
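The chain base formula can be sketched as a function that pairs each price with the one before it. The function name chain_base_index is hypothetical, and the price series used to try it out is the one from the example that follows (2002 to 2007). The first year has no index because it has no previous period.

```python
def chain_base_index(prices):
    """Index of each period relative to the period immediately before it."""
    return [round(cur / prev * 100, 2) for prev, cur in zip(prices, prices[1:])]


# Prices (Shs) for 2002 to 2007; the result covers 2003 to 2007.
chain = chain_base_index([120, 125, 140, 150, 135, 160])
print(chain)
```

A value below 100, as for the year in which the price drops from 150 to 135, indicates a fall relative to the previous year.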
Example;
Construct the chain base index numbers from the following data.
Year           2002   2003   2004   2005   2006   2007
Price (Shs)    120    125    140    150    135    160
Solution
Year    Price (Shs)    Chain base index number
2002    120            -
2003    125            125/120 × 100 = 104.17
2004    140            140/125 × 100 = 112.00
2005    150            150/140 × 100 = 107.14
2006    135            135/150 × 100 = 90.00
2007    160            160/135 × 100 = 118.52
Weighted index number;
If all the commodities selected do not have equal importance for consumers
then a weighted system is adopted. Appropriate weights are assigned to
different commodities. An index is called a Weighted Aggregate index
when it is constructed for an aggregate of items (prices) that have
been weighted in some way (by corresponding quantities produced,
consumed or sold), so as to reflect their importance.
The important formulae for constructing weighted index numbers include;
i) Laspeyres Method (L) - The base year quantities are taken as weights
for a price index. The method tries to answer the question “what is the
change in aggregate value of the base period list of goods when
valued at given period prices?”
P01 = (∑P1q0 / ∑P0q0) × 100
Where: P01 = price index number
P0 = price of the base year
q0 = quantity of the base year
P1 = price of the current year
q1 = quantity of current year
ii) Paasche Method (P) - Here, the current year quantities are taken as
weights for a price index. It tries to answer the question, “what would be
the value of the given period list of goods when valued at base period
prices?”
P01 = (∑P1q1 / ∑P0q1) × 100
N.B. In the Laspeyres index the weights (q0) are the base year quantities and
do not change from one year to the next, unlike the Paasche index which
requires continuous use of new quantity weights for each period considered.
iii) Fisher’s Ideal Method - Taken as the geometric mean of the Laspeyres
and Paasche indices.
P01 = √[(∑P1q0 / ∑P0q0) × (∑P1q1 / ∑P0q1)] × 100
P01 = √(L × P)
iv) Marshall-Edgeworth method - The current year as well as base
year prices and quantities are considered.
P01 = [∑(q0 + q1)P1 / ∑(q0 + q1)P0] × 100
On opening the brackets;
P01 = [(∑P1q0 + ∑P1q1) / (∑P0q0 + ∑P0q1)] × 100
Example:
From the following data, calculate index numbers for 2013 taking 2012 as
the base and using the following formulae;
a) Laspeyres
b) Paasche
c) Fisher's
d) Marshall-Edgeworth
          2012                          2013
          Price     Quantity            Price     Quantity
          (Shs)     (bags)              (Shs)     (bags)
Maize     65        20                  135       30
Wheat     95        8                   160       7
Beans     150       5                   320       8
Solution:
          2012          2013
          P0     q0     P1     q1     P1q0    P0q0    P1q1    P0q1
Maize     65     20     135    30     2700    1300    4050    1950
Wheat     95     8      160    7      1280    760     1120    665
Beans     150    5      320    8      1600    750     2560    1200
Total                                 5580    2810    7730    3815
a) Laspeyres index number
P01 = (∑P1q0 / ∑P0q0) × 100
= 5580/2810 × 100 = 198.6
b) Paasche index number
P01 = (∑P1q1 / ∑P0q1) × 100
= 7730/3815 × 100 = 202.6
c) Fisher's index number
P01 = √[(∑P1q0 / ∑P0q0) × (∑P1q1 / ∑P0q1)] × 100
= √[(5580/2810) × (7730/3815)] × 100
= √(1.9858 × 2.0262) × 100
= 2.0059 × 100
= 200.6
d) Marshall-Edgeworth index number
P01 = [(∑P1q0 + ∑P1q1) / (∑P0q0 + ∑P0q1)] × 100
P01 = [(5580 + 7730) / (2810 + 3815)] × 100
= 13310/6625 × 100 = 200.9
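The four weighted indices share the same four column totals, so the whole example can be sketched compactly. The dictionary layout and variable names are illustrative choices, not part of the original notes; 2012 is the base year and 2013 the current year.

```python
from math import sqrt

# (p0, q0, p1, q1) per commodity: base year price/quantity, current year price/quantity.
data = {
    "Maize": (65, 20, 135, 30),
    "Wheat": (95, 8, 160, 7),
    "Beans": (150, 5, 320, 8),
}

sum_p1q0 = sum(p1 * q0 for p0, q0, p1, q1 in data.values())
sum_p0q0 = sum(p0 * q0 for p0, q0, p1, q1 in data.values())
sum_p1q1 = sum(p1 * q1 for p0, q0, p1, q1 in data.values())
sum_p0q1 = sum(p0 * q1 for p0, q0, p1, q1 in data.values())

laspeyres = sum_p1q0 / sum_p0q0 * 100            # base year quantities as weights
paasche = sum_p1q1 / sum_p0q1 * 100              # current year quantities as weights
fisher = sqrt(laspeyres * paasche)               # geometric mean of the two
marshall_edgeworth = (sum_p1q0 + sum_p1q1) / (sum_p0q0 + sum_p0q1) * 100

for name, value in [("Laspeyres", laspeyres), ("Paasche", paasche),
                    ("Fisher", fisher), ("Marshall-Edgeworth", marshall_edgeworth)]:
    print(name, round(value, 1))
```

Because Fisher's index is a geometric mean, it always lies between the Laspeyres and Paasche values.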
REVISION QUESTIONS:
1) Explain uses and limitations of index numbers
2) Given below is a table of four commodities with the corresponding
prices and quantities over the years (2012 and 2013)
PRODUCT    2012                      2013
           Quantity    Price         Quantity    Price
           (Kg)        (shs)         (Kg)        (shs)
Bread      5           5             7           6.5
Eggs       6           7.75          10          8.8
Soap       4           9.63          6           10.75
Sugar      9           12.5          9           12.75
Calculate:
a) Laspeyres price index
b) Paasche price index
c) Fisher's price index
DECISION THEORY
Decision making is at the core of businesses and the lives of each person.
Some decisions are major and not made often while others are minor and
made often. Success in business or in life depends on the decisions made.
Therefore, what is involved in good decision making is crucial. Decision
theory is an analytical and systematic approach to the study of decision
making.
It’s important to distinguish between a good decision and a bad decision. A
good decision:
• Is based on logic
• Is made after considering all available data and alternatives
• Applies appropriate quantitative techniques
A bad decision misses at least one of these components.
Even though a good decision occasionally does not result in a favourable
outcome, it is still a good decision because if used in the long term it results
in successful outcomes. A bad decision may sometimes, by luck, result in a
favourable outcome but nonetheless it is still a bad decision.
There are six steps involved in taking any decision, irrespective of how major
or minor it is, such as taking a trip to town or investing two million
shillings.
a) Clearly define the problem at hand (for example, whether or not to produce
a new product x)
b) List possible alternatives (strategies or courses of action) which the
decision maker can choose from. For example, production of x can be
from a large plant, a small plant or some other alternatives. Not producing
at all, that is doing nothing, is an important alternative. All important
alternatives must be considered.
c) Identify possible outcomes. The outcomes that the decision maker has no
control over are termed states of nature. Since the product is for sale,
the possible outcomes are the kind of demand for the product that will
exist in the market: the product might have high demand or it might have
low demand. The full range of outcomes has to be considered, both
pessimistic and optimistic ones.
d) List the payoffs or profit of each combination of alternatives and
outcomes. It is clear that not all decisions can be evaluated on the basis
of profit but a way to measure benefits from different alternatives and
outcomes has to be found. Such payoffs are termed conditional
values. The payoffs are more easily compared when presented in a
payoff matrix, also termed a payoff table or decision table. (see table 1)
e) Select one of the mathematical decision theory models
f) Apply the model to make the decision.
Table 1: payoff table (matrix) showing conditional values for a manufacturer
                          State of nature
Strategy or               Favourable     Unfavourable
alternatives              market         market
Construct large plant     200,000        -180,000
Construct small plant     100,000        -20,000
Do nothing                0              0
Decision Making Environment for managers:
Managers make decisions in environments which can be grouped into four
states:
• Certainty
• Risk
• Uncertainty
• Conflict / Game theory
Both decision theory and game theory have the objective of assisting the
decision maker by providing a structure to enable the evaluation of
information on the relative likelihood of different outcomes so that the best
course of action can be identified.
a) Environment of Certainty
Certainty exists if all the information required to make a decision is known
and available. This is a case of perfect information, where it is known for
sure which state of nature will occur. Assuming certainty for a
problem where all the information is not known with certainty often provides
a reasonable approximation of the optimal solution. The models used
to recommend the best course of action are deterministic models.
b) Environment of Risk
A condition of risk exists if perfect information is not available but the
probabilities of certain outcomes can be estimated. Therefore, decision
making under risk relies heavily on probability theory. Various stochastic
methods have been developed for decision making under conditions of risk,
such as queuing theory. In a risk situation the different outcomes available to
the decision maker have known probabilities which can be expressed in a
probability distribution or function.
The method of using the expected monetary value (EMV) is the most popular
method of decision making under risk. EMV is the probability-weighted sum of
the possible payoffs for each alternative. In this environment it’s not known
exactly which state of nature will occur. However, there is sufficient
information for us to estimate the chances of occurrence of the various states
of nature. The models used to recommend the best course of action are
probabilistic (stochastic) models.
These include:
i) Maximum expected monetary value
ii) Minimum expected opportunity loss
In either case use the formula: Expected value = Σ (Real value ×
corresponding probability)
E (X) = Σ X P (X)
Example:
James M is a manager who is contemplating putting up a plant, which could
be large or small. He has the following data to interpret: the market demand
is likely to be either favourable or unfavourable. If James constructs a large
plant and the market is favourable he is likely to get a profit of 200,000, but
if the market demand is unfavourable he makes a loss of 180,000. If he
constructs a small plant and the market is favourable he gets a profit of
100,000, but if the market is unfavourable he gets a loss of 20,000. Further,
James believes the favourable and unfavourable markets are equally likely.
Represent the above information in a decision table and advise the
management on what plant to put up based on expected monetary value and
expected opportunity loss.
Solution:
Decision table:
                          State of nature
Strategy or               Favourable        Unfavourable
alternatives              market (0.5)      market (0.5)
Construct large plant     200,000           -180,000
Construct small plant     100,000           -20,000
No plant                  0                 0
Maximise expected monetary value:
Large plant: 200,000 (0.5) + (-180,000) (0.5) = 100,000 – 90,000 = 10,000
Small plant: 100,000 (0.5) + (-20,000) (0.5) = 50,000 – 10,000 = 40,000
No plant: 0 (0.5) + 0 (0.5) = 0
The decision is to put up the small plant as it maximises the expected
monetary value
Opportunity loss:
This is the amount one would lose by not taking the best alternative. It is
also called the amount of regret. To obtain the regret table, for each state of
nature we take the difference between the payoff of the best possible
alternative and the payoff of each alternative, i.e.
Opportunity loss table/regret table:
Options        Favourable market                 Unfavourable market
Large plant    200,000 – 200,000 = 0             0 – (-180,000) = 180,000
Small plant    200,000 – 100,000 = 100,000       0 – (-20,000) = 20,000
No plant       200,000 – 0 = 200,000             0 – 0 = 0
Expected opportunity loss;
Large plant: 0 (0.5) + 180,000 (0.5) = 90,000
Small plant: 100,000 (0.5) + 20,000 (0.5) = 60,000
No plant: 200,000 (0.5) + 0 (0.5) = 100,000
The decision is to put up the small plant as it minimises the expected
opportunity loss.
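Both criteria apply E(X) = Σ X P(X), once to the payoffs and once to the regrets, which the following sketch makes explicit. The dictionary layout and names are illustrative only; the payoffs and the equal probabilities come from the plant example.

```python
# Payoffs per alternative as (favourable market, unfavourable market).
payoffs = {
    "Large plant": (200_000, -180_000),
    "Small plant": (100_000, -20_000),
    "No plant": (0, 0),
}
probs = (0.5, 0.5)

# Expected monetary value: probability-weighted sum of the payoffs.
emv = {a: sum(p * v for p, v in zip(probs, vals)) for a, vals in payoffs.items()}

# Regret = best payoff in that state of nature minus the payoff obtained.
best_per_state = [max(vals[j] for vals in payoffs.values())
                  for j in range(len(probs))]
eol = {a: sum(p * (best - v) for p, best, v in zip(probs, best_per_state, vals))
       for a, vals in payoffs.items()}

print("EMV choice:", max(emv, key=emv.get), emv)
print("EOL choice:", min(eol, key=eol.get), eol)
```

The two criteria always select the same alternative, because for every alternative EMV + EOL equals the same constant (here 100,000).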
c) Environment under uncertainty
These refer to situations where more than one outcome can result from any
single decision. Several methods are used to make decisions in circumstances
where only the payoffs are known and the likelihood of each state of nature
is not known.
a) Maximin Method
This criterion is based on the ‘conservative approach’ of assuming that the
worst possible outcome is going to happen. The decision maker considers each
strategy, locates the minimum payoff for each and then selects the alternative
which maximizes the minimum payoff
Illustration
Rank the products A, B and C applying the Maximin rule using the following
payoff table showing potential profits and losses which are expected to arise
from launching these three products in three market conditions
Payoff table in £ 000’s
             Boom         Steady    Recession    Minimum profit
             condition    state                  (row minima)
Product A    +8           +1        -10          -10
Product B    -2           +6        +12          -2
Product C    +16          0         -26          -26
Table 1
Ranking using the MAXIMIN rule = BAC
b) MAXIMAX method
This method is based on ‘extreme optimism’. The decision maker selects the
particular strategy which corresponds to the maximum of the maximum
payoffs for each strategy
Illustration
Using the above example
             Maximum profit (row maxima)
Product A    +8
Product B    +12
Product C    +16
Ranking using the MAXIMAX method = CBA
c) MINIMAX regret method
This method assumes that the decision maker will experience ‘regret’ after
he has made the decision and the events have occurred. The decision maker
selects the alternative which minimizes the maximum possible regret.
Illustration
Regret table in £ 000’s
             Boom         Steady    Recession    Maximum regret
             condition    state                  (row maxima)
Product A    8            5         22           22
Product B    18           0         0            18
Product C    0            6         38           38
Table 2
The regret table (table 2) is constructed from the payoff table. The regret is
the ‘opportunity loss’ from taking one decision given that a certain
contingency occurs, in our example whether there is a boom, a steady state or
a recession. Each entry is the best payoff in that column of the payoff table
minus the payoff of the product concerned.
The ranking using MINIMAX regret method = BAC
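The three rankings for products A, B and C can be reproduced with a short sketch. The helper name ranking and the dictionary layout are illustrative only; payoffs are in £ 000's in the order boom, steady state, recession.

```python
payoffs = {
    "A": (8, 1, -10),
    "B": (-2, 6, 12),
    "C": (16, 0, -26),
}


def ranking(scores, reverse):
    """Order the product labels by score and join them into one string."""
    return "".join(sorted(scores, key=scores.get, reverse=reverse))


# Best payoff in each market condition, used to build the regret table.
col_best = [max(col) for col in zip(*payoffs.values())]
max_regret = {k: max(best - v for best, v in zip(col_best, vals))
              for k, vals in payoffs.items()}

maximin = ranking({k: min(v) for k, v in payoffs.items()}, reverse=True)
maximax = ranking({k: max(v) for k, v in payoffs.items()}, reverse=True)
minimax_regret = ranking(max_regret, reverse=False)

print(maximin, maximax, minimax_regret)
```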
d) The expected monetary value method
The expected payoff (profit) associated with a given combination of act and
event is obtained by multiplying the payoff for that act and event
combination by the probability of occurrence of the given event. The
expected monetary value (EMV) of an act is the sum of all expected
conditional profits associated with that act
Example
A manager has a choice between
i. A risky contract promising shs 7 million with probability 0.6 and shs 4
million with probability 0.4 and
ii. A diversified portfolio consisting of two contracts with independent
outcomes each promising shs 3.5 million with probability 0.6 and shs 2
million with probability 0.4
Can you arrive at the decision using the EMV method?
Solution
The conditional payoff table for the problem may be constructed as below.
(Shillings in millions)
Event    Probability    Conditional payoffs         Expected payoffs
Ei       P(Ei)          Contract    Portfolio       Contract      Portfolio
         (i)            (ii)        (iii)           (i) × (ii)    (i) × (iii)
E1       0.6            7           3.5             4.2           2.1
E2       0.4            4           2               1.6           0.8
EMV                                                 5.8           2.9
Using the EMV method the manager must go in for the risky contract which
will yield him a higher expected monetary value of shs 5.8 million
e) Expected opportunity loss (EOL) method
This method is aimed at minimizing the expected opportunity loss (EOL). The
decision maker chooses the strategy with the minimum expected opportunity
loss
f) The Hurwicz method
This method uses the concept of a coefficient of optimism (or pessimism)
introduced by L. Hurwicz. The decision maker takes into account both the
maximum and minimum payoff for each alternative and assigns them
weights according to his degree of optimism (or pessimism). The alternative
which maximizes the sum of these weighted payoffs is then selected
g) The Laplace method
This method uses all the information by assigning equal probabilities to the
possible payoffs for each action and then selecting the alternative which
corresponds to the maximum expected payoff
Example
A company is considering investing in one of three investment opportunities
A, B and C under certain economic conditions. The payoff matrix for this
situation is:
                 Economic condition
Investment       1 (£)     2 (£)     3 (£)
opportunities
A                5000      7000      3000
B                -2000     10000     6000
C                4000      4000      4000
Determine the best investment opportunity using the following criteria
i. Maximin
ii. Maximax
iii. Minimax regret
iv. Hurwicz (Alpha = 0.3)
Solution
                 Economic condition
Investment       1 (£)     2 (£)     3 (£)     Minimum (£)    Maximum (£)
opportunities
A                5000      7000      3000      3000           7000
B                -2000     10000     6000      -2000          10000
C                4000      4000      4000      4000           4000
i. Using the Maximin rule: highest minimum = £4000
Choose investment C
ii. Using the Maximax rule: highest maximum = £10000
Choose investment B
iii. Minimax regret rule
       1        2        3        Maximum regret
A      0        3000     3000     3000
B      7000     0        0        7000
C      1000     6000     2000     6000
Choose the minimum of the maximum regrets i.e. £3000
Choose investment A
iv. Hurwicz rule: weighted values (Alpha × maximum + (1 – Alpha) × minimum)
For A: (7000 × 0.3) + (3000 × 0.7) = 2100 + 2100 = £4200
For B: (10000 × 0.3) + (-2000 × 0.7) = 3000 – 1400 = £1600
For C: (4000 × 0.3) + (4000 × 0.7) = 1200 + 2800 = £4000
Best outcome is £4200; choose investment A
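The Hurwicz calculation, and the Laplace criterion described earlier, can be sketched for the same payoff matrix. The Laplace result is an extra illustration that is not worked in the notes; variable names are illustrative, and alpha is the coefficient of optimism applied to the best payoff.

```python
payoffs = {
    "A": (5000, 7000, 3000),
    "B": (-2000, 10000, 6000),
    "C": (4000, 4000, 4000),
}
alpha = 0.3  # weight on the maximum payoff; (1 - alpha) goes to the minimum

# Hurwicz: weighted sum of the best and worst payoff of each investment.
hurwicz = {k: alpha * max(v) + (1 - alpha) * min(v) for k, v in payoffs.items()}

# Laplace: every economic condition is treated as equally likely.
laplace = {k: sum(v) / len(v) for k, v in payoffs.items()}

print("Hurwicz:", {k: round(v, 2) for k, v in hurwicz.items()},
      "choose", max(hurwicz, key=hurwicz.get))
print("Laplace:", {k: round(v, 2) for k, v in laplace.items()},
      "choose", max(laplace, key=laplace.get))
```

Setting alpha to 1 reproduces the Maximax choice and setting it to 0 reproduces the Maximin choice, which is a useful check on the weights.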
GAME THEORY
• Game theory is used to determine the optimum strategy in a competitive
situation.
• When two or more competitors are engaged in making decisions, it may
involve conflict of interest.
• In such a case the outcome depends not only upon an individual’s action
but also upon the action of others.
• Both competing sides face a similar problem. Hence game theory is a
science of conflict
Game theory does not concern itself with finding an optimum strategy but it
helps to improve the decision process.
Game theory has been used in business and industry to develop:
• bidding tactics,
• pricing policies,
• advertising strategies,
• timing of the introduction of new models in the market etc.
RULES/ ASSUMPTIONS OF GAME THEORY

i. The number of competitors is 昀椀nite
ii. There is con昀氀ict of interests between the participants
122

lOMoARcPSD|36154693
iii. Each participant has a finite set of courses of action available to
him, i.e. choices.
iv. The rules governing these choices are specified and known to all
players.
v. While playing, each player chooses a course of action from the list of
choices available to him.
vi. The outcome of the game is affected by the choices made by all of the
players. The choices are made simultaneously, so that no
competitor knows his opponent’s choice until he is already committed
to his own.
vii. The outcome for every specific combination of choices by the players is
known in advance and numerically defined.
NOTE: When a competitive situation meets all the criteria above we call it a
game. Game theory can be applied in only a few real-life competitive situations,
because it is difficult for all the rules to hold at the same time in a given situation.
LIMITATIONS OF GAME THEORY:

a) Most of the competitive situations in which managerial decisions are
made are never really two-person games, because the government and/or
society are present as the third and/or fourth persons in the game.
b) There are many situations in the managerial decision environment where
both competitors may lose or gain, i.e. it may not be a zero-sum game.
c) In a real-life game, the two competitors rarely have equal information or
intelligence.
d) Solving games that involve mixed strategies is, in practice, very
complicated when the payoff matrix is large. This limits the
application of this analysis.
DEFINITION OF TERMS:
Game: It is an activity between two or more persons involving actions by
each one of them according to a set of rules, which results in some gain or
loss for each. If the actions in a game are determined by skill, it is called
a game of strategy, but if they are determined by chance it is termed a game
of chance.
Player: Each participant or competitor playing a game.

Play: A play of the game is said to occur when each player chooses one of
his courses of action.
Strategy: It is the total pattern of choices employed by any player. It is a
complete plan of action specifying precisely what the player will do
under every possible future contingency that might occur during the play of
the game. The two types of strategy are:
a) Pure strategy – A situation where each player in the game adopts
a single strategy as his optimal strategy. Here the value of the game
is the same for both players.
b) Mixed strategy – A player adopts a mixture of strategies if the game
is played many times. In this case each player uses a combination of
strategies and is always kept guessing as to which course of
action the other player will select on a particular occasion.
Thus there is a probabilistic situation, and the objective of each player is to
maximize expected gains or to minimize expected losses.
Example
Two players X and Y have two alternatives each. They show their choices by
pressing one of two buttons in front of them, but neither can see the
opponent’s move. It is assumed that both players have equal intelligence and
both intend to win the game.
This sort of simple game can be illustrated in tabular form as follows:
                    Player Y
Player X      Button r           Button t
Button m      X wins 2 points    X wins 3 points
Button n      Y wins 2 points    X wins 1 point
The game is biased against Y because if player X presses button ‘m’ he will
always win. Hence Y will be forced to press button ‘r’ to cut down his losses.
Alternative example
                    Player Y
Player X      Button r           Button t
Button m      X wins 3 points    Y wins 4 points
Button n      Y wins 2 points    X wins 1 point
In this case X cannot win by pressing button ‘m’ (or button ‘n’) all the time.
Similarly, Y cannot win by pressing button ‘r’ or button ‘t’ all the
time. In such a situation each player will make each choice for
part of the time, according to a probability.
STANDARD CONVENTIONS IN GAME THEORY:

Consider the following table:
              Y
           I     II
X    I     3     -4
     II   -2      1
(Assuming X wins on +ve and Y wins on -ve)
X plays row I, Y plays column I, X wins 3 points
X plays row I, Y plays column II, X loses 4 points
X plays row II, Y plays column I, X loses 2 points
X plays row II, Y plays column II, X wins 1 point
3, -4, -2 and 1 are the known payoffs, and here the game has been represented in
the form of a matrix. When a game is expressed in this fashion the
resulting matrix is commonly known as the PAYOFF MATRIX.
STRATEGY:
It refers to the total pattern of choices employed by any player. A strategy can be
pure or mixed.
• In a pure strategy, player X plays one row all of the time, and player Y
likewise plays one column all of the time.
• In a mixed strategy, player X plays each of his rows a certain
portion of the time and player Y plays each of his columns a certain
portion of the time.
VALUE OF THE GAME:

Refers to the average payoff per play of the game over an extended period of
time.
a) Pure strategy Game

Example
Determine the optimum strategies for the two players X and Y and find the
value of the game from the following payoff matrix:
                 Player Y
             3    -1     4     2
Player X    -1    -3    -7     0
             4    -7     3    -9
Strategy: assume the worst and act accordingly.
If X plays his 1st row, then Y will play his 2nd column to win 1 point.
Similarly, if X plays his 2nd row, then Y will play his 3rd column to win 7 points,
and if X plays his 3rd row, then Y will play his 4th column to win 9 points.
In the worst case X cannot win, so he should adopt his first-row strategy in order to
minimize his losses.
This decision rule is known as the ‘maximin strategy’, i.e. X chooses the highest of
the row minimum payoffs (-1, -7 and -9).
Using the same reasoning from the point of view of Y:
If Y plays his 1st column, then X will play his 3rd row to win 4 points
If Y plays his 2nd column, then X will play his 1st row to lose 1 point
If Y plays his 3rd column, then X will play his 1st row to win 4 points
If Y plays his 4th column, then X will play his 1st row to win 2 points
Thus player Y will make the best of the situation by playing his 2nd column,
which is a ‘minimax strategy’.
This is therefore a game of pure strategy and the value of the game is -1 (a win
of 1 point per game to Y). Using matrix notation, the solution is shown below:
                        Player Y             Row minimum
                   3    -1     4     2          -1
Player X          -1    -3    -7     0          -7
                   4    -7     3    -9          -9
Column maximum     4    -1     4     2
In this case the value of the game is -1:
Minimum of the column maximums is -1
Maximum of the row minimums is also -1
The best strategies are: for player X, the 1st row;
for player Y, the 2nd column.
Saddle Point
The saddle point in a payoff matrix is the entry which is the smallest value in
its row and the largest value in its column. It is also known as the equilibrium
point in the theory of games.
The saddle point also gives the value of such a game. In a game having a saddle
point, the optimum strategy for both players is to play the row or column
containing the saddle point.
Note: if a game has no saddle point, the players will resort to what are
known as mixed strategies.
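The saddle-point test described above can be sketched in Python as follows. This is a minimal illustration that assumes the payoff matrix is a list of rows holding payoffs to the row player X; the function name `find_saddle_point` is hypothetical.

```python
def find_saddle_point(matrix):
    """Return (row, column, value) of the first saddle point, or None.

    A saddle point is an entry that is the smallest value in its row
    and the largest value in its column. Indices are 0-based.
    """
    row_minima = [min(row) for row in matrix]
    col_maxima = [max(col) for col in zip(*matrix)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value == row_minima[i] and value == col_maxima[j]:
                return i, j, value
    return None

# Pure strategy example from the notes: saddle point in row 1, column 2.
pure_game = [
    [3, -1, 4, 2],
    [-1, -3, -7, 0],
    [4, -7, 3, -9],
]
print(find_saddle_point(pure_game))  # (0, 1, -1)

# A game with no saddle point, so mixed strategies are needed.
print(find_saddle_point([[1, 4], [5, 3]]))  # None
```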
b) Mixed Strategies
Example
Find the optimum strategies and the value of the game from the following
payoff matrix for a two-person game:
              Player Y
Player X      1    4
              5    3
In this game there is no saddle point.
Let Q be the proportion of time player X spends playing his 1st row and 1-Q be
the proportion of time player X spends playing his 2nd row.
Similarly
Let R be the proportion of time player Y spends playing his 1st column and 1-R
be the proportion of time player Y spends playing his 2nd column.
The following matrix shows this strategy:
                   Player Y
                   R     1-R
Player X    Q      1      4
            1-Q    5      3
X’s strategy
X will want to divide his play between his rows in such a way that his expected
winnings or losses when Y plays the 1st column are equal to his expected
winnings or losses when Y plays the 2nd column.
Column 1
Points    Proportion played    Expected winnings
1         Q                    Q
5         1-Q                  5(1-Q)
Total = Q + 5(1-Q)

Column 2
Points    Proportion played    Expected winnings
4         Q                    4Q
3         1-Q                  3(1-Q)
Total = 4Q + 3(1-Q)

Therefore Q + 5(1-Q) = 4Q + 3(1-Q)
Giving Q = 2/5 and (1-Q) = 3/5
This means that player X should play his first row 2/5 of the time and his second
row 3/5 of the time.
Using the same reasoning for player Y:
1×R + 4(1-R) = 5R + 3(1-R)
Giving R = 1/5 and (1-R) = 4/5
This means that player Y should divide his time between his first column and
second column in the ratio 1:4
                   Player Y
                   1/5    4/5
Player X    2/5     1      4
            3/5     5      3
Short cut method of determining mixed strategies
              Player Y
Player X      1    4
              5    3
Step I
Subtract the smaller payoff in each row from the larger one, and the smaller
payoff in each column from the larger one:
              1            4          | 4 - 1 = 3
              5            3          | 5 - 3 = 2
          5 - 1 = 4    4 - 3 = 1
Step II
Interchange each of these pairs of subtracted numbers found in Step I:
              1    4   | 2
              5    3   | 3
              1    4
Thus player X plays his two rows in the ratio 2:3
and player Y plays his columns in the ratio 1:4
This is the same result as calculated before.
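Steps I and II of the short cut can be written out as a small Python sketch. It assumes a 2 x 2 game without a saddle point, since the short cut is not valid otherwise; the function names `oddments_2x2` and `to_proportions` are hypothetical.

```python
from fractions import Fraction

def oddments_2x2(matrix):
    """Return (row ratio, column ratio) using the subtract-and-interchange rule."""
    (a, b), (c, d) = matrix
    # Step I: larger minus smaller payoff in each row and in each column.
    row_diffs = (abs(a - b), abs(c - d))
    col_diffs = (abs(a - c), abs(b - d))
    # Step II: interchange each pair of differences.
    row_ratio = (row_diffs[1], row_diffs[0])
    col_ratio = (col_diffs[1], col_diffs[0])
    return row_ratio, col_ratio

def to_proportions(ratio):
    """Convert a ratio such as (2, 3) into proportions (2/5, 3/5)."""
    total = sum(ratio)
    return tuple(Fraction(part, total) for part in ratio)

row_ratio, col_ratio = oddments_2x2([[1, 4], [5, 3]])
print(row_ratio)  # (2, 3)
print(col_ratio)  # (1, 4)
```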
To determine the value of the game in mixed strategies
In a simple 2 x 2 game without a saddle point, each player’s strategy consists
of two probabilities denoting the portion of the time he spends on each of his
rows or columns. Since each player plays in a random pattern, the joint
probability of each payoff is the product of the two proportions, as listed below:
Payoff    Strategies which produce this payoff    Joint probability
1         Row I, column I                         2/5 x 1/5 = 2/25
4         Row I, column II                        2/5 x 4/5 = 8/25
5         Row II, column I                        3/5 x 1/5 = 3/25
3         Row II, column II                       3/5 x 4/5 = 12/25
Expected value (or value of the game)
Payoff x    Probability p(x)    Expected value x p(x)
1           2/25                2/25
4           8/25                32/25
5           3/25                15/25
3           12/25               36/25
Σ x p(x) = 85/25 = 17/5 = 3.4
3.4 is the value of the game
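The expected-value table can be reproduced with a few lines of Python. The sketch assumes the proportions found earlier (2/5 and 3/5 for X's rows, 1/5 and 4/5 for Y's columns); the function name `game_value` is illustrative.

```python
from fractions import Fraction

def game_value(matrix, row_probs, col_probs):
    """Sum of payoff x joint probability over every row/column pair."""
    return sum(
        p * q * matrix[i][j]
        for i, p in enumerate(row_probs)
        for j, q in enumerate(col_probs)
    )

matrix = [[1, 4], [5, 3]]
row_probs = [Fraction(2, 5), Fraction(3, 5)]  # X plays rows in the ratio 2:3
col_probs = [Fraction(1, 5), Fraction(4, 5)]  # Y plays columns in the ratio 1:4

value = game_value(matrix, row_probs, col_probs)
print(value)         # 17/5
print(float(value))  # 3.4
```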
DOMINANCE
A dominated strategy can be deleted, which is useful for reducing the size of the
payoff table.
Rule of dominance
i. If all the elements in a column are greater than or equal to the
corresponding elements in another column, then that column is
dominated.
ii. Similarly, if all the elements in a row are less than or equal to the
corresponding elements in another row, then that row is dominated.
Dominated rows and columns may be deleted, which reduces the size of the
game, often to a 2 by 2 game.
N.B. Always look for dominance and saddle points first when solving
a game problem.
Example:
Determine the optimum strategies and the value of the game from the
following 2 x m payoff matrix game for X and Y:
                  Y
        I    II    III    IV     V
X       6     3    -1      0    -3
        3     2    -4      2    -1
In this game columns I, II and IV are dominated by columns III and V, hence Y will not
play these columns.
So the game is reduced to a 2×2 matrix, which can be solved using the
methods already discussed.
             Y
        III     V
X       -1     -3
        -4     -1