Stat Int Note1

UNIT ONE: INTRODUCTION
Objectives:
At the end of this session, students should be able to:

 understand statistics and basic terminologies
 understand scales of measurement in statistics
 understand the basic methods of data collection
1.1 Definition and Classification of Statistics

Definition of Statistics
The word statistics has more than one meaning. In its first sense, statistics is a plural noun which describes a collection of numerical
facts or figures such as employment statistics, accident statistics, population statistics, etc. It is in this sense that
the word 'statistics' is usually understood by a layman.
Remark: statistics are aggregates of facts. Single and isolated figures are not statistics, as they cannot be
compared and are unrelated.
In its second sense, statistics is a singular noun which refers to the subject that deals with the methods of collecting, organizing, presenting,
analyzing and interpreting statistical data.
Classification of Statistics
Statistics may be divided into two main branches:
1) Descriptive Statistics
2) Inferential Statistics
Descriptive statistics:
 Includes statistical methods involving the collection, presentation, and characterization of a set of data in
order to describe the various features of the data.
 Methods of descriptive statistics include graphic methods (bar chart, pie chart, etc.) and numeric
measures (mean, median, variance, etc.).
 Descriptive statistics do not allow us to make conclusions beyond the data we have analyzed.
 Meaningful and pertinent information cannot be realized from raw data unless summarized by the tools
of descriptive statistics.
 Descriptive statistics, therefore, allow us to present the data in a more meaningful way, which makes the
data easier to interpret.
Inferential statistics:
 Includes statistical methods which facilitate estimating the characteristics of a population or making
decisions concerning a population on the basis of sample results.
 In this regard, methods like estimation and hypothesis testing are examples of inferential statistics.
For example, a biologist collected blood samples from 10 students of the biology department to study blood types.
Accordingly, the following data were obtained:
O A O AB A O O B A O
A summary measure computed from these data, such as the proportion of students with blood type O in the sample (5 out of 10, or 50%), is an
example of descriptive statistics. We can also describe the data using bar or pie charts.
However, if he/she wants information on the proportion of students with blood type O in the entire class,
he/she may use the sample proportion (50%) as an estimate of the corresponding value for the entire class. This
is an example of inferential statistics.
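The distinction can be illustrated with a few lines of Python. This is only a sketch; the variable names are illustrative and not part of the course material.

```python
# Blood types of the 10 sampled students (data from the example above).
sample = ["O", "A", "O", "AB", "A", "O", "O", "B", "A", "O"]

# Descriptive statistic: the proportion of type O within the sample itself.
proportion_o = sample.count("O") / len(sample)
print(f"Sample proportion of type O: {proportion_o:.0%}")

# Inferential use: the same number is taken as a point estimate of the
# unknown proportion of type O in the entire class.
estimate_for_class = proportion_o
```

The computation is identical in both cases; what changes is the claim being made, about the sample only or about the whole class.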
1.2 Stages in statistical investigation
A statistical study might involve the following stages: collection of data, organizing and presenting the
collected data, analyzing and interpreting the result.
Stage 1: DATA COLLECTION: this stage involves acquiring data related to the problem at hand.
Stage 2: ORGANIZING AND PRESENTING DATA: this stage involves classifying or sorting the
collected data based on some characteristics or attributes such as age, sex, marital status, etc. Further, we may
use tables, graphs, charts and so on to present the data.
Stage 3: DATA ANALYSIS: a systematic analysis of the data is necessary in order to reach conclusions or
provide answers to a problem. The analysis might require simple or sophisticated statistical tools depending on
the type of answers that may have to be provided.
Stage 4: INTERPRETATION OF THE RESULT: logically a statistical analysis has to be followed by
conclusions in order to be able to make a decision. The technical terminology used to describe this last process
of a statistical study is referred to as interpretation.
1.3 Definition of some terms

Population: Consists of all elements, individuals, items or objects whose characteristics are being studied.
The population that is being studied is called the target population.
Sample: A portion of the population selected for study.
Sample survey: The technique of collecting information from a portion of the population.
Census survey: A survey that includes every member of the population.
Variable: A characteristic under study that assumes different values for different elements.
Quantitative variable: A variable that can be measured numerically. The data collected on a quantitative
variable are called quantitative data. Examples include weight, height, number of students in a class, number of
car accidents, etc.
Qualitative variable: A variable that cannot assume a numerical value but can be classified into two or more
non-numerical categories. The data collected on such a variable are called qualitative or categorical data.
Examples include sex, blood type, marital status, religion, etc.
Discrete variable: A variable whose values are countable. Examples include the number of patients in a hospital.
Continuous variable: A variable that can assume any numerical value over a certain interval or intervals.
Examples include weight of newborn babies, height of seedlings, temperature measurements, etc.
Parameter: A statistical measure obtained from population data. Examples include the population mean,
proportion, variance and so on.
Statistic: A statistical measure obtained from sample data. Examples include the sample mean, proportion,
variance and so on.
Unit of analysis: The type of thing being measured in the data, such as persons, families, households, states,
nations, etc.
1.4 Applications, uses and limitations of statistics

Application of statistics
We pointed out that statistics has already become a very important subject area, and, that various tools of
statistics are being used to solve problems in everyday life, in research, in marketing, in planning, in production
and quality control and other areas. Nevertheless, statistics has its own limitation and it can also be misused. In
the following section we outline the limitations.
Limitation of statistics
 Statistics deals with only those subjects of inquiry which are capable of being quantitatively measured and
numerically expressed.
 Statistics deals only with aggregates of facts, and no importance is attached to individual items.
 Statistical data are only approximately, not mathematically, correct.
 Statistics is liable to be misused. Hence expertise in the subject is very essential. Besides, honesty is very
important in the use of statistics.
1.5 Scales of measurement

Formally, we distinguish among four levels of measurement scales.
Nominal scale:
 It is the simplest measurement scale.
 There is no natural ordering of the levels or values of the scale in nominal scale.
 For example, the sex of an individual may be male or female. There is no natural ordering of the two sexes.
Other examples include religion, blood type, eye colour, marital status, etc.
 The values of a nominal scale can be coded using numerical values.
 However, we cannot perform any meaningful mathematical operations on the numbers used as codes.
Ordinal scale:
 This measurement scale is similar to the nominal scale but the levels or categories can be ranked or
ordered.
 That is, we can compare levels or categories of the scale.
 Therefore, this scale of measurement gives better information on the quantities being measured as
compared to the nominal scale. For example, the living standard of a family can be poor, medium or high.
 These categories can be ordered: poor is less than medium, and medium is less than high.
 However, the distance or magnitude between the levels, say between poor and medium, is not clearly
known.
Interval scale:
 This measurement scale shares the ordering or ranking and labeling properties of ordinal scale of
measurement. Besides, the distance or magnitude between two values is clearly known (meaningful).
 However, it lacks a true zero point (i.e., the zero point is not meaningful). For example, consider the temperature of an object in
degrees centigrade or Fahrenheit. If the temperature of an object is zero degrees centigrade, it
doesn’t mean that the object lacks heat. Hence zero is an arbitrary point on the scale. It doesn’t make sense
to say that 80° F is twice as hot as 40° F; in centigrade the ratio would be 6; neither ratio is meaningful.
 We can do subtraction and addition on interval level data, but division and multiplication are not meaningful.
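The temperature example can be checked numerically. The sketch below (the function and variable names are illustrative) converts the two Fahrenheit readings to centigrade and compares ratios and differences on both scales.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees centigrade."""
    return (f - 32) * 5 / 9

hot_f, cool_f = 80.0, 40.0
hot_c = fahrenheit_to_celsius(hot_f)    # about 26.67
cool_c = fahrenheit_to_celsius(cool_f)  # about 4.44

# Ratios depend on the arbitrary zero point, so they disagree between scales.
ratio_f = hot_f / cool_f   # 2 on the Fahrenheit scale
ratio_c = hot_c / cool_c   # about 6 on the centigrade scale

# Differences describe the same gap on both scales, up to the unit
# conversion factor 5/9.
diff_f = hot_f - cool_f
diff_c = hot_c - cool_c

print(round(ratio_f, 2), round(ratio_c, 2), round(diff_f, 2), round(diff_c, 2))
```

The two ratios differ because each scale places zero at a different, arbitrary point, whereas the difference of 40 Fahrenheit degrees corresponds to the same physical gap as about 22.2 centigrade degrees.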
Ratio scale:
 It is the highest level of measurement scale.
 It shares the ordering, labeling and meaningful distance properties of interval scale.
 In addition, it has a true or meaningful zero point. The existence of a true zero makes the ratio of two
measures meaningful. For instance, if your salary is 1000 birr and your wife’s is 2000 birr, we can say that
your wife earns twice as much as you. If you don’t have any source of income, your income is zero on this scale,
and that is a meaningful assignment. Other examples include weight, height, etc.
 We can do subtraction, addition, multiplication and division on ratio level data.
The most precise scale is the ratio scale and the least precise is the nominal scale. Ratio and interval level
data are classified as quantitative variables, and nominal and ordinal level data are classified as qualitative
variables.
UNIT TWO: METHODS OF DATA COLLECTION AND PRESENTATION
Objectives:
After completing this unit you should be able to
 organize data using frequency distribution.
 present data using suitable graphs or diagrams.
2.1 Methods of data collection
Depending on the source, data can be primary or secondary.
Primary data refers to the statistical data which the investigator originates for the purpose of inquiry.
Secondary data refers to data which is not originated by the investigator himself, but which he obtains from
someone else's records. Secondary data can be obtained from published or unpublished documents: reports,
journals, magazines, articles, etc.
Primary methods of data collection: These include data collection using observation, personal interview, self-administered
questionnaire, mailed questionnaire, etc.
2.2 Classification and tabulation of data

The uses of classifying and tabulating data are:
 to display the points of similarity and dissimilarity;
 to save mental strain by systematic condensation and suppression of irrelevant detail;
 to enable one to form a mental picture of objects of perception; and
 to prepare the ground for comparison and inference.
Types of classification
1. Geographical - in terms of cities, districts, countries, etc.
2. Chronological - on the basis of time.
3. Qualitative - according to some qualitative characteristics.
4. Quantitative - in terms of magnitude.
One can also use combination of these to classify data.
Tabulation: tables may be classified according to the number of characteristics used for tabulation.
1. Simple or one way table: it uses only one characteristic or variable for classification.
Example 2.1: Students who took introduction to statistics in 1998 E.C. by gender.
Gender    Number
Male      2000
Female    700
2. Two-way tables: it uses two characteristics for classification.
Example 2.2: Students who took introduction to statistics in 1998 E.C. by age and gender.
Age             Number of male    Number of female
19 and below    200               180
20-25           1415              385
26 and above    385               135
3. Higher ordered tables: results when we have more than two characteristics of classification. For instance,
we can classify the students who took introduction to statistics in 2014 by age, gender and faculty.
2.3 Frequency distributions

In this section, we will concentrate on some of the frequently used methods of organizing data. The easiest
method of organizing data is using a frequency distribution, which converts raw data into a meaningful pattern
for statistical analysis.
The main uses of a frequency distribution are
 to organize data in a meaningful way.
 to enable one to determine the nature or shape of the distribution; how the observations cluster around a
central value; and how the values spread around the center of the data.
 to facilitate computational procedures for measures of average and spread.
 to enable one to draw charts and graphs for the presentation of data.
 to enable one to make comparisons between data sets.
Terminologies
Frequency distribution: a grouping of data into categories showing the number of observations in each
mutually exclusive category.
Array: data put in an ascending or descending order of magnitude.
Grouped data: data presented in the form of a frequency distribution.
Frequency: the number of observations corresponding to a fixed value or to a class of values.
Relative frequency: the number obtained when the frequency of a class is divided by total number of
observations.
Components of a frequency distribution
Class limits: the values of a variable which typically serve to identify the classes of a frequency distribution.
They are sometimes referred to as nominal or apparent limits. The smaller and the larger values are known as
the lower and the upper class limits, respectively. They should be selected in such a way that they have the same
number of significant places or units of measurement as the observations to be classified.
Class boundaries: the precise points which separate various classes rather than the values included in any one
of the classes. They are sometimes referred to as exact or true limits. They leave no space for ambiguity and
overlapping. A class boundary is located mid-way between the upper class limit of a class and the lower class
limit of the next higher class. They are carried out to one more decimal place than the class limits.
Class mark: the point which divides the class into two equal parts. This is also known as class mid-point. This
can be determined by dividing the sum of the two limits or the sum of the two boundaries by 2.
Class width: the length of a class.
Example 2.3: The following data are the weights in kg of 40 individuals who participated in a diet program for
weight loss:
70 64 99 55 64 89 87 65 62 38 67 70 60 69 78 39 75 56 71 51
99 68 95 86 57 53 47 50 55 81 80 98 51 36 63 66 85 79 83 70
By grouping data into classes we can make the data much easier to read and understand. We group these data by
10s. The smallest weight is 36 kg, thus the first class of weights is 31 kg up to and including 40 kg.
Table 2.1: Distribution of weights.
Class Class boundary Count (Frequency)
31 – 40 30.5-40.5 3
41 – 50 40.5-50.5 2
51 – 60 50.5-60.5 8
61 – 70 60.5-70.5 12
71 – 80 70.5-80.5 5
81 – 90 80.5-90.5 6
91 – 100 90.5-100.5 4
Total 40
For this example, the first class is ‘31-40’. Lower limit of this class = 31; upper limit = 40. The lower class
boundary = 30.5; upper class boundary = 40.5. The width of the class = upper class boundary - lower class
boundary = 40.5-30.5 = 10. The class mark (class mid-point) of this class is (31+40)/2 = 35.5. The values 36,
39, 38 are included in this class. Therefore, the frequency of this class is 3.
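The table for Example 2.3 can be reproduced programmatically. The following is a minimal sketch, assuming the first lower limit (31), the class width (10) and the number of classes (7) chosen in the example; the tuple layout is an illustrative choice.

```python
weights = [70, 64, 99, 55, 64, 89, 87, 65, 62, 38, 67, 70, 60, 69, 78, 39, 75, 56, 71, 51,
           99, 68, 95, 86, 57, 53, 47, 50, 55, 81, 80, 98, 51, 36, 63, 66, 85, 79, 83, 70]

first_lower, width, n_classes = 31, 10, 7   # choices taken from Example 2.3

table = []
for k in range(n_classes):
    lower = first_lower + k * width          # lower class limit
    upper = lower + width - 1                # upper class limit
    lower_boundary = lower - 0.5             # midway to the previous class
    upper_boundary = upper + 0.5             # midway to the next class
    class_mark = (lower + upper) / 2         # class mid-point
    frequency = sum(lower <= w <= upper for w in weights)
    table.append((lower, upper, lower_boundary, upper_boundary, class_mark, frequency))

for lower, upper, lb, ub, mark, freq in table:
    print(f"{lower:>3} - {upper:<3}  {lb:>5} - {ub:<5}  mark={mark:<5} f={freq}")
```

Subtracting and adding 0.5 to obtain the boundaries works here because the weights are recorded to the nearest whole kilogram; data recorded to one decimal place would need 0.05 instead.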
Guidelines for constructing a frequency distribution
 Find the range of the data
Range =R=Maximum value – Minimum value
 Determine the number of classes.
Let K be the number of classes and n be the number of observations to be classified. There are two
alternatives to determine K.
1. Choose K to be between 5 and 15;
2. Use the following formula, known as Sturges’ formula, given by $K = 1 + 3.322 \log_{10} n$,
and round K to the nearest integer.
 Determine the class width, let it be W, by using
W = Range/K
 Tally the observations, count and assign frequencies to the classes.
Example 2.4: The following data are on the number of minutes to travel from home to work for a group of
automobile workers: 28 25 48 37 41 19 32 26 16 23 23 29 36 31 26 21 32 25 31 43 35 42 38 33
28. Construct a frequency distribution for this data.
Solution:
 Range = 48 - 16 = 32
 $K = 1 + 3.322 \log_{10} 25 = 5.64 \approx 6$
 W = 32/6 = 5.33, which is rounded up to the nearest integer, i.e. W = 6.
Let the lower limit of the first class be 16 then the frequency distribution is as follows:
Class limit Class boundaries Tally Frequency
16-21 15.5-21.5 \\\ 3
22-27 21.5-27.5 \\\\\ \ 6
28-33 27.5-33.5 \\\\\ \\\ 8
34-39 33.5-39.5 \\\\ 4
40-45 39.5-45.5 \\\ 3
46-51 45.5-51.5 \ 1
Total 25
The final frequency distribution is shown in the table below.
Table: The distribution of the times

Time (in minute) Number of workers
16-21 3
22-27 6
28-33 8
34-39 4
40-45 3
46-51 1
Total 25
This frequency distribution is more understandable than the raw data. For instance, many observations are
found in the second class and third class. This in turn implies that many workers took around 22 to 33 minutes
to travel from home to work.
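The steps of Example 2.4 can be sketched in Python as follows. The code assumes the same choices as the solution: K from Sturges' formula rounded to the nearest integer, W rounded up to the next integer, and the smallest observation used as the first lower limit. For other data sets the width may need to be increased so that the last class reaches the maximum value.

```python
import math

times = [28, 25, 48, 37, 41, 19, 32, 26, 16, 23, 23, 29, 36,
         31, 26, 21, 32, 25, 31, 43, 35, 42, 38, 33, 28]

n = len(times)
data_range = max(times) - min(times)       # 48 - 16 = 32
k = round(1 + 3.322 * math.log10(n))       # Sturges' formula, rounded
width = math.ceil(data_range / k)          # class width, rounded up

# Build the classes and count the observations falling in each one.
frequencies = {}
lower = min(times)
for _ in range(k):
    upper = lower + width - 1
    frequencies[(lower, upper)] = sum(lower <= t <= upper for t in times)
    lower = upper + 1

for (lo, hi), f in frequencies.items():
    print(f"{lo}-{hi}: {f}")
```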
Types of frequency distributions
Based on the type of frequency assigned to the classes we have three types of frequency distributions:
 Absolute frequency distribution
 Relative frequency distribution
 Cumulative frequency distribution
The frequency distributions that we have seen in the previous examples (examples 2.3 and 2.4) are absolute
frequency distributions because the frequencies assigned are absolute frequencies.
Definition 2.1: A relative frequency distribution is a distribution which specifies the frequency of a class
relative to the total frequency.
Example 2.5: Convert the above absolute frequency distribution in example 2.4 to a relative frequency
distribution.
Solution: First we find the relative frequency of each class. The relative frequency of a class is the frequency of
the class divided by the total number of observations. For instance the relative frequency of the first class is
3/25=0.12, the relative frequency of the second class is 6/25=0.24, and so on. Thus, the relative frequency
distribution is shown in the table below.
Table: Relative frequency distribution of times

Time (in minute) Relative frequency
16-21 0.12
22-27 0.24
28-33 0.32
34-39 0.16
40-45 0.12
46-51 0.04
Total 1
Note: Proportions may also be converted to percentages to obtain a percentage relative frequency distribution.
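As a small sketch, the relative and percentage frequencies of Example 2.5 can be computed from the absolute frequencies like this (the dictionary layout is an illustrative choice):

```python
# Absolute frequencies from Example 2.4, keyed by class limits.
frequencies = {"16-21": 3, "22-27": 6, "28-33": 8, "34-39": 4, "40-45": 3, "46-51": 1}

total = sum(frequencies.values())

# Relative frequency = class frequency / total number of observations.
relative = {cls: f / total for cls, f in frequencies.items()}

# Percentage relative frequency = relative frequency * 100.
percent = {cls: 100 * r for cls, r in relative.items()}

for cls in frequencies:
    print(f"{cls}: {relative[cls]:.2f} ({percent[cls]:.0f}%)")
```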
Definition 2.2: Cumulative frequency refers to the number of observations that are below a specified value or
that are above a specified value.
Note: Class boundaries are mostly used to obtain cumulative frequencies. Depending on whether the observations
are counted below or above each boundary, we obtain a cumulative 'less than' or a cumulative 'more than'
frequency distribution, respectively.
Example 2.6: Convert the absolute frequency distribution in example 2.4 into:
i) a cumulative less than frequency distribution.
ii) a cumulative more than frequency distribution.
Solution:
i) We use the class boundaries to form cumulative frequencies. For instance, there is no observation
which is less than 15.5, 3 observations are less than 21.5, 9 observations are less than 27.5 and so on.
Thus, the following less than cumulative frequency distribution is obtained.
Table: Less than cumulative frequency distribution of times

Time (in minute) Cumulative frequency
Less than 15.5 0
Less than 21.5 3
Less than 27.5 9
Less than 33.5 17
Less than 39.5 21
Less than 45.5 24
Less than 51.5 25
ii) There are 25 observations which are more than 15.5, 22 observations are more than 21.5, 16
observations are more than 27.5 and so on. Thus, the following more than cumulative frequency
distribution is obtained.
Table: More than cumulative frequency distribution of times
Time (in minute) Cumulative frequency
More than 15.5 25
More than 21.5 22
More than 27.5 16
More than 33.5 8
More than 39.5 4
More than 45.5 1
More than 51.5 0
Note: If class limits are used instead of class boundaries the phrases ‘or more’ and ‘or less’ are used in place of
‘more than’ and ‘less than’, respectively to obtain cumulative frequencies.
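Both cumulative distributions of Example 2.6 follow from a running total of the class frequencies. A minimal sketch, using the class boundaries of Example 2.4:

```python
from itertools import accumulate

boundaries = [15.5, 21.5, 27.5, 33.5, 39.5, 45.5, 51.5]
frequencies = [3, 6, 8, 4, 3, 1]
total = sum(frequencies)

# "Less than" counts: a running total, starting at 0 below the first boundary.
less_than = [0] + list(accumulate(frequencies))

# "More than" counts: the observations not yet included in the running total.
more_than = [total - c for c in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"less than {b}: {lt:>2}    more than {b}: {mt:>2}")
```

At every boundary the two counts add up to the total of 25, which is a quick way to check a hand-built table.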
Ungrouped frequency distributions (Single-value grouping)
In the previous examples each class that we used for grouping data represented a range of possible values. In
some cases, however, using classes that each represent a single value is more appropriate.
Example 2.7: A demographer is interested in the number of children a family may have. He took a random
sample of 30 families. The following data are the numbers of children in the sampled families.
4 2 4 3 2 8 3 4 4 2 2 8 5 3 4
4 5 4 3 5 2 7 3 3 6 7 3 8 4 5
To group these data, we will use classes based on the single numerical value.
Table: Distribution of the number of children.

Number of Children Frequency Relative frequency
2 5 0.17
3 7 0.23
4 8 0.27
5 4 0.13
6 1 0.03
7 2 0.07
8 3 0.10
Total 30 1
As we can see from this frequency distribution, the most common family sizes are 4 and 3 children, which together account for half of the families. It would have been difficult
to observe such a feature of the data if we had not organized the raw data using a frequency distribution.
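Single-value grouping amounts to counting how often each distinct value occurs. One way to sketch this in Python uses `collections.Counter` from the standard library:

```python
from collections import Counter

children = [4, 2, 4, 3, 2, 8, 3, 4, 4, 2, 2, 8, 5, 3, 4,
            4, 5, 4, 3, 5, 2, 7, 3, 3, 6, 7, 3, 8, 4, 5]

counts = Counter(children)   # frequency of each distinct value
n = len(children)

# Map each value to (frequency, relative frequency rounded to 2 decimals).
distribution = {value: (counts[value], round(counts[value] / n, 2))
                for value in sorted(counts)}

for value, (freq, rel) in distribution.items():
    print(f"{value}: {freq:>2}  {rel:.2f}")
```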
Note: Up to now we have seen frequency distributions for quantitative data; we can also have frequency
distributions for qualitative (categorical) data.
Categorical frequency distributions

The categorical frequency distribution is used for data which can be placed in specific categories such as
nominal or ordinal level data. For example, data on political affiliation, religious affiliation, blood type, marital
status, or major field of study would use categorical frequency distributions.
Example 2.8: The following data are the political party affiliations of a sample of 40 biology students. D, R,
and O stand for Democratic, Republican and Other, respectively.
D D D D O R O R O R O R O D D R D D D R
R O R D R R O R R R R R O O R R D R D D
The classes for grouping are ‘Democratic’, ‘Republican’ and ‘Other’.
Table: Number of students by political party affiliations

Class Frequency Relative frequency
Democratic 13 0.325
Republican 18 0.45
Other 9 0.225
Total 40 1
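The same counting approach works for categorical data. The sketch below rebuilds the table of Example 2.8 from the raw codes; the `labels` mapping is an illustrative helper.

```python
from collections import Counter

codes = ("D D D D O R O R O R O R O D D R D D D R "
         "R O R D R R O R R R R R O O R R D R D D").split()

labels = {"D": "Democratic", "R": "Republican", "O": "Other"}

# Frequency of each category, then its share of the 40 students.
counts = Counter(labels[c] for c in codes)
relative = {party: counts[party] / len(codes) for party in counts}

for party in ("Democratic", "Republican", "Other"):
    print(f"{party:<10} {counts[party]:>2}  {relative[party]:.3f}")
```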
2.4 Diagrammatic and graphical presentation of data

2.4.1 Graphs for quantitative data
Histogram: it consists of a set of adjacent rectangles whose bases are marked off by class boundaries (not class
limits) along the horizontal axis and whose heights are proportional to the frequencies associated with the
respective classes.
To construct a histogram from a data set:
1. Construct a frequency table.
2. Draw adjacent bars whose bases are the class boundaries and whose heights are determined by the frequencies in step 1.
The importance of a histogram is that it enables us to organize and present data graphically so as to draw
attention to certain important features of the data. For instance, a histogram can often indicate how symmetric
the data are; how spread out the data are; whether there are intervals having high levels of data concentration;
whether there are gaps in the data; and whether some data values are far apart from others.
Example 2.9: The following is a histogram for the frequency distribution in example 2.4.
Figure: Distribution of number of minutes spent by the automobile workers
Frequency polygon: is a graphic form of a frequency distribution. It can be constructed by plotting the class
frequencies against class marks and joining them by a set of line segments.
Note: we should add two classes with zero frequencies at the two ends of the frequency distribution to complete
the polygon.
Example 2.10: Construct a frequency polygon for the frequency distribution of the time spent by the
automobile workers that we have seen in example 2.4.
Figure: Distribution of number of minutes spent by the automobile workers
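Before drawing the polygon, its vertices can be computed from the frequency table. This sketch uses the classes of Example 2.4 and adds one zero-frequency class at each end, as described in the note above; the plotting itself is left out.

```python
limits = [(16, 21), (22, 27), (28, 33), (34, 39), (40, 45), (46, 51)]
frequencies = [3, 6, 8, 4, 3, 1]
width = 6   # class width from Example 2.4

# Class mark = mid-point of the two class limits.
marks = [(lo + hi) / 2 for lo, hi in limits]

# Add an empty class before the first and after the last class so that
# the polygon starts and ends on the horizontal axis.
points = ([(marks[0] - width, 0)]
          + list(zip(marks, frequencies))
          + [(marks[-1] + width, 0)])

for x, y in points:
    print(x, y)
```

Joining consecutive points with straight line segments gives the frequency polygon.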
2.4.2 Graphs useful for presenting qualitative data

A bar chart is a diagrammatic representation of data in which the data are represented by a series of vertical or
horizontal bars, the height (or length) of each bar indicating the size of the figure represented.
Example 2.11: Draw a bar chart for the following coffee production data.
Table: Coffee productions from 1990 to 1995.
Production year                     1990  1991  1992  1993  1994  1995
Amounts of coffee (in 1000 tons)    50    75    92    64    100   120
Figure: Production of coffee from 1990 to 1995. (Bar chart; horizontal axis: production year; vertical axis: amount of coffee in 1000 tons.)
Pie-chart: it is a circle divided by radial lines into sections or sectors so that the area of each sector is
proportional to the size of the figure represented.
Pie-chart construction:
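 Calculate the percentage frequency of each component. It is $\frac{\text{value of the component}}{\text{total}} \times 100$.
 Calculate the degree measure of each sector. It is given by $\frac{\text{value of the component}}{\text{total}} \times 360^\circ$.
 Draw the circle using a protractor and compass.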
Example 2.13: Draw a pie-chart to represent the following data on a certain family expenditure.
Table: Family expenditure.
Item             Expenditure (in birr)    Percentage frequency    Angle of the sector
Food             50                       33.33                   120°
Clothing         30                       20                      72°
House rent       20                       13.33                   48°
Fuel & light     15                       10                      36°
Miscellaneous    35                       23.33                   84°
Total            150                      100                     360°
Figure: Family expenditure. (Pie chart with one sector per item: Food, Clothing, House rent, Fuel and light, Miscellaneous.)
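The percentages and sector angles for the family expenditure data can be reproduced with a short calculation. This is a sketch only; each sector's share of the total is scaled to 100 for the percentage and to 360 degrees for the angle.

```python
expenditure = {"Food": 50, "Clothing": 30, "House rent": 20,
               "Fuel & light": 15, "Miscellaneous": 35}

total = sum(expenditure.values())   # 150 birr

# For each item: (percentage frequency rounded to 2 decimals, angle in degrees).
sectors = {item: (round(100 * amount / total, 2), 360 * amount / total)
           for item, amount in expenditure.items()}

for item, (pct, angle) in sectors.items():
    print(f"{item:<14} {pct:>6}%  {angle:>6.1f} degrees")
```

The angles must add up to 360 degrees, which is a useful check before drawing the chart.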
Example 2.14: The following data are the blood types of 50 volunteers at a blood plasma donation clinic:
O A O AB A A O O B A O A AB B O O O A B A A O A A B O B A O AB A O O
A B A A A O B O O A O A B O AB A O
a) Organize this data using a categorical frequency distribution
b) Present the data using both a pie and a bar chart.
Solution
a) The classes of the frequency distribution are A, B, O, AB. Count the number of donors for each of the
blood types.
Table: The number of donors by blood types
Blood type Frequency Percent
A 19 38.0
B 8 16.0
O 19 38.0
AB 4 8.0
Total 50 100.0
b) Pie chart
Find the percentage of donors for each blood type. In order to find the angle of the sector for each blood
type, multiply the corresponding percentage by 360° and divide by 100.
Blood type    Frequency    Percent    Angle
A             19           38.0       136.8°
B             8            16.0       57.6°
O             19           38.0       136.8°
AB            4            8.0        28.8°
Total         50           100.0      360°
Figure: Distribution of blood types of donors. (Pie chart with one sector per blood type: A, B, O, AB.)

Bar chart
Label the horizontal axis by blood types and the vertical axis by the number of donors. Then draw a bar
for each blood type whose height is proportional to its frequency.
Figure: Distribution of blood types of donors. (Bar chart; horizontal axis: blood type; vertical axis: number of donors.)
UNIT THREE: MEASURES OF CENTRAL TENDENCY
Objectives:
Having studied this unit, you should be able to:
 understand the role of descriptive statistics in summarization, description and interpretation of data.
 use several numerical methods belonging to measures of central tendency to describe the characteristics
of a data set.
3.1 Introduction and objectives of measuring central tendency

In unit 2, we discussed how raw data can be organized in terms of tables, charts and frequency distributions in
order to be easily understood and analyzed. Frequency distributions and their corresponding graphical displays
roughly tell us some of the features of a data set. However, they don’t condense the mass of data in a way that
we can easily understand and interpret. In this unit, we will see how to summarize data using a descriptive
measure called an average. This will help us in condensing a mass of data into a single value which is in some
sense representative of the whole data set.
Suppose you are interested in the electric power supply needed for a certain residential area. You might
obtain the electricity demand of each individual family. This mass of data might not be helpful in making a decision regarding,
say, how to allocate electric power among individual families. But if you compute an average value from the
data set, it may help you to make a decision.
Definition 3.1: An average is a single value intended to represent a distribution as a whole.
Note that the individual values of the distribution must have a tendency to cluster around an average. In view of
this requirement an average is also referred to as a measure of central tendency.
An average (a measure of central tendency) is considered satisfactory if it possesses all or most of the following
properties. An average should be
 based on all the observed values.
 simple to understand and easy to interpret.
 calculable with reasonable ease and rapidity.
 easily manipulated algebraically.
 little affected by fluctuations of sampling.
 not unduly influenced by extreme values.
 rigidly defined, which means that it should have a definite value.
Objectives of measuring central tendency:
 To get one single value that describes the characteristics of the entire group.
 To facilitate comparison between different data sets.
3.2 The summation notation

Suppose a variable is represented by X. The successive values of this variable may be represented by using
subscripts or indexes as $x_1, x_2, x_3, \ldots, x_n$. If the sum of these values or terms is required, we write $x_1 + x_2 + x_3 + \cdots + x_n$.
The Greek letter ∑ (read as sigma) can be used to write the above sum in a compact form as
$\sum_{i=1}^{n} x_i$, where 1 = lower limit and n = upper limit. When no confusion can result we
shall often denote the above sum simply by $\sum x_i$. Using this notation we may also have
occasion to write terms like $\sum x_i^2$ and $\sum x_i y_i$.
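Rules of summation
For any constant c:
$\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$, $\sum_{i=1}^{n} c\,x_i = c \sum_{i=1}^{n} x_i$ and $\sum_{i=1}^{n} c = nc$.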
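Three standard summation identities (the sum of a sum splits into two sums, a constant factor can be taken outside the sum, and summing a constant n times gives n times the constant) can be checked numerically. The values of `x`, `y` and `c` below are hypothetical and chosen only for illustration.

```python
x = [2, 5, 7, 4]   # hypothetical values x_1, ..., x_n
y = [1, 3, 6, 8]   # hypothetical values y_1, ..., y_n
c = 3              # an arbitrary constant
n = len(x)

# Sum of (x_i + y_i) equals sum of x_i plus sum of y_i.
additivity = sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)

# Sum of c * x_i equals c times the sum of x_i.
constant_factor = sum(c * xi for xi in x) == c * sum(x)

# Summing the constant c over n terms gives n * c.
constant_sum = sum(c for _ in range(n)) == n * c

print(additivity, constant_factor, constant_sum)
```

A numerical check of this kind illustrates the identities for one data set; it is not a proof.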
3.3 Types of measures of central tendency

3.3.1 Arithmetic mean
Definition 3.2:
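i) Let $x_1, x_2, \ldots, x_n$ be the values of the variable X. The simple arithmetic mean, denoted by $\bar{x}$, is the
sum of these observations of X divided by the number of values: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
ii) If the numbers $x_1, x_2, \ldots, x_k$ occur with frequencies $f_1, f_2, \ldots, f_k$, respectively, then the mean can be
defined in a more compact form as $\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$.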
Note that if the data refers to a population data the mean is denoted by the Greek letter µ (read as mu).
Arithmetic mean for raw data (ungrouped data)
Example 3.1: The following data are the weights (in kg) of eight youths: 32, 37, 41, 39, 36, 43, 48 and 36. Calculate
the arithmetic mean of their weights.
Solution: $\bar{x} = \frac{32 + 37 + 41 + 39 + 36 + 43 + 48 + 36}{8} = \frac{312}{8} = 39$ kg.
Example 3.2: The ages of a random sample of patients in a given hospital in Ethiopia are given below:
Age                   10   12   14   16   18   20   22
Number of patients    3    6    10   14   11   5    4
Calculate the average age of these patients.
Solution:
Age (xi)    Number of patients (fi)    fixi
10          3                          30
12          6                          72
14          10                         140
16          14                         224
18          11                         198
20          5                          100
22          4                          88
Total       53                         852
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{852}{53} \approx 16.075$. Thus the mean age of these patients is 16.075.
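Both forms of the arithmetic mean (raw data and data with frequencies) can be sketched in a few lines of Python, using the data of Examples 3.1 and 3.2:

```python
# Example 3.1: raw (ungrouped) data.
weights = [32, 37, 41, 39, 36, 43, 48, 36]
mean_weight = sum(weights) / len(weights)          # 312 / 8

# Example 3.2: values with frequencies.
ages = [10, 12, 14, 16, 18, 20, 22]
patients = [3, 6, 10, 14, 11, 5, 4]
sum_fx = sum(f * x for x, f in zip(ages, patients))   # sum of f_i * x_i
sum_f = sum(patients)                                 # sum of f_i
mean_age = sum_fx / sum_f

print(mean_weight, round(mean_age, 3))
```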

The weighted arithmetic mean
In some cases the data in the sample or population should not be weighted equally; instead, each value should be weighted
according to its importance. The measure of average for such problems is known as the weighted arithmetic
mean. The weighted arithmetic mean is used to calculate the average when the relative importance of the
observations differs. This relative importance is technically known as weight. A weight could be a frequency or a
numerical coefficient associated with the observations.
Definition 3.3: If $x_1, x_2, \ldots, x_n$ have weights $w_1, w_2, \ldots, w_n$, respectively, then the weighted arithmetic
mean, denoted by $\bar{x}_w$, is defined as $\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$.
Example 3.3: The GPA or CGPA of a student is a good example of a weighted arithmetic mean. Suppose that
Solomon obtained the following grades in the first semester of the freshman program at AASTU in 2006.
Course Credit hour (wi) Grade
Math101 4 A=4
Stat2091 3 C=2
Chem101 3 B=3
Phys101 4 B=3
Flen101 3 C=2
Find the GPA of Solomon.
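The calculation asked for in Example 3.3 can be sketched as a weighted mean, with the credit hours as weights and the grade points as values. The helper name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Courses in order: Math101, Stat2091, Chem101, Phys101, Flen101.
credit_hours = [4, 3, 3, 4, 3]   # weights w_i
grade_points = [4, 2, 3, 3, 2]   # values x_i (A=4, B=3, C=2)

gpa = weighted_mean(grade_points, credit_hours)   # 49 / 17
print(round(gpa, 2))
```

Here the weighted sum is 16 + 6 + 9 + 12 + 6 = 49 and the total of the credit hours is 17, so the GPA is 49/17, or about 2.88.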
Example 3.4: In a vacancy for a position of botanist in an organization, the criteria of selection were work experience,
entrance exam, and, interview result. The relative importance of these criteria was regarded to be different. The
weights of these criteria and the scores obtained by 3 candidates (out of 100 in each criterion) are given in the following
table. In addition, the selection of a candidate is based on average result on these criteria.
Criterion          Weight    Tesfaye    Gutema    Kedir
Work experience    4         70         89        85
Entrance exam      3         78         83        89
Interview result   2         90         92        90
Who is the appropriate candidate for the position based on the criteria?
Solution: We use the weighted mean since the relative importance of these criteria differs.
Criterion          Weight    Tesfaye         Gutema          Kedir
                             xi     xiwi     xi     xiwi     xi     xiwi
Work experience    4         70     280      89     356      85     340
Entrance exam      3         78     234      83     249      89     267
Interview result   2         90     180      92     184      90     180
Total              9         238    694      264    789      264    787
The weighted mean and the simple arithmetic mean for the applicants are as follows:
Applicant Tesfaye Gutema Kedir
Weighted mean 694/9=77.11 789/9=87.67 787/9=87.44
Simple arithmetic mean 238/3=79.33 264/3=88 264/3=88
If we use the simple arithmetic mean of the scores, Gutema and Kedir have equal chances of being
recruited. However, the relative importance of the criteria is different, so we have to use the weighted mean to
discriminate among the candidates. The weighted mean of the scores obtained by Gutema is larger than those of the
others. So Gutema should be recruited for the job.
Properties of arithmetic mean
i. It can be computed for any set of numerical data; it always exists and is unique.
ii. It depends on all observations.
iii. The sum of the deviations of the observations about the mean is zero, i.e. $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.
iv. It is greatly affected by extreme values.
v. It lends itself to further statistical treatment, for instance, combinations of means.
vi. It is relatively reliable, i.e. it is not greatly affected by fluctuations in sampling.
vii. The sum of squares of the deviations of all observations about the mean is the minimum,
i.e. $\sum_{i=1}^{n} (x_i - \bar{x})^2 \le \sum_{i=1}^{n} (x_i - A)^2$ for any constant A.
3.3.2 Geometric mean
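Definition 3.4: The geometric mean of any n positive numbers is the nth root of the product of the numbers.
Symbolically, if $x_1, x_2, \dots, x_n$ are given, their geometric mean (G.M) is given by
$G.M = \sqrt[n]{x_1 \times x_2 \times \dots \times x_n}$
Example 3.5: Find the geometric mean of the following numbers: 2, 4, 8.
Solution: $G.M = \sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$.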
Note: The geometric mean is useful in finding the average of percentages, ratios, indexes, or growth rates.
Example 3.6: At the beginning of an epidemic in a region, 12 cases were reported on the first day, 18 on the
second day and 48 on the third day.
a) Find the average growth rate of the epidemic disease.
b) Assuming that the growth pattern continues, forecast the number of cases that would be reported on the
4th and 8th days.
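Solution:
a) Find the two growth rates first.
From the first day to the second day the rate is 18/12=1.5.
From the second day to the third day the rate is 48/18=2.67.
Therefore, the average rate = $\sqrt{1.5 \times 2.67} \approx 2$.
b) With an average growth rate of 2, the forecast number of cases doubles each day:
Day    Number of cases
1st    12
2nd    24 = 2 × 12
3rd    48 = 2 × 24
4th    96 = 2 × 48
5th    192 = 2 × 96
Continuing in the same way, the forecast for the 8th day is 12 × 2^7 = 1536 cases.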
Example 3.7: A company’s year-to-year changes in fuel consumption expenditures were 5, 10, 20, 40 and 60
percent. Determine the average yearly percent change in expenditure.
Solution: The 1st, 2nd, 3rd, 4th, and 5th growth rates are 105%, 110%, 120%, 140%, and 160%, respectively.
Average growth rate = $\sqrt[5]{105 \times 110 \times 120 \times 140 \times 160} = 125.43\%$
The average percentage change = 125.43% - 100% = 25.43%
Example 3.8: The rate of increase in population of a country during the last three decades is 5 percent, 8
percent, and 12 percent. Find the average rate of growth during the last three decades.
Solution: Since the data are given as percentages, the geometric mean is the more appropriate
measure. The calculations for the geometric mean are shown in the following table.
Decade   Rate of increase in population (in %)   Population at the end of the decade (x), taking the preceding decade as 100
1        5                                       105
2        8                                       108
3        12                                      112
Average growth rate = $\sqrt[3]{105 \times 108 \times 112} \approx 108.3$
Hence the average rate of increase in population over the last three decades is 108.3 - 100 = 8.3 percent.
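The geometric means in Examples 3.5, 3.7 and 3.8 can be checked with a few lines of Python. This is only a sketch; the helper name `geometric_mean` is chosen here for illustration (the standard library also offers `statistics.geometric_mean`).

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive numbers")
    return math.prod(values) ** (1 / len(values))


gm_numbers = geometric_mean([2, 4, 8])                  # Example 3.5
gm_fuel = geometric_mean([105, 110, 120, 140, 160])     # Example 3.7
gm_population = geometric_mean([105, 108, 112])         # Example 3.8

# Average percentage change = geometric mean of the growth factors minus 100.
print(round(gm_fuel - 100, 2), round(gm_population - 100, 2))
```

Because of floating-point rounding, the first result may differ from 4 in the last decimal places, so it is better compared with a tolerance than with `==`.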
3.3.3 The median

Definition 3.5: The median of a set of data is a value which divides the set in such a way that the number of
observations below it is the same as the number of observations above it.
Median for raw data
i. If the number of observations, say n, is odd, then the median is equal to the $\left(\frac{n+1}{2}\right)$th observation of
the array.
ii. If the number of observations n is even, then the median is equal to the sum of the $\left(\frac{n}{2}\right)$th observation
and the $\left(\frac{n}{2}+1\right)$th observation divided by two.
Notation: If X is the variable under consideration, then $\tilde{x}$ is used to denote the median.
Example 3.9: Find the median for the following sets of data:
i. 10 5 7 9 6 5 4
Solution: First arrange the data in the form of an array.
4 5 5 6 7 9 10
Here we have n=7, which is odd.
Therefore, the median, $\tilde{x}$ = the $\left(\frac{7+1}{2}\right)$th observation = the 4th observation = 6.
ii. 10 5 7 9 6 5 4 8
Solution: Arrange the data in ascending order.
4 5 5 6 7 8 9 10
Here n=8, which is even.
Therefore, $\tilde{x}$ = (4th observation + 5th observation)/2 = (6 + 7)/2 = 6.5.
iii. A shop keeper (sales person) recorded the number of video cassette recorders (VCRs) sold per
month over a two year period. Find the median number VCRs sold.
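Number of sets sold   Frequency (months)   Cumulative frequency
1                     3                    3
2                     8                    11
3                     5                    16
4                     4                    20
5                     2                    22
6                     1                    23
7                     1                    24
The number of observations is n=24, which is even, so the median is the average of the 12th and 13th observations.
From the cumulative frequencies, the 12th and 13th observations are both 3. Therefore, the median is (3 + 3)/2 = 3.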
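The three parts of Example 3.9 can be reproduced with Python's standard `statistics.median`, which sorts the data and applies the odd/even rule described above. For the frequency table, this sketch simply expands each value by its frequency; the variable names are illustrative.

```python
import statistics

# Raw data from Example 3.9 (i) and (ii).
median_odd = statistics.median([10, 5, 7, 9, 6, 5, 4])        # n = 7
median_even = statistics.median([10, 5, 7, 9, 6, 5, 4, 8])    # n = 8

# Example 3.9 (iii): expand the frequency table into individual observations.
sets_sold = [1, 2, 3, 4, 5, 6, 7]
months = [3, 8, 5, 4, 2, 1, 1]
observations = [x for x, f in zip(sets_sold, months) for _ in range(f)]
median_vcr = statistics.median(observations)                  # n = 24

print(median_odd, median_even, median_vcr)
```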
Properties of median
 It is an average of position.
 It is affected more by the number of observations than by extreme values.
 The sum of the deviations about the median, signs ignored, is less than the sum of deviations taken from
any other value or specific average.
3.3.4 The mode
Definition 3.6: The mode (modal value) of an observed set of data is the value that occurs the largest number of
times.
The mode for raw data

Example 3.10: Find the modal value for the following sets of data.
i. 5 6 5 8 7 4 . In this data set, 5 is the most frequent value. Therefore, the mode is 5. Since the
modal value is only one number, we call the distribution unimodal.
ii. 1 2 3 4 8 2 5 4 6. In this data set, the modal values are 2 and 4, since both appear most
frequently and occur an equal number of times. Such distributions are called bimodal
distributions.
iii. 1 2 4 3 5 6 8 7. In this data set, all values appear an equal number of times, so there is no modal
value.
Note:
 If a distribution has more than two modal values, then we call the distribution multimodal.
 If, in a set of observed values, all values occur once or an equal number of times, there is no mode.
 The mode is also useful in finding the most typical case when the data are nominal or categorical.
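The rules above (one mode, several modes, or no mode when every value is equally frequent) can be sketched in Python as follows. The function `modes` is a hypothetical helper written for these notes' convention; note that the standard `statistics.multimode` would instead return every value when all are equally frequent.

```python
from collections import Counter


def modes(data):
    """Return the sorted modal values, or [] when all values are equally frequent."""
    if not data:
        raise ValueError("data must not be empty")
    counts = Counter(data)
    highest = max(counts.values())
    # Convention of these notes: several distinct values, all equally frequent -> no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


unimodal = modes([5, 6, 5, 8, 7, 4])
bimodal = modes([1, 2, 3, 4, 8, 2, 5, 4, 6])
no_mode = modes([1, 2, 4, 3, 5, 6, 8, 7])
print(unimodal, bimodal, no_mode)
```

The same function works for nominal data, for example a list of field names, because it only counts how often each value occurs.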
Example 3.11: A survey showed the following distribution for the number of students enrolled in each field. Find the
mode.
Field Number of students
Civil 850
Electrical 825
Computer sciences 645
Mechanical 478
Chemical 100
Solution: Since the category with the highest frequency is Civil, the modal (most typical) field is Civil.
Properties of modal value
 It is easy to calculate and understand.
 It is not affected by extreme values.
 It is not based on all observations.
 It is not used in further analysis of data.
The mean, median, and mode of grouped data
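The mean for grouped data can be found by assuming that the values in each class interval are centered at the mid-point
(class mark) of the interval, so that $\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i}$, where $x_i$ is the class mark and $f_i$ the frequency of the ith class.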
Example 3.12: Consider the frequency distribution of the time spent by the automobile workers. Find the mean
time spent by these workers from this frequency distribution.
Time (in minutes)   Class mark (xi)   Number of workers (fi)   fi×xi
15.5-21.5           18.5              3                        55.5
21.5-27.5           24.5              6                        147
27.5-33.5           30.5              8                        244
33.5-39.5           36.5              4                        146
39.5-45.5           42.5              3                        127.5
45.5-51.5           48.5              1                        48.5
Total                                 25                       768.5
Solution: $\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{768.5}{25} = 30.74$ minutes.
Note: In case of grouped data if any class interval is open, arithmetic mean cannot be calculated.
The median for grouped data can be approximated by the following formula:
$\tilde{x} = L_m + \left(\dfrac{\frac{n}{2} - cf}{f_m}\right) w$
where Lm = lower class boundary of the median class,
n = total number of observations in the distribution,
cf = less-than cumulative frequency of the class preceding the median class,
w = class width of the median class,
fm = frequency of the median class.
Note that the median class is the class containing the (n/2)th observation.
Example 3.13: Find the median for the following frequency distribution.
Class boundaries Frequency (f) Cumulative frequency
5.5-10.5 1 1
10.5-15.5 2 3
15.5-20.5 3 6
20.5-25.5 5 11
25.5-30.5 4 15
30.5-35.5 3 18
35.5-40.5 2 20
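Solution: The class containing the (n/2)th observation, i.e. the 10th observation, is the median class. This class has
class boundaries 20.5 and 25.5 (the 4th class). Here Lm = 20.5, n = 20, cf = 6, fm = 5 and w = 5, so
$\tilde{x} = 20.5 + \left(\dfrac{10 - 6}{5}\right) \times 5 = 24.5$
Therefore, the median is 24.5.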

Note:
i. We approximate the median by assuming that the values in the median class are evenly distributed.
ii. We can compute the median for open-ended frequency distribution as long as the middle value does
not occur in the open-ended class.
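The grouped-median calculation can be sketched in Python, assuming the usual interpolation formula, median = Lm + ((n/2 - cf)/fm) × w, with the median class taken as the first class whose cumulative frequency reaches n/2. The function name and argument layout are illustrative only.

```python
def grouped_median(boundaries, freqs):
    """Approximate the median of a grouped frequency distribution.

    boundaries: list of (lower, upper) class boundaries in ascending order.
    freqs: the corresponding class frequencies.
    """
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    cf = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(boundaries, freqs):
        if cf + f >= n / 2:  # median class found
            return lower + (n / 2 - cf) / f * (upper - lower)
        cf += f
    raise ValueError("boundaries and freqs must have the same length")


# Data of Example 3.13.
classes = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
           (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
frequencies = [1, 2, 3, 5, 4, 3, 2]
median_3_13 = grouped_median(classes, frequencies)
print(median_3_13)
```

The function never looks at the upper boundary of classes beyond the median class, which reflects note ii: an open-ended last class does not prevent the calculation.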
The mode for grouped data can be estimated by the following formula, where the modal value is denoted by $\hat{x}$:
$\hat{x} = L_1 + \left(\dfrac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)}\right) W$
where f1 = frequency of the modal class
f0 = frequency of the class preceding the modal class
f2 = frequency of the class next to the modal class
L1 = lower class boundary of the modal class
W = class width of the modal class
Note: The modal class is the class with the highest frequency.
Example 3.14: Calculate the modal time spent by the automobile workers.
Time (in minute) Number of workers
15.5- 21.5 3
21.5-27.5 6
27.5-33.5 8
33.5-39.5 4
39.5-45.5 3
45.5-51.5 1
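The modal class is the class with the largest frequency, which is the third class (27.5-33.5). Here L1 = 27.5, f1 = 8,
f0 = 6, f2 = 4 and W = 6, so
$\hat{x} = 27.5 + \left(\dfrac{8 - 6}{(8 - 6) + (8 - 4)}\right) \times 6 = 27.5 + 2 = 29.5$
Therefore, the modal time spent is 29.5 minutes.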

Note: The mode can be calculated for distributions with open ended classes.
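A short Python sketch of the grouped-mode estimate follows, assuming the usual formula, mode = L1 + ((f1 - f0)/((f1 - f0) + (f1 - f2))) × W. Treating a missing neighbouring class as having frequency 0 is an assumption of this sketch, not something the notes specify.

```python
def grouped_mode(boundaries, freqs):
    """Estimate the mode of a grouped frequency distribution.

    boundaries: list of (lower, upper) class boundaries in ascending order.
    freqs: the corresponding class frequencies.
    """
    m = max(range(len(freqs)), key=lambda i: freqs[i])   # index of the modal class
    f1 = freqs[m]
    f0 = freqs[m - 1] if m > 0 else 0                    # assumed 0 if there is no previous class
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0       # assumed 0 if there is no next class
    denominator = (f1 - f0) + (f1 - f2)
    if denominator == 0:
        raise ValueError("the formula is undefined when f1 equals both neighbours")
    lower, upper = boundaries[m]
    return lower + (f1 - f0) / denominator * (upper - lower)


# Data of Example 3.14.
time_classes = [(15.5, 21.5), (21.5, 27.5), (27.5, 33.5),
                (33.5, 39.5), (39.5, 45.5), (45.5, 51.5)]
workers = [3, 6, 8, 4, 3, 1]
modal_time = grouped_mode(time_classes, workers)
print(round(modal_time, 2))
```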
3.3.5 Quartiles, Deciles and Percentiles

These are averages of position. They are collectively known as fractile (quantile) points.
Definition 3.7
Quartiles are three points which divide an array into four parts in such a way that each portion contains an equal
number of elements. The 1st, the 2nd and the 3rd points are known as the 1st, the 2nd and the 3rd quartiles and are
usually denoted by Q1, Q2 and Q3, respectively.
Deciles are nine points which divide an array into 10 parts in such a way that each part contains an equal number
of elements. The 1st, 2nd,…, and the 9th points are known as the 1st, 2nd,…, and the 9th deciles and are usually
denoted by D1, D2,…, D9, respectively.
Percentiles are 99 points which divide an array into 100 parts in such a way that each part consists of an equal
number of elements. The 1st, 2nd,… and the 99th points are known as the 1st, 2nd,… and the 99th percentiles and are
usually denoted by P1, P2,…, P99, respectively.
Note: The array should be in ascending order in order to get the quantiles.
i. Quantile points for raw data
First form an array in an ascending order and then apply the following procedure.
Example 3.15: The following data relate to sizes of shoes sold at a store during a week. Find the quartiles, the
seventh decile and the 90th percentile.
Size of shoes 5 5.5 6 6.5 7 7.5 8 8.5 9 9.5
Number of pairs 2 5 15 30 60 40 23 11 4 1
Solution: The total number of observations is 191.
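The quantile points of Example 3.15 can be explored with the Python sketch below. It assumes one common positional convention, namely that the kth of `parts` quantile points sits at position k(n+1)/parts in the ordered data, with linear interpolation for fractional positions. That convention is an assumption made here; other textbooks and software use slightly different rules.

```python
import math


def quantile_from_table(values, freqs, k, parts):
    """k-th quantile point when the ordered data are split into `parts` parts.

    Assumed convention: position = k * (n + 1) / parts (1-based), with linear
    interpolation between neighbouring observations for fractional positions.
    """
    data = sorted(v for v, f in zip(values, freqs) for _ in range(f))
    n = len(data)
    if n == 0:
        raise ValueError("the frequency table is empty")
    position = min(max(k * (n + 1) / parts, 1), n)   # keep the position inside 1..n
    lower = math.floor(position)
    fraction = position - lower
    upper = min(lower + 1, n)
    return data[lower - 1] + fraction * (data[upper - 1] - data[lower - 1])


sizes = [5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5]
pairs = [2, 5, 15, 30, 60, 40, 23, 11, 4, 1]

q1, q2, q3 = (quantile_from_table(sizes, pairs, k, 4) for k in (1, 2, 3))
d7 = quantile_from_table(sizes, pairs, 7, 10)
p90 = quantile_from_table(sizes, pairs, 90, 100)
print(q1, q2, q3, d7, p90)
```

Under this assumed convention, with n = 191, the quartiles fall at the 48th, 96th and 144th observations.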
Note: Relationships between fractile points
 Q1=P25
 Q2=P50=D5=median
 Q3=P75
 D1=P10; D2=P20 …D9=P90.
UNIT FOUR: MEASURES OF VARIATION
Objectives:
Having studied this unit, you should be able to
 understand the importance of measuring the variability (dispersion) in a data set.
 measure the scatter or dispersion in a data set.
 understand ‘moments’ as a convenient and unifying method for summarizing several descriptive
statistical measures.
 measure the extent to which the distribution of values in a data set deviate from symmetry.
4.1 Introduction and objectives of measuring variation
We have seen that averages are representatives of a frequency distribution. But they fail to give a complete
picture of the distribution. They do not tell anything about the spread or dispersion of observations within the
distribution. Suppose that we have the distribution of yield (kg per plot) of two rice varieties from 5 plots each.
Variety 1: 45 42 42 41 40
Variety 2: 54 48 42 33 30
The mean yield is 42 kg for variety 1 and 41.4 kg for variety 2, so the two means are nearly equal. The values of
variety 1 are all close to their mean, whereas the values of variety 2 are widely scattered about theirs. The mean
does not tell us how close the observations are to each other. This example suggests that a measure of central
tendency alone is not sufficient to describe a frequency distribution. Therefore, we should have a measure of the
spread of the observations. There are different measures of dispersion. In this chapter we shall discuss the most
commonly used measures of dispersion or variation, such as the range, quartile deviation, standard deviation and
coefficient of variation, as well as measures of shape such as skewness and kurtosis.
Objectives of measuring variation
 To describe dispersion (variability) in a data.
 To compare the spread in two or more distributions.
 To determine the reliability of an average.
Note: The desirable properties of a good measure of variation are almost identical to those of a good measure of
central tendency.
4.2 Absolute and relative measures
Measures of variation may be either absolute or relative. Absolute measures of variation are expressed in the
same unit of measurement in which the original data are given. These values may be used to compare the
variation in two distributions provided that the variables are in the same units and of the same average size.
In case the two sets of data are expressed in different units, however, such as quintals of sugar versus tonnes of
sugarcane or if the average sizes are very different such as manager’s salary versus worker’s salary, the absolute
measures of dispersion are not comparable. In such cases measures of relative dispersion should be used. A
measure of relative dispersion is the ratio of a measure of absolute dispersion to an appropriate measure of
central tendency. It is a unitless measure.
4.3 Types of measures of variation
The range and relative range
Definition 4.1: Range is defined as the difference between the maximum and minimum observations in a set of
data.
Range is the crudest absolute measure of variation. It is widely used in the construction of quality control
charts and description of daily temperature.
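Definition 4.2: Relative range (RR) is defined as $RR = \dfrac{L - S}{L + S}$, where L and S are the largest and smallest observations, respectively.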
Variance, standard deviation and coefficient of variation
Definition 4.3: The variance is the average of the squares of the distances of each value from the mean. The
symbol for the population variance is σ² (σ is the Greek lower-case letter sigma). Let x1, x2,…, xN be the
measurements on N population units; then the population variance is given by the formula:
$\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
where $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$ is the population mean and N = population size.
Definition 4.4: The standard deviation is the square root of the variance. The symbol for the population standard
deviation is σ. The corresponding formula for the standard deviation is
$\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Example 4.1: The height of members of a certain committee was measured in inches and the data is presented
below.
Height (x):   69   66   67   69   64   63   65   68   72
x - µ:         2   -1    0    2   -3   -4   -2    1    5
(x - µ)²:      4    1    0    4    9   16    4    1   25
Solution: µ = 603/9 = 67 inches, σ² = 64/9 ≈ 7.11 and σ = √7.11 ≈ 2.67 inches.
Definition 4.5: The sample variance is denoted by S², and its formula is $S^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Definition 4.6: The sample standard deviation, denoted by S, is the square root of the sample variance: $S = \sqrt{S^2}$
Example 4.2: For a newly created position, a manager interviewed the following numbers of applicants each
day over a five-day period: 16, 19, 15, 15, and 14. Find the variance and standard deviation.
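Solution: $\bar{x} = \dfrac{16 + 19 + 15 + 15 + 14}{5} = 15.8$,
$S^2 = \dfrac{(0.2)^2 + (3.2)^2 + (-0.8)^2 + (-0.8)^2 + (-1.8)^2}{5 - 1} = \dfrac{14.8}{4} = 3.7$ and $S = \sqrt{3.7} \approx 1.92$.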
Note that the procedure for finding the variance and standard deviation for grouped data is similar to that for
finding the mean for grouped data, and it uses the mid-points of each class.
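For raw data, the population and sample formulas differ only in the divisor (N versus n - 1). Python's standard `statistics` module provides both; the sketch below applies them to Examples 4.1 and 4.2, treating the committee heights as a population and the interview counts as a sample (an interpretation assumed here from where the examples appear).

```python
import statistics

heights = [69, 66, 67, 69, 64, 63, 65, 68, 72]   # Example 4.1 (population)
applicants = [16, 19, 15, 15, 14]                # Example 4.2 (sample)

population_variance = statistics.pvariance(heights)   # divides by N
population_sd = statistics.pstdev(heights)

sample_variance = statistics.variance(applicants)     # divides by n - 1
sample_sd = statistics.stdev(applicants)

print(round(population_variance, 2), round(population_sd, 2))
print(round(sample_variance, 2), round(sample_sd, 2))
```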
Properties of variance
 The unit of measurement of the variance is the square of the unit of measurement of the observed
values. It is one of its limitations.
 The variance gives more weight to extreme values as compared to those which are near to mean value,
because the difference is squared in variance.
 It is based on all observations in the data set.
Properties of standard deviation
 Standard deviation is considered to be the best measure of dispersion and is used widely.
 There is, however, one difficulty with it. If the unit of measurement of variables of two series is not the
same, then their variability cannot be compared by comparing the values of standard deviation.
Uses of the variance and standard deviation
 The variance and standard deviations can be used to determine the spread of data, consistency of a
variable and the proportion of data values that fall within a specified interval in a distribution.
 If the variance or standard deviation is large, the data is more dispersed. This information is useful in
comparing two or more data sets to determine which is more (most) variable.
 Finally, the variance and standard deviation are used quite often in inferential statistics.
Coefficient of variation (CV)
The standard deviation is an absolute measure of dispersion. The corresponding relative measure is known as
the coefficient of variation (CV).
Coefficient of variation is used in problems where we want to compare the variability of two or more
different series. It is the ratio of the standard deviation to the arithmetic mean, usually
expressed in percent:
$CV = \dfrac{S}{\bar{x}} \times 100\%$
where S is the standard deviation and $\bar{x}$ is the mean of the observations.
A distribution with a smaller coefficient of variation is said to be less variable, or more consistent, more uniform
or more homogeneous.
Example 4.3: Last semester, the students of Computer science and Earth science Departments took Stat 273
course. At the end of the semester, the following information was recorded.
Department            Computer science   Earth science
Mean score            79                 64
Standard deviation    23                 11
Compare the relative dispersions of the two departments’ scores using the appropriate measure.
Solution:
Computer science: CV = (23/79) × 100% = 29.11%
Earth science: CV = (11/64) × 100% = 17.19%
Since the CV of Computer science Department students is greater than that of Earth science Department
students, we can say that there is more dispersion in the distribution of Computer science students’ scores
compared with that of Earth science.
Example 4.4: The mean weight of 20 children was found to be 30 kg with variance of 16kg2 and their mean
height was 150 cm with variance of 25cm2. Compare the variability of weight and height of these children.
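Solution: The standard deviations are √16 = 4 kg for weight and √25 = 5 cm for height, so
CV (weight) = (4/30) × 100% = 13.33% and CV (height) = (5/150) × 100% = 3.33%.
Since the CV of weight is larger, the weight of the children is more variable than their height.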
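The comparisons in Examples 4.3 and 4.4 reduce to computing CV = (standard deviation / mean) × 100 for each series. A minimal sketch, with an illustrative function name:

```python
import math


def coefficient_of_variation(sd, mean):
    """Coefficient of variation in percent: (sd / mean) * 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100


# Example 4.3: exam scores of two departments.
cv_computer = coefficient_of_variation(23, 79)
cv_earth = coefficient_of_variation(11, 64)

# Example 4.4: the variances are given, so take square roots first.
cv_weight = coefficient_of_variation(math.sqrt(16), 30)
cv_height = coefficient_of_variation(math.sqrt(25), 150)

print(round(cv_computer, 2), round(cv_earth, 2))
print(round(cv_weight, 2), round(cv_height, 2))
```

Because the CV is unitless, weight in kg and height in cm can be compared directly.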
Standard score
A standard score is a measure that describes the relative position of a single score in the entire distribution of
scores in terms of the mean and standard deviation. It gives the number of standard deviations by which a
particular observation lies above or below the mean.
Population standard score: $Z = \dfrac{x - \mu}{\sigma}$, where x is the value of the observation, and µ and σ are the mean and
standard deviation of the population, respectively.
Sample standard score: $Z = \dfrac{x - \bar{x}}{S}$, where x is the value of the observation, and $\bar{x}$ and S are the mean and standard
deviation of the sample, respectively.
Interpretation: If Z > 0, the observation lies above the mean; if Z = 0, it is equal to the mean; if Z < 0, it lies below the mean.
Example 4.5: Two sections were given an exam in a course. The average score was 72 with standard deviation
of 6 for section 1 and 85 with standard deviation of 5 for section 2. Student A from section 1 scored 84 and
student B from section 2 scored 90. Who performed better relative to his/her group?
Solution: Section 1: µ1 = 72, σ1 = 6 and score of student A from Section 1; xA = 84
Section 2: µ2 = 85, σ2 = 5 and score of student B from Section 2; xB = 90
Z-score of student A: $Z_A = \dfrac{84 - 72}{6} = 2$
Z-score of student B: $Z_B = \dfrac{90 - 85}{5} = 1$
From these two standard scores, we can conclude that student A has performed better relative to his/her section
because his/her score is two standard deviations above the mean score of section 1, while the score of
student B is only one standard deviation above the mean score of section 2.
Example 4.6: A student scored 65 on a calculus test that had a mean of 50 and a standard deviation of 10; she
scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare her relative positions on
each test.
Solution: First, find the z-scores.
For calculus the z-score is $z = \dfrac{65 - 50}{10} = 1.5$
For history the z-score is $z = \dfrac{30 - 25}{5} = 1.0$
Since the z-score for calculus is larger, her relative position in the calculus class is higher than her relative
position in the history class.
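The standard scores in Examples 4.5 and 4.6 can be checked with a one-line function. The sketch below assumes the form z = (x - mean) / standard deviation; the function name is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - mean) / sd


# Example 4.5: students A and B in two sections.
z_a = z_score(84, 72, 6)
z_b = z_score(90, 85, 5)

# Example 4.6: one student, two tests.
z_calculus = z_score(65, 50, 10)
z_history = z_score(30, 25, 5)

print(z_a, z_b, z_calculus, z_history)
```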
4.4 Moments, skewness and kurtosis
Moments
Definition 4.7: The average of the deviations of the observations from an arbitrary origin, raised to an integral power,
is defined as a moment. Let x1, x2,…, xn be observations; we define the r-th moment about A as:
$\dfrac{1}{n}\sum_{i=1}^{n} (x_i - A)^r$
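The best-known moments are the moments about the mean, also known as the central moments, and the moments
about zero (also known as moments about the origin).
The rth moment about the mean, µr, is given by:
$\mu_r = \dfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^r$
Special cases: µ0 = 1, µ1 = 0, µ2 = σ² (the variance computed with divisor n).
The rth moment about the origin, $\mu'_r$, is given by:
$\mu'_r = \dfrac{1}{n}\sum_{i=1}^{n} x_i^r$
Special cases: $\mu'_0 = 1$, $\mu'_1 = \bar{x}$, $\mu'_2 = \dfrac{1}{n}\sum_{i=1}^{n} x_i^2$
Skewness: it refers to lack of symmetry in a distribution.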

Note: for a symmetrical and unimodal distribution:
i) Mean =median =mode
ii) The lower and upper quartiles are equidistant from the median, so also are corresponding pairs of deciles
and percentiles.
iii) Sum of positive deviations from the median is equal to the sum of negative deviations (signs ignored).
iv) The two tails of the frequency curve are equal in length from the central value.
If a distribution is not symmetrical we call it skewed distribution.
Measures of skewness
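i) Pearsonian coefficient of skewness (Pcsk), defined as:
$Pcsk = \dfrac{\text{mean} - \text{mode}}{\text{standard deviation}}$
In moderately skewed distributions: Mode = mean - 3(mean - median), so that $Pcsk = \dfrac{3(\text{mean} - \text{median})}{\text{standard deviation}}$.
Interpretation: If Pcsk > 0, the distribution is positively skewed; if Pcsk = 0, it is symmetrical; if Pcsk < 0, it is negatively skewed.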
Note: in a negatively skewed distribution larger values are more frequent than smaller values. In a positively
skewed distribution smaller values are more frequent than larger values.
Example 4.7: If the mean, mode and s.d. of a frequency distribution are 70.2, 73.6 and 6.4, respectively, what
can one state about its skewness?
Solution: $Pcsk = \dfrac{70.2 - 73.6}{6.4} = -0.53$
This figure suggests that there is some negative skewness.
Kurtosis: it refers to the degree of peakedness of a distribution.
When the values of a distribution are closely bunched around the mode in such a way that the peak of the
distribution becomes relatively high, the distribution is said to be leptokurtic. If it is flat topped we call it
platykurtic. A distribution which is neither highly peaked nor flat topped is known as a meso-kurtic distribution
(normal).
Measures of kurtosis
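i. Moment coefficient of kurtosis (Mck) is given by
$Mck = \dfrac{\mu_4}{\mu_2^2}$
where µ4 and µ2 are the fourth and second moments about the mean, respectively.
Interpretation: If Mck > 3, the distribution is leptokurtic; if Mck = 3, it is mesokurtic; if Mck < 3, it is platykurtic.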
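Central moments, the Pearsonian skewness coefficient and the moment coefficient of kurtosis can be sketched in Python as below. The sketch assumes the common forms µr = (1/n) Σ(xi - mean)^r, Pcsk = (mean - mode)/s.d. and Mck = µ4/µ2²; the function names and the small data set `sample` are illustrative and not taken from the notes.

```python
def central_moment(data, r):
    """r-th moment about the mean, using divisor n."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n


def pearson_skewness(mean, mode, sd):
    """Pearsonian coefficient of skewness: (mean - mode) / standard deviation."""
    return (mean - mode) / sd


def moment_kurtosis(data):
    """Moment coefficient of kurtosis: mu_4 / mu_2 ** 2."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


heights = [69, 66, 67, 69, 64, 63, 65, 68, 72]      # data of Example 4.1
mu1 = central_moment(heights, 1)                    # first central moment
mu2 = central_moment(heights, 2)                    # variance with divisor n

pcsk = pearson_skewness(70.2, 73.6, 6.4)            # Example 4.7
sample = [1, 2, 3, 4, 5]                            # hypothetical data
mck = moment_kurtosis(sample)
print(round(pcsk, 2), round(mck, 2))
```

For the hypothetical data 1, 2, 3, 4, 5 the coefficient comes out below 3, which corresponds to a flat-topped (platykurtic) shape.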
UNIT FIVE: ELEMENTARY PROBABILITY
Objectives:
 understand the elements of probability
 calculate some probabilities of events associated with random experiments
 apply the concept of probability in some biological phenomena
5.1 Introduction
Why is it that science is not always certain? Nature is complex and full of unexplained variability. In addition,
almost all methods of observation and experiment are imperfect. Observers are subject to human bias and error.
Science is a continuing story; subjects vary; measurements fluctuate. Biomedical science, in particular, contains
controversy and disagreement; with the best of intentions, biomedical data, medical histories, physical
examinations, interpretations of clinical tests, descriptions of symptoms and diseases are somewhat inexact. But
most important of all, we always have to deal with incomplete information: It is either impossible, or too costly,
or too time consuming, to study the entire population; we often have to rely on information gained from a
sample, that is, a subgroup of the population under investigation. So some uncertainty almost always prevails.
Science and scientists cope with uncertainty by using the concept of probability. By calculating probabilities,
they are able to describe what has happened and predict what should happen in the future under similar
conditions. In short, the quantities we are interested in are often not predictable in advance but, rather,
exhibit an inherent variation. Probability and statistics are concerned with the quantification of such quantities
(or random phenomena).
5.2 Definition of some probability terms

Definition 5.1: Random experiment is an experiment in which the outcome cannot be determined or
predicted exactly in advance, i.e. it is the process of observing or measuring the outcome of a chance event.
Some of the characteristics of a random experiment are
 all the possible outcomes of the experiment can be specified in advance.
 the experiment can be repeated indefinitely.
 there is a sort of regularity in the outcomes observed in large repetitions of the experiment.
Examples of random experiments include throwing a fair coin and observing the outcome, throwing a fair die
and observing the number on the top face, and taking a student at random from a science class and noting the sex of
the student.
All of these examples satisfy the above characteristics of a random experiment.
Definition 5.2:
Sample point (outcome): The individual result of a random experiment.
Sample space: The set containing all possible sample points (outcomes) of the random experiment. The
sample space is often called the universe and is denoted by S.
Event: The collection of outcomes or simply a subset of the sample space. We denote events with capital
letters, A, B, C, etc.
Example 5.1: If an experiment consists of flipping of a coin once, then

S = {H, T} where H means that the outcome of the toss is a head and T that it is a tail. A= {H} represents the
event of head occurring.
Example 5.2: If an experiment consists of rolling a die once and observing the number on top, then the sample
space is S = {1, 2, 3, 4, 5, 6} where the outcome i means that i appeared on the die, i = 1, 2, 3, 4, 5, 6. {1}, {2},
{3}, {4}, {5} and {6} are elementary events, i.e. events consisting of a single outcome. Let A represent the event
that an odd number occurs; then A is simply the set containing 1, 3 and 5, i.e. A = {1, 3, 5}.
Review of set theory
Concepts of set theory are important in understanding probability. Given A, B and C are events associated with
a sample space S and ω represents an elementary event (outcome) in S, then the following are some useful
definitions and results in set theory.
Definitions 5.3:
1. Union: The union of A and B, A ∪ B, is the event containing all sample points in either
A or B or both. Sometimes we use "A or B" for union.
2. Intersection: The intersection of A and B, A ∩ B, is the event containing all sample points that are in both
A and B. Sometimes we use AB or "A and B" for intersection.
3. Subset: If ω ∈ A implies ω ∈ B for every ω, then A is a subset of B, written A ⊂ B.
4. Empty set: If a set A contains no points, it will be called the null set, or empty set, and denoted by ∅.
5. Complement: The complement of a set A, denoted by A^c, is the set of all ω ∈ S such that ω ∉ A.
6. Mutually Exclusive Events: Two events are said to be mutually exclusive (or disjoint) if their
intersection is empty (i.e. A ∩ B = ∅). Subsets A1, A2,… are defined to be mutually exclusive if Ai ∩ Aj = ∅
for every i ≠ j.
Theorem 5.1: Important elementary set theory results

i) A ∪ B = B ∪ A and A ∩ B = B ∩ A
ii) A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C
iii) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
iv) (A^c)^c = A
v) A ∩ S = A; A ∪ S = S; A ∩ ∅ = ∅; and A ∪ A = A
vi) (A ∪ B)^c = A^c ∩ B^c and (A ∩ B)^c = A^c ∪ B^c
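Python's built-in `set` type mirrors these definitions closely, which makes it convenient for checking the results of Theorem 5.1 on a small sample space. The sketch below uses the die-rolling sample space of Example 5.2; the events B and C and the helper `complement` are hypothetical choices made for illustration.

```python
S = {1, 2, 3, 4, 5, 6}   # sample space for one roll of a die
A = {1, 3, 5}            # an odd number occurs (Example 5.2)
B = {4, 5, 6}            # hypothetical event: a number greater than 3 occurs
C = {2, 4, 6}            # hypothetical event: an even number occurs


def complement(event, space=S):
    """All outcomes of the sample space that are not in the event."""
    return space - event


union = A | B            # A or B
intersection = A & B     # A and B

# De Morgan's laws (result vi).
de_morgan_1 = complement(A | B) == complement(A) & complement(B)
de_morgan_2 = complement(A & B) == complement(A) | complement(B)

# Distributive law (result iii) and mutual exclusiveness of A and C.
distributive = A & (B | C) == (A & B) | (A & C)
mutually_exclusive = A.isdisjoint(C)

print(union, intersection, de_morgan_1, de_morgan_2, distributive, mutually_exclusive)
```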
5.3 Counting rules

Combinatorics refers to the methods used to count things. If a sample space contains a finite set of outcomes,
determining the probability of an event is often a counting problem. But often the numbers are just too large to
count in the ordinary 1, 2, 3, 4 way. For example, if you put a grain of rice on the first square of a chessboard,
then two grains on the second square, four on the third square, and continue doubling until all 64 squares are
filled, how many grains of rice would you have in all? The number is so large that it is difficult to handle
without a systematic enumeration technique.
In short, to assign probabilities for an event, we might need to enumerate the possible outcomes of a random
experiment and need to know the number of possible outcomes favoring the event. The following principles
will help us in determining the number of possible outcomes favoring a given event.
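The chessboard question above can be answered by brute-force addition or by the closed form for a doubling sequence. A small Python sketch, whose integers have arbitrary precision so the large total is computed exactly:

```python
# Grains on each of the 64 squares: 1, 2, 4, ..., 2**63.
grains_per_square = [2 ** k for k in range(64)]

total_by_addition = sum(grains_per_square)
total_by_formula = 2 ** 64 - 1   # sum of a doubling sequence with 64 terms

print(total_by_addition)
```

The total is 2^64 - 1, roughly 1.8 × 10^19 grains, which illustrates why a systematic counting rule is preferable to enumeration.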
Theorem 5.2: Addition principle

If a task can be accomplished by k distinct procedures where the i th procedure has ni alternatives, then the total
number of ways of accomplishing the task equals
n1 + n2+…+nk.
Example 5.3: Suppose one wants to purchase a certain commodity and that this commodity is on sale in 5
government owned shops, 6 public shops and 10 private shops. How many alternatives are there for the person
to purchase this commodity?
Solution: Total number of ways =5+6+10=21 ways
Theorem 5.3: Multiplication principle

If a choice consists of k steps of which the first can be made in n1 ways, for each of these the second can be
made in n2 ways,…, and for each of these the kth can be made in nk ways, then the whole choice can be made
in n1 × n2 × … × nk ways.
Example 5.4: If we can go from Addis Ababa to Rome in 2 ways and from Rome to Washington D.C. in 3
ways, then the number of ways in which we can go from Addis Ababa to Rome to Washington D.C. is 2 × 3 = 6
ways. We may illustrate the situation by using a tree diagram:
[Tree diagram: A (Addis Ababa) branches to two nodes R (Rome); each R branches to three nodes W (Washington D.C.), giving 6 paths.]
Example 5.5: If a test consists of 10 multiple choice questions, with each permitting 4 possible answers, how
many ways are there in which a student gives his/her answers?
Solution: There are 10 steps required to complete the test.
First step: To give answer to question number one. He/she has 4 alternatives.
Second step: To give answer to question number two, he/she has 4 alternatives……
Last step: To give answer to last question, he/she has 4 alternatives.
Therefore, he/she has 4 × 4 × … × 4 = 4^10 = 1,048,576 ways of completing the exam. Note that there is only
one way in which he/she can give correct answers to all questions and that there are 3^10 = 59,049 ways in which all the
answers will be incorrect.
Example 5.6: A manufactured item must pass through three control stations. At each station the item is
inspected for a particular characteristic and marked accordingly. At the first station, three ratings are possible
while at the last two stations four ratings are possible. Hence there are 48 ways in which the item may be
marked.
Example 5.7: Suppose that car plate has three letters followed by three digits. How many possible car plates are
there, if each plate begins with a H or an F?
Solution: There are 2 × 26 × 26 × 10 × 10 × 10 = 1,352,000 different plates.
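The multiplication principle can be checked by enumerating the outcomes explicitly with `itertools.product` when the numbers are small, and by plain multiplication otherwise. The sketch below covers Examples 5.5 to 5.7; the variable names are illustrative.

```python
from itertools import product

# Example 5.6: three stations with 3, 4 and 4 possible ratings.
ratings = [range(3), range(4), range(4)]
markings = list(product(*ratings))       # every possible (rating1, rating2, rating3)
n_markings = len(markings)

# Example 5.5: 10 questions with 4 possible answers each.
answer_sheets = 4 ** 10
all_incorrect = 3 ** 10                  # 3 wrong choices for each question

# Example 5.7: first letter H or F, two more letters, then three digits.
n_plates = 2 * 26 * 26 * 10 * 10 * 10

print(n_markings, answer_sheets, all_incorrect, n_plates)
```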
Definition 5.4: If n is a positive integer, we define n!= n(n-1)(n-2)…1 and call it n-factorial and 0!=1.
Permutations
Suppose that we have n different objects. In how many ways, say nPn, may these objects be arranged
(permuted)? For example, if we have objects a, b and c we can consider the following arrangements: abc, acb,
bac, bca, cab, and cba. Thus the answer is 6. The following theorem gives general result on the number of such
arrangements.
Theorem 5.4: Permutation

i) The number of permutations of n different objects is given by nPn= n!
ii) The number of permutations of n objects taken r at a time, without repetition and with order being important,
is: $nPr = \dfrac{n!}{(n-r)!}$
Example 5.8: Suppose that we have four letters a, b, c, d.
i) What is the number of possible arrangements of these letters taken all at a time?
ii) What is the number of possible arrangements of these letters if we use only three of the letters at a time?
Solution:
i) Using (i) of theorem 5.4, we have 4! ways of arranging the 4 letters, i.e. we have 24 possible
arrangements.
ii) Using (ii) of theorem 5.4, we have 4P3 = 4!/(4-3)! ways of arranging 3 letters taken from the four letters, i.e. we have
24 possible arrangements.
Example 5.9: In a class with 8 boys and 8 girls
i) In how many ways can the children line up if they alternate girl-boy-girl-boy-... ?
ii) In how many ways can the children line up so that no two of the same sex are next to each other?
Solution:
i) The 8 girls can line-up in 8! ways, and likewise the 8 boys can line-up in 8! ways. For any single
arrangement of the girls, all possible arrangements of the boys are possible, thus by multiplication
principle we have 8!x 8! ways to arrange the children in girl-boy lines.
ii) Now we must include the case of boy-girl. So we have 2x8!x 8! ways of arranging.
Example 5.10: If I have 5 different books on my shelf, in how many ways can I arrange these books?
Solution: We can arrange the books in 5! = 5 × 4 × 3 × 2 × 1 = 120 different ways.
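The permutation counts in Examples 5.8 to 5.10 can be verified in Python, both with the formula nPr = n!/(n-r)! (available as `math.perm` in Python 3.8 and later) and by listing the arrangements with `itertools.permutations`. The variable names are illustrative.

```python
import math
from itertools import permutations

letters = "abcd"

# Example 5.8: arrangements of all four letters, and of three at a time.
all_at_a_time = list(permutations(letters))
three_at_a_time = list(permutations(letters, 3))
count_all = math.factorial(4)
count_three = math.perm(4, 3)            # 4! / (4 - 3)!

# Example 5.9: 8 girls and 8 boys in alternating lines.
girl_first = math.factorial(8) * math.factorial(8)
either_first = 2 * girl_first

# Example 5.10: 5 different books on a shelf.
book_orders = math.factorial(5)

print(count_all, count_three, girl_first, either_first, book_orders)
```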
Remarks
i) The number of permutations of n distinct objects arranged in a circle is (n-1)!.
This is because we consider two permutations the same if one is a rotation of the other. For n objects arranged
around a circle, there are n rotations that give the same permutation. Dividing n! by n gives (n - 1)!. For example,
two circular arrangements of a, b, c, d, e that differ only by a rotation are considered the same.
ii) Permutations when not all objects are different

Given n objects of which n1 are of one kind, n2 are of another kind,…, and nk are of yet another kind, the total number of
distinct permutations that can be made from these objects is $\dfrac{n!}{n_1!\, n_2! \cdots n_k!}$.
Example 5.11
i) How many "words" (text strings or distinct arrangements) can be made from the letters b,k,o,o?
ii) How many permutations are there for the letters in the word banana?
Solution:
i) If we label the two o’s as o1 and o2, and think of them as distinct, then the number of permutations is
4!. For each permutation there will be a matching permutation that switches the o’s, that is, for o1o2bk
there is the matching o2o1bk permutation. Hence, if we divide the number of permutations of the
labelled letters by two, we obtain the number of permutations of the 4 letters where we do not
distinguish between the two o’s. Therefore, there are 4!/2 = 12 distinct text strings.
ii) If we think of all 6 letters as distinct, then we would have 6! permutations. As in the preceding
example, for the two n’s we need to divide 6! by 2! = 2. For the 3 a’s, each distinct word is counted
3! = 6 times. For instance, each of the following would be the same word if the a’s were not
distinguished: a1a2a3bnn, a1a3a2bnn, a2a1a3bnn, a2a3a1bnn, a3a1a2bnn, and a3a2a1bnn. Hence the number of
distinct permutations of the word banana is 6!/(3! 2! 1!) = 720/12 = 60.
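The counting rule for permutations with repeated objects can be checked numerically. The following is a minimal Python sketch, not part of the original notes; the function names are illustrative, and the string "book" simply stands for the letters b, k, o, o of Example 5.11.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def distinct_arrangements(word: str) -> int:
    """Count distinct orderings of the letters via n! / (n1! n2! ... nk!)."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)  # divide out orderings of identical letters
    return total


def brute_force(word: str) -> int:
    """Cross-check: generate every ordering and keep only the distinct ones."""
    return len(set(permutations(word)))


print(distinct_arrangements("book"))    # 12
print(distinct_arrangements("banana"))  # 60
```

The brute-force function is only practical for short words, but it confirms that the formula counts exactly the distinct arrangements.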
Combinations
Consider n different objects. This time we are concerned with counting the number of ways we may choose r
out of these n objects without regard to order. For example, we have the objects a, b, c and d, and r=2; we wish
to count ab, ac, ad, bc, bd, and cd. In other words, we do not count ab and ba since the same objects are
involved and only the order differs.
There are many problems in which we are interested in determining the number of ways in which r objects can
be selected from n distinct objects without regard to the order in which they are selected. Such selections are
called combinations or r-sets. It may help to think of combinations as committees. The key here is without
regard for order.
To obtain the general result we recall the formula derived above: the number of ways of choosing r objects out
of n and permuting the chosen r equals n!/(n-r)!. Let C be the number of ways of choosing r out of n,
disregarding order. C is the number required. Note that once the r items have been chosen, there are r! ways of
permuting them. Hence, applying the multiplication principle again, together with the above result, we obtain
C·r! = n!/(n-r)!. Therefore, C = n!/(r!(n-r)!). This number arises in many contexts in mathematics and hence a
special symbol is used for it. We shall write
nCr = n!/(r!(n-r)!).
Theorem 5.5: Combination

The number of ways of choosing r out of n different objects, disregarding order, is given by
nCr = n!/(r!(n-r)!).
Example 5.12: How many different committees of 3 can be formed from Hawa, Segenet, Nigisty and Lensa?
Solution: The question can be restated in terms of subsets of a set of 4 objects: how many subsets of 3 elements
are there? In terms of combinations the question becomes: what is the number of combinations of 4 distinct
objects taken 3 at a time? The list of committees is {H,S,N}, {H,S,L}, {H,N,L}, {S,N,L}. Therefore, we have
4C3 = 4!/(3! 1!) = 4 possible committees.
Example 5.13:
(i) A committee of 3 is to be formed from a group of 20 people. How many different committees are possible?
(ii) From a group of 5 men and 7 women, how many different committees consisting of 2 men and 3 women can
be formed?
Solution: (i) There are 20C3 = 20!/(3! 17!) = 1140 possible committees.
(ii) There are 5C2 x 7C3 = 10 x 35 = 350 possible committees.
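The combination counts in Examples 5.12 and 5.13 can be reproduced with Python's standard library. This is a small sketch added for illustration; the variable names are assumptions, and math.comb requires Python 3.8 or later.

```python
from itertools import combinations
from math import comb

# Example 5.13 (i): committees of 3 from 20 people
committees_of_3 = comb(20, 3)

# Example 5.13 (ii): 2 of 5 men and 3 of 7 women, combined by the multiplication principle
mixed_committees = comb(5, 2) * comb(7, 3)

# Example 5.12: list the committees of 3 explicitly
names = ["Hawa", "Segenet", "Nigisty", "Lensa"]
listed = list(combinations(names, 3))

print(committees_of_3, mixed_committees, len(listed))  # 1140 350 4
```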

Remarks:
i)
ii) A set with n elements has 2^n subsets.
5.4 Probability of an event
Definition 5.5: The Axioms of Probability
Probabilities are real numbers assigned to events (or subsets) of a sample space. We can think of the
assignment of probabilities to events, or probability measure, as a function between the collection of
subsets of the sample space and the real numbers. Mathematically, a probability measure P for a random
experiment is a real-valued function defined on the collection of events that satisfies the following axioms:
Axiom 1: The probability of an event is a nonnegative real number; that is, P(A) ≥ 0 for any subset A of S.
Axiom 2: P(S) = 1
Axiom 3: If A1, A2, A3, ... is a finite or infinite sequence of mutually exclusive
events of S, then P(A1 ∪ A2 ∪ A3 ∪ ...) = P(A1) + P(A2) + P(A3) + ... = Σ_i P(Ai).
It is rather surprising that with only these three axioms, we can construct the "entire" theory of probability! The
next theorems and definitions help in assigning probabilities of events.
Theorem 5.6: If A is an event in a discrete sample space S, then P(A) equals the sum of the probabilities of
the individual outcomes comprising A.
Theorem 5.7: Suppose that we have a random experiment with sample space S and probability function P
and A and B are events. Then we have the following results:
i) P(∅) = 0, where ∅ is the empty set (the impossible event)
ii) P(Ac) = 1 − P(A)
iii) P(B ∩ Ac) = P(B) − P(A ∩ B)
iv) If A ⊆ B, then P(A) ≤ P(B).

Definition 5.6: The classical definition of probability
If an experiment can result in any one of N equally likely and mutually exclusive outcomes, and if n of
these outcomes constitute the event A, then the probability of event A is P(A) = n/N.
Example 5.14: Consider the experiment of tossing a fair die. A fair die means that all six numbers are equally
likely to appear. Calculate the probabilities of the following events:
a) A=One will occur ={1}
b) B=Even number will occur ={2, 4, 6}
c) C=Odd number will occur ={1, 3, 5}
d) D=A number less than 3 will occur ={1,2}
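Solution: Since the die is fair, each of the N = 6 outcomes is equally likely, so by the classical definition:
a) P(A) = 1/6
b) P(B) = 3/6 = 1/2
c) P(C) = 3/6 = 1/2
d) P(D) = 2/6 = 1/3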
Example 5.15: Suppose that we toss two coins, and assume that each of the four outcomes in the sample space
S = {(H,H), (H,T), (T,H), (T,T)} is equally likely and hence has probability ¼. Let A = {(H,H), (H,T)} and
B = {(H,H), (T,H)}; that is, A is the event that the first coin falls heads, and B is the event that the second coin
falls heads. Calculate the probabilities of A, B, Ac, Bc, and Sc. The event that none of the outcomes will
occur is the same as Sc.
Solution: Each of A and B contains two of the four equally likely outcomes, so P(A) = 2/4 = ½ and
P(B) = 2/4 = ½. By the complement rule, P(Ac) = 1 − P(A) = ½, P(Bc) = 1 − P(B) = ½, and P(Sc) = 1 − P(S) = 0.
Example 5.16: From a group of 5 men and 7 women, it is required to form a committee of 5 persons. If the
selection is made randomly, then
i) what is the probability that 2 men and 3 women will be in the committee?
ii) what is the probability that all members of the committee will be men?
iii) what is the probability that at least three members will be women?
Solution: The total number of possible committees is 12C5 = 12!/(5! 7!) = 792, i.e. the number of possible outcomes
in the sample space is 792.
i) Let A be the event that the committee will consist of 2 men and 3 women. We need to know the
number of possible outcomes favoring this event. The number of ways we can select 2 men from 5
men is 5C2 = 10 and the number of ways of selecting 3 women out of 7 women is
7C3 = 35. Using the multiplication principle, the number of outcomes favoring event A is
10x35 or 350.
Hence, using the classical definition of probability, P(A) = 350/792 ≈ 0.442.
ii) Let B be the event that all members of the committee will be men. Hence
P(B) = (5C5 x 7C0)/792 = 1/792 ≈ 0.0013.
iii) Let C be the event that at least three of the committee members will be women.
Basically, three different compositions of committee members can be formed in terms of sex: 3
women and 2 men, 4 women and 1 man, and all 5 women. Hence the number of possible outcomes
favoring event C, using combinations together with the addition principle, is
7C3 x 5C2 + 7C4 x 5C1 + 7C5 x 5C0 = 350 + 175 + 21 = 546.
Therefore, P(C) = 546/792 ≈ 0.689.
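The three probabilities of Example 5.16 can be verified with exact fractions. The sketch below is an illustration only; the helper name p_exact_women and the constants are assumptions introduced here, not notation from the notes.

```python
from fractions import Fraction
from math import comb

MEN, WOMEN, SIZE = 5, 7, 5
total = comb(MEN + WOMEN, SIZE)  # all committees of 5 chosen from 12 people


def p_exact_women(k: int) -> Fraction:
    """P(exactly k women and SIZE - k men) when all committees are equally likely."""
    return Fraction(comb(WOMEN, k) * comb(MEN, SIZE - k), total)


p_a = p_exact_women(3)                            # i) 2 men and 3 women
p_b = p_exact_women(0)                            # ii) all men
p_c = sum(p_exact_women(k) for k in range(3, 6))  # iii) at least 3 women

for label, p in (("A", p_a), ("B", p_b), ("C", p_c)):
    print(label, p, round(float(p), 4))
```

Fraction reduces each result to lowest terms, so 350/792 is printed as 175/396 and 546/792 as 91/132.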
Definition 5.7: Relative Frequency Definition of probability

If an experiment is repeated a large number, n, of times and the event A is observed nA times, the probability
of A is P(A) ≈ nA/n.
The above definition of probability is based on empirical data accumulated through time or based on
observations made from repeated experiments for a large number of times.
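The relative frequency idea can be illustrated by simulation. The sketch below assumes a fair die and takes A to be the event "an even number occurs", so the classical probability is 1/2; the function name and the fixed seed are choices made here for repeatability, not part of the notes.

```python
import random


def relative_frequency(n_trials: int, seed: int = 0) -> float:
    """Estimate P(even number) for a fair die as n_A / n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n_a = sum(1 for _ in range(n_trials) if rng.randint(1, 6) % 2 == 0)
    return n_a / n_trials


for n in (100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

As n grows, the printed ratios should settle near 0.5, which is the sense in which nA/n approximates P(A).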
5.5 Some probability rules
Theorem 5.8: If A and B are any two events, then P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Example 5.17: Consider the experiment of tossing a fair die. Let

A = Even number occurring = {2,4,6}
B = A number greater than 2 occurring ={3, 4, 5, 6}
C = Odd number occurring ={1, 3, 5}
i) What is the probability that A and B will occur?
ii) What is the probability that A or B will occur?
Solution: We use concepts from set theory to help solve probability questions, and Venn diagrams
are useful tools for depicting the relations between events within the sample space. The shaded region in Fig. 1
shows the event that both A and B will occur.
i) A and B ≡ A ∩ B = {4,6}
Thus P(A ∩ B) = 2/6 = 1/3.
ii) A or B ≡ A ∪ B = {2,3,4,5,6}
Hence, by Theorem 5.8, P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 3/6 + 4/6 − 2/6 = 5/6.
Example 5.18: Sixty percent of the families in a certain community own their own car, thirty percent own their
own home, and twenty percent own both their own car and their own home. If a family is randomly chosen,
a) what is the probability that this family does not own a car?
b) what is the probability that this family owns a car or a house?
c) what is the probability that this family owns a car or a house but not both?
d) what is the probability that this family owns only a house?
e) what is the probability that this family neither owns a car nor a house?
Solution: Let A represent the event that the family owns a car and B the event that the family owns a house. Given
information: P(A) = 0.6, P(B) = 0.3, and P(AnB) = 0.2.
a) Required: P(Ac) = ?
P(Ac)=1-P(A) = 1-0.6 = 0.4
b) Required: P(AUB) = ?
P(AUB) = P(A)+P(B)-P(AnB) = 0.6+0.3-0.2 = 0.7
c) Required: P((AnBc)U(AcnB)) = ?
P((AnBc)U(AcnB)) = P(AnBc)+P(AcnB) = [P(A)-P(AnB)]+[P(B)-P(AnB)]
= [0.6-0.2]+[0.3-0.2]=0.5
d) Required: P(AcnB) = ?
P(AcnB) = P(B)-P(AnB) = 0.3-0.2 = 0.1

e) Required: P(AcnBc) = ?
P(AcnBc) = P((AUB)c) = 1-P(AUB) = 1-0.7 = 0.3
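The five answers of Example 5.18 follow from the complement and addition rules alone, which the following sketch makes explicit. The variable names are illustrative assumptions; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

p_car = Fraction(6, 10)    # P(A)
p_house = Fraction(3, 10)  # P(B)
p_both = Fraction(2, 10)   # P(A and B)

p_no_car = 1 - p_car                                   # a) complement rule
p_car_or_house = p_car + p_house - p_both              # b) addition rule
p_exactly_one = (p_car - p_both) + (p_house - p_both)  # c) one but not both
p_only_house = p_house - p_both                        # d) house and no car
p_neither = 1 - p_car_or_house                         # e) complement of the union

print(p_no_car, p_car_or_house, p_exactly_one, p_only_house, p_neither)
# 2/5 7/10 1/2 1/10 3/10
```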
We can represent various events by an informative diagram called a Venn diagram. If properly and correctly
drawn, a Venn diagram helps to calculate probabilities of events easily. The figure below shows various events
represented by shaded regions. Note that the rectangle in each figure represents the sample space.
5.6 Conditional probability and independence
Conditional Probability
Conditional probability provides us with a way to reason about the outcome of an experiment, based on partial
information. Here are some examples of situations we may have in our mind:
(a) What is the probability that a person will be HIV-Positive given he has tuberculosis?
(d) A spot shows up on a radar screen. How likely is it that it corresponds to an aircraft?
In more precise terms, given an experiment, a corresponding sample space, and a probability law, suppose that
we know that the outcome is within some given event B. We wish to quantify the likelihood that the outcome
also belongs to some other given event A. We thus seek to construct a new probability law, which takes into
account this knowledge and which, for any event A, gives us the conditional probability of A given B, denoted
by P(A|B).
Definition 5.8: If P(B) > 0, the conditional probability of A given B, denoted by P(A|B), is
P(A|B) = P(A ∩ B)/P(B).
Example 5.19: Suppose cards numbered one through ten are placed in a hat, mixed up, and then one of the
cards is drawn at random. If we are told that the number on the drawn card is at least five, then what is the
conditional probability that it is ten?
Solution: Let A denote the event that the number on the drawn card is ten, and B be the event that it is at least
five. The desired probability is P(A|B). Since A ∩ B = A, we have
P(A|B) = P(A ∩ B)/P(B) = (1/10)/(6/10) = 1/6.
Example 5.20: A family has two children. What is the conditional probability that both are boys given that at
least one of them is a boy? Assume that the sample space S is given by S = {(b, b), (b, g), (g, b), (g, g)}, and all
outcomes are equally likely. (b, g) means, for instance, that the older child is a boy and the younger child is a
girl.
Solution: Letting A denote the event that both children are boys, and B the event that at least one of them is a
boy, then the desired probability is given by
P(A|B) = P(A ∩ B)/P(B) = P({(b, b)})/P({(b, b), (b, g), (g, b)}) = (1/4)/(3/4) = 1/3.
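When all outcomes are equally likely, a conditional probability can be computed by direct enumeration of the sample space. The following sketch applies this to Examples 5.19 and 5.20; the helper functions prob and conditional are hypothetical names introduced for illustration.

```python
from fractions import Fraction
from itertools import product


def prob(event, space):
    """Classical probability: favourable outcomes over all outcomes."""
    return Fraction(sum(1 for s in space if event(s)), len(space))


def conditional(a, b, space):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    p_b = prob(b, space)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(lambda s: a(s) and b(s), space) / p_b


# Example 5.19: one card drawn from cards numbered 1 to 10
cards = list(range(1, 11))
print(conditional(lambda c: c == 10, lambda c: c >= 5, cards))  # 1/6

# Example 5.20: (older, younger) child, each a boy 'b' or a girl 'g'
children = list(product("bg", repeat=2))
print(conditional(lambda s: s == ("b", "b"), lambda s: "b" in s, children))  # 1/3
```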
Law of Multiplication
The defining equation for conditional probability may also be written as:
P(A ∩ B) = P(B) P(A|B)
This formula is useful when the information given to us in a problem is P(B) and P(A|B) and we are asked to
find P(A ∩ B). An example illustrates the use of this formula. Suppose that five good fuses and two defective ones
have been mixed up. To find the defective fuses, we test them one by one, at random and without replacement.
What is the probability that we are lucky and find both of the defective fuses in the first two tests?
Let D1 and D2 denote the events that the first and the second fuse tested are defective. Then P(D1) = 2/7 and
P(D2|D1) = 1/6, so P(D1 ∩ D2) = P(D1) P(D2|D1) = (2/7)(1/6) = 1/21.
Example 5.21: Suppose an urn contains seven black balls and five white balls. We draw two balls from the urn
without replacement. Assuming that each ball in the urn is equally likely to be drawn, what is the probability
that both drawn balls are black?
Solution: Let A and B denote, respectively, the events that the first and second balls drawn are black. Now,
given that the first ball selected is black, there are six remaining black balls and five white balls, and so P(B|A)
= 6/11. As P(A) is clearly 7/12, our desired probability is P(A ∩ B) = P(A) P(B|A) = (7/12)(6/11) = 42/132 = 7/22.
Independence
We have introduced the conditional probability P(A|B) to capture the partial information that event B provides
about event A. An interesting and important special case arises when the occurrence of B provides no
information and does not alter the probability that A has occurred, i.e., P(A|B) = P(A). When the above equality
holds, we say that A is independent of B. Note that by the definition P(A|B) = P(A ∩ B)/P(B), this is equivalent
to P(A ∩ B) = P(A)P(B).
Definition 5.9: Independence

Two events A and B are said to be independent if P(A ∩ B) = P(A)P(B). If, in addition, P(B) > 0, independence
is equivalent to the condition P(A|B) = P(A).
Example 5.22: A basket contains 2 black and 2 white balls. Consider selecting two balls at random in two
ways:
a) selecting a second ball without replacing first selected ball in the basket
b) selecting a second ball after replacing first selected ball in the basket
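Let A and B represent the events that a black ball is selected in the first and second selection, respectively. In which of the
two ways are A and B independent?
Solution:
First way (a):
S = {B1B2, B2B1, B1W1, B1W2, B2W1, B2W2, W1W2, W1B1, W2B1, W1B2, W2B2, W2W1}
A = {B1B2, B2B1, B1W1, B1W2, B2W1, B2W2}
B = {B1B2, B2B1, W1B1, W2B1, W1B2, W2B2}
A ∩ B = {B1B2, B2B1}
Hence P(A) = 6/12 = 1/2, P(B) = 6/12 = 1/2 and P(A ∩ B) = 2/12 = 1/6, whereas P(A)P(B) = 1/4.
Thus A and B are not independent.

Second way (b):
Using this option to select a ball does not affect the composition of the basket.
Hence P(A) = 2/4 = 1/2 and P(B) = 2/4 = 1/2. The sample space now has 4 x 4 = 16 equally likely ordered
pairs, of which 2 x 2 = 4 have a black ball in both selections, so P(A ∩ B) = 4/16 = 1/4.
We can see that P(A ∩ B) = 1/4 = P(A)P(B).
Thus A and B are independent.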
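The two sampling schemes of Example 5.22 can be compared by listing both sample spaces and testing the product rule directly. This is a sketch for illustration; the function name is_independent is an assumption introduced here.

```python
from fractions import Fraction
from itertools import permutations, product

balls = ["B1", "B2", "W1", "W2"]


def is_independent(space):
    """Check P(A and B) == P(A) P(B) for A = first ball black, B = second ball black."""
    n = len(space)
    p_a = Fraction(sum(1 for f, s in space if f[0] == "B"), n)
    p_b = Fraction(sum(1 for f, s in space if s[0] == "B"), n)
    p_ab = Fraction(sum(1 for f, s in space if f[0] == "B" and s[0] == "B"), n)
    return p_ab == p_a * p_b


without_replacement = list(permutations(balls, 2))  # 12 ordered pairs
with_replacement = list(product(balls, repeat=2))   # 16 ordered pairs

print(is_independent(without_replacement))  # False
print(is_independent(with_replacement))     # True
```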
UNIT SIX: PROBABILITY DISTRIBUTIONS
Objectives:
 compute probabilities of events using the concept of probability distributions.
 compute expected values and variances of random variables.
 apply the concepts of probability distributions to real-life problems.
Introduction
In many applications, the outcomes of probabilistic experiments are numbers or have some numbers associated
with them, which we can use to obtain important information, beyond what we have seen so far. We can, for
instance, describe in various ways how large or small these numbers are likely to be and compute likely
averages and measures of spread. For example, in 3 tosses of a coin, the number of heads obtained can range
from 0 to 3, and there is one of these numbers associated with each possible outcome. Informally, the quantity
“number of heads” is called a random variable, and the numbers 0 to 3 its possible values. The value of a
random variable is determined by the outcome of the experiment. Thus, we may assign probabilities to the
possible values of the random variable.
6.1 Definition of random variables and probability distributions

Given an experiment and the corresponding set of possible outcomes (the sample space), a random variable
associates a particular number with each outcome. Mathematically, a random variable is a real-valued function
of the experimental outcome. The following are some examples of random variables:
(a) In an experiment involving a sequence of 5 tosses of a coin, the number of heads in the sequence is a
random variable.
(b) In an experiment involving two rolls of a die, the following are examples of random variables: (1) The sum
of the two rolls, (2) The number of sixes in the two rolls.
(c) In an experiment involving the transmission of a message, the time needed to transmit the message, the
number of symbols received in error, and the delay with which the message is received are all random variables.
Notation: We will use capital letters to denote random variables, and lower case characters to denote real
numbers such as the numerical values of a random variable.
Types of random variables: Generally, two types of random variables exist: discrete and continuous. A
random variable is called discrete if its range (the set of values that it can take) is finite or at most countably
infinite. Examples include the number of children in a family, the number of car accidents within a given period of time in
a certain locality, the number of bacteria in a cubic mm of agar, etc. If a random variable can assume any numerical
value in an interval or collection of intervals, then it is called a continuous random variable. Examples include
the body weight of a newborn baby, the lifetime of a human being, the height of a person, etc.
The most important way to characterize a random variable is through the probabilities of the values that it can
take. For a discrete random variable X, these are captured by the probability mass function (p.m.f. for short) of
X, denoted PX(x). For a continuous random variable X it is done by the probability density function (p.d.f.),
denoted fX(x).
Definition 6.1: Probability mass function

If x is any possible value of X, the probability mass of x, denoted PX(x), is the probability of the event {X =
x} consisting of all outcomes that give rise to a value of X equal to x. A probability mass function must
satisfy the following conditions:
i. PX(x) ≥ 0 for any value x of X.
ii. Σ_x PX(x) = 1, where the summation is over all values x of X.
Example 6.1: Consider an experiment of tossing two fair coins. Letting X denote the number of heads
appearing on the top face, then X is a random variable taking on one of the values 0, 1, 2 . The random variable
X assigns a 0 value for the outcome (T,T), 1 for outcomes (T ,H) and (H, T ), and 2 for the outcome (H,H).
Thus, we can calculate the probability that X can take specific value/s as follows:
P(X = 0) = P({(T , T )}) = ¼
P(X = 1) = P({(T ,H),(H, T )}) = 2/4,
P(X = 2) = P({(H,H)}) = ¼
The table below shows the probability mass function of X.
x        0      1      2
PX(x)    ¼      2/4    ¼
We can verify that PX(x) is a probability mass function:
PX(x)≥0 for x=0,1,2 and
P(X = 0) + P(X = 1)+ P(X = 2) = ¼ + 2/4 + ¼=1
Suppose we are interested to calculate the probability that X≥1. The values of X which are greater than or equal
to 1 are 1 and 2. Thus, the probability that X is greater than or equal to 1, denoted P(X≥1), is found as P(X≥1)
= P(X = 1) + P(X = 2)=3/4.
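The probability mass function of Example 6.1 can also be built by enumerating the sample space and counting heads in each outcome. The sketch below is illustrative; the variable names are assumptions introduced here.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))          # (H,H), (H,T), (T,H), (T,T)
counts = Counter(o.count("H") for o in outcomes)  # value of X for each outcome
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

assert all(p >= 0 for p in pmf.values())  # condition (i)
assert sum(pmf.values()) == 1             # condition (ii)

p_at_least_one = sum(p for x, p in pmf.items() if x >= 1)
print(pmf, p_at_least_one)
```

The same pattern extends to more tosses by changing the repeat argument.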
Definition 6.2: Continuous random variable

A random variable X is called continuous if there exists a function fX(x), called the probability density
function of X, which satisfies
a. fX(x) ≥ 0 for all x.
b. ∫_{-∞}^{∞} fX(x) dx = 1.
We can use the probability density function to calculate probabilities of events expressed in terms of the random
variable X. For instance, if we are interested in the probability that X lies between two points, say a and b, we
can find it by integrating fX(x) over the interval [a,b], i.e. P(a ≤ X ≤ b) = ∫_a^b fX(x) dx.
Figure: P(a≤ X ≤ b) is the shaded region

Remarks:
i) The area bounded above by the graph of a probability density function and below by the horizontal axis is 1.
ii) The probability that a continuous random variable X will assume a specific value is zero, i.e. P(X = c) = 0,
where c is a constant.
iii) The probability that a continuous random variable X will assume a value in a closed interval is the same
as the probability that it will assume a value in the corresponding open or half-open interval, i.e., P(a≤X≤b) = P(a<X<b) =
P(a≤X<b) = P(a<X≤b), P(X≤c) = P(X<c), P(X≥c) = P(X>c), where a, b, and c are constants.
6.2 Introduction to expectation: mean and variance

We can associate with each random variable certain “averages” of interest, such as mean and variance which
give useful summary of a probability distribution.
Mean
Definition 6.3: The mean (expected value) of a random variable X, denoted by E(X) or μ, is given by
i) E(X) = Σ_x x PX(x), if X is a discrete r.v.
ii) E(X) = ∫_{-∞}^{∞} x fX(x) dx, if X is a continuous r.v.
It is useful to view the mean of X as a “representative” value of X, which lies somewhere in the middle of its
range. We can make this statement more precise, by viewing the mean as the center of gravity of the
distribution.
Variance
Definition 6.4: The variance of a random variable X, denoted V(X) or σ^2, is defined as V(X) = E[(X − μ)^2] =
E(X^2) − μ^2.
i) V(X) = Σ_x (x − μ)^2 PX(x), if X is discrete,
ii) V(X) = ∫_{-∞}^{∞} (x − μ)^2 fX(x) dx, if X is continuous.
The variance provides a measure of dispersion of X around its mean. Another measure of dispersion is the
standard deviation of X, which is defined as the square root of the variance and is denoted by σ.
Example 6.2: Calculate the mean and variance of the random variable X in example 6.1.
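One way to work Example 6.2 is to apply the definitions of mean and variance to the probability mass function of Example 6.1: E(X) = 0(¼) + 1(2/4) + 2(¼) = 1 and E(X^2) = 0(¼) + 1(2/4) + 4(¼) = 3/2, so V(X) = 3/2 − 1 = 1/2. The following sketch repeats this arithmetic with exact fractions; the variable names are illustrative assumptions.

```python
from fractions import Fraction

# p.m.f. of X = number of heads in two tosses of a fair coin (Example 6.1)
pmf = {0: Fraction(1, 4), 1: Fraction(2, 4), 2: Fraction(1, 4)}

mean = sum(x * p for x, p in pmf.items())              # E(X)
second_moment = sum(x**2 * p for x, p in pmf.items())  # E(X^2)
variance = second_moment - mean**2                     # E(X^2) - mu^2
variance_direct = sum((x - mean) ** 2 * p for x, p in pmf.items())  # E[(X - mu)^2]

print(mean, variance, variance_direct)  # 1 1/2 1/2
```

Both forms of the variance agree, as the definition states.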
6.3 Common discrete probability distributions – binomial and Poisson

The Binomial distribution
Many real problems (experiments) have two possible outcomes; for instance, a person may be HIV-Positive or
HIV-Negative, a seed may germinate or not, a newborn baby may be a girl or a boy, etc. Technically,
the two outcomes are called Success and Failure. Experiments or trials whose outcomes can be classified as
either a “success” or a “failure” are called Bernoulli trials.
Suppose that n independent trials, each of which results in a “success” with probability p and in a “failure” with
probability 1 − p, are to be performed. If X represents the number of successes that occur in the n trials, then X
is said to have a binomial distribution with parameters n and p. The probability mass function of a binomial
distribution with parameters n and p is given by
P(X = x) = nCx p^x (1 − p)^(n − x), x = 0, 1, 2, ..., n.
The mean and variance of the binomial distribution are np and np(1-p), respectively. Note that the binomial
distributions are used to model situations where there are just two possible outcomes, success and failure. The
following conditions also have to be satisfied.
i) There must be a fixed number of trials called n
ii) The probability of success (called p) must be the same for each trial.
iii) The trials must be independent
Example 6.3: A fair coin is flipped 4 times. Let X be the number of heads appearing out of the four trials.
Calculate the following probabilities:
i) 2 heads will appear
ii) No head will appear
iii) At least two heads will appear
iv) Less than two heads will appear
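v) At most 2 heads will appear
Solution: We can consider the outcomes of the trials to be independent of each other. In addition, the
probability that a head will appear is the same in each trial. Thus, X has a binomial distribution with number of
trials 4 and probability of success (the occurrence of a head in a trial) ½. The probability mass function of X is
given by
P(X = x) = 4Cx (1/2)^x (1/2)^(4 − x) = 4Cx/16, x = 0, 1, 2, 3, 4. Note that n = 4 and p = 1/2.
i) P(X = 2) = 4C2/16 = 6/16 = 0.375
ii) P(X = 0) = 4C0/16 = 1/16 = 0.0625
iii) P(X ≥ 2) = P(X = 2) + P(X = 3) + P(X = 4) = 6/16 + 4/16 + 1/16 = 11/16 = 0.6875
iv) P(X < 2) = P(X = 0) + P(X = 1) = 1/16 + 4/16 = 5/16 = 0.3125
v) P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 1/16 + 4/16 + 6/16 = 11/16 = 0.6875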
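The binomial probabilities of Example 6.3 can be computed from the probability mass function P(X = x) = nCx p^x (1 − p)^(n − x). The sketch below is illustrative; binom_pmf is a hypothetical helper name, and exact fractions are used so that the results are not affected by rounding.

```python
from fractions import Fraction
from math import comb


def binom_pmf(x: int, n: int, p: Fraction) -> Fraction:
    """P(X = x) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 4, Fraction(1, 2)
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

exactly_two = pmf[2]          # i)
no_head = pmf[0]              # ii)
at_least_two = sum(pmf[2:])   # iii)
less_than_two = sum(pmf[:2])  # iv)
at_most_two = sum(pmf[:3])    # v)

print(exactly_two, no_head, at_least_two, less_than_two, at_most_two)
# 3/8 1/16 11/16 5/16 11/16
```

The same function handles other parameter values, for instance binom_pmf(3, 4, Fraction(3, 4)) for a binomial variable with n = 4 and p = ¾.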
Example 6.4: Suppose that a particular trait of a person (such as eye color or left handedness) is classified on
the basis of one pair of genes and suppose that d represents a dominant gene and r a recessive gene. Thus a
person with dd genes is pure dominance, one with rr is pure recessive, and one with rd is hybrid. The pure
dominance and the hybrid are alike in appearance. Children receive one gene from each parent. If, with respect
to a particular trait, two hybrid parents have a total of four children, what is the probability that exactly three of
the four children have the outward appearance of the dominant gene?
Solution: If we assume that each child is equally likely to inherit either of two genes from each parent, the
probabilities that the child of two hybrid parents will have dd, rr, or rd pairs of genes are, respectively, ¼, ¼,
½. Hence, because an offspring will have the outward appearance of the dominant gene if its gene pair is either
dd or rd, it follows that the number of such children, say X, is binomially distributed with parameters n = 4
and p = ¾. Thus the desired probability is P(X = 3) = 4C3 (3/4)^3 (1/4)^1 = 108/256 = 27/64 ≈ 0.4219.
Example 6.5: Suppose it is known that the probability of recovery for a certain disease is 0.4. If random sample
of 10 people who are stricken with the disease are selected, what is the probability that:
(a) exactly 5 of them will recover?
(b) at most 9 of them will recover?
Solution: Let X be the number of persons who will recover from the disease. We can assume that the selection
process will not affect the probability of success (0.4) for each trial by assuming a large diseased population
size. Hence, X will have a binomial distribution with number of trials equal to 10 and probability of success
equal to 0.4.
(a) P(X = 5) = 10C5 (0.4)^5 (0.6)^5 = 252 x 0.01024 x 0.07776 ≈ 0.2007
(b) P(X ≤ 9) = 1 − P(X = 10) = 1 − (0.4)^10 ≈ 1 − 0.0001 = 0.9999
The Poisson Random Variable
A random variable X, taking on one of the values 0, 1, 2, . . . , is said to have a Poisson distribution if its
probability mass function is given by
P(X = x) = e^(−λ) λ^x / x!, x = 0, 1, 2, ...
λ > 0 is the parameter of this distribution. The mean and variance of the Poisson distribution are both
equal to λ. Note that the Poisson distribution is used to model situations where the random variable X is
the number of occurrences of a particular event over a given period of time (or space). In addition, the
following conditions must be fulfilled: events are independent of each other, events occur singly, and
events occur at a constant rate (in other words, for a given time interval the mean number of occurrences is
proportional to the length of the interval).
The Poisson distribution is used as a distribution of rare events such as telephone calls made to a switchboard in
a given minute, the number of misprints per page in a book, road accidents on a particular motorway in one day,
etc. The processes that give rise to such events are called Poisson processes.
Example 6.6: Suppose that the number of typographical errors on a single page of this lecture note has a
Poisson distribution with parameter λ = 1. If we randomly select a page in this lecture note, calculate the
probability that
a) no error will occur.
b) exactly three errors will occur.
c) fewer than 2 errors will occur.
d) there is at least one error.
Solution: Let X = number of errors per page. Then P(X = x) = e^(−1)/x!.
a) P(X = 0) = e^(−1) ≈ 0.3679
b) P(X = 3) = e^(−1)/3! ≈ 0.0613
c) P(X < 2) = P(X = 0) + P(X = 1) = 2e^(−1) ≈ 0.7358
d) P(X ≥ 1) = 1 − P(X = 0) = 1 − e^(−1) ≈ 0.6321
Example 6.7: If the number of accidents occurring on a highway each day is a Poisson random variable with
parameter λ = 3, what is the probability that no accidents will occur on a randomly selected day in the future?
Solution: Let X = number of accidents per day.
Required: P(X = 0) = e^(−3) 3^0/0! = e^(−3) ≈ 0.0498
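The Poisson probabilities asked for in Examples 6.6 and 6.7 can be evaluated directly from the probability mass function P(X = x) = e^(−λ) λ^x / x!. The following is a minimal sketch; poisson_pmf is a hypothetical helper name introduced for illustration.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = e^(-lam) * lam^x / x!  for x = 0, 1, 2, ..."""
    return exp(-lam) * lam**x / factorial(x)


# Example 6.6 (lam = 1)
no_error = poisson_pmf(0, 1)
three_errors = poisson_pmf(3, 1)
fewer_than_two = poisson_pmf(0, 1) + poisson_pmf(1, 1)
at_least_one = 1 - poisson_pmf(0, 1)

# Example 6.7 (lam = 3)
no_accident = poisson_pmf(0, 3)

for value in (no_error, three_errors, fewer_than_two, at_least_one, no_accident):
    print(round(value, 4))
```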
Note: The Poisson random variable has a wide range of applications in a diverse number of areas. An important
property of the Poisson random variable is that it may be used to approximate a binomial random variable when
the binomial parameter n is large and p is small. The probability that X will be k can then be approximated by
substituting np for λ in the Poisson distribution, i.e. P(X = k) ≈ e^(−np) (np)^k / k!.
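The quality of the Poisson approximation to the binomial can be seen by comparing the two probability mass functions side by side. In the sketch below, n = 1000 and p = 0.002 are hypothetical values chosen only to illustrate "n large and p small"; they do not come from the notes.

```python
from math import comb, exp, factorial


def binom_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability with parameter lam."""
    return exp(-lam) * lam**k / factorial(k)


n, p = 1000, 0.002  # hypothetical values: n large, p small
lam = n * p         # lam = np = 2

for k in range(5):
    exact = binom_pmf(k, n, p)
    approx = poisson_pmf(k, lam)
    print(k, round(exact, 5), round(approx, 5))
```

For these values the two columns agree to about three decimal places; the agreement worsens as p grows for fixed n.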
6.4 Common continuous probability distributions
Normal distribution
The normal distribution plays an important role in statistical inference because many real-life distributions are
approximately normal; many other distributions can be made almost normal by appropriate data transformations
(e.g., taking the log); and, as the sample size increases, the distribution of the means of samples drawn from a
population with any distribution (with finite variance) approaches the normal distribution.
A continuous random variable X is said to follow a normal distribution if and only if its probability density
function (p.d.f.) is f(x) = (1/(σ√(2π))) e^(−(x − μ)^2/(2σ^2)), where x ∈ (−∞,∞), μ ∈ (−∞,∞) and σ ∈ (0,∞). There are
infinitely many normal distributions since different values of μ and σ define different normal distributions. For
instance, when μ = 0 and σ = 1, the above density takes the form f(z) = (1/√(2π)) e^(−z^2/2). This
particular distribution is called the standard normal distribution, sometimes known as the Z-distribution. The
random variable corresponding to this distribution is usually denoted by Z. If X has a normal distribution with
mean μ and variance σ^2, we denote it as X ~ N(μ, σ^2).
Properties of normal distribution
i) The normal distribution curve is a bell shaped, symmetrical about μ and mesokurtic. The p.d.f. attains its
maximum value at x= μ.
ii) Since the line x = μ divides the area under the normal curve into two equal parts, μ is the mean, the median and
the mode of the distribution.
iii) The mean and variance of the normal distribution are μ and σ^2, respectively.
iv) The total area under the curve and bounded from below by the horizontal axis is 1, i.e. ∫_{-∞}^{∞} f(x) dx = 1.
Figure: The shaded area under the normal curve is one

Since a normal distribution is a continuous probability distribution, the probability that X lies between a and b is
the area bounded under the curve, from left to right by the vertical lines x = a and x = b and below by the
horizontal axis.
Figure: P(a<X<b) equals the shaded region
However, evaluating P(a < X < b) = ∫_a^b f(x) dx directly is very complicated. To simplify this problem, we use the
standard normal table, which gives areas bounded by two points. Areas under the standard normal
distribution curve are tabulated in various ways. The most common tables give areas bounded between Z=0 and
a positive value of Z. In addition to the standard normal table, the properties of the normal distribution and the
following theorem make probability calculations easy for any normal distribution.
Theorem 6.1: Standardization of a normal random variable

If X has a normal distribution with mean μ and standard deviation σ, then
i) Z = (X − μ)/σ will have a standard normal distribution.
ii) P(a < X < b) = P((a − μ)/σ < Z < (b − μ)/σ).
Example 6.8: Let Z be the standard normal random variable. Calculate the following probabilities using the
standard normal distribution table: a) P(0<Z<1.2) b) P(0<Z<1.43) c) P(Z≤0) d) P(-1.2<Z<0) e) P(Z≤-1.43)
f) P(-1.43≤Z<1.2) g) P(Z≥1.52) h)P(Z≥-1.52)
Solution:
a) The probability that Z lies between 0 and 1.2 can be directly found from the standard normal table as
follows: look for the value 1.2 from z column ( first column) and then move horizontally until you find
the value of 0.00 in the first row. The point of intersection made by the horizontal and vertical movements
will give the desired area (probability). Hence P(0<Z<1.2) = 0.3849. Refer to the table below as a guide to
find this probability.
Figure: P(0<Z<1.2) is the shaded area

b) In a similar way P(0<Z<1.43)= 0.4236.
c) We know that the normal distribution is symmetric about its mean. Hence the area to the left of 0 and the
area to the right of 0 are 0.5 each. Therefore P(Z≤0) = P(Z≥0) = 0.5
Figure: The area to the left and the right of 0 for z-distribution
d) P(-1.2<Z<0)=P(0<Z<1.2)= 0.3849 due to symmetry
e) P(Z<-1.43) = 1 − P(Z ≥ -1.43)              (using the probability of the complement event)
= 1 − [P(-1.43<Z<0) + P(Z≥0)]                 (a region can be broken down into non-overlapping regions)
= 1 − [P(0<Z<1.43) + P(Z≥0)]                  (by symmetry)
= 1 − [0.4236 + 0.5]
= 1 − 0.9236 = 0.0764
Figure: P(Z<-1.43) is the shaded region
f) P(-1.43≤Z<1.2) = P(-1.43≤Z<0) + P(0≤Z<1.2)=P(0<Z≤1.43) + 0.3849= 0.4236 + 0.3849 =0.8085
Figure: P(-1.43≤Z<1.2) is the shaded region

g) P(Z≥1.52) = 0.5 – P(0≤ Z<1.52)=0.5 – 0.4357=0.0643
Figure: P(Z≥1.52) is the shaded region

h) P(Z≥-1.52) = P(-1.52≤Z<0) + P(Z ≥0 )= P(0 < Z≤1.52) + 0.5
=0.4357 +0.5=0.9357
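The table look-ups in Example 6.8 can be cross-checked numerically. The sketch below expresses the standard normal cumulative probability P(Z ≤ z) through the error function from Python's math module; the names phi and between are assumptions introduced here. The results agree with the table values to about three decimal places, with small differences in the fourth decimal caused by rounding in the table.

```python
from math import erf, sqrt


def phi(z: float) -> float:
    """Standard normal c.d.f. P(Z <= z), expressed through the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def between(a: float, b: float) -> float:
    """P(a < Z < b) for the standard normal variable Z."""
    return phi(b) - phi(a)


answers = {
    "a": between(0, 1.2),
    "b": between(0, 1.43),
    "c": phi(0),
    "d": between(-1.2, 0),
    "e": phi(-1.43),
    "f": between(-1.43, 1.2),
    "g": 1 - phi(1.52),
    "h": 1 - phi(-1.52),
}

for part, value in answers.items():
    print(part, round(value, 4))
```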
Example 7.11: Find the following values of z* of a standard normal random variable based on the given
probability values:
a) P(Z > z*) =0.1446
b) P(Z>z*) = 0.8554
Solution: We need to find specific values of Z given some probability values.
a) P(Z>z*) = 0.1446 implies that z* is to the right of zero because
P(Z>0) = 0.5 is greater than P(Z>z*).
P(Z > z*) = 0.1446 implies that P(0<Z≤z*) = 0.5 − 0.1446 = 0.3554.
Hence we can look for the value of z* satisfying the above condition from the standard normal table. Thus z*
= 1.06.
b) P(Z>z*) = 0.8554 implies that z* is to the left of zero because
P(Z>0) = 0.5 is less than P(Z>z*); that is, z* is a negative number.
P(Z>z*) = 0.8554 = P(z*≤ Z <0) + P(Z ≥ 0) = P(0 ≤ Z ≤ −z*) + 0.5
This implies P(0 ≤ Z ≤ −z*) = 0.8554 − 0.5 = 0.3554. Hence the value of −z* from the table satisfying the above
condition is 1.06. Therefore z* = −1.06.
Example 6.9: If the total cholesterol values for a certain target population are approximately normally
distributed with a mean of 200 (mg/100 ml) and a standard deviation of 20 (mg/100 ml), calculate the
probability that a person picked at random from this population will have a cholesterol value
a) greater than 240 (mg/100 ml)
b) between 180 and 220(mg/100 ml)
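c) less than 200 (mg/100 ml)
Solution: Let X be the cholesterol value in mg/100 ml; then X ~ N(200, 20²) and Z = (X - 200)/20.
a) P(X > 240) = P(Z > (240 - 200)/20) = P(Z > 2) = 0.5 - P(0 < Z < 2) = 0.5 - 0.4772 = 0.0228
b) P(180 < X < 220) = P(-1 < Z < 1) = 2P(0 < Z < 1) = 2(0.3413) = 0.6826
c) P(X < 200) = P(Z < 0) = 0.5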
Example 6.10: Assume that the test scores for a large class are normally distributed with a mean of 74 and a
standard deviation of 10.
(a) Suppose that you receive a score of 88. What percent of the class received scores higher than yours?
(b) Suppose that the teacher wants to limit the number of A grades in the class to no more than 20%. What
would be the lowest score for an A?
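Solution: Let X be the score of a randomly picked student; then X ~ N(74, 10²).
a) P(X > 88) = P(Z > (88 - 74)/10) = P(Z > 1.4) = 0.5 - P(0 < Z < 1.4) = 0.5 - 0.4192 = 0.0808
Hence 8.08 percent of the students scored higher than you did.
b) Let XA be the lowest mark to get letter grade A. We are given that P(X ≥ XA) ≤ 0.20, that is,
P(0 < Z < (XA - 74)/10) ≥ 0.30. From the standard normal table, the smallest z value meeting this condition
is 0.85 (area 0.3023), so XA = 74 + 0.85(10) = 82.5.
Hence, the lowest mark to get letter grade A is 82.5.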
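The two applied examples above can be checked without standardizing by hand. The sketch below uses Python's standard library and works directly with the stated means and standard deviations; the variable names are illustrative. Note that the exact 80th percentile of the score distribution is about 82.4, slightly below the table-based answer of 82.5, because the printed table only lists z to two decimal places.

```python
from statistics import NormalDist

# Example 6.9: cholesterol values with mean 200 and standard deviation 20
cholesterol = NormalDist(mu=200, sigma=20)
p_above_240 = 1 - cholesterol.cdf(240)
p_180_to_220 = cholesterol.cdf(220) - cholesterol.cdf(180)
p_below_200 = cholesterol.cdf(200)

# Example 6.10: test scores with mean 74 and standard deviation 10
scores = NormalDist(mu=74, sigma=10)
p_above_88 = 1 - scores.cdf(88)
lowest_a = scores.inv_cdf(0.80)   # score with 20% of the class above it

print(round(p_above_240, 4), round(p_180_to_220, 4), round(p_below_200, 4))
print(round(p_above_88, 4), round(lowest_a, 1))
```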

The chi-square and t distributions
The chi-square and t distributions are important continuous distributions which are useful in statistical
inference. This section gives a brief introduction to these distributions. Later chapters show in detail how to
use them in estimation and hypothesis testing.
Chi-square distribution
A random variable X is said to have a chi-square distribution with n degrees of freedom (denoted by χ²(n)) if its
probability density function is given by
f(x) = x^(n/2 - 1) e^(-x/2) / [2^(n/2) Γ(n/2)] for x > 0, and f(x) = 0 otherwise.
The chi-square distribution has one parameter called the degrees of freedom, n. Depending on the values of n,
we can have many different chi-square distributions. The mean and the variance of chi-square distribution are
n, and 2n, respectively.
Figure: The chi-square distribution
Because of its importance, the chi-square distribution is tabulated for various values of the parameter n (refer
table). Thus we may find in the table that value, denoted by χ²_{α}(n), satisfying P(χ²(n) > χ²_{α}(n)) = α. The
example below shows how to read chi-square distribution values.
Example 6.11: To read the chi-square value with 2 degrees of freedom where the area to the right of this value
is 0.005, look for the degrees of freedom, 2, in the first column (df column) and then move horizontally until you
reach the column headed by the value of α, 0.005, in the first row. The point of intersection made by the
horizontal and vertical movement will give the desired chi-square value, 10.597. This value satisfies the following:
P(χ²(2) > 10.597) = 0.005. In a similar way, the chi-square value with 100 degrees of freedom where the area to
the right of this value is 0.975 is 74.222.
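The two table readings in Example 6.11 can be checked by integrating the chi-square density numerically. The sketch below is only an illustration: it uses the standard chi-square density formula and Simpson's rule, the function names are hypothetical, and it assumes n ≥ 2 so that the density is finite at zero.

```python
import math

def chi2_pdf(x, n):
    """Density of the chi-square distribution with n degrees of freedom at x >= 0."""
    return x ** (n / 2 - 1) * math.exp(-x / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def chi2_right_tail(x, n, steps=4000):
    """Approximate P(chi-square(n) > x) as 1 minus a Simpson's-rule integral over [0, x].

    Assumes n >= 2 (finite density at 0) and an even number of steps.
    """
    h = x / steps
    total = chi2_pdf(0.0, n) + chi2_pdf(x, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, n)
    return 1 - total * h / 3

print(round(chi2_right_tail(10.597, 2), 4))
print(round(chi2_right_tail(74.222, 100), 4))
```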
The t distribution
The t distribution is an important distribution useful in inference concerning population mean/means. This
distribution has one parameter called the degrees of freedom. Depending on the values of the degrees of
freedom, we may have different t distributions. The degrees of freedom is usually denoted by n. In inference on
the population mean, the degrees of freedom is related to sample size. As the sample size or degrees of
freedom increases, the t distribution approaches the standard normal distribution.
The t- distribution shares some characteristics of the normal distribution and differs from it in others. The t
distribution is similar to the standard normal distribution in the following ways.
i) it is bell-shaped
ii) it is symmetrical about the mean
iii) the mean, median, and mode are equal to 0 and are located at the center of the distribution.
iv) The curve never touches the x-axis
The t distribution differs from the standard normal distribution in the following ways.
i) the variance is greater than 1.
ii) The t distribution is actually a family of curves based on the concept of degrees of freedom.
Figure: The t distribution

Due to its importance in inference, the t distribution is tabulated for some values of n (refer table). Thus
we may find in the table that value, denoted by t_{α}(n), satisfying P(t(n) > t_{α}(n)) = α, where t(n)
represents the t random variable with n degrees of freedom. The following example will help you to read t
distribution values.
Example 6.12: To find the t value with 3 degrees of freedom where the area to the right of this value is 0.05,
look for the degrees of freedom, 3, in the first column (df column) and then move horizontally until you reach the
column headed by the value of α, 0.05, in the first row. The point of intersection made by the horizontal and
vertical movement will give the desired t value, 2.353. This value satisfies the following: P(t(3) > 2.353) = 0.05.
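The t table reading in Example 6.12 can be checked in the same numerical way. The density used below is the standard formula for the t distribution, which these notes do not state, so treat it as an assumption of this sketch; the function names are illustrative and the code is intended for moderate degrees of freedom only.

```python
import math

def t_pdf(x, n):
    """Density of the t distribution with n degrees of freedom (standard formula)."""
    c = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return c * (1 + x * x / n) ** (-(n + 1) / 2)

def t_right_tail(t, n, steps=4000):
    """Approximate P(t(n) > t) for t >= 0.

    Uses symmetry: the area to the right of 0 is 0.5, so the tail area is
    0.5 minus a Simpson's-rule integral of the density over [0, t].
    """
    h = t / steps
    total = t_pdf(0.0, n) + t_pdf(t, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, n)
    return 0.5 - total * h / 3

print(round(t_right_tail(2.353, 3), 3))
```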
UNIT SEVEN: SAMPLING AND SAMPLING DISTRIBUTION OF SAMPLE MEAN
Objectives:
After a successful completion of this unit, students will be able to:
 Differentiate the two major sampling techniques: probabilistic and non-probabilistic
 Apply simple random sampling technique to select sample
 Define sampling distribution of the sample mean
Introduction to sampling and sampling distribution

In our daily life we are often forced to make decisions based on small-scale studies. For instance, a laboratory
technician takes small droplets of blood to examine the presence of disease; we examine a few fruits before we
purchase them; zoologists use the concept of sampling to estimate the population of rodents, etc. This process of
inspection is commonly used on various occasions, because inspecting every unit is difficult to implement on a
large scale. On the basis of a small study, we make inferences about the entire population.
7.1 Methods of sampling

Definition of some basic terms
Sampling: is the technique of selecting a representative sample from the whole.
Population: is the totality of elements or units under study.
Sample: is a part of the population.
Sampling Frame: A complete list of all the units of the population is called the sampling frame. A unit of
population is a relative term. If all the workers in a factory make a population, then a worker is a unit of the
population. If all the factories in a country are being studied for some purpose, then a factory is a unit of the
population of factories. The frame provides a base for the selection of a sample.
Major reasons to use sampling
1. Saves Time and Cost: As the size of the sample is small compared to the population, the time and cost
involved in a sample study are much less than those of a complete count. Hence a sample study requires less
time and cost.
2. To prevent destruction: The destructive nature of some experiments (or inspections) does not allow a
complete enumeration to be carried out, for instance, checking the quality of beers, studying the efficacy of
new drugs, or testing the life length of a bulb.
3. Sample survey provides higher level of accuracy: This accuracy can be achieved through more selective
recruiting of interviewers and supervisors, more extensive training programs, a closer supervision of the
personnel involved and a more efficient monitoring of the field work.
Types of sampling
Generally, two types of sampling methods exist: probability and non-probability sampling.
Probability Sampling
The term probability sampling (or random sampling) is used when the selection of the sample is purely based on
chance. There is no subjective bias in the selection of units. Every unit of the population has a known nonzero
probability of being in the sample. The following are some of the random sampling methods: simple random
sampling, stratified random sampling, cluster sampling and systematic random sampling.
Simple random sampling

Simple random sampling is a method of selecting a sample from a population in such a way that every unit of
the population is given an equal chance of being selected. In practice, you can draw a simple random sample of
elements using either the 'lottery method' or 'tables of random numbers'.
For example, you may use the lottery method to draw a random sample by using a set of 'N' tickets, with
numbers ' 1 to N' if there are 'N' units in the population. After shuffling the tickets thoroughly, the sample of a
required size, say n, is selected by picking the required n number of tickets.
A convenient method of drawing a simple random sample is to use a table of random numbers. After assigning
consecutive numbers to the units of the population, the researcher starts at any point on the table of random
numbers and reads the consecutive numbers in any direction: horizontally, vertically or diagonally. If a number
read out corresponds to the number assigned to a unit, then that unit is chosen for the sample.
Suppose that a sample of 6 study centers is to be selected at random from a serially numbered population of 60
study centers. The following table is a portion of a random numbers table used to select a sample.
Row \ Column 1 2 3 4 5 …… N
1 2315 7548 5901 8372 5993 ….. 6744
2 0554 5550 4310 5374 3508 ….. 1343
3 1487 1603 5032 4043 6223 ….. 0834
4 3897 6749 5094 0517 5853 ….. 1695
5 9731 2617 1899 7553 0870 ….. 0510
6 1174 2693 8144 3393 0862 ….. 6850
7 4336 1288 5911 0164 5623 ….. 4036
8 9380 6204 7833 2680 4491 ….. 2571
9 4954 0131 8108 4298 4187 ….. 9527
10 3676 8726 3337 9482 1569 ….. 3880
11 ….. ….. ….. ….. ….. ….. …..
12 ….. ….. ….. ….. ….. ….. …..
13 ….. ….. ….. ….. ….. ….. …..
14 ….. ….. ….. ….. ….. ….. …..
15 ….. ….. ….. ….. ….. ….. …..
N 3914 5218 3587 4855 4888 ….. 8042
If you start in the first row and first column and read the first two digits of each entry down the column, centers
numbered 23, 05, 14, … will be selected. However, numbers above the population size (60) will not be included in
the sample. In addition, if any number is repeated in the table, it may be substituted by the next number from the
same column. Besides, you can start at any point in the table. If you choose column 4 and row 1, the number to
start with is 83. In this way you can read down this column, starting with 83, until 6 valid numbers have been
selected.
The numbers read, in order, are as follows:
83 53 40 05 75 33 01 26
Since 83 and 75 exceed the population size of 60, they are discarded.
Hence, the study centers numbered 53, 40, 05, 33, 01 and 26 will be in the sample.
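The selection rule used with the random number table can be sketched in a few lines of Python. The list column_4 copies the first ten entries of column 4 of the table above; the function name select_units is a hypothetical helper, and skipping a repeated number is used here as a simple stand-in for "substituting the next number from the same column".

```python
# Column 4 of the random number table above, read from row 1 downwards.
column_4 = ["8372", "5374", "4043", "0517", "7553",
            "3393", "0164", "2680", "4298", "9482"]

def select_units(entries, population_size, sample_size):
    """Read the first two digits of each entry, skip numbers outside
    1..population_size and repeats, and stop once sample_size units are chosen."""
    chosen = []
    for entry in entries:
        unit = int(entry[:2])
        if 1 <= unit <= population_size and unit not in chosen:
            chosen.append(unit)
        if len(chosen) == sample_size:
            break
    return chosen

sample = select_units(column_4, population_size=60, sample_size=6)
print(sample)  # → [53, 40, 5, 33, 1, 26]
```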
Simple random sampling gives every unit the same chance of selection and so avoids subjective bias. However,
from a practical point of view, a list of all the units of a population is often not possible to obtain. Even if it
is possible, it may involve a very high cost which a researcher or an organization may not be able to afford. In
addition, it may result in an unrepresentative sample by chance.
Stratified sampling
Stratified random sampling takes into account the stratification of the main population into a number of sub-
populations, each of which is homogeneous with respect to one or more characteristic(s). Having ensured this
stratification, it provides for selecting randomly the required number of units from each sub-population. The
selection of a sample from each subpopulation may be done using simple random sampling. It is useful in
providing more accurate results than simple random sampling.
Systematic sampling
In this method, units are selected at equal intervals from the listing of the elements, starting from a randomly
chosen unit. This method can provide a sample as good as a simple random sample and is comparatively easier to
draw. For instance, to study the average monthly expenditure of households in a city, you may randomly choose one
of the first four households in the household listing and then select every fourth household after it.
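A systematic sample can be sketched as follows. The household listing of 40 units and the interval of 4 are hypothetical values chosen for illustration, and the function name systematic_sample is not from these notes.

```python
import random

def systematic_sample(units, k, rng=random):
    """Pick a random start among the first k units, then take every k-th unit after it."""
    start = rng.randrange(k)      # random position 0..k-1 in the listing
    return units[start::k]

households = list(range(1, 41))   # hypothetical listing of 40 numbered households
sample = systematic_sample(households, k=4, rng=random.Random(1))
print(sample)
```

Passing a seeded random.Random instance makes the draw reproducible; in practice the default generator could be used.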
Cluster sampling
Cluster sampling is used when a sampling frame is difficult to construct or when using other sampling techniques
(such as simple random sampling) is not feasible or is costly. For instance, when the units are geographically
scattered it is difficult to apply simple random sampling. It involves division of the population of elementary
units into groups or clusters that serve as primary sampling units. A selection of the clusters is then made to
form the sample. The precision of estimates made based on samples taken using this method is relatively low.
Non-probability sampling techniques
In non-probability sampling, the sample is not based on chance. It is rather determined by personal judgment.
This method is cost effective; however, we cannot make objective statistical inferences. Depending on the
technique used, non-probability samples are classified into quota, judgment or purposive and convenience
samples.
Sampling and non-sampling errors

Sampling error is the difference between the value of a sample statistic and the value of the corresponding
population parameter. On the other hand, non-sampling error is an error that occurs in the collection, recording
and tabulation of data. Sampling error can be minimized by using appropriate sampling methods and/or
increasing the sample size. The non-sampling error is likely to increase with increase in sample size.
7.2 Sampling distribution of the sample mean

The value of the sample mean for any sample will depend on the elements included in that sample.
Consequently, the sample mean is a random variable. Therefore, like other random variable, the sample means
possess a probability distribution which is more commonly called the sampling distribution of sample mean. In
general, the probability distribution of a sample statistic is called its sampling distribution. Sampling
distribution is important in statistical inference. The important characteristics of the sampling distribution of the
sample mean are its mean, variance and the form of the distribution.
Example 7.1: Suppose we have a hypothetical population of size 3, consisting of three children namely: A is 3
years old, B is 6 years old and C is 9 years old. Construct sampling distribution of the sample mean of size 2
using sampling without replacement and with replacement.
Solution: The mean and variance of the population are 6 and 6, respectively.
1. If sampling is without replacement we will have 3C2 = 3 possible samples: (A, B), (A, C) and (B, C) and
their corresponding sample means are (3+6)/2 = 4.5, 6 and 7.5, respectively. Hence the probability
distribution (sampling distribution) of the sample mean is:
x̄:            4.5    6      7.5
P(X̄ = x̄):     1/3    1/3    1/3
E(X̄) = Σ x̄ P(X̄ = x̄) = 4.5(1/3) + 6(1/3) + 7.5(1/3) = 6
V(X̄) = Σ x̄² P(X̄ = x̄) – [E(X̄)]² = (6.75 + 12 + 18.75) – 36 = 1.5
2. If sampling is with replacement we will have N^n = 3² = 9 possible samples: (A, A), (A, B), (A, C), (B,
A), (B, B), (B, C), (C, A), (C, B) and (C, C). Hence the probability distribution (sampling distribution) of the
sample mean is:
x̄:            3      4.5    6      7.5    9
P(X̄ = x̄):     1/9    2/9    3/9    2/9    1/9
E(X̄) = Σ x̄ P(X̄ = x̄) = 3(1/9) + 4.5(2/9) + 6(3/9) + 7.5(2/9) + 9(1/9) = 6
V(X̄) = Σ x̄² P(X̄ = x̄) – [E(X̄)]² = (1 + 4.5 + 12 + 12.5 + 9) – 36 = 3
Note:
 The mean of the sampling distribution of the sample mean is the same as the population mean
irrespective of the sampling procedure.
 The variance of the sampling distribution of the sample mean is σ²/n when sampling is with replacement, and
(σ²/n)[(N – n)/(N – 1)] when sampling is without replacement from a finite population of size N. In
Example 7.1 these give 6/2 = 3 and (6/2)(1/2) = 1.5, respectively.
 The problem with using the sample mean to make inferences about the population mean is that the sample
mean will probably differ from the population mean. This error is measured by the standard deviation of the
sampling distribution of the sample mean (the square root of its variance), which is known as the standard
error. The standard error is the average amount of sampling error found because of taking a sample rather
than the whole population. As sample size increases, the standard error decreases.
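The sampling distributions in Example 7.1 can be reproduced by listing every possible sample. The sketch below uses exact fractions so that no rounding is involved; the function name sampling_distribution is a hypothetical helper.

```python
from fractions import Fraction
from itertools import combinations, product

ages = [3, 6, 9]   # the population of Example 7.1
n = 2              # sample size

def sampling_distribution(samples):
    """Return (mean, variance) of the sample mean over equally likely samples."""
    means = [Fraction(sum(s), len(s)) for s in samples]
    expected = sum(means) / len(means)
    variance = sum((m - expected) ** 2 for m in means) / len(means)
    return expected, variance

without_repl = sampling_distribution(list(combinations(ages, n)))
with_repl = sampling_distribution(list(product(ages, repeat=n)))
print(without_repl)  # mean 6, variance 3/2
print(with_repl)     # mean 6, variance 3
```

Both sampling schemes give a mean of 6, equal to the population mean, while the variance is smaller when sampling without replacement.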
7.3 Central Limit Theorem
If X1, X2, …, Xn is a random sample from a population with mean μ and variance σ², then as n goes to infinity
the distribution of the sample mean, X̄, approaches the normal distribution with mean μ and variance σ²/n. That
is, as n gets large, X̄ is approximately N(μ, σ²/n) and its standardized form is Z = (X̄ – μ)/(σ/√n), which is
approximately N(0, 1).
Note: The central limit theorem is useful for approximating the distribution of the sample mean based on a large
sample size and when the population distribution is non normal; however, if the population is normal, then the
sampling distribution of the sample mean will be normal regardless of the sample size.
Example 7.2: The uric acid values in normal adult males are normally distributed with mean 5.7 mg and
standard deviation 1 mg. Find the probability that
a) a sample of size 4 will yield a mean less than 5
b) a sample of size 9 will yield a mean greater than 6
Solution: Let X be the amount of uric acid in normal adult males, with mean 5.7 and variance 1.
a) If a sample of size 4 is taken, then X̄ ~ N(5.7, 0.25) since the population is normally distributed. Hence
P(X̄ < 5) = P(Z < (5 – 5.7)/0.5) = P(Z < –1.4) = 0.5 – 0.4192 = 0.0808.
b) If a sample of size 9 is taken, then X̄ ~ N(5.7, 1/9) since the population is normally distributed. Hence
P(X̄ > 6) = P(Z > (6 – 5.7)/(1/3)) = P(Z > 0.9) = 0.5 – 0.3159 = 0.1841.
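The two probabilities asked for in Example 7.2 can be computed directly from the sampling distribution of the mean. This is a minimal sketch with Python's standard library; sample_mean_dist is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_dist(mu, sigma, n):
    """Sampling distribution of the mean: normal with mean mu and
    standard error sigma / sqrt(n) (normal population assumed)."""
    return NormalDist(mu=mu, sigma=sigma / sqrt(n))

p_a = sample_mean_dist(5.7, 1, 4).cdf(5)        # a) P(mean of 4 values < 5)
p_b = 1 - sample_mean_dist(5.7, 1, 9).cdf(6)    # b) P(mean of 9 values > 6)
print(round(p_a, 4), round(p_b, 4))
```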
UNIT EIGHT: ESTIMATION AND HYPOTHESIS TESTING

Objectives:
 construct and interpret confidence interval estimates
 formulate hypothesis about a population mean
 determine an appropriate sample size for estimation
Introduction
We now assume that we have collected, organized and summarized a random sample of data and are trying to
use that sample to estimate a population parameter. Statistical inference is a procedure whereby inferences
about a population are made on the basis of the results obtained from a sample. Statistical inference can be
divided into two main areas: estimation and hypothesis testing. Estimation is concerned with estimating the
values of specific population parameters; hypothesis testing is concerned with testing whether the value of a
population parameter is equal to some specific value.
8.1 Point and interval estimation of the mean
Point estimate: In point estimation, a single sample statistic (such as X̄) is calculated from the sample
to provide an estimate of the true value of the corresponding population parameter (such as μ). Such
a single statistic is termed a point estimator, and the specific value of the statistic is termed a point estimate.
For example, the sample mean X̄ is an estimator for the population mean μ and x̄ = 10 is an estimate, which is one
of the possible values of X̄.
Interval estimate: In most practical problems, a point estimate does not provide information about ‘how close
is the estimate’ to the population parameter unless accompanied by a statement of possible sampling errors
involved based on the sampling distribution of the statistic. Hence, an interval estimate of a population
parameter is a confidence interval with a statement of confidence that the interval contains the parameter value.
An interval estimate of the population parameter θ consists of two bounds within which the parameter will be
contained: L ≤ θ ≤ U,
where L is the lower bound and U is the upper bound.

Case 1: When the population is normal.
 If the variance σ² is known, the sampling distribution of the sample mean is normal with mean μ and
variance σ²/n, i.e., X̄ ~ N(μ, σ²/n) and Z = (X̄ – μ)/(σ/√n) ~ N(0,1).
 If the variance σ² is unknown, t = (X̄ – μ)/(S/√n) will have a t-distribution with
n - 1 degrees of freedom. Moreover, as the sample size increases, t is approximately the same as the standard
normal.
Consider the case where σ² is known; we can derive a 100(1 – α)% confidence interval for the population mean μ.
Let z_{α/2} be a point on the standard normal curve that cuts an area of α/2 to the right, i.e., P(Z > z_{α/2}) = α/2. By
the symmetric property of the normal distribution, P(Z < –z_{α/2}) = α/2 (see the diagram below).
From the standard normal distribution, we know that
P(–z_{α/2} < Z < z_{α/2}) = 1 – α
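To obtain the limits of the interval estimate, we use the standardized form of X̄ in the above probability
statement, i.e., letting Z = (X̄ – μ)/(σ/√n),
P(–z_{α/2} < Z < z_{α/2}) = 1 – α
becomes
P(X̄ – z_{α/2} σ/√n < μ < X̄ + z_{α/2} σ/√n) = 1 – α
We can assert with probability 1 – α that the interval (X̄ – z_{α/2} σ/√n, X̄ + z_{α/2} σ/√n) contains the population
mean we are estimating.
Thus, a 100(1 – α)% confidence interval for the population mean μ is given by
X̄ ± z_{α/2} σ/√n
The end points of the interval, X̄ – z_{α/2} σ/√n and X̄ + z_{α/2} σ/√n, are called confidence limits and the probability
1 – α is called the degree of confidence.
In a similar way, a 100(1 – α)% confidence interval for the population mean μ with unknown variance is
given by
X̄ ± t_{α/2}(n – 1) S/√n
where t_{α/2}(n – 1) is the critical value of the t-test statistic providing an area α/2 in the right tail of the
t-distribution with n – 1 degrees of freedom, and S is the sample standard deviation.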
Case 2: When the population is non normal.
We use the central limit theorem to approximate the distribution of the sample mean based on a large sample
(n ≥ 30). A large sample size is a necessary condition to use the normal distribution. Hence,
Z = (X̄ – μ)/(σ/√n) is approximately N(0,1). If σ is unknown we can replace it by its sample estimate S. The
resulting 100(1 – α)% confidence interval for μ becomes X̄ ± z_{α/2} σ/√n, with S in place of σ when σ is unknown.
Example 8.1: A drug company is testing a new drug which is supposed to reduce blood pressure. From the six
people who are used as subjects, it is found that the average drop in blood pressure is 2.28 millimeter of
mercury (mmHg) with a standard deviation of 0.95 mmHg. What is the 95% confidence interval for the mean
change in blood pressure? (Assume that the population is normal).
Solution: Given: n = 6, x̄ = 2.28, S = 0.95
 x̄ = 2.28 is a point estimate for the population mean drop in blood pressure μ.

A 95% confidence interval for the population mean μ when σ is unknown and the sample size is small is:
x̄ ± t_{α/2}(n – 1) S/√n
And from the t distribution table, t_{0.025}(5) = 2.571, so the interval is 2.28 ± 2.571(0.95/√6) = 2.28 ± 0.997:
(2.28-0.997, 2.28+0.997)
(1.28, 3.28)
We are 95% confident that the mean drop in blood pressure lies between 1.28 mmHg and 3.28 mmHg for the
sampled population.
Example 8.2: Punctuality of patients in keeping appointment is of interest to a research team. In a study of
patients flow through the office of general practitioners, it was found that a sample of 35 patients were 17.2
minutes late for appointments, on the average. Previous research had shown the standard deviation to be about 8
minutes. The population distribution was felt to be not normal. What is the 90 percent confidence interval for
the true mean amount of time late for appointment?
Solution: Given: n = 35, x̄ = 17.2, σ = 8
Since the sample size is fairly large (n > 30), and since the population standard deviation is known, according to
the central limit theorem, the sampling distribution of the sample mean is approximately normal. Thus, a 90%
confidence interval for the population mean is given by:
x̄ ± z_{α/2} σ/√n
And from the standard normal distribution table, z_{0.05} = 1.645, so the interval is 17.2 ± 1.645(8/√35) = 17.2 ± 2.2:
(17.2 – 2.2, 17.2 + 2.2)

(15.0, 19.4)
Therefore, the 90% confidence interval for true mean amount of time late for appointment is between 15.0 and
19.4 minutes.
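Both confidence intervals in Examples 8.1 and 8.2 have the form "estimate ± critical value × standard error", which can be sketched as one small function. The function name confidence_interval is illustrative. Python's standard library has no t quantile function, so the t critical value 2.571 is simply copied from the t table (5 degrees of freedom, right-tail area 0.025); the z critical value is computed.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(xbar, sd, n, critical):
    """Return (lower, upper) limits: xbar -/+ critical * sd / sqrt(n)."""
    margin = critical * sd / sqrt(n)
    return xbar - margin, xbar + margin

# Example 8.1: sigma unknown, small sample from a normal population -> t critical value.
ci_81 = confidence_interval(2.28, 0.95, 6, critical=2.571)

# Example 8.2: sigma known, large sample -> z critical value for 90% confidence.
z_crit = NormalDist().inv_cdf(1 - 0.10 / 2)
ci_82 = confidence_interval(17.2, 8, 35, critical=z_crit)

print([round(v, 2) for v in ci_81])
print([round(v, 1) for v in ci_82])
```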
8.2 Hypothesis Testing about the Mean

In many circumstances we merely wish to know whether a certain proposition is true or false. The process of
hypothesis testing provides a framework for making decisions on an objective basis, by weighing the relative
merits of different hypotheses, rather than on a subjective basis by simply looking at the numbers. Different
people can form different opinions by looking at data, but a hypothesis test provides a standardized decision-
making process that will be consistent for all people.
Statistical hypothesis: is a claim (belief or assumption) about the value of an unknown population parameter.
Examples of hypothesis:
 There is association between lung cancer and number of cigarettes an individual smokes.
 The proportion of female students in Hawassa University is 0.35.
 In sub-Saharan Africa 40% of individuals are living below the poverty line.
Hypothesis testing: is the procedure that enables decision-makers to draw inferences about population
characteristics by analyzing the difference between the value of sample statistic and the corresponding
hypothesized parameter value.
General procedure for hypothesis testing
To test the validity of a claim or assumption about the population parameter, a sample is drawn from the
population and analyzed. The results of the analysis are used to decide whether the claim is valid or not.
Step 1: State the null hypothesis (H0) and alternative hypothesis (H1)
Null hypothesis (H0): refers to a hypothesized numerical value of the population parameter which is initially
assumed to be true. The null hypothesis is always expressed in the form of an equation making a claim
regarding the specific value of the population parameter. That is, for example
H0: μ = μ0
where μ0 is the hypothesized value of the population mean.

Alternative hypothesis (H1): is the logical opposite of the null hypothesis. The alternative hypothesis states
that the specific population parameter value is not equal to the value stated in the null hypothesis. For example,
H1: μ ≠ μ0 (Two-sided test)
H1: μ > μ0 or H1: μ < μ0 (One-sided test)
Step 2: State the level of significance α (alpha) for the test
The level of significance α is the probability of wrongly rejecting the null hypothesis when it is actually true. It
is specified by the statistician or the researcher before the sample is drawn. The most commonly used values of
α are 0.10, 0.05 or 0.01.
Step 3: Calculate the appropriate test statistic
Test statistic is a value computed from a sample that is used to determine whether the null hypothesis has to be
rejected or not. The choice of suitable test statistic depends on the sampling distribution of the sample statistic.
Accordingly, we have the following cases:
Case 1: When the population is normal.
 If the variance σ² is known, the sampling distribution of the sample mean is normal with mean μ and
variance σ²/n, i.e., X̄ ~ N(μ, σ²/n), and the test statistic is Z = (X̄ – μ0)/(σ/√n) ~ N(0,1).
 If the variance σ² is unknown, the test statistic is t = (X̄ – μ0)/(S/√n) ~ t(n-1).
Case 2: When the population is non normal.

We use the central limit theorem to approximate the distribution of the sample mean based on a large sample
(n ≥ 30). A large sample size is a necessary condition to use the normal distribution. Hence the test statistic is
Z = (X̄ – μ0)/(σ/√n), which is approximately N(0,1). If σ is unknown we can replace it by its sample estimate S.
Step 4: Establish a decision rule (critical or rejection region)

The cut-off point to reject or not reject H0 depends on the level of significance α, the type of test statistic
chosen and the form of the alternative hypothesis. If the value of the test statistic falls in the rejection region, the
null hypothesis is rejected; otherwise we do not reject H0 (see fig 1 below). The value of the sample statistic
that separates the regions of acceptance and rejection is called the critical value. For a specified α, we read the
critical values from the Z or t tables, depending on the test statistic chosen.
Figure: Area of acceptance and rejection of H0 (Two-tailed test)

Based on the form of the alternative hypothesis and the test statistic we can make the following decisions:
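i. For H1: μ ≠ μ0 (two-tailed test) reject H0 if |Z| > z_{α/2}.
ii. For H1: μ > μ0 (right-tailed test) reject H0 if Z > z_{α}.
iii. For H1: μ < μ0 (left-tailed test) reject H0 if Z < –z_{α}.
When the t statistic is used, t_{α/2}(n – 1) and t_{α}(n – 1) take the place of z_{α/2} and z_{α}.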
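We can summarize the decision rules as follows:

Alternative hypothesis    H1: μ ≠ μ0                            H1: μ > μ0                        H1: μ < μ0
Decision (Z test)         Reject H0 if |Z| > z_{α/2}            Reject H0 if Z > z_{α}            Reject H0 if Z < –z_{α}
Decision (t test)         Reject H0 if |t| > t_{α/2}(n – 1)     Reject H0 if t > t_{α}(n – 1)     Reject H0 if t < –t_{α}(n – 1)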
Step 5: Interpret the result.
Errors in Hypothesis Testing
Ideally the hypothesis testing procedure should lead to the rejection of the null hypothesis H0 when it is false
and non-rejection of H0 when it is true. However, the correct decision is not always possible. Since the decision
to reject or not reject a hypothesis is based on sample data, there is a possibility of committing an incorrect
decision or error. Hence, a decision-maker may commit one of the two types of errors while testing a null
hypothesis. These errors are summarized as follows:
                               Null hypothesis (H0)
Decision                       True                  False
Reject H0                      Type I error (α)      Correct decision
Accept (do not reject) H0      Correct decision      Type II error (β)
Type I error is committed if we reject the null hypothesis when it is true. The probability of committing a type I
error, denoted by α, is called the level of significance. The probability level of this error is decided by the
decision-maker before the hypothesis test is performed. Type II error is committed if we do not reject the null
hypothesis when it is false. The probability of committing a type II error is denoted by β (Greek letter beta).
For a fixed sample size, as the probability of a type I error increases the probability of a type II error decreases
(they are inversely related). Hence we cannot reduce both errors simultaneously without changing the sample
size. As the sample size increases, both errors can be reduced.
Example 8.3: The life expectancy of people in the year 1999 in a country is expected to be 50 years. A survey
was conducted in eleven regions of the country and the data obtained, in years, are given below:
Life expectancy (years): 54.2, 50.4, 44.2, 49.7, 55.4, 47.0, 58.2, 56.6, 61.9, 57.5, and 53.4.
Do the data confirm the expected view? (Assuming normal population) Use 5% level of significance.
Solution: Let μ be the life expectancy of people in the year 1999 in a country.
1. H0: μ = 50 (The life expectancy of people in the year 1999 in a country is 50 years)
H1: μ ≠ 50 (The life expectancy of people in the year 1999 in a country is different from 50 years)
2. Level of significance, α = 0.05.
3. Since σ² is unknown and the population is normal, the t-test statistic is appropriate.
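Given: n = 11; μ0 = 50, and we need to compute x̄ and S.
x̄ = Σxi/n = 588.5/11 = 53.5 and S = √[Σ(xi – x̄)²/(n – 1)] = √(275.16/10) = 5.25
Then, the t-test statistic is calculated as:
t = (x̄ – μ0)/(S/√n) = (53.5 – 50)/(5.25/√11) = 3.5/1.58 = 2.21
4. For α = 0.05 and two-tailed test, the critical (table) value is: t_{0.025}(10) = 2.228
Since |t| = 2.21 < 2.228, do not reject the null hypothesis H0. That is, the calculated t value lies
just outside the rejection region (the shaded region).
5. Conclusion: The data do not contradict the expected view. That is, at 5% level of significance there is not
enough evidence that the life expectancy is different from 50 years.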
Example 8.4: Suppose that we want to test the hypothesis with a significance level of .05 that the climate has
changed since industrialization. Suppose that the mean temperature throughout history is 50 degrees. During
the last 40 years, the mean temperature has been 51 degrees and the population standard deviation is 2 degrees.
What can we conclude?
Solution:
Let μ be the mean temperature.
1. H0: μ = 50 (There is no change in temperature since industrialization)
H1: μ ≠ 50 (There is change in temperature since industrialization)
2. Level of significance, α = 0.05.
3. Since n = 40 is large, the Z-test statistic is appropriate.
Given: n = 40; σ = 2; x̄ = 51; μ0 = 50
Z = (x̄ – μ0)/(σ/√n) = (51 – 50)/(2/√40) = 3.16
4. For α = 0.05 and two-tailed test, the critical (table) value is: z_{0.025} = 1.96
Since |Z| = 3.16 > 1.96, reject the null hypothesis H0. That is, the calculated Z value
lies in the rejection region (the shaded region).
5. Conclusion: There has been a change in temperature since industrialization, at 5% level of significance.
Example 8.5: A study was conducted to describe the menopausal status, menopausal symptoms, energy
expenditure and aerobic fitness of healthy midlife women and to determine the relationship among these factors.
Among the variables measured was maximum oxygen uptake (Vo2max). The mean Vo2max score for a sample of
242 women was 33.3 with a standard deviation of 12.14. On the basis of these data, can we conclude that the
mean score for a population of such women is greater than 30? Use 5% level of significance.
Solution:
Let μ be the mean Vo2max score for a population of healthy midlife women.
1. H0: μ = 30 (The mean score for a population of healthy midlife women is 30)
H1: μ > 30 (The mean score for a population of healthy midlife women is greater than 30).
2. Level of significance, α = 0.05.
3. Since n = 242 is large, the Z-test statistic is appropriate.
Given: n = 242; S = 12.14; x̄ = 33.3; μ0 = 30
Z = (x̄ – μ0)/(S/√n) = (33.3 – 30)/(12.14/√242) = 4.23
4. For α = 0.05 and right-tailed test, the critical (table) value is: z_{0.05} = 1.645
Since Z = 4.23 > 1.645, reject the null hypothesis H0. That is, the calculated Z value lies in the
rejection region (the shaded region).
5. Conclusion: The mean Vo2max score for the sampled population of healthy midlife women is greater
than 30 at 5% level of significance.
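The large-sample Z test used in Examples 8.4 and 8.5 follows the same steps each time, so it can be sketched as a single function. The name z_test and its tail argument are illustrative choices for this sketch, not notation from these notes.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sd, n, alpha=0.05, tail="two"):
    """Return (z statistic, critical value, reject H0?) for a test about the mean.

    tail is "two", "right" or "left", matching the form of the alternative hypothesis.
    """
    z = (xbar - mu0) / (sd / sqrt(n))
    if tail == "two":
        crit = NormalDist().inv_cdf(1 - alpha / 2)
        reject = abs(z) > crit
    elif tail == "right":
        crit = NormalDist().inv_cdf(1 - alpha)
        reject = z > crit
    else:  # left-tailed
        crit = NormalDist().inv_cdf(1 - alpha)
        reject = z < -crit
    return z, crit, reject

ex_84 = z_test(51, 50, 2, 40, tail="two")             # Example 8.4
ex_85 = z_test(33.3, 30, 12.14, 242, tail="right")    # Example 8.5
print(ex_84)
print(ex_85)
```

In both examples the statistic exceeds the critical value, so H0 is rejected, in agreement with the hand calculations.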
8.3 Test of Association (Independence)
We often encounter nominal scale data. The test of association is useful for determining whether
there is any relationship or association between two nominal variables. For instance, we might be
interested in the relationship between HIV status and sex, lung cancer and smoking habit, political affiliation
and sex, etc.
When observations are classified according to two variables or attributes and arranged in a table, the display is
called a contingency table as shown below:
The test of association or independence uses the contingency table format. Here the variables A and B have
been classified into mutually exclusive categories. The value Oij in row i and column j of the table shows the
observed frequency falling in the joint category i and j. The row and column totals are the sums of their
corresponding frequencies. The sum of the row totals or of the column totals gives the grand total n, which
represents the sample size. The procedure to test the association between two variables is summarized as
follows:
Step 1: State the null and alternative hypotheses.
$H_0$: There is no association between the two variables; that is, the two variables are independent.
$H_1$: There is an association between the two variables; that is, the two variables are dependent.
Step 2: State the level of significance, $\alpha$.
Step 3: Calculate the expected frequency, $E_{ij}$, corresponding to the observed frequency in row i and column j.
The expected frequency in each cell is calculated as:
$$E_{ij} = \frac{R_i \times C_j}{n}$$
where $R_i$ is the total of row i, $C_j$ is the total of column j and n is the grand total.
Step 4: Compute the value of the test statistic:
$$\chi^2_{cal} = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}$$
where $O_{ij}$ is the observed frequency of row i and column j and $E_{ij}$ is the expected frequency of row i and
column j.
Step 5: Find the critical (table) value $\chi^2_{\alpha}(v)$ (from Appendix..). The value of $\chi^2_{\alpha}(v)$ corresponds to an area
of $\alpha$ in the right tail of the chi-square distribution with $v$ degrees of freedom,
where $v$ = (Number of rows - 1)(Number of columns - 1) = (r - 1)(c - 1).
Step 6: Compare the calculated and table values of $\chi^2$. Decide whether the variables are independent or not,
using the following decision rule:
Reject $H_0$ if $\chi^2_{cal}$ is greater than $\chi^2_{\alpha}(v)$. Otherwise do not reject $H_0$.
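Steps 3 and 4 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the 2 x 2 table of counts are made up for the illustration and do not come from these notes. Each expected frequency is computed as (row total)(column total)/n, and the statistic is the sum of (O - E)^2 / E over all cells.

```python
def expected_frequencies(observed):
    """Return the expected counts E_ij = (row total * column total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    expected = expected_frequencies(observed)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


# Hypothetical 2 x 2 table (made-up counts, for illustration only).
table = [[10, 20],
         [30, 40]]
stat, df = chi_square_statistic(table)
print(expected_frequencies(table))  # expected count for every cell
print(round(stat, 4), df)           # test statistic and degrees of freedom
```

The calculated statistic would then be compared with the tabulated critical value for the chosen level of significance, as in Steps 5 and 6.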
Example 8.6: The following data on eye colour and hair colour for 6800 individuals were obtained from a
source:
                       Hair colour
Eye colour     Fair     Brown    Black    Red      Total
Blue           1768     808      190      47       2813
Green          946      1387     746      43       3122
Brown          115      444      288      18       865
Total          2829     2639     1224     108      6800
Test the hypothesis that hair colour and eye colour are independently distributed (there is no association
between colour of eye and colour of hair) at the $\alpha$ = 0.01 level of significance.
Solution:
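1. $H_0$: There is no association between hair colour and eye colour.
$H_1$: There is an association between hair colour and eye colour.
2. $\alpha$ = 0.01.
3. Calculate the expected frequencies, $E_{ij}$ = (row total)(column total)/n:
$E_{11} = \frac{(2813)(2829)}{6800} = 1170.29$
$E_{12} = \frac{(2813)(2639)}{6800} = 1091.69$
and so on for the remaining cells.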
Therefore, the contingency table for expected frequencies is as follows:

                       Hair colour
Eye colour     Fair     Brown    Black    Red      Total
Blue           1170.29  1091.69  506.34   44.68    2813
Green          1298.84  1211.61  561.96   49.58    3122
Brown          359.87   335.70   155.70   13.74    865
Total          2829     2639     1224     108      6800
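4. Calculate the test statistic:
$$\chi^2_{cal} = \sum\sum\frac{(O_{ij}-E_{ij})^2}{E_{ij}} = \frac{(1768-1170.29)^2}{1170.29} + \frac{(808-1091.69)^2}{1091.69} + \cdots + \frac{(18-13.74)^2}{13.74} \approx 1074.4$$
5. Critical value: $v$ = (r - 1)(c - 1) = (3 - 1)(4 - 1) = (2)(3) = 6, so $\chi^2_{0.01}(6) = 16.812$.
6. Since $\chi^2_{cal} \approx 1074.4 > \chi^2_{0.01}(6) = 16.812$, reject $H_0$.

7. Conclusion: There is an association between hair colour and eye colour. That is, hair colour and eye colour
are dependent.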
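The arithmetic of Example 8.6 can be checked with a short Python sketch. The variable and function names are illustrative assumptions. The critical value 16.812 is the usual tabulated chi-square value for a right-tail area of 0.01 with 6 degrees of freedom. The helper for the right-tail area uses a closed form that holds only when the degrees of freedom are even.

```python
import math

# Observed counts from Example 8.6
# (rows: Blue, Green, Brown eyes; columns: Fair, Brown, Black, Red hair).
observed = [
    [1768, 808, 190, 47],
    [946, 1387, 746, 43],
    [115, 444, 288, 18],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence: (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_right_tail_even_df(x, df):
    """Right-tail area of a chi-square distribution; valid for even df only."""
    half = x / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


critical_value = 16.812  # tabulated value for alpha = 0.01 and 6 degrees of freedom
print([round(e, 2) for e in expected[0]])           # first row of expected counts
print(round(chi_sq, 1), df)                         # test statistic and degrees of freedom
print(chi_sq > critical_value)                      # True means: reject the null hypothesis
print(chi2_right_tail_even_df(critical_value, df))  # area to the right of the critical value
```

Because the calculated statistic is far larger than the critical value, the decision to reject the null hypothesis does not depend on small rounding differences in the expected frequencies.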
UNIT NINE: SIMPLE LINEAR REGRESSION AND CORRELATION
Objectives:
Having studied this unit, you should be able to:
 formulate a simple linear regression model.
 express quantitatively the magnitude and direction of the association between two variables
Introduction
The statistical methods discussed so far are used to analyze the data involving only one variable. Often an
analysis of data concerning two or more variables is needed to look for any statistical relationship or association
between them. Thus, regression and correlation analysis are helpful in ascertaining the probable form of the
relationship between variables and the strength of the relationship.
9.1 Simple linear regression analysis

Regression analysis is the statistical method that helps to formulate a functional relationship between two or
more variables. It can be used for assessment of association, estimation and prediction. For instance, one might
be interested in formulating a statistical model to relate the heights of fathers and their sons, blood pressure and
age, fertilizer amount and yield, etc.
A simple model relating a dependent (response) variable Y to a single predictor variable X assumes
a linear relationship.
The first step in regression analysis involving two variables is to construct a scatter plot (diagram) of the
observed data. A scatter diagram is a plot of all ordered pairs $(x_i, y_i)$ on the coordinate plane, which is helpful
for determining an apparent relationship between two variables.
The simple linear regression of Y on X can be expressed with respect to the population parameters $\alpha$ and $\beta$
as
$$Y = \alpha + \beta X + \varepsilon$$
where $\alpha$ = y-intercept that represents the mean value of the dependent variable Y when the independent
variable X is zero; $\beta$ = slope of the regression line that represents the change in the mean of Y for a unit
change in the value of X; $\varepsilon$ = error term.
The population parameters $\alpha$ and $\beta$ can be estimated from sample data using the least squares technique. The
estimators of $\alpha$ and $\beta$ are usually denoted by a and b, respectively. The resulting regression line is
$$\hat{y} = a + bX$$
and the equation is known as the fitted regression line. The estimated values of Y are denoted by $\hat{y}$. The
observed values of Y are denoted by y. The difference between the observed and the estimated values, $y - \hat{y}$,
is known as the error or residual, and is denoted by e. The residual can be positive, negative or zero.
A best fitting line is the one for which the sum of squares of the residuals, $\sum e_i^2$, has the minimum value. This is
called the method of least squares. According to this method, one would select a and b such that
$$\sum e_i^2 = \sum (y_i - a - bx_i)^2$$
is minimum. The solution of this minimization problem using partial differentiation is as follows:
$$b = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} \quad \text{and} \quad a = \bar{y} - b\bar{x}$$
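The partial differentiation step can be written out as follows. This is the standard textbook derivation of the normal equations; the symbol S for the sum of squared residuals is introduced here only as a shorthand and is not used elsewhere in these notes.

```latex
\begin{align*}
S(a,b) &= \sum_{i=1}^{n} (y_i - a - b x_i)^2 \\
\frac{\partial S}{\partial a} = -2\sum_{i=1}^{n} (y_i - a - b x_i) = 0
  &\;\Rightarrow\; \sum y_i = na + b\sum x_i \\
\frac{\partial S}{\partial b} = -2\sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0
  &\;\Rightarrow\; \sum x_i y_i = a\sum x_i + b\sum x_i^2 \\
\text{Solving the two equations:}\quad
b &= \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2},
\qquad a = \bar{y} - b\bar{x}
\end{align*}
```

Dividing the first normal equation by n gives $\bar{y} = a + b\bar{x}$, so the fitted line always passes through the point of means $(\bar{x}, \bar{y})$.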
Example 9.1: A researcher wants to find out if there is any relationship between the height of a son and that of
his father. He took a random sample of 6 fathers and their sons. The heights (in inches) are given in the table below:
Height of father (X)     63   65   64   65   67   68
Height of the son (Y)    66   68   65   67   69   70
i) Draw the scatter diagram and comment on the type of relationship.
ii) Fit the regression line of Y on X.
iii) Predict the height of the son if his father's height is 66 inches.
Solution:
i) From the scatter plot of Y against X one can see that the points lie roughly on a straight line with positive slope.
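ii) From the data, n = 6, $\sum x = 392$, $\sum y = 405$, $\sum xy = 26476$, $\sum x^2 = 25628$, so that
$b = \frac{6(26476) - (392)(405)}{6(25628) - (392)^2} = \frac{96}{104} = 0.923$ and $a = \bar{y} - b\bar{x} = 67.5 - 0.923(65.33) = 7.2$
Then the fitted (regression) line of Y on X is given by:

$\hat{y}$ = 7.2 + 0.923X
 The slope of the line, i.e. b = 0.923, tells us that a unit (one inch) increase in the height of the father
is associated with an average increase of 0.923 inch in the height of the son.
 The y-intercept of the line, i.e. a = 7.2, is the estimated mean value of Y when the value of X is zero (do you
think that the intercept is meaningful?)
iii) $\hat{y}$ = 7.2 + 0.923(66) = 68.118; thus the predicted height of the son is 68.118 inches.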
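The figures in Example 9.1 can be reproduced with a short Python sketch of the least squares formulas b = (nΣxy - ΣxΣy)/(nΣx² - (Σx)²) and a = ȳ - b x̄. The variable and function names are illustrative assumptions, not part of the notes.

```python
fathers = [63, 65, 64, 65, 67, 68]  # X, height of father in inches
sons = [66, 68, 65, 67, 69, 70]     # Y, height of son in inches

n = len(fathers)
sum_x = sum(fathers)
sum_y = sum(sons)
sum_xy = sum(x * y for x, y in zip(fathers, sons))
sum_x2 = sum(x * x for x in fathers)

# Least squares slope and intercept.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n


def predict(x):
    """Estimated mean height of the son for a father of height x."""
    return a + b * x


# Residuals: observed value minus fitted value for each pair.
residuals = [y - predict(x) for x, y in zip(fathers, sons)]

print(sum_x, sum_y, sum_xy, sum_x2)
print(round(b, 3), round(a, 1))
print(round(predict(66), 3))
print(round(sum(residuals), 10))
```

With unrounded coefficients the prediction for X = 66 is about 68.115 rather than 68.118; the small difference comes from rounding a and b before substituting. The residuals of a least squares line with an intercept sum to zero up to floating-point error.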
9.2 The covariance and the correlation coefficient

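The correlation coefficient measures the degree of linear relationship between two variables. The population
correlation coefficient is represented by $\rho$ and its estimator is r. For a set of n pairs of sample values X and Y,
Pearson's correlation coefficient is calculated as the ratio of the covariance of the variables X and Y to the
product of the standard deviations of X and Y. Symbolically,
$$r = \frac{\operatorname{cov}(X,Y)}{s_X s_Y} = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum (x_i-\bar{x})^2 \sum (y_i-\bar{y})^2}}$$
Alternatively, Pearson's correlation coefficient r can be obtained as:
$$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}$$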
Properties of Pearson's correlation coefficient r:

o It is appropriate to calculate r when both variables X and Y are measured on an interval or ratio scale.
o The value of r is independent of the units in which X and Y are measured; i.e., it is a pure number.
o The value of r ranges from -1 to +1.
o r = +1 indicates a perfect linear relationship between X and Y with positive slope.
o r = -1 indicates a perfect linear relationship between X and Y with negative slope.
o r = 0 indicates no linear relationship between the two variables X and Y.
o As r approaches +1, the linear relationship between the two variables becomes stronger and positive.
o As r approaches -1, the linear relationship between the two variables becomes stronger and negative.
o As r approaches 0, the linear relationship between the two variables becomes weaker.
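Some of these properties can be illustrated numerically. The sketch below is only an illustration: the function name pearson_r and all data values are made up and do not come from these notes. It computes r as Σ(x - x̄)(y - ȳ) divided by the square root of Σ(x - x̄)² Σ(y - ȳ)².

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]        # made-up values
rising = [2 * x + 1 for x in xs]      # exact line with positive slope
falling = [-3 * x + 10 for x in xs]   # exact line with negative slope
noisy = [2.1, 3.9, 6.2, 7.8, 10.1]    # roughly linear, made-up values

# A perfect but non-linear (quadratic) relationship.
curve_x = [-2.0, -1.0, 0.0, 1.0, 2.0]
curve_y = [x ** 2 for x in curve_x]

r_noisy = pearson_r(xs, noisy)
# Changing the unit of X (for example inches to centimetres) should not change r.
r_rescaled = pearson_r([2.54 * x for x in xs], noisy)

print(pearson_r(xs, rising), pearson_r(xs, falling))
print(r_noisy, r_rescaled)
print(pearson_r(curve_x, curve_y))
```

The quadratic example gives r = 0 even though Y is completely determined by X, which shows that r = 0 rules out only a linear relationship.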
Examples of correlation coefficients:
Example 9.2: In some locations, there is a strong association between concentrations of two different pollutants.
An article reports the accompanying data on ozone concentration x (ppm) and secondary carbon concentration y:
x     0.066   0.088   0.120   0.050   0.162   0.186   0.057   0.100
y     4.6     11.6    9.5     6.3     13.8    15.4    2.5     11.8

x     0.112   0.055   0.154   0.074   0.111   0.140   0.071   0.110
y     8.0     7.0     20.6    16.6    9.2     17.9    2.8     13.0
a. Calculate the correlation coefficient and comment on the strength and direction of the relationship
between the two variables.
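Solution: The summary quantities are
n = 16, $\sum x = 1.656$, $\sum y = 170.6$, $\sum x^2 = 0.196912$, $\sum xy = 20.0397$, $\sum y^2 = 2253.56$
Pearson's correlation coefficient is
$$r = \frac{16(20.0397) - (1.656)(170.6)}{\sqrt{\left[16(0.196912) - (1.656)^2\right]\left[16(2253.56) - (170.6)^2\right]}} = \frac{38.1216}{\sqrt{(0.408256)(6952.6)}} = 0.716$$
The value r = 0.716 indicates a fairly strong, positive linear relationship between ozone
concentration and secondary carbon concentration.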
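The summary quantities and the value of r in Example 9.2 can be checked with a few lines of Python. This is only a sketch with illustrative variable names; it uses the computational form r = (nΣxy - ΣxΣy) / sqrt([nΣx² - (Σx)²][nΣy² - (Σy)²]).

```python
import math

# Data from Example 9.2.
ozone = [0.066, 0.088, 0.120, 0.050, 0.162, 0.186, 0.057, 0.100,
         0.112, 0.055, 0.154, 0.074, 0.111, 0.140, 0.071, 0.110]
carbon = [4.6, 11.6, 9.5, 6.3, 13.8, 15.4, 2.5, 11.8,
          8.0, 7.0, 20.6, 16.6, 9.2, 17.9, 2.8, 13.0]

n = len(ozone)
sum_x = sum(ozone)
sum_y = sum(carbon)
sum_x2 = sum(x * x for x in ozone)
sum_y2 = sum(y * y for y in carbon)
sum_xy = sum(x * y for x, y in zip(ozone, carbon))

# Computational form of Pearson's correlation coefficient.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(n, round(sum_x, 3), round(sum_y, 1))
print(round(sum_x2, 6), round(sum_xy, 4), round(sum_y2, 2))
print(round(r, 3))
```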