Statistics Full Handout

INTRODUCTION TO PROBABILITY AND STATISTICS
CHAPTER ONE: INTRODUCTION TO STATISTICS

1.1 Definition and classification of Statistics
The word statistics is defined in different ways depending on its use in the plural and
singular sense.
In the plural sense, statistics means a collection of numerical facts or figures
(the raw data themselves).
E.g. 1. Vital statistics (numerical data on marriages, births, deaths, etc.).
2. "The average mark of students in the statistics course is 70%" would be considered
statistics, whereas "Abebe got 90% in the statistics course" is not statistics.
Remark: statistics are aggregates of facts. Single, isolated figures are not statistics,
as they cannot be compared and are unrelated to one another.
In its singular sense, Statistics is the subject that deals with the methods of
collecting, organizing, presenting, analyzing and interpreting statistical data.
Classification of Statistics
Statistics is broadly divided into two categories based on how the collected data are used.
Descriptive Statistics: deals with describing the collected data without drawing
conclusions beyond them.
Example 1.1: Suppose that the marks of 6 computer science students in the Statistics
course are 40, 45, 50, 60, 70 and 80. The average mark of the 6 students is
(40 + 45 + 50 + 60 + 70 + 80)/6 = 345/6 = 57.5, and this is considered descriptive statistics.
Inferential Statistics: deals with making inferences and/or conclusions about a
population based on data obtained from a sample of observations. It includes
performing hypothesis tests, determining relationships among variables and making
predictions.
Example 1.2: In the above example, if we conclude that the average mark in the Statistics
course for all science students is 57.5, then we are doing inferential statistics (drawing a
conclusion about the population based on the sample observations).
1.2 Stages of Statistical Investigation

A statistical investigation passes through the following five stages: collection,
organization, presentation, analysis and interpretation of data.
Collection of data: This is the process of obtaining measurements or counts, that is,
obtaining raw data.
Data can be collected in a variety of ways; one of the most common methods is through
the use of a sample or census survey.
Organization of data: Data collected from published sources are generally in organized
form. However, if an investigator has collected data through a survey, it is necessary to
edit these data in order to correct any apparent inconsistencies, ambiguities and
recording errors.
This phase also includes correcting the data for errors, grouping the data into classes and
tabulating them.
Presentation of data: After the data have been collected and organized, they can be
presented in the form of tables, charts, diagrams and graphs. Presenting the data in an
orderly manner facilitates both the understanding and the analysis of the data.
Analysis of data: The basic purpose of data analysis is to extract useful information for
decision making. The analysis may simply be a critical observation of the data to draw some
meaningful conclusions, or it may involve highly complex and sophisticated
mathematical techniques.
Interpretation of data: Interpretation means drawing conclusions from the data
collected and analyzed. Correct interpretation leads to valid conclusions of the study
and thus can aid in decision making.
1.3 Definition of some statistical terms
Population: the totality of objects under study. The population represents the target
of an investigation, and the objective of the investigation is to draw conclusions about the
population; hence it is sometimes called the target population. The word population does not
necessarily refer to people.
Examples: all clients of a telephone company, a population of families, etc.
The population could be finite or infinite (an imaginary collection of units).
Sample: a part or subset of the population under study.
Sampling frame: the list of all units of the population from which the sample can be
drawn.
E.g. the list of all students of AASTU, the list of all residential houses in A.A. city, etc.
Survey: an investigation of a certain population to assess its characteristics. It may be a
census or a sample survey.
Census survey: a complete enumeration of the population under study.
Sample survey: the process of collecting data covering a representative part or portion of
a population.
Parameter: a statistical measure of a population, or a summary value calculated from a
population. Examples: population mean, range, proportion, variance, etc.
Statistic: a descriptive measure of a sample, or a summary value calculated from
a sample.
Sampling: the process or method of selecting a sample from the population.
Sample size: the number of elements or observations to be included in the sample.
An element: a member of a sample or population. It is the specific subject or object (for
example a person, firm, item, etc.) about which information is collected.
Variable: an item of interest that can take numerical or non-numerical values for
different elements. It may be qualitative or quantitative. Example: age, weight, sex,
marital status, etc.
Observation (measurement):-is the value of a variable for an element.
Qualitative variables: variables that assume non-numerical values. They can be
categorized and are usually called attributes. Example: sex, marital status, ID
number, etc.
Quantitative variables: variables that assume numerical values. E.g. age, weight,
etc.
1.4 Applications, uses and limitations of Statistics
Statistics can be applied in any field of study which seeks quantitative evidence. For
instance, Engineering, Economics, Natural Science, etc.
Engineering: Statistics has wide application in engineering, for example:
• To compare the breaking strength of two types of materials.
• To determine the reliability of a product.
• To control the quality of products in a given production process.
• To compare the improvement of yield due to certain additives such as fertilizers,
herbicides, etc.
Functions/Uses of Statistics
The following are some uses of statistics:
• It condenses and summarizes a mass of data: the original set of data (raw data) is
normally voluminous and disorganized until it is summarized and expressed in a few
presentable, understandable and precise figures.
• Statistics facilitates comparison of data: measures obtained from different sets of data
can be compared to draw conclusions about those sets. Statistical values such as averages,
percentages, ratios, rates, coefficients, etc., are the tools used for comparing
sets of data.
• Statistics helps to predict future trends: statistics is very useful for analyzing past
and present data and forecasting future events.
• Statistics helps to formulate and review policies.
• Formulating and testing hypotheses: statistical methods are extremely useful in
formulating and testing hypotheses and in developing new theories.
Limitations of Statistics
Some of these limitations are:
a) It does not deal with individual values: as discussed earlier, statistics deals with
aggregates of facts. For example, the wage earned by an individual worker at any one time,
taken by itself, is not statistics.
b) It does not deal with qualitative characteristics directly: statistics is not directly
applicable to qualitative characteristics such as beauty, honesty, poverty, standard of living
and so on, since these cannot be expressed in quantitative terms.
c) Statistical conclusions are not universally true: since statistics is not an exact
science, as the natural sciences are, statistical conclusions are true only under
certain assumptions.
d) It can be misused: statistics cannot be used to full advantage in the absence of a proper
understanding of the subject matter.
1.5 Levels of Measurement
Proper knowledge about the nature and type of data to be dealt with is essential in order
to specify and apply the proper statistical method for their analysis and inferences.
Scale Types
Measurement is the assignment of values to objects or events in a systematic fashion.
Four levels of measurement scales are commonly distinguished: nominal, ordinal,
interval, and ratio. The first two are qualitative while the last two are quantitative.
Nominal scale: The values of a nominal attribute are just different names, i.e., nominal
attributes provide only enough information to distinguish one object from another.
They are qualities with no ranking or ordering and no numerical or quantitative value. Such
data consist of names, labels and categories.
Example 1.3: Eye color: brown, black, etc.; sex: male, female.
• In this scale, one value is merely different from another.
• Arithmetic operations (+, -, *, ÷) are not applicable, and order comparisons (<, >) are
impossible; we can only tell whether two values are the same or different (=, ≠).
Ordinal scale: defined as nominal data that can be ordered or ranked.
• The data can be arranged in some order, but the differences between the data values are
meaningless.
• Data consisting of an ordering or ranking of measurements are said to be on an
ordinal scale of measurement. That is, the values of an ordinal scale provide
enough information to order objects.
• One value is different from, and greater/better/less than, the other.
• Arithmetic operations (+, -, *, ÷) are impossible; comparisons (<, >, ≠, etc.) are
possible.
Example 1.4: letter grading (A, B, C, D, F), rating scales (excellent, very good, good,
fair, poor), military rank (general, colonel, lieutenant, etc.).
Interval Level: data are defined as ordinal data for which the differences between data values
are meaningful. However, there is no true zero, or starting point, and ratios of data
values are meaningless. Note: Celsius and Fahrenheit temperature readings have no
meaningful zero, so their ratios are meaningless.
In this measurement scale:
• One value is different from another and better/greater than it by a definite amount.
• It is possible to add and subtract. For example, 80°C - 50°C = 30°C and 70°C - 40°C
= 30°C.
• Multiplication and division are not meaningful. For example, 60°C = 3(20°C), but this
does not imply that an object at 60°C is three times as hot as an object at
20°C.
The most common examples are temperature and IQ.
Ratio scale: Similar to the interval scale, except that there is a true zero (absolute
absence), or starting point, and the ratios of data values have meaning.
• Arithmetic operations (+, -, *, ÷) are applicable. For ratio variables, both
differences and ratios are meaningful.
• One value differs from another (is larger, taller, better or less) by a definite amount
and by a definite number of times.
• This measurement scale provides more information than the interval scale of
measurement.
Example 1.5: weight, age, number of students.
CHAPTER TWO: METHODS OF DATA COLLECTION AND PRESENTATION
2.1 Methods of Data Collection
Data: a measurement or observation value recorded for a certain element or variable. It
is the raw material of statistics. It can be obtained either by measurement or by counting.
Sources of data
Statistical data may be classified into two categories depending on their source.
Primary data: data collected by the investigator himself or herself for the purpose of a
specific inquiry or study. Three of the most common methods of collecting primary data are:
• Telephone survey
• Mailed questionnaire
• Personal interview.
Secondary data: when an investigator uses data that have already been collected by
others, such data are called secondary data. Sources of secondary data include books,
reports, magazines, etc.
2.2 Methods of Data Presentation
The presentation of data is broadly classified into the following two categories:
• Frequency distribution (tabular presentation)
• Diagrammatic and graphic presentation.
2.2.1 Frequency distribution
Frequency: the number of times a certain value or class of values occurs.
Frequency distribution (FD): the organization of raw data in table form using classes
and frequencies.
There are three types of FD, and there are specific procedures for constructing each type.
The three types are:
I. Categorical FD, II. Ungrouped FD and III. Grouped FD.

I. Categorical FD: used for data that can be placed in specific categories, such as
nominal- or ordinal-level data.
Example 2.1: Twenty-five patients were given a blood test to determine their blood type.
The data are shown below: A B B AB O O O B AB B B B O A O O O AB AB A O O B
A.
Solution: Since the data are categorical, we take the four blood types as classes and
construct an FD as shown below.
Step 1: Make a table with columns for class, tally, frequency and percent.
Step 2: Tally the data and place the result under the column Tally.
Step 3: Count the tallies and place the result under the column Frequency.
Step 4: Find the percentage of values in each class using % = (f/n) × 100, where f is the
class frequency and n is the total number of observations.
Class   Tally         Frequency   Percent
A       /////         5           5/25 × 100 = 20%
B       ///// //      7           28%
AB      ////          4           16%
O       ///// ////    9           9/25 × 100 = 36%
Total                 25          100%
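The tally-and-percent steps can be sketched in Python as below. This is an illustration only: the function name `categorical_fd` is hypothetical, and the usage builds a list directly from the counts shown in the table (5 A, 7 B, 4 AB, 9 O), since the order of observations does not affect the frequencies.

```python
from collections import Counter

def categorical_fd(values, classes):
    """Return (class, frequency, percent) rows for categorical data.

    Steps 2-3: count how often each class occurs (the tally).
    Step 4: percent = f / n * 100.
    """
    counts = Counter(values)
    n = len(values)
    return [(c, counts[c], counts[c] * 100 / n) for c in classes]

# Hypothetical usage: a list with the same counts as the table above.
blood = ["A"] * 5 + ["B"] * 7 + ["AB"] * 4 + ["O"] * 9
for cls, f, pct in categorical_fd(blood, ["A", "B", "AB", "O"]):
    print(f"{cls:<3} {f:>2} {pct:5.1f}%")
```

Using `Counter` replaces manual tallying; listing the classes explicitly keeps the rows in the order of the table.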
II. Ungrouped Frequency Distribution (UFD): a table listing each distinct raw-score
value that occurs in the data, along with the number of times it occurs. It is often
constructed for a small set of data or for data on a discrete variable.
Constructing an ungrouped frequency distribution:
• First find the smallest and largest raw scores in the collected data.
• Arrange the data in order of magnitude and count the frequency of each value.
• To facilitate counting, one may include a column of tallies.
Example 2.2: The following data represent the marks of 20 students.
80 76 90 85 80 70 60 62 70 85 65 60 63 74 75 76 70 70 80 85
Construct an ungrouped frequency distribution.
Solution:
Make a table as shown, tally the data and compute the frequencies.
Mark        60   62   63   65   70     74   75   76   80    85    90   Total
Tally       //   /    /    /    ////   /    /    //   ///   ///   /
Frequency   2    1    1    1    4      1    1    2    3     3     1    20
Each individual value is presented separately; that is why it is called an ungrouped
frequency distribution.
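Counting each distinct mark can be checked with a short Python sketch (an illustration only; `ungrouped_fd` is a hypothetical helper name). It lists every value in increasing order with its tally and frequency.

```python
from collections import Counter

def ungrouped_fd(data):
    """Sorted (value, frequency) pairs: arrange values by magnitude and count each."""
    return sorted(Counter(data).items())

marks = [80, 76, 90, 85, 80, 70, 60, 62, 70, 85,
         65, 60, 63, 74, 75, 76, 70, 70, 80, 85]
table = ungrouped_fd(marks)
for value, freq in table:
    print(value, "/" * freq, freq)      # value, tally, frequency
print("Total", sum(f for _, f in table))
```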
III. Grouped Frequency Distribution (GFD)
When the range of the data is large, the data must be grouped into classes that are more
than one unit in width.
Definition of some basic terms
• Grouped frequency distribution: an FD in which several values are grouped into
one class.
• Class limits (CL): the lower and upper values of a class, which separate one class from
another. The limits can actually appear in the data, and there is a gap between the upper
limit of one class and the lower limit of the next class.
• Unit of measure (U): the smallest possible difference between successive values,
e.g. 1, 0.1, 0.01, 0.001, …
• Class boundaries: separate one class in a grouped frequency distribution from the
next. A boundary has one more decimal place than the raw data, and there is no gap
between the upper boundary of one class and the lower boundary of the
succeeding class. The lower class boundary is found by subtracting half the unit of
measure from the lower class limit, and the upper class boundary is found by adding
half the unit of measure to the upper class limit.
• Class width (W): the difference between the upper and lower boundaries of any
class. The class width is also the difference between the lower limits (or the
upper limits) of two consecutive classes.
• Class mark (midpoint): found by adding the lower and upper class limits
(or boundaries) and dividing the sum by two.
• Cumulative frequency (CF): the number of observations less than the upper
class boundary, or greater than the lower class boundary, of a class.
• CF (less than type): the number of values less than the upper class boundary
of a given class.
• CF (greater than type): the number of values greater than the lower class
boundary of a given class.
• Relative frequency (Rf): the class frequency divided by the total frequency. This
gives the proportion (multiplied by 100, the percent) of values falling in that class:
$Rf_i = \frac{f_i}{n} = \frac{f_i}{\sum f_i}$
• Relative cumulative frequency (RCf): the class cumulative frequency divided by
the total frequency. It gives the proportion of values less than the upper
class boundary (or, for the greater than type, greater than the lower class boundary):
$RCf_i = \frac{CF_i}{n} = \frac{CF_i}{\sum f_i}$
STEPS IN CONSTRUCTING A GFD

1. Find the highest (H) and the lowest (L) values.
2. Compute the range: R = H - L.
3. Determine the number of classes using Sturges' formula:
K = 1 + 3.322 log n, where n = total frequency.
4. Find the class width (W) by dividing the range by the number of classes and
rounding up to the next whole unit:
W = R/K
5. Identify the unit of measure, usually 1, 0.1, 0.01, …
6. Pick a suitable starting point less than or equal to the minimum value. The
starting point is the lower limit of the first class; keep adding the class width
to get the remaining lower class limits.
7. Find the upper class limits: UCL_i = LCL_i + W - U. Then keep adding the width to get
the remaining upper class limits.
8. Tally the data and find the frequencies.
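Steps 2 to 7 can be sketched in Python as below, assuming whole-number data (unit of measure U = 1); the function names are hypothetical. The inputs are those of Example 2.3 (H = 39, L = 6, n = 20). With W = 7 the sketch yields the limits 6-12, 13-19, …, 34-40; the table in the worked solution is laid out with classes of width 6 instead, so this output illustrates the procedure rather than reproducing that table.

```python
import math

def sturges_k(n):
    """Step 3: number of classes K = 1 + 3.322 log10(n), before rounding."""
    return 1 + 3.322 * math.log10(n)

def class_width(data_range, k):
    """Step 4: W = R / K, rounded up to the next whole unit (U = 1 assumed)."""
    return math.ceil(data_range / k)

def class_limits(start, width, k, unit=1):
    """Steps 6-7: LCL_i = start + (i - 1) W and UCL_i = LCL_i + W - U."""
    return [(start + i * width, start + i * width + width - unit) for i in range(k)]

H, L, n = 39, 6, 20
R = H - L                  # step 2: range = 33
K = round(sturges_k(n))    # 5.32 rounded to 5 classes
W = class_width(R, K)      # 33 / 5 = 6.6, rounded up to 7
print(R, K, W)
print(class_limits(L, W, K))
```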
Example 2.3: Construct a grouped FD for the following data.
11 29 33 22 27 19 22 21 18 17 22 38 26 39 27 6 34 13 20
Solution:
1) Highest value = 39, lowest value = 6; 2) Range = 39 - 6 = 33; 3) K = 1 +
3.322 log 20 = 1 + 3.322(1.301) = 5.32 ≈ 5; 4) W = R/K = 33/5 = 6.6 ≈ 7; 5) U = 1;
6) LCL1 = 6; 7) find the upper class limits;
8) find the class boundaries; 9) find the class marks;
10) tally the data.
Class     Class          Class   Tally      Freq.   CF(<)   CF(>)   RF             RCF(>)
limits    boundaries     mark
6 - 11    5.5 - 11.5     8.5     //         2       2       20      2/20 = 0.1     1
12 - 17   11.5 - 17.5    14.5    //         2       4       18      2/20 = 0.1     0.9
18 - 23   17.5 - 23.5    20.5    ///// //   7       11      16      7/20 = 0.35    0.8
24 - 29   23.5 - 29.5    26.5    ////       4       15      9       4/20 = 0.2     0.45
30 - 35   29.5 - 35.5    32.5    ///        3       18      5       3/20 = 0.15    0.25
36 - 41   35.5 - 41.5    38.5    //         2       20      2       2/20 = 0.1     0.10
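Given the class limits and frequencies in the table above, the remaining columns follow mechanically. A hedged Python sketch (the helper `gfd_table` is hypothetical, and U = 1 is assumed) that recomputes the boundaries, class marks, both cumulative frequencies and the relative frequencies:

```python
def gfd_table(limits, freqs, unit=1):
    """Derive the columns of a grouped frequency distribution from class limits and frequencies."""
    n = sum(freqs)
    rows, below = [], 0            # 'below' = observations in earlier classes
    for (lcl, ucl), f in zip(limits, freqs):
        cf_greater = n - below     # values greater than this class's lower boundary
        below += f                 # values less than this class's upper boundary
        rows.append({
            "boundaries": (lcl - unit / 2, ucl + unit / 2),
            "mark": (lcl + ucl) / 2,
            "f": f,
            "cf_less": below,
            "cf_greater": cf_greater,
            "rf": f / n,
            "rcf_greater": cf_greater / n,
        })
    return rows

limits = [(6, 11), (12, 17), (18, 23), (24, 29), (30, 35), (36, 41)]
freqs = [2, 2, 7, 4, 3, 2]
for lim, row in zip(limits, gfd_table(limits, freqs)):
    print(lim, row)
```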
2.2.2 Diagrammatic presentation of data: Pie-chart, Bar charts, Pictograph
The three most commonly used diagrammatic presentations for discrete as well as
qualitative data are:
• Pie chart, bar chart and pictograph.
A) Pie chart:
A pie chart is a circle that is divided into sections or wedges according to the percentage
of frequencies in each category of the distribution. The angle of each sector is obtained
using:
$\text{Angle of a sector} = \frac{\text{Value of the part}}{\text{The whole quantity}} \times 360^\circ$
Example 2.4: Draw a pie chart to represent the following population data of a town.
Men    Women   Girls   Boys
2500   2000    4000    1500
Solutions:
Step 1: Find the percentage, Step 2: Find the number of degrees for each class.
Step 3: Using a protractor, graph each section and write its name with corresponding
percentage.
Class   Frequency   Percent   Degree
Men     2500        25        90
Women   2000        20        72
Girls   4000        40        144
Boys    1500        15        54
Total   10000       100       360
[Figure: pie chart of the town's population: Men 25%, Women 20%, Girls 40%, Boys 15%]
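The percentage and degree columns follow directly from the sector-angle formula. A minimal Python sketch (the function name `pie_angles` is hypothetical):

```python
def pie_angles(parts):
    """Percent and sector angle (degrees) of each part: value / whole x 100 and x 360."""
    whole = sum(parts.values())
    return {name: (value * 100 / whole, value * 360 / whole) for name, value in parts.items()}

population = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}
for name, (pct, deg) in pie_angles(population).items():
    print(f"{name:<6} {pct:5.1f}% {deg:6.1f} degrees")
```

Multiplying before dividing keeps these particular results exact whole numbers.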
B) Bar Charts
• Used to represent and compare the frequency distribution of discrete variables and
attributes or categorical series. Bars can be drawn either vertically or horizontally.
There are different types of bar charts; the most common are:
• Simple bar chart and component (sub-divided) bar chart.
• Simple bar chart: used to display data on one variable. The bars are thick lines
(narrow rectangles) having the same breadth.
Example 2.5: The numbers of students in the four departments of the Science College are
given as follows:
Department           Physics   Maths   Chemistry   Biology
Number of students   200       400     450         600
Male                 170       350     250         200
Female               30        50      200         400
Draw a simple bar chart of the number of students by department.
Solution:
[Figure: simple bar chart of the number of students (vertical axis: frequency) by department (horizontal axis): Physics 200, Maths 400, Chemistry 450, Biology 600]
• Component bar chart: when there is a desire to show how a total (or aggregate) is divided
into its component parts, we use a component bar chart.
Example 2.6: Draw a component (sub-divided) bar chart of the number of students by
department given in Example 2.5.
Solution:
[Figure: sub-divided bar chart; each department's bar (Physics, Maths, Chemistry, Biology) is split into Male and Female components; vertical axis: frequency, horizontal axis: department]
C) Pictograph: In this diagram, we represent data by means of picture symbols.
We choose a suitable picture to represent a definite number of units in which the
variable is measured.
2.2.3 Graphical Presentation of data
The histogram, the frequency polygon and the cumulative frequency graph (ogive) are the
most commonly applied graphical representations for continuous data.
Histogram: To construct a histogram, the class boundaries (or class marks) are
plotted on the horizontal axis and the class frequencies on the vertical axis.
Example 2.7: Construct a histogram to represent the following data.
Class limits   15-24   25-34   35-44   45-54   55-64   65-74   75-84
Frequency      3       4       10      15      12      4       2
Solution:
[Figure: histogram with class boundaries 14.5, 24.5, …, 84.5 on the horizontal axis and frequency on the vertical axis; bar heights 3, 4, 10, 15, 12, 4, 2]
Frequency polygon
A frequency polygon is a line graph where class frequencies are plotted against the class
marks and the successive points are connected by straight lines.
Example 2.8: Construct a frequency polygon to represent the data of Example 2.7.
Solution:
Class     Freq.   Class   Class         R.F.   R.F. (%)   Less than   More than
limits            mark    boundaries                      C.F.        C.F.
15 - 24   3       19.5    14.5 - 24.5   0.06   6%         3           50
25 - 34   4       29.5    24.5 - 34.5   0.08   8%         7           47
35 - 44   10      39.5    34.5 - 44.5   0.20   20%        17          43
45 - 54   15      49.5    44.5 - 54.5   0.30   30%        32          33
55 - 64   12      59.5    54.5 - 64.5   0.24   24%        44          18
65 - 74   4       69.5    64.5 - 74.5   0.08   8%         48          6
75 - 84   2       79.5    74.5 - 84.5   0.04   4%         50          2
Total     50                            1.00   100%
Adding two class marks with $f_i = 0$, namely 9.5 at the beginning and 89.5 at the end,
the following frequency polygon is plotted:
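[Figure: frequency polygon; frequency (vertical axis) against class mark (horizontal axis) at 9.5, 19.5, …, 89.5, joining the points (9.5, 0), (19.5, 3), (29.5, 4), (39.5, 10), (49.5, 15), (59.5, 12), (69.5, 4), (79.5, 2) and (89.5, 0)]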
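The points joined by the polygon can be generated from the table above. A minimal Python sketch with a hypothetical helper name, assuming equal class widths; drawing the line itself is left to any charting tool.

```python
def frequency_polygon_points(marks, freqs):
    """(class mark, frequency) points, closed with an empty class one width
    below the first mark and one width above the last mark."""
    width = marks[1] - marks[0]
    xs = [marks[0] - width] + list(marks) + [marks[-1] + width]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys))

marks = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
freqs = [3, 4, 10, 15, 12, 4, 2]
print(frequency_polygon_points(marks, freqs))
```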
Ogive (cumulative frequency polygon)
An ogive (pronounced "oh-jive") is a line graph that depicts cumulative frequencies, just as
the cumulative frequency distribution lists cumulative frequencies. Note that the ogive
uses class boundaries and cumulative frequencies along the horizontal and vertical scales,
respectively. There are two types of ogive, namely the less than ogive and the more than ogive.
The difference is that the less than ogive uses less than cumulative frequencies and the more
than ogive uses more than cumulative frequencies on the y-axis.
Example 2.9: Draw both types of ogive for the F.D. of Example 2.7.
Solutions:
[Figure: two panels, "The Less than Ogive" and "The More than Ogive"; horizontal axis: class boundaries 14.5 to 84.5; vertical axis: cumulative frequency, 0 to 60]
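The plotted points of both ogives can be derived from the class boundaries and frequencies of Example 2.7. A hedged Python sketch (the function name `ogive_points` is hypothetical): the less than ogive pairs each upper boundary with CF(<), and the more than ogive pairs each lower boundary with CF(>).

```python
from itertools import accumulate

def ogive_points(boundaries, freqs):
    """boundaries has len(freqs) + 1 entries (all class boundaries in order)."""
    n = sum(freqs)
    cf_less = list(accumulate(freqs))
    # Less than ogive: starts at (first boundary, 0), then CF(<) at each upper boundary.
    less_than = [(boundaries[0], 0)] + list(zip(boundaries[1:], cf_less))
    # More than ogive: CF(>) at each lower boundary, ending at (last boundary, 0).
    cf_more = [n - c for c in [0] + cf_less[:-1]]
    more_than = list(zip(boundaries[:-1], cf_more)) + [(boundaries[-1], 0)]
    return less_than, more_than

boundaries = [14.5 + 10 * i for i in range(8)]   # 14.5, 24.5, ..., 84.5
freqs = [3, 4, 10, 15, 12, 4, 2]
less_than, more_than = ogive_points(boundaries, freqs)
print(less_than)
print(more_than)
```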
CHAPTER THREE: MEASURES OF CENTRAL TENDENCY
3.1 The Summation Notation (Σ)

Statistical Symbols: Let a data set consist of a number of observations, represented by
$x_1, x_2, \ldots, x_n$, where n (the last subscript) denotes the number of observations in the data and
$x_i$ is the i-th observation. Then the sum of all the numbers ($x_i$'s), where i goes from 1 up to n, is
symbolically given by $\sum_{i=1}^{n} x_i$, $\sum x_i$ or $\sum x$; that is,
$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$
Here x denotes the whole set of numbers, $x_i$ a specific score in the set, and n the total
number of observations.
Example 3.1: For instance, a data set consisting of six measurements 2, 3, 9, 10, 8 and -2
is represented by $x_1, x_2, \ldots, x_6$, where $x_1 = 2$, $x_2 = 3$, $x_3 = 9$, $x_4 = 10$, $x_5 = 8$ and $x_6 = -2$.
Their sum is $\sum_{i=1}^{6} x_i = x_1 + x_2 + \cdots + x_6 = 2 + 3 + 9 + 10 + 8 + (-2) = 30$.
3.2 Properties of measures of central tendency
A good average should be:
1. rigidly defined (unique), 2. based on all observations under investigation, 3. easily
understood, 4. simple to compute, 5. suitable for further mathematical treatment, 6. little
affected by fluctuations of sampling, 7. not highly affected by extreme values.
3.3 Types of Measures of Central Tendency

Measures of central tendency: a single value that describes the characteristics of the
entire mass of data is called a measure of central tendency. The following measures of
central tendency are each suitable for particular types of data:
- Arithmetic mean (including the weighted arithmetic mean and the combined mean),
- Geometric mean, - Harmonic mean, - Median, - Mode (modal value).
3.3.1 Arithmetic Mean: The arithmetic mean is defined as the sum of the measurements
of the items divided by the total number of items. It is usually denoted by $\bar{x}$.
Arithmetic Mean for individual series
Suppose $x_1, x_2, \ldots, x_n$ are observed values in a sample of size n from a population of size
N, n < N. Then the arithmetic mean of the sample, denoted by $\bar{x}$, is given by
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$
If we take an entire population, the mean is denoted by μ and is given by:
$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N}$
where N stands for the total number of observations in the population.
Example 3.2: Consider the samples given below:
i. 46 54 21 35; ii. 10.5 2.4 3.6 5.9 8.7
Find the arithmetic mean of each sample.
Solution:
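i. The sample values are: 46 54 21 35
$\bar{x} = \frac{\sum x_i}{n} = \frac{46 + 54 + 21 + 35}{4} = \frac{156}{4} = 39$. The arithmetic mean of the sample values is 39.
ii. The sample values are: 10.5 2.4 3.6 5.9 8.7
$\bar{x} = \frac{\sum x_i}{n} = \frac{10.5 + 2.4 + 3.6 + 5.9 + 8.7}{5} = \frac{31.1}{5} = 6.22$. The arithmetic mean of the
sample values is 6.22.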
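Arithmetic mean for discrete data arranged in a frequency distribution
When the numbers $x_1, x_2, \ldots, x_k$ occur with frequencies $f_1, f_2, \ldots, f_k$, respectively, then
the mean can be expressed in a more compact form as:
$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$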
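Example 3.3: Calculate the arithmetic mean of the sample of numbers of students in 10
classes:
50 42 48 60 58 54 50 42 50 42
$\bar{x} = \frac{\sum x_i}{n} = \frac{50 + 42 + 48 + 60 + 58 + 54 + 50 + 42 + 50 + 42}{10} = \frac{496}{10} = 49.6 \approx 50$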
In this case there are three 42’s, one 48, three 50’s, one 54, one 58 and one 60. The
number of times each number occurs is called its frequency and the frequency is usually
denoted by f. The information in the sentence above can be written in a table, as follows.
Value, x_i        42    48   50    54   58   60
Frequency, f_i    3     1    3     1    1    1
x_i f_i           126   48   150   54   58   60
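The formula for the arithmetic mean for data of this type is
$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum f_i x_i}{\sum f_i}$
In this case we have:
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{126 + 48 + 150 + 54 + 58 + 60}{10} = \frac{496}{10} = 49.6 \approx 50$
The mean number of students per class is 49.6, or about 50.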
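Both forms of the arithmetic mean can be sketched in Python as follows (the function names are hypothetical; Python's standard `statistics.mean` would also work for the raw list). For Example 3.3, the raw-data form and the frequency form give the same result, 496/10 = 49.6.

```python
def mean(values):
    """Arithmetic mean of an individual series: sum of x_i divided by n."""
    return sum(values) / len(values)

def mean_from_frequencies(values, freqs):
    """Mean of a discrete frequency distribution: sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

classes = [50, 42, 48, 60, 58, 54, 50, 42, 50, 42]
print(mean(classes))
print(mean_from_frequencies([42, 48, 50, 54, 58, 60], [3, 1, 3, 1, 1, 1]))
```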
Arithmetic Mean for Grouped Continuous Frequency Distribution

If data are given in the form of a continuous frequency distribution, the sample mean can
be computed as
$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$,
where $x_i$ is the class mark of the i-th class, i = 1, 2, …, k; $f_i$ is the frequency of the
i-th class; and k is the number of classes.
Note that $\sum f_i = n$, the total number of observations.
Example 3.4: The following frequency table gives the heights (in inches) of 100 students
in a college.
Class Interval (CI)   60-62   62-64   64-66   66-68   68-70   70-72   Total
Frequency (f)         5       18      42      20      8       7       100
Calculate the mean.
Solution:
The formula to be used for the mean is:
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$
Let us calculate these values and put them in a table for convenience.
Class Interval (CI)   60-62   62-64   64-66   66-68   68-70   70-72   Total
Frequency (f)         5       18      42      20      8       7       100
Mid-point (x_i)       61      63      65      67      69      71
f x_i                 305     1134    2730    1340    552     497     6558
Substituting these values, with $\sum f_i = 100$, we get
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{6558}{100} = 65.58$. The mean height of the students is 65.58 inches.
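For grouped data the only extra step is replacing each class by its mark. A minimal Python sketch (hypothetical function name) that reproduces Example 3.4:

```python
def grouped_mean(intervals, freqs):
    """Mean of a grouped distribution, using each class mark (midpoint) as x_i."""
    marks = [(lo + hi) / 2 for lo, hi in intervals]
    return sum(f * x for x, f in zip(marks, freqs)) / sum(freqs)

heights = [(60, 62), (62, 64), (64, 66), (66, 68), (68, 70), (70, 72)]
print(grouped_mean(heights, [5, 18, 42, 20, 8, 7]))
```

The result is an estimate: it assumes the observations in each class are centred on the class mark.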
Properties of the Arithmetic Mean

• The algebraic sum of the deviations of a set of numbers $x_1, x_2, \ldots, x_n$ from their mean $\bar{x}$
is always zero, i.e.
$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
• The sum of squares of deviations from the mean is the least. That is, $\sum_{i=1}^{n} (x_i - A)^2$ is
minimum when $A = \bar{x}$.
• If the mean of $x_1, x_2, \ldots, x_n$ is $\bar{x}$, then
a) the mean of $x_1 \pm k, x_2 \pm k, \ldots, x_n \pm k$ will be $\bar{x} \pm k$;
b) the mean of $kx_1, kx_2, \ldots, kx_n$ will be $k\bar{x}$.
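These properties are easy to check numerically. The Python sketch below uses sample i of Example 3.2; the constant k = 5 is an arbitrary choice for illustration, and a check on one data set illustrates rather than proves the properties.

```python
data = [46, 54, 21, 35]          # sample i of Example 3.2, mean 39
n = len(data)
xbar = sum(data) / n

# Property 1: deviations from the mean sum to zero.
print(sum(x - xbar for x in data))

# Property 2: the sum of squared deviations is smallest at A = mean.
def ss(a):
    return sum((x - a) ** 2 for x in data)
print(ss(xbar), ss(xbar - 1), ss(xbar + 1))

# Property 3: adding k shifts the mean by k; multiplying by k scales it by k.
k = 5
print(sum(x + k for x in data) / n, sum(k * x for x in data) / n)
```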
Weighted Arithmetic Mean: When the observations have different weights, we use a
weighted average. Weights are assigned to each item in proportion to its relative
importance.
If $x_1, x_2, \ldots, x_n$ represent the values of the items and $w_1, w_2, \ldots, w_n$ are the corresponding
weights, then the weighted mean, $\bar{x}_w$, is given by
$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum w_i x_i}{\sum w_i}$
Example 3.5: A student's final grades in Mathematics, Physics, Chemistry and Biology
are A, B, D and C, respectively. If the respective credits for these courses are 4,
4, 3 and 2, determine the approximate average grade the student has obtained.
Solution: We use a weighted arithmetic mean, taking the weight of each course as the
number of credits for that course and converting the letter grades to grade points
(A = 4, B = 3, C = 2, D = 1).
Grade point (x)   4    3    1    2    Total
Credits (w)       4    4    3    2    13
xw                16   12   3    4    35
$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum w_i x_i}{\sum w_i} = \frac{35}{13} \approx 2.69$
The average grade of the student is approximately 2.69.
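A hedged Python sketch of the weighted mean (hypothetical function name). The grade-point mapping A = 4, B = 3, C = 2, D = 1 is the one implied by the table of Example 3.5.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

grade_points = {"A": 4, "B": 3, "C": 2, "D": 1}
grades = ["A", "B", "D", "C"]      # Maths, Physics, Chemistry, Biology
credits = [4, 4, 3, 2]
print(round(weighted_mean([grade_points[g] for g in grades], credits), 2))
```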
Combined mean: When a set of observations is divided into k groups, and $\bar{x}_1$ is the mean
of the $n_1$ observations of group 1, $\bar{x}_2$ is the mean of the $n_2$ observations of group 2, …, $\bar{x}_k$ is the
mean of the $n_k$ observations of group k, then the combined mean, denoted by $\bar{x}_c$, of all
observations taken together is given by
$\bar{x}_c = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2 + \cdots + \bar{x}_k n_k}{n_1 + n_2 + \cdots + n_k}$
This is a special case of the weighted mean in which the sample sizes are the weights.
Example 3.6: In the previous year there were two sections taking the Statistics course. At the
end of the semester, the two sections had average marks of 70 and 78, with 45 and
50 students, respectively. Find the mean mark for all the students.
Solution:
$\bar{x}_c = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2}{n_1 + n_2} = \frac{70(45) + 78(50)}{45 + 50} = \frac{7050}{95} = 74.21$
The combined mean of all the students is 74.21.
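The combined mean is a weighted mean with group sizes as weights, which a short Python sketch makes explicit (hypothetical function name). The second check uses two small made-up groups, [1, 3] and [4, 5, 6], to show that the combined mean equals the mean of all observations pooled together.

```python
def combined_mean(means, sizes):
    """Combined mean of k groups: sum(n_j * mean_j) / sum(n_j)."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)

print(round(combined_mean([70, 78], [45, 50]), 2))

# Made-up groups [1, 3] (mean 2) and [4, 5, 6] (mean 5): same as pooling all five values.
print(combined_mean([2, 5], [2, 3]), sum([1, 3, 4, 5, 6]) / 5)
```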
3.3.2 Geometric Mean

Like the arithmetic mean, the geometric mean is a calculated average. It is used when observed
values are measured as ratios, percentages, proportions, indices or growth rates.
Geometric mean for individual series: The geometric mean (G.M.) of an individual
series of positive numbers $x_1, x_2, \ldots, x_n$ is defined as the n-th root of their product:
$G.M. = \sqrt[n]{x_1 x_2 \cdots x_n} = \operatorname{antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log x_i\right)$
Example 3.7: Find the G.M. of 3 and 12.
Solution: $G.M. = \sqrt{3 \times 12} = \sqrt{36} = 6$
Example 3.8: Find the G.M. of 2, 4 and 8.
Solution: $G.M. = \sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$
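The antilog form of the formula translates directly into code. A minimal Python sketch (hypothetical function name) that averages base-10 logarithms and takes the antilog; Python 3.8 and later also provide `statistics.geometric_mean`.

```python
import math

def geometric_mean(values):
    """n-th root of the product, computed as the antilog of the mean of the
    log10 values (all values must be positive)."""
    logs = [math.log10(x) for x in values]
    return 10 ** (sum(logs) / len(logs))

print(geometric_mean([3, 12]))      # about 6 (floating-point)
print(geometric_mean([2, 4, 8]))    # about 4 (floating-point)
```

Working with logarithms avoids forming a very large product when there are many values.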
3.3.3 Harmonic Mean

It is a suitable measure of central tendency when the data pertain to speeds, rates and time.
The harmonic mean of n values is defined as n divided by the sum of their reciprocals.
Harmonic mean for individual series: If $x_1, x_2, \ldots, x_n$ are n observations, then the
harmonic mean is given by the following formula:
$H.M. = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}$
Example 3.9: A car travels 25 miles at 25 mph, 25 miles at 50 mph and 25 miles at 75
mph. Find the harmonic mean of the three velocities.
Solution:
$H.M. = \frac{3}{\frac{1}{25} + \frac{1}{50} + \frac{1}{75}} = \frac{3}{11/150} = \frac{450}{11} \approx 40.9$ mph
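A short Python sketch (hypothetical function name) computes the harmonic mean and, as a cross-check, the average speed as total distance over total time. The two agree because the three legs cover equal distances, which is why the harmonic mean suits this example. Python's standard library also offers `statistics.harmonic_mean`.

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals of the values (all non-zero)."""
    return len(values) / sum(1 / x for x in values)

speeds = [25, 50, 75]                       # mph, each over 25 miles
hm = harmonic_mean(speeds)
total_time = sum(25 / v for v in speeds)    # hours for the 75-mile trip
print(round(hm, 1), round(75 / total_time, 1))   # both 40.9
```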
3.3.4 Median
The median, as its name indicates, is the middle value of the ordered data; it
divides the data into two equal parts. It is obtained by arranging the data in increasing
or decreasing order of magnitude and is denoted by $\tilde{x}$.
Median for individual series
We arrange the sample in ascending order of the variable of interest. Then the median is
the middle value (if the sample size n is odd) or the average of the two middle values (if
the sample size n is even).
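For an individual series, the median is obtained by
a/ $\tilde{x} = \left(\frac{n+1}{2}\right)$th value if n is odd, and
b/ $\tilde{x} = \frac{\left(\frac{n}{2}\right)\text{th value} + \left(\frac{n}{2}+1\right)\text{th value}}{2}$ if n is even.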
Example 3.10: Find the median of the following data.
a/ -5 15 10 5 0 2 1 4 6 8
b/ 5 2 2 3 1 8 4
Solution:
a/ The data in ascending order are:
-5 0 1 2 4 5 6 8 10 15
n = 10, so n is even. The two middle values are the 5th and 6th observations, so the
median is
$\tilde{x} = \frac{5\text{th value} + 6\text{th value}}{2} = \frac{4 + 5}{2} = 4.5$
b/ The data in ascending order are:
1 2 2 3 4 5 8
n = 7 is odd, so the median is the ((7 + 1)/2)th = 4th observation, which is 3.
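Rules a/ and b/ can be sketched in Python as below (hypothetical function name; the standard `statistics.median` applies the same rule). Python lists are 0-based, so the ((n + 1)/2)th value is at index n // 2 when n is odd.

```python
def median(values):
    """Middle value of the sorted data (n odd) or the average of the two middle values (n even)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                   # the ((n + 1) / 2)th value, 1-based
    return (s[mid - 1] + s[mid]) / 2    # average of the (n/2)th and (n/2 + 1)th values

print(median([-5, 15, 10, 5, 0, 2, 1, 4, 6, 8]))   # 4.5
print(median([5, 2, 2, 3, 1, 8, 4]))               # 3
```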
Median for discrete data arranged in a frequency distribution: In this case also, the
median is obtained by the above formula. After arranging the values in increasing
order, find the smallest CF greater than or equal to the position obtained by formula a/ or b/
above; the corresponding value is the median.
Median for grouped continuous data: For continuous data, the median is obtained by the
following formula:
$\tilde{x} = L + \frac{w}{f_{med}}\left(\frac{n}{2} - CF\right)$
where L = the lower class boundary of the median class; w = the class width of the
median class; $f_{med}$ = the frequency of the median class; and CF = the cumulative frequency
of the class preceding the median class, that is, the sum of the frequencies of all classes lower
than the median class. The median class is the class that contains the (n/2)th
observation, whether n is odd or even, since the items have already lost their identity
once they are grouped into continuous classes.
Example 3.11: Calculate the median of the following frequency distribution.
C.I     1-5   6-10   11-15   16-20   21-25   26-30   31-35   Total
Freq.   4     8      12      6       3       4       3       40
Solution: Construct the less than cumulative frequency distribution:
C.I          1-5   6-10   11-15   16-20   21-25   26-30   31-35   Total
Freq.        4     8      12      6       3       4       3       40
Cum. Freq.   4     12     24      30      33      37      40
Since n = 40, n/2 = 20, and the smallest CF greater than or equal to 20 is 24; thus, the median class is the third class (11 – 15). For this class, L = 10.5, w = 5, $f_{med}$ = 12 and CF = 12. Applying the formula, we get:
$$\tilde{x} = 10.5 + \frac{5}{12}(20 - 12) = 10.5 + 3.33 \approx 13.83$$
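The grouped-median formula can be sketched in Python as follows. This assumes equal class widths and that the lower class boundaries are supplied directly; `grouped_median` is a hypothetical helper, not a library function.

```python
def grouped_median(lower_boundaries, freqs, width):
    """Median of a grouped continuous distribution: L + (w / f_med) * (n/2 - CF)."""
    n = sum(freqs)
    half = n / 2
    cf = 0  # cumulative frequency of the classes preceding the current class
    for L, f in zip(lower_boundaries, freqs):
        if cf + f >= half:  # first class whose less-than CF reaches n/2
            return L + (width / f) * (half - cf)
        cf += f
    raise ValueError("empty frequency distribution")


# Example 3.11: classes 1-5, 6-10, ..., 31-35 have lower boundaries 0.5, 5.5, ..., 30.5
boundaries = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5]
freqs = [4, 8, 12, 6, 3, 4, 3]
print(round(grouped_median(boundaries, freqs, 5), 2))  # 13.83
```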
3.3.5 The Mode or modal value

The mode or modal value is the value with the highest frequency; it is denoted by $\hat{x}$.
Mode of individual series: The mode of an individual series (raw data) is simply obtained by locating the observation with the maximum frequency.
Example 3.12: Consider the following data:

a. 30 45 69 70 32 18 32. The mode ($\hat{x}$) = 32.
b. 10 20 30 10 40 30. The modes ($\hat{x}$) are 10 and 30 (the data are bimodal).
c. 10 40 30 20 50 60. There is no mode, since every value occurs only once.
Mode for discrete data arranged in a frequency distribution: For discrete grouped data, the mode is found simply by locating the value(s) with the highest frequency.
Mode for Grouped Continuous Frequency Distribution

For grouped continuous data, only the modal class, the class with the highest frequency, can be located directly. After locating this class, the mode is interpolated using:
$$\hat{x} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2}\, w$$
where L = the lower class boundary of the modal class; $\Delta_1 = f_{mod} - f_1$; $\Delta_2 = f_{mod} - f_2$; w = the common class width; $f_1$ = the frequency of the class immediately preceding the modal class; $f_2$ = the frequency of the class immediately succeeding the modal class; and $f_{mod}$ = the frequency of the modal class.
Example 3.13: Calculate the mode for the frequency distribution of data of example
3.11.
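Solution: By inspection, the mode lies in the third class, where L = 10.5, $f_{mod}$ = 12, $f_1$ = 8, $f_2$ = 6 and w = 5, so $\Delta_1 = 12 - 8 = 4$ and $\Delta_2 = 12 - 6 = 6$.
Using the formula, the mode is:
$$\hat{x} = 10.5 + \frac{12 - 8}{(12 - 8) + (12 - 6)} \times 5 = 10.5 + \frac{4}{10} \times 5 = 12.5$$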
3.4 The Relationship of the Mean, Median and Mode

In a symmetrical distribution the mean, median and mode coincide; that is, mean = median = mode. However, for a moderately asymmetrical (non-symmetrical) distribution, the mean and mode lie at the two ends and the median lies between them, and they satisfy the following important empirical relationship:
Mean − Mode = 3(Mean − Median)
Example 3.14: In a moderately asymmetrical distribution, the mean and the mode are 30
and 42 respectively. What is the median of the distribution?
Solution:
Rearranging Mean − Mode = 3(Mean − Median) gives Median = (2 × Mean + Mode)/3, so Median = (2 × 30 + 42)/3 = 102/3 = 34.
Hence the median of the distribution is 34.
3.5 Measures of Non-central Locations
The median is the value of the middle item; it divides the data into two equal parts and is found by arranging the data in increasing or decreasing order of magnitude. Quantiles generalise this idea: they divide a given set of data into approximately equal subdivisions and are obtained by the same procedure as the median. They are averages of position (measures of non-central location). Among them are quartiles, deciles and percentiles.
Quartiles: values which divide the data set into approximately four equal parts, denoted by Q1, Q2 and Q3. The first quartile (Q1) is also called the lower quartile and the third quartile (Q3) the upper quartile. The second quartile (Q2) is the median.
• Quartiles for Individual series:
Let $x_1, x_2, \ldots, x_n$ be n ordered observations. The ith quartile $Q_i$ is the value of the item in the $\left[\frac{i(n+1)}{4}\right]^{\text{th}}$ position, i = 1, 2, 3.
That is, after arranging the data in ascending order, Q1, Q2 and Q3 are obtained by:
$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}}$ value, $Q_2 = \left(\frac{2(n+1)}{4}\right)^{\text{th}}$ value and $Q_3 = \left(\frac{3(n+1)}{4}\right)^{\text{th}}$ value.
• Quartiles for discrete data arranged in a frequency distribution: In this case, too, we follow the same procedure as for the median. That is, we construct the less than cumulative frequency distribution and apply the quartile formula for individual series.
• Quartiles for continuous data: For continuous data, use the following formula:
$$Q_i = L + \frac{w}{f_{Q_i}}\left(\frac{in}{4} - CF\right), \quad i = 1, 2, 3$$
where L, w, $f_{Q_i}$ and CF are defined in the same way as for the median, i.e.
$Q_1 = L + \frac{w}{f_{Q_1}}\left(\frac{n}{4} - CF\right)$, $Q_2 = L + \frac{w}{f_{Q_2}}\left(\frac{2n}{4} - CF\right)$ and $Q_3 = L + \frac{w}{f_{Q_3}}\left(\frac{3n}{4} - CF\right)$.
The class in question is the one containing the (in/4)th value; that is, the class with the smallest cumulative frequency greater than or equal to in/4 is the class of the ith quartile.
Deciles: values dividing the data into approximately ten equal parts, denoted by D1, D2, …, D9.
• Deciles for Individual Series:
Let $x_1, x_2, \ldots, x_n$ be n ordered observations. The ith decile $D_i$ is the value of the item in the $\left[\frac{i(n+1)}{10}\right]^{\text{th}}$ position, i = 1, 2, …, 9.
That is, after arranging the data in ascending order, D1, D2, …, D9 are obtained by:
$D_1 = \left(\frac{n+1}{10}\right)^{\text{th}}$ value, $D_2 = \left(\frac{2(n+1)}{10}\right)^{\text{th}}$ value, …, $D_9 = \left(\frac{9(n+1)}{10}\right)^{\text{th}}$ value.
• Deciles for discrete data arranged in a frequency distribution: In this case, too, we follow the same procedure as for the median. That is, we construct the less than cumulative frequency distribution and apply the decile formula for individual series.
• Deciles for continuous data: Apply the following formula and follow the procedure used for quartiles of continuous data:
$$D_i = L + \frac{w}{f_{D_i}}\left(\frac{in}{10} - CF\right), \quad i = 1, 2, \ldots, 9$$
The symbols are defined in the same way as for quartiles of continuous data.
Percentiles: values which divide the data into approximately one hundred equal parts, denoted by P1, P2, …, P99.
• Percentiles for Individual Series:
Let $x_1, x_2, \ldots, x_n$ be n ordered observations. The ith percentile $P_i$ is the value of the item in the $\left[\frac{i(n+1)}{100}\right]^{\text{th}}$ position, i = 1, 2, …, 99.
That is, after arranging the data in ascending order, P1, P2, …, P99 are obtained by:
$P_1 = \left(\frac{n+1}{100}\right)^{\text{th}}$ value, $P_2 = \left(\frac{2(n+1)}{100}\right)^{\text{th}}$ value, …, $P_{99} = \left(\frac{99(n+1)}{100}\right)^{\text{th}}$ value.
• Percentiles for discrete data arranged in a frequency distribution: In this case, too, we follow the same procedure as for the median. That is, we construct the less than cumulative frequency distribution and apply the percentile formula for individual series.
• Percentiles for continuous data: Apply the following formula:
$$P_i = L + \frac{w}{f_{P_i}}\left(\frac{in}{100} - CF\right), \quad i = 1, 2, \ldots, 99$$
The symbols are defined in the same way as for quartiles or deciles of continuous data.
Interpretations
1. $Q_i$ is the value below which (i × 25) percent of the observations in the series are found (i = 1, 2, 3). For instance, Q3 is the value below which 75 percent of the observations in the series are found.
2. $D_i$ is the value below which (i × 10) percent of the observations in the series are found (i = 1, 2, …, 9). For instance, D4 is the value below which 40 percent of the values in the series are found.
3. $P_i$ is the value below which i percent of the total observations are found (i = 1, 2, …, 99). For example, 60 percent of the observations in a given series are below P60.
Example 3.15: Calculate Q1, D4 and P90 for the data given in the table below.
x 10 11 12 13 14 15 16 17 18
f 2 8 25 48 65 40 20 9 2
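Solution: The data are already arranged in increasing order, so we only need to construct the cumulative frequency table before calculating the required values.
x 10 11 12 13 14 15 16 17 18
f 2 8 25 48 65 40 20 9 2
Cum. Freq. 2 10 35 83 148 188 208 217 219
The total number of observations is n = 219, which is odd. Clearly then the median is 14, i.e.
$\tilde{x} = \left(\frac{n+1}{2}\right)^{\text{th}}$ value $= \left(\frac{220}{2}\right)^{\text{th}}$ value = 110th value = 14
$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}}$ value $= \left(\frac{220}{4}\right)^{\text{th}}$ value = 55th value = 13
$D_4 = \left(\frac{4(n+1)}{10}\right)^{\text{th}}$ value $= \left(\frac{880}{10}\right)^{\text{th}}$ value = 88th value = 14
$P_{90} = \left(\frac{90(n+1)}{100}\right)^{\text{th}}$ value $= \left(\frac{19800}{100}\right)^{\text{th}}$ value = 198th value = 16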
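The positional rule for discrete data can be sketched in Python by walking the less than cumulative frequencies. It assumes the table is already sorted by value; `positional_value` is a hypothetical helper, and for a fractional position it simply returns the first value whose CF reaches that position, which is our assumption rather than a rule stated in the text.

```python
from itertools import accumulate


def positional_value(values, freqs, position):
    """Value at the given (1-based) rank: the first value whose less-than CF >= position."""
    for x, cf in zip(values, accumulate(freqs)):
        if cf >= position:
            return x
    raise ValueError("position exceeds the number of observations")


x = [10, 11, 12, 13, 14, 15, 16, 17, 18]
f = [2, 8, 25, 48, 65, 40, 20, 9, 2]
n = sum(f)  # 219

print(positional_value(x, f, (n + 1) / 2))         # median: 110th value -> 14
print(positional_value(x, f, (n + 1) / 4))         # Q1: 55th value -> 13
print(positional_value(x, f, 4 * (n + 1) / 10))    # D4: 88th value -> 14
print(positional_value(x, f, 90 * (n + 1) / 100))  # P90: 198th value -> 16
```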
Example 3.16: The marks (out of 85) of 50 students are given below. Based on the data, find Q1, D4 and P7.
Marks 46-50 51-55 56-60 61-65 66-70 71-75 76-80
fi 4 8 15 5 9 5 4
Solution: First find the class boundaries and the cumulative frequency distribution.
Marks 46-50 51-55 56-60 61-65 66-70 71-75 76-80
Class boundary 45.5-50.5 50.5-55.5 55.5-60.5 60.5-65.5 65.5-70.5 70.5-75.5 75.5-80.5
fi 4 8 15 5 9 5 4
Cum. frequency 4 12 27 32 41 46 50
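Q1: the (n/4)th = 12.5th value lies in the class 55.5 – 60.5 (L = 55.5, w = 5, f = 15, CF = 12), so
$Q_1 = L + \frac{w}{f_{Q_1}}\left(\frac{n}{4} - CF\right) = 55.5 + \frac{5}{15}(12.5 - 12) \approx 55.67$
D4: the (4n/10)th = 20th value lies in the class 55.5 – 60.5, so
$D_4 = L + \frac{w}{f_{D_4}}\left(\frac{4n}{10} - CF\right) = 55.5 + \frac{5}{15}(20 - 12) \approx 58.17$
P7: the (7n/100)th = 3.5th value lies in the class 45.5 – 50.5 (L = 45.5, w = 5, f = 4, CF = 0), so
$P_7 = L + \frac{w}{f_{P_7}}\left(\frac{7n}{100} - CF\right) = 45.5 + \frac{5}{4}(3.5 - 0) = 49.875$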
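The same interpolation $L + \frac{w}{f}\left(\frac{in}{k} - CF\right)$ covers quartiles (k = 4), deciles (k = 10) and percentiles (k = 100). A Python sketch, assuming equal class widths and lower class boundaries as input; `grouped_quantile` is a hypothetical helper name.

```python
def grouped_quantile(lower_boundaries, freqs, width, i, k):
    """i-th k-quantile of grouped data: L + (w / f) * (i*n/k - CF).

    k = 4 gives quartiles, k = 10 deciles, k = 100 percentiles.
    """
    n = sum(freqs)
    target = i * n / k  # position of the required value
    cf = 0              # cumulative frequency before the current class
    for L, f in zip(lower_boundaries, freqs):
        if cf + f >= target:  # first class whose less-than CF reaches the target
            return L + (width / f) * (target - cf)
        cf += f
    raise ValueError("i must satisfy 0 < i < k")


# Example 3.16
boundaries = [45.5, 50.5, 55.5, 60.5, 65.5, 70.5, 75.5]
freqs = [4, 8, 15, 5, 9, 5, 4]
print(round(grouped_quantile(boundaries, freqs, 5, 1, 4), 2))    # Q1 = 55.67
print(round(grouped_quantile(boundaries, freqs, 5, 4, 10), 2))   # D4 = 58.17
print(round(grouped_quantile(boundaries, freqs, 5, 7, 100), 3))  # P7 = 49.875
```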
CHAPTER FOUR: MEASURES OF DISPERSION (VARIATION)
4.1 Introduction
Just as central tendency can be measured by a number in the form of an average, the
amount of variation (dispersion, spread, or scatter) among the values in the data set can
also be measured.
Dispersion refers to the variation of the items around an average. Thus, dispersion is
defined as scatteredness or spreadness of the individual items in a given series.
4.2 Absolute and Relative Measures of Dispersion
Absolute measures of dispersion: An absolute measure is expressed in the same statistical unit as the original data, such as kilograms, tonnes, etc.
Relative measures of dispersion: A relative measure of dispersion is the ratio of a measure of absolute dispersion to an appropriate average or to selected items of the data.
4.3 Types of Measures of Variation
4.3.1 The Range and Relative Range

Range is the simplest measure of dispersion. It is defined as the difference between the largest and the smallest values in a given set of data. Its formula is:
R = L − S
where R = range, L = the largest value in the data set, and S = the smallest value in the data set.
The relative measure of range, also called the coefficient of range, is defined as
$$RR = \frac{L - S}{L + S}$$
Example 4.1: Five students obtained the following marks in statistics: 20, 35, 25, 30, 15.
Find the range and relative range
Solution: Here, L = 35 and S = 15, so Range = L − S = 35 − 15 = 20 and
$$RR = \frac{L - S}{L + S} = \frac{35 - 15}{35 + 15} = \frac{20}{50} = 0.4$$
Example 4.2: Find the range and relative range of the following data.
Size 5-10 11-15 16-20 21-25 26-30
Frequency 4 9 15 30 40
Solution: Here,
L = upper class limit of the largest class = 30 and S = lower class limit of the smallest class = 5, so
Range = 30 − 5 = 25 and $RR = \frac{30 - 5}{30 + 5} = \frac{25}{35} \approx 0.7143$.
4.3.2 The Quartile Deviation and Coefficient of Quartile Deviation
Inter-quartile range and quartile deviation are other measures of dispersion. The difference between the upper quartile (Q3) and the lower quartile (Q1) is called the inter-quartile range. Symbolically,
Inter-quartile range (IQR) = $Q_3 - Q_1$, Quartile deviation (QD) = $\frac{Q_3 - Q_1}{2}$
The relative measure of quartile deviation, also called the coefficient of quartile deviation (CQD), is defined as:
$$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
Example 4.3: Find inter-quartile range, quartile deviation and coefficient of quartile
deviation from the following data.
28, 18, 20, 24, 27, 30, 15
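Solution: First arrange the data in ascending order: 15, 18, 20, 24, 27, 28, 30.
$Q_1$ = size of the $\left(\frac{n+1}{4}\right)^{\text{th}}$ item = size of the $\left(\frac{7+1}{4}\right)^{\text{th}}$ item = size of the 2nd item = 18 marks
$Q_3$ = size of the $3\left(\frac{n+1}{4}\right)^{\text{th}}$ item = size of the $3\left(\frac{7+1}{4}\right)^{\text{th}}$ item = size of the 6th item = 28 marks
IQR = Q3 − Q1 = 28 − 18 = 10, QD = 10/2 = 5, $CQD = \frac{28 - 18}{28 + 18} = \frac{10}{46} \approx 0.217$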
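A short Python check of Example 4.3 using the positional rule for individual series. The helper `quartile` is hypothetical and, as an assumption, refuses fractional positions because the text does not say how to interpolate them.

```python
def quartile(values, i):
    """Q_i of an individual series: the value at the [i(n+1)/4]th position."""
    data = sorted(values)
    pos = i * (len(data) + 1) / 4
    if pos != int(pos):
        # Assumption: only whole-number positions are handled in this sketch
        raise ValueError("position is not a whole number; an interpolation rule is needed")
    return data[int(pos) - 1]  # convert the 1-based position to a 0-based index


marks = [28, 18, 20, 24, 27, 30, 15]
q1, q3 = quartile(marks, 1), quartile(marks, 3)
iqr = q3 - q1
qd = iqr / 2
cqd = (q3 - q1) / (q3 + q1)
print(q1, q3, iqr, qd, round(cqd, 3))  # 18 28 10 5.0 0.217
```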
Example 4.4: Find inter-quartile range, quartile deviation and coefficient of quartile
deviation from the following data
Marks 2 3 4 5 6 7 8 9
No. Of students 10 11 12 13 5 12 7 5
Solution:
Marks 2 3 4 5 6 7 8 9
No. of students 10 11 12 13 5 12 7 5
CF 10 21 33 46 51 63 70 75=N
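$Q_1$ = size of the $\left(\frac{N+1}{4}\right)^{\text{th}}$ item = $\left(\frac{76}{4}\right)^{\text{th}}$ = 19th item = 3, $Q_3$ = size of the $3\left(\frac{N+1}{4}\right)^{\text{th}}$ item = $3\left(\frac{76}{4}\right)^{\text{th}}$ = 57th item = 7
IQR = Q3 − Q1 = 7 − 3 = 4, QD = 4/2 = 2, $CQD = \frac{7 - 3}{7 + 3} = \frac{4}{10} = 0.4$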
4.3.3 The Mean Deviation and Coefficient of Mean Deviation
1. The mean deviation about the arithmetic mean is given by
$MD(\bar{x}) = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$ … for ungrouped data (individual series).
$MD(\bar{x}) = \frac{\sum_{i} f_i |x_i - \bar{x}|}{n}$ … for discrete data arranged in a frequency distribution and for grouped continuous frequency distributions, where $x_i$ is the value (for discrete data) or the class mark of the ith class (for continuous grouped data), $f_i$ is the frequency of the ith class and $n = \sum f_i$.
2. The mean deviation about the median is given by
$MD(\tilde{x}) = \frac{\sum_{i=1}^{n} |x_i - \tilde{x}|}{n}$ … for ungrouped data (individual series).
$MD(\tilde{x}) = \frac{\sum_{i} f_i |x_i - \tilde{x}|}{n}$ … for discrete data arranged in a frequency distribution and for grouped continuous frequency distributions.

Example 4.5
The following are the numbers of visits made by ten mothers to the local doctor's surgery:
8, 6, 5, 5, 7, 4, 5, 9, 7, 4. Find the mean deviation about the mean and about the median.
Solution:
First calculate the mean and the median:
$\bar{x} = \frac{60}{10} = 6$, $\tilde{x} = \frac{5 + 6}{2} = 5.5$
Then take the absolute deviations of each observation from these averages.
xi 4 4 5 5 5 6 7 7 8 9 Total
|xi − x̄| 2 2 1 1 1 0 1 1 2 3 14
|xi − x̃| 1.5 1.5 0.5 0.5 0.5 0.5 1.5 1.5 2.5 3.5 14
Since the data are ungrouped, the mean deviations about the mean and about the median are:
$MD(\bar{x}) = \frac{\sum |x_i - \bar{x}|}{n} = \frac{14}{10} = 1.4$, $MD(\tilde{x}) = \frac{\sum |x_i - \tilde{x}|}{n} = \frac{14}{10} = 1.4$
Coefficient of mean deviation (CMD):
• CMD about the arithmetic mean is given by $CMD(\bar{x}) = \frac{MD(\bar{x})}{\bar{x}}$, where MD is the mean deviation calculated about the arithmetic mean.
• CMD about the median is given by $CMD(\tilde{x}) = \frac{MD(\tilde{x})}{\tilde{x}}$, in which case MD is calculated about the median of the observations.
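Example 4.6: Calculate the coefficient of mean deviation about the mean and about the median for the data in Example 4.5 above.
Solution:
$CMD(\bar{x}) = \frac{MD(\bar{x})}{\bar{x}} = \frac{1.4}{6} \approx 0.23$, $CMD(\tilde{x}) = \frac{MD(\tilde{x})}{\tilde{x}} = \frac{1.4}{5.5} \approx 0.25$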
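The mean deviations and their coefficients can be checked in Python with the standard library's `statistics.mean` and `statistics.median`; the variable names are illustrative only.

```python
from statistics import mean, median

visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]
xbar, xmed = mean(visits), median(visits)  # 6 and 5.5
n = len(visits)

md_mean = sum(abs(x - xbar) for x in visits) / n    # mean deviation about the mean
md_median = sum(abs(x - xmed) for x in visits) / n  # mean deviation about the median
cmd_mean = md_mean / xbar                           # coefficient about the mean
cmd_median = md_median / xmed                       # coefficient about the median

print(md_mean, md_median, round(cmd_mean, 2), round(cmd_median, 2))  # 1.4 1.4 0.23 0.25
```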
4.3.4 The Variance, Standard Deviation and Coefficient of Variation
Variance and Standard Deviation
Variance is the average of the squared deviations from the mean.
Population Variance ($\sigma^2$)
If we divide the sum of squared deviations by the number of values in the population, we get the population variance. This variance is the "average squared deviation from the mean".
• For ungrouped data (individual series):
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = \frac{1}{N}\sum_{i=1}^{N} X_i^2 - \mu^2$$
where μ is the population arithmetic mean and N is the total number of observations in the population.
• For discrete data arranged in a frequency distribution and for continuous grouped data:
$$\sigma^2 = \frac{\sum_{i} f_i (X_i - \mu)^2}{N} = \frac{1}{N}\sum_{i} f_i X_i^2 - \mu^2$$
where μ is the population arithmetic mean, $X_i$ is the value or the class mark of the ith class, $f_i$ is the frequency of the ith class and $N = \sum f_i$.
Sample Variance ($S^2$)
The sum of the squared deviations is divided by one less than the sample size.
• For ungrouped data:
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right]$$
where $\bar{x}$ is the sample arithmetic mean and n is the total number of observations in the sample.
• For discrete data arranged in a frequency distribution and continuous grouped data:
If the values $x_i$ have frequencies $f_i$ (i = 1, 2, …, m), then the sample variance is given by:
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{m} f_i (x_i - \bar{x})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{m} f_i x_i^2 - n\bar{x}^2\right]$$
For continuous grouped data, $x_i$ is the class mark of the ith class, $f_i$ is the frequency of the ith class and $n = \sum f_i$.
The Standard Deviation
It is the positive square root of the variance.
• Population standard deviation (σ): $\sigma = \sqrt{\sigma^2}$, where $\sigma^2$ is the population variance.
• Sample standard deviation (S): $S = \sqrt{S^2}$, where $S^2$ is the sample variance.
Example 4.7: Find the sample variance and standard deviation of:
xi 2 4 5 6 8
fi 2 2 3 1 2
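Solution: Prepare the following table:
xi fi fi·xi xi² fi·xi²
2 2 4 4 8
4 2 8 16 32
5 3 15 25 75
6 1 6 36 36
8 2 16 64 128
Sum 10 49 – 279
Thus, $n = \sum f_i = 10$, $\sum f_i x_i = 49$ and $\sum f_i x_i^2 = 279$, so $\bar{x} = 49/10 = 4.9$ and
$$S^2 = \frac{1}{n-1}\left[\sum f_i x_i^2 - n\bar{x}^2\right] = \frac{1}{9}\left[279 - 10(4.9)^2\right] = \frac{38.9}{9} \approx 4.32,$$
and $S = \sqrt{4.32} \approx 2.08$.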
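A Python sketch of the shortcut formula for a frequency table, with a cross-check against `statistics.variance`, which also uses the n − 1 divisor. The helper `sample_variance_freq` is a hypothetical name.

```python
import math
from statistics import variance


def sample_variance_freq(xs, fs):
    """S^2 = [sum(f*x^2) - n * xbar^2] / (n - 1) for a frequency table."""
    n = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    xbar = sum_fx / n
    return (sum_fx2 - n * xbar ** 2) / (n - 1)


xs, fs = [2, 4, 5, 6, 8], [2, 2, 3, 1, 2]
s2 = sample_variance_freq(xs, fs)
s = math.sqrt(s2)
print(round(s2, 2), round(s, 2))  # 4.32 2.08

# Cross-check: expand the table into raw observations and use the standard library
raw = [x for x, f in zip(xs, fs) for _ in range(f)]
print(round(variance(raw), 2))  # 4.32
```

The shortcut $\sum f_i x_i^2 - n\bar{x}^2$ can lose precision when the values are large relative to their spread; the definitional form $\sum f_i (x_i - \bar{x})^2$ avoids that.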
Properties of Variance & Standard Deviation
1. If a constant is added to (or subtracted from) all the values, the variance remains the same; i.e., for any constant k, $V(x_i \pm k) = V(x_i)$.
Example 4.8: Consider the 6 sample values xi: 54, 52, 53, 50, 51 and 52.
The sample variance is $V(x_i) = 2$. Now subtract 50 from each value to get yi: 4, 2, 3, 0, 1, 2; the variance of this new series is also 2, i.e., $V(x) = V(y) = 2$.
2. If every value is multiplied by a non-zero constant k, the standard deviation is multiplied by |k| and the variance by k²; i.e., $V(kx_i) = k^2 V(x_i)$.
3. Both the variance and the standard deviation give more weight to extreme values
and less to those which are near to the mean.
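Properties 1 and 2 can be verified numerically with `statistics.variance` (sample variance, divisor n − 1). The data are those of Example 4.8; the multiplier k = 3 is an arbitrary choice for illustration.

```python
from statistics import variance

x = [54, 52, 53, 50, 51, 52]
y = [v - 50 for v in x]  # subtract the constant k = 50 (property 1)
z = [3 * v for v in x]   # multiply by the constant k = 3 (property 2)

print(float(variance(x)), float(variance(y)), float(variance(z)))  # 2.0 2.0 18.0
```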
Coefficient of Variation
Coefficient of variation is used when we want to compare the variability of two or more different series.
$$CV = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$$
For population data: $CV = \frac{\sigma}{\mu} \times 100\%$, where σ is the population standard deviation and μ is the population mean.
For sample data: $CV = \frac{S}{\bar{x}} \times 100\%$, where S is the sample standard deviation and $\bar{x}$ is the sample mean.
Remark: A distribution with a smaller coefficient of variation is said to be less variable, or more consistent, uniform or homogeneous.
Example 4.9: Last semester, the students of Mathematics and Chemistry Departments
took Introduction to Statistics course. At the end of the semester, the following
information was recorded.
Department Mathematics Chemistry
Mean score 85 65
Standard deviation 25 12
Compare the relative dispersions of the two departments' scores using an appropriate measure.
Solution:
Mathematics Department: $CV = \frac{S}{\bar{x}} \times 100 = \frac{25}{85} \times 100 \approx 29.41\%$
Chemistry Department: $CV = \frac{S}{\bar{x}} \times 100 = \frac{12}{65} \times 100 \approx 18.46\%$
Interpretation: Since the CV of Mathematics Department students is greater than that of
Chemistry Department students, we can say that there is more dispersion relative to the
mean in the distribution of Mathematics students’ scores compared with that of
Chemistry students.
4.4 Standard Scores (Z-Scores): The standard score (z-score) tells us how many standard deviations a specific value is above or below the mean of the data set. That is, the z-score is the number of standard deviations by which the data value falls above (positive z-score) or below (negative z-score) the mean.
Z-score computed from the population:
$$Z = \frac{X - \mu}{\sigma}$$
Z-score computed from the sample:
$$Z = \frac{X - \bar{X}}{S}$$
Example 4.10: What is the Z-score for the value of 14 in the following sample data set?
3 8 6 14 4 12 7 10
Solution:
$\bar{X} = 8$ and S = 3.8173, thus $Z = \frac{14 - 8}{3.8173} \approx 1.57$.
The data value 14 lies 1.57 standard deviations above the mean 8, because the z-score is positive.
Example 4.11: Suppose that a student scored 66 in Statistics and 80 in Mathematics. A summary of the class scores in the two courses is given below.
Course Average score Standard deviation of the score
Statistics 51 12
Mathematics 72 16
In which course did the student score better compared with his classmates?
Solution:
Z-score of the student in Statistics: $Z = \frac{X - \bar{X}}{S} = \frac{66 - 51}{12} = \frac{15}{12} = 1.25$
Z-score of the student in Mathematics: $Z = \frac{X - \bar{X}}{S} = \frac{80 - 72}{16} = \frac{8}{16} = 0.5$
From these two standard scores, we conclude that the student scored better in Statistics, relative to his classmates, than in Mathematics.
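A Python sketch of the sample z-score, reproducing Examples 4.10 and 4.11. `statistics.stdev` uses the n − 1 divisor, matching S; the helper `z_score` is a hypothetical name.

```python
from statistics import mean, stdev


def z_score(x, center, spread):
    """Number of standard deviations x lies above (+) or below (-) the center."""
    return (x - center) / spread


# Example 4.10: sample z-score of 14
data = [3, 8, 6, 14, 4, 12, 7, 10]
print(round(stdev(data), 4), round(z_score(14, mean(data), stdev(data)), 2))  # 3.8173 1.57

# Example 4.11: compare the two courses using the class means and SDs
print(z_score(66, 51, 12), z_score(80, 72, 16))  # 1.25 0.5
```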
4.5 Moments, Skewness and Kurtosis
4.5.1 Moments
The moments of a distribution are the arithmetic means of the various powers of the deviations of items from some number. In this course, we shall use them in the study of the skewness and kurtosis of statistical distributions.
Moments about the origin
$$M_r = \frac{\sum X_i^r}{n}, \quad r = 0, 1, 2, 3, \ldots$$
For grouped frequency distributions and for ungrouped frequency distributions, the moments about the origin are
$$M_r = \frac{\sum f_i X_i^r}{n}$$
where $f_i$ is the frequency of $X_i$; $X_i$ is the midpoint in the case of a grouped frequency distribution or the class value in the case of an ungrouped frequency distribution.
Note that: $M_1 = \bar{X}$ and $M_0 = 1$.
Moments about the Mean (Central Moments)
$$M'_r = \frac{\sum (X_i - \bar{X})^r}{n}$$
For grouped and ungrouped frequency distributions, the moments about the mean are
$$M'_r = \frac{\sum f_i (X_i - \bar{X})^r}{n}$$
where $f_i$ is the frequency of $X_i$; $X_i$ is the midpoint in the case of a grouped frequency distribution or the class value in the case of an ungrouped frequency distribution.
Note that: $M'_2$ is the variance computed with divisor n; it equals $S^2$ if n − 1 is replaced by n.
Moments about any arbitrary constant A
$$M_r(A) = \frac{\sum (X_i - A)^r}{n}$$
For grouped and ungrouped frequency distributions, the moments about an arbitrary constant A are
$$M_r(A) = \frac{\sum f_i (X_i - A)^r}{n}$$
Example 4.12: Find the first four moments about the mean for the following individual
series
X: 3 6 8 10 18
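Solution: n = 5.
Ser. No. X (X − X̄) (X − X̄)² (X − X̄)³ (X − X̄)⁴
1 3 -6 36 -216 1296
2 6 -3 9 -27 81
3 8 -1 1 -1 1
4 10 1 1 1 1
5 18 9 81 729 6561
Total 45 0 128 486 7940
Thus, $\bar{X} = \frac{\sum X}{n} = \frac{45}{5} = 9$, $M'_1 = \frac{\sum (X - \bar{X})}{n} = \frac{0}{5} = 0$, $M'_2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{128}{5} = 25.6$, $M'_3 = \frac{\sum (X - \bar{X})^3}{n} = \frac{486}{5} = 97.2$ and
$$M'_4 = \frac{\sum (X - 9)^4}{5} = \frac{7940}{5} = 1588$$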
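The central moments of Example 4.12 can be reproduced with a few lines of Python; `central_moment` is a hypothetical helper that uses the divisor n, as in the definition of $M'_r$.

```python
def central_moment(values, r):
    """r-th moment about the mean: sum((x - xbar)^r) / n."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** r for x in values) / n


X = [3, 6, 8, 10, 18]
print([central_moment(X, r) for r in range(1, 5)])  # [0.0, 25.6, 97.2, 1588.0]
```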
4.5.2 Skewness
Skewness refers to a lack of symmetry (departure from symmetry) in a distribution.
A distribution is said to be symmetrical when the values are distributed evenly around the mean (the distribution of the data below the mean mirrors that above the mean). In a symmetrical distribution, the mean, median and mode coincide (i.e., mean = median = mode).
Positively skewed distribution: In a positively skewed distribution, the mean is greater than the mode and the median lies between the mean and the mode.
Negatively skewed distribution: In a negatively skewed distribution, the mode is greater than the mean and the median lies between the mean and the mode.
Note that: In moderately skewed distributions the averages have the following
relationship.
(Mean – mode) = 3(mean - median)
Measures of skewness ($\alpha_3$)
They give information about the shape of the distribution and the degree of variation on either side of the central value. The three most commonly used measures of skewness are Pearson's coefficient of skewness, Bowley's coefficient of skewness and the coefficient of skewness based on moments.
1. Pearson's coefficient of skewness (Pearsonian coefficient of skewness)
The skewness of the distribution can be measured by Pearson's coefficient of skewness ($\alpha_3$), given by:
$$\alpha_3 = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$$
2. Bowley's Coefficient of Skewness
Bowley's coefficient of skewness is based on quartiles. Its formula is:
$$\alpha_3 = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{(Q_3 - Q_2) + (Q_2 - Q_1)} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$
3. Moment Coefficient of Skewness
The moment coefficient of skewness is based on moments. Its formula is:
$$\alpha_3 = \frac{M'_3}{(M'_2)^{3/2}} = \frac{M'_3}{\left(\sqrt{M'_2}\right)^{3}}$$
where $M'_r = \frac{\sum (x_i - \bar{x})^r}{n}$.
The shape of the curve is determined by the value of $\alpha_3$:
• $\alpha_3 > 0$: the distribution is positively skewed (skewed to the right), i.e., mode < median < mean. Smaller observations are more frequent than larger observations, i.e., the majority of the observations have values below the average.
• $\alpha_3 = 0$: the distribution is symmetric, i.e., mean = median = mode.
• $\alpha_3 < 0$: the distribution is negatively skewed (skewed to the left), i.e., mean < median < mode. Smaller observations are less frequent than larger observations, i.e., the majority of the observations have values above the average.
4.5.3 Kurtosis
Kurtosis is a measure of the peakedness of a distribution. If a curve is more peaked than the normal curve it is called 'leptokurtic'; if it is flatter than the normal curve it is called 'platykurtic' (flat-topped). The normal curve itself is known as 'mesokurtic'.
Measures of Kurtosis ($\alpha_4$)
The moment coefficient of kurtosis:
$$\alpha_4 = \frac{M'_4}{(M'_2)^2}$$
The peakedness depends on the value of $\alpha_4$:
• $\alpha_4 > 3$: the curve is leptokurtic,
• $\alpha_4 = 3$: the curve is mesokurtic,
• $\alpha_4 < 3$: the curve is platykurtic.
Example 4.13: Based on the following data:
M′0 = 1, M′1 = -0.6, M′2 = 1.6, M′3 = -2.4, M′4 = 5.8
a/ Find the coefficient of skewness and discuss the distribution type.
b/ Find the coefficient of kurtosis and discuss the distribution type.
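Solution:
a/ $\alpha_3 = \frac{M'_3}{(M'_2)^{3/2}} = \frac{-2.4}{(1.6)^{3/2}} \approx -1.19 < 0$, so the distribution is negatively skewed.
b/ $\alpha_4 = \frac{M'_4}{(M'_2)^2} = \frac{5.8}{(1.6)^2} = \frac{5.8}{2.56} \approx 2.27 < 3$, so the curve is platykurtic.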
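Example 4.14: Find the coefficient of skewness and the coefficient of kurtosis for the data of Example 4.12 above.
Solution: From Example 4.12, $M'_2 = 25.6$, $M'_3 = 97.2$ and $M'_4 = 1588$.
i) $\alpha_3 = \frac{M'_3}{(M'_2)^{3/2}} = \frac{97.2}{(25.6)^{3/2}} \approx 0.75 > 0$, so the distribution is positively skewed.
ii) $\alpha_4 = \frac{M'_4}{(M'_2)^2} = \frac{1588}{(25.6)^2} \approx 2.42 < 3$, so the curve is platykurtic.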
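The two moment coefficients can be computed directly from $M'_2$, $M'_3$ and $M'_4$. A Python sketch with hypothetical helper names, applied to the moments of Example 4.12 (as in Example 4.14) and to the values given in Example 4.13.

```python
def moment_skewness(m2, m3):
    """alpha_3 = M'_3 / (M'_2)^(3/2)."""
    return m3 / m2 ** 1.5


def moment_kurtosis(m2, m4):
    """alpha_4 = M'_4 / (M'_2)^2; compare with 3 (mesokurtic)."""
    return m4 / m2 ** 2


# Example 4.14 (moments from Example 4.12)
print(round(moment_skewness(25.6, 97.2), 2), round(moment_kurtosis(25.6, 1588), 2))  # 0.75 2.42

# Example 4.13
print(round(moment_skewness(1.6, -2.4), 2), round(moment_kurtosis(1.6, 5.8), 2))  # -1.19 2.27
```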
CHAPTER FIVE: ELEMENTARY PROBABILITY
5.1 Definition of some probability terms

• Experiment: Any process of observation or measurement, or any process which generates well-defined outcomes.
• Random experiment: an experiment which can be repeated any number of times under the same conditions but does not give a unique result. The result will be any one of several possible outcomes, but for each trial the result is not known in advance. A random experiment is also called a trial, and its outcomes are called events.
• Sample space: the collection of all possible outcomes, or sample points, of a random experiment.
• Sample point: each element of the sample space is called a sample point.
• Event: a subset of the sample space, i.e., an event is a collection of sample points.
• Impossible event: an event which will never occur.
Example 5.1: In the experiment of rolling a fair die, S = {1, 2, 3, 4, 5, 6}, and each sample point is an equally likely outcome. Many events can be defined on this sample space, for example:
A = {1, 4} - the event of getting a perfect square number.
B = {2, 4, 6} - the event of getting an even number.
C = {1, 3, 5} - the event of getting an odd number.
D = the event of getting the number 8, which is an impossible event (D = ∅).
Example 5.2: If we toss a coin, the sample space of this experiment is S = {head, tail}, where head and tail are the two faces of the coin. If we are interested in the outcome that a head turns up, then the event is E = {head}.
Example 5.3: Find the sample space of tossing a coin three times.
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
• Mutually exclusive events: two events A and B are said to be mutually exclusive if there is no sample point common to A and B, i.e., A ∩ B = ∅.
• Independent events: two or more events are said to be independent if the occurrence or non-occurrence of one event does not affect the occurrence or non-occurrence of the others.
• Dependent events: two events are dependent if the occurrence of the first event changes the probability of the second.
• Complement of an event: the complement of an event A means the non-occurrence of A; it is denoted by A′ or Aᶜ and contains those points of the sample space which do not belong to A.
• Equally likely outcomes: the outcomes in a sample space are equally likely if each outcome has the same chance of occurring.
Example 5.4: When casting a fair die, all possible outcomes are equally likely.
5.2 Counting Rules: Addition, Multiplication, Permutation and Combination Rules
In order to calculate probabilities, we have to know
• The number of elements of the event.
• The number of elements of the sample space.
That is, in order to judge what is probable, we have to know what is possible.
To determine the number of outcomes, one can use several rules of counting:
1. The addition rule
2. The multiplication rule
3. Permutation rule
4. Combination rule
1. The addition Rule
Suppose that a procedure, designated by 1, can be done in n1 ways. Assume that a second procedure, designated by 2, can be done in n2 ways. Suppose, furthermore, that it is not possible to do both 1 and 2 together. Then the number of ways in which we can do 1 or 2 is n1 + n2.
Example 5.5: Suppose we are planning a trip to some place. If there are 3 bus routes and 2 train routes that we can take, then there are 3 + 2 = 5 different routes that we can take.
2. Multiplication rule: If an operation consists of k steps, and the 1st step can be done in n1 ways, the 2nd step can be done in n2 ways (regardless of how the 1st step was performed), …, and the kth step can be done in nk ways (regardless of how the preceding steps were performed), then the entire operation can be performed in n1 · n2 · … · nk ways.
Example 5.6: Suppose that a person has 2 different pairs of trousers and 3 shirts. In how
many ways can he wear his trousers and shirts?
Solution: He can choose the trousers in n1 = 2 ways and the shirts in n2 = 3 ways. Therefore, he can dress in n1 × n2 = 2 × 3 = 6 possible ways.
3. Permutation: An arrangement of objects with attention given to the order of arrangement is called a permutation. The number of permutations of n different objects taken r at a time is:
$${}_nP_r = \frac{n!}{(n-r)!}, \quad r = 0, 1, 2, \ldots, n$$
Permutation Rule:
a) The number of permutations of n objects taken all together is n!, i.e.
$$n! = n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1 = {}_nP_n = \frac{n!}{(n-n)!} = \frac{n!}{0!} = n!$$
Note: By definition, 0! = 1.
b) The arrangement of n distinct objects in a specific order using r objects at a time is called the permutation of n objects taken r at a time. It is written as nPr, and the formula is
$${}_nP_r = \frac{n!}{(n-r)!}$$
c) The number of distinct permutations of n objects of which n1 are alike, n2 are alike, …, nk are alike is
$$\frac{n!}{n_1!\, n_2! \cdots n_k!}, \quad \text{where } n = n_1 + n_2 + \cdots + n_k$$
Example 5.7: Find number of permutations of the letters in the word ‘‘statistics’’.
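Solution:
There are 3 s's, 3 t's, 1 a, 2 i's and 1 c, i.e. n1 = 3, n2 = 3, n3 = 1, n4 = 2 and n5 = 1, with n = 10. Therefore the number of distinct permutations is
$$\frac{10!}{3!\,3!\,1!\,2!\,1!} = \frac{3\,628\,800}{72} = 50\,400$$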
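Rule (c) can be sketched in Python by counting repeated letters with `collections.Counter`; `distinct_permutations` is a hypothetical helper name.

```python
from collections import Counter
from math import factorial


def distinct_permutations(word):
    """n! / (n1! n2! ... nk!) where n_j counts each repeated symbol."""
    count = factorial(len(word))
    for k in Counter(word).values():
        count //= factorial(k)  # each division is exact
    return count


print(distinct_permutations("statistics"))  # 50400
```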
Example 5.8: A photographer wants to arrange 3 persons in a row for photograph. How
many different types of photographs are possible?
Solution:
Assume the 3 persons are Aster (A), Lemma (L) and Yared (Y), so n = 3.
Since n! = 3! = 3 × 2 × 1 = 6, there are 6 possible arrangements: ALY, AYL, LAY, LYA, YLA and YAL.
Example 5.9: Suppose we have the letters A, B, C, D and E.
a) How many permutations are there taking all five?
b) How many permutations are there taking two letters at a time?
Solution:
a) Here n = 5; there are five distinct objects.
There are 5! = 120 permutations.
b) Here n = 5, r = 2
There are 5P2 = 5!/(5-2)! = 120/6 = 20 permutations.
Example 5.10: Fifteen Ethiopian athletes entered a race. In how many different ways could prizes for first, second and third place be awarded?
Solution
This is the number of permutations of 15 objects taken 3 at a time: 15P3 = 15!/(15 − 3)! = 15 × 14 × 13 = 2730 ways.
4. Combination: A selection of objects considered without regard to the order in which they occur is called a combination. The number of combinations of n different objects taken r at a time is
$${}_nC_r = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}, \quad r = 0, 1, 2, \ldots, n$$
Example 5.11: Given the letters A, B, C and D, list the permutations and the combinations for selecting two letters.
Solution:
Permutation Combination
AB BA CA DA AB BC
AC BC CB DB AC BD
AD BD CD DC AD CD
Note that in a permutation AB is different from BA, but in a combination AB is the same as BA.
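Python's `itertools` can list the arrangements of Example 5.11 directly, and `math.perm` / `math.comb` (Python 3.8+) evaluate nPr and nCr. This is an illustrative check, not part of the original method.

```python
from itertools import combinations, permutations
from math import comb, perm

letters = "ABCD"
ordered = ["".join(p) for p in permutations(letters, 2)]    # order matters
unordered = ["".join(c) for c in combinations(letters, 2)]  # order ignored
print(ordered)    # ['AB', 'AC', 'AD', 'BA', 'BC', 'BD', 'CA', 'CB', 'CD', 'DA', 'DB', 'DC']
print(unordered)  # ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']

# Counting formulas used in the examples above
print(perm(4, 2), comb(4, 2))             # 12 6
print(perm(5), perm(5, 2), perm(15, 3))   # 120 20 2730
```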
Example 5.12: In a club containing 7 members, a committee of 3 people is to be formed. In how many ways can the committee be formed?
Solution:
$${}_7C_3 = \binom{7}{3} = \frac{7!}{3!\,(7-3)!} = 35$$
Example 5.13: How many four-digit numbers can be formed with the 10 digits 0, 1, 2, …, 9 if
a/ repetitions are allowed, b/ repetitions are not allowed, and c/ the last digit must be zero and repetitions are not allowed?
Solution:
a/ The first digit can be any one of 9 digits (since 0 is not allowed as the leading digit). The second, third and fourth digits can each be any one of 10. Then 9 × 10 × 10 × 10 = 9000 numbers can be formed.
b/ The first digit can be any one of 9, and the remaining three can be chosen from the other 9 digits in 9P3 ways. Thus 9 × 9P3 = 9 × 504 = 4536 numbers can be formed.
c/ The last digit is fixed as 0, the first digit can be chosen in 9 ways, and the middle two digits can be chosen from the remaining 8 digits in 8P2 ways. Thus 9 × 8P2 = 9 × 56 = 504 numbers can be formed.
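Because there are only 9000 four-digit numbers, the three answers can be confirmed by brute force in Python. This sketch simply enumerates 1000 to 9999; the helper `distinct_digits` is hypothetical.

```python
four_digit = range(1000, 10000)  # the leading digit cannot be 0


def distinct_digits(n):
    """True if no digit of n is repeated."""
    return len(set(str(n))) == 4


a = len(four_digit)                                                   # repetitions allowed
b = sum(1 for n in four_digit if distinct_digits(n))                  # no repetitions
c = sum(1 for n in four_digit if n % 10 == 0 and distinct_digits(n))  # ends in 0, no repetitions
print(a, b, c)  # 9000 4536 504
```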
5.3 Probability of an event
Definition: Probability is a numerical measure of the chance or likelihood that a particular event will occur; it lies in the range from 0 to 1, inclusive. Probability is a building block of inferential statistics.
Definition: Let E be an experiment. Let S be a sample space associated with E. With
each event A in S we associate a real number designated by P (A) and called the
probability of A.
Generally, probability can be divided into two types:
i) Subjective probability: probability determined on the basis of an individual's own judgment, experience, information, belief, etc.
ii) Objective probability: the probability of an event in a certain experiment based on experimental evidence.
Basic approaches to probability
There are three different conceptual approaches to the study of probability theory:
1. The classical approach. 2. The frequentist approach. 3. The axiomatic approach.
1. Classical approach:
Definition: If an experiment has n equally likely outcomes, and k of these outcomes are favourable to event A, then the probability of A, denoted by P(A), is defined as
$$P(A) = \frac{k}{n} = \frac{n(A)}{n(S)} = \frac{\text{number of outcomes favourable to } A}{\text{total number of outcomes}}$$
Note: Classical approach of measuring probability fails to answer for the following
conditions:
 If total number of outcomes is infinite or if it is not possible to enumerate all
elements of the sample space.
 If each out come is not equally likely.
Example 5.14: a/ Using the classical method, compute the probability of having two boys & one girl in a
three-child family, assuming boys & girls are equally likely.
b/ Using (a), compute the probability of having three boys in a three-child family.
c/ Using (a), compute the probability of having three girls in a three-child family.
d/ Using (a), compute the probability of having two girls & one boy in a three-child
family.
Solution
The sample space S of the experiment is
S= {BBB, BBG, BGB, BGG, GBB, GBG, GGB, GGG}
So n(S)=8
a/ For the event A = "two boys & a girl" = {BBG, BGB, GBB}, we have n(A) = 3. Since the outcomes
are equally likely, the probability of A is P(A) = n(A)/n(S) = 3/8 = 0.375.
b/ Compute the probability of having three boys in a three-child family.
For the event B = "three boys" = {BBB}, we have n(B) = 1. Since the outcomes
are equally likely, the probability of B is P(B) = n(B)/n(S) = 1/8 = 0.125.
c/ Compute the probability of having three girls in a three-child family.
For the event C = "three girls" = {GGG}, we have n(C) = 1. Since the outcomes
are equally likely, the probability of C is P(C) = n(C)/n(S) = 1/8 = 0.125.
d/ Compute the probability of having two girls & one boy in a three-child family.
For the event D = "two girls & one boy" = {BGG, GBG, GGB}, we have n(D) = 3. Since the
outcomes are equally likely, the probability of D is P(D) = n(D)/n(S) = 3/8 = 0.375.
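The classical method reduces to counting outcomes. As a minimal sketch, assuming (as the example does) that B and G are equally likely, the sample space can be enumerated in Python and each probability computed as n(event)/n(S) with exact fractions. The helper name `classical_probability` is illustrative, not something defined in the text.

```python
from fractions import Fraction
from itertools import product

# Sample space of a three-child family: all 2**3 = 8 ordered outcomes.
S = ["".join(children) for children in product("BG", repeat=3)]

def classical_probability(event, sample_space):
    """P(A) = n(A) / n(S); valid only when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

two_boys_one_girl = [s for s in S if s.count("B") == 2]   # event A
three_boys = [s for s in S if s.count("B") == 3]          # event B
three_girls = [s for s in S if s.count("G") == 3]         # event C
two_girls_one_boy = [s for s in S if s.count("G") == 2]   # event D

for name, event in [("A", two_boys_one_girl), ("B", three_boys),
                    ("C", three_girls), ("D", two_girls_one_boy)]:
    print(name, sorted(event), classical_probability(event, S))
```

Using `Fraction` keeps the answers in the same exact form (3/8, 1/8) as the worked solution.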
Example 5.15: A box of 80 candles consists of 30 defective and 50 non-defective
candles. If 10 of these candles are selected at random without replacement, what is the
probability that
a) all will be defective? b) 6 will be non-defective? c) all will be non-defective?
Solution
Total number of selections: $n(S) = \binom{80}{10}$
a) Let A be the event that all will be defective.
The number of ways in which A can occur is $n(A) = \binom{30}{10}\binom{50}{0}$, so
$$P(A) = \frac{n(A)}{n(S)} = \frac{\binom{30}{10}\binom{50}{0}}{\binom{80}{10}} \approx 0.00001825$$
b) Let A be the event that 6 will be non-defective (and hence 4 defective).
The number of ways in which A can occur is $n(A) = \binom{30}{4}\binom{50}{6}$, so
$$P(A) = \frac{n(A)}{n(S)} = \frac{\binom{30}{4}\binom{50}{6}}{\binom{80}{10}} \approx 0.265$$
c) Let A be the event that all will be non-defective.
The number of ways in which A can occur is $n(A) = \binom{30}{0}\binom{50}{10}$, so
$$P(A) = \frac{n(A)}{n(S)} = \frac{\binom{30}{0}\binom{50}{10}}{\binom{80}{10}} \approx 0.00624$$
2. The Frequentist Approach (Empirical Probability): This approach to probability is
based on relative frequencies.
Definition: Suppose we repeat a certain experiment n times, let A be an
event of the experiment, and let k be the number of times that event A occurs. Then
the probability of the event A is its long-run relative frequency:
$$P(A) = \lim_{n \to \infty} \frac{k}{n}, \quad \text{so that } P(A) \approx \frac{k}{n} \text{ for large } n$$
In other words, given a frequency distribution, the probability of an event A (an observation
falling in a given class) is $P(A) = \frac{f}{n}$, the frequency of the class divided by the total frequency.
Example 5.16: The National Center for Health Statistics reported that of every 539 deaths
in recent years, 24 resulted from automobile accidents, 182 from cancer, and 353 from
other diseases. What is the probability that a particular death is due to an automobile
accident?
Solution
P(automobile) = deaths due to automobile accidents / total deaths = 24/539 ≈ 0.0445
The probability that a particular death is due to an automobile accident is about 0.0445.
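The frequentist definition says k/n settles down as n grows. A minimal simulation sketch, assuming a fair coin modelled with Python's `random` module (the seed and the helper `relative_frequency` are arbitrary illustrative choices), shows the relative frequency of heads approaching 0.5, alongside the empirical probability from Example 5.16.

```python
import random

deaths_total, deaths_auto = 539, 24
print(round(deaths_auto / deaths_total, 4))  # empirical P(automobile) from the data

def relative_frequency(trials, seed=1):
    """Toss a simulated fair coin `trials` times and return k/n for heads."""
    rng = random.Random(seed)
    k = sum(rng.random() < 0.5 for _ in range(trials))
    return k / trials

for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

Small n can give relative frequencies far from 0.5; the agreement improves only in the long run, which is exactly what the definition requires.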
3. The axiomatic approach.
Let E be a random experiment and S be a sample space associated with E. With each
event A we associate a real number P(A), called the probability of A, which satisfies the
following properties, called the axioms (or postulates) of probability:
1. 0 ≤ P(A) ≤ 1
2. P(S) = 1, where S is the sure/certain event.
3. If A1 and A2 are mutually exclusive events, the probability that one or the other occurs
equals the sum of the two probabilities, i.e. P(A1 ∪ A2) = P(A1) + P(A2).
Similarly, if A1, A2, …, An are pairwise mutually exclusive, $P(A_1 \cup A_2 \cup \dots \cup A_n) = P(A_1) + P(A_2) + \dots + P(A_n) = \sum_{i=1}^{n} P(A_i)$
4. P(A') = 1 − P(A)
5. P(ø) = 0, where ø is the impossible event.
5.4 Some probability rules
Rule 1: Let A be an event and A' be the complement of A with respect to a given sample
space of an experiment; then P(A') = 1 − P(A).
Proof: Let S be the sample space. Then S = A ∪ A', and A and A' are mutually exclusive:
A ∩ A' = ø.
P(S) = P(A ∪ A') = P(A) + P(A') and P(S) = 1
1 = P(A) + P(A') ⇒ P(A') = 1 − P(A)
Rule 2: Let A and B be events of a sample space S; then
P(A' ∩ B) = P(B) − P(A ∩ B)
Proof: B = S ∩ B = (A ∪ A') ∩ B = (A ∩ B) ∪ (A' ∩ B)
Since A ∩ B and A' ∩ B are mutually exclusive, P(B) = P(A ∩ B) + P(A' ∩ B), so
P(A' ∩ B) = P(B) − P(A ∩ B).
Rule 3: Suppose A and B are two events of a sample space; then
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof:
A ∪ B = A ∪ (A' ∩ B), where A and A' ∩ B are disjoint sets
∴ P(A ∪ B) = P(A) + P(A' ∩ B) . . . . (*)
But we have already proved that P(A' ∩ B) = P(B) − P(A ∩ B).
Putting this in equation (*):
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Example 5.17: A fair die is thrown twice. Calculate the probability that the sum of the spots
on the faces that turn up is divisible by 2 or 3.
Solution
S = {(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(2,1),(2,2),(2,3),(2,4),(2,5),(2,6),
(3,1),(3,2),(3,3),(3,4),(3,5),(3,6),(4,1),(4,2),(4,3),(4,4),(4,5),(4,6),
(5,1),(5,2),(5,3),(5,4),(5,5),(5,6),(6,1),(6,2),(6,3),(6,4),(6,5),(6,6)}
This sample space has 6 × 6 = 36 elements. Let A be the event that the sum of the spots on
the dice is divisible by 2 and B be the event that the sum of the spots on the dice is divisible
by 3; then
A = {(1,1), (1,3), (1,5), (2,2), (2,4), (2,6), (3,1), (3,3), (3,5), (4,2), (4,4), (4,6), (5,1), (5,3),
(5,5), (6,2), (6,4), (6,6)}
B = {(1,2), (1,5), (2,1), (2,4), (3,3), (3,6), (4,2), (4,5), (5,1), (5,4), (6,3), (6,6)}
A∩B = {(1,5), (2,4), (3,3), (4,2), (5,1), (6,6)}
P (A or B) = P (A U B)= P (A) +P (B) – P (A∩B)= 18/36 + 12/36 -6/36 = 24/36 = 2/3
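Rule 3 can be checked by brute force on this sample space. The sketch below (plain Python sets; the function name `P` is just shorthand chosen here) counts A, B and A ∩ B and compares the addition rule with a direct count of A ∪ B.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # 36 ordered outcomes of two throws
A = {s for s in S if sum(s) % 2 == 0}       # sum divisible by 2
B = {s for s in S if sum(s) % 3 == 0}       # sum divisible by 3

def P(event):
    """Classical probability on the equally likely outcomes in S."""
    return Fraction(len(event), len(S))

by_rule = P(A) + P(B) - P(A & B)   # Rule 3 (addition rule)
direct = P(A | B)                  # count the union directly
print(len(A), len(B), len(A & B), by_rule, direct)
```

Without the − P(A ∩ B) term, the six outcomes whose sum is 6 or 12 would be counted twice.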
5.5 Conditional Probability and Independence
5.5.1 Conditional Probability
If A and B are events, the conditional probability of A given B is the probability that
A occurs when the event B has already happened.
It is denoted by P(A/B) and is defined by
P(A/B) = P(A ∩ B)/P(B), if P(B) ≠ 0
Similarly, the conditional probability of B given A is the probability that B occurs when the
event A has already happened. It is denoted by P(B/A) and is defined by
P(B/A) = P(A ∩ B)/P(A), if P(A) ≠ 0
Hence P(A ∩ B) = P(A) P(B/A) = P(B) P(A/B).
5.5.2 Multiplication Law of Probability
If A and B are events in a sample space S, then
P (A ∩ B) = P (A) P (B/A), P (A) ≠ 0
P (A ∩ B) = P (B) P (A/B), P (B) ≠ 0
Where P (B/A) represents the conditional probability of B given A and P (A/B)
represents the conditional probability of A given B.
Note: Extending the multiplication law of probability to n events A1, A2, …, An, we
have P(A1 ∩ A2 ∩ … ∩ An) = P(A1) P(A2/A1) P(A3/A1 ∩ A2) … P(An/A1 ∩ A2 ∩ … ∩ An−1)
Example 5.18: A coin is tossed twice. If it is already known that the first toss shows
a head, what is the probability of getting two heads?
Solution:
S = {HH, HT, TH, TT}, A = the first toss shows a head = {HH, HT}, B = two heads occur
= {HH}. P(B/A) = P(A ∩ B)/P(A), where A ∩ B = {HH}, so P(A ∩ B) = 1/4 and P(A) = 1/2;
therefore, P(B/A) = P(A ∩ B)/P(A) = (1/4)/(1/2) = 1/2
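The definition P(B/A) = P(A ∩ B)/P(A) can be applied mechanically by counting. A minimal sketch for Example 5.18, with events represented as Python sets (the names mirror the text; `P` is a local helper):

```python
from fractions import Fraction
from itertools import product

S = ["".join(t) for t in product("HT", repeat=2)]   # HH, HT, TH, TT
A = {s for s in S if s[0] == "H"}                    # first toss is a head
B = {s for s in S if s == "HH"}                      # two heads

def P(event):
    return Fraction(len(event), len(S))

p_B_given_A = P(A & B) / P(A)   # P(B/A) = P(A ∩ B) / P(A)
print(p_B_given_A)
```

Conditioning on A effectively shrinks the sample space to {HH, HT}, of which one outcome is HH.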
Example 5.19: Let A and B be events such that P(A ∪ B) = 3/4, P(A ∩ B) = 1/4 and P(A') = 2/3.
Find P(A'/B).
Solution:
P(A') = 2/3 ⇒ P(A) = 1 − P(A') = 1 − 2/3 = 1/3
Now, P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
3/4 = 1/3 + P(B) − 1/4
P(B) = 3/4 − 1/3 + 1/4 = 2/3
Therefore, P(A/B) = P(A ∩ B)/P(B) = (1/4)/(2/3) = 3/8 ⇒ P(A'/B) = 1 − P(A/B) = 1 − 3/8 = 5/8.
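The arithmetic of Example 5.19 can be reproduced with exact fractions. This sketch also cross-checks the answer a second way, via Rule 2 (P(A' ∩ B) = P(B) − P(A ∩ B)); that cross-check is an addition for illustration, not part of the original solution.

```python
from fractions import Fraction

p_union = Fraction(3, 4)   # P(A ∪ B)
p_inter = Fraction(1, 4)   # P(A ∩ B)
p_not_a = Fraction(2, 3)   # P(A')

p_a = 1 - p_not_a                      # Rule 1: P(A) = 1 - P(A')
p_b = p_union - p_a + p_inter          # Rule 3 solved for P(B)
p_a_given_b = p_inter / p_b            # P(A/B) = P(A ∩ B) / P(B)
p_not_a_given_b = 1 - p_a_given_b      # complement rule, conditioned on B

# Cross-check with Rule 2: P(A'/B) = P(A' ∩ B) / P(B) = (P(B) - P(A ∩ B)) / P(B)
p_not_a_given_b_rule2 = (p_b - p_inter) / p_b
print(p_a, p_b, p_a_given_b, p_not_a_given_b, p_not_a_given_b_rule2)
```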
5.5.3 Probability of Independent Event
Two events A and B are said to be independent if the occurrence of A has no bearing on the
occurrence of B; that is, knowing that A has occurred gives no information about
the occurrence of B. Formally, two events A and B are said to be independent if P(A ∩ B)
= P(A)P(B).
Suppose A and B are independent events with 0 < P(A) < 1 and 0 < P(B) < 1. Then the following
statements are true:
i. A' and B' are independent, ii. A and B' are independent, iii. A' and B are independent,
iv. P(B|A) = P(B), v. P(B|A') = P(B)
Example 5.20: A box contains four black and six white balls. What is the probability of
getting two black balls in drawing one after the other under the following conditions?
a. The first ball drawn is not replaced
b. The first ball drawn is replaced
Solution
Let A= first drawn ball is black
B= second drawn is black
Required: P(A ∩ B)
a. P (A ∩ B) = P (B/A) P(A) = (4/10) (3/9) = 2/15
b. P (A ∩ B) = P (A) P (B) = (4/10) (4/10) = 16/100 = 4/25.
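The difference between drawing with and without replacement can be seen by enumerating every ordered pair of draws. In this sketch the 10 balls are treated as distinguishable positions (an assumption that does not change the probabilities); `permutations` gives pairs of two different balls, `product` allows the same ball twice.

```python
from fractions import Fraction
from itertools import permutations, product

balls = ["black"] * 4 + ["white"] * 6   # 10 balls, indexed 0..9

# (a) without replacement: ordered pairs of two different balls
no_repl = list(permutations(range(10), 2))
p_a = Fraction(sum(balls[i] == balls[j] == "black" for i, j in no_repl), len(no_repl))

# (b) with replacement: the same ball may be drawn twice
with_repl = list(product(range(10), repeat=2))
p_b = Fraction(sum(balls[i] == balls[j] == "black" for i, j in with_repl), len(with_repl))

print(p_a, p_b)
```

Case (a) counts 4 × 3 = 12 favourable pairs out of 10 × 9 = 90; case (b) counts 4 × 4 = 16 out of 10 × 10 = 100, matching the multiplication law in each case.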
5.6 Total probability and Bayes’ Theorem
Total probability:- If events B1, B2, …, Bk constitute a partition of the sample space S and P(Bi) ≠ 0
for i = 1, 2, …, k, then for any event A in S,
$$P(A) = \sum_{i=1}^{k} P(B_i)\,P(A/B_i)$$
Example 5.21: In a factory, machines A1, A2 and A3 manufacture 25%, 35% and 40% of the
total output, respectively. Of their products, 5%, 4% and 2%, respectively, are defective.
An item is drawn at random from the total output. What is the probability that it is
defective?
Solution: P(A1) = 0.25, P(A2) = 0.35, P(A3) = 0.40, P(D/A1) = 0.05, P(D/A2) = 0.04, P(D/A3) = 0.02
P(D) = ∑ P(Ai)P(D/Ai) = P(A1)P(D/A1) + P(A2)P(D/A2) + P(A3)P(D/A3)
= (0.25)(0.05) + (0.35)(0.04) + (0.40)(0.02) = 0.0345
Bayes' Theorem:- If B1, B2, …, Bk are events which form an exhaustive partition of
the sample space S and A is any event in S with P(A) ≠ 0, then the conditional probability of Bi
given that A has already occurred is:
$$P(B_i/A) = \frac{P(B_i)\,P(A/B_i)}{\sum_{j=1}^{k} P(B_j)\,P(A/B_j)}$$
Note: the denominator is the total probability P(A).
Example 5.22: Based on the above example, if the item drawn is found to be defective, what is the
probability that it was manufactured by machine A1?
Solution:
$$P(A_1/D) = \frac{P(A_1)\,P(D/A_1)}{\sum_{i=1}^{3} P(A_i)\,P(D/A_i)} = \frac{(0.25)(0.05)}{0.0345} = 0.3623$$
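Total probability and Bayes' theorem are two steps of the same computation: the denominator of Bayes' formula is the total probability. A short sketch using the figures from Examples 5.21 and 5.22 (the dictionary names are illustrative) computes P(D) and then the posterior probability for every machine:

```python
prior = {"A1": 0.25, "A2": 0.35, "A3": 0.40}         # P(Ai): share of total output
defect_rate = {"A1": 0.05, "A2": 0.04, "A3": 0.02}   # P(D/Ai)

# Total probability: P(D) = sum over i of P(Ai) * P(D/Ai)
p_defective = sum(prior[m] * defect_rate[m] for m in prior)

# Bayes' theorem: P(Ai/D) = P(Ai) * P(D/Ai) / P(D)
posterior = {m: prior[m] * defect_rate[m] / p_defective for m in prior}

print(round(p_defective, 4))
for machine, p in posterior.items():
    print(machine, round(p, 4))
```

The posterior probabilities sum to 1, as they must, because some machine produced the defective item.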
CHAPTER SIX
PROBABILITY DISTRIBUTION
6.1 The Concept of Random Variables
Definition: A variable whose values are determined by chance, with associated
probabilities, is called a random variable. It is a quantity which can assume different
values in different observations.
Random variables are usually denoted by capital letters X, Y, Z, etc., while the values
taken by them are denoted by lower-case letters x, y, z, etc. Thus, P(x1 ≤ X ≤ x2) is the
probability that the random variable X takes values between x1 and x2, both inclusive. A
random variable can be discrete or continuous.
6.1.1 Discrete Random Variable
If the random variable X can assume only a particular finite or countably infinite set of
values, it is said to be a discrete random variable.
Example 6.1: Consider the experiment of flipping a fair coin 3 times. List the elements
of the sample space, which are assumed to be equally likely (as this is what is meant by a
fair or balanced coin), and the corresponding values x of the r-v X, the number of heads
observed.
Solution: If H stands for heads and T for tails, then the sample space corresponding to
this experiment is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
Since X = the number of heads observed, the results are shown in the following table:
Element of sample space:  HHH  HHT  HTH  HTT  THH  THT  TTH  TTT
Probability:              1/8  1/8  1/8  1/8  1/8  1/8  1/8  1/8
X:                        3    2    2    1    2    1    1    0
Thus, we can write X(HHH) = 3, X(HHT) = 2, …, X(TTT) = 0, and P(X = 3) = 1/8 is the
probability that the r-v X is 3; similarly, P(X = 2) = 3/8, P(X = 1) = 3/8 and P(X = 0) = 1/8.
Note that the possible values of X are xi ∈ {0, 1, 2, 3}.
6.1.2 Continuous Random Variable
A random variable X is said to be continuous if it can take all possible values (integral as
well as fractional) between certain limits. Continuous random variables occur when we
deal with quantities that are measured on a continuous scale.
Example 6.2: The height of an individual; the distance between Debre Markos and
Addis Ababa.
6.2 Probability Distribution
A probability distribution shows the possible outcomes of an experiment and the
probability of each of these outcomes.
• For a discrete variable, the distribution is given by the probability mass function (pmf),
usually denoted by p(x). If X is a discrete random variable taking at most a countably infinite
number of values x1, x2, …, then P(xi) = P(X = xi), i = 1, 2, …, is called the probability mass
function of the random variable X. The set of ordered pairs {xi, P(xi)}, i = 1, 2, …, gives
the probability distribution of the random variable X. The numbers P(xi), i = 1,
2, …, must satisfy the following conditions:
i) P(xi) ≥ 0, ii) $\sum_{i=1}^{\infty} P(X = x_i) = 1$
• For a continuous variable, the distribution is given by the probability density function (pdf),
usually denoted by f(x). The function f(x) satisfies the following
conditions:
i) f(x) ≥ 0 for all x, −∞ < x < ∞
ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
Discrete probability distribution
A discrete probability distribution is a distribution whose random variable is discrete. It
describes a finite or countably infinite set of possible outcomes, as with discrete "count data".
Example 6.3: Consider the possible outcomes for the random experiment of tossing three
coins together once.
Sample space, S = {HHH, THH, HTH, HHT, TTH, THT, HTT, TTT}
Let X be the number of heads that turn up when three coins are tossed. The possible
values of X are 0, 1, 2 and 3.
P(X = 0) = P({TTT}) = 1/8,
P(X = 1) = P({HTT}) + P({THT}) + P({TTH}) = 1/8 + 1/8 + 1/8 = 3/8,
P(X = 2) = P({HHT}) + P({HTH}) + P({THH}) = 1/8 + 1/8 + 1/8 = 3/8,
P(X = 3) = P({HHH}) = 1/8.
X          0    1    2    3
P(X = x)   1/8  3/8  3/8  1/8
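The pmf above can be built by mapping each equally likely outcome to its value of X and tallying. A minimal Python sketch, assuming fair coins, that also checks the two pmf conditions (non-negative values summing to 1):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))           # 8 equally likely outcomes
counts = Counter(o.count("H") for o in outcomes)   # X = number of heads in each outcome
pmf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

print(pmf)
print("non-negative:", all(p >= 0 for p in pmf.values()), "sum:", sum(pmf.values()))
```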
Continuous probability distribution
A continuous probability distribution is a probability distribution whose random variable is
continuous. Let a and b be any two values with a < b. The probability that X assumes a value
that lies between a and b is equal to the area under the curve between a and b; that is,
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
The integration from a to b in the case of a continuous variable is analogous
to the summation of probabilities in the discrete case.
Example 6.4: A continuous random variable X has a probability density function given
by
$$f(x) = \frac{1}{4}x + \frac{1}{2}, \quad 0 \le x \le 1.$$
Find the probability that X lies in the interval from 0 to 1.
Solution:
$$\int_0^1 \left(\frac{1}{4}x + \frac{1}{2}\right)dx = \left[\frac{1}{8}x^2 + \frac{1}{2}x\right]_0^1 = \frac{1}{8} + \frac{1}{2} = \frac{5}{8}$$
6.3 Expectation and Variance of Random variable
6.3.1 Expectation
The averaging process, when applied to a random variable, is called expectation. It is
denoted by E(X) or μ and is read as the expected value of X or the mean value of X.
Case 1: For a discrete random variable
Suppose X is a discrete random variable which takes on values in a finite set x1, x2, …, xn
with probabilities P(xi) = P[X = xi], i = 1, 2, …, n. Then the expected value E(X) of the
discrete random variable is given by:
$$E(X) = \mu = \sum_{i=1}^{n} x_i P(x_i)$$
Case 2: For a continuous random variable
If X is a continuous random variable, then
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx, \quad \text{provided } \int_{-\infty}^{\infty} |x| f(x)\,dx < \infty,$$
where f(x) is the probability density function of the continuous random variable X.
Case 3: The mathematical expectation of a real function h(X) of a discrete random
variable is given by:
$$E[h(X)] = \sum_{i=1}^{n} h(x_i) P(x_i)$$
Similarly, if X is a continuous random variable, then
$$E[h(X)] = \int_{-\infty}^{\infty} h(x) f(x)\,dx$$
Some Properties of Expectation
If X and Y are random variables and k is a constant, then:
1. E(k) = k, where k is any constant
2. E (kX) = k E(X), where k is any constant
3. E (X + k) =E(X) + k
4. E(X + Y) = E(X) +E(Y)
5. E(XY) = E(X) E(Y), if X, Y are independent random variables
6.3.2 Mean and variance of a random variable
Mean of X = E(X)
Variance of X = σ² = E(X²) − [E(X)]²
= E[(X − E(X))²]
Case 1:
If X is a discrete random variable with expected value μ, then the variance of X,
denoted by Var(X), is defined by:
$$\sigma^2 = \operatorname{Var}(X) = E[(X-\mu)^2] = E(X^2) - \mu^2 = \sum_{i} x_i^2 P(x_i) - \mu^2$$
Alternatively, $\operatorname{Var}(X) = \sum_{i} (x_i - \mu)^2 P(x_i)$.
Case 2:
If X is a continuous random variable, then
$$\sigma^2 = \operatorname{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$$
Properties of Variance
• For any random variable X and constant a, it can be shown that
- Var(aX) = a²Var(X)
- Var(X + a) = Var(X)
• If X and Y are independent random variables, then
Var(X + Y) = Var(X) + Var(Y)
More generally, if X1, X2, …, Xk are independent random variables, then
Var(X1 + X2 + … + Xk) = Var(X1) + Var(X2) + … + Var(Xk),
i.e., $\operatorname{Var}\left(\sum_{i=1}^{k} X_i\right) = \sum_{i=1}^{k} \operatorname{Var}(X_i)$
Example 6.5: Two fair coins are tossed. Determine Var (X) where X is the number of
heads that appear.
a) Use the definition of the variance.
b) Use the fact that the variance of the sum of independent variables is equal to the sum
of the variance.
Solution:
a) Let X be the number of heads, with possible values 0, 1 and 2. The sample space consists of
{HH, TH, HT, TT}.
P(X = 0) = 1/4, P(X = 1) = 1/2, P(X = 2) = 1/4
E(X) = 0·P(X = 0) + 1·P(X = 1) + 2·P(X = 2) = 0(1/4) + 1(1/2) + 2(1/4) = 1.
E(X²) = 0²·P(X = 0) + 1²·P(X = 1) + 2²·P(X = 2) = 0(1/4) + 1(1/2) + 4(1/4) = 3/2.
This implies that Var(X) = E(X²) − μ² = 3/2 − 1 = 1/2.
b) Let X be the number of heads on the first coin, with possible values 0 and 1, and
Y be the number of heads on the second coin, with possible values 0 and 1.
P(X = 0) = 1/2, P(X = 1) = 1/2 and P(Y = 0) = 1/2, P(Y = 1) = 1/2
E(X) = 0·P(X = 0) + 1·P(X = 1) = 0(1/2) + 1(1/2) = 1/2
E(Y) = 0·P(Y = 0) + 1·P(Y = 1) = 0(1/2) + 1(1/2) = 1/2
E(X²) = 0²·P(X = 0) + 1²·P(X = 1) = 0(1/2) + 1(1/2) = 1/2
E(Y²) = 0²·P(Y = 0) + 1²·P(Y = 1) = 0(1/2) + 1(1/2) = 1/2
Var(X) = E(X²) − μ² = 1/2 − (1/2)² = 1/4
Var(Y) = E(Y²) − μ² = 1/2 − (1/2)² = 1/4
X and Y are independent (i.e., the outcome of one coin does not influence the outcome of
the other), so
Var(X + Y) = Var(X) + Var(Y) = 1/4 + 1/4 = 1/2.
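Both routes in Example 6.5 can be checked with one helper that applies Var(X) = E(X²) − [E(X)]² to any pmf given as a dictionary. This is a sketch assuming fair coins; `mean_var` is a name chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

def mean_var(pmf):
    """Return E(X) and Var(X) = E(X^2) - [E(X)]^2 for a pmf given as {x: p}."""
    mu = sum(x * p for x, p in pmf.items())
    ex2 = sum(x * x * p for x, p in pmf.items())
    return mu, ex2 - mu ** 2

half = Fraction(1, 2)

# (a) X = total heads in two tosses, built by enumerating the 4 outcomes
pmf_total = {}
for a, b in product((0, 1), repeat=2):
    pmf_total[a + b] = pmf_total.get(a + b, 0) + half * half

# (b) a single coin: 1 if head, 0 if tail
pmf_one = {0: half, 1: half}

print(mean_var(pmf_total))   # mean 1, variance 1/2
print(mean_var(pmf_one))     # mean 1/2, variance 1/4
```

Doubling the single-coin variance reproduces the two-coin variance, which is the additivity property for independent variables.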
Example 6.6: Compute the variance of $f(x) = \frac{x^2}{9}$ for 0 < x < 3.
Solution: V(X) = E(X²) − [E(X)]²
$$E(X^2) = \int_0^3 x^2 \cdot \frac{x^2}{9}\,dx = \frac{1}{9}\left[\frac{x^5}{5}\right]_0^3 = \frac{27}{5}, \qquad E(X) = \int_0^3 x \cdot \frac{x^2}{9}\,dx = \frac{1}{9}\left[\frac{x^4}{4}\right]_0^3 = \frac{9}{4}$$
Therefore, $V(X) = E(X^2) - [E(X)]^2 = \frac{27}{5} - \left(\frac{9}{4}\right)^2 = \frac{27}{80} \approx 0.34$
6.4 Common Discrete Probability Distributions
6.4.1 Binomial Distribution
The binomial distribution originates from the Bernoulli trial. A Bernoulli trial is an experiment
with only two possible outcomes, "success" or "failure". For instance, when
rolling a fair die, a "success" may be defined as "getting an even number on top" and an odd
number as "failure".
Generally, the sample space in a Bernoulli trial is S = {S, F}, where S = success and F = failure.
Notation: Let the probabilities of success and failure be p and q, respectively:
P(success) = P(S) = p and P(failure) = P(F) = q, where q = 1 − p.
Definition: Let X be the number of successes in n repeated Bernoulli trials with probability
of success p on each trial; then the probability distribution of the discrete random variable
X is called the binomial distribution. Here p is the probability of success and q = 1 − p is the
probability of failure on any given trial. A binomial random variable with parameters n
and p represents the number of successes in n independent trials, when each trial has
probability p of success.
If X is a binomial random variable, then for r = 0, 1, 2, …, n
$$P(X = r) = \frac{n!}{r!\,(n-r)!}\,p^r (1-p)^{n-r} = \frac{n!}{r!\,(n-r)!}\,p^r q^{n-r}, \quad \text{where } q = 1 - p$$
A binomial experiment is a probability experiment that satisfies the following
assumptions:
1. The experiment consists of n identical trials.
2. Each trial has only one of the two possible mutually exclusive outcomes, success or a
failure.
3. The probability of each outcome does not change from trial to trial.
4. The trials are independent.
If X is a binomial random variable with two parameters n and p then
i) E (X) = np, ii) Var (X) = npq
Example 6.7: A fair coin is flipped 3 times. What is the probability of getting exactly
two heads?
Solution:
Let X be the number of heads, with possible values 0, 1, 2, 3.
P(head) = p = 1/2, q = 1 − p = 1/2, n = 3
$$P(X = 2) = \frac{3!}{2!\,(3-2)!}\left(\frac{1}{2}\right)^2\left(\frac{1}{2}\right)^1 = \frac{3}{8}$$
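The binomial formula translates directly into code with `math.comb` for the coefficient n!/(r!(n − r)!). This sketch (the helper name `binomial_pmf` is illustrative) evaluates Example 6.7 and also checks E(X) = np and Var(X) = npq by summing over the pmf instead of using the shortcut formulas.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) = C(n, r) * p^r * q^(n - r), with q = 1 - p."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 3, 0.5
print(binomial_pmf(2, n, p))   # Example 6.7: 3/8 = 0.375

# Check E(X) = np and Var(X) = npq directly from the pmf
mean = sum(r * binomial_pmf(r, n, p) for r in range(n + 1))
var = sum((r - mean) ** 2 * binomial_pmf(r, n, p) for r in range(n + 1))
print(mean, var)               # np = 1.5 and npq = 0.75
```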
6.4.2 Poisson Distribution
It is a discrete probability distribution which is used to model rare events. The
Poisson distribution counts the number of successes in a fixed interval of time or within a
specified region.
Examples of random variables that usually obey the Poisson distribution are:
the number of car accidents in a day, the arrival of telephone calls over an interval of
time, and natural disasters such as earthquakes.
To apply the Poisson distribution, two conditions must be met:
i) The number of successes that occur in any interval is independent of those that
occur in other non-overlapping intervals.
ii) The probability of a success in an interval is proportional to the size of the
interval.
Let X be the number of occurrences in a Poisson process and λ be the
average number of occurrences of an event in a unit interval; the
probability function of the Poisson distribution is
$$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$$
Remarks
• The Poisson distribution has only one parameter, λ.
• If X has a Poisson distribution with parameter λ, then E(X) = λ and Var(X) = λ.
Example 6.8: In a small city, 10 accidents took place over a period of 50 days. Find the
probability that there will be a) two accidents in a day and b) three or more accidents in a
day.
Solution:
There are 10/50 = 0.2 accidents per day.
Let X be the random variable "the number of accidents per day";
X ~ Poisson(λ = 0.2), x = 0, 1, 2, ….
a) $P(X = 2) = \frac{(0.2)^2 e^{-0.2}}{2!} = 0.0164$
b) P(X ≥ 3) = P(X = 3) + P(X = 4) + P(X = 5) + … = 1 − [P(X = 0) + P(X = 1) + P(X = 2)],
since $\sum_{x=0}^{\infty} P(x) = 1$
= 1 − [0.8187 + 0.1637 + 0.0164] = 0.0012
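The Poisson computation, including the complement trick for "three or more", can be sketched as follows (standard library only; `poisson_pmf` is a helper name chosen for this illustration).

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = lam^x * e^(-lam) / x!"""
    return lam ** x * exp(-lam) / factorial(x)

lam = 10 / 50                                   # 0.2 accidents per day
p_two = poisson_pmf(2, lam)                     # part (a)
p_three_or_more = 1 - sum(poisson_pmf(x, lam) for x in range(3))   # part (b)
print(round(p_two, 4), round(p_three_or_more, 5))
```

Without rounding the intermediate terms, part (b) comes out at about 0.00115; the 0.0012 in the worked solution reflects summing the four-decimal values 0.8187, 0.1637 and 0.0164.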
6.5 Common Continuous Probability Distributions
6.5.1 Normal Distributions
In statistical estimation and testing of hypotheses, the normal distribution plays an
important role.
A random variable X has a normal distribution with parameters μ and σ², and is known as a
normal random variable, iff its pdf is given by:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$
for −∞ < x < ∞, −∞ < μ < ∞ and σ > 0.
The graph of the normal distribution is known as the normal curve, which is bell-shaped.
[Figure: Normal probability curve]
SOME PROPERTIES OF THE NORMAL CURVE
The following are the important properties of the normal curve:
1. The normal curve is "bell-shaped".
2. The normal curve is symmetrical about the mean.
Since the mean therefore divides the area under the curve into two equal halves, which is the
defining property of the median, it follows that, for the normal distribution,
Mean = Median.
3. The height of the normal curve is at its maximum when X = μ (the mean), which
means, again,
Mean = Median = Mode.
4. The normal curve is asymptotic to the X-axis.
5. The probability that a random variable will have a value between any two points is
equal to the area under the curve between those points.
Standard Normal Distribution
By standardization we mean that the random variable X is transformed into another
random variable whose mean is 0 and variance is 1. The normal distribution with zero
mean and standard deviation one is known as the standard normal distribution. If X has a
normal distribution with mean μ and standard deviation σ, then the standard normal
variable Z is given by
$$Z = \frac{X - \mu}{\sigma} \text{ for a population}, \qquad Z = \frac{X - \bar{X}}{S} \text{ for a sample}$$
Using the properties of expectations, it is now trivial to show that E(Z) = 0 and V(Z) = 1.
The pdf of Z is, thus, given by
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty.$$
The entries in Table A of the Appendix are the values of $P(0 \le Z \le z) = \int_0^z f(t)\,dt$.
That is, the table gives us the probabilities that a random variable Z having the standard
normal distribution will take on a value in the interval from 0 to z, for
z = 0.00, 0.01, 0.02, …, 3.98 and 3.99; due to the symmetry of the normal curve
with respect to its mean, it is unnecessary to extend the table to negative values of z.
Note that P(Z ≤ 0) = P(Z ≥ 0) = 0.5.
[Figure: Tabulated areas under the standard normal distribution from 0 to z]
That is, the arrowed region is $P(0 \le Z \le z) = \int_0^z \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt$.
Basic Properties of the Standard Normal Curve:
1. The total area under the standard normal curve is equal to 1.
2. The standard normal curve is asymptotic to the x-axis.
3. The standard normal curve is symmetric about 0.
4. Most of the area under the standard normal curve lies between z = −3 and z = 3.
Given a normally distributed random variable X with mean μ and standard deviation σ,
$$P(a < X < b) = P\left(\frac{a-\mu}{\sigma} < \frac{X-\mu}{\sigma} < \frac{b-\mu}{\sigma}\right)$$
$$P(X < a) = P\left(\frac{X-\mu}{\sigma} < \frac{a-\mu}{\sigma}\right)$$
But (X − μ)/σ = Z is a standard normal random variable, so $P(X < a) = P\left(Z < \frac{a-\mu}{\sigma}\right)$.
Note: i) P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤ X ≤ b)
ii) P(−∞ < Z < ∞) = 1
Example 6.9: Find the probabilities that a random variable having the standard normal
distribution will take on a value
a) less than 1.72; b) less than −0.88.
Solution: Using the normal table,
a) P(Z < 1.72) = P(Z < 0) + P(0 < Z < 1.72) = 0.5 + 0.4573 = 0.9573.
b) P(Z < −0.88) = P(Z > 0.88) = 0.5 − P(0 < Z < 0.88) = 0.5 − 0.3106 = 0.1894.
Application of the Standard Normal Distribution
Let X ~ N(μ, σ²). Suppose that we want to find the probability P(a < X < b).
Since a, b, μ and σ are known (given), we standardize a, b and X as:
$$P(a < X < b) = P\left(\frac{a-\mu}{\sigma} < \frac{X-\mu}{\sigma} < \frac{b-\mu}{\sigma}\right) = P(z_1 < Z < z_2), \text{ say.}$$
Now, we need only read the values from the Z-table corresponding to z1 and z2 to get
the required probabilities, as we have done in the preceding example.
Also, we can find the following one-sided probabilities:
$$P(X < b) = P\left(Z < \frac{b-\mu}{\sigma}\right) = P(Z < z_2), \quad \text{and} \quad P(X > a) = P\left(Z > \frac{a-\mu}{\sigma}\right) = P(Z > z_1).$$
We have seen that a Z-value measures the distance between a particular value of X and
the mean in units of standard deviation.
Example 6.10: If X ~ N(μ, σ²), find the probabilities
a) P(μ − σ < X < μ + σ); b) P(μ − 2σ < X < μ + 2σ); c) P(μ − 3σ < X < μ + 3σ).
Solution: As in the case of P(a < X < b), we simply replace a and b.
a) $P(\mu - \sigma < X < \mu + \sigma) = P\left(\frac{(\mu-\sigma)-\mu}{\sigma} < Z < \frac{(\mu+\sigma)-\mu}{\sigma}\right)$
= P(−1 < Z < 1) = 2P(0 < Z < 1) = 2(0.3413) (see Table A)
= 0.6826, or 68.26%.
b) Similarly, P(μ − 2σ < X < μ + 2σ) = P(−2 < Z < 2) = 2P(0 < Z < 2) = 2(0.4772) = 0.9544, or 95.44%.
c) P(μ − 3σ < X < μ + 3σ) = P(−3 < Z < 3) = 2P(0 < Z < 3) = 2(0.4987) = 0.9974, or 99.74%.
From this we can tell that
a) About 68.3% of the area lies between μ − σ and μ + σ (1 standard deviation on either side).
b) About 95.4% lies between μ − 2σ and μ + 2σ (2 standard deviations on either side).
c) About 99.7% lies between μ − 3σ and μ + 3σ (3 standard deviations on either side).
Notation: Zα denotes the value of Z for which the area to its right is equal to α.
This notation is useful in statistical inference; note that finding Zα from the table is
analogous to reading antilogarithms (the table is read in reverse).
Example 6.11: Find a) Z0.01; b) Z0.05
Solution: a) Z0.01 corresponds to a table entry of 0.5 − 0.01 = 0.4900.
In Table A, look for the value closest to 0.4900, which is 0.4901; the Z value for
this is Z = 2.33. Thus, Z0.01 ≈ 2.33.
b) Again, Z0.05 is obtained from 0.5 − 0.05 = 0.4500, which lies exactly between 0.4495 and
0.4505, corresponding to Z = 1.64 and Z = 1.65. Hence, using interpolation, Z0.05 ≈ 1.645.
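Examples 6.10 and 6.11 both reduce to the standard normal cdf and its inverse. A minimal sketch with `statistics.NormalDist` (the helper names `within_k_sd` and `z_upper` are illustrative) computes the 1σ, 2σ, 3σ areas and the upper critical values Zα without table lookup:

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mu = 0, sigma = 1

def within_k_sd(k):
    """P(mu - k*sigma < X < mu + k*sigma) = P(-k < Z < k) for any normal X."""
    return Z.cdf(k) - Z.cdf(-k)

def z_upper(alpha):
    """Z_alpha: the value of Z with area alpha to its right."""
    return Z.inv_cdf(1 - alpha)

for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))
print(round(z_upper(0.01), 3), round(z_upper(0.05), 3))
```

Small differences from the worked answers (for example 0.6827 versus 0.6826, or 2.326 versus 2.33) come from the four-decimal rounding of Table A and from picking the closest table entry, not from a different method.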
Example 6.12: Suppose that X ~ N(165, 9), where X = the breaking strength of cotton
fabric. A sample is defective if X < 162. Find the probability that a
randomly chosen fabric will be defective.
Solution: Given that μ = 165 and σ² = 9 (so σ = 3),
$$P(X < 162) = P\left(\frac{X-\mu}{\sigma} < \frac{162-\mu}{\sigma}\right) = P\left(Z < \frac{162-165}{3}\right)$$
= P(Z < −1) = 0.5 − P(−1 < Z < 0) (since P(Z < 0) = 0.5)
= 0.5 − P(0 < Z < 1) (by symmetry)
= 0.5 − 0.3413 = 0.1587 (table value for Z = 1)
6.5.2 Chi-Square Distribution:- The chi-square distribution may be derived from normal
distributions: if Xi (i = 1, 2, …, n) are n independent normal variates with means μi and
variances σi² (i = 1, 2, …, n), then
$$\chi^2 = \sum_{i=1}^{n} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2$$
is a chi-square variate with n degrees of freedom. The probability
density function of the χ² distribution is given by
$$f(\chi^2) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\,(\chi^2)^{(n/2)-1}\, e^{-\chi^2/2}, \quad 0 < \chi^2 < \infty,$$
where n is the number of degrees of freedom.
Since the chi-square distribution arises in many important applications, its values have
been extensively tabulated. Table C at the end of this module contains values of χ²α,n
for α = 0.05, 0.025, 0.01, 0.005 and n = 1, 2, 3, …, 30, where χ²α,n is such that the area to
its right under the chi-square curve with n degrees of freedom is equal to α. That is,
χ²α,n is such that if X is a random variable having a chi-square distribution with n
degrees of freedom, then P(X ≥ χ²α,n) = α. Here α is known as the level of significance.
When n is greater than 30, the table cannot be used, and probabilities related to chi-square
distributions are usually approximated with normal distributions.
[Figure: Chi-square curve with right-tail area α beyond χ²α,n]
Properties of Chi-square Distribution
1. The exact shape of the distribution depends upon the number of degrees of freedom n.
In general, when n is small, the curve is skewed to the right, and as n gets
larger, the distribution becomes more and more symmetrical.
2. The mean and variance of the χ² distribution are n and 2n, respectively.
3. As n → ∞, the χ² distribution approaches a normal distribution.
4. The sum of independent χ² variates is also a χ² variate, whose degrees of freedom are the
sum of the individual degrees of freedom.
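The definition of χ² as a sum of squared standardized normal variates can be illustrated by simulation. This sketch assumes standard normal draws (μi = 0, σi = 1, so each term is already standardized), an arbitrary seed, and n = 5; it is a Monte Carlo check of property 2, not an exact computation.

```python
import random
from statistics import fmean, pvariance

def chi_square_draw(n, rng):
    """One chi-square variate: sum of squares of n independent standard normal draws."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(n))

rng = random.Random(42)
n = 5
draws = [chi_square_draw(n, rng) for _ in range(50_000)]
print(round(fmean(draws), 2), round(pvariance(draws), 2))   # close to n = 5 and 2n = 10
```

With 50,000 draws the sample mean and variance land close to n and 2n, but they will not match exactly; the agreement improves as the number of draws increases.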
6.5.3 The t-distribution:- Let X1, X2, …, Xn be a random sample drawn from a normal
distribution having mean μ and standard deviation σ (unknown, but estimated by S, the
sample standard deviation).
The statistic
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a t-distribution with (n − 1) degrees of freedom, where X̄ is the sample
mean and S is the sample standard deviation. In view of its importance, the t distribution has been
tabulated extensively. Table B at the end of this module contains values of tα,n−1 for α =
0.10, 0.05, 0.025, 0.01, 0.005, and n − 1 = 1, 2, 3, …, 29 degrees of freedom, where tα,n−1 is
such that the area to its right under the curve of the t distribution with (n − 1) degrees of
freedom is equal to α.
Notation: tα,(n−1) stands for the value of t with (n − 1) degrees of freedom to the right of which
lies an area equal to α; this is how the tabulated values are read.
[Figure: Student's t distribution, with tail areas beyond −t and t]
Note: 1. The table does not contain values of tα,n−1 for α > 0.50; since the curve
is symmetrical about t = 0 (like the normal distribution), we have
t1−α,n−1 = −tα,n−1.
2. When (n − 1) is 30 or more, probabilities related to the t distribution are usually
approximated with the use of normal distributions.
Example 6.13: For a t-distribution with n = 20, find the t values leaving an area of
a) 0.05 to the right; b) 0.975 to the right;
c) 0.10 to the left; d) half of α = 0.01 on either side.
Solution: Referring to Table B with (n − 1) = 19 df, we have
a) t0.05 = 1.729; b) t0.975 = −t0.025 = −2.093;
c) −t0.10 = −1.328; d) tα/2 = t0.005 = 2.861 and −t0.005 = −2.861.
Applications of t Distribution:- The t distribution has wide applications in statistics;
only some are listed below:
a) Test of a population mean (one-sample t-test)
When we are dealing with a random sample of size n < 30 from a normal population
whose variance σ² is unknown, the t distribution with n − 1 degrees of freedom is used to test
the hypothesis that the population mean μ equals a given value (say, μ0) against the
alternatives μ ≠ μ0, μ > μ0, or μ < μ0.
We then calculate
$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}},$$
which is compared with the table value tα or tα/2
with n − 1 degrees of freedom.
Note: The assumptions underlying Student's t-distribution for such tests are:
a) The parent population from which the sample is drawn is normal.
b) The sample observations are independent; that is, the sample is random.
c) The population standard deviation (σ) is unknown.
d) n is small; that is, n < 30.
Example 6.14: In 16 one-hour test runs, the gasoline consumption of an engine
averaged 16.4 gallons with a standard deviation of 2.1 gallons. In order to test the claim
that the average gasoline consumption of this engine is 12.0 gallons per hour, calculate
the t value and tα,n−1 for α = 0.05.
Solution: Substituting n = 16, μ0 = 12.0, X̄ = 16.4, and S = 2.1 in the formula, we get
$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{16.4 - 12.0}{2.1/\sqrt{16}} = 8.38;$$
and the table value for n − 1 = 15 is t0.05,15 = 1.753.
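The t statistic is a one-line formula; the sketch below computes it for Example 6.14. Python's standard library has no t-distribution quantile function, so the critical value 1.753 is hard-coded as read from Table B in the text (an assumption of this sketch, not a computed value), and `one_sample_t` is a hypothetical helper name.

```python
from math import sqrt

def one_sample_t(xbar, mu0, s, n):
    """t = (xbar - mu0) / (s / sqrt(n)), referred to a t table with n - 1 df."""
    return (xbar - mu0) / (s / sqrt(n))

t = one_sample_t(xbar=16.4, mu0=12.0, s=2.1, n=16)
t_table = 1.753            # t_{0.05,15}, taken from Table B as quoted in the text
print(round(t, 2), t > t_table)
```

Here 2.1/√16 = 0.525, so t = 4.4/0.525 ≈ 8.38, far beyond the tabulated value.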
CHAPTER SEVEN: SAMPLING AND SAMPLING DISTRIBUTION OF THE
SAMPLE MEAN
7.1 Basic Concepts
Sampling (elementary) unit:- the ultimate unit to be sampled, i.e., the elements of the
population to be sampled.
Example 7.1
• If somebody studies the economic status of households, the household is the
sampling unit.
• If one studies the performance of freshman students in some college, the student is
the sampling unit.
Sampling error:- A type of error that may arise from the sampling technique
applied. A sampling error is the difference between a sample statistic and its
corresponding parameter. We can make probabilistic statements about this sampling error
only if we have a probability sample.
Non-sampling error:- Errors in observation, interview or measurement errors, errors due
to non-response, and errors in data processing (editing, coding, etc.). Non-sampling
error is likely to increase with increase in sample size.
7.2 Reasons for sampling
Sample survey saves money:- It is obviously cheaper to gather information from 100
households than from 10,000 households.
Sample survey saves time:- A sample survey requires a smaller scale of operations at all
stages, and it reduces data collection and processing time.
Sample survey provides a higher level of accuracy:- This accuracy can be achieved
through more selective recruiting of interviewers and supervisors, more extensive
training programs, closer supervision of the personnel involved, and more efficient
monitoring of the field work.
Experimentation could be destructive in nature:- for example, testing industrial products,
such as the average burning time of bulbs, or testing the quality of wine, beer, etc.
7.3 Sampling Techniques
The commonly used sampling techniques may be broadly classified as non-probability
sampling and probability sampling.
A. Random Sampling or probability sampling.
Probability sampling is a method of sampling in which every element in the
population has a known, pre-assigned probability of being included in the sample.
In this sub-section, four different techniques of taking a random sample are discussed:
a/ Simple random sampling, b/ Stratified random sampling, c/ Cluster sampling, d/
Systematic sampling.
a) Simple Random Sampling:- It is a method of selecting n units out of a finite
population of size N by giving equal probability to all units. Equivalently, it is a sampling
procedure in which all possible combinations of n units that may be formed from the finite
population of size N have the same probability of selection. There are $_{N}C_{n}$ distinct possible
samples in the case of sampling without replacement, and the chance of selecting each one of
them is $1/{}_{N}C_{n}$. There are $N^n$ possible samples in the case of sampling with replacement, and
the chance of selecting each one of them is $1/N^n$. Conceptually, simple random sampling
is the simplest and most common of the probability sampling techniques.
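This small sketch enumerates every possible sample for a hypothetical population of N = 5 units with n = 2. It confirms the counts $_{N}C_{n}$ and $N^n$, using `math.comb`, which needs Python 3.8+. The population and the sizes are illustrative assumptions.

```python
import math
from itertools import combinations, product

# Hypothetical small population of N = 5 labelled units; samples of size n = 2.
N, n = 5, 2
population = range(1, N + 1)

without_replacement = list(combinations(population, n))  # unordered samples, no unit repeated
with_replacement = list(product(population, repeat=n))   # ordered draws, units may repeat

print(len(without_replacement), math.comb(N, n))  # 10 10
print(len(with_replacement), N ** n)               # 25 25
```

Each of the 10 samples without replacement therefore has chance 1/10, and each of the 25 ordered samples with replacement has chance 1/25.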
The lottery method and computer-generated random numbers are used to select a random
sample in simple random sampling:
i) Lottery method: This is a very common method of taking a random sample. Under this
method, we label each member of the population with an identifiable ticket or piece of
paper.
The tickets must be of identical size, color and shape. They are placed in a container and
well mixed before each draw, and draws are continued until a sample of the
required size is selected. Thus the selection of items depends entirely on chance.
Example 7.2: If we want to take a sample of 25 persons out of a population of 150, the
procedure is to write the names of all 150 persons on separate slips of paper, fold
these slips, mix them thoroughly and then make a blindfold selection of 25 slips without
replacement.
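Python's `random.sample` can mimic the lottery method. It draws distinct items without replacement, so every subset of the required size is equally likely. The slip labels below are hypothetical stand-ins for the 150 names, and the seed only makes the draw repeatable.

```python
import random

# Hypothetical labels standing in for the 150 names written on slips of paper.
slips = [f"person_{i}" for i in range(1, 151)]

rng = random.Random(2024)       # seeded only so that the draw can be reproduced
sample = rng.sample(slips, 25)  # 25 distinct slips drawn without replacement
print(sample[:5])
```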
ii) Table of random numbers
This is an alternative method of selecting a simple random sample. A random number table is
constructed from the digits 0, 1, 2, …, 9, and several such tables are available in standard
statistics books.
Suppose we want to select a sample of size n. Then:
- Make a list of the population to be sampled;
- Give a distinct code number to each unit of the population;
- Choose the direction of selection randomly;
- Take as the sample the n units whose code numbers coincide with the random numbers read;
- Omit random numbers that do not exist on the list, and omit repeated numbers if an
element is not to appear more than once in the sample.
Table of Random Numbers
Column
Row 1 2 3 4 5 6 7 8
1 57172 42088 70098 17333 26902 29959 43909 49607
2 33883 87680 24923 15659 09839 45817 89405 70743
3 77950 15344 35609 87119 15859 74577 42791 75889
4 11607 26596 16796 24498 17009 67119 60557 49521
5 56149 55678 38169 47228 49931 94303 67448 31286
6 80719 65101 77729 83949 83358 75230 56624 27549
7 93809 19505 82000 79068 45552 86776 48980 56684
8 40950 86216 48161 17646 24164 35513 94057 51834
9 12182 59744 83710 41125 14291 74773 66391 50031
10 13382 48076 73151 48724 35670 38453 63154 58116
11 38629 94576 48859 75654 17152 66516 78796 73099
12 60728 52063 12431 23898 23683 10853 04038 75246
13 01881 99056 46747 08846 01331 88163 74462 14551
14 23094 08831 24387 23917 07421 97869 88092 72201
Example 7.3: Suppose that N = 40 and we want to select n = 10 units without replacement
from the random number table above, starting with the 3rd row and 2nd column and reading
vertically.
Solution: Taking the first two digits of each entry, starting with the 3rd row and 2nd column
and reading vertically (discarding numbers greater than 40 and repeats), we get:
15, 26, 19, 08, 24, 35, 16, 38, 12 and 17.
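The sketch below reproduces the solution of Example 7.3 from the table above. It rests on two assumptions that the text does not state explicitly. First, each two-digit code is the first two digits of a five-digit group. Second, after row 14 the reading continues at the top of the next column. Under these assumptions it returns exactly the ten units listed.

```python
# Rows 1-14 of the table of random numbers above (8 columns of five-digit groups).
TABLE = """
57172 42088 70098 17333 26902 29959 43909 49607
33883 87680 24923 15659 09839 45817 89405 70743
77950 15344 35609 87119 15859 74577 42791 75889
11607 26596 16796 24498 17009 67119 60557 49521
56149 55678 38169 47228 49931 94303 67448 31286
80719 65101 77729 83949 83358 75230 56624 27549
93809 19505 82000 79068 45552 86776 48980 56684
40950 86216 48161 17646 24164 35513 94057 51834
12182 59744 83710 41125 14291 74773 66391 50031
13382 48076 73151 48724 35670 38453 63154 58116
38629 94576 48859 75654 17152 66516 78796 73099
60728 52063 12431 23898 23683 10853 04038 75246
01881 99056 46747 08846 01331 88163 74462 14551
23094 08831 24387 23917 07421 97869 88092 72201
"""
ROWS = [line.split() for line in TABLE.strip().splitlines()]

def select_from_table(N, n, start_row, start_col):
    """Read two-digit codes down the columns, starting at (start_row, start_col), both 1-based."""
    chosen = []
    row, col = start_row - 1, start_col - 1
    while len(chosen) < n:
        code = int(ROWS[row][col][:2])  # assumption: use the first two digits of the group
        if 1 <= code <= N and code not in chosen:  # skip codes not on the list and repeats
            chosen.append(code)
        row += 1
        if row == len(ROWS):  # assumption: continue at the top of the next column
            row, col = 0, col + 1
            if col == len(ROWS[0]):
                raise ValueError("ran out of random digits")
    return chosen

print(select_from_table(N=40, n=10, start_row=3, start_col=2))
# [15, 26, 19, 8, 24, 35, 16, 38, 12, 17]
```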
b/ Stratified random sampling
In stratified sampling, the population of N units is sub-divided into k sub-populations,
called strata, so that the units in each stratum are as homogeneous as possible and the
means of the different strata are as different as possible. These sub-populations should
be non-overlapping and together comprise the whole population, so that
$N_1 + N_2 + \cdots + N_k = N$, where $N_i$ is the population size of the i-th stratum. Then a
sample is drawn from each stratum independently, the sample size within the i-th
stratum being $n_i$ (i = 1, 2, …, k), such that $n_1 + n_2 + \cdots + n_k = n$.
Remarks: In stratified random sampling, the following two points are equally important
to ensure accuracy:
a) proper stratification of the population into various strata, and
b) a suitable sample size from each stratum.
For example, a population can be stratified based on the following variables:
sex (male, female), age (under 18, 18 to 28, 29 to 39), occupation
(professional, other), geographical classifications, administrative regions, etc.
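Here is a minimal stratified-sampling sketch. It assumes the strata and the stratum sample sizes $n_i$ have already been fixed, and the population, stratum names and sizes are all hypothetical. Each stratum is sampled independently by simple random sampling.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw an independent simple random sample (without replacement) from each stratum.

    strata: dict of stratum name -> list of units (strata must not overlap)
    sizes:  dict of stratum name -> n_i, the sample size taken from that stratum
    """
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical population of N = 10 units stratified by sex: N1 = 6, N2 = 4.
strata = {
    "male": ["m1", "m2", "m3", "m4", "m5", "m6"],
    "female": ["f1", "f2", "f3", "f4"],
}
sample = stratified_sample(strata, sizes={"male": 3, "female": 2}, seed=1)
n = sum(len(units) for units in sample.values())  # n = n1 + n2 = 5
print(sample, n)
```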
c/ Cluster Sampling:
The population is divided into non-overlapping groups called clusters. A simple random
sample of clusters is chosen and, in single-stage cluster sampling, all the sampling units in
the selected clusters are surveyed. Clusters are formed in such a way that the elements within
a cluster are heterogeneous, i.e. the observations in each cluster should be more or less
dissimilar.
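This sketch shows single-stage cluster sampling, assuming the clusters are already formed. It draws a simple random sample of cluster labels and keeps every unit in each chosen cluster. The clusters below are hypothetical.

```python
import random

def single_stage_cluster_sample(clusters, m, seed=None):
    """Choose m clusters by simple random sampling and keep every unit in each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)  # SRS of cluster labels, without replacement
    return {label: list(clusters[label]) for label in chosen}

# Hypothetical population divided into 4 non-overlapping clusters (e.g. villages).
clusters = {"A": [1, 2, 3], "B": [4, 5], "C": [6, 7, 8, 9], "D": [10]}
survey = single_stage_cluster_sample(clusters, m=2, seed=7)
print(survey)
```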
d/ Systematic Sampling:
i) divide the population into n equal parts, so that there are k units in each
group, i.e. k = N/n;
ii) select a unit, say i, between 1 and k randomly, i.e. 1 ≤ i ≤ k;
iii) select every k-th unit thereafter.
Example 7.4: Suppose that N = 20 and we want to select a sample of size 4, so that k =
N/n = 20/4 = 5. The first element in the sample is selected randomly from the first 5 units,
say the 3rd, which is the random start. Then every 5th unit is selected, and the sample
contains the 3rd, 8th, 13th and 18th units of the population.
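The three steps translate directly into code. This sketch assumes N is an exact multiple of n, as in Example 7.4. Passing `start=3` reproduces that example, while omitting it draws the random start.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the 1-based positions of a systematic sample (assumes N is a multiple of n)."""
    k = N // n  # sampling interval k = N/n
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start i with 1 <= i <= k
    if not 1 <= start <= k:
        raise ValueError("start must satisfy 1 <= start <= k")
    return [start + j * k for j in range(n)]

print(systematic_sample(20, 4, start=3))  # [3, 8, 13, 18], the sample of Example 7.4
```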
B. Non-Random Sampling or non-probability sampling.
It is a sampling technique in which the choice of individuals for a sample depends on
convenience, personal choice or interest.
Types of non-random sampling are:
1. Judgment sampling, 2. Convenience sampling, 3. Quota sampling.
1. Judgment Sampling
In this case, the person taking the sample has direct or indirect control over which items
are selected for the sample.
2. Convenience Sampling
In this method, the decision maker selects a sample from the population in a manner that
is relatively easy and convenient.
3. Quota Sampling
This is a type of judgment sampling and may be the most commonly used one in the non-
probability category. In a quota sample, quotas are set up according to some specified
characteristics such as income groups, age groups, political or religious groups, etc.
Within each quota, the selection of sampling units depends upon personal judgment.
7.4 Sampling Distribution of the sample mean
Consider all possible samples of size n that can be drawn from a given population (either
with or without replacement). For each sample, we can compute a statistic (such as the
mean or the standard deviation), which will vary from sample to sample. In this manner we
obtain a distribution of the statistic, called its sampling distribution.
Steps for the construction of the sampling distribution of the mean
1. From a finite population of size N, draw all possible samples of size n.
There are $N^n$ possible samples if sampling is with replacement and there are $_{N}C_{n}$
possible samples if sampling is without replacement.
2. Calculate the mean of each sample.
3. Summarize the means obtained in step 2 in a frequency distribution.
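Sampling with replacement
In this type of sampling, an observation has a chance of being selected at each draw.
Consider a population consisting of the N = 3 values 1, 2 and 6, from which samples of size
n = 2 are taken. If we take the sample with replacement, there are $3^2 = 9$ possible samples.
Sample        (1,1) (1,2) (1,6) (2,1) (2,2) (2,6) (6,1) (6,2) (6,6)
Sample mean   1     1.5   3.5   1.5   2     4     3.5   4     6
The sample mean is a random variable and its probability distribution is:
x̄i              1     1.5   2     3.5   4     6     Total
P(X̄ = x̄i)       1/9   2/9   1/9   2/9   2/9   1/9   1
x̄i P(X̄ = x̄i)    1/9   1/3   2/9   7/9   8/9   6/9   3
fi(x̄i − 3)²     4     4.5   1     0.5   2     9     21
where fi is the number of samples giving the mean x̄i.
i) Mean of sample means:
$$E(\bar{X}) = \sum \bar{x}_i\, P(\bar{X} = \bar{x}_i) = 1(1/9) + 1.5(2/9) + 2(1/9) + 3.5(2/9) + 4(2/9) + 6(1/9) = 3.$$
Thus the mean of sample means, $E(\bar{X})$, equals the population mean, μ = (1 + 2 + 6)/3 = 3.
ii) Variance of sample means:
$$\operatorname{Var}(\bar{X}) = \frac{\sum f_i\,(\bar{x}_i - E(\bar{X}))^2}{k} = \frac{21}{9} = 2.33,$$
where k = 9 is the number of sample means.
Since sampling is with replacement, the same value follows from the population variance
$\sigma^2 = [(1-3)^2 + (2-3)^2 + (6-3)^2]/3 = 14/3$:
$$\operatorname{Var}(\bar{X}) = \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} = \frac{14/3}{2} = \frac{14}{6} = 2.33.$$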
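Brute-force enumeration can check the calculations above. This sketch uses exact fractions for the population {1, 2, 6} whose samples are listed above. It computes the population variance with divisor N, as in the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 6]  # the population whose samples are listed above
n = 2

# All N**n = 9 ordered samples drawn with replacement, and the mean of each.
means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
distribution = Counter(means)  # sample-mean value -> number of samples giving it

k = len(means)
mean_of_means = sum(means) / k
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / k

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)  # divisor N

print(mean_of_means, var_of_means, sigma2 / n)  # 3 7/3 7/3
```

Exact fractions avoid rounding, so 7/3 (about 2.33) appears for both the direct variance of the sample means and σ²/n.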
In each case the expected value of the sample mean equals the population mean. This
explains why the sample mean is a good estimate of the population mean: if we use the
sample mean as an estimate of the population mean, we will sometimes overestimate it
and sometimes underestimate it, but "on average" we will be accurate.
The example above illustrates an important result:
Remark:
1. Mean of sample means: $E(\bar{X}) = \dfrac{\sum \bar{x}_i}{k} = \sum \bar{x}_i\, P(\bar{X} = \bar{x}_i) = \mu$, the population mean.
2. Variance of sample means: $V(\bar{X}) = \sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n}$ (if sampling is with replacement).
3. Variance of sample means: $V(\bar{X}) = \dfrac{\sigma^2}{n}\left(\dfrac{N-n}{N-1}\right)$ (if sampling is without replacement).
The quantity $\left(\dfrac{N-n}{N-1}\right)$ is the finite population correction (fpc); if n/N < 0.05, the fpc is
ignored.
Note: the square root of the variance of sample means is known as the standard error.
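The same method checks the without-replacement formula. For the population {1, 2, 6} with n = 2 there are only $_{3}C_{2} = 3$ samples. The directly computed variance of their means agrees with $(\sigma^2/n)\,(N-n)/(N-1)$. This is a small illustrative sketch, not a general proof.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 2, 6]
N, n = len(population), 2

# The N C n = 3 unordered samples drawn without replacement, and their means.
means = [Fraction(sum(s), n) for s in combinations(population, n)]
k = len(means)
mean_of_means = sum(means) / k
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / k

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N
fpc = Fraction(N - n, N - 1)  # finite population correction

print(mean_of_means, var_of_means, sigma2 / n * fpc)  # 3 7/6 7/6
```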
The distribution of sample means depends on the distribution of the population, the sample
size and whether the population variance is known or unknown. A sample may be from a
normally distributed population or from a non-normally distributed population, from a
population whose variance is known or unknown, and the sample size may be large or
small.
Case-I: If sampling is from a normally distributed population with known variance:
When sampling is from a normally distributed population with known variance, the
distribution of sample means, $\bar{X}$, is normal whatever the sample size.
Example 7.5: The speed of all cars travelling on a street is normally distributed with
mean 68 km/h and variance 9 (km/h)². Find the probability that the mean speed of a random
sample of 16 cars travelling on the street is more than 70 km/h.
Solution:
Let X be the speed of cars, with mean 68 and variance 9.
A sample of size 16 is taken, so the sample mean $\bar{X}$ is a random variable. Since the
population is normally distributed,
$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right) = N\!\left(68, \frac{9}{16}\right) = N(68,\ 0.56),$$
so the standard error is $\sigma/\sqrt{n} = 3/4 = 0.75$.
The probability that the sample mean is greater than 70 is
$$P(\bar{X} > 70) = P\!\left(Z > \frac{70 - 68}{0.75}\right) = P(Z > 2.67) = 0.0038.$$
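Here is the same calculation in Python, using `statistics.NormalDist` (Python 3.8+) in place of a normal table. This is a sketch only. Any difference from table values comes from rounding.

```python
import math
from statistics import NormalDist

mu, sigma2, n = 68, 9, 16
se = math.sqrt(sigma2 / n)       # standard error sigma / sqrt(n) = 0.75
z = (70 - mu) / se               # standardized value, about 2.67
p = 1 - NormalDist().cdf(z)      # P(Z > z) = P(Xbar > 70)
print(round(z, 2), round(p, 4))  # 2.67 0.0038
```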
Case-II: When sampling is from a non-normal population and the sample size
is large
If sampling is from a non-normal population and the sample size is large, the
distribution of $\bar{X}$ is approximated using the Central Limit Theorem.
The Central Limit Theorem
If X1, X2, …, Xn is a random sample from a population with mean μ and variance σ²,
then as n goes to infinity the distribution of the sample mean, $\bar{X}$, approaches a normal
distribution with mean μ and variance σ²/n. In short, as n gets large, $\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$ approximately.
We can standardize this to get $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$ (approximately, as n gets large). When the
population variance is unknown, $Z = \dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim N(0, 1)$ (approximately, as n gets large).
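A simulation can illustrate the theorem. This sketch assumes an exponential population with mean 1 and variance 1, which is clearly non-normal. It draws 2000 samples of size n = 50 and summarizes their means. The mean of the sample means should be near μ = 1, and their variance near σ²/n = 0.02. The population, n and the number of repetitions are illustrative choices.

```python
import random
import statistics

rng = random.Random(12345)  # fixed seed so that the run is reproducible
n, reps = 50, 2000

# Assumed non-normal population for illustration: exponential with mean 1 and variance 1.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)     # should be close to mu = 1
var_of_means = statistics.pvariance(sample_means)  # should be close to sigma^2 / n = 0.02
print(round(mean_of_means, 3), round(var_of_means, 4))
```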
Example 7.6: The mean weight of 500 male students at a certain university is 151
pounds (lb) and the standard deviation is 15 lb. Assume that the weights are normally
distributed. If a sample of 64 students is taken, what is the probability that the mean
weight of the sample is more than 154.75 lb?
Solution
As we have taken a large (n = 64) sample, we can use the Central Limit Theorem. This says
that the mean weight of the sample can be approximated by a normal random variable
with mean 151 and variance 225/64 (standard error 15/8 = 1.875). If we let $\bar{X}$ be the mean
weight of the sampled students, then $\bar{X} \sim N(151,\ 225/64)$, and it is required to find
$$P(\bar{X} > 154.75) = P\!\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{154.75 - 151}{15/8}\right) = P(Z > 2.00) = 0.5 - 0.4772 = 0.0228.$$
Example 7.7: Suppose that 150 customers enter a supermarket on a given day. Each
customer spends a random amount. All that is known about the distribution of these
expenditures is that its mean is 7.50 birr and its standard deviation is 3.40 birr. What is the
probability that the average amount spent by these customers during the day is more than
8.00 birr?
Solution
We have n = 150, which is large enough to use the Central Limit Theorem, with mean 7.50
and standard deviation 3.40.
Let $\bar{X}$ be the average amount spent per customer during the day. Then
$\bar{X} \sim N(7.50,\ 3.40^2/150) = N(7.50,\ 0.077)$, and it is required to find $P(\bar{X} > 8.00)$:
$$P(\bar{X} > 8.00) = P\!\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{8.00 - 7.50}{3.40/\sqrt{150}}\right) = P(Z > 1.80) = 0.5 - P(0 < Z < 1.80) = 0.5 - 0.4641 = 0.0359.$$
This means there is a probability of only 0.0359 that the average expenditure per customer
exceeds 8.00 birr.
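Examples 7.6 and 7.7 follow the same recipe, so one helper covers both. This sketch assumes the normal approximation justified by the Central Limit Theorem and uses `statistics.NormalDist` for the tail area. The name `prob_mean_exceeds` is hypothetical.

```python
import math
from statistics import NormalDist

def prob_mean_exceeds(c, mu, sigma, n):
    """Approximate P(Xbar > c) with Xbar ~ N(mu, sigma^2 / n), as the Central Limit Theorem allows."""
    se = sigma / math.sqrt(n)  # standard error of the sample mean
    z = (c - mu) / se
    return z, 1 - NormalDist().cdf(z)

z1, p1 = prob_mean_exceeds(154.75, mu=151, sigma=15, n=64)    # Example 7.6
z2, p2 = prob_mean_exceeds(8.00, mu=7.50, sigma=3.40, n=150)  # Example 7.7
print(round(z1, 2), round(p1, 4))  # 2.0 0.0228
print(round(z2, 2), round(p2, 3))  # 1.8 0.036
```

For Example 7.7 the unrounded z is about 1.801, so the exact tail area is slightly below the 0.0359 obtained by first rounding z to 1.80.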
Case-III: When sampling is from a normally distributed population with unknown
population variance:
a) If the sample size is large, $Z = \dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim N(0, 1)$ approximately, where S is an estimate of σ.
b) If the sample size is small (n < 30), $t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1)$; that is, t has a t-distribution with
(n − 1) degrees of freedom, where S is an estimate of σ.
CHAPTER EIGHT: STATISTICAL INFERENCES
8.1 Statistical Estimation:- A statistic used to estimate a parameter is called an estimator,
and the value taken by the estimator is called an estimate. Statistical estimation is divided
into two main categories: point estimation and interval estimation.
8.1.1 Point Estimation:- When we use a single value of a statistic to estimate the
corresponding parameter of a population, it is called point estimation.
Examples:
• The sample mean is an estimate of the population mean μ. That is, $\bar{X}$ is an estimator
for the population mean μ.
• The sample variance is an estimate of the population variance. That is, S² is an
estimator for the population variance σ².
8.1.2 Interval estimation: We take an interval, a range of values around an estimate, in which
the parameter may lie. This procedure is known as interval estimation; it results in an
interval of values for a parameter. Interval estimates indicate the precision or accuracy of an
estimate and are, therefore, preferable to point estimates. Interval estimation deals with
identifying the upper and lower limits of a parameter. A confidence interval for
the parameter is:
Estimate ± critical value × standard error of the estimator
Example 8.1: The confidence interval for the population mean is:
$\bar{X}$ ± critical value × standard error of $\bar{X}$
Confidence interval estimation for population means
The confidence level is the probability, computed before the sample is drawn, that the
confidence interval constructed around the statistic will contain the true value of the
parameter. There are different cases to be considered to construct confidence intervals.
Case-I: Population variance (σ²) is known and the parent population is normal.
The sampling distribution of the sample mean is normal with mean μ and variance σ²/n,
that is, $\bar{X} \sim N(\mu, \sigma^2/n)$. We can standardize this to get $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.
From the standard normal distribution, we have
$$P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = 1 - \alpha,$$
where α is the risk probability and 1 − α is the confidence level. $\sigma/\sqrt{n}$ is the standard error
of the statistic $\bar{X}$; the standard error is the square root of the variance, where $\operatorname{Var}(\bar{X}) = \sigma^2/n$.
Using the standardized form of the sampling distribution of the sample mean in the above
probability statement, we get the limits of the confidence interval as follows:
$$P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$$
$$P\left(-Z_{\alpha/2}\,\sigma/\sqrt{n} < \bar{X} - \mu < Z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$$
$$P\left(-\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n} < -\mu < -\bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$$
$$P\left(\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$$
The last statement shows that, with probability 1 − α, the random interval
$$\left(\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}\right)$$
contains the population mean μ. This interval is known as a (1 − α)100% confidence interval
for the population mean (μ).
Here are the Z values corresponding to the most commonly used confidence levels:
(1 − α)100%   α      α/2     $Z_{\alpha/2}$
90            0.10   0.05    1.645
95            0.05   0.025   1.96
99            0.01   0.005   2.58
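The Z values in this table are upper α/2 points of the standard normal distribution. As a sketch, they can be recovered with `statistics.NormalDist().inv_cdf` (Python 3.8+). The table's 2.58 is 2.576 rounded to two decimals.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return Z_{alpha/2}: the point with upper-tail area alpha/2 under N(0, 1)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {z_critical(level):.3f}")
```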
Example 8.2: The weights of full boxes of a certain kind of cereal are normally
distributed with a standard deviation of 0.27 ounce. If a sample of 15 randomly selected
boxes produced a mean weight of 9.87 ounces, find:
a) the 95% confidence interval for the true mean weight of boxes of this cereal;
b) the 99% confidence interval for the true mean weight of boxes of this cereal.
Solution:
a) Given 1 − α = 0.95, so that α/2 = 0.025,
n = 15, σ = 0.27 ounce and $\bar{x}$ = 9.87 ounces. For the 95% C.I.,
$P(-Z_{0.025} < Z < Z_{0.025}) = 0.95$ with $Z_{\alpha/2} = Z_{0.025} = 1.96$,
where $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$.
Substituting these values in $\bar{x} - Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$, that is,
$9.87 \pm 1.96\,(0.27/\sqrt{15}) = 9.87 \pm 0.14$, the resulting confidence interval is (9.73, 10.01).
b) Similarly, with $Z_{0.005} = 2.58$, the 99% C.I. is $9.87 \pm 2.58\,(0.27/\sqrt{15}) = 9.87 \pm 0.18$, i.e. (9.69, 10.05).
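Here is a sketch of the interval formula applied to Example 8.2. The helper `z_interval` is a hypothetical name. The critical values 1.96 and 2.58 come from the table above rather than being computed.

```python
import math

def z_interval(xbar, sd, n, z):
    """Return (xbar - z*sd/sqrt(n), xbar + z*sd/sqrt(n)).

    Use sd = sigma when the population standard deviation is known; for large n
    with sigma unknown, the sample standard deviation S is used instead.
    """
    margin = z * sd / math.sqrt(n)
    return xbar - margin, xbar + margin

ci95 = z_interval(9.87, 0.27, 15, z=1.96)  # Example 8.2 a)
ci99 = z_interval(9.87, 0.27, 15, z=2.58)  # Example 8.2 b)
print([round(v, 2) for v in ci95])  # [9.73, 10.01]
print([round(v, 2) for v in ci99])  # [9.69, 10.05]
```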
Case-II: When sampling is from a non-normal population and the sample size is
large, the distribution of $\bar{X}$ is approximated by the Central Limit Theorem (with known or
unknown population variance).
Recall the Central Limit Theorem, which applies to the sampling distribution of the mean
of a sample. Consider samples of size n drawn from a population whose mean is μ and
standard deviation is σ. The population can have any frequency distribution. The
sampling distribution of $\bar{X}$ has mean μ and standard deviation $\sigma/\sqrt{n}$, and it is approximately
normal as n gets large. That is, $\bar{X} \sim N(\mu, \sigma^2/n)$ (as n gets large). We can standardize this to get
$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$, or $Z = \dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim N(0, 1)$ when σ is unknown.
A (1 − α)100% confidence interval for the population mean (μ) is
$\left(\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}\right)$ if σ is known, and
$\left(\bar{X} - Z_{\alpha/2}\,S/\sqrt{n},\ \bar{X} + Z_{\alpha/2}\,S/\sqrt{n}\right)$ if σ is unknown.
Example 8.3: An economist wants to estimate the average amount in checking accounts
at banks in a given region. A random sample of 100 accounts gives $\bar{X}$ = 357.60 dollars and
S = 140.00 dollars. Give a 95% confidence interval for μ, the average amount in any checking
account at a bank in the given region.
Solution
Given: n = 100, $\bar{X}$ = 357.60 dollars, S = 140.00 dollars and α = 0.05.
Since n is large and σ is unknown, a 95% confidence interval for the population mean (μ) is
$\left(\bar{X} - Z_{\alpha/2}\,S/\sqrt{n},\ \bar{X} + Z_{\alpha/2}\,S/\sqrt{n}\right)$:
$$\left(357.60 - 1.96\,\frac{140.00}{\sqrt{100}},\ 357.60 + 1.96\,\frac{140.00}{\sqrt{100}}\right) = (330.16,\ 385.04).$$
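Case-III: When sampling is from a normally distributed population with unknown
population variance and the sample size is small (n < 30).
When the population variance σ² is unknown, we estimate it by the sample variance S². The
standardized sample mean $t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ has a t-distribution with (n − 1)
degrees of freedom. From this distribution, a (1 − α)100% confidence interval for the
population mean is
$$\left(\bar{X} - t_{\alpha/2}(n-1)\,\frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}(n-1)\,\frac{S}{\sqrt{n}}\right).$$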
Example 8.4: A sample of size 25 from a normal population has a mean of 32 and a
standard deviation of 4.2. Find a 95% confidence interval for the population mean.
Solution:
Given: n = 25, $\bar{X}$ = 32, S = 4.2, 1 − α = 0.95 ⟹ α = 0.05, α/2 = 0.025
⟹ $t_{0.025}(24)$ = 2.064 from the table.
⟹ The required interval is $\left(\bar{X} - t_{\alpha/2}(n-1)\,\frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}(n-1)\,\frac{S}{\sqrt{n}}\right)$:
$$32 \pm 2.064 \times \frac{4.2}{\sqrt{25}} = 32 \pm 1.73 = (30.27,\ 33.73).$$
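The small-sample interval differs only in the critical value. In this sketch $t_{0.025}(24) = 2.064$ is passed in from the t table, because the Python standard library has no t-distribution quantile function. The helper name `t_interval` is hypothetical.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Return (xbar - t*s/sqrt(n), xbar + t*s/sqrt(n)).

    t_crit is t_{alpha/2}(n-1), read from a t table for n - 1 degrees of freedom.
    """
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

ci = t_interval(32, 4.2, 25, t_crit=2.064)  # Example 8.4
print([round(v, 2) for v in ci])            # [30.27, 33.73]
```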
8.2 Statistical Hypothesis testing
In section 8.1, we studied how to estimate the mean using point and interval
estimation. The other aspect of statistical inference is known as the statistical testing
of hypotheses. The branch of statistics that helps us arrive at a criterion for
deciding about a characteristic of the population (a parameter), based on the
information obtained from sample data, is known as testing of hypotheses.
8.2.1 Hypothesis testing for population mean
8.2.1.1 Some terms in tests of hypothesis
Statistical hypothesis: a statement (or an assertion) about a parameter of a
population, or about its distribution, that may be proved or disproved.
Test statistic: a statistic whose value serves to determine whether to reject or accept
the hypothesis to be tested. It is a random variable.
A given statement concerning a parameter could be true or false. Hence we have two
complementary hypotheses, namely, the null hypothesis and the alternative hypothesis.
a) Null hypothesis (H0)
It is the hypothesis to be tested for possible rejection under the assumption that it is true;
it is the hypothesis of equality, or the hypothesis of no difference.
b) Alternative hypothesis (HA/H1)
It is the hypothesis complementary to the null hypothesis. It is accepted if H0 is rejected
and rejected if H0 is accepted; it is the hypothesis of difference.
Statistical test: a test or procedure used to evaluate a statistical hypothesis, that is, to
decide from sample data whether to reject the hypothesis. The decisions we
make are of two types: either to reject H0 and conclude that HA is accepted, or to retain H0
and conclude that we do not have enough evidence to reject H0.
Types of errors
A statistical test of hypothesis can lead to two kinds of errors. If the statistical test rejects
H0 when it is true, the error is a type I error. If the test accepts H0 when it is false, the error
is a type II error.
The following table gives a summary of possible results of any hypothesis testing
procedure:
Decision Ho is true Ho is false
Reject Ho Type I error Correct decision
Accept Ho Correct decision Type II error
Type I error is the error committed in rejecting the null hypothesis when it is true.
The probability of committing a type I error is called the level of significance and is
denoted by α.
Type II error is the error committed in accepting the null hypothesis when it is false.
The probability of committing a type II error is denoted by β.
A level of significance of 5% (α = 0.05) implies that, when H0 is true, we are likely to
reject it in about 5 samples out of 100. In other words, when H0 is true, the test
retains it with probability 0.95.
General steps in hypothesis testing on the population mean, μ
Step-1: The first step in hypothesis testing is to specify the null hypothesis (H0) and the
alternative hypothesis (H1). If the assumed or hypothesized value of μ is denoted
by μ0, one can formulate two-sided and one-sided hypotheses as follows:
1. H0: μ = μ0 versus H1: μ ≠ μ0 (two-sided test)
2. H0: μ = μ0 versus H1: μ < μ0 (one-sided test)
3. H0: μ = μ0 versus H1: μ > μ0 (one-sided test)
Step-2: Specify a significance level α.
Step-3: Identify the sampling distribution of the estimator and the test statistic.
Case-I: Population variance (σ²) is known and the parent population is normal.
The test statistic is $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$.
Case-II: When sampling is from a non-normal population and the sample size is large,
the distribution of $\bar{X}$ is approximated by the Central Limit Theorem (with known or
unknown variance).
a) The test statistic is $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$ with known variance.
b) The test statistic is $Z = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim N(0, 1)$ with unknown variance.
Case-III: When sampling is from a normally distributed population with unknown
population variance:
i) When the sample size is large, $Z = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim N(0, 1)$.
ii) When the sample size is small (n < 30), $t = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t(n-1)$.
Step-4: The value of the test statistic is calculated as follows:
a) $Z_c = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ with known variance.
b) $Z_c = \dfrac{\bar{x} - \mu_0}{S/\sqrt{n}}$ with unknown variance and large sample size.
c) $t_c = \dfrac{\bar{x} - \mu_0}{S/\sqrt{n}}$ with unknown variance and small sample size.
Here $\bar{x}$ is the sample mean and μ0 is the value of the parameter specified by the null hypothesis.
Step-5: Identify the critical (rejection) region, i.e. state the decision rule.
a) For the two-sided test H0: μ = μ0 versus H1: μ ≠ μ0, reject H0 if
$Z_c > Z_{\alpha/2}$ or $Z_c < -Z_{\alpha/2}$.
Note: $Z_c$ refers to the calculated value of Z.
Graphically, the rejection and acceptance regions are:
[Figure: standard normal curve; rejection regions of area α/2 to the left of $-Z_{\alpha/2}$ and to the right of $Z_{\alpha/2}$, acceptance region in between.]
b) For the one-sided (right-sided) test H0: μ = μ0 versus H1: μ > μ0, reject H0 if
$Z_c > Z_{\alpha}$. Graphically, the rejection and acceptance regions are:
[Figure: acceptance region to the left of $Z_{\alpha}$; rejection region of area α to the right of $Z_{\alpha}$.]
c) For the one-sided (left-sided) test H0: μ = μ0 versus H1: μ < μ0, reject H0 if
$Z_c < -Z_{\alpha}$. Graphically, the rejection and acceptance regions are:
[Figure: rejection region of area α to the left of $-Z_{\alpha}$; acceptance region to the right of $-Z_{\alpha}$.]
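Step-6: Summarize the result and state the conclusion.
Decision Table
To test H0: μ = μ0 against the three alternatives, the rules are summarized as follows:
• H1: μ ≠ μ0: accept H0 if $-Z_{\alpha/2} < Z_C < Z_{\alpha/2}$; reject H0 if $Z_C > Z_{\alpha/2}$ or $Z_C < -Z_{\alpha/2}$;
inconclusive if $Z_C = Z_{\alpha/2}$ or $Z_C = -Z_{\alpha/2}$.
• H1: μ < μ0: accept H0 if $Z_C > -Z_{\alpha}$; reject H0 if $Z_C < -Z_{\alpha}$; inconclusive if $Z_C = -Z_{\alpha}$.
• H1: μ > μ0: accept H0 if $Z_C < Z_{\alpha}$; reject H0 if $Z_C > Z_{\alpha}$; inconclusive if $Z_C = Z_{\alpha}$.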
Example 8.5: Test at α = 0.05 whether the mean of a random sample of size n = 16 is
"significantly less than 10" if the distribution from which the sample was taken is
normal, $\bar{x}$ = 8.4 and σ = 3.2 (known).
Solution:
* H0: μ = 10 versus HA: μ < 10, α = 0.05
* $Z_\alpha = Z_{0.05} = 1.645$ (critical value)
* $Z_C = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{8.4 - 10}{3.2/4} = -2$ (calculated value)
* Since $Z_C = -2 < -Z_\alpha = -1.645$, the null hypothesis is rejected. That is, the population
mean is significantly less than 10 at the 5% level of significance.
Example 8.6: Based upon a random sample of size 100 with an average of 3.4 minutes
and a standard deviation of 2.8 minutes, is the claim that the average telephone call lasts 4
minutes true at the 5% level of significance?
Solution: Given: n = 100, $\bar{x}$ = 3.4 min, s = 2.8 min, α = 0.05.
To test: H0: μ = 4 versus HA: μ ≠ 4.
Since σ is unknown this should be a t-distribution; however, since n = 100 is large, the
Z statistic is used:
$$Z_c = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{3.4 - 4}{2.8/10} = -2.14$$
From the standard normal table we have $-Z_{\alpha/2} = -Z_{0.025} = -1.96$.
Since the calculated value is less than the tabulated value (−2.14 < −1.96), the null
hypothesis is rejected. Therefore the average telephone call is significantly different
from 4 minutes at α = 0.05.
Example 8.7: A sample of 16 students gave an average mark of 53.8 with a standard
deviation of 5.2. Can we conclude that the population mean mark is 50 at α = 0.05?
Solution: H0: μ = 50 versus HA: μ ≠ 50
α = 0.05 and hence α/2 = 0.025, so $t_{\alpha/2,\,n-1} = t_{0.025,\,15} = 2.131$.
$$t_C = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{53.8 - 50}{5.2/\sqrt{16}} = \frac{3.8}{1.3} = 2.92$$
Since $t_C = 2.92 > 2.131$, H0 is rejected. Therefore the population mean mark is
significantly different from 50 at α = 0.05.
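Steps 4 and 5 can be combined in one small function. This sketch uses the hypothetical name `mean_test` and takes the critical value from a table, as the examples do. It reproduces the decisions of Examples 8.5 to 8.7.

```python
import math

def mean_test(xbar, mu0, sd, n, critical, alternative):
    """Return the Z (or t) statistic and whether H0: mu = mu0 is rejected (Step-5 rules).

    critical: Z_alpha or t_{alpha,n-1} for a one-sided test; Z_{alpha/2} or
              t_{alpha/2,n-1} for the two-sided test (read from a table).
    alternative: "two-sided" (mu != mu0), "less" (mu < mu0) or "greater" (mu > mu0).
    """
    stat = (xbar - mu0) / (sd / math.sqrt(n))
    if alternative == "two-sided":
        reject = stat > critical or stat < -critical
    elif alternative == "less":
        reject = stat < -critical
    elif alternative == "greater":
        reject = stat > critical
    else:
        raise ValueError(f"unknown alternative: {alternative}")
    return stat, reject

ex85 = mean_test(8.4, 10, 3.2, 16, critical=1.645, alternative="less")        # Z_C = -2
ex86 = mean_test(3.4, 4, 2.8, 100, critical=1.96, alternative="two-sided")    # Z_C = -2.14
ex87 = mean_test(53.8, 50, 5.2, 16, critical=2.131, alternative="two-sided")  # t_C = 2.92
for stat, reject in (ex85, ex86, ex87):
    print(round(stat, 2), "reject H0" if reject else "do not reject H0")
```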
CHAPTER NINE: SIMPLE LINEAR REGRESSION AND CORRELATION
9.1 Simple Linear Regression of Y on X

In simple linear regression of Y on X, we have one independent (influencing) variable,
usually denoted by X, and one dependent variable influenced by the independent variable,
denoted by Y. Examples of real-world variables that may be linearly related are
production/yield (Y) and amount of rainfall (X), and monthly income (Y) and level of
education (X).
A simple linear regression model is given as
$$Y = \alpha + \beta X + \varepsilon$$
where α is the intercept of the regression line; it gives the value of Y when X is zero. If
the range of X does not include zero, α has no practical interpretation. β is the slope, a
measure of the rate of change: it shows by how much Y changes for every unit change
in X.
The constants α and β are parameters and are commonly referred to as regression
coefficients.
- ε is a random error term; it is not observable. In real-life problems,
even though two variables are linearly related, their relationship is not exactly
Y = α + βX.
To estimate this model we take a sample of n independent observations, which give rise to
n pairs (Xi, Yi), and find the best estimates of the parameters, i.e. the best-fitting line,
using the least squares method of estimation. A best-fitting line is one for which the sum of
squares of the errors, $\sum \hat{\varepsilon}_i^2$, is minimum.
In the principle of least squares, one selects $\hat{\alpha}$ and $\hat{\beta}$ such that
$$\sum \hat{\varepsilon}_i^2 = \sum (Y_i - \hat{Y}_i)^2 \text{ is minimum, where } \hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i.$$
To minimize this function, we take the partial derivatives of $\sum \hat{\varepsilon}_i^2$ with respect to
$\hat{\alpha}$ and $\hat{\beta}$, set them equal to zero and solve the resulting equations, which gives
$$\hat{\beta} = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} \quad\text{and}\quad \hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}.$$
These estimates are denoted by $\hat{\alpha}$ and $\hat{\beta}$. The estimated (fitted) regression line is given by:
$$\hat{Y} = \hat{\alpha} + \hat{\beta} X_i$$
Before estimating the regression coefficients, it is wise to plot the observed data
on a graph known as a scatter diagram. A scatter diagram is a plot of all ordered pairs
(xi, yi) on the coordinate plane, which helps to observe the relationship between two
variables. This diagram gives a preliminary idea of the type of relationship the two
variables have.
Regression analysis is useful in predicting the value of one variable from a given value
of another variable, using $\hat{Y} = \hat{\alpha} + \hat{\beta} X_i$.
Example 9.1: The following data give the number of hours (X) each of 10 students spent
studying and the marks (Y) each student received in an examination. Assuming a simple
linear relationship between X and Y,
a/ Draw the scatter diagram;
b/ Find the estimated regression equation of Y on X;
c/ Give the predicted value of Y for X = 12.
Solution: The data and the necessary sums are:
Student   1     2     3     4     5     6     7     8     9     10    Total
x         8     5     11    13    10    6     18    15    2     9     97
y         65    44    79    72    70    54    90    85    33    56    648
x²        64    25    121   169   100   36    324   225   4     81    1149
xy        520   220   869   936   700   324   1620  1275  66    504   7034
y²        4225  1936  6241  5184  4900  2916  8100  7225  1089  3136  44952
a) The scatter diagram is as follows:
[Figure: Scatter diagram for number of hours studied (X) and marks obtained (Y) by 10 students; horizontal axis: hours spent (0 to 20), vertical axis: marks obtained (0 to 100).]
b) With n = 10, $\bar{x}$ = 97/10 = 9.7 and $\bar{y}$ = 648/10 = 64.8, the necessary statistics are:
$$\hat{\beta} = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} = \frac{7034 - (10)(9.7)(64.8)}{1149 - (10)(9.7)^2} = \frac{748.4}{208.1} = 3.596$$
and $\hat{\alpha}$ = 64.8 − 3.596(9.7) = 29.92.
Hence, the equation is ŷ = 29.92 + 3.596x.
c) When X = 12, $\hat{y}_{12}$ = 29.92 + 3.596(12) = 73.1.
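This plain-Python sketch applies the least squares formulas above to the data of Example 9.1. The helper name `least_squares` is hypothetical, and the sketch does no plotting.

```python
x = [8, 5, 11, 13, 10, 6, 18, 15, 2, 9]       # hours studied
y = [65, 44, 79, 72, 70, 54, 90, 85, 33, 56]  # marks obtained

def least_squares(x, y):
    """Return (alpha_hat, beta_hat) for the fitted line y = alpha_hat + beta_hat * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar  # sum xy - n*xbar*ybar
    sxx = sum(xi * xi for xi in x) - n * xbar ** 2                 # sum x^2 - n*xbar^2
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    return alpha, beta

alpha, beta = least_squares(x, y)
prediction = alpha + beta * 12
print(round(alpha, 2), round(beta, 3))  # 29.92 3.596
print(round(prediction, 1))             # 73.1
```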
9.2 Simple Linear Correlation Analysis

Given the paired data (x1, y1), (x2, y2), . . ., (xn, yn), we may want to describe the type and
strength of the relationship between the independent variable X and the dependent variable
Y. We can describe both with an index called the simple correlation coefficient. The
population correlation coefficient is represented by ρ and its estimator by r. The
correlation coefficient r is also called Pearson's correlation coefficient, since it was
developed by Karl Pearson. The computational formula is:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$
Alternatively, the correlation coefficient is given by
$$r = \frac{\sum xy - n\bar{x}\bar{y}}{\sqrt{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)}}$$
The correlation coefficient r always lies between −1 and +1, inclusive.
• r = −1 implies a perfect negative linear relationship between the two variables.
• r = +1 implies a perfect positive linear relationship between the two variables.
• r = 0 implies that there is no linear relationship between the two variables, but the two
variables may have a non-linear relationship.
• r close to +1 indicates a strong positive linear relationship between the two variables.
• r close to −1 indicates a strong negative linear relationship between the two variables.
• r close to 0 indicates a weak linear relationship between the two variables.
Example 9.2: The research director of the Saving and Loan Bank collected 25
observation of montage interest rates X and number of house sales Y at each interest rate.
The director computed that,
∑ x = 125, ∑ y = 100, ∑ x y = 520 , ∑ x = 650 , ∑ y = 436
Compute and interpret (i) Coefficient of correlation.
(ii) The coefficient of determination.
Solution: i) Coefficient of correlation.
r
 xy  nx y 
520  (25)(5)(4)
=
 x  nx  y
2 2 2
 ny 2  650  25(5)(5) 436  (25)(4)(4) 
0.667
The two variables have a positive linear relationship.
(ii) Coefficient of determination: r² = (0.667)² ≈ 0.44. This shows that about 44% of the variation in
the number of house sales is explained by the variation in the interest rate.
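The arithmetic of Example 9.2 can be reproduced directly from the summary totals. A minimal Python sketch; the function name r_from_sums is ours, and the comments show the intermediate values from the hand calculation:

```python
import math

def r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson's r from summary totals, using the computational formula."""
    x_bar = sum_x / n                      # 125 / 25 = 5
    y_bar = sum_y / n                      # 100 / 25 = 4
    num = sum_xy - n * x_bar * y_bar       # 520 - 25(5)(4) = 20
    sxx = sum_x2 - n * x_bar ** 2          # 650 - 25(5)(5) = 25
    syy = sum_y2 - n * y_bar ** 2          # 436 - 25(4)(4) = 36
    return num / math.sqrt(sxx * syy)

# Totals from Example 9.2 (n = 25 interest-rate / house-sales pairs)
r = r_from_sums(n=25, sum_x=125, sum_y=100, sum_xy=520, sum_x2=650, sum_y2=436)
r_squared = r ** 2
print(f"r = {r:.3f}, r^2 = {r_squared:.2f}")  # r = 0.667, r^2 = 0.44
```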
9.3 Coefficient of Determination (r²)
The square of the correlation coefficient, r², is called the coefficient of
determination. It gives the proportion (usually expressed as a percentage) of the variation
in Y that is explained by X rather than by other variables (factors).
Correspondingly, 1 − r² gives the proportion of the variation in Y that is due to other
variables or factors rather than X.
Example 9.3: If r = 0.9, then r² = 0.81 and 1 − r² = 0.19. Approximately 81% of the
variation in Y is due to X rather than to other variables (factors). The remaining 19% (1 − r²)
of the variation in Y is due to other variables or factors rather than X.
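For simple linear regression with an intercept, r² equals the share of the total variation in Y, ∑(y − ȳ)², that is explained by the least-squares line, and 1 − r² equals the share left over in the residuals. The Python sketch below illustrates this on a small hypothetical data set; the function name variation_split is ours:

```python
import math

def variation_split(xs, ys):
    """Return (r^2, explained share, unexplained share) of the variation in y.

    The shares come from the least-squares line y-hat = a + b*x fitted to the data.
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sst = sum((y - y_bar) ** 2 for y in ys)       # total variation in y
    r = sxy / math.sqrt(sxx * sst)                # Pearson's r
    b = sxy / sxx                                 # least-squares slope
    a = y_bar - b * x_bar                         # least-squares intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    return r ** 2, 1 - sse / sst, sse / sst

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2, explained, unexplained = variation_split(xs, ys)
print(round(r2, 4), round(explained, 4), round(unexplained, 4))  # 0.6 0.6 0.4
```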
9.4 Spearman’s Rank Correlation Coefficient
The simple correlation coefficient (r) cannot be used when we are dealing with
qualitative data, such as judgments about beauty, efficiency or honesty, that can only be ranked.
In such cases, the rank correlation coefficient is used to measure the correlation, or the degree of
agreement, between two rankings. It is denoted by rs and is defined as follows:
Definition: The coefficient of rank correlation, rs, given by Spearman for n pairs, is
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)},$$
where d is the difference between the rank of x and the rank of the corresponding y.
To calculate rs, we first rank the x's among themselves, from lowest to highest or from highest to
lowest; then we rank the y's in the same way, and finally we find the sum of the squares of the
differences, d, between the ranks of the x's and the y's. When there are ties, we
assign to each of the tied observations (having equal values) the mean of the ranks they occupy.
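The ranking procedure above, including the mean-rank rule for ties, can be sketched in Python. This is a minimal illustration under our own assumptions: values are ranked from smallest (rank 1) to largest, the helper names average_ranks and spearman_rs are hypothetical, and the scores are made up. Note that when ties are present the 6∑d² formula is only an approximation; computing Pearson's r on the mean ranks gives the exact value.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last position of the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1, ..., j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rs(xs, ys):
    """Spearman's rs = 1 - 6*sum(d^2) / (n(n^2 - 1)), using mean ranks for ties."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical scores (not ranks); note the tie at 60 in xs
xs = [50, 60, 60, 75, 90]
ys = [10, 12, 15, 14, 20]
print(average_ranks(xs))              # [1.0, 2.5, 2.5, 4.0, 5.0]
print(round(spearman_rs(xs, ys), 3))  # 0.825
```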
Example 9.4: Assume that ten girls in a beauty contest for Miss Debre Markos were
ranked by two judges as follows:
Girl Number 1 2 3 4 5 6 7 8 9 10
Judge A 4 8 6 7 1 3 2 5 10 9
Judge B 3 9 6 5 1 2 4 7 8 10
Calculate rs and interpret it.
Solution: Since the ranks are given, we need only find the difference in ranks for
each girl, d = (rank by Judge A) − (rank by Judge B), and the square of each difference.
Girl Number 1 2 3 4 5 6 7 8 9 10 Total
d 1 -1 0 2 0 1 -2 -2 2 -1 0
d² 1 1 0 4 0 1 4 4 4 1 20
For these n = 10 pairs, $\sum d^2 = 20$, and
$$r_s = 1 - \frac{6(20)}{10(100 - 1)} = 1 - \frac{120}{990} \approx 0.88,$$
which is positive and close to 1, showing that there is very good agreement (or concordance)
between the two judges regarding the beauty of the girls.
N.B.: $\sum d = 0$ provides a check on the calculations.
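The hand calculation in Example 9.4, including the ∑d = 0 check, can be reproduced in a few lines of Python; the variable names are ours:

```python
# Ranks given to the ten girls by each judge (Example 9.4)
judge_a = [4, 8, 6, 7, 1, 3, 2, 5, 10, 9]
judge_b = [3, 9, 6, 5, 1, 2, 4, 7, 8, 10]

d = [a - b for a, b in zip(judge_a, judge_b)]  # rank differences
n = len(d)
sum_d = sum(d)                                 # should be 0 (arithmetic check)
sum_d2 = sum(di ** 2 for di in d)
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(d)               # [1, -1, 0, 2, 0, 1, -2, -2, 2, -1]
print(sum_d, sum_d2)   # 0 20
print(round(rs, 2))    # 0.88
```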
Like r, rs lies between -1 and +1, inclusive, and the interpretations of its size and sign
are analogous to those of r: rs = +1 indicates perfect positive agreement, and rs = -1
indicates complete disagreement, where the two rankings go in completely opposite directions.
Appendix: Table A. Areas under the standard normal curve between Z = 0 and Z = z,
i.e. P(0 ≤ Z ≤ z):
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
3.1 0.4990 0.4991 0.4991 0.4991 0.4992 0.4992 0.4992 0.4992 0.4993 0.4993
3.2 0.4993 0.4993 0.4994 0.4994 0.4994 0.4994 0.4994 0.4995 0.4995 0.4995
3.3 0.4995 0.4995 0.4995 0.4996 0.4996 0.4996 0.4996 0.4996 0.4996 0.4997
3.4 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4998
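Entries of Table A can be spot-checked with the identity P(0 ≤ Z ≤ z) = ½·erf(z/√2), where erf is the error function. A minimal Python sketch using only the standard library; the function name area_0_to_z is ours:

```python
import math

def area_0_to_z(z):
    """P(0 <= Z <= z) for a standard normal Z, computed via the error function."""
    return 0.5 * math.erf(z / math.sqrt(2))

# Print a few table entries to four decimal places
for z in (0.05, 0.68, 1.05, 1.96, 3.00):
    print(f"z = {z:.2f}: {area_0_to_z(z):.4f}")
```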
Table B. Critical values of the t-distribution for right-tail probabilities α
df \ α 0.1 0.05 0.025 0.01 0.005 0.0025 0.001 0.0005
df = 1 3.078 6.314 12.706 31.821 63.656 127.321 318.289 636.57
2 1.886 2.920 4.303 6.965 9.925 14.089 22.328 31.60
3 1.638 2.353 3.182 4.541 5.841 7.453 10.214 12.92
4 1.533 2.132 2.776 3.747 4.604 5.598 7.173 8.61
5 1.476 2.015 2.571 3.365 4.032 4.773 5.894 6.86
6 1.440 1.943 2.447 3.143 3.707 4.317 5.208 5.95
7 1.415 1.895 2.365 2.998 3.499 4.029 4.785 5.40
8 1.397 1.860 2.306 2.896 3.355 3.833 4.501 5.04
9 1.383 1.833 2.262 2.821 3.250 3.690 4.297 4.78
10 1.372 1.812 2.228 2.764 3.169 3.581 4.144 4.58
11 1.363 1.796 2.201 2.718 3.106 3.497 4.025 4.43
12 1.356 1.782 2.179 2.681 3.055 3.428 3.930 4.31
13 1.350 1.771 2.160 2.650 3.012 3.372 3.852 4.22
14 1.345 1.761 2.145 2.624 2.977 3.326 3.787 4.14
15 1.341 1.753 2.131 2.602 2.947 3.286 3.733 4.07
16 1.337 1.746 2.120 2.583 2.921 3.252 3.686 4.01
17 1.333 1.740 2.110 2.567 2.898 3.222 3.646 3.96
18 1.330 1.734 2.101 2.552 2.878 3.197 3.610 3.92
19 1.328 1.729 2.093 2.539 2.861 3.174 3.579 3.88
20 1.325 1.725 2.086 2.528 2.845 3.153 3.552 3.85
21 1.323 1.721 2.080 2.518 2.831 3.135 3.527 3.81
22 1.321 1.717 2.074 2.508 2.819 3.119 3.505 3.79
23 1.319 1.714 2.069 2.500 2.807 3.104 3.485 3.76
24 1.318 1.711 2.064 2.492 2.797 3.091 3.467 3.74
25 1.316 1.708 2.060 2.485 2.787 3.078 3.450 3.72
26 1.315 1.706 2.056 2.479 2.779 3.067 3.435 3.70
27 1.314 1.703 2.052 2.473 2.771 3.057 3.421 3.68
28 1.313 1.701 2.048 2.467 2.763 3.047 3.408 3.67
29 1.311 1.699 2.045 2.462 2.756 3.038 3.396 3.66
30 1.310 1.697 2.042 2.457 2.750 3.030 3.385 3.64
40 1.303 1.684 2.021 2.423 2.704 2.971 3.307 3.55
50 1.299 1.676 2.009 2.403 2.678 2.937 3.261 3.49
60 1.296 1.671 2.000 2.390 2.660 2.915 3.232 3.46
∞ 1.282 1.645 1.960 2.326 2.576 2.807 3.090 3.29
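As the degrees of freedom grow, the t-distribution approaches the standard normal distribution, which is why the last row of Table B (df = ∞) contains standard normal critical values. A short sketch with Python's statistics.NormalDist reproduces that row for right-tail probabilities 0.1 through 0.0005 (rounded to three decimals, so the last value appears as 3.291 rather than the truncated 3.29):

```python
from statistics import NormalDist

alphas = [0.1, 0.05, 0.025, 0.01, 0.005, 0.0025, 0.001, 0.0005]
z = NormalDist()  # standard normal: mean 0, standard deviation 1

for a in alphas:
    # Right-tail critical value z_a satisfies P(Z > z_a) = a
    print(a, round(z.inv_cdf(1 - a), 3))
```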
Table C. Critical values of the chi-square distribution for right-tail areas
df \ area 0.995 0.99 0.975 0.95 0.9 0.25 0.1 0.05 0.025 0.01 0.005
1 0.000 0.000 0.001 0.004 0.016 1.323 2.706 3.841 5.024 6.635 7.87
2 0.010 0.020 0.051 0.103 0.211 2.773 4.605 5.991 7.378 9.210 10.5
3 0.072 0.115 0.216 0.352 0.584 4.108 6.251 7.815 9.348 11.345 12.8
4 0.207 0.297 0.484 0.711 1.064 5.385 7.779 9.488 11.143 13.277 14.8
5 0.412 0.554 0.831 1.145 1.610 6.626 9.236 11.071 12.833 15.086 16.7
6 0.676 0.872 1.237 1.635 2.204 7.841 10.645 12.592 14.449 16.812 18.5
7 0.989 1.239 1.690 2.167 2.833 9.037 12.017 14.067 16.013 18.475 20.2
8 1.344 1.647 2.180 2.733 3.490 10.219 13.362 15.507 17.535 20.090 21.9
9 1.735 2.088 2.700 3.325 4.168 11.389 14.684 16.919 19.023 21.666 23.5
10 2.156 2.558 3.247 3.940 4.865 12.549 15.987 18.307 20.483 23.209 25.1
11 2.603 3.053 3.816 4.575 5.578 13.701 17.275 19.675 21.920 24.725 26.7
12 3.074 3.571 4.404 5.226 6.304 14.845 18.549 21.026 23.337 26.217 28.3
13 3.565 4.107 5.009 5.892 7.042 15.984 19.812 22.362 24.736 27.688 29.8
14 4.075 4.660 5.629 6.571 7.790 17.117 21.064 23.685 26.119 29.141 31.3
15 4.601 5.229 6.262 7.261 8.547 18.245 22.307 24.996 27.488 30.578 32.8
16 5.142 5.812 6.908 7.962 9.312 19.369 23.542 26.296 28.845 32.000 34.2
17 5.697 6.408 7.564 8.672 10.085 20.489 24.769 27.587 30.191 33.409 35.7
18 6.265 7.015 8.231 9.390 10.865 21.605 25.989 28.869 31.526 34.805 37.1
19 6.844 7.633 8.907 10.117 11.651 22.718 27.204 30.144 32.852 36.191 38.5
20 7.434 8.260 9.591 10.851 12.443 23.828 28.412 31.410 34.170 37.566 39.9
21 8.034 8.897 10.283 11.591 13.240 24.935 29.615 32.671 35.479 38.932 41.4
22 8.643 9.542 10.982 12.338 14.041 26.039 30.813 33.924 36.781 40.289 42.7
23 9.260 10.196 11.689 13.091 14.848 27.141 32.007 35.172 38.076 41.638 44.1
24 9.886 10.856 12.401 13.848 15.659 28.241 33.196 36.415 39.364 42.980 45.5
25 10.520 11.524 13.120 14.611 16.473 29.339 34.382 37.652 40.646 44.314 46.9
26 11.160 12.198 13.844 15.379 17.292 30.435 35.563 38.885 41.923 45.642 48.2
27 11.808 12.879 14.573 16.151 18.114 31.528 36.741 40.113 43.195 46.963 49.6
28 12.461 13.565 15.308 16.928 18.939 32.620 37.916 41.337 44.461 48.278 50.9
29 13.121 14.256 16.047 17.708 19.768 33.711 39.087 42.557 45.722 49.588 52.3
30 13.787 14.953 16.791 18.493 20.599 34.800 40.256 43.773 46.979 50.892 53.6
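Two rows of Table C can be checked in closed form. A chi-square variable with 1 degree of freedom is the square of a standard normal variable, so the critical value for right-tail area α is (z_{α/2})². With 2 degrees of freedom the right-tail area beyond c is e^(−c/2), so c = −2 ln α. A sketch using only the Python standard library (the function names are ours); other df values would need a general chi-square quantile routine such as scipy.stats.chi2.ppf, which is not used here:

```python
import math
from statistics import NormalDist

def chi2_crit_df1(area):
    """Right-tail critical value for df = 1: chi-square(1) = Z^2, so c = z_{area/2}^2."""
    z = NormalDist().inv_cdf(1 - area / 2)
    return z * z

def chi2_crit_df2(area):
    """Right-tail critical value for df = 2: P(X > c) = exp(-c/2), so c = -2 ln(area)."""
    return -2 * math.log(area)

areas = [0.995, 0.99, 0.975, 0.95, 0.9, 0.25, 0.1, 0.05, 0.025, 0.01, 0.005]
print([round(chi2_crit_df1(a), 3) for a in areas])
print([round(chi2_crit_df2(a), 3) for a in areas])
```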