Statistics Full Handout
In the plural sense: - statistics is defined as the collection of numerical facts or figures
(or the raw data themselves).
Eg. 1. Vital statistics (numerical data on marriages, births, deaths, etc.).
2. The statement "the average mark of students in the statistics course is 70%" would be considered
statistics, whereas "Abebe got 90% in the statistics course" is not statistics.
Remark: statistics are aggregates of facts. Single, isolated figures are not statistics because
they cannot be compared and are unrelated to each other.
In its singular sense:- the word Statistics is the subject that deals with the methods of
collecting, organizing, presenting, analyzing and interpreting statistical data.
Classification of Statistics
Statistics is broadly divided into two categories based on how the collected data are used.
Descriptive Statistics:- deals with describing the data collected without drawing further
conclusions.
Example 1.1: Suppose that the marks of 6 computer science students in a Statistics course
are 40, 45, 50, 60, 70 and 80. The average mark of the 6 students is
(40 + 45 + 50 + 60 + 70 + 80)/6 = 345/6 = 57.5, and stating it is descriptive statistics.
Inferential Statistics:- It deals with making inferences and/or conclusions about a
population based on data obtained from a sample of observations. It consists of
performing hypothesis testing, determining relationships among variables and making
predictions.
Example 1.2: In the above example, if we say that the average mark of all science students in
the Statistics course is 57.5, then we are doing inferential statistics (drawing a conclusion about
the population based on the sample observations).
Collection of data: This is the process of obtaining measurements or counts, that is, obtaining
raw data.
Data can be collected in a variety of ways; one of the most common methods is through
the use of a sample survey or a census.
Organization of data: -Data collected from published sources are generally in organized
form. However if an investigator has collected data through a survey, it is necessary to
edit these data in order to correct any apparent inconsistencies, ambiguities, and
recording errors.
This phase also includes correcting the data for errors, grouping data into classes and
tabulating.
Presentation of data:-After the data have been collected and organized they can be
presented in the form of tables, charts, diagrams and graphs. This presentation in an
orderly manner facilitates the understanding as well as analysis of data.
Analysis of data: - the basic purpose of data analysis is to dig out useful information for
decision making. This analysis may simply be a critical observation of data to draw some
meaningful conclusions about it or it may involve highly complex and sophisticated
mathematical techniques.
Interpretation of data: - Interpretation means drawing conclusions from the data
collected and analyzed. Correct interpretation will lead to a valid conclusion of the study
& thus can aid in decision making.
1.3 Definition of some statistical terms
Population: - It is the totality of objects under study. The population represents the target
of an investigation, and the objective of the investigation is to draw conclusions about the
population; hence we sometimes call it the target population. The word population doesn't
necessarily refer to people.
Examples:- All clients of Telephone Company, Population of families, etc.
The population could be finite or infinite (an imaginary collection of units).
Sample: - is a part or subset of the population under study.
Sampling frame: - is the list of all units of the population from which the sample is
drawn.
Eg. List of all students of AASTU, List of all residential houses in A.A city, etc
Survey: - is an investigation of a certain population to assess its characteristics. It may be
census or sample.
Census survey: a complete enumeration of the population under study.
Sample survey: the process of collecting data covering a representative part or portion of
a population.
Parameter: -is a statistical measure of a population, or summary value calculated from a
population. Examples: Average, Range, proportion, variance, etc
Statistic: - is a descriptive measure of a sample, or it is a summary value calculated from
a sample.
Sampling: - The process or method of sample selection from the population.
Sample size: - The number of elements or observations to be included in the sample.
An element: -is a member of sample or population. It is specific subject or object (for
example a person, firm, item, etc.) about which the information is collected.
Variable: - It is an item of interest that can take numerical or non-numerical values for
different elements. It may be qualitative or quantitative. Example: age, weight, sex,
marital status, etc.
Observation (measurement):-is the value of a variable for an element.
Qualitative variables:- are variables that assume non-numerical values. They can be
categorized and they are usually called attributes. Example: - Sex, marital status, ID
number, etc.
Quantitative variables: - are variables which assume numerical values. Eg. Age, weight,
etc.
1.4 Applications, uses and limitations of Statistics
Statistics can be applied in any field of study which seeks quantitative evidence. For
instance, Engineering, Economics, Natural Science, etc.
Engineering: Statistics has wide applications in engineering, for example:
To compare the breaking strength of two types of materials.
To determine the reliability of a product.
To control the quality of products in a given production process.
To compare the improvement of yield due to certain additives such as fertilizers,
herbicides, etc.
Function/Uses of Statistics
The following are some uses of statistics:
• It condenses and summarizes a mass of data: the original set of data (raw data) is
normally voluminous and disorganized unless it is summarized and expressed in a few
presentable, understandable and precise figures.
• Statistics facilitates comparison of data: measures obtained from different sets of data
can be compared to draw conclusions about those sets. Statistical values such as averages,
percentages, ratios, rates, coefficients, etc., are the tools that can be used for the purpose
of comparing sets of data.
• Statistics helps to predict future trends: statistics is very useful for analyzing the past
and present data and forecasting future events.
• Statistics helps to formulate & review policies
• Formulating and testing hypotheses: Statistical methods are extremely useful in
formulating and testing hypotheses and in developing new theories.
Limitations of Statistics
Some of these limitations are:
a) It does not deal with individual values: as discussed earlier, statistics deals with
aggregates of facts. For example, the wage earned by an individual worker at any one time,
taken by itself, is not statistics.
b) It does not deal with qualitative characteristics directly: statistics is not applicable
to qualitative characteristics such as beauty, honesty, poverty, standard of living and so
on since these cannot be expressed in quantitative terms.
c) Statistical conclusions are not universally true: since statistics is not an exact
science, as is the case with the natural sciences, statistical conclusions are true only under
certain assumptions.
d) It can be misused: statistics cannot be used to full advantage in the absence of proper
understanding of the subject matter.
1.5 Levels of Measurement
Proper knowledge about the nature and type of data to be dealt with is essential in order
to specify and apply the proper statistical method for their analysis and inferences.
Scale Types
Measurement is the assignment of values to objects or events in a systematic fashion.
Four levels of measurement scales are commonly distinguished: nominal, ordinal,
interval, and ratio. The first two are qualitative while the last two are quantitative.
Nominal scale: The values of a nominal attribute are just different names, i.e., nominal
attributes provide only enough information to distinguish one object from another.
They are qualities with no ranking or ordering and no numerical or quantitative value. These
data consist of names, labels and categories.
Example 1.3: Eye color: brown, black, etc.; sex: male, female.
In this scale, one value is only different from another.
Arithmetic operations (+, -, *, ÷) are not applicable, and ordering comparisons (<, >) are
impossible; only equality comparisons (=, ≠) are meaningful.
Ordinal scale: - defined as nominal data that can be ordered or ranked.
Can be arranged in some order, but the differences between the data values are
meaningless.
Data consisting of an ordering or ranking of measurements are said to be on an
ordinal scale of measurement. That is, the values of an ordinal scale provide
enough information to order objects.
One is different from and greater /better/ less than the other
Arithmetic operations (+, -, *, ÷) are impossible; comparison (<, >, ≠, etc.) is
possible.
Example 1.4: Letter grades (A, B, C, D, F), rating scales (excellent, very good, good,
fair, poor), military rank (general, colonel, lieutenant, etc.).
Interval Level: data are defined as ordinal data for which the differences between data values
are meaningful. However, there is no true zero, or starting point, and the ratios of data
values are meaningless. Note: Celsius and Fahrenheit temperature readings have no
meaningful zero, and their ratios are meaningless.
In this measurement scale:-
One value is different from, and better/greater than another by a certain amount.
Addition and subtraction are possible. For example, 80°C − 50°C = 30°C and 70°C − 40°C
= 30°C.
Multiplication and division are not meaningful. For example, 60°C = 3 × 20°C, but this
does not imply that an object at 60°C is three times as hot as an object
at 20°C.
Most common examples are: temperature, IQ.
Ratio scale: Similar to interval, except that there is a true zero (absolute absence), or starting
point, and the ratios of data values have meaning.
Arithmetic operations (+, -, *, ÷) are applicable. For ratio variables, both
differences and ratios are meaningful.
One value is different from, larger/taller/better/less than another by a certain amount and
by a certain number of times.
This measurement scale provides more information than the interval scale
of measurement.
Example 1.5: Weight, age, number of students.
CHAPTER TWO: METHODS OF DATA COLLECTION AND PRESENTATION
Sources of data
Statistical data may be classified into two categories depending on their source.
Primary data: - Data collected by the investigator himself for the purpose of a specific
inquiry or study. Three of the most common methods of collecting Primary data are:
Telephone survey
Mailed questionnaire
Personal interview.
Secondary data: - When an investigator uses data that have already been collected by
others, such data are called secondary data. Examples of sources of secondary data: books, reports,
magazines, etc.
2.2 Methods of Data Presentation
The presentation of data is broadly classified into the following two categories:
Frequency distribution /Tabular presentation
Diagrammatic and Graphic presentation.
2.2.1 Frequency distribution
Frequency:- is the number of times a certain value or class of values occurs.
Frequency distribution (FD):- is the organization of raw data in table form using classes
and frequency.
There are three types of FD and there are specific procedures for constructing each type.
2. Ungrouped Frequency Distribution (UFD):- A table listing each raw score value that
occurs in the data along with the number of times it occurs. It is often constructed for a
small set of data or for data on a discrete variable.
First find the smallest and largest raw score in the collected data.
Arrange the data in order of magnitude and count the frequency.
To facilitate counting one may include a column of tallies.
Example: Construct an ungrouped frequency distribution for the following 20 marks.
80 76 90 85 80 70 60 62 70 85 65 60 63 74 75 76 70 70 80 85
Solution:
Make a table as shown, tally the data and compute the frequencies.
Mark 60 62 63 65 70 74 75 76 80 85 90 Total
Tally // / / / //// / / // /// /// /
Frequency 2 1 1 1 4 1 1 2 3 3 1 20
-Each individual value is presented separately, that is why it is named ungrouped
frequency distribution.
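The tallying steps above can be sketched in Python as follows. This is a minimal illustration, assuming the 20 marks of the example as input; the function name `ungrouped_frequency_table` is hypothetical. `collections.Counter` does the counting and the rows are sorted by value, as in a hand-made table.

```python
from collections import Counter


def ungrouped_frequency_table(data):
    """Return (value, tally, frequency) rows sorted by value."""
    counts = Counter(data)
    return [(value, "/" * counts[value], counts[value]) for value in sorted(counts)]


marks = [80, 76, 90, 85, 80, 70, 60, 62, 70, 85,
         65, 60, 63, 74, 75, 76, 70, 70, 80, 85]

table = ungrouped_frequency_table(marks)
for value, tally, freq in table:
    print(f"{value:>4}  {tally:<5} {freq}")
print("Total", sum(freq for _, _, freq in table))
```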
3. Grouped Frequency Distribution (GFD).
When the range of the data is large, the data must be grouped into classes that are more
than one unit in width.
Definition of some basic terms
Grouped frequency distribution: is a FD when several numbers are grouped into
one class.
Class limits (CL): They separate one class from another. The limits can actually
appear in the data, and there are gaps between the upper limit of one class and the
lower limit of the next class.
Unit of measure (U): This is the smallest possible difference between successive values,
e.g. 1, 0.1, 0.01, 0.001, …
Class boundaries: Separate one class in a grouped frequency distribution from the
other. The boundaries have one more decimal place than the raw data. There is no gap
between the upper boundary of one class and the lower boundary of the
succeeding class. The lower class boundary is found by subtracting half of the unit of
measure from the lower class limit, and the upper class boundary is found by adding
half of the unit of measure to the upper class limit.
Class width (W): The difference between the upper and lower class boundaries of any
class. The class width is also the difference between the lower limits (or the
upper limits) of two consecutive classes.
Class mark (Midpoint): It is found by adding the lower and upper class limits
(or boundaries) and dividing the sum by two.
Cumulative frequency (CF): It is the number of observations less than the upper
class boundary, or greater than the lower class boundary, of a class.
CF (Less than type): it is the number of values less than the upper class boundary
of a given class.
CF (Greater than type): it is the number of values greater than the lower class
boundary of a given class.
Relative frequency (RF): The class frequency divided by the total frequency. This
gives the proportion of values falling in that class (multiply by 100 for a percentage).
$$RF_i = \frac{f_i}{n} = \frac{f_i}{\sum f_i}$$
Solution:-
1) Highest value = 39, Lowest value = 6
2) Range = 39 − 6 = 33
3) K = 1 + 3.322 log 20 = 1 + 3.322(1.301) = 5.3 ≈ 5
4) W = R / K = 33/5 = 6.6 ≈ 7
5) U = 1
6) $LCL_1$ = 6
7) Find the upper class limits
8) Find the class boundaries
9) Find the class marks
10) Tally the data
Class limit | Class boundary | Class mark | Tally | Freq. | CF(<) | CF(>) | RF | RCF(>)
6 – 11 | 5.5 – 11.5 | 8.5 | // | 2 | 2 | 20 | 2/20 = 0.1 | 1
12 – 17 | 11.5 – 17.5 | 14.5 | // | 2 | 4 | 18 | 2/20 = 0.1 | 0.9
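The class-construction steps can be sketched in Python. The formula K = 1 + 3.322 log n is commonly called Sturges' rule. This is a minimal illustration under stated assumptions: the helper names `sturges_classes` and `class_table` are hypothetical, the class width is passed in explicitly rather than derived, and the unit of measure defaults to 1.

```python
import math


def sturges_classes(n):
    """Approximate number of classes, K = 1 + 3.322 log10(n) (Sturges' rule)."""
    return 1 + 3.322 * math.log10(n)


def class_table(lowest_limit, width, k, unit=1):
    """Rows of (lower limit, upper limit, lower boundary, upper boundary, class mark)."""
    rows = []
    for j in range(k):
        lcl = lowest_limit + j * width
        ucl = lcl + width - unit              # one unit below the next lower limit
        lcb, ucb = lcl - unit / 2, ucl + unit / 2
        rows.append((lcl, ucl, lcb, ucb, (lcl + ucl) / 2))
    return rows


k_raw = sturges_classes(20)       # about 5.32
k = round(k_raw)                  # 5 classes
width_raw = (39 - 6) / k          # R / K = 6.6
print(round(k_raw, 2), k, width_raw)
for row in class_table(6, 6, 2):
    print(row)
```

With lowest limit 6 and width 6, `class_table` reproduces the limits, boundaries and class marks of the two rows shown in the table above.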
A pie chart is a circle that is divided into sections or wedges according to the percentage
of frequencies in each category of the distribution. The angle of each sector is obtained
using:
$$\text{Angle of a sector} = \frac{\text{Value of the part}}{\text{The whole quantity}} \times 360^\circ$$
Example 2.4: Draw a pie chart to represent the following population data in a town.
Step 1: Find the percentage, Step 2: Find the number of degrees for each class.
Step 3: Using a protractor, graph each section and write its name with corresponding
percentage.
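Category | Population | Percentage (%) | Angle (degrees)
Men | 2500 | 25 | 90
Women | 2000 | 20 | 72
Girls | 4000 | 40 | 144
Boys | 1500 | 15 | 54
Total | 10000 | 100 | 360
[Figure: pie chart with sectors Men 25%, Women 20%, Girls 40%, Boys 15%]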
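Steps 1 and 2 of the pie-chart procedure (percentage, then degrees) can be sketched in Python. This is a minimal illustration; the function name `pie_chart_table` is hypothetical and the input is the town population data from Example 2.4.

```python
def pie_chart_table(parts):
    """Return {category: (percentage, sector angle in degrees)}."""
    total = sum(parts.values())
    return {name: (value / total * 100, value / total * 360)
            for name, value in parts.items()}


population = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}
table = pie_chart_table(population)
for name, (pct, angle) in table.items():
    print(f"{name:<6} {pct:5.1f}%  {angle:6.1f} degrees")
```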
B) Bar Charts
Used to represent & compare the frequency distribution of discrete variables and
attributes or categorical series. Bars can be drawn either vertically or horizontally.
There are different types of bar charts. The most common being:
Simple bar chart and Component or sub divided bar chart.
Simple bar chart:- Used to display data on one variable. The bars are thick lines
(narrow rectangles) having the same breadth.
Example 2.5: The numbers of students in the four departments of the Science College are given as
follows:
Department Physics Maths Chemistry Biology
Solution:
[Figure: simple bar chart of frequency (number of students) by department]
When there is a desire to show how a total (or aggregate) is divided into its
component parts, we use a component bar chart.
Example 2.6: Draw a component (sub-divided) bar chart of the number of students by
department given in Example 2.5.
Solution:
[Figure: sub-divided bar chart of frequency by department (Phys, Maths, Chem, Bio), each bar split into Male and Female]
The histogram, the frequency polygon and the cumulative frequency graph (ogive) are the most
commonly applied graphical representations for continuous data.
Histogram:-To construct a histogram, the class boundaries or the class marks are
plotted on the horizontal axis and the class frequencies are plotted on the vertical axis.
Example 2.7: Construct a histogram to represent the following data.
Solution:
[Figure: histogram of frequency against class boundaries]
Frequency polygon
A frequency polygon is a line graph where class frequencies are plotted against the class
marks and the successive points are connected by straight lines.
Example 2.8: Construct a frequency polygon to represent the data of Example
2.7.
Solution:
Class limits | Freq. | Class marks | Class boundaries | R.F. | % R.F. (percent) | Less than C.F. | More than C.F.
Adding two class marks with frequency 0, 9.5 at the beginning and 89.5 at the end,
the following frequency polygon is plotted:
[Figure: frequency polygon, frequency against class mark (9.5, 19.5, 29.5, …, 89.5)]
Example 2.9: Draw both types of ogives for the F.D. of Example 2.7.
Solution:
[Figure: less-than and more-than ogives, cumulative frequency against class boundaries (14.5, 24.5, …, 84.5)]
CHAPTER THREE: MEASURES OF CENTRAL TENDENCY
ii. The sample values are: 10.5, 2.4, 3.6, 5.9, 8.7.
$$\bar{x} = \frac{\sum x_i}{n} = \frac{10.5 + 2.4 + 3.6 + 5.9 + 8.7}{5} = \frac{31.1}{5} = 6.22$$
The arithmetic mean of the sample values is 6.22.
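Example 3.3: Calculate the arithmetic mean of the sample of numbers of students in 10
classes:
50 42 48 60 58 54 50 42 50 42
$$\bar{x} = \frac{\sum x_i}{n} = \frac{50 + 42 + 48 + 60 + 58 + 54 + 50 + 42 + 50 + 42}{10} = \frac{496}{10} = 49.6 \approx 50$$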
In this case there are three 42’s, one 48, three 50’s, one 54, one 58 and one 60. The
number of times each number occurs is called its frequency and the frequency is usually
denoted by f. The information in the sentence above can be written in a table, as follows.
Value, xi 42 48 50 54 58 60
Frequency, fi 3 1 3 1 1 1
xifi 126 48 150 54 58 60
The formula for the arithmetic mean for data of this type is
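$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_k f_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum x_i f_i}{\sum f_i}$$
so that
$$\bar{x} = \frac{126 + 48 + 150 + 54 + 58 + 60}{3 + 1 + 3 + 1 + 1 + 1} = \frac{496}{10} = 49.6 \approx 50$$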
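The two mean formulas above (raw data and data with frequencies) can be checked with a short Python sketch. The function names `mean` and `frequency_mean` are illustrative, not from the handout; the inputs are the data of the examples above.

```python
def mean(values):
    """Arithmetic mean of raw (ungrouped) data: sum(x) / n."""
    return sum(values) / len(values)


def frequency_mean(values, freqs):
    """Mean of discrete data in a frequency table: sum(x*f) / sum(f)."""
    return sum(x * f for x, f in zip(values, freqs)) / sum(freqs)


print(round(mean([10.5, 2.4, 3.6, 5.9, 8.7]), 2))                     # 6.22
print(mean([50, 42, 48, 60, 58, 54, 50, 42, 50, 42]))                 # 49.6
print(frequency_mean([42, 48, 50, 54, 58, 60], [3, 1, 3, 1, 1, 1]))   # 49.6
```

Both forms give the same answer for Example 3.3, because the frequency table is just a compact way of writing the same ten observations.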
Solution:
The formula to be used for the mean is as follows:
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$
where $x_i$ is the mid-point of the ith class.
Let us calculate these values and make a table for these values for the sake of
convenience.
Class Interval (CI) 60-62 62-64 64-66 66-68 68-70 70-72 Total
Frequency (f) 5 18 42 20 8 7 100
Mid-Point (x ) 61 63 65 67 69 71
fx 305 1134 2730 1340 552 497 6558
Hence, the mean is $\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{6558}{100} = 65.58$.
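For grouped data the same weighted sum is taken over class mid-points. A minimal Python sketch, assuming each class is given as a (lower, upper) pair as in the table above; `grouped_mean` is a hypothetical helper name.

```python
def grouped_mean(classes, freqs):
    """Mean of grouped data, using the mid-point of each (lower, upper) class."""
    mids = [(lower + upper) / 2 for lower, upper in classes]
    return sum(f * m for f, m in zip(freqs, mids)) / sum(freqs)


classes = [(60, 62), (62, 64), (64, 66), (66, 68), (68, 70), (70, 72)]
freqs = [5, 18, 42, 20, 8, 7]
print(round(grouped_mean(classes, freqs), 2))   # 65.58
```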
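• The sum of the deviations of the observations from their arithmetic mean is zero:
$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.
• The sum of squares of deviations from the mean is the least. That is, $\sum_{i=1}^{n} (x_i - A)^2$ is
minimum when $A = \bar{x}$.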
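The weighted arithmetic mean of $x_1, x_2, \ldots, x_n$ with weights $w_1, w_2, \ldots, w_n$ is
$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum w_i x_i}{\sum w_i}$$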
Example 3.5: A student's final grades in Mathematics, Physics, Chemistry and Biology
are A, B, D and C, respectively. If the respective credits for these courses are 4,
4, 3 and 2, determine the approximate average grade the student has obtained (using grade
points A = 4, B = 3, C = 2 and D = 1).
Solution: We use a weighted arithmetic mean, weight associated with each course being
taken as the number of credits received for the corresponding course.
x 4 3 1 2 Total
w 4 4 3 2 13
xw 16 12 3 4 35
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{4(4) + 4(3) + 3(1) + 2(2)}{4 + 4 + 3 + 2} = \frac{35}{13} \approx 2.69$$
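The weighted mean of Example 3.5 can be reproduced with a few lines of Python. The helper name `weighted_mean` is illustrative, and the grade-point coding (A = 4, B = 3, C = 2, D = 1) is the one used in the table above.

```python
def weighted_mean(values, weights):
    """sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


grade_points = [4, 3, 1, 2]   # A, B, D, C for Maths, Physics, Chemistry, Biology
credits = [4, 4, 3, 2]
print(round(weighted_mean(grade_points, credits), 2))   # 2.69
```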
Combined mean: When a set of observations is divided into k groups, and $\bar{x}_1$ is the mean
of $n_1$ observations of group 1, $\bar{x}_2$ is the mean of $n_2$ observations of group 2, …, $\bar{x}_k$ is the
mean of $n_k$ observations of group k, then the combined mean, denoted by $\bar{x}_c$, of all
observations taken together is given by
$$\bar{x}_c = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2 + \cdots + \bar{x}_k n_k}{n_1 + n_2 + \cdots + n_k}$$
This is a special case of the weighted mean. In this case the sample sizes are the weights.
Example 3.6: In the previous year there were two sections taking the Statistics course. At the
end of the semester, the two sections got average marks of 70 and 78, with 45 and
50 students, respectively. Find the mean mark for all the students.
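Solution:
$$\bar{x}_c = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2}{n_1 + n_2} = \frac{70(45) + 78(50)}{45 + 50} = \frac{7050}{95} = 74.21$$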
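Since the combined mean is a weighted mean with the group sizes as weights, it is a one-line computation. A minimal sketch with a hypothetical helper name, using the data of Example 3.6:

```python
def combined_mean(means, sizes):
    """Mean of all observations, given each group's mean and size."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)


print(round(combined_mean([70, 78], [45, 50]), 2))   # 74.21
```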
Geometric mean for individual series: The geometric mean (G.M.) of an individual
series of positive numbers $x_1, x_2, \ldots, x_n$ is defined as the nth root of their product:
$$G.M. = \sqrt[n]{x_1 x_2 \cdots x_n}$$
Solution: $G.M. = \sqrt{3 \times 12} = \sqrt{36} = 6$
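The geometric mean can be computed directly from the definition or with the standard library. A minimal sketch, assuming the values 3 and 12 read from the solution above; `geometric_mean` is a hypothetical helper, while `statistics.geometric_mean` (Python 3.8+) is the standard-library function, which works through logarithms and so avoids overflow in very long products.

```python
import math
import statistics


def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    if any(x <= 0 for x in values):
        raise ValueError("the geometric mean is defined here for positive values only")
    return math.prod(values) ** (1 / len(values))


print(geometric_mean([3, 12]))              # 6.0
print(statistics.geometric_mean([3, 12]))   # about 6.0
```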
Example 3.9 A car travels 25 miles at 25 mph, 25 miles at 50 mph, and 25 miles at 75
mph. Find the harmonic mean of the three velocities.
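Solution
$$H.M. = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{3}{\frac{1}{25} + \frac{1}{50} + \frac{1}{75}} = \frac{3}{11/150} = \frac{450}{11} \approx 40.9 \text{ mph}$$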
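The harmonic mean of Example 3.9 can be checked both from the definition and with `statistics.harmonic_mean` from the Python standard library. The helper name `harmonic_mean` below is illustrative.

```python
import statistics


def harmonic_mean(values):
    """n divided by the sum of reciprocals."""
    return len(values) / sum(1 / x for x in values)


speeds = [25, 50, 75]   # mph, each over the same 25-mile distance
print(round(harmonic_mean(speeds), 1))              # 40.9
print(round(statistics.harmonic_mean(speeds), 1))   # 40.9
```

The harmonic mean is the right average here because each speed is held over the same distance, not the same time.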
3.3.4 Median
The median, as its name indicates, is the middle-most value in the arrangement, dividing
the data into two equal parts. It is obtained by arranging the data in increasing
or decreasing order of magnitude and is denoted by $\tilde{x}$.
Median for individual series
We arrange the sample in ascending order of the variable of interest. Then the median is
the middle value (if the sample size n is odd) or the average of the two middle values (if
the sample size n is even).
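For an individual series the median is obtained by
a/ $\tilde{x} = \left(\frac{n+1}{2}\right)^{th}$ value if n is odd, and
b/ $\tilde{x} = \dfrac{\left(\frac{n}{2}\right)^{th} \text{ value} + \left(\frac{n}{2}+1\right)^{th} \text{ value}}{2}$ if n is even.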
Solution:
i. The data in ascending order are:
-5 0 1 2 4 5 6 8 10 15
n = 10, so n is even. The two middle values are the 5th and 6th observations. So the
median is
$$\tilde{x} = \frac{5^{th} \text{ value} + 6^{th} \text{ value}}{2} = \frac{4 + 5}{2} = 4.5$$
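The odd/even rule for an individual series can be written as a short Python function. This is a minimal sketch; `median` is a hypothetical name, and the positions in the formulas are 1-based while Python lists are 0-based, hence the `- 1` shifts.

```python
def median(values):
    """Median of an individual series using the odd/even rules above."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]              # ((n+1)/2)th value
    return (xs[n // 2 - 1] + xs[n // 2]) / 2     # mean of (n/2)th and (n/2 + 1)th values


print(median([-5, 0, 1, 2, 4, 5, 6, 8, 10, 15]))   # 4.5
```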
Median for discrete data arranged in a frequency distribution:- In this case also, the
median is obtained by the above formulas. After arranging the values in increasing
order, find the smallest CF greater than or equal to the position given by formula a/ or b/
above; the corresponding value is the median.
Median for grouped continuous data:- For continuous data, the median is obtained by the
following formula:
$$\tilde{x} = L + \frac{w}{f_{med}}\left(\frac{n}{2} - CF\right)$$
Where: L = the lower class boundary of the median class; w = the class width of the
median class;
$f_{med}$ = the frequency of the median class; and CF = the cumulative frequency of the
class preceding the median class, that is, the sum of the frequencies of all classes lower
than the median class. The median class is the class which contains the (n/2)th
observation, whether n is odd or even, since the items have already lost their identity
once they are grouped into continuous classes.
Example 3.11: Calculate the median for the following frequency distribution.
C.I 1 - 5 6 - 10 11 – 15 16 – 20 21 - 25 26 - 30 31 - 35 Total
Freq. 4 8 12 6 3 4 3 40
Cuml. Freq. 4 12 24 30 33 37 40
Since n = 40, n/2 = 20, and the smallest CF greater than or equal to 20 is 24; thus, the
median class is the third class. For this class, L = 10.5, w = 5, $f_{med}$ = 12 and CF = 12.
Applying the formula, we get:
$$\tilde{x} = 10.5 + \frac{5}{12}(20 - 12) = 10.5 + 3.33 = 13.83$$
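The grouped-median formula can be sketched in Python. This is a minimal illustration, assuming equal class widths and lower class boundaries supplied in increasing order; `grouped_median` is a hypothetical helper name, and the input is the distribution of Example 3.11.

```python
def grouped_median(lower_boundaries, width, freqs):
    """Median of grouped continuous data: L + (w / f_med) * (n/2 - CF)."""
    half = sum(freqs) / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, freqs):
        if cf + f >= half:   # first class whose cumulative frequency reaches n/2
            return lower + width / f * (half - cf)
        cf += f
    raise ValueError("empty frequency table")


lower = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5]   # boundaries of 1-5, 6-10, ..., 31-35
freqs = [4, 8, 12, 6, 3, 4, 3]
print(round(grouped_median(lower, 5, freqs), 2))   # 13.83
```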
Mode of individual series: - The mode or modal value of an individual series (raw data) is
simply obtained by locating the observation with the maximum frequency.
Mode for discrete data arranged in a frequency distribution:- In the case of discrete
grouped data, the mode is determined just by looking at the value(s) having the highest
frequency.
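For grouped continuous data, one can only determine the modal class easily: the class with the highest
frequency. The mode is then estimated by
$$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} w$$
where L = the lower class boundary of the modal class; $\Delta_1 = f_{mod} - f_1$ and $\Delta_2 = f_{mod} - f_2$;
$f_{mod}$ = the frequency of the modal class; $f_1$ and $f_2$ = the frequencies of the classes
preceding and following the modal class; and w = the class width.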
Solution: By inspection, the mode lies in the third class, where L =10.5, fmod = 12, f1=8,
f2=6, w = 5
Using the formula, the mode is:
$$\text{Mode} = 10.5 + \frac{12 - 8}{(12 - 8) + (12 - 6)} \times 5 = 10.5 + \frac{4}{10} \times 5 = 12.5$$
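The grouped-mode formula can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `grouped_mode` is a hypothetical name, a missing neighbouring class (modal class first or last) is treated as having frequency 0, and ties pick the first modal class.

```python
def grouped_mode(lower_boundaries, width, freqs):
    """Mode of grouped continuous data: L + d1 / (d1 + d2) * w."""
    j = max(range(len(freqs)), key=freqs.__getitem__)   # modal class (first one if tied)
    f_mod = freqs[j]
    f1 = freqs[j - 1] if j > 0 else 0                    # preceding class (0 if none, assumed)
    f2 = freqs[j + 1] if j + 1 < len(freqs) else 0       # following class (0 if none, assumed)
    d1, d2 = f_mod - f1, f_mod - f2
    if d1 + d2 == 0:
        raise ValueError("modal class not distinct from both neighbours")
    return lower_boundaries[j] + d1 / (d1 + d2) * width


lower = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5]
freqs = [4, 8, 12, 6, 3, 4, 3]
print(grouped_mode(lower, 5, freqs))   # 12.5
```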
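• Quartiles for Individual Series: Let $x_1, x_2, \ldots, x_n$ be n ordered observations. The ith quartile $Q_i$ is the value of the item
corresponding to the $\frac{i(n+1)}{4}$th position.
That is, after arranging the data in ascending order, $Q_1$, $Q_2$ and $Q_3$ are obtained by:
$Q_1 = \left(\frac{n+1}{4}\right)^{th}$ value, $Q_2 = \left(\frac{2(n+1)}{4}\right)^{th}$ value and $Q_3 = \left(\frac{3(n+1)}{4}\right)^{th}$ value.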
• Quartiles for discrete data arranged in a frequency distribution:- In this case also, we
follow the same procedure as for the median. That is, we construct the less-than
cumulative frequency distribution and apply the quartile formula for individual series.
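• Quartiles in continuous data:- For continuous data, use the following formula:
$$Q_i = L + \frac{w}{f_{Q_i}}\left(\frac{in}{4} - CF\right), \quad i = 1, 2, 3$$
where L, w, $f_{Q_i}$ and CF are defined in the same way as for the median, i.e.
$Q_1 = L + \frac{w}{f_{Q_1}}\left(\frac{n}{4} - CF\right)$, $Q_2 = L + \frac{w}{f_{Q_2}}\left(\frac{2n}{4} - CF\right)$ and $Q_3 = L + \frac{w}{f_{Q_3}}\left(\frac{3n}{4} - CF\right)$.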
The class in question is the one containing the (in/4)th value. That is, the class with the
smallest cumulative frequency greater than or equal to in/4 is the class of the ith quartile.
Deciles: are values dividing the data approximately into ten equal parts, denoted by
$D_1, D_2, \ldots, D_9$.
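• Deciles for Individual Series:
Let $x_1, x_2, \ldots, x_n$ be n ordered observations. The ith decile ($D_i$) is the value of the item
corresponding to the $\frac{i(n+1)}{10}$th position.
That is, after arranging the data in ascending order, $D_1, D_2, \ldots, D_9$ are obtained by:
$D_1 = \left(\frac{n+1}{10}\right)^{th}$ value, $D_2 = \left(\frac{2(n+1)}{10}\right)^{th}$ value, …, and $D_9 = \left(\frac{9(n+1)}{10}\right)^{th}$ value.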
• Deciles for discrete data arranged in a frequency distribution: In this case also, we follow the
same procedure as for the median. That is, we construct the less-than cumulative frequency
distribution and apply the decile formula for individual series.
• Deciles for continuous data: Apply the following formula and follow the procedure for
quartiles of continuous data:
$$D_i = L + \frac{w}{f_{D_i}}\left(\frac{in}{10} - CF\right), \quad i = 1, 2, \ldots, 9$$
Define the symbols in the same way as for quartiles of continuous data.
Percentiles: are values which divide the data approximately into one hundred equal
parts, denoted by $P_1, P_2, \ldots, P_{99}$.
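• Percentiles for Individual Series:
Let $x_1, x_2, \ldots, x_n$ be n ordered observations. The ith percentile ($P_i$) is the value of the item
corresponding to the $\frac{i(n+1)}{100}$th position.
That is, after arranging the data in ascending order, $P_1, P_2, \ldots, P_{99}$ are obtained by:
$P_1 = \left(\frac{n+1}{100}\right)^{th}$ value, $P_2 = \left(\frac{2(n+1)}{100}\right)^{th}$ value, …, and $P_{99} = \left(\frac{99(n+1)}{100}\right)^{th}$ value.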
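• Percentiles for continuous data: $P_i = L + \frac{w}{f_{P_i}}\left(\frac{in}{100} - CF\right)$, i = 1, 2, …, 99, where the
symbols are defined in the same way as for quartiles or deciles of continuous data.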
Interpretations
1. $Q_i$ is the value below which (i × 25) percent of the observations in the series are found
(where i = 1, 2, 3). For instance, $Q_3$ is the value below which 75 percent of the
observations in the given series are found.
2. $D_i$ is the value below which (i × 10) percent of the observations in the series are found
(where i = 1, 2, …, 9). For instance, $D_4$ is the value below which 40 percent of the values
in the series are found.
3. $P_i$ is the value below which i percent of the total observations are found (where i = 1,
2, 3, …, 99). For example, 60 percent of the observations in a given series are below $P_{60}$.
Example 3.15: Calculate $Q_1$, $D_4$ and $P_{90}$ for the data given in the table below.
x 10 11 12 13 14 15 16 17 18
f 2 8 25 48 65 40 20 9 2
Solution: The data is arranged in an increasing order. So we need to construct only the
cumulative frequency table before calculating the required values.
x 10 11 12 13 14 15 16 17 18
f 2 8 25 48 65 40 20 9 2
Cum. Freq. 2 10 35 83 148 188 208 217 219
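The total number of observations is 219, which is odd. Clearly then the median is 14, i.e.
$\tilde{x} = \left(\frac{n+1}{2}\right)^{th}$ value $= \left(\frac{220}{2}\right)^{th}$ value = 110th value = 14
$Q_1 = \left(\frac{n+1}{4}\right)^{th}$ value $= \left(\frac{220}{4}\right)^{th}$ value = 55th value = 13
$D_4 = \left(\frac{4(n+1)}{10}\right)^{th}$ value $= \left(\frac{880}{10}\right)^{th}$ value = 88th value = 14
$P_{90} = \left(\frac{90(n+1)}{100}\right)^{th}$ value $= \left(\frac{19800}{100}\right)^{th}$ value = 198th value = 16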
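Locating a quantile in a discrete frequency distribution amounts to walking the cumulative frequencies until the required position is reached. A minimal Python sketch, assuming the i(n+1)/k position rule used above and the "smallest CF greater than or equal to the position" rule stated for the median; the helper names are hypothetical. A fractional position is effectively rounded up to the next observation, which is one convention among several.

```python
def value_at_cumulative_position(values, freqs, position):
    """Value whose cumulative frequency is the smallest one >= `position`."""
    cf = 0
    for x, f in zip(values, freqs):
        cf += f
        if cf >= position:
            return x
    raise ValueError("position exceeds the number of observations")


def discrete_quantile(values, freqs, i, parts):
    """ith quantile at the i(n+1)/parts th position
    (parts = 2 median, 4 quartiles, 10 deciles, 100 percentiles)."""
    n = sum(freqs)
    return value_at_cumulative_position(values, freqs, i * (n + 1) / parts)


x = [10, 11, 12, 13, 14, 15, 16, 17, 18]
f = [2, 8, 25, 48, 65, 40, 20, 9, 2]
print(discrete_quantile(x, f, 1, 2))     # median: 110th value -> 14
print(discrete_quantile(x, f, 1, 4))     # Q1: 55th value -> 13
print(discrete_quantile(x, f, 4, 10))    # D4: 88th value -> 14
print(discrete_quantile(x, f, 90, 100))  # P90: 198th value -> 16
```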
Example 3.16: The marks (out of 85) of 50 students are given below. Based on the data, find $Q_1$,
$D_4$ and $P_7$.
Marks 46-50 51-55 56-60 61-65 66-70 71-75 76-80
fi 4 8 15 5 9 5 4
Solution:- First find the class boundaries and the cumulative frequency distribution.
Marks 46-50 51-55 56-60 61-65 66-70 71-75 76-80
Class boundary 45.5-50.5 50.5-55.5 55.5-60.5 60.5-65.5 65.5-70.5 70.5-75.5 75.5-80.5
fi 4 8 15 5 9 5 4
Cum. frequency 4 12 27 32 41 46 50
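$Q_1$: the (n/4)th = 12.5th value lies in the class 55.5 – 60.5, so $Q_1 = 55.5 + \frac{5}{15}(12.5 - 12) \approx 55.67$.
$D_4$: the (4n/10)th = 20th value lies in the class 55.5 – 60.5, so $D_4 = 55.5 + \frac{5}{15}(20 - 12) \approx 58.17$.
$P_7$: the (7n/100)th = 3.5th value lies in the class 45.5 – 50.5, so $P_7 = 45.5 + \frac{5}{4}(3.5 - 0) = 49.875 \approx 49.88$.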
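The quartile, decile and percentile formulas for continuous data share one pattern, L + (w/f)(in/k − CF). A minimal Python sketch with a hypothetical helper `grouped_quantile`, assuming equal class widths and lower class boundaries in increasing order; the input is the distribution of Example 3.16.

```python
def grouped_quantile(lower_boundaries, width, freqs, i, parts):
    """L + (w / f) * (i*n/parts - CF) for the class containing the (i*n/parts)th value."""
    n = sum(freqs)
    target = i * n / parts
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, freqs):
        if cf + f >= target:
            return lower + width / f * (target - cf)
        cf += f
    raise ValueError("target position exceeds the number of observations")


lower = [45.5, 50.5, 55.5, 60.5, 65.5, 70.5, 75.5]
freqs = [4, 8, 15, 5, 9, 5, 4]
q1 = grouped_quantile(lower, 5, freqs, 1, 4)     # about 55.67
d4 = grouped_quantile(lower, 5, freqs, 4, 10)    # about 58.17
p7 = grouped_quantile(lower, 5, freqs, 7, 100)   # 49.875
print(round(q1, 2), round(d4, 2), p7)
```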
CHAPTER FOUR: MEASURES OF DISPERSION(VARIATION)
4.1 Introduction
Just as central tendency can be measured by a number in the form of an average, the
amount of variation (dispersion, spread, or scatter) among the values in the data set can
also be measured.
Dispersion refers to the variation of the items around an average. Thus, dispersion is
defined as the scatter or spread of the individual items in a given series.
R=L−S
Where R=Range, L= Largest value in a given set of data, S= smallest value in a given set
of data.
$$\text{Relative Range (RR)} = \frac{L - S}{L + S}$$
Example 4.1: Five students obtained the following marks in statistics: 20, 35, 25, 30, 15.
Find the range and relative range
Solution: R = 35 − 15 = 20 and
$$RR = \frac{L - S}{L + S} = \frac{35 - 15}{35 + 15} = \frac{20}{50} = 0.4$$
Example 4.2: Find out range and relative range of the following given data.
Size 5-10 11-15 16-20 21-25 26-30
Frequency 4 9 15 30 40
Solution: Here,
L = upper class limit of the largest class = 30, S = lower class limit of the smallest class
= 5.
Range = 30 − 5 = 25, $RR = \frac{30 - 5}{30 + 5} = \frac{25}{35} = 0.7143$.
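Range and relative range follow directly from the largest and smallest values. A minimal Python sketch with a hypothetical helper name, applied to Examples 4.1 and 4.2 (for the grouped data, the class limits 30 and 5 are used as in the solution):

```python
def range_and_relative_range(largest, smallest):
    """Return (R, RR) with R = L - S and RR = (L - S) / (L + S)."""
    r = largest - smallest
    return r, r / (largest + smallest)


marks = [20, 35, 25, 30, 15]                              # Example 4.1
print(range_and_relative_range(max(marks), min(marks)))   # (20, 0.4)
r, rr = range_and_relative_range(30, 5)                   # Example 4.2
print(r, round(rr, 4))                                    # 25 0.7143
```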
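Inter-quartile range and quartile deviation are other measures of dispersion. The
difference between the upper quartile ($Q_3$) and the lower quartile ($Q_1$) is called the inter-quartile
range. Symbolically, $IQR = Q_3 - Q_1$, and half of it, $QD = \frac{Q_3 - Q_1}{2}$, is the quartile deviation
(semi-inter-quartile range).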
The relative measure of quartile deviation, also called the coefficient of quartile deviation
(CQD), is defined as:
$$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
Example 4.3: Find inter-quartile range, quartile deviation and coefficient of quartile
deviation from the following data.
Solution: First arrange the data in ascending order. 15, 18, 20, 24, 27, 28, 30
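$Q_1$ = size of $\left(\frac{n+1}{4}\right)^{th}$ item = size of $\left(\frac{7+1}{4}\right)^{th}$ item = size of 2nd item = 18 marks
$Q_3$ = size of $\left(\frac{3(n+1)}{4}\right)^{th}$ item = size of $\left(\frac{3(7+1)}{4}\right)^{th}$ item = size of 6th item = 28 marks
IQR = 28 − 18 = 10 marks, QD = 10/2 = 5 marks, and
$$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{28 - 18}{28 + 18} = \frac{10}{46} = 0.217$$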
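The quartile-based measures of an individual series can be sketched in Python using the (n+1)/4 and 3(n+1)/4 positions of Example 4.3. The helper names are hypothetical, and linear interpolation for a fractional position is an assumption of this sketch (the handout's examples use whole positions).

```python
def value_at_position(sorted_values, position):
    """Value at a 1-based position; a fractional position is interpolated linearly (assumed)."""
    whole = int(position)
    frac = position - whole
    if whole < 1 or whole > len(sorted_values) or (frac and whole == len(sorted_values)):
        raise ValueError("position outside the data")
    value = sorted_values[whole - 1]
    if frac:
        value += frac * (sorted_values[whole] - sorted_values[whole - 1])
    return value


def quartile_measures(values):
    """Q1, Q3, inter-quartile range, quartile deviation and CQD of an individual series."""
    xs = sorted(values)
    n = len(xs)
    q1 = value_at_position(xs, (n + 1) / 4)
    q3 = value_at_position(xs, 3 * (n + 1) / 4)
    iqr = q3 - q1
    return q1, q3, iqr, iqr / 2, iqr / (q3 + q1)


q1, q3, iqr, qd, cqd = quartile_measures([15, 18, 20, 24, 27, 28, 30])
print(q1, q3, iqr, qd, round(cqd, 3))   # 18 28 10 5.0 0.217
```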
Example 4.4: Find inter-quartile range, quartile deviation and coefficient of quartile
deviation from the following data
Marks 2 3 4 5 6 7 8 9
No. Of students 10 11 12 13 5 12 7 5
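Solution:
Marks 2 3 4 5 6 7 8 9
No. of students 10 11 12 13 5 12 7 5
CF 10 21 33 46 51 63 70 75=N
$Q_1$ = size of $\left(\frac{N+1}{4}\right)^{th}$ item = size of 19th item = 3 marks (the smallest CF ≥ 19 is 21)
$Q_3$ = size of $\left(\frac{3(N+1)}{4}\right)^{th}$ item = size of 57th item = 7 marks (the smallest CF ≥ 57 is 63)
IQR = 7 − 3 = 4 marks, QD = 4/2 = 2 marks, and CQD = (7 − 3)/(7 + 3) = 0.4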
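Mean deviation about the mean, for ungrouped data (individual series) and for discrete data
arranged in a FD or grouped data, respectively:
$$MD(\bar{x}) = \frac{\sum |x_i - \bar{x}|}{n}, \qquad MD(\bar{x}) = \frac{\sum f_i |x_i - \bar{x}|}{\sum f_i}$$
Mean deviation about the median, for the same two cases:
$$MD(\tilde{x}) = \frac{\sum |x_i - \tilde{x}|}{n}, \qquad MD(\tilde{x}) = \frac{\sum f_i |x_i - \tilde{x}|}{\sum f_i}$$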
$x_i$ 4 4 5 5 5 6 7 7 8 9 Total
$|x_i - \bar{x}|$ 2 2 1 1 1 0 1 1 2 3 14
$|x_i - \tilde{x}|$ 1.5 1.5 0.5 0.5 0.5 0.5 1.5 1.5 2.5 3.5 14
(Here $\bar{x} = 60/10 = 6$ and $\tilde{x} = (5 + 6)/2 = 5.5$.)
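Since the data are ungrouped, the mean deviations about the mean and about the median are:
$$MD(\bar{x}) = \frac{\sum |x_i - \bar{x}|}{n} = \frac{14}{10} = 1.4, \qquad MD(\tilde{x}) = \frac{\sum |x_i - \tilde{x}|}{n} = \frac{14}{10} = 1.4$$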
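The coefficient of mean deviation (CMD) about the mean is
$$CMD(\bar{x}) = \frac{MD(\bar{x})}{\bar{x}}$$
where MD is the mean deviation calculated about the arithmetic mean.
CMD about the median is given by:
$$CMD(\tilde{x}) = \frac{MD(\tilde{x})}{\tilde{x}}$$
in which case MD is calculated about the median of the observations.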
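Example 4.6: Calculate the coefficient of mean deviation about the mean and about the median
for the data in Example 4.5 above.
Solution:
$$CMD(\bar{x}) = \frac{MD(\bar{x})}{\bar{x}} = \frac{1.4}{6} = 0.23, \qquad CMD(\tilde{x}) = \frac{MD(\tilde{x})}{\tilde{x}} = \frac{1.4}{5.5} = 0.25$$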
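The mean deviations and their coefficients for Examples 4.5 and 4.6 can be reproduced with a short Python sketch. `mean_deviation` is a hypothetical helper; `statistics.mean` and `statistics.median` are standard-library functions.

```python
import statistics


def mean_deviation(values, center):
    """Average absolute deviation of the values from `center` (mean or median)."""
    return sum(abs(x - center) for x in values) / len(values)


xs = [4, 4, 5, 5, 5, 6, 7, 7, 8, 9]
x_bar = statistics.mean(xs)        # 6
x_tilde = statistics.median(xs)    # 5.5

md_mean = mean_deviation(xs, x_bar)       # 1.4
md_median = mean_deviation(xs, x_tilde)   # 1.4
cmd_mean = md_mean / x_bar                # about 0.23
cmd_median = md_median / x_tilde          # about 0.25
print(md_mean, md_median, round(cmd_mean, 2), round(cmd_median, 2))
```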
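Population Variance ($\sigma^2$)
If we divide the variation (the sum of squared deviations from the mean) by the number of values in the population, we get
the population variance. This variance is the "average squared deviation from the
mean".
For ungrouped data (individual series):
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = \frac{1}{N}\sum_{i=1}^{N} X_i^2 - \mu^2$$
where μ is the population arithmetic mean and N is the population size.
For grouped data:
$$\sigma^2 = \frac{\sum f_i (X_i - \mu)^2}{N}$$
where $X_i$ is the class mark of the ith class, $f_i$ is the frequency of the ith class and $N = \sum f_i$.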
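Sample Variance ($S^2$)
The sum of the squares of the deviations is divided by one less than the sample size.
For ungrouped data:
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right]$$
where $\bar{x}$ is the sample arithmetic mean and n is the sample size.
For data arranged in a frequency distribution (m distinct values or classes):
$$S^2 = \frac{\sum_{i=1}^{m} f_i (x_i - \bar{x})^2}{n - 1} = \frac{1}{n-1}\left[\sum_{i=1}^{m} f_i x_i^2 - n\bar{x}^2\right]$$
For continuous grouped data, $x_i$ is the class mark of the ith class, $f_i$ is the frequency of
the ith class and $n = \sum f_i$.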
The Standard Deviation
It is the positive square root of the variance.
Population standard deviation (σ): $\sigma = \sqrt{\sigma^2}$, where $\sigma^2$ is the population variance.
Sample standard deviation (S): $S = \sqrt{S^2}$, where $S^2$ is the sample variance.
Example 4.7: Find the sample variance and standard deviation of:
xi 2 4 5 6 8
fi 2 2 3 1 2
$x_i$ $f_i$ $f_i x_i$ $x_i^2$ $f_i x_i^2$
2 2 4 4 8
4 2 8 16 32
5 3 15 25 75
6 1 6 36 36
8 2 16 64 128
Sum 10 49 279
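With n = 10 and $\bar{x} = 49/10 = 4.9$:
$$S^2 = \frac{1}{n-1}\left[\sum f_i x_i^2 - n\bar{x}^2\right] = \frac{1}{9}\left[279 - 10(4.9)^2\right] = \frac{38.9}{9} \approx 4.32$$
so the sample standard deviation is $S = \sqrt{4.32} \approx 2.08$.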
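The computational formula for a frequency distribution can be checked in Python. `sample_variance_fd` is a hypothetical helper; the cross-check expands the table back into raw observations and uses the standard-library `statistics.variance`, which divides by n − 1. For data with large values, the definitional form Σf(x − x̄)² is numerically safer than Σfx² − nx̄².

```python
import math
import statistics


def sample_variance_fd(values, freqs):
    """S^2 = [sum(f*x^2) - n*xbar^2] / (n - 1) for data in a frequency distribution."""
    n = sum(freqs)
    x_bar = sum(f * x for x, f in zip(values, freqs)) / n
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    return (sum_fx2 - n * x_bar ** 2) / (n - 1)


values = [2, 4, 5, 6, 8]
freqs = [2, 2, 3, 1, 2]
s2 = sample_variance_fd(values, freqs)
s = math.sqrt(s2)
print(round(s2, 2), round(s, 2))   # 4.32 2.08

# Cross-check: expand the table back into raw observations.
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
print(round(statistics.variance(expanded), 2))   # 4.32
```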
1. If a constant is added to (or subtracted from) all the values, the variance remains
the same; i.e., for any constant k, $V(x_i \pm k) = V(x_i)$.
Example 4.8 Consider the 6 sample values xi: 54,52,53,50,51, and 52.
2. If each and every value is multiplied by a non-zero constant (k), the standard
deviation is multiplied by |k| and the variance is multiplied by $k^2$; i.e.,
$V(kx_i) = k^2 V(x_i)$.
3. Both the variance and the standard deviation give more weight to extreme values
and less to those which are near to the mean.
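Properties 1 and 2 can be checked numerically on the values of Example 4.8. This is a minimal sketch using the standard-library `statistics` module; the multiplier k = 3 is an arbitrary choice for illustration, not from the handout.

```python
import statistics

xs = [54, 52, 53, 50, 51, 52]
k = 3  # arbitrary constant, chosen only for illustration

v_original = statistics.variance(xs)                    # 2
v_shifted = statistics.variance([x - 50 for x in xs])   # 2: shifting leaves the spread unchanged
v_scaled = statistics.variance([k * x for x in xs])     # k**2 * 2 = 18
sd_scaled = statistics.stdev([k * x for x in xs])       # |k| * sqrt(2)
print(v_original, v_shifted, v_scaled, round(sd_scaled, 4))
```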
Coefficient of Variation
Coefficient of variation is used in problems where we want to compare the variability of
two or more different series.
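$CV = \frac{\text{Standard deviation}}{\text{Mean}} \times 100\%$
For population data: $CV = \frac{\sigma}{\mu} \times 100\%$, where σ is the population standard deviation and μ is the population mean.
For sample data: $CV = \frac{S}{\bar{x}} \times 100\%$, where S is the sample standard deviation and $\bar{x}$ is the sample mean.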
Remark: A distribution having less coefficient of variation is said to be less variable or
more consistent or more uniform or more homogeneous.
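As an illustration only (the Mathematics and Chemistry figures of Example 4.9 are not reproduced here), the sketch below computes the sample CV of two data sets from this chapter, the values of Example 4.8 and those of Example 4.10, and compares their relative variability. The helper name `coefficient_of_variation` is hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(sample):
    """Sample CV in percent: S / x_bar * 100 (S uses the n - 1 divisor)."""
    return stdev(sample) / mean(sample) * 100

sample_a = [54, 52, 53, 50, 51, 52]      # values of Example 4.8
sample_b = [3, 8, 6, 14, 4, 12, 7, 10]   # values of Example 4.10

cv_a = coefficient_of_variation(sample_a)
cv_b = coefficient_of_variation(sample_b)
print(round(cv_a, 2), round(cv_b, 2))
# The set with the smaller CV is the more consistent (less variable) one.
print("sample_a is more consistent" if cv_a < cv_b else "sample_b is more consistent")
```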
Example 4.9: Last semester, the students of Mathematics and Chemistry Departments
took Introduction to Statistics course. At the end of the semester, the following
information was recorded.
Compare the relative dispersions of the two departments’ scores using an appropriate measure.
Solution:
Mathematics Department: $CV_M = \frac{S_M}{\bar{x}_M} \times 100 = 29.41\%$
Chemistry Department: $CV_C = \frac{S_C}{\bar{x}_C} \times 100 = 18.46\%$
Interpretation: Since the CV of Mathematics Department students is greater than that of
Chemistry Department students, we can say that there is more dispersion relative to the
mean in the distribution of Mathematics students’ scores compared with that of
Chemistry students.
4.4 Standard Scores (Z-Scores): The standard score (z-score) tells us how many standard deviations a specific value is above or below the mean of the data set. That is, the z-score is the number of standard deviations by which the data value falls above (positive z-score) or below (negative z-score) the mean of the data set.
For population data: $Z = \frac{X - \mu}{\sigma}$
For sample data: $Z = \frac{X - \bar{X}}{S}$
Example 4.10: What is the Z-score for the value of 14 in the following sample data set?
3 8 6 14 4 12 7 10
Solution:
$\bar{X} = 64/8 = 8$ and $S = \sqrt{102/7} \approx 3.82$, so $Z = \frac{14 - 8}{3.82} \approx 1.57$.
Since the z-score is positive, the value 14 lies 1.57 standard deviations above the mean of 8.
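The same z-score can be reproduced with Python's `statistics` module; `stdev` uses the n − 1 divisor, matching the sample formula above.

```python
from statistics import mean, stdev

data = [3, 8, 6, 14, 4, 12, 7, 10]   # sample of Example 4.10
x_bar = mean(data)                   # 8
s = stdev(data)                      # sample standard deviation, divisor n - 1
z = (14 - x_bar) / s                 # standard score of the value 14
print(round(z, 2))                   # 1.57
```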
Example 4.11: Suppose that a student scored 66 in Statistics and 80 in Mathematics. A summary of the scores in the two courses is given below.
From these two standard scores, we can conclude that the student performed better in the Statistics course, relative to his classmates, than in the Mathematics course.
4.5.1 Moments
The moments of a distribution are the arithmetic means of the various powers of the deviations of items from some number. In this course, we shall use them in the study of the skewness and kurtosis of statistical distributions.
Moments about the origin
For ungrouped data: $M_r = \frac{\sum X_i^r}{n}$, where r = 0, 1, 2, 3, …
For a grouped frequency distribution: $M_r = \frac{\sum f_i X_i^r}{\sum f_i}$, where $f_i$ is the frequency of $X_i$, and $X_i$ is the class midpoint in the case of grouped data.
Note that $M_0 = 1$ and $M_1 = \bar{X}$.
Moments about the mean
For ungrouped data: $M'_r = \frac{\sum (X_i - \bar{X})^r}{n}$
For a grouped frequency distribution: $M'_r = \frac{\sum f_i (X_i - \bar{X})^r}{\sum f_i}$
Moments about an arbitrary constant A
For ungrouped data: $\frac{\sum (X_i - A)^r}{n}$; for a grouped frequency distribution: $\frac{\sum f_i (X_i - A)^r}{\sum f_i}$.
Example 4.12: Find the first four moments about the mean for the following individual series
X: 3 6 8 10 18
Solution: n = 5, $\bar{X} = 45/5 = 9$
Ser. No   X     X − X̄    (X − X̄)²   (X − X̄)³   (X − X̄)⁴
1         3     −6       36         −216        1296
2         6     −3       9          −27         81
3         8     −1       1          −1          1
4         10    1        1          1           1
5         18    9        81         729         6561
Total     45    0        128        486         7940
Thus,
$M'_1 = \frac{\sum(X - \bar{X})}{n} = \frac{0}{5} = 0$, $M'_2 = \frac{\sum(X - \bar{X})^2}{n} = \frac{128}{5} = 25.6$,
$M'_3 = \frac{\sum(X - \bar{X})^3}{n} = \frac{486}{5} = 97.2$, $M'_4 = \frac{\sum(X - 9)^4}{n} = \frac{7940}{5} = 1588$
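The table of Example 4.12 can be generated directly. This sketch computes the first r central moments with the divisor n, exactly as in the formula for $M'_r$ of ungrouped data; the function name is illustrative.

```python
def central_moments(xs, max_order=4):
    """Moments about the mean M'_r = sum((x - x_bar)^r) / n for r = 1..max_order."""
    n = len(xs)
    x_bar = sum(xs) / n
    return [sum((x - x_bar) ** r for x in xs) / n for r in range(1, max_order + 1)]

m = central_moments([3, 6, 8, 10, 18])   # series of Example 4.12
print(m)                                 # [0.0, 25.6, 97.2, 1588.0]
```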
4.5.2 Skewness
Negatively skewed distribution: In a negatively skewed distribution, the mode is greater than the mean and the median lies between the mean and the mode.
Note that: In moderately skewed distributions, the averages have the following approximate relationship: Mean − Mode ≈ 3(Mean − Median).
Measures of skewness ($\alpha_3$)
It gives information about the shape of the distribution and the degree of variation on
either side of the central value. The three most commonly used measures of skewness are
Pearson’s coefficient of skewness, Bowley’s coefficient of skewness and coefficient of
skewness based on moments.
$\alpha_3 = \frac{M'_3}{(M'_2)^{3/2}}$, where $M'_r = \frac{\sum (x_i - \bar{x})^r}{n}$.
If $\alpha_3 > 0$, the distribution is positively skewed (skewed to the right), i.e., mode < median < mean; smaller observations are more frequent than larger observations.
If $\alpha_3 < 0$, the distribution is negatively skewed (skewed to the left), i.e., mean < median < mode; smaller observations are less frequent than larger observations.
4.5.3 Kurtosis
Measures of kurtosis ($\alpha_4$)
$\alpha_4 = \frac{M'_4}{(M'_2)^2}$
b/ $\alpha_4 = \frac{M'_4}{(M'_2)^2} = 2.26 < 3$, the curve is platykurtic.
Example 4.14: Find the coefficient of skewness and the coefficient of kurtosis for the above example 4.13.
Solution:
i) $\alpha_3 = \frac{M'_3}{(M'_2)^{3/2}} = \frac{97.2}{(25.6)^{3/2}} = 0.75$
ii) $\alpha_4 = \frac{M'_4}{(M'_2)^2} = \frac{1588}{(25.6)^2} = 2.42$
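A sketch of the two moment coefficients. It uses the series of Example 4.12, whose moments (25.6, 97.2, 1588) reproduce the values 0.75 and 2.42 shown above; the function name is illustrative.

```python
def moment_coefficients(xs):
    """Return (alpha3, alpha4) = (M'_3 / M'_2**1.5, M'_4 / M'_2**2), moments about the mean."""
    n = len(xs)
    x_bar = sum(xs) / n
    m2 = sum((x - x_bar) ** 2 for x in xs) / n
    m3 = sum((x - x_bar) ** 3 for x in xs) / n
    m4 = sum((x - x_bar) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

a3, a4 = moment_coefficients([3, 6, 8, 10, 18])
print(round(a3, 2), round(a4, 2))   # 0.75 2.42
# a3 > 0: positively skewed; a4 < 3: platykurtic
```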
Example 5.1: In an experiment of rolling a fair die, S = {1, 2, 3, 4, 5, 6}, and each sample point is an equally likely outcome. It is possible to define many events on this sample space as follows:
Example 5.2: If we toss a coin, the sample space of this experiment is S = {head, tail}, where head and tail are the two faces of the coin. If we are interested in the outcome that a head turns up, then the event is E = {head}.
Example 5.3: Find the sample space of tossing a coin three times.
S= {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Mutually exclusive event: - two events A and B are said to be mutually exclusive if
there is no sample point which is common to A and B. i.e. A ∩ B = ϕ
Independent event: two or more events are said to be independent if the occurrence
or non-occurrence of an event does not affect the occurrence or non-occurrence of the
other.
Dependent Events: Two events are dependent if the first event affects the outcome
or occurrence of the second event in a way the probability is changed.
Complement of an Event: the complement of an event A means the nonoccurrence of A. It is denoted by A' or Aᶜ and contains those points of the sample space which do not belong to A.
Equally likely outcomes: the outcomes of a sample space are equally likely if each outcome has the same chance of occurring.
Example 5.4: When casting a fair die, all possible outcomes are equally likely.
5.2 Counting rules: addition, multiplication, permutation and combination rules
In order to calculate probabilities, we have to know
The number of elements of an event.
The number of elements of the sample space.
That is in order to judge what is probable, we have to know what is possible.
In order to determine the number of outcomes, one can use several rules of counting:
1. The addition rule
2. The multiplication rule
3. Permutation rule
4. Combination rule
1. The addition Rule
Suppose that a procedure, designated by 1, can be done in n1 ways. Assume that a second procedure, designated by 2, can be done in n2 ways. Suppose, furthermore, that it is not possible to do both 1 and 2 together. Then the number of ways in which we can do 1 or 2 is n1 + n2.
Example 5.5: Suppose we are planning a trip to some place. If there are 3 bus routes and 2 train routes that we can take, then there are 3 + 2 = 5 different routes that we can take.
2. Multiplication rule: If an operation consists of k steps and the 1st step can be done in n1 ways, the 2nd step can be done in n2 ways (regardless of how the 1st step was performed), …, and the kth step can be done in nk ways (regardless of how the preceding steps were performed), then the entire operation can be performed in n1 · n2 · … · nk ways.
Example 5.6: Suppose that a person has 2 different pairs of trousers and 3 shirts. In how many ways can he wear his trousers and shirts?
Solution: He can choose the trousers in n1 = 2 ways and the shirts in n2 = 3 ways. Therefore, he can dress in n1 × n2 = 2 × 3 = 6 possible ways.
3. Permutation: An arrangement of objects with attention given to the order of arrangement is called a permutation. The number of permutations of n different objects taken r at a time is obtained by:
$_nP_r = \frac{n!}{(n-r)!}$, for r = 0, 1, 2, …, n
Permutation Rule:
a) The number of permutations of n objects taken all together is n!, i.e.
$n! = n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1 = {}_nP_n = \frac{n!}{(n-n)!} = \frac{n!}{0!}$
Note: By definition 0! = 1
b) The arrangement of n distinct objects in a specific order using r objects at a time is called the permutation of n objects taken r at a time. It is written as nPr and the formula is
$_nP_r = \frac{n!}{(n-r)!}$
c) The number of distinct permutations of n objects in which n1 are alike, n2 are alike, …, nk are alike is
$\frac{n!}{n_1!\, n_2! \cdots n_k!}$, where $n = n_1 + n_2 + \cdots + n_k$
Example 5.7: Find the number of permutations of the letters in the word ‘statistics’.
Solution:
There are 3 s’s, 3 t’s, 1 a, 2 i’s and 1 c, i.e. n = 10 with n1 = 3, n2 = 3, n3 = 1, n4 = 2 and n5 = 1.
Therefore, $\frac{10!}{3!\,3!\,1!\,2!\,1!} = 50{,}400$.
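Rule (c) can be sketched in Python by counting repeated letters with `collections.Counter` and dividing n! by the product of the factorials of the counts (`math.prod` requires Python 3.8+). The helper name is illustrative.

```python
from collections import Counter
from math import factorial, prod

def distinct_permutations(word):
    """n! / (n1! n2! ... nk!) where n1, ..., nk count the repeated letters."""
    counts = Counter(word)
    return factorial(len(word)) // prod(factorial(c) for c in counts.values())

print(distinct_permutations("statistics"))   # 50400
```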
Example 5.8: A photographer wants to arrange 3 persons in a row for a photograph. How many different photographs are possible?
Solution:
Assume the 3 persons are Aster (A), Lemma (L) and Yared (Y), so n = 3.
Since n! = 3! = 3 × 2 × 1 = 6, there are 6 possible arrangements: ALY, AYL, LAY, LYA, YLA and YAL.
Example 5.9: Suppose we have the letters A, B, C, D and E.
a) How many permutations are there taking all five?
b) How many permutations are there taking two letters at a time?
Solution:
a) Here there are n = 5 distinct objects.
There are 5! = 120 permutations.
b) Here n = 5, r = 2.
There are 5P2 = 5!/(5−2)! = 120/6 = 20 permutations.
Example 5.10: Fifteen Ethiopian athletes were entered in a race. In how many different ways could prizes for first, second and third place be awarded?
Solution
This is the number of permutations of 15 objects taken 3 at a time: 15P3 = 15!/(15−3)! = 15 × 14 × 13 = 2730 ways.
4. Combination: A selection of objects made without regard to the order in which they occur is called a combination. The number of combinations of n different objects taking r of them at a time is
$_nC_r = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}$, for r = 0, 1, 2, …, n.
Example 5.11: Given the letters A, B, C and D, list the permutations and combinations for selecting two letters.
Solution:
Permutations (12): AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, DC
Combinations (6): AB, AC, AD, BC, BD, CD
Note that in a permutation AB is different from BA, but in a combination AB is the same as BA.
Example 5.12: In a club containing 7 members, a committee of 3 people is to be formed. In how many ways can the committee be formed?
Solution: $_7C_3 = \binom{7}{3} = \frac{7!}{3!\,(7-3)!} = 35$
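The listings of Example 5.11 and the counts in Examples 5.9, 5.10 and 5.12 can be checked with Python's standard library: `itertools.permutations` and `itertools.combinations` enumerate the arrangements, while `math.perm` and `math.comb` (Python 3.8+) give nPr and nCr directly.

```python
from itertools import combinations, permutations
from math import comb, perm

letters = "ABCD"
perms = ["".join(p) for p in permutations(letters, 2)]   # order matters
combs = ["".join(c) for c in combinations(letters, 2)]   # order ignored
print(len(perms), perms)
print(len(combs), combs)

# 5P5, 5P2 (Example 5.9), 15P3 (Example 5.10), 7C3 (Example 5.12)
print(perm(5, 5), perm(5, 2), perm(15, 3), comb(7, 3))   # 120 20 2730 35
```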
Example 5.13: How many four-digit numbers can be formed with the 10 digits 0, 1, 2, …, 9 if
a/ repetitions are allowed, b/ repetitions are not allowed, and c/ the last digit must be zero and repetitions are not allowed?
Solution:
a/ The first digit can be any one of 9 digits (since 0 is not allowed). The second, third and fourth digits can each be any one of 10. Then 9 · 10 · 10 · 10 = 9000 numbers can be formed.
b/ The first digit can be any one of 9, and the remaining three can be chosen from the other 9 digits in 9P3 ways. Thus 9 · 9P3 = 9 · 504 = 4536 numbers can be formed.
c/ The last digit is fixed as 0, the first digit can be chosen in 9 ways, and the middle two digits can be chosen from the remaining 8 digits in 8P2 ways. Thus 9 · 8P2 = 9 · 56 = 504 numbers can be formed.
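Because there are only 10,000 four-character digit strings, the three answers of Example 5.13 can be verified by brute-force enumeration, which is a useful cross-check on the counting rules.

```python
from itertools import product

# All 4-digit numbers: 4-character digit strings whose first digit is not 0.
nums = ["".join(d) for d in product("0123456789", repeat=4) if d[0] != "0"]

a = len(nums)                                                 # repetitions allowed
b = sum(1 for s in nums if len(set(s)) == 4)                  # no repeated digits
c = sum(1 for s in nums if len(set(s)) == 4 and s[-1] == "0") # no repeats, last digit 0
print(a, b, c)   # 9000 4536 504
```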
5.3 Probability of an event
Definition: Probability is a numerical measure of the chance or likelihood that a particular event will occur, and it lies in the range from 0 to 1, inclusive. Probability is a building block of inferential statistics.
Definition: Let E be an experiment. Let S be a sample space associated with E. With
each event A in S we associate a real number designated by P (A) and called the
probability of A.
Generally, probability can be divided into two types:
i) Subjective probability: probability determined on the basis of an individual’s own judgment, experience, information, belief, etc.
ii) Objective probability: the probability of an event in a certain experiment based on experimental evidence.
Basic approaches to probability
There are three different conceptual approaches to the study of probability theory. These are:
1. The classical approach. 2. The frequentist approach. 3. The axiomatic approach.
1. Classical approach:
Definition: If there are n equally likely outcomes of an experiment, and event A occurs in k of these n outcomes, then the probability of event A, denoted by P(A), is defined as
$P(A) = \frac{k}{n} = \frac{n(A)}{n(S)}$
Note: The classical approach to measuring probability fails in the following situations:
If the total number of outcomes is infinite, or if it is not possible to enumerate all elements of the sample space.
If the outcomes are not equally likely.
Example 5.14: a/ Using the classical method, compute the probability of having two boys and one girl in a three-child family, assuming boys and girls are equally likely.
b/ Using (a), compute the probability of having three boys in a three-child family.
c/ Using (a), compute the probability of having three girls in a three-child family.
d/ Using (a), compute the probability of having two girls and one boy in a three-child family.
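Solution
a) $P(A) = \frac{n(A)}{n(S)} = \binom{30}{10}\binom{50}{0} \Big/ \binom{80}{10} \approx 0.00001825$
b) Let A be the event that 6 will be non-defective.
The total number of ways in which A can occur is $n(A) = \binom{30}{4}\binom{50}{6}$, so
$P(A) = \frac{n(A)}{n(S)} = \binom{30}{4}\binom{50}{6} \Big/ \binom{80}{10} \approx 0.265$
c) Let A be the event that all will be non-defective.
The total number of ways in which A can occur is $n(A) = \binom{30}{0}\binom{50}{10}$, so
$P(A) = \frac{n(A)}{n(S)} = \binom{30}{0}\binom{50}{10} \Big/ \binom{80}{10} \approx 0.00624$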
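The statement of this example is not reproduced here; reading the binomial coefficients in the solution, we assume a lot of 80 items (30 defective, 50 non-defective) from which 10 are drawn without replacement. Under that assumption, all three answers are values of one counting formula, sketched below with `math.comb`. The helper name and its default arguments are hypothetical.

```python
from math import comb

def prob_k_defective(k, defective=30, good=50, draws=10):
    """P(exactly k defective items among `draws` items drawn without replacement).

    ASSUMPTION: lot of 30 defective and 50 non-defective items, inferred from the solution.
    """
    return comb(defective, k) * comb(good, draws - k) / comb(defective + good, draws)

p_a = prob_k_defective(10)   # all 10 defective
p_b = prob_k_defective(4)    # 4 defective, i.e. 6 non-defective
p_c = prob_k_defective(0)    # all 10 non-defective
print(f"{p_a:.8f} {p_b:.4f} {p_c:.5f}")
```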
2. The Frequentist Approach (Empirical Probability): This approach to probability is based on relative frequencies.
Definition: Suppose an experiment is repeated n times, let A be an event of the experiment and let k be the number of times that event A occurs. Then the probability of event A in the long run (for large n) is given by:
$P(A) = \frac{k}{n}$
In other words, given a frequency distribution, the probability of an event (A) being in a given class is $P(A) = \frac{f}{n}$, the frequency of that class divided by the total frequency of the distribution.
Example 5.16: The National Center for Health Statistics reported that of every 539 deaths in recent years, 24 resulted from automobile accidents, 182 from cancer and 353 from other diseases. What is the probability that a particular death is due to an automobile accident?
Solution
P(automobile) = deaths due to automobile accidents / total deaths = 24/539 ≈ 0.0445
The probability that a particular death is due to an automobile accident is about 0.0445.
3. The axiomatic approach.
Let E be a random experiment and S be a sample space associated with E. Each event A is assigned a real number P(A), called the probability of A, which satisfies the following properties, called the axioms or postulates of probability.
1. 0 ≤ P(A) ≤ 1
2. P(S) = 1, where S is the sure (certain) event.
3. If A1 and A2 are mutually exclusive events, the probability that one or the other occurs equals the sum of the two probabilities, i.e. P(A1 ∪ A2) = P(A1) + P(A2).
Similarly, for pairwise mutually exclusive events, $P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n) = \sum_{i=1}^{n} P(A_i)$.
4. P(A') = 1 − P(A)
5. P(ø) = 0, where ø is the impossible event.
5.4 Some probability rules
Rule 1: Let A be an event and A' be the complement of A with respect to a given sample space of an experiment; then P(A') = 1 − P(A).
Proof: Let S be the sample space. Then S = A ∪ A', and A and A' are mutually exclusive:
A ∩ A' = ø
P(S) = P(A ∪ A') = P(A') + P(A), and P(S) = 1
1 = P(A') + P(A) ⇒ P(A') = 1 − P(A)
Rule 2: Let A and B be events of a sample space S; then
P(A' ∩ B) = P(B) − P(A ∩ B)
Proof: B = S ∩ B = (A ∪ A') ∩ B = (A ∩ B) ∪ (A' ∩ B)
Since A ∩ B and A' ∩ B are mutually exclusive, P(B) = P(A ∩ B) + P(A' ∩ B), so
P(A' ∩ B) = P(B) − P(A ∩ B).
Rule 3: Suppose A and B are two events of a sample space; then
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof:
A ∪ B = A ∪ (A' ∩ B), where A and A' ∩ B are disjoint sets
∴ P(A ∪ B) = P(A) + P(A' ∩ B) . . . (*)
But we have already proved that P(A' ∩ B) = P(B) − P(A ∩ B).
Substituting this into (*):
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Example 5.17: A fair die is thrown twice. Calculate the probability that the sum of the spots on the faces that turn up is divisible by 2 or 3.
Solution
S = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
(2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
(3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
(4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
(5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
(6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}
This sample space has 6 × 6 = 36 elements. Let A be the event that the sum of the spots is divisible by 2 and B be the event that the sum of the spots is divisible by 3; then
A = {(1,1), (1,3), (1,5), (2,2), (2,4), (2,6), (3,1), (3,3), (3,5), (4,2), (4,4), (4,6), (5,1), (5,3), (5,5), (6,2), (6,4), (6,6)}
B = {(1,2), (1,5), (2,1), (2,4), (3,3), (3,6), (4,2), (4,5), (5,1), (5,4), (6,3), (6,6)}
A ∩ B = {(1,5), (2,4), (3,3), (4,2), (5,1), (6,6)}
P(A or B) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 18/36 + 12/36 − 6/36 = 24/36 = 2/3
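Example 5.17 can be verified by enumerating the 36 equally likely outcomes and comparing the direct count of A ∪ B with the addition rule. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))      # 36 ordered pairs (first, second throw)
A = {o for o in outcomes if sum(o) % 2 == 0}         # sum divisible by 2
B = {o for o in outcomes if sum(o) % 3 == 0}         # sum divisible by 3
n = len(outcomes)

p_union_direct = Fraction(len(A | B), n)
p_union_rule = Fraction(len(A), n) + Fraction(len(B), n) - Fraction(len(A & B), n)
print(p_union_direct, p_union_rule)                  # 2/3 2/3
```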
5.5 Conditional Probability and Independence
5.5.1 Conditional Probability
If A and B are events, the conditional probability of A given B is the probability of occurrence of A when the event B has already happened.
It is denoted by P(A/B) and is defined by
P(A/B) = P(A ∩ B)/P(B), if P(B) ≠ 0
The conditional probability of B given A is the probability of occurrence of B when the event A has already happened. It is denoted by P(B/A) and is defined by
P(B/A) = P(A ∩ B)/P(A), if P(A) ≠ 0
P (A ∩ B) = P (A) P (B/A) = P (B) P (A/B).
5.5.2 Multiplication Law of Probability
If A and B are events in a sample space S, then
P (A ∩ B) = P (A) P (B/A), P (A) ≠ 0
P (A ∩ B) = P (B) P (A/B), P (B) ≠ 0
Where P (B/A) represents the conditional probability of B given A and P (A/B)
represents the conditional probability of A given B.
Note: Extending the multiplication law of probability to n events A1, A2, …, An, we have
P(A1 ∩ A2 ∩ … ∩ An) = P(A1) P(A2/A1) P(A3/A1 ∩ A2) … P(An/A1 ∩ A2 ∩ … ∩ An−1)
Example 5.18: A coin is tossed twice. If it is already known that the first toss resulted in a head, what is the probability of getting two heads?
Solution:
S = {HH, HT, TH, TT}, A = the first toss shows a head = {HH, HT}, B = two heads occur = {HH}.
P(B/A) = P(A ∩ B)/P(A), where A ∩ B = {HH}, P(A ∩ B) = 1/4 and P(A) = 1/2.
Therefore, P(B/A) = (1/4)/(1/2) = 1/2.
Example 5.19: Let A and B be events such that P(A ∪ B) = 3/4, P(A ∩ B) = 1/4 and P(A') = 2/3.
Find P(A'/B).
Solution:
P(A') = 2/3 ⇒ P(A) = 1 − P(A') = 1 − 2/3 = 1/3
Now, P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
3/4 = 1/3 + P(B) − 1/4
P(B) = 3/4 − 1/3 + 1/4 = 2/3
Therefore, P(A/B) = P(A ∩ B)/P(B) = (1/4)/(2/3) = 3/8 and P(A'/B) = 1 − P(A/B) = 1 − 3/8 = 5/8.
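The steps of Example 5.19 can be replayed with exact fractions: first solve the addition rule for P(B), then apply the definition of conditional probability and the complement rule.

```python
from fractions import Fraction as F

p_union, p_inter, p_not_a = F(3, 4), F(1, 4), F(2, 3)

p_a = 1 - p_not_a                   # complement rule
p_b = p_union - p_a + p_inter       # from P(A u B) = P(A) + P(B) - P(A n B)
p_a_given_b = p_inter / p_b         # P(A/B) = P(A n B) / P(B)
p_not_a_given_b = 1 - p_a_given_b   # complement rule applied given B
print(p_b, p_a_given_b, p_not_a_given_b)   # 2/3 3/8 5/8
```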
5.5.3 Probability of Independent Event
Two events A and B are said to be independent if the occurrence of A has no bearing on the occurrence of B. That means knowing that A has occurred gives no information about the occurrence of B. Formally, two events A and B are independent if P(A ∩ B) = P(A)P(B).
Suppose A and B are independent events with 0 < P(A) < 1 and 0 < P(B) < 1. Then the following statements are true:
i. A' and B' are independent, ii. A and B' are independent, iii. A' and B are independent,
iv. P(B|A) = P(B), v. P(B|A') = P(B)
Example 5.20: A box contains four black and six white balls. What is the probability of
getting two black balls in drawing one after the other under the following conditions?
a. The first ball drawn is not replaced
b. The first ball drawn is replaced
Solution
Let A= first drawn ball is black
B= second drawn is black
Required: P(A ∩ B)
a. P (A ∩ B) = P (B/A) P(A) = (4/10) (3/9) = 2/15
b. P (A ∩ B) = P (A) P (B) = (4/10) (4/10) = 16/100 = 4/25.
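Both answers of Example 5.20 can be confirmed by listing every equally likely ordered pair of draws: `itertools.permutations` gives the draws without replacement, and `itertools.product` gives the draws with replacement.

```python
from fractions import Fraction
from itertools import permutations, product

box = ["B"] * 4 + ["W"] * 6   # 4 black and 6 white balls, indexed 0..9

def p_two_black(replace):
    """Exact probability that both of two successive draws are black."""
    if replace:
        pairs = list(product(range(10), repeat=2))   # 10 * 10 ordered pairs
    else:
        pairs = list(permutations(range(10), 2))     # 10 * 9 ordered pairs, distinct balls
    hits = sum(1 for i, j in pairs if box[i] == box[j] == "B")
    return Fraction(hits, len(pairs))

print(p_two_black(False), p_two_black(True))   # 2/15 4/25
```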
5.6 Total probability and Bayes’ Theorem
Total probability: If events B1, B2, …, Bk constitute a partition of the sample space S and P(Bi) ≠ 0 for i = 1, 2, …, k, then for any event A in S, $P(A) = \sum_{i=1}^{k} P(B_i)P(A/B_i)$.
Solution: P(A1) = 0.25, P(A2) = 0.35, P(A3) = 0.40, P(D/A1) = 0.05, P(D/A2) = 0.04, P(D/A3) = 0.02
P(D) = Σ P(Ai)P(D/Ai) = P(A1)P(D/A1) + P(A2)P(D/A2) + P(A3)P(D/A3) = (0.25)(0.05) + (0.35)(0.04) + (0.40)(0.02) = 0.0125 + 0.014 + 0.008 = 0.0345
Bayes’ Theorem: If B1, B2, …, Bk are events that form an exhaustive partition of the sample space S and A is any event in S, then the conditional probability of Bi given that A has already occurred is
$P(B_i/A) = \frac{P(B_i)P(A/B_i)}{\sum_{j=1}^{k} P(B_j)P(A/B_j)}$
Example 5.22: Based on the above example, if an item is found to be defective, what is the probability that it was manufactured by machine A1?
Solution:
$P(A_1/D) = \frac{P(A_1)P(D/A_1)}{\sum_{i=1}^{3} P(A_i)P(D/A_i)} = \frac{(0.25)(0.05)}{0.0345} = 0.3623$
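A minimal sketch of the total probability rule and Bayes’ theorem for the machine example. The statement of that example is not reproduced in this section, so the shares of output P(Ai) and the defect rates P(D/Ai) below are simply the numbers used in the solution.

```python
# ASSUMPTION: inputs copied from the solution above (example statement not shown here).
prior = {"A1": 0.25, "A2": 0.35, "A3": 0.40}         # P(Ai)
p_def_given = {"A1": 0.05, "A2": 0.04, "A3": 0.02}   # P(D/Ai)

# Total probability: P(D) = sum over i of P(Ai) P(D/Ai)
p_d = sum(prior[m] * p_def_given[m] for m in prior)

# Bayes' theorem: P(Ai/D) = P(Ai) P(D/Ai) / P(D)
posterior = {m: prior[m] * p_def_given[m] / p_d for m in prior}

print(round(p_d, 4))               # 0.0345
print(round(posterior["A1"], 4))   # 0.3623
```

The posterior probabilities for the three machines add up to 1, as they must for a partition.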
CHAPTER SIX
PROBABILITY DISTRIBUTION
If the random variable X can assume only a particular finite or countably infinite set of
values, it is said to be a discrete random variable.
Example 6.1: Consider the experiment of "flipping a fair coin 3 times". List the elements of the sample space, which are assumed to be equally likely (as this is what is meant by a fair or balanced coin), and the corresponding values x of the random variable X, the number of heads observed.
Solution: If H stands for heads and T for tails, then the sample space corresponding to this experiment is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
Since X = the number of heads observed, the results are shown in the following table:
Outcome   HHH   HHT   HTH   HTT   THH   THT   TTH   TTT
X         3     2     2     1     2     1     1     0
Thus, we can write X(HHH) = 3, X(HHT) = 2, …, X(TTT) = 0, and P(X = 3) = 1/8 = the probability that the random variable X is 3, P(X = 2) = 3/8, P(X = 1) = 3/8 and P(X = 0) = 1/8.
Note that the possible values of X are xi = 0, 1, 2, 3.
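The probability distribution of Example 6.1 can be built mechanically: enumerate the 8 equally likely outcomes, map each one to its value of X, and count.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = ["".join(t) for t in product("HT", repeat=3)]   # HHH, HHT, ..., TTT
x_of = {o: o.count("H") for o in outcomes}                 # X = number of heads
counts = Counter(x_of.values())
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
print(pmf)   # P(X = 0), P(X = 1), P(X = 2), P(X = 3)
```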
A random variable X is said to be continuous if it can take all possible values (integral as
well as fractional) between certain limits. Continuous random variables occur when we
deal with quantities that are measured on a continuous scale.
Example 6.2: -The height of an individual, -The distance between Debre Markos and
Addis Ababa
The probability distribution of a continuous random variable is described by its probability density function (pdf), usually denoted by f(x).
The function f(x) is called the probability density function of X, and it satisfies the following conditions:
i) f(x) ≥ 0 for all x, −∞ < x < ∞
ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
iii) $P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$. The integration from a to b in the case of a continuous variable is analogous to summation in the discrete case.
6.3.1 Expectation
The averaging process, when applied to a random variable, is called expectation. It is denoted by E(X) or μ and is read as the expected value of X or the mean value of X.
Case 1: For a discrete random variable
Suppose X is a discrete random variable which takes on values in a finite set x1, x2, …, xn with probabilities P(xi) = P[X = xi], i = 1, 2, …, n. Then the expected value E(X) of the discrete random variable is given by:
$E(X) = \mu = \sum_{i=1}^{n} x_i P(x_i)$
For a continuous random variable with pdf f(x), the variance is $\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$.
Properties of Variances
For any random variable X and constant a, it can be shown that
- Var(aX) = a2Var(X)
- Var(X + a) = Var(X) +0 = Var(X)
If X and Y are independent random variables, then
Var(X + Y) = Var(X) + Var(Y)
More generally if X1, X2 ……, Xk are independent random variables, then
Var (X1 +X2 + …..+ Xk) = Var (X1) +Var (X2) +…. + var (Xk)
i.e., $Var\left(\sum_{i=1}^{k} X_i\right) = \sum_{i=1}^{k} Var(X_i)$
Example 6.5: Two fair coins are tossed. Determine Var (X) where X is the number of
heads that appear.
a) Use the definition of the variance.
b) Use the fact that the variance of the sum of independent variables is equal to the sum
of the variance.
Solution:
a) Let X be the number of heads, with possible values 0, 1 and 2. The sample space consists of {HH, TH, HT, TT}.
P(X = 0) = 1/4, P(X = 1) = 1/2, P(X = 2) = 1/4
E(X) = 0·P(X = 0) + 1·P(X = 1) + 2·P(X = 2) = 0(1/4) + 1(1/2) + 2(1/4) = 1
E(X²) = 0²·P(X = 0) + 1²·P(X = 1) + 2²·P(X = 2) = 0(1/4) + 1(1/2) + 4(1/4) = 3/2
This implies that Var(X) = E(X²) − μ² = 3/2 − 1 = 1/2.
b) Let X be the number of heads on the first coin, with possible values 0 and 1, and Y the number of heads on the second coin, with possible values 0 and 1.
P(X = 0) = 1/2, P(X = 1) = 1/2 and P(Y = 0) = 1/2, P(Y = 1) = 1/2
E(X) = 0·P(X = 0) + 1·P(X = 1) = 0(1/2) + 1(1/2) = 1/2;   E(Y) = 0·P(Y = 0) + 1·P(Y = 1) = 0(1/2) + 1(1/2) = 1/2
E(X²) = 0²·P(X = 0) + 1²·P(X = 1) = 1/2;   E(Y²) = 0²·P(Y = 0) + 1²·P(Y = 1) = 1/2
Var(X) = E(X²) − μ² = 1/2 − (1/2)² = 1/4;   Var(Y) = E(Y²) − μ² = 1/2 − (1/2)² = 1/4
X and Y are independent (i.e., the outcome of one coin does not influence the outcome of the other), so
Var(X + Y) = Var(X) + Var(Y) = 1/4 + 1/4 = 1/2.
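Both parts of Example 6.5 can be reproduced with exact fractions. The sketch builds the distribution of the total number of heads from two independent fair coins, then compares its variance with the sum of the single-coin variances. The helper `mean_var` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

def mean_var(pmf):
    """Return (E(X), Var(X)) with Var(X) = E(X^2) - [E(X)]^2."""
    mu = sum(x * p for x, p in pmf.items())
    return mu, sum(x * x * p for x, p in pmf.items()) - mu ** 2

half = Fraction(1, 2)
coin = {0: half, 1: half}        # number of heads on one fair coin

two = {}                         # distribution of X + Y for two independent coins
for a, b in product(coin, coin):
    two[a + b] = two.get(a + b, 0) + coin[a] * coin[b]

print(mean_var(two))             # part (a): mean 1, variance 1/2
print(2 * mean_var(coin)[1])     # part (b): Var(X) + Var(Y) = 1/4 + 1/4
```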
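Example 6.6: Compute the variance of $f(x) = \frac{x^2}{9}$ for 0 < x < 3.
Solution:
$E(X) = \int_0^3 x \cdot \frac{x^2}{9}\,dx = \frac{1}{9}\left[\frac{x^4}{4}\right]_0^3 = \frac{81}{36} = \frac{9}{4}$
$E(X^2) = \int_0^3 x^2 \cdot \frac{x^2}{9}\,dx = \frac{1}{9}\left[\frac{x^5}{5}\right]_0^3 = \frac{243}{45} = \frac{27}{5}$
Therefore, $V(X) = E(X^2) - [E(X)]^2 = \frac{27}{5} - \left(\frac{9}{4}\right)^2 = \frac{27}{80} \approx 0.34$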
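Because the integrand in Example 6.6 is a polynomial, the moments can be computed exactly: $\int_0^3 x^k \cdot \frac{x^2}{9}\,dx = \frac{3^{k+3}}{9(k+3)}$. The sketch below uses that closed form with `fractions.Fraction`; the function name `moment` is illustrative.

```python
from fractions import Fraction

def moment(k, upper=3):
    """E(X^k) for f(x) = x^2 / 9 on (0, 3): upper^(k+3) / (9 (k+3))."""
    return Fraction(upper ** (k + 3), 9 * (k + 3))

total = moment(0)       # integral of f over (0, 3); must equal 1 for a valid pdf
ex = moment(1)          # E(X) = 9/4
ex2 = moment(2)         # E(X^2) = 27/5
var = ex2 - ex ** 2     # V(X) = 27/80
print(total, ex, ex2, var, float(var))
```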
Generally, the sample space of a Bernoulli trial is {S, F}, where S = success and F = failure.
If X is the number of successes in n independent Bernoulli trials, each with success probability p, then for r = 0, 1, 2, …, n
$P(X = r) = \frac{n!}{r!\,(n-r)!}\, p^r (1-p)^{n-r}$
$P(X = r) = \binom{n}{r} p^r q^{n-r}$, where q = 1 − p
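A direct transcription of the binomial formula, using `math.comb` (Python 3.8+) for $\binom{n}{r}$. With n = 3 and p = 0.5 it reproduces the distribution of the number of heads in Example 6.1.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) = C(n, r) p^r q^(n - r), with q = 1 - p."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

pmf = [binomial_pmf(r, 3, 0.5) for r in range(4)]
print(pmf)   # [0.125, 0.375, 0.375, 0.125]
```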
Let X be the number of occurrences in a Poisson process and λ be the average number of occurrences of an event in a unit interval. The probability function of the Poisson distribution is
$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$, x = 0, 1, 2, …
Remarks
The Poisson distribution has only one parameter, λ.
If X has a Poisson distribution with parameter λ, then E(X) = λ and Var(X) = λ.
Example 6.8: In a small city, 10 accidents took place in a period of 50 days. Find the probability that there will be a) two accidents in a day and b) three or more accidents in a day.
Solution:
There are 10/50 = 0.2 accidents per day.
Let X be the random variable, the number of accidents per day:
X ~ Poisson(λ = 0.2), X = 0, 1, 2, …
a) $P(X = 2) = \frac{(0.2)^2 e^{-0.2}}{2!} = 0.0164$
b) $P(X \ge 3) = P(X = 3) + P(X = 4) + P(X = 5) + \cdots = 1 - [P(X = 0) + P(X = 1) + P(X = 2)]$, since $\sum_{i=0}^{\infty} P(x_i) = 1$
$= 1 - e^{-0.2}(1 + 0.2 + 0.02) = 1 - 0.9989 = 0.0011$
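Example 6.8 as a short Python sketch: the Poisson probability function is coded directly from the formula, and part (b) uses the complement of the first three terms, as in the solution.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lambda) lambda^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

lam = 10 / 50                                                  # 0.2 accidents per day
p_two = poisson_pmf(2, lam)                                    # part (a)
p_three_or_more = 1 - sum(poisson_pmf(x, lam) for x in range(3))  # part (b), complement rule
print(round(p_two, 4), round(p_three_or_more, 4))              # 0.0164 0.0011
```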
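A random variable X has a normal distribution with parameters μ and σ², written X ~ N(μ, σ²), if its pdf is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$
for −∞ < x < ∞, −∞ < μ < ∞ and σ > 0.
The graph of the normal distribution is known as the normal curve, which is bell-shaped.
[Figure: Normal probability curve]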
Since this is the property of the median, it follows that, for the normal distribution,
Mean = Median=Mode.
3. The height of the normal curve is at its maximum when X = μ (the mean), which means, again, that Mean = Median = Mode. The normal curve is asymptotic to the X-axis.
4. The probability that a random variable will have a value between any two points is equal to the area under the curve between those points.
Using the properties of expectations, it is now trivial to show that E(Z) = 0 and V(Z) = 1.
The pdf of Z is, thus, given by $f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$, −∞ < z < ∞.
The entries in Table A of the Appendix are the values of $P(0 \le Z \le z) = \int_0^z f(z)\,dz$.
That is, the table gives the probability that a random variable Z having the standard normal distribution will take on a value in the interval from 0 to z, for z = 0.00, 0.01, 0.02, …, 3.98 and 3.99. Because the normal curve is symmetric about its mean, it is unnecessary to extend the table to negative values of Z.
[Figure: Standard normal curve; the table value is the area between 0 and z]
That is, the marked region is $P(0 \le Z \le z) = \int_0^z \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\,dz$.
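Example 6.9: Find the probabilities that a random variable having the standard normal distribution will take on a value
a) less than 1.72; b) less than −0.88.
Solution: By using the normal table,
a) P(Z < 1.72) = 0.5 + P(0 ≤ Z ≤ 1.72) = 0.5 + 0.4573 = 0.9573
b) P(Z < −0.88) = 0.5 − P(0 ≤ Z ≤ 0.88) = 0.5 − 0.3106 = 0.1894
For a normal random variable X with mean μ and standard deviation σ, probabilities are found by standardizing:
$P(a < X < b) = P\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right) = P(z_1 < Z < z_2)$, say.
Now, we need only to get the readings from the Z-table corresponding to z1 and z2 to get the required probabilities, as we have done in the preceding example. Similarly,
$P(X < b) = P\left(Z < \frac{b-\mu}{\sigma}\right) = P(Z < z_2)$ and $P(X > a) = P\left(Z > \frac{a-\mu}{\sigma}\right) = P(Z > z_1)$.
We have seen that a Z-value measures the distance between a particular value of X and the mean, in units of the standard deviation.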
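Example 6.10: If X ~ N(μ, σ²), find the probabilities
a) P(μ − σ < X < μ + σ); b) P(μ − 2σ < X < μ + 2σ); c) P(μ − 3σ < X < μ + 3σ).
Solution:
a) $P(\mu - \sigma < X < \mu + \sigma) = P(-1 < Z < 1) = 2(0.3413) = 0.6826$, or 68.26%.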
This notation is useful in statistical inference; note that finding $Z_\alpha$ is like reading anti-logarithms.
a) In Table A, look for the value closest to 0.5 − 0.01 = 0.4900, which is 0.4901; the Z value for this is Z = 2.33. Thus, $Z_{0.01} = 2.33$.
b) Again, $Z_{0.05}$ is obtained from 0.5 − 0.05 = 0.4500, which lies exactly between 0.4495 and 0.4505, corresponding to Z = 1.64 and Z = 1.65. Hence, using interpolation, $Z_{0.05} = 1.645$.
Example 6.12: Suppose that X ~ N(165, 9), where X is the breaking strength of a cotton fabric. A sample is defective if X < 162. Find the probability that a randomly chosen fabric will be defective.
Solution: Given that μ = 165 and σ² = 9, so that σ = 3,
$P(X < 162) = P\left(Z < \frac{162 - 165}{3}\right) = P(Z < -1) = 0.5 - 0.3413 = 0.1587$.
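Table lookups like those in Examples 6.9, 6.10 and 6.12 can be cross-checked numerically. The sketch assumes the standard identity Φ(z) = ½[1 + erf(z/√2)] and uses `math.erf`; results may differ from four-figure table values in the last digit (for example, the exact value of P(−1 < Z < 1) rounds to 0.6827, while the table gives 0.6826).

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Phi(z) = P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), by standardizing."""
    return std_normal_cdf((x - mu) / sigma)

print(round(std_normal_cdf(1.72), 4))      # Example 6.9 a)
print(round(std_normal_cdf(-0.88), 4))     # Example 6.9 b)
print(round(normal_cdf(162, 165, 3), 4))   # Example 6.12
# Example 6.10: P(mu - k*sigma < X < mu + k*sigma) for k = 1, 2, 3
within = [std_normal_cdf(k) - std_normal_cdf(-k) for k in (1, 2, 3)]
print([round(w, 4) for w in within])
```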
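The pdf of the chi-square distribution with n degrees of freedom is
$f(\chi^2) = \frac{1}{2^{n/2}\,\Gamma(n/2)} (\chi^2)^{(n/2)-1} e^{-\chi^2/2}$, 0 < χ² < ∞, where n is the number of degrees of freedom.
Since the chi-square distribution arises in many important applications, its values have been extensively tabulated. Table C at the end of this module contains values of $\chi^2_{\alpha,n}$ for α = 0.05, 0.025, 0.01, 0.005 and n = 1, 2, 3, …, 30, where $\chi^2_{\alpha,n}$ is such that the area to its right under the chi-square curve with n degrees of freedom is equal to α. That is,
[Figure: Chi-square curve with right-tail area α beyond $\chi^2_{\alpha,n}$]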
Properties of Chi-square Distribution
1. The exact shape of the distribution depends upon the number of degrees of freedom n.
In general, when n is small, the shape of the curve is skewed to the right and as n gets
larger, the distribution becomes more and more symmetrical.
2. The mean and variance of the χ² distribution are n and 2n, respectively.
3. As n → ∞, the χ² distribution approaches a normal distribution.
4. The sum of independent χ² variates is also a χ² variate, with degrees of freedom equal to the sum of their degrees of freedom.
6.5.3 The t-distribution: Let X1, X2, …, Xn be a random sample drawn from a normal distribution having mean μ and standard deviation σ (unknown, but estimated by S, the sample standard deviation).
The statistic $t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a t-distribution with (n − 1) degrees of freedom, where $\bar{X}$ is the sample mean and S is the sample standard deviation. In view of its importance, the t distribution has been tabulated extensively. Table B at the end of this module contains values of $t_{\alpha,n-1}$, where $t_{\alpha,n-1}$ is such that the area to its right under the curve of the t distribution with (n − 1) degrees of freedom is equal to α.
Notation: $t_{\alpha,(n-1)}$ stands for the value of t with (n − 1) degrees of freedom to the right of which the area is α when reading the tabulated values.
[Figure: Student’s t distribution, symmetric about 0]
Note: 1. The table does not contain values of $t_{\alpha,n-1}$ for α > 0.50, since the curve is symmetric about 0 and $t_{1-\alpha,n-1} = -t_{\alpha,n-1}$.
2. When (n − 1) = 30 or more, probabilities related to the t distribution are usually approximated using the normal distribution.
Example 6.13: For a t-distribution with n=20, find t values leaving an area of
alternatives: O , or O , or O .
Note: The assumptions underlying student’s t-distribution for such tests are:
$t = \frac{16.4 - 12.0}{2.1/\sqrt{16}} = 8.38$; and the table value for n − 1 = 15 is $t_{0.05,15} = 1.753$.
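The arithmetic of the t statistic above can be sketched as follows. The accompanying problem statement is not reproduced here, so we assume the surviving numbers mean: sample mean 16.4, hypothesized mean 12.0, sample standard deviation 2.1 and n = 16; the critical value 1.753 is copied from the text rather than computed.

```python
from math import sqrt

def t_statistic(x_bar, mu0, s, n):
    """t = (x_bar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (x_bar - mu0) / (s / sqrt(n))

# ASSUMPTION: interpretation of the numbers in the worked computation above.
t = t_statistic(16.4, 12.0, 2.1, 16)
t_table = 1.753                   # t_{0.05,15} as quoted from Table B
print(round(t, 2), t > t_table)   # 8.38 True
```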
CHAPTER SEVEN: SAMPLING AND SAMPLING DISTRIBUTION OF THE
SAMPLE MEAN
A. Random Sampling or probability sampling.
Probability sampling is a method of sampling in which every element in the population has a pre-assigned probability of being included in the sample.
In this sub-section, four different techniques of taking a random sample are discussed.
a/ Simple random sampling, b/ Stratified random sampling, c/ Cluster sampling, d/
Systematic sampling
a/ Simple random sampling: a method in which all possible samples of size n from a population of size N units have the same probability of selection. There are $_NC_n$ distinct possible samples in the case of sampling without replacement, and the chance of selecting each one of them is $1/{}_NC_n$. There are $N^n$ possible samples in the case of sampling with replacement, and the chance of selecting each one of them is $1/N^n$. Conceptually, simple random sampling is the simplest and most common of the probability sampling techniques.
Lottery method and computer generated random numbers are used to select a random
sample in simple random sampling:
i) Lottery method: This is a very common method of taking a random sample. Under this method, we label each member of the population with an identifiable ticket or piece of paper.
Tickets must be of identical size, color and shape. They are placed in a container and well mixed before each draw, and draws are continued until a sample of the required size is selected. This ensures that the selection of items depends entirely on chance.
Example 7.2: If we want to take a sample of 25 persons out of a population of 150, the procedure is to write the names of all 150 persons on separate slips of paper, fold these slips, mix them thoroughly and then make a blindfold selection of 25 slips without replacement.
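A computer analogue of the lottery method in Example 7.2: Python's `random.Random.sample` draws distinct items without replacement, giving every subset of 25 the same chance. The person labels are hypothetical placeholders for the 150 names, and the seed is fixed only so that the draw can be reproduced.

```python
import random

# Hypothetical labels standing in for the 150 names written on slips of paper.
population = [f"person_{i}" for i in range(1, 151)]

rng = random.Random(7)                 # fixed seed only for reproducibility
sample = rng.sample(population, 25)    # 25 distinct slips, drawn without replacement
print(sample[:5])
```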
A table of random numbers is an alternative method of selecting a simple random sample. It is constructed from the digits 0, 1, 2, …, 9, and several such tables are available in standard books of statistics.
Row   Column:  1       2       3       4       5       6       7       8
13             01881   99056   46747   08846   01331   88163   74462   14551
Example 7.3: Suppose that N = 40 and we want to select n = 10 units without replacement, starting with the 3rd row and 2nd column of the above random number table and reading vertically.
Solution: Starting with the 3rd row and 2nd column and reading vertically, we get:
sample is drawn from each stratum independently, the sample size within the ith stratum being $n_i$ (i = 1, 2, …, k) such that $n_1 + n_2 + \cdots + n_k = n$.
Remarks: In stratified random sampling, the following two points are equally important to ensure accuracy.
1. From a finite population of size N, randomly draw all possible samples of size n. There are $N^n$ possible samples if sampling is with replacement and $_NC_n$ possible samples if sampling is without replacement.
2. Calculate the mean of each sample.
3. Summarize the means obtained in step 2 in a frequency distribution.
The sample mean is a random variable, and its probability distribution is:
xi 1 1.5 2 3.5 4 6 Total
2 14 / 3
In which if sampling with replacement, V X x2 n
=
2
= 14/6 = 2.33.
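Steps 1 to 3 can be carried out by brute-force enumeration. The sketch below assumes the population is {1, 2, 6} with n = 2 and sampling with replacement; this population is our assumption, not stated in the text, but it reproduces the listed sample-mean values 1, 1.5, 2, 3.5, 4, 6 and $\sigma^2 = 14/3$. Fractions keep the arithmetic exact.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 6]   # assumed population (hypothetical), N = 3
n = 2

# Step 1: all N**n ordered samples (sampling with replacement).
samples = list(product(population, repeat=n))
# Step 2: mean of each sample.
means = [Fraction(sum(s), n) for s in samples]
# Step 3: probability (frequency) distribution of the sample means.
dist = {m: Fraction(c, len(means))
        for m, c in sorted(Counter(means).items())}

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)
e_mean = sum(m * p for m, p in dist.items())
v_mean = sum((m - e_mean) ** 2 * p for m, p in dist.items())

for m, p in dist.items():
    print(float(m), p)
print("E(mean) =", e_mean, " mu =", mu)
print("V(mean) =", v_mean, " sigma^2/n =", sigma2 / n)
```

Under this assumed population, $E(\bar X) = 3 = \mu$ and $V(\bar X) = 7/3 = 14/6$, matching the with-replacement formula.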
In each case the expected value of the sample mean equals the population mean. This
explains why the sample mean is a good estimate of the population mean: if we use the
sample mean as an estimate of the population mean, we will sometimes overestimate it
and sometimes underestimate it, but on average we will be accurate.
The example above illustrates an important result:
Remark:
1. Mean of sample means: $E(\bar X) = \mu_{\bar X} = \sum \bar x_i \, P(\bar X = \bar x_i) = \mu$ (the population mean).
2. Variance of sample means: $V(\bar X) = \sigma_{\bar X}^2 = \dfrac{\sigma^2}{n}$ (if sampling is with replacement).
3. Variance of sample means: $V(\bar X) = \dfrac{\sigma^2}{n} \cdot \dfrac{N-n}{N-1}$ (if sampling is without replacement).
The quantity $\dfrac{N-n}{N-1}$ is the finite population correction (fpc); if $n/N < 0.05$, the fpc is
ignored.
Note: the square root of the variance of sample means, $\sigma_{\bar X}$, is known as the standard error.
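The without-replacement formula (remark 3) can be checked by enumeration. As in any such sketch, the population {1, 2, 6} with n = 2 is a hypothetical choice; the check compares the variance computed over all $\binom{N}{n}$ samples with $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, sqrt

population = [1, 2, 6]   # assumed population (hypothetical), N = 3
N, n = len(population), 2

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

# Without replacement: all C(N, n) equally likely unordered samples.
means = [Fraction(sum(s), n) for s in combinations(population, n)]
assert len(means) == comb(N, n)
e_mean = sum(means) / len(means)
v_mean = sum((m - e_mean) ** 2 for m in means) / len(means)

fpc = Fraction(N - n, N - 1)            # finite population correction
print("E(mean) =", e_mean)              # equals mu
print("V(mean) =", v_mean, "formula:", sigma2 / n * fpc)
print("standard error =", sqrt(float(v_mean)))
```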
The distribution of sample means depends on the distribution of the population, the sample
size, and whether the population variance is known or unknown. A sample may come from a
normally or a non-normally distributed population, the population variance may be known
or unknown, and the sample size may be large or small.
Case-I: Sampling from a normally distributed population with known variance:
When sampling is from a normally distributed population with known variance, the
distribution of sample means, $\bar X$, is normal whatever the sample size.
Example 7.5: The speed of all cars travelling on a street is normally distributed with
mean 68 km/h and variance 9 (km/h)². Find the probability that the mean speed of a random
sample of 16 cars travelling on the street is more than 70 km/h.
Solution:
Let X be the speed of a car, with mean $\mu = 68$ and variance $\sigma^2 = 9$.
A sample of size 16 is taken, so the sample mean $\bar X$ is a random variable. Since the
population is normally distributed,
$\bar X \sim N\left(\mu, \dfrac{\sigma^2}{n}\right) = N\left(68, \dfrac{9}{16}\right) = N(68, 0.56)$.
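The text stops before the final probability. Under the stated model $\bar X \sim N(68, 9/16)$ it can be evaluated as sketched below, using Python's standard-library `statistics.NormalDist` (our choice of tool, not the handout's). By hand this corresponds to $Z = (70 - 68)/0.75 \approx 2.67$ and the Table A entry for 2.67, giving roughly $0.5 - 0.4962 = 0.0038$.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma2, n = 68, 9, 16
se = sqrt(sigma2 / n)                  # standard error = 3/4 = 0.75
xbar_dist = NormalDist(mu, se)         # sampling distribution of the mean
p = 1 - xbar_dist.cdf(70)              # P(mean speed > 70 km/h)
z = (70 - mu) / se
print(round(z, 2), round(p, 4))
```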
Case-II: Sampling from a non-normal population when the sample size is large:
If sampling is from a non-normal population and the sample size is large, the
distribution of $\bar X$ is given by the Central Limit Theorem.
The Central Limit Theorem
If $X_1, X_2, \ldots, X_n$ is a random sample from a population with mean $\mu$ and variance $\sigma^2$,
then as $n \to \infty$ the distribution of the sample mean $\bar X$ approaches a normal
distribution with mean $\mu$ and variance $\sigma^2/n$. In short, for large $n$, $\bar X \approx N\left(\mu, \dfrac{\sigma^2}{n}\right)$.
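A small simulation illustrates the theorem. This sketch assumes an exponential population with rate 1 (mean 1, variance 1), which is clearly non-normal; the sample size n = 64, the 5000 repetitions and the seed are arbitrary choices. The mean and variance of the simulated sample means should come out close to $\mu = 1$ and $\sigma^2/n = 1/64 \approx 0.0156$.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(2024)
n, reps = 64, 5000
# Clearly non-normal population: exponential with rate 1 (mean 1, variance 1).
means = [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print("mean of sample means:", round(fmean(means), 3))          # near mu = 1
print("variance of sample means:", round(pvariance(means), 4))  # near 1/64
```

A histogram of `means` would also look roughly bell-shaped, even though the parent population is strongly skewed.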
Example 7.6: The mean weight of 500 male students at a certain university is 151
pounds (lb) and the standard deviation is 15 lb. Assume that the weights are normally
distributed. If a sample of 64 students is taken, what is the probability that the mean
weight of the sample is more than 154.75 lb?
Solution
As we have taken a large sample (n = 64), we can use the Central Limit Theorem. This says
that the mean weight of the sample can be approximated by a normal random variable
with mean 151 and variance $\sigma^2/n = 225/64 \approx 3.52$ (standard error $15/8 = 1.875$).
If we let $\bar X$ be the mean weight of the sampled students,
it is required to find
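The required probability $P(\bar X > 154.75)$ is not worked out in the text. A minimal sketch, using the standard error $\sigma/\sqrt{n} = 15/8$ and Python's `statistics.NormalDist` (our choice), gives $Z = 3.75/1.875 = 2$ and an upper-tail area of about 0.0228, which agrees with $0.5 - 0.4772$ from Table A.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 151, 15, 64
se = sigma / sqrt(n)                 # 15/8 = 1.875
z = (154.75 - mu) / se               # 3.75 / 1.875 = 2.0
p = 1 - NormalDist().cdf(z)          # upper-tail area beyond z
print(z, round(p, 4))
```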
Let X be the amount of an individual's expenditure during the day, with mean 7.50 birr and
standard deviation 3.4 birr, and let $\bar X$ be the average expenditure of a sample of
n = 150 individuals. Then $\bar X \sim N(7.50,\ 3.4^2/150) = N(7.50,\ 0.077)$ approximately, and it
is required to find $P(\bar X > 8.00)$:
$P(\bar X > 8.00) = P\left(Z > \dfrac{8.00 - 7.50}{3.4/\sqrt{150}}\right) = P(Z > 1.80) = 0.5 - P(0 < Z < 1.80)$
$= 0.5 - 0.4641 = 0.0359$.
This means there is a probability of only 0.0359 that the average expenditure of the 150
individuals exceeds 8.00 birr.
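The calculation above can be checked numerically. This sketch uses `statistics.NormalDist` in place of Table A; it gives about 0.0358, the small difference from 0.0359 coming from rounding Z to 1.80 before the table lookup.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 7.50, 3.4, 150
se = sigma / sqrt(n)                  # about 0.2776
z = (8.00 - mu) / se                  # about 1.80
p = 1 - NormalDist().cdf(z)           # P(sample mean > 8.00)
print(round(z, 2), round(p, 4))
```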
Case-III: Sampling from a normally distributed population with unknown
population variance:
b) If the sample size is small (n < 30), $t = \dfrac{\bar X - \mu}{S/\sqrt{n}} \sim t(n-1)$, i.e. t has a
t-distribution with (n − 1) degrees of freedom.
CHAPTER EIGHT: STATISTICAL INFERENCES
8.1.2 Interval estimation: Instead of a single value, we give an interval, a range of values
around an estimate, within which the parameter may lie. This procedure is known as interval
estimation; it results in an interval of plausible values for a parameter. Interval estimates
indicate the precision or accuracy of an estimate and are therefore preferable to point
estimates. Interval estimation identifies the lower and upper limits for a parameter. A
confidence interval for the parameter is:
Estimate ± critical value × standard error of the estimator
Example 8.1: A confidence interval for the population mean is:
$\bar X$ ± critical value × standard error of $\bar X$
Confidence interval estimation for population means
The confidence level is the probability that an interval constructed in this way contains the
true value of the parameter. There are different cases to be considered when constructing
confidence intervals.
Here $\alpha$ is the risk probability and $1 - \alpha$ is the confidence level. $\sigma/\sqrt{n}$ is the standard
error of the statistic $\bar X$: the standard error is the square root of the variance, where
$Var(\bar X) = \sigma^2/n$.
Using the standardized form of the sampling distribution of the sample mean in the
probability statement, we get the limits of the confidence interval as follows:
$P\left(-Z_{\alpha/2} < \dfrac{\bar X - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$
$P\left(-Z_{\alpha/2}\,\sigma/\sqrt{n} < \bar X - \mu < Z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$
$P\left(\bar X - Z_{\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar X + Z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$
Here are the Z values corresponding to the most commonly used confidence levels.
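The critical values $Z_{\alpha/2}$ for the usual confidence levels can be obtained from the inverse of the standard normal distribution function. A minimal sketch with Python's `statistics.NormalDist.inv_cdf` (our choice of tool) reproduces the familiar 1.645, 1.960 and 2.576.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1
critical = {}
for level in (0.90, 0.95, 0.99):
    alpha = 1 - level
    # Z_{alpha/2}: the point with area alpha/2 to its right
    critical[level] = std_normal.inv_cdf(1 - alpha / 2)
    print(f"{level:.0%} confidence: Z = {critical[level]:.3f}")
```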
Example 8.2: The weights of full boxes of a certain kind of cereal are normally
distributed with a standard deviation of 0.27 ounce. If a sample of 15 randomly selected
boxes produced a mean weight of 9.87 ounce, find:
a) The 95% confidence interval for the true mean weight of boxes of this
cereal,
b) The 99% confidence interval for the true mean weight of boxes of this
cereal,
Solution:
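a) Given $1 - \alpha = 0.95$, so that $\alpha/2 = 0.025$; $n = 15$, $\sigma = 0.27$ ounce, $\bar x = 9.87$ ounce.
For the 95% C.I., $P(-Z_{0.025} < Z < Z_{0.025}) = 0.95$ and $Z_{\alpha/2} = Z_{0.025} = 1.96$,
where $Z = \dfrac{\bar X - \mu}{\sigma/\sqrt{n}}$.
Substituting these values in $\bar x - Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar x + Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$, the resulting
confidence interval is (9.73, 10.01).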
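Both parts of Example 8.2 can be computed with a small helper; part (b) is not worked in the text, so its result here is our own calculation. The function name `z_interval` is hypothetical, and `statistics.NormalDist` supplies $Z_{\alpha/2}$ instead of a table. The sketch gives (9.73, 10.01) at 95% and (9.69, 10.05) at 99%.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, level):
    """(1 - alpha)100% CI for mu with sigma known:
    xbar +/- Z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

for level in (0.95, 0.99):
    lo, hi = z_interval(9.87, 0.27, 15, level)
    print(f"{level:.0%} CI: ({lo:.2f}, {hi:.2f})")
```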
Case-II: When sampling is from a non-normal population and the sample size is large, the
distribution of $\bar X$ is given by the Central Limit Theorem (with known or unknown
population variance).
Recall the Central Limit Theorem, which applies to the sampling distribution of the mean
of a sample. Consider samples of size n drawn from a population whose mean is μ and
standard deviation is σ. The population can have any frequency distribution. The
sampling distribution of $\bar X$ will have mean μ and standard deviation $\sigma/\sqrt{n}$. The sampling
… when σ is unknown.
A (1 − α)100% confidence interval for the population mean (μ) is
$\left(\bar X - Z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar X + Z_{\alpha/2}\,\sigma/\sqrt{n}\right)$ if σ is known and
$\left(\bar X - Z_{\alpha/2}\,S/\sqrt{n},\ \bar X + Z_{\alpha/2}\,S/\sqrt{n}\right)$ if σ is unknown.
Example 8.3: An economist wants to estimate the average amount in checking accounts
at banks in a given region. A random sample of 100 accounts gives X̄ = $357.60 and
S = $140.00. Give a 95% confidence interval for μ, the average amount in any checking
account at a bank in the given region.
Solution
Given: n = 100, X̄ = $357.60, S = $140.00 and α = 0.05.
A 95% confidence interval for the population mean (μ) is
(X̄ − Z_{α/2}·S/√n, X̄ + Z_{α/2}·S/√n), since n is large and σ is unknown.
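The numerical interval is not shown in the text. A minimal sketch of the large-sample calculation, with S used in place of the unknown σ and `statistics.NormalDist` supplying $Z_{0.025} \approx 1.96$, gives roughly $357.60 ± 27.44, i.e. about (330.16, 385.04).

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s, alpha = 100, 357.60, 140.00, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
half_width = z * s / sqrt(n)              # S replaces the unknown sigma
lower, upper = xbar - half_width, xbar + half_width
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
```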
degrees of freedom. From this distribution, a (1 − α)100% confidence interval for the
population mean is
$\left(\bar X - t_{\alpha/2,(n-1)}\dfrac{S}{\sqrt{n}},\ \bar X + t_{\alpha/2,(n-1)}\dfrac{S}{\sqrt{n}}\right)$.
Example 8.4: A sample of size 25 from a normal population has a mean of 32 and a standard
deviation of 4.2. Find a 95% confidence interval for the population mean.
Solution:
Given: n = 25, $\bar X = 32$, S = 4.2, $1 - \alpha = 0.95 \Rightarrow \alpha = 0.05$, $\alpha/2 = 0.025$
$\Rightarrow t_{0.025,\,24} = 2.064$ from a t-table.
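The remaining arithmetic can be sketched as follows. Python's standard library has no t-distribution quantile function, so the table value 2.064 from the text is hard-coded (with SciPy installed, `scipy.stats.t.ppf(0.975, 24)` should give the same value). The result is approximately (30.27, 33.73).

```python
from math import sqrt

n, xbar, s = 25, 32, 4.2
t_crit = 2.064                      # t_{0.025, 24}, taken from the text
half_width = t_crit * s / sqrt(n)   # 2.064 * 0.84, about 1.734
lower, upper = xbar - half_width, xbar + half_width
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
```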
In section 8.1, we have studied how to make estimations of the mean using point and
interval estimations. The other aspect of statistical inference is known as statistical test
of hypothesis. The branch of statistics which helps us in arriving at the criterion for
deciding about the characteristics of the population, a parameter, based on the
information obtained from the sample data is known as testing of hypothesis.
Type I error is the error committed in rejecting the null hypothesis when it is true.
Probability of committing type I error is sometimes called level of significance and
denoted by α.
Type II error is the error committed in accepting the null hypothesis when it is false.
Probability of committing type II error is denoted by β.
A level of significance of 5% (α = 0.05) implies that, when H0 is true, we are likely to
reject it in about 5 samples out of 100. In other words, when H0 is true, about 95 samples
out of 100 will correctly lead us not to reject it.
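The meaning of α as a long-run Type I error rate can be illustrated by simulation. In this sketch the population is assumed normal with μ = 10 and σ = 3.2 (values borrowed for illustration only), so H0: μ = 10 is true in every repetition; a two-sided Z-test at α = 0.05 should then reject H0 in roughly 5% of the samples.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

rng = random.Random(42)
mu0, sigma, n, alpha = 10.0, 3.2, 16, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{0.025}, about 1.96
reps = 20000
rejections = 0
for _ in range(reps):
    # Draw a sample from a population where H0 (mu = mu0) is TRUE.
    sample = [rng.gauss(mu0, sigma) for _ in range(n)]
    zc = (fmean(sample) - mu0) / (sigma / sqrt(n))
    if abs(zc) > z_crit:                       # reject H0
        rejections += 1
type1_rate = rejections / reps
print("observed Type I error rate:", type1_rate)   # close to alpha
```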
General steps in hypothesis testing on population mean, μ
Step-1: The first step in hypothesis testing is to specify the null hypothesis (H0) and the
alternative hypothesis (H1). Suppose the assumed or hypothesized value of μ is denoted
by μ0; then one can formulate two-sided and one-sided hypotheses as follows:
1. H0: μ = μ0 versus H1: μ ≠ μ0 (two-sided test)
2. H0: μ = μ0 versus H1: μ < μ0 (one-sided test)
3. H0: μ = μ0 versus H1: μ > μ0 (one-sided test)
Step-2: Specify a significance level α.
Step-3: Identify the sampling distribution of the estimator and the test statistic.
Case-I: The population variance (σ²) is known and the parent population is normal.
Case-II: When sampling is from a non-normal population and the sample size is large,
the distribution of $\bar X$ is given by the Central Limit Theorem (with known or unknown
variance).
where $\bar X$ is the sample mean and $\mu_0$ is the value of the parameter specified by the null hypothesis.
Step-5: Identify the critical (rejection) region, i.e. state the decision rule.
a) For the two-sided test H0: μ = μ0 versus H1: μ ≠ μ0, reject H0 if
$Z_c > Z_{\alpha/2}$ or $Z_c < -Z_{\alpha/2}$.
Note: $Z_c$ refers to the calculated value of Z.
Graphically, the rejection and acceptance regions are:
[Figure: standard normal curve with a rejection region of area α/2 in each tail, beyond $-Z_{\alpha/2}$ and $Z_{\alpha/2}$.]
b) For the one-sided (right-sided) test H0: μ = μ0 versus H1: μ > μ0, reject H0 if
$Z_c > Z_\alpha$. Graphically, the rejection and acceptance regions are:
[Figure: standard normal curve with a rejection region of area α in the right tail, beyond $Z_\alpha$.]
c) For the one-sided (left-sided) test H0: μ = μ0 versus H1: μ < μ0, reject H0 if
$Z_c < -Z_\alpha$. Graphically, the rejection and acceptance regions are:
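Decision Table
To test $H_0: \mu = \mu_0$ against the three alternatives, the rules are summarized as:
Alternative hypothesis | Reject H0 if | Do not reject H0 if
$H_1: \mu \neq \mu_0$ | $Z_c > Z_{\alpha/2}$ or $Z_c < -Z_{\alpha/2}$ | $-Z_{\alpha/2} \le Z_c \le Z_{\alpha/2}$
$H_1: \mu > \mu_0$ | $Z_c > Z_\alpha$ | $Z_c \le Z_\alpha$
$H_1: \mu < \mu_0$ | $Z_c < -Z_\alpha$ | $Z_c \ge -Z_\alpha$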
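The decision table can be encoded as a small function. The name `z_test_decision` and the labels "two-sided", "greater" and "less" are hypothetical conventions for this sketch; the critical values come from `statistics.NormalDist` rather than Table A.

```python
from statistics import NormalDist

def z_test_decision(zc, alpha, alternative):
    """Return True if H0: mu = mu0 is rejected, following the decision table.

    alternative: "two-sided" (mu != mu0), "greater" (mu > mu0),
    or "less" (mu < mu0).
    """
    std_normal = NormalDist()
    if alternative == "two-sided":
        z = std_normal.inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
        return zc > z or zc < -z
    z = std_normal.inv_cdf(1 - alpha)           # Z_alpha
    if alternative == "greater":
        return zc > z
    if alternative == "less":
        return zc < -z
    raise ValueError(f"unknown alternative: {alternative!r}")

print(z_test_decision(-2.0, 0.05, "less"))       # rejected
print(z_test_decision(1.5, 0.05, "two-sided"))   # not rejected
```

Note that the same calculated value can lead to different decisions: $Z_c = 1.7$ exceeds $Z_{0.05} \approx 1.645$ in a right-sided test but not $Z_{0.025} \approx 1.96$ in a two-sided test.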
Example 8.5: Test at α = 0.05 whether the mean of a random sample of size n = 16 is
"significantly less than 10" if the distribution from which the sample was taken is
normal, $\bar x = 8.4$ and $\sigma = 3.2$ (known).
Solution:
* $H_0: \mu = 10$ versus $H_A: \mu < 10$, $\alpha = 0.05$
* $Z_C = \dfrac{\bar x - \mu_0}{\sigma/\sqrt{n}} = \dfrac{8.4 - 10}{3.2/4} = -2$ (calculated value)
* Since $Z_C = -2 < -Z_{0.05} = -1.645$, the null hypothesis is rejected. That is, the sample
mean 8.4 is significantly less than 10 at the 5% level of significance, and we conclude that
the population mean is less than 10.
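The left-sided test of Example 8.5 can be reproduced directly; this sketch simply mirrors the hand calculation, with `statistics.NormalDist` supplying $Z_{0.05} \approx 1.645$.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, sigma, n, alpha = 8.4, 10, 3.2, 16, 0.05
zc = (xbar - mu0) / (sigma / sqrt(n))      # (8.4 - 10) / 0.8
z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.645
reject = zc < -z_alpha                     # left-sided test: H1 is mu < 10
print(round(zc, 3), round(z_alpha, 3), reject)
```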
Example 8.6: Based on a random sample of size 100 with an average of 3.4 minutes
and a standard deviation of 2.8 minutes, is the claim that the average telephone call lasts
4 minutes true at the 95% confidence level (α = 0.05)?
Solution: Given: n = 100, $\bar x = 3.4$ min, s = 2.8 min, α = 0.05
To test: $H_0: \mu = 4$ versus $H_A: \mu \neq 4$
Since σ is unknown, this should be a t-distribution; however, since n = 100 is large, the
Z-statistic is used.
$Z_c = \dfrac{\bar X - \mu_0}{S/\sqrt{n}} = \dfrac{3.4 - 4}{2.8/10} = -2.14$
Since the calculated value is less than the lower critical value (−2.14 < −1.96), the null
hypothesis is rejected. Therefore, the average telephone call is significantly different
from 4 minutes at α = 0.05.
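A numerical check of Example 8.6 is sketched below. The two-sided p-value (about 0.032) is an extra quantity not computed in the text; it leads to the same conclusion because it is smaller than α = 0.05.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, s, n, alpha = 3.4, 4, 2.8, 100, 0.05
zc = (xbar - mu0) / (s / sqrt(n))              # large n: S in place of sigma
z_half = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
reject = abs(zc) > z_half                      # two-sided: H1 is mu != 4
p_value = 2 * (1 - NormalDist().cdf(abs(zc)))  # two-sided p-value
print(round(zc, 2), reject, round(p_value, 3))
```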
Example 8.7: A sample of 16 students gave an average mark of 53.8 with a standard
deviation of 5.2. Can we conclude that the population mean of marks is 50 at α = 0.05?
Solution: $H_0: \mu = 50$ versus $H_A: \mu \neq 50$
$t_{\alpha/2,\,n-1} = t_{0.025,\,15} = 2.131$.
$t_C = \dfrac{\bar x - \mu_0}{s/\sqrt{n}} = \dfrac{53.8 - 50}{5.2/\sqrt{16}} = \dfrac{3.8}{1.3} = 2.92$.
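The text stops at the calculated value. A minimal sketch of the comparison with the table value 2.131 (hard-coded from the text, since Python's standard library has no t quantiles) shows $|t_C| = 2.92 > 2.131$, so H0 would be rejected at α = 0.05.

```python
from math import sqrt

xbar, mu0, s, n = 53.8, 50, 5.2, 16
t_crit = 2.131                       # t_{0.025, 15}, taken from the text
tc = (xbar - mu0) / (s / sqrt(n))    # 3.8 / 1.3
reject = abs(tc) > t_crit            # two-sided test: H1 is mu != 50
print(round(tc, 2), reject)
```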
may be related linearly are production/yield (Y) and amount of rainfall (X), and monthly
income (Y) and level of education (X), …
A simple linear regression model is given as
$Y = \alpha + \beta X + \varepsilon$
where α is the intercept of the regression line. It gives the value of Y when X is zero; if
the range of X does not include zero, α has no practical interpretation. β is the slope, a
measure of the rate of change: it shows by how much Y changes for every unit change
in X.
The constants α and β are parameters and are commonly referred to as regression
coefficients.
- ε is a random error term. It is neither observable nor measurable. In real-life problems,
even though two variables are linearly related, their relationship is not exactly
$Y = \alpha + \beta X$.
The estimated (fitted) regression line is given by $\hat Y = \hat\alpha + \hat\beta X_i$.
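To estimate this model we take a sample of n independent observations, which give rise to
n pairs $(X_i, Y_i)$, and find the best estimates of the parameters, i.e. the best-fitting line,
using the least squares method of estimation. A best-fitting line is one for which the sum of
squares of the errors, $\sum e_i^2$, is a minimum.
In the principle of least squares, one selects $\hat\alpha$ and $\hat\beta$ such that
$\sum e_i^2 = \sum (Y_i - \hat Y_i)^2$ is a minimum, where $\hat Y_i = \hat\alpha + \hat\beta X_i$.
To minimize this function, we take the partial derivatives of $\sum e_i^2$ with respect to
α and β respectively, set them equal to zero and solve, which gives
$\hat\beta = \dfrac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \dfrac{\sum x_i y_i - n\bar x\bar y}{\sum x_i^2 - n\bar x^2} = \dfrac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2}$ and $\hat\alpha = \bar Y - \hat\beta\bar X$.
These estimates are denoted by $\hat\alpha$ and $\hat\beta$. The estimated (fitted) regression line is given by:
$\hat Y = \hat\alpha + \hat\beta X_i$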
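The closed-form least squares estimates translate directly into code. This is a minimal sketch (the function name `least_squares` is hypothetical) using the $\sum x_i y_i - n\bar x\bar y$ form of the formula, checked on made-up points that lie exactly on $y = 2 + 3x$.

```python
def least_squares(x, y):
    """Return (alpha_hat, beta_hat) for the fitted line
    y_hat = alpha_hat + beta_hat * x, using
    beta_hat = (sum xy - n*xbar*ybar) / (sum x^2 - n*xbar^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences, at least 2 points")
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar
    sxx = sum(xi * xi for xi in x) - n * xbar ** 2
    beta_hat = sxy / sxx
    alpha_hat = ybar - beta_hat * xbar
    return alpha_hat, beta_hat

# Sanity check on points lying exactly on y = 2 + 3x (hypothetical data).
print(least_squares([0, 1, 2, 3], [2, 5, 8, 11]))
```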
Before estimating the regression coefficients, it is wise to plot the observed data
on a graph known as a scatter diagram. A scatter diagram is a plot of all ordered pairs
$(x_i, y_i)$ on the coordinate plane, which helps us observe the relationship between the two
variables. This diagram gives a preliminary idea of the type of relationship the two
variables have.
Regression analysis is useful in predicting the value of one variable from a given value
of another variable, using $\hat Y = \hat\alpha + \hat\beta X_i$.
Example 9.1: The following data give the number of hours (X) each of 10 students spent
studying and the marks (Y) each student received in an examination:
Student  1     2     3     4     5     6     7     8     9     10    Total
x        8     5     11    13    10    6     18    15    2     9     97
y        65    44    79    72    70    54    90    85    33    56    648
xy       520   220   869   936   700   324   1620  1275  66    504   7034
x²       64    25    121   169   100   36    324   225   4     81    1149
y²       4225  1936  6241  5184  4900  2916  8100  7225  1089  3136  44952
[Figure: Scatter diagram of the number of hours studied (X) and marks obtained (Y) by 10 students; horizontal axis: hours spent (0 to 20), vertical axis: marks obtained (0 to 100).]
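$\hat\beta = \dfrac{\sum xy - n\bar x\bar y}{\sum x^2 - n\bar x^2} = \dfrac{7034 - (10)(9.7)(64.8)}{1149 - (10)(9.7)^2} = \dfrac{748.4}{208.1} = 3.596$ and
$\hat\alpha = \bar y - \hat\beta\bar x = 64.8 - 3.596(9.7) = 29.92$.
Hence the fitted regression line is $\hat Y = 29.92 + 3.596X$.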
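The fit of Example 9.1 can be reproduced from the raw data. The prediction for 12 hours of study is a hypothetical use of the fitted line, added only to illustrate how $\hat Y$ is evaluated; it is not part of the handout.

```python
x = [8, 5, 11, 13, 10, 6, 18, 15, 2, 9]
y = [65, 44, 79, 72, 70, 54, 90, 85, 33, 56]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n                            # 9.7, 64.8
sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar   # 7034 - 6285.6
sxx = sum(xi ** 2 for xi in x) - n * xbar ** 2                 # 1149 - 940.9
beta_hat = sxy / sxx
alpha_hat = ybar - beta_hat * xbar

print(f"beta_hat = {beta_hat:.3f}, alpha_hat = {alpha_hat:.2f}")
# Hypothetical use of the fitted line: predicted mark after 12 hours.
print(f"predicted mark for 12 hours: {alpha_hat + beta_hat * 12:.1f}")
```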
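The correlation coefficient is given by
$r = \dfrac{\sum (x - \bar x)(y - \bar y)}{\sqrt{\sum (x - \bar x)^2 \sum (y - \bar y)^2}}$
Alternatively,
$r = \dfrac{\sum xy - n\bar x\bar y}{\sqrt{\left(\sum x^2 - n\bar x^2\right)\left(\sum y^2 - n\bar y^2\right)}}$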
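The computational formula for r can be sketched as a short function, applied here to the data of Example 9.1 (the handout does not compute r for these data; the value of about 0.953 is our own calculation). On Python 3.10 or later, `statistics.correlation(x, y)` should give the same value.

```python
from math import sqrt

def pearson_r(x, y):
    """r = (sum xy - n*xbar*ybar) /
           sqrt((sum x^2 - n*xbar^2) * (sum y^2 - n*ybar^2))."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum(a * b for a, b in zip(x, y)) - n * xbar * ybar
    sxx = sum(a * a for a in x) - n * xbar ** 2
    syy = sum(b * b for b in y) - n * ybar ** 2
    return sxy / sqrt(sxx * syy)

# Data of Example 9.1 (hours studied, marks obtained).
x = [8, 5, 11, 13, 10, 6, 18, 15, 2, 9]
y = [65, 44, 79, 72, 70, 54, 90, 85, 33, 56]
r = pearson_r(x, y)
print(round(r, 3))
```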
The correlation coefficient r always lies between −1 and +1, inclusive.
• r = −1 implies a perfect negative linear relationship between the two variables.
• r = +1 implies a perfect positive linear relationship between the two variables.
• r = 0 implies there is no linear relationship between the two variables, although the two
variables may have a non-linear relationship.
• r close to +1 indicates a strong positive linear relationship between the two variables.
• r close to −1 indicates a strong negative linear relationship between the two variables.
• r close to 0 indicates a weak linear relationship between the two variables.
Example 9.2: The research director of the Saving and Loan Bank collected 25
observations of mortgage interest rates X and the number of house sales Y at each interest
rate. The director computed that
$\sum x = 125$, $\sum y = 100$, $\sum xy = 520$, $\sum x^2 = 650$, $\sum y^2 = 436$.
Compute and interpret (i) the coefficient of correlation;
(ii) the coefficient of determination.
Solution: i) Coefficient of correlation (here $\bar x = 125/25 = 5$ and $\bar y = 100/25 = 4$):
$r = \dfrac{\sum xy - n\bar x\bar y}{\sqrt{\left(\sum x^2 - n\bar x^2\right)\left(\sum y^2 - n\bar y^2\right)}} = \dfrac{520 - (25)(5)(4)}{\sqrt{\left(650 - 25(5)(5)\right)\left(436 - (25)(4)(4)\right)}} = \dfrac{20}{\sqrt{25 \times 36}} = 0.667$
The two variables have a positive linear relationship.
ii) Coefficient of determination: $r^2 = (0.667)^2 = 0.44$. This shows that about 44% of the
variation in the number of house sales is explained by its linear relationship with the
interest rate.
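When only summary sums are available, as in Example 9.2, r and r² follow directly from them. This sketch mirrors the hand calculation step by step.

```python
from math import sqrt

n = 25
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 125, 100, 520, 650, 436

xbar, ybar = sum_x / n, sum_y / n      # 5 and 4
sxy = sum_xy - n * xbar * ybar         # 520 - 500 = 20
sxx = sum_x2 - n * xbar ** 2           # 650 - 625 = 25
syy = sum_y2 - n * ybar ** 2           # 436 - 400 = 36
r = sxy / sqrt(sxx * syy)              # 20 / 30
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```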
9.3 Coefficient of Determination (r²)
The simple correlation coefficient (r) cannot be used when we are dealing with
qualitative data such as judgments about beauty, efficiency, honesty, etc. In such cases,
the rank correlation coefficient is used to explain the correlation, or if there is an …
$r_s = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$, where d is the difference between the rank of x and the rank of
the corresponding y.
To calculate $r_s$, we first rank the x's among themselves from lowest to highest or from
highest to lowest; then we rank the y's in the same way and find the sum of the squares of
the differences, d, between the ranks of the x's and the y's. When there are ties in rank, we
assign to each of the tied observations (having equal values) the mean of their ranks.
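The ranking procedure, including the mean-rank rule for ties, can be sketched as below. The function names `average_ranks` and `spearman_rs` are hypothetical. When there are many ties, the $1 - 6\sum d^2/(n(n^2-1))$ formula is only an approximation; it is used here exactly as stated in the text.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1) to highest; tied values share the
    mean of the ranks they would otherwise occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + 1 + j + 1) / 2   # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)), with d = rank(x) - rank(y)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

print(average_ranks([10, 20, 20, 30]))   # tied values share rank 2.5
```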
Example 9.4: Assume that ten girls in a beauty contest for Miss Debre Markos were
ranked by two judges as follows:
Girl number   1    2    3    4    5    6    7    8    9    10
Judge A       4    8    6    7    1    3    2    5    10   9
Judge B       3    9    6    5    1    2    4    7    8    10
Solution: Since the ranks are given, we need only find the difference in ranks for
each girl and the square of these differences.
d = A − B     1    -1   0    2    0    1    -2   -2   2    -1   Total: 0
d²            1    1    0    4    0    1    4    4    4    1    Total: 20
For these n = 10 pairs, $\sum d^2 = 20$, and $r_s = 1 - \dfrac{6(20)}{10(100 - 1)} = 0.88$, which is positive
and close to 1, showing that there is very good agreement (or concordance)
between the two judges regarding the beauty of the girls.
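Example 9.4 can be checked in a few lines, since the judges' scores are already ranks and no re-ranking is needed.

```python
judge_a = [4, 8, 6, 7, 1, 3, 2, 5, 10, 9]
judge_b = [3, 9, 6, 5, 1, 2, 4, 7, 8, 10]

n = len(judge_a)
d = [a - b for a, b in zip(judge_a, judge_b)]   # rank differences
sum_d2 = sum(di ** 2 for di in d)               # 20
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))        # 1 - 120/990
print(d, sum_d2, round(rs, 2))
```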
Appendix: Table A. Approximate values of the standard normal distribution
function (i.e. the area under the standard normal curve between Z = 0 and Z = z):
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
3.1 0.4990 0.4991 0.4991 0.4991 0.4992 0.4992 0.4992 0.4992 0.4993 0.4993
3.2 0.4993 0.4993 0.4994 0.4994 0.4994 0.4994 0.4994 0.4995 0.4995 0.4995
3.3 0.4995 0.4995 0.4995 0.4996 0.4996 0.4996 0.4996 0.4996 0.4996 0.4997
3.4 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4998
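Every entry of Table A is $\Phi(z) - 0.5$, so the table can be regenerated, or any single entry checked, from the standard normal distribution function. This sketch uses Python's `statistics.NormalDist` (our choice of tool) and rounds to four decimals as in the table; the helper names are hypothetical.

```python
from statistics import NormalDist

def area_0_to_z(z):
    """Area under the standard normal curve between 0 and z (z >= 0)."""
    return NormalDist().cdf(z) - 0.5

def table_row(z0):
    """One table row: areas for z0 + 0.00, z0 + 0.01, ..., z0 + 0.09."""
    return [round(area_0_to_z(round(z0 + c / 100, 2)), 4) for c in range(10)]

print("z    " + " ".join(f"0.0{c}  " for c in range(10)))
for r in range(35):                  # rows 0.0, 0.1, ..., 3.4
    z0 = r / 10
    print(f"{z0:.1f}  " + " ".join(f"{a:.4f}" for a in table_row(z0)))
```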
Table B. Chi-square (χ²) table with right-tail probabilities
df\area 0.995 0.99 0.975 0.95 0.9 0.25 0.1 0.05 0.025 0.01 0.00
1 0.000 0.000 0.001 0.004 0.016 1.323 2.706 3.841 5.024 6.635 7.87
2 0.010 0.020 0.051 0.103 0.211 2.773 4.605 5.991 7.378 9.210 10.5
3 0.072 0.115 0.216 0.352 0.584 4.108 6.251 7.815 9.348 11.345 12.8
4 0.207 0.297 0.484 0.711 1.064 5.385 7.779 9.488 11.143 13.277 14.8
5 0.412 0.554 0.831 1.145 1.610 6.626 9.236 11.071 12.833 15.086 16.7
6 0.676 0.872 1.237 1.635 2.204 7.841 10.645 12.592 14.449 16.812 18.5
7 0.989 1.239 1.690 2.167 2.833 9.037 12.017 14.067 16.013 18.475 20.2
8 1.344 1.647 2.180 2.733 3.490 10.219 13.362 15.507 17.535 20.090 21.9
9 1.735 2.088 2.700 3.325 4.168 11.389 14.684 16.919 19.023 21.666 23.5
10 2.156 2.558 3.247 3.940 4.865 12.549 15.987 18.307 20.483 23.209 25.1
11 2.603 3.053 3.816 4.575 5.578 13.701 17.275 19.675 21.920 24.725 26.7
12 3.074 3.571 4.404 5.226 6.304 14.845 18.549 21.026 23.337 26.217 28.3
13 3.565 4.107 5.009 5.892 7.042 15.984 19.812 22.362 24.736 27.688 29.8
14 4.075 4.660 5.629 6.571 7.790 17.117 21.064 23.685 26.119 29.141 31.3
15 4.601 5.229 6.262 7.261 8.547 18.245 22.307 24.996 27.488 30.578 32.8
16 5.142 5.812 6.908 7.962 9.312 19.369 23.542 26.296 28.845 32.000 34.2
17 5.697 6.408 7.564 8.672 10.085 20.489 24.769 27.587 30.191 33.409 35.7
18 6.265 7.015 8.231 9.390 10.865 21.605 25.989 28.869 31.526 34.805 37.1
19 6.844 7.633 8.907 10.117 11.651 22.718 27.204 30.144 32.852 36.191 38.5
20 7.434 8.260 9.591 10.851 12.443 23.828 28.412 31.410 34.170 37.566 39.9
21 8.034 8.897 10.283 11.591 13.240 24.935 29.615 32.671 35.479 38.932 41.4
22 8.643 9.542 10.982 12.338 14.041 26.039 30.813 33.924 36.781 40.289 42.7
23 9.260 10.196 11.689 13.091 14.848 27.141 32.007 35.172 38.076 41.638 44.1
24 9.886 10.856 12.401 13.848 15.659 28.241 33.196 36.415 39.364 42.980 45.5
25 10.520 11.524 13.120 14.611 16.473 29.339 34.382 37.652 40.646 44.314 46.9
26 11.160 12.198 13.844 15.379 17.292 30.435 35.563 38.885 41.923 45.642 48.2
27 11.808 12.879 14.573 16.151 18.114 31.528 36.741 40.113 43.195 46.963 49.6
28 12.461 13.565 15.308 16.928 18.939 32.620 37.916 41.337 44.461 48.278 50.9
29 13.121 14.256 16.047 17.708 19.768 33.711 39.087 42.557 45.722 49.588 52.3
30 13.787 14.953 16.791 18.493 20.599 34.800 40.256 43.773 46.979 50.892 53.6