DJM1C - STATISTICS
(From the academic year 2016-17)
UNIT IV : Sampling - Definition - Large samples - Small samples - Population with one sample
and population with two samples - Student's t-test - Applications - Chi-square test and
goodness of fit - Applications.
UNIT V : Index Numbers - Types of index numbers - Tests - Unit test, commodity reversal test,
time reversal test, factor reversal test - Chain index numbers - Cost of living index - Interpolation
- Finite difference operators - Newton's forward and backward interpolation formulae, Lagrange's
formula.
Books:
Definition:
If the two variables deviate in the same direction, the correlation is said to be
direct or positive.
Definition:
Hence
\[ \gamma_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} \]
Example:
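Height in cm (x):  160  161  162  163  164
Weight in kg (y):   50   53   54   56   57
Now $\sum (x_i - \bar{x})(y_i - \bar{y}) = (-2)(-4) + (-1)(-1) + 0 + 1 \times 2 + 2 \times 3 = 17$
\[ \therefore \gamma_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n \sigma_x \sigma_y} = \frac{17}{5\sqrt{2}\sqrt{6}} = \frac{17 \times \sqrt{12}}{60} = \frac{17 \times 3.46}{60} = 0.98 \]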
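\[ \gamma_{xy} = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\left[ n \sum x_i^2 - \left( \sum x_i \right)^2 \right]^{1/2} \left[ n \sum y_i^2 - \left( \sum y_i \right)^2 \right]^{1/2}} \]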
Proof:
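\[ \gamma_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n \sigma_x \sigma_y} \quad \ldots (1) \]
\begin{align*}
\sum (x_i - \bar{x})(y_i - \bar{y}) &= \sum x_i y_i - \bar{x} \sum y_i - \bar{y} \sum x_i + n \bar{x} \bar{y} \\
&= \sum x_i y_i - \bar{x}(n \bar{y}) - \bar{y}(n \bar{x}) + n \bar{x} \bar{y} \\
&= \sum x_i y_i - n \bar{x} \bar{y} \\
&= \sum x_i y_i - \frac{1}{n} \sum x_i \sum y_i \\
&= \frac{1}{n} \left[ n \sum x_i y_i - \sum x_i \sum y_i \right] \quad \ldots (2)
\end{align*}
Also,
\begin{align*}
\sigma_x^2 &= \frac{1}{n} \sum (x_i - \bar{x})^2 \\
&= \frac{1}{n} \left[ \sum x_i^2 - 2 \bar{x} \sum x_i + n \bar{x}^2 \right] \\
&= \frac{1}{n} \left[ \sum x_i^2 - 2 n \bar{x}^2 + n \bar{x}^2 \right] \\
&= \frac{1}{n} \left[ \sum x_i^2 - \frac{1}{n} \left( \sum x_i \right)^2 \right] \\
&= \frac{1}{n^2} \left[ n \sum x_i^2 - \left( \sum x_i \right)^2 \right]
\end{align*}
\[ \therefore \sigma_x = \frac{1}{n} \left[ n \sum x_i^2 - \left( \sum x_i \right)^2 \right]^{1/2} \quad \ldots (3) \]
\[ \sigma_y = \frac{1}{n} \left[ n \sum y_i^2 - \left( \sum y_i \right)^2 \right]^{1/2} \quad \ldots (4) \]
Substituting (2), (3) and (4) in (1) gives the required formula.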
Theorem:
-1 ≤ 𝛾 ≤ 1
Proof:
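\[ \gamma_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n \sigma_x \sigma_y} = \frac{\frac{1}{n} \sum (x_i - \bar{x})(y_i - \bar{y})}{\left[ \frac{1}{n} \sum (x_i - \bar{x})^2 \right]^{1/2} \left[ \frac{1}{n} \sum (y_i - \bar{y})^2 \right]^{1/2}} \]
Let $a_i = x_i - \bar{x}$ and $b_i = y_i - \bar{y}$.
\[ \therefore \gamma_{xy}^2 = \frac{\left( \sum a_i b_i \right)^2}{\sum a_i^2 \sum b_i^2} \]
By the Cauchy-Schwarz inequality,
\[ \left( \sum a_i b_i \right)^2 \le \sum a_i^2 \sum b_i^2 \]
Hence $\gamma_{xy}^2 \le 1$
∴ $|\gamma_{xy}| \le 1$
∴ $-1 \le \gamma \le 1$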
Note: 1
Note: 3
Note: 4
Problem: 1
Ten students obtained the following percentage of marks in the college internal
test (x) and in the final university examination (y). Find the correlation coefficient
between the marks of the two tests.
x   51  63  63  49  50  60  65  63  46  50
y   49  72  75  50  48  60  70  48  60  56
Solution:
Choosing the origin A = 63 for the variable x and B= 60 for y and taking 𝑢𝑖 =
𝑥𝑖 − 𝐴 and 𝑣𝑖 = 𝑦𝑖 − 𝐵.
x_i     u_i    y_i    v_i    u_i^2   v_i^2   u_i v_i
51      -12    49     -11    144     121     132
63        0    72      12      0     144       0
63        0    75      15      0     225       0
49      -14    50     -10    196     100     140
50      -13    48     -12    169     144     156
60       -3    60       0      9       0       0
65        2    70      10      4     100      20
63        0    48     -12      0     144       0
46      -17    60       0    289       0       0
50      -13    56      -4    169      16      52
Total   -70     -     -12    980     994     500
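\[ \gamma_{xy} = \gamma_{uv} = \frac{n \sum u_i v_i - \sum u_i \sum v_i}{\left[ n \sum u_i^2 - \left( \sum u_i \right)^2 \right]^{1/2} \left[ n \sum v_i^2 - \left( \sum v_i \right)^2 \right]^{1/2}} = \frac{4160}{70 \times 98.97} = 0.6 \]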
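The computation in Problem 1 can be checked numerically. The sketch below is only an illustration: the function name `pearson_r` is a hypothetical helper, not something from the text. It applies the computational formula to the raw marks and to the shifted values u = x - 63, v = y - 60, and shows that the change of origin leaves the coefficient unchanged.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient via the computational formula
    (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

# Marks from Problem 1
x = [51, 63, 63, 49, 50, 60, 65, 63, 46, 50]
y = [49, 72, 75, 50, 48, 60, 70, 48, 60, 56]

# Shift of origin used in the solution: A = 63, B = 60
u = [xi - 63 for xi in x]
v = [yi - 60 for yi in y]

r_xy = pearson_r(x, y)
r_uv = pearson_r(u, v)
print(round(r_xy, 2), round(r_uv, 2))
```

Working with u and v keeps the hand arithmetic small; the code confirms that both routes give the same value.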
Problem: 2
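If x and y are two variables, prove that the correlation coefficient between $ax + b$ and
$cy + d$ is $\frac{ac}{|ac|} \gamma_{xy}$.
Proof:
Let $u = ax + b$ and $v = cy + d$
∴ $\bar{u} = a\bar{x} + b$ and $\bar{v} = c\bar{y} + d$
\[ \sigma_u^2 = \frac{1}{n} \sum (u - \bar{u})^2 = \frac{a^2}{n} \sum (x_i - \bar{x})^2 = a^2 \sigma_x^2 \]
Similarly,
\[ \sigma_v^2 = c^2 \sigma_y^2 \]
Now,
\[ \gamma_{uv} = \frac{\sum (u - \bar{u})(v - \bar{v})}{n \sigma_u \sigma_v} = \frac{\sum a(x - \bar{x}) \, c(y - \bar{y})}{n |a||c| \sigma_x \sigma_y} = \frac{ac}{|ac|} \gamma_{xy}. \]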
Problem: 3
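Corrected $\sum x = 300 - 18 - 12 + 10 + 20 = 300$
Corrected $\sum y = 210 - 20 - 10 + 15 + 15 = 210$
Corrected $\sum x^2 = 3750$
Corrected $\sum y^2 = 1950$
\begin{align*}
\gamma_{xy} &= \frac{n \sum xy - \sum x \sum y}{\left[ n \sum x^2 - \left( \sum x \right)^2 \right]^{1/2} \left[ n \sum y^2 - \left( \sum y \right)^2 \right]^{1/2}} \\
&= \frac{62100 - 63000}{(112500 - 90000)^{1/2} (58500 - 44100)^{1/2}} \\
&= \frac{-900}{(22500)^{1/2} (14400)^{1/2}} \\
&= -\frac{900}{150 \times 120} \\
&= -\frac{1}{20} \\
&= -0.05
\end{align*}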
Solution:
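Given $\sigma_x = \sigma_y = \sigma_z = \sigma$.
Let $u = x + y$ and $v = y + z$
Now,
\begin{align*}
\sigma_u^2 &= \frac{1}{n} \sum (u - \bar{u})^2 \\
&= \frac{1}{n} \sum \left[ (x - \bar{x}) + (y - \bar{y}) \right]^2 \\
&= \frac{1}{n} \left[ \sum (x - \bar{x})^2 + \sum (y - \bar{y})^2 + 2 \sum (x - \bar{x})(y - \bar{y}) \right] \\
&= 2\sigma^2
\end{align*}
Similarly,
\[ \sigma_v^2 = 2\sigma^2 \]
Also,
\begin{align*}
\sum (u - \bar{u})(v - \bar{v}) &= \sum \left[ (x - \bar{x}) + (y - \bar{y}) \right] \left[ (y - \bar{y}) + (z - \bar{z}) \right] \\
&= 0 + n\sigma_y^2 + 0 + 0 \\
&= n\sigma^2
\end{align*}
\[ \therefore \gamma_{uv} = \frac{n\sigma^2}{n \cdot \sqrt{2}\sigma \cdot \sqrt{2}\sigma} = \frac{1}{2}. \]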
Problem: 5
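Show that the variables $u = x\cos\alpha + y\sin\alpha$ and $v = y\cos\alpha - x\sin\alpha$ are uncorrelated if
\[ \alpha = \frac{1}{2} \tan^{-1} \left( \frac{2\gamma_{xy}\sigma_x\sigma_y}{\sigma_x^2 - \sigma_y^2} \right) \]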
Solution:
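\[ \therefore \gamma_{xy}\sigma_x\sigma_y \cos 2\alpha = \frac{1}{2} \sin 2\alpha \left( \sigma_x^2 - \sigma_y^2 \right). \]
\[ \therefore \tan 2\alpha = \frac{2\gamma_{xy}\sigma_x\sigma_y}{\sigma_x^2 - \sigma_y^2} \]
\[ \therefore \alpha = \frac{1}{2} \tan^{-1} \left( \frac{2\gamma_{xy}\sigma_x\sigma_y}{\sigma_x^2 - \sigma_y^2} \right) \]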
Theorem:
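\[ \rho = 1 - \frac{6 \sum (x - y)^2}{n(n^2 - 1)}. \]
Proof:
Let $x_i$ and $y_i$ be the ranks of the $i$-th individual in the two different rankings.
\[ \therefore \bar{x} = \frac{1}{2}(n + 1) = \bar{y} \quad \text{and} \quad \sigma_x^2 = \frac{1}{12}(n^2 - 1) = \sigma_y^2. \]
\begin{align*}
\text{Now, } \sum (x - y)^2 &= \sum \left[ (x - \bar{x}) - (y - \bar{y}) \right]^2 \quad (\text{since } \bar{x} = \bar{y}) \\
&= \sum (x - \bar{x})^2 + \sum (y - \bar{y})^2 - 2 \sum (x - \bar{x})(y - \bar{y}) \\
&= n\sigma_x^2 + n\sigma_y^2 - 2n\rho\sigma_x\sigma_y \\
&= 2n\sigma_x^2 (1 - \rho) = \frac{n(n^2 - 1)}{6}(1 - \rho)
\end{align*}
\[ 1 - \rho = \frac{6 \sum (x - y)^2}{n(n^2 - 1)} \]
\[ \rho = 1 - \frac{6 \sum (x - y)^2}{n(n^2 - 1)}. \]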
Problem: 6
Find the rank correlation coefficient between the height in cm and weight in
kg of 6 soldiers in Indian Army.
Solution:
\[ \rho = 1 - \frac{6 \sum (x - y)^2}{n(n^2 - 1)} = 1 - \frac{6 \times 16}{6 \times 35} = 1 - 0.457 = 0.543. \]
Problem: 7
Physics (P)     35  56  50  65  44  38  44  50  15  26
Chemistry (Q)   50  35  70  25  35  58  75  60  55  35
Solution:
We rank the marks of physics and chemistry and we have the following table.
Now,
\[ \rho = 1 - \frac{6 \sum (x - y)^2}{n(n^2 - 1)} = 1 - \frac{6 \times 188}{10 \times 99} = 1 - \frac{1128}{990} = 1 - 1.139 = -0.139. \]
Problem: 8
Judge Mr. X   1  2  4  3  7  6  5  8
Judge Mr. Y   3  2  1  5  4  7  6  8
Judge Mr. Z   1  2  3  4  5  7  8  6

x      y    z    x-y   (x-y)^2   y-z   (y-z)^2   z-x   (z-x)^2
1      3    1    -2    4          2    4          0    0
2      2    2     0    0          0    0          0    0
4      1    3     3    9         -2    4         -1    1
3      5    4    -2    4          1    1          1    1
7      4    5     3    9         -1    1         -2    4
6      7    7    -1    1          0    0          1    1
5      6    8    -1    1         -2    4          3    9
8      8    6     0    0          2    4         -2    4
Total  -    -     -    28         -    18         -    20
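\[ \rho_{xy} = 1 - \frac{6 \sum (x - y)^2}{n(n^2 - 1)} = 1 - \frac{6 \times 28}{8 \times (8^2 - 1)} = 1 - \frac{168}{504} = 1 - 0.333 = 0.667. \]
\[ \rho_{yz} = 1 - \frac{6 \times 18}{8 \times 63} = 1 - \frac{108}{504} = 1 - 0.214 = 0.786. \]
\[ \rho_{zx} = 1 - \frac{6 \times 20}{8 \times 63} = 1 - 0.238 = 0.762. \]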
Since $\rho_{yz}$ is greater than $\rho_{xy}$ and $\rho_{zx}$, the judges Mr. Y and Mr. Z have the nearest
approach to common taste in beauty.
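The three coefficients of Problem 8 follow from the column totals 28, 18 and 20 by one formula, so they are easy to verify. The sketch below assumes only those totals and n = 8; the helper name `spearman_from_d2` is hypothetical.

```python
def spearman_from_d2(sum_d2, n):
    """Rank correlation rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)), valid for untied ranks."""
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Totals of the squared rank differences from the table in Problem 8
totals = {"xy": 28, "yz": 18, "zx": 20}
rho = {pair: round(spearman_from_d2(d2, 8), 3) for pair, d2 in totals.items()}

# The pair of judges with the largest rho has the closest tastes
closest = max(rho, key=rho.get)
print(rho, closest)
```

The pair with the smallest sum of squared differences always has the largest coefficient, which is why Y and Z come out closest.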
Problem: 9
Solution:
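\[ \rho_{xy} = 1 - \frac{6 \sum (x - y)^2}{n(n^2 - 1)}. \]
\[ 0.8 = 1 - \frac{6 \sum (x - y)^2}{10(10^2 - 1)} = 1 - \frac{6 \sum (x - y)^2}{990} \]
\[ \frac{6 \sum (x - y)^2}{990} = 1 - 0.8 = 0.2 \]
\[ 6 \sum (x - y)^2 = 198 \]
\[ \therefore \sum (x - y)^2 = 33 \]
Corrected $\sum (x - y)^2 = 33 - 5^2 + 8^2 = 72$
Now, after correction,
\[ \rho_{xy} = 1 - \frac{6 \times 72}{10(10^2 - 1)} = 1 - \frac{432}{990} = 1 - 0.436 = 0.564. \]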
Exercises:
Economics    78  65  36  98  25  75  82  90  62  39
Statistics   84  53  51  91  60  68  62  86  58  47
2. The following table shows how 10 students were ranked according to their
achievements in the laboratory and lecture portions of a biology course. Find the coefficient
of rank correlation.
Laboratory   8  3   9  2  7  10  4  6  1  5
Lecture      9  5  10  1  8   7  3  4  2  6
REGRESSION:
Definition:
If we fit a straight line by the principle of least squares to the points of the
scatter diagram in such a way that the sum of the squares of the distances parallel to the
y-axis from the points to the line is minimized, we obtain a line of best fit for the data, and it is
called the regression line of y on x.
Theorem:
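The equation of the regression line of y on x is given by
\[ y - \bar{y} = \gamma \frac{\sigma_y}{\sigma_x} (x - \bar{x}) \]
\[ \Rightarrow \sum x_i y_i = a \sum x_i^2 + b \sum x_i \quad \ldots (1) \]
\[ \frac{\partial S}{\partial b} = 0 \Rightarrow -2 \sum \left[ y_i - (a x_i + b) \right] = 0 \]
\[ \Rightarrow \sum y_i = a \sum x_i + nb \quad \ldots (2) \]
Now, shifting the origin to this point $(\bar{x}, \bar{y})$ by means of the transformation
$X_i = x_i - \bar{x}$ and $Y_i = y_i - \bar{y}$.
\[ Y = aX \quad \ldots (4) \]
Corresponding to this line $Y = aX$ the constant $a$ can be determined from the normal
equation
\[ a \sum X_i^2 = \sum X_i Y_i \]
\[ a = \frac{\sum X_i Y_i}{\sum X_i^2} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{\gamma \sigma_x \sigma_y}{\sigma_x^2} = \gamma \frac{\sigma_y}{\sigma_x} \]
The required regression line (4) becomes $Y = \gamma \frac{\sigma_y}{\sigma_x} X$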
Definition:
Theorem:
Proof:
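\[ \Leftrightarrow b_{yx} + b_{xy} \ge 2\gamma \]
\[ \Leftrightarrow \gamma \frac{\sigma_y}{\sigma_x} + \gamma \frac{\sigma_x}{\sigma_y} \ge 2\gamma \]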
Theorem:
\[ = \gamma_{uv} \frac{k \sigma_v}{h \sigma_u} = \frac{k}{h} b_{vu} \quad \ldots (1) \]
Similarly
\[ b_{xy} = \frac{h}{k} b_{uv} \quad \ldots (2) \]
From (1) and (2), $b_{yx}$ and $b_{xy}$ depend upon the scales $h$ and $k$, but not on the
origins A and B.
Theorem:
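The angle between the two regression lines is given by
\[ \theta = \tan^{-1} \left[ \frac{1 - \gamma^2}{\gamma} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \right] \]
Proof:
\[ x - \bar{x} = \gamma \frac{\sigma_x}{\sigma_y} (y - \bar{y}) \quad \ldots (2) \]
Slopes of the two lines (1) and (2) are $\gamma \frac{\sigma_y}{\sigma_x}$ and $\frac{\sigma_y}{\gamma \sigma_x}$.
\[ = \frac{\gamma^2 - 1}{\gamma} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \]
Taking the acute angle,
\[ \tan\theta = \frac{1 - \gamma^2}{\gamma} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \]
\[ \therefore \theta = \tan^{-1} \left[ \frac{1 - \gamma^2}{\gamma} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \right] \]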
Problem: 10
The following data relate to the marks of 10 students in the internal test and the
university examination for the maximum of 50 in each.
Internal marks     25  28  30  32  35  36  38  39  42  45
University marks   20  26  29  30  25  18  26  35  35  46
ii) The most likely internal mark for the university mark of 25.
iii) the most likely university mark for the internal mark of 30.
Solution:
(i) Let the marks of internal test and university examination be denoted by x and
y respectively.
We have $\bar{x} = \frac{1}{10} \sum x_i = 35$ and $\bar{y} = \frac{1}{10} \sum y_i = 29$.
For the calculation of regression we have the following table.
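\[ \sigma_y^2 = \frac{\sum (y_i - \bar{y})^2}{n} = \frac{1}{10} \sum (y_i - 29)^2 = 59.8 \]
\[ \therefore \gamma = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n \sigma_x \sigma_y} = \frac{324}{10 \times 5.98 \times 7.73} = \frac{324}{462.254} = 0.7 \text{ (approximately)} \]
Now the regression of y on x is $y - \bar{y} = \gamma \frac{\sigma_y}{\sigma_x} (x - \bar{x})$
\[ \therefore \gamma \frac{\sigma_y}{\sigma_x} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n \sigma_x^2} = \frac{324}{358} = 0.905 \]
Similarly,
\[ \gamma \frac{\sigma_x}{\sigma_y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (y_i - \bar{y})^2} = \frac{324}{598} = 0.542 \]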
ii) the most likely internal mark for the university mark of 25 is got from the regression
equation 𝑜𝑓 𝑥 𝑜𝑛 𝑦 by putting 𝑦 = 25
iii) The most likely university mark for the internal mark of 30 is got from the regression
equation 𝑜𝑓 𝑦 𝑜𝑛 𝑥 by putting 𝑥 = 30
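As a check on Problem 10, the two regression coefficients and the two predictions in parts (ii) and (iii) can be computed directly from the marks. This is a sketch under the notation of the solution; the function names `predict_x` and `predict_y` are hypothetical helpers.

```python
# Marks from Problem 10: x = internal test, y = university examination
x = [25, 28, 30, 32, 35, 36, 38, 39, 42, 45]
y = [20, 26, 29, 30, 25, 18, 26, 35, 35, 46]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n          # 35 and 29

# Sums of products and squares of deviations from the means
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
sxx = sum((a - x_bar) ** 2 for a in x)
syy = sum((b - y_bar) ** 2 for b in y)

b_yx = sxy / sxx   # slope of the regression line of y on x
b_xy = sxy / syy   # slope of the regression line of x on y

def predict_y(x0):
    """Most likely y for a given x, from the line of y on x."""
    return y_bar + b_yx * (x0 - x_bar)

def predict_x(y0):
    """Most likely x for a given y, from the line of x on y."""
    return x_bar + b_xy * (y0 - y_bar)

print(round(predict_x(25), 2), round(predict_y(30), 2))
```

Note that each prediction uses its own regression line: x on y for part (ii), y on x for part (iii).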
Problem: 11
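The two variables x and y have the regression lines 3x + 2y - 26 = 0 and 6x + y - 31 = 0. Find
Solution:
We have
3x + 2y = 26 ……(1)
6x + y = 31 ……(2)
Hence we get the regression coefficients as $b_{yx} = -\frac{3}{2}$ and $b_{xy} = -\frac{1}{6}$.
Now,
\[ \gamma^2 = \left( -\frac{3}{2} \right) \times \left( -\frac{1}{6} \right) = \frac{1}{4} \]
\[ \gamma = \pm \frac{1}{2} \]
Since both the regression coefficients are negative we take $\gamma = -\frac{1}{2}$
iii) Given $\sigma_x = 5$
We have $b_{yx} = \gamma \frac{\sigma_y}{\sigma_x}$
\[ \therefore -\frac{3}{2} = -\frac{1}{2} \cdot \frac{\sigma_y}{5} \]
\[ \sigma_y = 15. \]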
Problem: 12
Solution:
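We know that if $\theta$ is the acute angle between the two regression lines
we have,
\[ \tan\theta = \frac{1 - \gamma^2}{\gamma} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \quad \ldots (1) \]
Since $\sigma_x^2 + \sigma_y^2 \ge 2\sigma_x\sigma_y$,
\[ (1) \Rightarrow \tan\theta \le \frac{1 - \gamma^2}{\gamma} \cdot \frac{1}{2} \]
\[ \therefore \tan\theta \le \frac{1 - \gamma^2}{2\gamma} \]
\[ \text{Hence } \sin\theta \le \frac{1 - \gamma^2}{1 + \gamma^2} \]
\[ \sin\theta \le 1 - \gamma^2. \]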
Exercise:
1. Calculate the coefficient of correlation and obtain the lines of regression
for the following data.
X 1 2 3 4 5 6 7 8 9
Y 9 8 10 12 11 13 14 16 15
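The correlation coefficient between x and y is given by
\[ \gamma_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} \]
\[ \therefore \gamma_{xy} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} f_{ij} x_i y_j - \frac{1}{N} \sum_{i=1}^{n} g_i x_i \sum_{j=1}^{m} f_j y_j}{\sqrt{\left[ \sum_{i=1}^{n} g_i x_i^2 - \frac{1}{N} \left( \sum_{i=1}^{n} g_i x_i \right)^2 \right] \times \left[ \sum_{j=1}^{m} f_j y_j^2 - \frac{1}{N} \left( \sum_{j=1}^{m} f_j y_j \right)^2 \right]}} \]
Note: Since the correlation coefficient is independent of origin and scale, if x and y are
transformed to u and v by the formulae $u = \frac{x - A}{h}$ and $v = \frac{y - B}{k}$,
then we have $\gamma_{xy} = \gamma_{uv}$.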
Find the correlation coefficient between x and y from the following table:
y \ x    5    10   15   20
4        2    4    5    4
6        5    3    6    2
8        3    8    2    3
Solution:
Y \ X     x1 = 5    x2 = 10   x3 = 15   x4 = 20   Total
y1 = 4    2         4         5         4         f1 = 15
y2 = 6    5         3         6         2         f2 = 16
y3 = 8    3         8         2         3         f3 = 16
Total     g1 = 10   g2 = 15   g3 = 13   g4 = 9    N = 47
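\[ \sum f_j y_j = 60 + 96 + 128 = 284 \]
\begin{align*}
\gamma_{xy} &= \frac{3410 - \frac{1}{47}(575 \times 284)}{\sqrt{\left[ 8275 - \frac{1}{47}(575)^2 \right] \times \left[ 1840 - \frac{1}{47}(284)^2 \right]}} \\
&= \frac{3410 \times 47 - (575 \times 284)}{\sqrt{\left( 8275 \times 47 - 575^2 \right) \times \left( 1840 \times 47 - 284^2 \right)}} \\
&= \frac{160270 - 163300}{\sqrt{(388925 - 330625) \times (86480 - 80656)}} \\
&= \frac{-3030}{\sqrt{58300 \times 5824}} = \frac{-3030}{241.5 \times 76.3} \\
&= \frac{-3030}{18426.5} \\
&= -0.16
\end{align*}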
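The marginal totals and the coefficient for this bivariate frequency table can be reproduced mechanically. The sketch below is illustrative only; the layout `f[j][i]` (row j for $y_j$, column i for $x_i$) and the variable names are assumptions made for the example.

```python
from math import sqrt

xs = [5, 10, 15, 20]
ys = [4, 6, 8]
# f[j][i] = frequency of the pair (x_i, y_j), read row by row from the table
f = [
    [2, 4, 5, 4],
    [5, 3, 6, 2],
    [3, 8, 2, 3],
]

N = sum(sum(row) for row in f)
g = [sum(row[i] for row in f) for i in range(len(xs))]   # column totals g_i
fj = [sum(row) for row in f]                             # row totals f_j

sum_x = sum(gi * xi for gi, xi in zip(g, xs))
sum_y = sum(fv * yv for fv, yv in zip(fj, ys))
sum_xx = sum(gi * xi * xi for gi, xi in zip(g, xs))
sum_yy = sum(fv * yv * yv for fv, yv in zip(fj, ys))
sum_xy = sum(f[j][i] * xs[i] * ys[j]
             for j in range(len(ys)) for i in range(len(xs)))

# Same formula as in the text, with numerator and denominator multiplied by N
r = (N * sum_xy - sum_x * sum_y) / sqrt(
    (N * sum_xx - sum_x ** 2) * (N * sum_yy - sum_y ** 2)
)
print(N, sum_x, sum_y, sum_xx, sum_yy, sum_xy, round(r, 2))
```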
Problem: 14
Find the correlation coefficient between the heights and weight of 100 students which
are distributed as follows.
Let 𝑥𝑖 denote the mid value of the classes of weights and 𝑦𝑗 denote the mid value
of the classes of heights.
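Summary rows of the correlation table (totals over the height classes: $\sum f_j = 100$, $\sum f_j v_j = 57$, $\sum f_j v_j^2 = 133$, $\sum\sum f_{ij} u_i v_j = 31$):

x_i                      35    45    55    65    75    Total
g_i                      4     15    37    28    16    100
u_i                      -2    -1    0     1     2     -
g_i u_i                  -8    -15   0     28    32    37
g_i u_i^2                16    15    0     28    64    123
sum_j f_ij u_i v_j       (0)   (-8)  (0)   (17)  (22)  (31)

\[ \gamma = \frac{3100 - 37 \times 57}{\sqrt{12300 - 37^2} \times \sqrt{13300 - 57^2}} = \frac{991}{104.5 \times 100.25} = 0.09. \]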
PROBABILITY
Introduction:
In this chapter we develop the mathematical theory of probability and introduce the
concept of random variables which form the basis for various types of theoretical distributions.
Definition:
Each experiment ends with an outcome. For example, a research student in “statistics”
when undertaking a pre election sample survey.
An experiment is called a random experiment if, when repeated under the same
conditions, it is such that the outcome cannot be predicted with certainty but all possible
outcomes can be determined prior to the performance of the experiment.
Each performance of the random experiment is called a trial. The collection of all possible
outcomes of a random experiment is called the sample space S. The elements of the sample space
are called sample points.
Example:
When two coins are tossed at a time the outcome is an ordered pair (H,H) or (H,T) or
(T,H) or (T,T). Hence for the random experiment of tossing two coins, sample space S={(H,H),
(H,T), (T,H), (T,T)}.
Definition:
Definition:
Example:
When two coins are tossed at a time the outcome is an ordered pair (H,H) or
(H,T) or (T,H) or (T,T). Hence for the random experiment of tossing two coins,
sample space S={(H,H), (H,T), (T,H), (T,T)}.
Definition:
The event S is called a sure event and the event 𝜑 is called an impossible event.
Definition:
Proof:
Let 𝐴 ⊆ 𝐵.
of S.
Hence $P(B) \ge P(A)$
Corollary:
Proof:
∅ ⊂A⊆S
Theorem:
Now, $B = (A \cap B) \cup (\bar{A} \cap B)$, and $A \cap B$ and $\bar{A} \cap B$ are disjoint sets.
Example:
A∩B = {(H,H)}
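$P(A) = \frac{1}{2}$, $P(B) = \frac{1}{2}$, $P(A \cup B) = \frac{3}{4}$
and $P(A \cap B) = \frac{1}{4}$
\[ P(A) + P(B) - P(A \cap B) = \frac{2 + 2 - 1}{4} = \frac{3}{4} = P(A \cup B) \]
Hence it is verified.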
Let S= {(i,j) /i,j ∈N, 1≤i≤ 6, 1 ≤j≤6} be the sample space of the random experiment of
throwing two dice. We assign the uniform probability of 1/36 to each of the 36 sample
points in the sample space S.
A∩B = ∅, P(A∩ 𝐵) = 0
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{6}{36} + \frac{3}{36} = \frac{9}{36} = \frac{1}{4} \]
Definition:
Let S= {(i,j) / i, j ∈ N, 1≤ 𝑖 ≤ 6, 1 ≤ 𝑗 ≤ 6}
Clearly A∩B = ∅ and A∪B = S. Hence A and B are mutually exclusive and exhaustive
events.
Theorem:
Proof:
$(A \cap B) \cup (\bar{A} \cap B) = B$
Remark:
Theorem:
i) When $B \subset A$, $B$ and $(A \cap \bar{B})$ are mutually exclusive events and their union
is A, so that $P(A) = P(B) + P(A \cap \bar{B})$.
$P(A \cap \bar{B}) \ge 0 \Rightarrow P(A) - P(B) \ge 0$.
$P(B) \le P(A)$.
Corollary:
Statement:
If A and B are any two events (subsets of a sample space S) and are not disjoint
then, P(A∪B) = P(A) + P(B) – P(A∩B)
Proof:
We have, $A \cup B = A \cup (\bar{A} \cap B)$
Proof:
We have,
for all 1 ≤ 𝑖 ≤ 𝑗 ≤ 𝑟.
(Distributive Law)
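\begin{align*}
&= \sum_{i=1}^{r} P(A_i) - \sum\sum_{1 \le i < j \le r} P(A_i \cap A_j) + \cdots + (-1)^{r-1} P(A_1 \cap A_2 \cap \cdots \cap A_r) + P(A_{r+1}) - P\left[ \bigcup_{i=1}^{r} (A_i \cap A_{r+1}) \right] \\
&= \sum_{i=1}^{r+1} P(A_i) - \sum\sum_{1 \le i < j \le r} P(A_i \cap A_j) + \cdots + (-1)^{r-1} P(A_1 \cap A_2 \cap \cdots \cap A_r) \\
&\quad - \left[ \sum_{i=1}^{r} P(A_i \cap A_{r+1}) - \sum\sum_{1 \le i < j \le r} P(A_i \cap A_j \cap A_{r+1}) + \cdots + (-1)^{r-1} P(A_1 \cap A_2 \cap \cdots \cap A_r \cap A_{r+1}) \right] \\
&= \sum_{i=1}^{r+1} P(A_i) - \left[ \sum\sum_{1 \le i < j \le r} P(A_i \cap A_j) + \sum_{i=1}^{r} P(A_i \cap A_{r+1}) \right] + \cdots + (-1)^{r} P(A_1 \cap A_2 \cap \cdots \cap A_r \cap A_{r+1}) \\
&= \sum_{i=1}^{r+1} P(A_i) - \sum\sum_{1 \le i < j \le r+1} P(A_i \cap A_j) + \cdots + (-1)^{r} P(A_1 \cap A_2 \cap \cdots \cap A_r \cap A_{r+1})
\end{align*}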
Hence by the principle of mathematical induction, it is true for all positive integral values of n.
\[ \ge \sum_{i=1}^{r} P(A_i) - (r - 1) + P(A_{r+1}) - 1 \]
\[ \Rightarrow P\left( \bigcap_{i=1}^{r+1} A_i \right) \ge \sum_{i=1}^{r+1} P(A_i) - r \]
We get
= 1-P(𝐴1 ∪ 𝐴2 ∪ … ∪ 𝐴𝑛 )
= P(𝐴1 ∪ 𝐴2 ∪ … ∪ 𝐴𝑛 )
Theorem:
Proof:
We have,
Now,
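\begin{align*}
P\left( \bigcup_{i=1}^{r+1} A_i \right) &= P\left[ \left( \bigcup_{i=1}^{r} A_i \right) \cup A_{r+1} \right] \\
&= P\left( \bigcup_{i=1}^{r} A_i \right) + P(A_{r+1}) - P\left[ \left( \bigcup_{i=1}^{r} A_i \right) \cap A_{r+1} \right] \\
&= P\left( \bigcup_{i=1}^{r} A_i \right) + P(A_{r+1}) - P\left[ \bigcup_{i=1}^{r} (A_i \cap A_{r+1}) \right] \\
&\ge \sum_{i=1}^{r} P(A_i) - \sum\sum_{1 \le i < j \le r} P(A_i \cap A_j) + P(A_{r+1}) - P\left[ \bigcup_{i=1}^{r} (A_i \cap A_{r+1}) \right]
\end{align*}
Hence, if the theorem is true for n = r, it is also true for n = r + 1.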
Theorem:
= P(B).P(A│B), P(B)>0
Proof:
Suppose the sample space contains N occurrences, of which $n_A$ occurrences belong to the
event A and $n_B$ occurrences belong to the event B.
Let $n_{AB}$ be the number of occurrences favorable to the compound event A∩B. Then the
unconditional probabilities are given by
\[ P(A) = \frac{n_A}{N}, \quad P(B) = \frac{n_B}{N} \quad \text{and} \quad P(A \cap B) = \frac{n_{AB}}{N} \]
Now, the conditional probability $P(A \mid B)$ refers to the sample space of $n_B$ occurrences,
out of which $n_{AB}$ occurrences pertain to the occurrence of A, when B has already
happened.
\[ P(A \mid B) = \frac{n_{AB}}{n_B} \]
\[ \text{Similarly } P(B \mid A) = \frac{n_{AB}}{n_A} \]
\[ \text{Now, } P(A \cap B) = \frac{n_{AB}}{N} = \frac{n_{AB}}{n_A} \cdot \frac{n_A}{N} = P(B \mid A) \, P(A) \]
\[ \text{and } P(A \cap B) = \frac{n_{AB}}{N} = \frac{n_{AB}}{n_B} \cdot \frac{n_B}{N} = P(A \mid B) \, P(B) \]
Thus the conditional probabilities $P(B \mid A)$ and $P(A \mid B)$ are defined iff $P(A) \ne 0$
and $P(B) \ne 0$ respectively.
Theorem:
Where,
Proof:
\begin{align*}
P(A_1 \cap A_2 \cap A_3) &= P[A_1 \cap (A_2 \cap A_3)] \\
&= P(A_1) \, P(A_2 \cap A_3 \mid A_1) \\
&= P(A_1) \, P(A_2 \mid A_1) \, P(A_3 \mid A_1 \cap A_2)
\end{align*}
So that,
Now,
Theorem:
For any three events A, B, C: $P(A \cup B \mid C) = P(A \mid C) + P(B \mid C) - P(A \cap B \mid C)$
Proof:
We have
\[ \Rightarrow \frac{P[(A \cup B) \cap C]}{P(C)} = P(A \mid C) + P(B \mid C) - P(A \cap B \mid C) \]
Theorem:
Proof:
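\begin{align*}
P(A \cap \bar{B} \mid C) + P(A \cap B \mid C) &= \frac{P(A \cap \bar{B} \cap C)}{P(C)} + \frac{P(A \cap B \cap C)}{P(C)} \\
&= \frac{P(A \cap \bar{B} \cap C) + P(A \cap B \cap C)}{P(C)} \\
&= \frac{P(A \cap C)}{P(C)} = P(A \mid C)
\end{align*}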
Theorem:
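\begin{align*}
&= \frac{P\left[ \bigcup_n (A_n \cap B) \right]}{P(B)} \\
&= \frac{\sum_n P(A_n \cap B)}{P(B)} = \sum_n \frac{P(A_n \cap B)}{P(B)} \\
&= \sum_n P(A_n \mid B)
\end{align*}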
Theorem:
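For any three events A, B and C defined on the sample space S such that $B \subset C$ and
$P(A) > 0$, $P(B \mid A) \le P(C \mid A)$
Proof:
\begin{align*}
P(C \mid A) &= \frac{P(C \cap A)}{P(A)} \\
&= \frac{P[(B \cap C \cap A) \cup (\bar{B} \cap C \cap A)]}{P(A)}
\end{align*}
\[ \Rightarrow P(C \mid A) \ge P(B \mid A) \]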
Theorem:
If A and B are independent events then A and $\bar{B}$ are also independent events.
We have
\begin{align*}
P(A \cap \bar{B}) &= P(A) - P(A \cap B) \\
&= P(A) - P(A) P(B) \\
&= P(A) [1 - P(B)] \\
&= P(A) P(\bar{B})
\end{align*}
Theorem:
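If A and B are independent events then $\bar{A}$ and $\bar{B}$ are also independent events.
Proof:
\begin{align*}
P(\bar{A} \cap \bar{B}) &= 1 - P(A \cup B) \\
&= 1 - [P(A) + P(B) - P(A) P(B)] \\
&= [1 - P(B)] - P(A) [1 - P(B)] \\
&= [1 - P(A)] [1 - P(B)] \\
&= P(\bar{A}) P(\bar{B})
\end{align*}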
Theorem:
Proof:
= P(A∩ 𝐶) + 𝑃 𝐵 ∩ 𝐶 − 𝑃(𝐴 ∩ 𝐵 ∩ 𝐶)
= P(C) P(A∪ 𝐵) = 𝑅. 𝐻. 𝑆
Theorem:
Prove that if A, B and C are random events in a sample space, and if A, B, C are pairwise
independent and A is independent of (B∪C), then A, B and C are mutually independent.
Proof:
We are given,
P(A∩ 𝐵) = 𝑃 𝐴 𝑃(𝐵)
𝑃 𝐵∩𝐶 = 𝑃 𝐵 𝑃 𝐶 (1)
𝑃 𝐴∩𝐶 = 𝑃 𝐴 𝑃 𝐶
Now,
= P(A∩ 𝐵) + 𝑃 𝐴 ∩ 𝐶 − 𝑃[ 𝐴 ∩ 𝐵 ∩ 𝐴 ∩ 𝐶 ]
And
P(A)P(B∪ 𝐶) = 𝑃 𝐴 [𝑃 𝐵 + 𝑃 𝐶 − 𝑃 𝐵 ∩ 𝐶 ]
P(A∩ 𝐵 ∩ 𝐶) = 𝑃 𝐴 𝑃(𝐵 ∩ 𝐶)
Theorem:
Proof:
We have
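$A = (A \cap \bar{B}) \cup (A \cap B)$
$P(A) = P(A \cap \bar{B}) + P(A \cap B)$
But $P(A \cap \bar{B}) \ge 0$
∴ $P(A) \ge P(A \cap B)$
Similarly $P(B) \ge P(A \cap B)$
⇒ $P(B) - P(A \cap B) \ge 0$
Now $P(A \cup B) = P(A) + [P(B) - P(A \cap B)]$
$P(A \cup B) \ge P(A)$
⇒ $P(A) \le P(A \cup B)$
\[ P(A \cap B) \le P(A) \le P(A \cup B) \le P(A) + P(B) \]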
Example:
Two dice, one green and the other red, are thrown. Let A be the event that the sum of the
points on the faces shown is odd and B the event of at least one ace (number '1').
a. Describe the
ii) events A, B, $\bar{B}$, A∩B, A∪B, and A∩$\bar{B}$ and find their probabilities assuming
all the 36 sample points have equal probabilities.
Solution:
for example, the ordered pair (4,5) refers to the elementary event that the green
die shows 4 and the red die shows 5.
A= the event that the sum of the numbers shown by the two dice is odd.
= {(1,2); (2,1); (1,4); (2,3); (3,2); (4,1); (1,6); (2,5); (3,4); (4,3); (5,2); (6,1); (3,6);
(4,5); (5,4); (6,3); (5,6); (6,5)}
Therefore,
\[ P(A) = \frac{n(A)}{n(S)} = \frac{18}{36} \]
therefore
\[ P(B) = \frac{n(B)}{n(S)} = \frac{11}{36} \]
$\bar{B}$ = {(2,2); (2,3); (2,4); (2,5); (2,6); (3,2); (3,3); (3,4); (3,5); (3,6); (4,2); (4,3);
(4,4); (4,5); (4,6); (5,2); (5,3); (5,4); (5,5); (5,6); (6,2); (6,3); (6,4); (6,5); (6,6)}
therefore
\[ P(\bar{B}) = \frac{n(\bar{B})}{n(S)} = \frac{25}{36} \]
A∩B = the event that sum is odd and atleast one face is an ace.
A∪B = {(1,2); (2,1); (1,4); (2,3); (3,2); (4,1); (1,6); (2,5); (3,4); (4,3); (5,2); (6,1); (3,6);
(4,5); (5,4); (6,3); (5,6); (6,5); (1,1); (1,3); (1,5); (3,1); (5,1)}
\[ \therefore P(A \cup B) = \frac{n(A \cup B)}{n(S)} = \frac{23}{36} \]
A∩$\bar{B}$ = {(2,3); (2,5); (3,2); (3,4); (3,6); (4,3); (4,5); (5,2); (5,4); (5,6); (6,3); (6,5)}
\[ P(A \cap \bar{B}) = \frac{n(A \cap \bar{B})}{n(S)} = \frac{12}{36} = \frac{1}{3} \]
b. i. P (𝐴 ∪ 𝐵 ) = 𝑃 𝐴 ∩ 𝐵
= 1-P (𝐴 ∩ 𝐵)
= 1-P(A∪ 𝐵)
13
= 36
iii. 𝑃 𝐴 ∩ 𝐵 = 𝑃 𝐴 − 𝑃(𝐴 ∩ 𝐵)
18 6
= 36 - 36
12
= 36
1
=3
5
= 36
V) P(A∩ 𝐵 ) = 1 − 𝑃(𝐴 ∩ 𝐵)
1
= 1- 6
5
=6
2
= 3
vii) P( A ∪ B) = 1 − 𝑃(𝐴 ∪ 𝐵)
23 13
= 1-36 = 36
= P (𝐴 ∩ 𝐵) [ A∩ 𝐴 = ∅]
5
= 36
= P(A) + P(𝐴 ∩ 𝐵)
18 5
= +
36 36
23
= 36
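\[ \text{x. } P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{6/36}{11/36} = \frac{6}{11} \]
\[ P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{6/36}{18/36} = \frac{6}{18} = \frac{1}{3} \]
\[ \text{xi. } P(\bar{A} \mid \bar{B}) = \frac{P(\bar{A} \cap \bar{B})}{P(\bar{B})} = \frac{13/36}{25/36} = \frac{13}{25} \]
\[ P(\bar{B} \mid \bar{A}) = \frac{P(\bar{A} \cap \bar{B})}{P(\bar{A})} \]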
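Because the sample space has only 36 equally likely points, every probability in this example can be confirmed by listing the events. The sketch below is one way to do that; the helper names `P` and `P_given` are hypothetical, and exact fractions are used so the results match the hand computation.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (green die, red die)
S = set(product(range(1, 7), repeat=2))

A = {s for s in S if sum(s) % 2 == 1}   # sum of the points is odd
B = {s for s in S if 1 in s}            # at least one ace

def P(event):
    """Probability of an event under the uniform assignment 1/36."""
    return Fraction(len(event), len(S))

def P_given(event, condition):
    """Conditional probability P(event | condition)."""
    return Fraction(len(event & condition), len(condition))

not_A, not_B = S - A, S - B
print(P(A), P(B), P(A & B), P(A | B), P(A & not_B))
print(P_given(A, B), P_given(B, A), P_given(not_A, not_B))
```

Here `A | B` and `A & B` are Python set union and intersection, not conditional probability.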
Example:
If two dice are thrown, what is the probability that the sum is a) greater than 8 and b) neither 7
nor 11?
Solution:
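a) If S denotes the sum on the two dice then we want P(S > 8). The favourable cases are
(i) S = 9, (ii) S = 10, (iii) S = 11, (iv) S = 12.
The sample space has 36 points.
\[ P(S > 8) = \frac{4}{36} + \frac{3}{36} + \frac{2}{36} + \frac{1}{36} = \frac{10}{36} = \frac{5}{18} \]
b) Let A denote the event of getting the sum of 7 and B the event of getting the sum of 11
with a pair of dice.
\begin{align*}
\text{Required probability} &= 1 - P(A \cup B) \\
&= 1 - [P(A) + P(B)] \quad (\because A \text{ and } B \text{ are disjoint events}) \\
&= 1 - \frac{1}{6} - \frac{1}{18} \\
&= \frac{7}{9}
\end{align*}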
Example:
i. 2 or 4 (ii) 3 (iii) 1 or 9
Solution:
Since the probability of choosing any urn is ½ the required probability P is given by
P= P(I) + P(II)
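\[ = \frac{1}{2} \times \frac{2}{4} + \frac{1}{2} \times \frac{2}{6} = \frac{5}{12} \]
\[ \text{ii) Required probability} = \frac{1}{2} \times \frac{1}{4} + \frac{1}{2} \times 0 = \frac{1}{8} \]
\[ \text{iii) Required probability} = \frac{1}{2} \times \frac{1}{4} + \frac{1}{2} \times \frac{1}{6} = \frac{5}{24} \]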
Example:
A card is drawn from a well-shuffled pack of playing cards. What is the
probability that it is either a spade or an ace?
Solution:
Let A and B denote the events of drawing a spade card and an ace respectively. Then A
consists of 13 sample points and B consists of 4 sample points.
\[ \text{i.e. } P(A) = \frac{13}{52} \text{ and } P(B) = \frac{4}{52} \]
The probability that the card drawn is either a spade or an ace is given by
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{13}{52} + \frac{4}{52} - \frac{1}{52} = \frac{4}{13} \]
Example:
A box contains 6 red, 4 white and 5 black balls. A person draws 4 balls from the box at
random. Find the probability that among the balls drawn there is at least one ball of each color.
Solution:
The required event E that in a draw of 4 balls from the box at random there is at least
one ball of each color can materialize the following mutually disjoint ways.
Hence by the addition theorem of probability the required probability is given by,
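\begin{align*}
&= \frac{1}{{}^{15}C_4} [6 \times 4 \times 10 + 15 \times 4 \times 5 + 6 \times 6 \times 5] \\
&= \frac{4!}{15 \times 14 \times 13 \times 12} [240 + 300 + 180] \\
&= \frac{24 \times 720}{15 \times 14 \times 13 \times 12} = 0.5275
\end{align*}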
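The three disjoint cases counted above (two balls of one color and one of each of the others) can be enumerated with binomial coefficients. The following is a small sketch of that count, not part of the original solution; it uses only `math.comb` from the standard library.

```python
from math import comb
from fractions import Fraction

red, white, black = 6, 4, 5

# With 4 balls and at least one of each color, exactly one color appears twice.
favourable = sum(
    comb(red, r) * comb(white, w) * comb(black, b)
    for r in range(1, 3)
    for w in range(1, 3)
    for b in range(1, 3)
    if r + w + b == 4
)
total = comb(red + white + black, 4)   # 15C4 ways to draw 4 balls

p = Fraction(favourable, total)
print(favourable, total, round(float(p), 4))
```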
Example:
Solution:
Let A ,B ,C denote the events that the problem is solved by the students A,B,C
respectively.
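\begin{align*}
P(A \cup B \cup C) &= P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C) \\
&= P(A) + P(B) + P(C) - P(A) P(B) - P(A) P(C) - P(B) P(C) + P(A) P(B) P(C) \\
&= \frac{1}{2} + \frac{3}{4} + \frac{1}{4} - \frac{1}{2} \cdot \frac{3}{4} - \frac{1}{2} \cdot \frac{1}{4} - \frac{3}{4} \cdot \frac{1}{4} + \frac{1}{2} \cdot \frac{3}{4} \cdot \frac{1}{4} \\
&= \frac{29}{32}
\end{align*}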
Example:
A bag contains 6 white 9 black balls. Four balls are drawn at a time. Find the
probability for the first draw to give 4 white and the second to give 4 black balls in each
of the following cases.
Solution:
1. The experiment of drawing 4 balls from a bag containing 6 white and 9 black
balls result in 15C4 ways and hence the sample space consist 15C4 sample points.
Let A be the event that the first drawing gives 4 white balls and B be the event that the
second drawing gives 4 black balls.
The event A consists of 6C4 sample points as there are 6 white balls and 4 are to be
chosen from them
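\[ P(A) = \frac{{}^{6}C_4}{{}^{15}C_4} \]
Now, if the drawn balls are not replaced, our sample space is reduced to ${}^{11}C_4$ points
only. For the event B that the second draw results in 4 black balls,
\[ P(B \mid A) = \frac{{}^{9}C_4}{{}^{11}C_4} \]
\[ P(A \cap B) = P(A) \times P(B \mid A) = \frac{3}{715} \]
Whether A has occurred or not, the probability of drawing 4 black balls in the second
draw is ${}^{9}C_4 / {}^{15}C_4$
\begin{align*}
P(A \cap B) &= P(A) \times P(B \mid A) \\
&= P(A) \cdot P(B), \text{ as B is independent of A} \\
&= \frac{{}^{6}C_4}{{}^{15}C_4} \times \frac{{}^{9}C_4}{{}^{15}C_4} = \frac{6}{5915}
\end{align*}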
1. A bag contains 6 balls of different colors and a ball is drawn from its. A speaks truth
thrice out of 4 times and B speaks truth 7 times out of times. If both A and B say that a
red ball was drawn, find the probability of their joint statement being true (Ans : 7/15)
2. A and B are two very weak students of statistics and their chances of solving a problem
correctly are 1/8 and 1/12 respectively. If the probability of their making a common
mistake is 1/1001 and they obtain the same answer, find the chance that their answer is
correct (Ans : 13/14)
3. A bag contains 10 balls, two of which are red three blue and 5 black. Three balls are
drawn at random from the bag, that is every ball has an equal chances of being included
is the three what is the probability that
Mathematical expectation.
Definition:
Let x be a discrete random variable which can assume any of the values $x_1, x_2, \ldots, x_n$ with
corresponding probabilities $p_i = P(x = x_i)$, $i = 1, 2, \ldots$ Then the mathematical expectation of
x, denoted by E(x), is defined by $E(x) = \sum_i p_i x_i$.
Example:
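1. Let X be the discrete random variable taking the values 1, 2, …, 6 with corresponding
probabilities $p_i = 1/6$
\[ E(X) = \frac{1}{6}(1) + \frac{1}{6}(2) + \cdots + \frac{1}{6}(6) = \frac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = \frac{7}{2} \]
\[ p(x) = \begin{cases} \dfrac{x}{6} & \text{if } x = 1, 2, 3 \\ 0 & \text{otherwise} \end{cases} \]
\[ E(X^3 + 2X^2) = \frac{98}{6} + 2 \times \frac{36}{6} = \frac{170}{6} = \frac{85}{3} \]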
Definition:
Example:
Let x have a p.d.f.
\[ f(x) = \begin{cases} \dfrac{x + 2}{18} & \text{if } -2 < x < 4 \\ 0 & \text{otherwise} \end{cases} \]
Solution:
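\[ E(x) = \frac{1}{18} \left[ \frac{64}{3} + 16 - \left( \frac{-8}{3} + 4 \right) \right] = \frac{1}{18} \left[ \frac{108}{3} \right] = 2 \]
\begin{align*}
E[(x + 2)^2] &= E(x^2) + 4E(x) + 4 \\
&= \int_{-2}^{4} x^2 \left( \frac{x + 2}{18} \right) dx + 4E(x) + 4 \\
&= \frac{1}{18} \left[ \frac{x^4}{4} + \frac{2x^3}{3} \right]_{-2}^{4} + (4 \times 2 + 4) \\
&= \frac{1}{18} \left[ 64 + \frac{128}{3} - \left( 4 - \frac{16}{3} \right) \right] + 12 \\
&= \frac{1}{18} \left[ \frac{320}{3} + \frac{4}{3} \right] + 12 \\
&= 6 + 12 = 18
\end{align*}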
Definition:
Hence $\bar{X} = \mu = E(X)$
$E(X^r)$, $r \ge 1$, is called the $r$th moment of X about the origin and is denoted by
$\mu_r'$. Hence $\mu_r' = E(X^r)$ and $\bar{x} = \mu_1'$
$E(X - \mu)^2$ is called the variance of X and is denoted by $\sigma^2$. The positive square root
$\sigma$ of the variance is called the standard deviation of X.
Lemma:
Proof:
\begin{align*}
\sigma^2 &= E[(x - \mu)^2] \\
&= E(x^2 - 2\mu x + \mu^2) \\
&= E(x^2) - 2\mu E(x) + \mu^2 \\
&= E(x^2) - 2[E(x)]^2 + [E(x)]^2 \\
&= E(x^2) - [E(x)]^2 \\
&= \mu_2' - (\mu_1')^2
\end{align*}
Lemma:
Proof:
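\[ \mu_r = E[(X - \mu)^r] = E\left[ X^r - {}^{r}C_1 \mu X^{r-1} + \cdots \right] \]
In particular, $\mu_1 = E(X - \mu) = E(X) - \mu = \mu - \mu = 0$.
\[ \mu_2 = \mu_2' - 2\mu_1' \mu + \mu^2 \mu_0' \]
\[ \therefore \mu_2 = \mu_2' - (\mu_1')^2 \quad (\text{since } \mu = \mu_1' \text{ and } \mu_0' = 1) \]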
Problem:
A random variable X is defined as the sum of the numbers on the faces when
two dice are thrown. Find the expected value of X.
Solution:
x_i      2     3     4     5     6     7     8     9     10    11    12
P(x_i)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
\[ E(X) = \sum x_i \, p(x_i) \]
= 2(1/36) + 3(2/36) + 4(3/36) + 5(4/36) + 6(5/36) + 7(6/36) + 8(5/36) + 9(4/36) + 10(3/36) + 11(2/36) + 12(1/36) = 252/36 = 7.
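The distribution of the sum and its expected value can be generated by enumerating the 36 outcomes. This is a minimal sketch; the variance line is an extra check that is not part of the problem as stated.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 ordered outcomes give each sum
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

mean = sum(s * p for s, p in pmf.items())                 # E(X)
second_moment = sum(s * s * p for s, p in pmf.items())    # E(X^2)
variance = second_moment - mean ** 2                      # E(X^2) - [E(X)]^2

print(mean, variance)
```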
Problem:
𝑥𝑖 -2 -1 0 1 2 3
P(𝑥𝑖 ) 0.1 K 0.2 2k 0.3 k
Find (i) the value of k (ii) mean (iii)
variance (iv) 𝑝(𝑥 ≥ 2) (v) 𝑝(𝑥 < 2)
Solution:
(i) 𝑝𝑖 = 1
⇒ 0.1+k+0.2+2k+0.3+k=1
⇒4k=0.4
⇒k=0.1.
𝑥𝑖 -2 -1 0 1 2 3
P(𝑥𝑖 ) 0.1 0.1 0.2 0.2 0.3 0.1
(ii) Mean = (-2)(0.1) + (-1)(0.1) + (0)(0.2) + (1)(0.2) + 2(0.3) + 3(0.1)
= 0.8.
(iii) Variance = $E(X^2) - [E(X)]^2$ = 2.8 - 0.64 = 2.16.
=0.2+0.2+0.3=0.7.
Problem:
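Let X have the p.d.f.
\[ f(x) = \begin{cases} \dfrac{x + 1}{2} & \text{if } -1 < x < 1 \\ 0 & \text{otherwise.} \end{cases} \]
Solution:
\begin{align*}
\mu = E(x) &= \int_{-\infty}^{\infty} x f(x) \, dx = \int_{-1}^{1} x \left( \frac{x + 1}{2} \right) dx \\
&= \frac{1}{2} \left[ \frac{x^3}{3} + \frac{x^2}{2} \right]_{-1}^{1}
\end{align*}
\[ \mu = \frac{1}{3} \]
\begin{align*}
\sigma^2 &= E(X^2) - [E(X)]^2 \\
&= \int_{-1}^{1} \frac{x^2}{2} (x + 1) \, dx - \left( \frac{1}{3} \right)^2 \\
&= \frac{1}{2} \left[ \frac{x^4}{4} + \frac{x^3}{3} \right]_{-1}^{1} - \frac{1}{9}
\end{align*}
\[ \sigma^2 = \frac{2}{9} \]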
Definition:
The moment generating function (m.g.f.) for any random variable X about the
origin is defined by
\[ M_X(t) = E(e^{tX}) = \int e^{tx} f(x) \, dx \quad \text{or} \quad \sum_x e^{tx} p(x), \]
where the integration or summation is taken over the entire range of X and t is a real
parameter.
Definition:
Example :
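\[ \text{Define } p(x) = \begin{cases} \dfrac{6}{\pi^2 x^2} & \text{if } x = 1, 2, 3, \ldots \\ 0 & \text{otherwise} \end{cases} \]
\[ \text{Now, } \sum p(x) = \sum \frac{6}{\pi^2 x^2} = \frac{6}{\pi^2} \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{6}{\pi^2} \times \frac{\pi^2}{6} = 1 \quad \left( \text{since } 1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots = \frac{\pi^2}{6} \right) \]
\[ = \frac{6}{\pi^2} \sum_{n=1}^{\infty} \frac{e^{tn}}{n^2} \]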
Proof:
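\[ \text{Then } M_X(t) = 1 + \mu_1' t + \frac{\mu_2'}{2!} t^2 + \cdots + \frac{\mu_r'}{r!} t^r + \cdots \]
\[ \therefore \frac{d^r}{dt^r} \left( M_X(t) \right) = \frac{\mu_r' \, r!}{r!} + \frac{\mu_{r+1}' \, (r+1)! \, t}{(r+1)!} + \cdots \]
\[ \text{At } t = 0, \quad \frac{d^r}{dt^r} \left( M_X(t) \right) = \mu_r' \]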
Problem:
Solution:
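\begin{align*}
M_X(t) &= \sum_{x=1}^{\infty} e^{tx} \left( \frac{1}{2^x} \right) \\
&= \sum_{x=1}^{\infty} \left( \frac{e^t}{2} \right)^x \\
&= \frac{e^t}{2} + \left( \frac{e^t}{2} \right)^2 + \cdots \\
&= \frac{e^t}{2} \cdot \frac{1}{1 - \frac{e^t}{2}}
\end{align*}
\[ M_X(t) = \frac{e^t}{2 - e^t} \]
\[ \text{(ii) } \frac{d}{dt} M_X(t) = \frac{(2 - e^t) e^t + e^t e^t}{(2 - e^t)^2} = \frac{2e^t}{(2 - e^t)^2} \]
\[ \therefore \mu_1' = \left[ \frac{d}{dt} M_X(t) \right]_{t=0} = 2 \]
\begin{align*}
\frac{d^2}{dt^2} M_X(t) &= \frac{(2 - e^t)^2 \, 2e^t + 2e^t \cdot 2(2 - e^t) e^t}{(2 - e^t)^4} \\
&= \frac{4e^t + 2e^{2t}}{(2 - e^t)^3}
\end{align*}
\[ \therefore \mu_2' = \left[ \frac{d^2}{dt^2} M_X(t) \right]_{t=0} = 6 \]
Variance $= \mu_2' - (\mu_1')^2 = 6 - 4 = 2$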
Problem :
Solution:
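\[ M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x) \, dx = \frac{e^{2t} - e^{-t}}{3t} \quad \text{when } t \ne 0 \]
\[ \text{When } t = 0, \quad M_X(t) = \frac{1}{3} \int_{-1}^{2} dx = \frac{1}{3} \left[ x \right]_{-1}^{2} = 1 \]
\[ M_X(t) = \begin{cases} \dfrac{e^{2t} - e^{-t}}{3t} & \text{when } t \ne 0 \\ 1 & \text{when } t = 0 \end{cases} \]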
Introduction:
In this chapter we discuss some important distributions of random variables which are frequently
used in statistics. We make a detailed study of the binomial and Poisson distributions, which
are of discrete type, and the normal distribution, which is of continuous type.
Binomial Distribution:
Definition:
\[ \text{Define } p(x) = \begin{cases} {}^{n}C_x \, p^x q^{n-x} & \text{if } x = 0, 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases} \]
A discrete random variable with the above p.d.f. is said to have a binomial distribution
and the p.d.f. itself is called a binomial distribution.
Note: 1
The two independent constants n and p in the distribution are known as the parameters of the
distribution. If X is a binomial variate with parameters n and p we write
$X \sim B(n, p)$.
If this experiment is repeated N times (say) then the frequency function of the binomial
distribution is given by $f(x) = N p(x) = N \, {}^{n}C_x \, p^x q^{n-x}$, $x = 0, 1, 2, \ldots, n$.
Theorem: 1
$(q + pe^t)^n$.
Proof:
\begin{align*}
M_X(t) = E(e^{Xt}) &= \sum e^{Xt} p(x) = \sum_{x=0}^{n} e^{tx} \, {}^{n}C_x \, p^x q^{n-x} \\
&= \sum_{x=0}^{n} {}^{n}C_x \, (pe^t)^x q^{n-x} \\
&= (q + pe^t)^n.
\end{align*}
Proof:
\[ \therefore M_{X_1}(t) = (q + pe^t)^{n_1} \]
\[ M_{X_2}(t) = (q + pe^t)^{n_2} \]
\[ M_{X_1 + X_2}(t) = (q + pe^t)^{n_1} \cdot (q + pe^t)^{n_2} = (q + pe^t)^{n_1 + n_2} \]
$n_1 + n_2$ and $p$.
Theorem: 3
Characteristic function of binomial distribution is $(q + pe^{it})^n$.
Proof:
\[ = \sum_{x=0}^{n} {}^{n}C_x \, (pe^{it})^x q^{n-x} = (q + pe^{it})^n. \]
Example:
The unbiased coins are tossed and number of heads noted. The experiment is
repeated 64 times and the following distribution is obtained.
No.of 0 1 2 3 4 5 Total
heads
Frequencies 3 6 24 26 4 1 64
Solution:
P(1)= 5(1/32).
Hence f(1)=10
Problem:
In a binomial distribution the mean is 4 and the variance is 8/3.Find the mode of the
distribution.
Solution:
∴ $np = 4$ and $npq = 8/3$
\[ \frac{npq}{np} = \frac{8}{3 \times 4} = \frac{2}{3} \]
$q = 2/3$, $p = 1/3$
∴ $np = 4$ gives $n = 12$
Consider $(n + 1)p = 13/3 = 4.3$
A discrete random variable X has the mean 6 and variance 2. If it is assumed that
the distribution is binomial. Find the probability that 5≤ 𝑥 ≤ 7.
Solution:
Also n=9
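\begin{align*}
&= {}^{9}C_5 \left( \frac{2}{3} \right)^5 \left( \frac{1}{3} \right)^4 + {}^{9}C_6 \left( \frac{2}{3} \right)^6 \left( \frac{1}{3} \right)^3 + {}^{9}C_7 \left( \frac{2}{3} \right)^7 \left( \frac{1}{3} \right)^2 \\
&= \frac{126 \times 2^5}{3^9} + 84 \times \frac{2^6}{3^9} + 36 \times \frac{2^7}{3^9} \\
&= \frac{2^5}{3^9} [126 + 168 + 144] \\
&= \frac{2^5}{3^9} \times 438 \\
&= \frac{32 \times 438}{19683} \\
&= 0.712
\end{align*}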
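The steps of this problem (recover q, p and n from the mean and variance, then add three binomial terms) can be sketched in code. The helper name `binom_pmf` is hypothetical; exact fractions are used so that the result can be compared with 32 × 438 / 19683.

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): nCx * p^x * q^(n-x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

mean, variance = Fraction(6), Fraction(2)

q = variance / mean      # npq / np
p = 1 - q
n = int(mean / p)        # np = mean

prob = sum(binom_pmf(x, n, p) for x in range(5, 8))   # P(5 <= X <= 7)
print(n, p, q, prob, round(float(prob), 3))
```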
Problem :
An insurance agent accepts policies of 5 men, all of identical age and in good health. The
probability that a man of this age will be alive 30 years hence is 2/3. Find the probability that in 30
years (i) all five men (ii) at least one man (iii) at most three will be alive.
Solution:
\[ P(X = x) = {}^{n}C_x \, p^x q^{n-x} \]
ii) Probability of at least one being alive = 1 - probability of no one being alive
= 1 - 1/243
= 242/243
iii) Probability of at most 3 being alive
= 1 - [P(x = 4) + P(x = 5)]
= 1 - [5(16/243) + 32/243]
= 1 - 112/243 = 131/243
Problem :
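Six dice are thrown 729 times. How many times do you expect at least 3
dice to show a five or six?
Solution:
Here n = 6, N = 729
q = 1 - 1/3 = 2/3
\[ = 729 \left[ {}^{6}C_3 \left( \tfrac{1}{3} \right)^3 \left( \tfrac{2}{3} \right)^3 + {}^{6}C_4 \left( \tfrac{1}{3} \right)^4 \left( \tfrac{2}{3} \right)^2 + {}^{6}C_5 \left( \tfrac{1}{3} \right)^5 \left( \tfrac{2}{3} \right) + {}^{6}C_6 \left( \tfrac{1}{3} \right)^6 \right] \]
= 169857/729
= 233.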
Problem :
Solution:
i) We have $E(X) = \mu_1' = np$
E(X) = 8 × 0.4
= 3.2
\[ = e^{2t} M_X(3t) = e^{2t} (0.6 + 0.4 e^{3t})^8 \]
Poisson distribution:
Definition:
\[ p(x) = P(X = x) = f(x) = \begin{cases} \dfrac{e^{-\lambda} \lambda^x}{x!} & \text{if } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \]
Example:
x    0     1    2    3   4   Total
f    123   59   14   3   1   200
Solution:
\[ \text{Mean} = \frac{59 + 28 + 9 + 4}{200} = 0.5 \]
\[ \lambda = 0.5 \]
\[ p(x) = \frac{e^{-0.5} (0.5)^x}{x!} \]
\[ \therefore p(0) = e^{-0.5} = 0.6065 \]
Hence f(0) = N p(0)
= 200 × (0.6065)
x    Probabilities using p(x+1) = (λ/(x+1)) p(x)    Expected frequencies f(x) = N p(x)    Observed frequencies
0    p(0) = 0.6065                                  121.3                                 123
\[ p(x + 1) = \frac{\lambda}{x + 1} \, p(x) \]
\[ \text{We have } p(1) = \frac{0.5}{1} \, p(0) = 0.3033 \]
f(1) = 200 × 0.3033 = 60.66
Problem :
The probabilities of a Poisson variable taking the values 3 and 4 are equal calculate the
probabilities of variates taking the values 0 and 2
Solution
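\[ \therefore P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!} \]
Given P(X = 3) = P(X = 4)
\[ P(X = 3) = P(X = 4) \Rightarrow \frac{e^{-\lambda} \lambda^3}{3!} = \frac{e^{-\lambda} \lambda^4}{4!} \Rightarrow \lambda = 4 \]
\[ P(X = 0) = \frac{e^{-4} 4^0}{0!} = e^{-4} = 0.0183 \]
\[ P(X = 2) = \frac{e^{-4} 4^2}{2!} = 0.146. \]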
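Assuming that one in 80 births is a case of twins, calculate the probability of 2 or more
births of twins on a day when 30 births occur using (i) binomial distribution (ii) Poisson
approximation.
Solution:
\[ p(x) = {}^{30}C_x \, (0.0125)^x (0.9875)^{30 - x} \]
\begin{align*}
p(X \ge 2) &= 1 - p(X < 2) \\
&= 1 - [p(X = 0) + p(X = 1)] \\
&= 1 - [(0.9875)^{30} + 30 (0.0125)(0.9875)^{29}] \\
&= 1 - 0.6943 (1.3625) \\
&= 0.054
\end{align*}
\[ \text{We get } p(x) = \frac{e^{-\lambda} \lambda^x}{x!} = \frac{e^{-0.375} (0.375)^x}{x!} \]
\begin{align*}
p(X \ge 2) &= 1 - [p(X = 0) + p(X = 1)] \\
&= 1 - [e^{-0.375} + e^{-0.375} (0.375)] \\
&= 1 - 0.6873 (1.375) \\
&= 0.0550.
\end{align*}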
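The two answers can be compared side by side. The sketch below evaluates the exact binomial probability and its Poisson approximation with λ = np = 0.375; the variable names are chosen for the illustration only.

```python
from math import comb, exp

n, p = 30, 1 / 80          # 30 births, P(twins) = 1/80 = 0.0125

# (i) Binomial: P(X >= 2) = 1 - P(X = 0) - P(X = 1)
binomial = 1 - sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(2))

# (ii) Poisson approximation with lambda = n * p
lam = n * p
poisson = 1 - exp(-lam) * (1 + lam)

print(round(binomial, 4), round(poisson, 4))
```

With n fairly large and p small, the two values agree to about three decimal places, which is what makes the Poisson form a convenient approximation here.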
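\[ \text{We have } \int_{-\infty}^{\infty} e^{-y^2/2} \, dy = \sqrt{2\pi} \]
\[ \text{Hence } \int_{-\infty}^{\infty} \frac{e^{-y^2/2}}{\sqrt{2\pi}} \, dy = 1 \]
\[ \text{Put } y = \frac{x - \mu}{\sigma}; \text{ then we have } \int_{-\infty}^{\infty} \frac{e^{-(x - \mu)^2 / 2\sigma^2}}{\sigma \sqrt{2\pi}} \, dx = 1 \]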
Definition:
𝜇 𝑎𝑛𝑑 𝜍 are constants and 𝜍>0 and are called the parameters of the distribution and we
write 𝑋~𝑁(𝜇, 𝜍 2 ).
2. The mean, median and mode coincide; the maximum ordinate at $x = \mu$ is given by
$\frac{1}{\sigma \sqrt{2\pi}}$.
3. 𝜇 ± 𝜍 are the points of inflexion of the normal curve and hence the points of
inflexion are also equidistant from the median.
6.Q.D:M.D:S.D=10:12:15.
Problem :
Solution:
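Hence the normal p.d.f. is $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$
\begin{align*}
\text{Now } E(X^2) &= \int_{-\infty}^{\infty} x^2 f(x) \, dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x^2 e^{-x^2/2} \, dx \\
&= \frac{1}{\sqrt{2\pi}} \left\{ \left[ -x e^{-x^2/2} \right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} e^{-x^2/2} \, dx \right\} \\
&= \frac{1}{\sqrt{2\pi}} \left( 0 + \sqrt{2\pi} \right) \\
&= 1.
\end{align*}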
Problem :
Given
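$X \sim N(8, 4)$
Hence the standard normal variate $Z = \frac{X - 8}{4}$
When X = 5, Z = -0.75
When X = 10, Z = 0.50
When X = 15, Z = 1.75
(i) $P(5 \le X \le 10) = P(-0.75 \le Z \le 0.5)$
= 0.2734 + 0.1915
= 0.4649.
(ii) $P(10 \le X \le 15) = P(0.5 \le Z \le 1.75)$
= 0.4599 - 0.1915
= 0.2684
(iii) $P(X \ge 15) = P(Z \ge 1.75)$
= 0.5 - $P(0 \le Z \le 1.75)$
= 0.5 - 0.4599
= 0.0401
(iv) $P(X \le 5) = P(Z \le -0.75)$
= 0.5 - $P(-0.75 \le Z \le 0)$
= 0.5 - $P(0 \le Z \le 0.75)$
= 0.2266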
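The table look-ups in this problem can be cross-checked with the error function from the standard library. The sketch below assumes, as the solution does, mean 8 and standard deviation 4 (so that Z = (X - 8)/4); the helper names `phi` and `normal_prob` are hypothetical.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_prob(a, b, mu, sigma):
    """P(a <= X <= b) for X normal with mean mu and standard deviation sigma."""
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

mu, sigma = 8, 4

p_5_to_10 = normal_prob(5, 10, mu, sigma)
p_10_to_15 = normal_prob(10, 15, mu, sigma)
p_above_15 = 1 - phi((15 - mu) / sigma)
p_below_5 = phi((5 - mu) / sigma)

print(round(p_5_to_10, 4), round(p_10_to_15, 4),
      round(p_above_15, 4), round(p_below_5, 4))
```

Small differences in the fourth decimal place against the worked answers come from rounding in the printed normal table.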
Problem :
The marks of 1000 students in a university are found to be normally distributed with
mean 70 and s.d. 5. Estimate the number of students whose marks will be (i) between 60
and 75 (ii) more than 75 (iii) less than 68.
Solution :
= 0.4772 + 0.3413
= 0.8185
∴ The number of students whose marks are between 60 and 75 is 1000 × 0.8185 = 819.
When x = 75, z = 1
= 0.5 - 0.3413
= 0.5 - 0.1554
= 0.3446.
Problem :
Assume the mean height of soldiers to be 68.22 inches with a variance of 10.8 square inches. How
many soldiers in a regiment of 2000 soldiers would you expect to be over six feet tall? Assume
heights to be normally distributed.
Solution :
\[ z = \frac{72 - 68.22}{3.286} = \frac{3.78}{3.286} = 1.15 \]
= 0.5 - $P(0 \le z \le 1.15)$
= 0.5 - 0.3749
= 0.1251
∴ The number of soldiers in the regiment of 2000 over 6 feet tall is 2000 × 0.1251 = 250
Problem :
A set of examination marks is approximately normally distributed with mean 75 and S.D. of 5. If the
top 5% of students get grade A and the bottom 25% get grade B, what mark is the lowest A and
what mark is the highest B?
Solution :
Let x denote the marks in the examination.
Given x is normally distributed with mean $\mu = 75$ and $\sigma = 5$
i.e. $X \sim N(75, 25)$
Let $x_1$ be the lowest mark for A and $x_2$ be the highest mark for B. Given
$p(x > x_1) = 0.05$ and $p(x < x_2) = 0.25$
The standard normal variate
\[ z = \frac{x_1 - \mu}{\sigma} = \frac{x_1 - 75}{5} = z_1 \]
\[ z = \frac{x_2 - \mu}{\sigma} = \frac{x_2 - 75}{5} = -z_2 \quad \ldots (1) \]
From the normal table, $P(0 < z < z_1) = 0.45$ gives $z_1 = 1.645$
and $P(0 < z < z_2) = 0.25$ gives $z_2 = 0.675$
(1) ⇒ $x_1 = 75 + 5z_1$
$x_1 = 83.225$
$x_2 = 75 - 5z_2$
$x_2 = 71.625$
$x_2 \approx 72$
Hence the lowest mark for grade A is 83 and the highest mark for B is 72.
Problem :
In a normal distribution 31% of the items are under 45 and 8% are over 64.
Find the mean and standard deviation.
Solution :
Let x denote the normal variate with mean 𝜇 𝑎𝑛𝑑 𝑆. 𝐷. 𝜍
Given p(x < 45) = 0.31 and p (x > 64) = 0.08
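When x = 45, $z = \dfrac{45 - \mu}{\sigma} = -z_1$
When x = 64, $z = \dfrac{64 - \mu}{\sigma} = z_2$ …………. (1)
From the tables, P(0 < z < z1) = 0.19 gives z1 = 0.496 and P(0 < z < z2) = 0.42 gives z2 = 1.405
(1) ⇒ 45 – μ = –0.496 σ
64 – μ = 1.405 σ
Subtracting, 19 = 1.901 σ
σ = 9.99 ≈ 10
μ = 49.96 ≈ 50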
Problem :
Find the probability of getting between 3 heads to 6 heads in 10 tosses of a fair
coin using (i) binomial distribution (ii) the normal approximation to the binomial
distribution.
Solution :
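(i) Take X as the binomial variate.
We have $X \sim B\left(10, \tfrac{1}{2}\right)$
$P(3 \le X \le 6) = \dfrac{\binom{10}{3} + \binom{10}{4} + \binom{10}{5} + \binom{10}{6}}{2^{10}} = \dfrac{792}{2^{10}}$
= 0.7734
(ii) μ = np = 5
$\sigma = \sqrt{npq} = 1.58$
∴ X ~ N(5, 1.58)
The standard normal variate is $z = \dfrac{X - \mu}{\sigma}$
For X = 2.5; $z = \dfrac{2.5 - 5}{1.58} = -1.58$
For X = 6.5; $z = \dfrac{6.5 - 5}{1.58} = 0.95$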
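As a numerical cross-check, the exact binomial probability and its normal approximation can be compared. This is a sketch only: it assumes the event is 3 ≤ X ≤ 6 with the continuity-corrected limits 2.5 and 6.5 used above, and the function names are illustrative.

```python
import math

def binom_prob(n, p, lo, hi):
    """Exact P(lo <= X <= hi) for X ~ B(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(lo, hi + 1))

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 10, 0.5
exact = binom_prob(n, p, 3, 6)

mu = n * p                           # mean np = 5
sigma = math.sqrt(n * p * (1 - p))   # s.d. sqrt(npq), about 1.58
# Continuity correction: 3 <= X <= 6 is treated as 2.5 <= X <= 6.5.
approx = phi((6.5 - mu) / sigma) - phi((2.5 - mu) / sigma)

print(round(exact, 4), round(approx, 4))
```

The two values differ only in the third decimal place, which illustrates how close the approximation is even for n = 10.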
THEORY OF ATTRIBUTES
Attributes:
The qualitative characteristics of a population are called attributes and they cannot be
measured by numeric quantities. Hence the statistical treatment required for attributes is different
from that of quantitative characteristic.
Suppose the population is divided into two classes according to the presence or absence
of a single attribute. The positive class denotes the presence of the attributes and the negative
class denotes the absence of the attribute. Capital Roman letters such as A, B, C, D, … are used to
denote positive classes and Greek letters such as α, β, γ, δ, … are used to denote negative classes.
For example, if A represents the attribute richness then α represents the attribute
non-richness (poor).
For example,
A,B,C, 𝛼, 𝛽, 𝛾, 𝛿 are all of first order, AB, A𝛽, 𝛼𝐵, 𝛼𝛽 are of second order, and ABC,
A𝛽𝛾, 𝐴𝛽𝐶, 𝛼𝛽𝛾 are of the third order.
The number of individuals possessing the attributes in a class of nth order is called a class
frequency of order „n‟ and class frequencies are denoted by bracketing the attributes.
Thus (A) stands for the frequency of A, i.e. the number of individuals possessing the attribute
A, and (Aβ) stands for the number of individuals possessing the attribute A and not B.
1. Class frequencies of the type (A), (AB), (ABC) are known as positive
class frequencies.
2. Class frequencies of the type (α), (β), (αβ), (αβγ), … are known as
negative class frequencies.
3. Class frequencies of the type (αB), (Aβ), (Aβγ), (αβC), … are known as
contrary frequencies.
4. The classes of highest order are called the ultimate classes and their
frequencies are called the ultimate class frequencies.
Examples:
1. (AB) = (ABC) + (ABγ)
Consider, (ABγ) = ABγ.N
= AB(1 − C).N
= AB.N – ABC.N
= (AB) – (ABC)
∴ (AB) = (ABC) + (ABγ)
2. If there are two attributes A and B we have,
N = (A) + (α) = (B) + (β)
Hence N = (A) + (α)
= (AB) + (Aβ) + (αB) + (αβ)
Thus, (AB) = AB.N
= (1 − α)(1 − β).N
= (1 − α − β + αβ).N
= N − α.N − β.N + αβ.N
= N − (α) − (β) + (αβ)
Problem :
Solution:
i) N = (A) + (𝛼) = 30 + 30 = 60
ii) (𝛽) = 𝑁 − 𝐵 = 60 − 25 = 35
iii) (AB) = AB.N
= (1 − α)(1 − β).N
= N − (α) − (β) + (αβ)
= 60-30-35+20
= 15
Problem :
Given the following ultimate class frequencies of two attributes A and B. Find the
frequencies of positive and negative class frequencies and the total number of observations.
Solution:
Taking,
Problem :
Given the following positive class frequencies find the remaining class frequencies: N =
20; (A) = 9; (B) = 12; (C) = 8; (AB) = 6; (BC) = 4; (CA) = 4; (ABC) = 3
Solution:
We are given only 8 class frequencies and we have to find the remaining 19 class
frequencies. They are
Order 1:
(α) = N − (A) = 20 − 9 = 11
(β) = N − (B) = 20 − 12 = 8
(γ) = N − (C) = 20 − 8 = 12
Order 2:
(Aβ) = A(1 − B).N = (A) – (AB) = 9 − 6 = 3
(αB) = (1 − A)B.N = (B) – (AB) = 12 − 6 = 6
(Aγ) = A(1 − C).N = (A) – (AC) = 9 − 4 = 5
(αC) = (1 − A)C.N = (C) – (AC) = 8 − 4 = 4
(Bγ) = B(1 − C).N = (B) – (BC) = 12 − 4 = 8
(βC) = (1 − B)C.N = (C) – (BC) = 8 − 4 = 4
(αβ) = (1 − A)(1 − B).N = N − (A) − (B) + (AB) = 20 − 9 − 12 + 6 = 5
(βγ) = (1 − B)(1 − C).N = N − (B) − (C) + (BC) = 20 − 12 − 8 + 4 = 4
(αγ) = (1 − A)(1 − C).N = N − (A) − (C) + (AC) = 20 − 9 − 8 + 4 = 7
Order 3:
(ABγ) = AB(1 − C).N = (AB) – (ABC) = 6 − 3 = 3
(AβC) = A(1 − B)C.N = (AC) – (ABC) = 4 − 3 = 1
(Aβγ) = A(1 − B)(1 − C).N = (A) − (AB) − (AC) + (ABC) = 9 − 6 − 4 + 3 = 2
(αBC) = (1 − A)BC.N = (BC) – (ABC) = 4 − 3 = 1
(αBγ) = (1 − A)B(1 − C).N = (B) − (AB) − (BC) + (ABC) = 12 − 6 − 4 + 3 = 5
(αβC) = (1 − A)(1 − B)C.N = (C) − (AC) − (BC) + (ABC) = 8 − 4 − 4 + 3 = 3
(αβγ) = (1 − A)(1 − B)(1 − C).N = N − (A) − (B) − (C) + (AB) + (BC) + (AC) − (ABC)
= 20 − 9 − 12 − 8 + 6 + 4 + 4 − 3 = 2
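The operator method used above (replace each negative attribute by 1 minus the positive attribute and expand) can be sketched in code. The dictionary layout and the name `class_frequency` are illustrative choices, not notation from the notes.

```python
from itertools import product

# Positive class frequencies from the problem; the empty set stands for N.
positive = {
    frozenset(): 20,
    frozenset("A"): 9, frozenset("B"): 12, frozenset("C"): 8,
    frozenset("AB"): 6, frozenset("BC"): 4, frozenset("AC"): 4,
    frozenset("ABC"): 3,
}

def class_frequency(present, absent):
    """Frequency of the class having the attributes in `present` and lacking
    those in `absent`, by expanding the product of (1 - X) over absent X."""
    absent = list(absent)
    total = 0
    for mask in product([0, 1], repeat=len(absent)):
        chosen = {a for a, bit in zip(absent, mask) if bit}
        sign = (-1) ** len(chosen)          # each chosen "-X" term flips the sign
        total += sign * positive[frozenset(present) | chosen]
    return total

print(class_frequency("A", "B"))     # (A beta)
print(class_frequency("", "ABC"))    # (alpha beta gamma)

# The eight ultimate class frequencies must add up to N.
ultimate_total = sum(
    class_frequency([x for x in "ABC" if x in pres], [x for x in "ABC" if x not in pres])
    for pres in ["ABC", "AB", "AC", "A", "BC", "B", "C", ""]
)
print(ultimate_total)
```

Checking that the ultimate class frequencies sum to N is a quick way to catch an arithmetic slip in the hand computation.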
Problem :
In a class test in which 135 candidates were examined for proficiency in English and
Maths, it was discovered that 75 students failed in English, 90 failed in Maths and 50 failed in
both. Find how many candidates i) have passed in Maths ii) have passed in English, failed in
Maths iii) have passed in both.
Solution:
Let A denote a pass in English and B a pass in Maths. Then N = 135, (α) = 75, (β) = 90 and (αβ) = 50.
i) (B) = N − (β)
= 135 − 90
= 45
ii) (Aβ) = (β) − (αβ)
= 90 − 50
= 40
iii) (AB) = N − (α) − (β) + (αβ)
= 135 − 75 − 90 + 50
= 20
Problem :
Solution:
Since there are 3 attributes there are 2³ = 8 ultimate class frequencies. We are
given two.
To find the frequencies of positive classes: (A), (B), (C); (AB), (BC), (AC).
𝐴 + 𝛼 = 1200(= 𝑁)
Adding,
2(A)=1200+192
2(A)=1392
(A)=696
= 1200 – 270
= 930.
Second order:
= 660
= 706
i. (AB𝛾) = 𝐴𝐵 1 − 𝐶 . 𝑁
= (AB) – (ABC)
= 660 – 600
= 60
ii. (A𝛽𝐶) = 𝐴𝐶 1 − 𝐵 . 𝑁
= (AC) – (ABC)
= 620 – 600
= 20
iii. (𝛼𝐵𝐶) = 1 − 𝐴 𝐵𝐶. 𝑁
= (BC) – (ABC)
= 706 – 600
= 106
iv. (A𝛽𝛾) = 𝐴 1 − 𝐵 1 − 𝐶 . 𝑁
= 16
v. (𝛼𝐵𝛾) = (1 − 𝐴) 1 − 𝐶 𝐵. 𝑁
= (B) – (AB) – (BC) + (ABC)
= 910 – 660 – 706+ 600
= 144.
vi. (𝛼𝛽𝐶) = 1 − 𝐴 1 − 𝐵 𝐶. 𝑁
= (C) - (AC) – (BC) + (ABC)
= 930 – 620 – 706 + 600 = 204
Solution:
i. (AB) = AB.N
= (1-𝛼) (1 − 𝛽). 𝑁
= N- (𝛼) − 𝛽 + (𝛼𝛽)
= N – N/2 – N/2 + (𝛼𝛽)
(AB) = (𝛼𝛽)
(A𝛽) = (𝛼𝐵)
Problem :
Of 500 men in a locality exposed to cholera 172 in all were attacked, 178
were inoculated and of these 128 were attacked. Find the number of persons.
i. (𝛼𝛽) = 𝛼𝛽. 𝑁
= (1-A) (1-B).N
= 278
i. (𝛼𝐵) = 𝛼𝐵. 𝑁 = 1 − 𝐴 𝐵. 𝑁
= (B) – (AB)
= 178 – 128 = 50
= (A) – (AB)
= 172 – 128 = 44
Problem:
There were 200 students is a college whose results in the first semester, second semester and the
third semester are as follows: 80 passed in the first semester; 75 passes in the second semester.
96 passed in the third semester 25 passed in all the three semester 46 failed in all the three
semester 29 passed in the first two and failed in the third semester 42 failed in the first two
Solution:
Denoting “pass in first semester” as “A‟ Pass in second semester „B‟ and pass in the third
semester as „C‟ we get.
= 96 – 42 = 54
Thus the number of students who passed in atleast two semester is 83.
Problem :
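Given (ABC) = 149; (ABγ) = 738; (AβC) = 225; (Aβγ) = 1196; (αBC) = 204;
(αBγ) = 1762; (αβC) = 171; (αβγ) = 21842. Find (A), (B), (C), (AB), (AC), (BC) and N.
Solution:
N = sum of the eight ultimate class frequencies
= 26287
(A) = (ABC) + (ABγ) + (AβC) + (Aβγ) = 2308
(B) = (ABC) + (ABγ) + (αBC) + (αBγ) = 2853
(C) = (ABC) + (AβC) + (αBC) + (αβC) = 749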
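Every positive class frequency is the sum of the ultimate class frequencies that contain it, which also gives the second-order frequencies asked for in the problem. A minimal sketch follows; the lower-case letters a, b, c are an assumed stand-in for α, β, γ.

```python
# Ultimate class frequencies from the problem (a, b, c stand for alpha, beta, gamma).
ultimate = {
    "ABC": 149, "ABc": 738, "AbC": 225, "Abc": 1196,
    "aBC": 204, "aBc": 1762, "abC": 171, "abc": 21842,
}

def positive_frequency(attrs):
    """Sum the ultimate frequencies of every class containing all of `attrs`."""
    return sum(f for cls, f in ultimate.items() if all(a in cls for a in attrs))

N = positive_frequency("")        # no restriction: the whole population
first_order = {x: positive_frequency(x) for x in "ABC"}
second_order = {x: positive_frequency(x) for x in ("AB", "AC", "BC")}

print(N, first_order, second_order)
```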
Problem :
In a very hotly fought battle 70% of the soldiers at least lost an eye, 75% at least lost an
ear, 80% at least lost an arm and 85% at least lost a leg. How many at least must have lost all the
four?
Solution:
Taking N = 100, we have
(ABCD) ≥ (A) + (B) + (C) + (D) − 3N
= 70 + 75 + 80 + 85 − 300
= 10
∴ (ABCD) ≥ 10, i.e. at least 10% of the soldiers must have lost all the four.
A company producers tube lights and conducts a test on 5000 lights for production
defects of frames (F); chokes (C); starters (S) and tubes (T). The following are the records of
defects.
Find the percentage of the tube lights which pass all the four tests.
Solution:
= 5000-451+456-395+5
= 5461-846=4615
Out of 5000 tube lights 4615 pass the four tests for defects.
Exercises:
2. Given the following ultimate class frequencies find the frequencies of the
positive and negative classes and the total number of observations.
3. A survey reveals that out of 1000 people in locality 800 like coffee, 700 like tea,
660 like both coffee and tea. Find how many people like neither coffee nor tea.
4. An examination result shows the following data. 56% at least failed in part I
Tamil, 76% at least failed in part II English 82% at least failed in major – chemistry
and 88% at least failed ancillary maths. How many at least failed in all the four?
Consistency of data:
Definition:
A set of class frequencies is said to be consistent if none of them is negative; otherwise the given
set of class frequencies is said to be inconsistent.
We have the following set of criteria for testing the consistency in the case of single attributes
and three attributes.
Note:
conditions (ix) to (xii) can be used to check the consistency of data when the class of
first and second order alone are known.
Problem :
Find whether the following data are consistent. N= 600; (A) = 300;(B) = 400;
(AB)=50.
Solution:
(αβ) = N − (A) − (B) + (AB) = 600 − 300 − 400 + 50
= −50
Since (αβ) < 0, the data are inconsistent.
Show that there is some error in the following data: 50% of people are
wealthy and healthy 35% are wealthy but not healthy 20% are healthy but not wealthy.
Solution:
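Let A denote wealthy and B denote healthy, and take N = 100. Then (AB) = 50, (Aβ) = 35 and (αB) = 20.
(A) = (AB) + (Aβ) = 50 + 35 = 85
(B) = (AB) + (αB) = 50 + 20
= 70
(αβ) = αβ.N = (1 − A)(1 − B).N = N − (A) − (B) + (AB)
(αβ) = 100 − 85 − 70 + 50
= −5
(αβ) < 0, so there is some error in the data.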
Of 2000 people consulted 1854 speak Tamil; 1507 speak Hindi; 572 Speak English;
676 speak Tamil and Hindi; 286 speak Hindi and English; 114 speak Tamil; Hindi and English.
Show that the information as it stands is incorrect.
Solution:
Let A,B,C denote the attribution of speaking Tamil, Hindi, English respectively.
=- 815
∴ 𝛼𝛽𝛾 < 0.
Problem :
(A𝛽) = 7 𝑎𝑛𝑑 𝐴𝛾 = 18
= 48-7 = 41
= 48-18 = 30
41+(BC) + 30 ≥ 48 + 62 + 45 − 125
(AB) +(AC)-(BC)≤ 𝐴
⇒(BC) ≥ 𝐴𝐵 + 𝐴𝐶 − 𝐴
= 41+30-48= 23
(BC) ≥ 23 … … … … … … … … … . 𝑖𝑖
⇒(BC) ≤ 𝐵 + 𝐴𝐶 − 𝐴𝐵
= 62+ 30-41
=51
∴(BC) ≤ 51 … … … … … … … … … . 𝑖𝑖𝑖
⇒(BC) ≤ 𝐶 + 𝐴𝐵 − 𝐴𝐶
= 45+41-30
= 56
∴ (BC) ≤ 56 ……………………………..(iv)
From (ii), (iii) and (iv), 23 ≤ (BC) ≤ 51
Problem :
Find the greatest and least value of (ABC) if (A)=50, (B)=60, (C)= 80, (AB) = 35, (AC)= 45 and
(BC)=42
Solution:
The problem involves 3 attributes and we are given positive class frequencies of first
order and second order only.
Using positive class conditions (ii), (iii), (iv) of consistency for 3 attributes,
(ABC) ≤ (AB) ⇒ (ABC) ≤ 35
(ABC) ≤ (BC) ⇒ (ABC) ≤ 42
(ABC) ≤ (AC) ⇒ (ABC) ≤ 45
⇒ (ABC) ≤ 35 ………………… (1)
(ABC) ≥ (AB) + (AC) − (A)
⇒ (ABC) ≥ 35 + 45 − 50 = 30
(ABC) ≥ (AB) + (BC) − (B)
⇒ (ABC) ≥ 35 + 42 − 60 = 17
(ABC) ≥ (AC) + (BC) − (C)
⇒ (ABC) ≥ 45 + 42 − 80 = 7
Thus (ABC) ≥ 30
(ABC) ≥ 17
(ABC) ≥ 7
⇒ (ABC) ≥ 30 ………………… (2)
∴ From (1) and (2), the least value of (ABC) is 30 and the greatest value of (ABC) is 35.
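The consistency conditions used in this problem translate directly into a small routine. This is a sketch: the function name `abc_bounds` and the optional argument N are assumptions made for illustration (N is not given in the problem, so the condition that needs it is skipped there).

```python
def abc_bounds(A, B, C, AB, AC, BC, N=None):
    """Least and greatest values of (ABC) allowed by the consistency conditions."""
    # (ABC) cannot exceed any second-order frequency.
    upper = min(AB, AC, BC)
    # (ABC) >= 0 and (ABC) >= (AB)+(AC)-(A), and similarly for B and C.
    lower = max(0, AB + AC - A, AB + BC - B, AC + BC - C)
    if N is not None:
        # (alpha beta gamma) >= 0 gives one more upper bound when N is known.
        upper = min(upper, N - A - B - C + AB + AC + BC)
    return lower, upper

least, greatest = abc_bounds(A=50, B=60, C=80, AB=35, AC=45, BC=42)
print(least, greatest)
```

If the routine ever returns a least value larger than the greatest value, the given first and second order frequencies are themselves inconsistent.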
Problem :
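If $\dfrac{(A)}{N} = x$, $\dfrac{(B)}{N} = 2x$, $\dfrac{(C)}{N} = 3x$ and $\dfrac{(AB)}{N} = \dfrac{(AC)}{N} = \dfrac{(BC)}{N} = y$, prove that neither x nor y can exceed ¼.
Solution:
(AB) ≤ (A)
$\Rightarrow \dfrac{(AB)}{N} \le \dfrac{(A)}{N}$
⇒ y ≤ x ………………… (1)
Similarly,
(BC) ≤ (B) ⇒ y ≤ 2x
Now, (AB) ≥ (A) + (B) − N
$\Rightarrow \dfrac{(AB)}{N} \ge \dfrac{(A)}{N} + \dfrac{(B)}{N} - 1$
⇒ y ≥ 3x − 1
Similarly
(BC) ≥ (B) + (C) − N
⇒ y ≥ 5x − 1 ………………… (2)
(AC) ≥ (A) + (C) − N
⇒ y ≥ 4x − 1
From (1) and (2), 5x − 1 ≤ y ≤ x.
Taking 5x − 1 ≤ x we get $x \le \dfrac{1}{4}$
Taking y ≤ x we get $y \le \dfrac{1}{4}$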
Exercises:
2. A market investigator returns the following data of 2000 people consulted 1754 liked
chocolates 1872 liked toffee and 572 liked biscuits, 678 liked chocolate and coffee, 236 liked
chocolates and biscuits, 270 liked chocolates and biscuits, 270 liked toffee and biscuits, 114
liked all the three .Show that the information it started must be incorrect.
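Two attributes A and B are said to be independent if there is the same proportion of A's
amongst B's as amongst β's, i.e.
$\dfrac{(AB)}{(B)} = \dfrac{(A\beta)}{(\beta)}$ ……………………………….(i)
or
$\dfrac{(AB)}{(A)} = \dfrac{(\alpha B)}{(\alpha)}$ ……………………………….(ii)
From (i), each ratio is equal to $\dfrac{(AB) + (A\beta)}{(B) + (\beta)} = \dfrac{(A)}{N}$
$\therefore (AB) = \dfrac{(A)(B)}{N}$ ………………….(1)
And $(A\beta) = \dfrac{(A)(\beta)}{N}$ …………………(2)
Subtracting each side of (i) from 1,
$\dfrac{(B) - (AB)}{(B)} = \dfrac{(\beta) - (A\beta)}{(\beta)}$
$\therefore \dfrac{(\alpha B)}{(B)} = \dfrac{(\alpha\beta)}{(\beta)} = \dfrac{(\alpha\beta) + (\alpha B)}{(\beta) + (B)} = \dfrac{(\alpha)}{N}$
$(\alpha\beta) = \dfrac{(\alpha)(\beta)}{N}$ …………………………(3)
And $(\alpha B) = \dfrac{(\alpha)(B)}{N}$ ………………………………(4)
(1), (2), (3), (4) are all equivalent conditions for independence of the attributes A and B.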
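If $(AB) \ne \dfrac{(A)(B)}{N}$ we say that A and B are associated. There are two possibilities.
If $(AB) > \dfrac{(A)(B)}{N}$ we say that A and B are positively associated and if $(AB) < \dfrac{(A)(B)}{N}$ we
say that A and B are negatively associated.
Let us denote $\delta = (AB) - \dfrac{(A)(B)}{N}$
i.e. $\delta = \dfrac{1}{N}\left[(AB)(\alpha\beta) - (A\beta)(\alpha B)\right]$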
Note:
Coefficient of association:
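There are several measures indicating the intensity of association between two
attributes A and B.
The most commonly used measures are Yule's coefficient of association Q and the
coefficient of colligation Y, which are defined as follows.
$Q = \dfrac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$
$Q = \dfrac{N\delta}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$
$Y = \dfrac{1 - \sqrt{\dfrac{(A\beta)(\alpha B)}{(AB)(\alpha\beta)}}}{1 + \sqrt{\dfrac{(A\beta)(\alpha B)}{(AB)(\alpha\beta)}}}$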
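Both coefficients depend only on the four ultimate class frequencies, so they are easy to compute directly. The sketch below uses illustrative function names and, as sample input, the father and son intelligence data from a later problem in these notes: (AB) = 200, (Aβ) = 50, (αB) = 110, (αβ) = 600.

```python
import math

def yule_q(ab, a_beta, alpha_b, alpha_beta):
    """Yule's coefficient of association Q."""
    num = ab * alpha_beta - a_beta * alpha_b
    den = ab * alpha_beta + a_beta * alpha_b
    return num / den

def yule_y(ab, a_beta, alpha_b, alpha_beta):
    """Coefficient of colligation Y."""
    r = math.sqrt((a_beta * alpha_b) / (ab * alpha_beta))
    return (1 - r) / (1 + r)

q = yule_q(200, 50, 110, 600)
y = yule_y(200, 50, 110, 600)
print(round(q, 5), round(y, 5))
```

The two measures are linked by Q = 2Y / (1 + Y²), which gives a quick consistency check on any pair of computed values.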
Problem :
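Check whether the attributes A and B are independent given that (i) (A) = 30, (B) =
60, (AB) = 12, N = 150
(ii) (AB) = 256, (αB) = 768, (Aβ) = 48, (αβ) = 144.
Solution:
(i) A and B are independent if $(AB) = \dfrac{(A)(B)}{N}$
Consider, $\dfrac{(A)(B)}{N} = \dfrac{30 \times 60}{150} = 12 = (AB)$
$\therefore (AB) = \dfrac{(A)(B)}{N}$ and hence A and B are independent.
(ii) (A) = (AB) + (Aβ) = 256 + 48 = 304, (B) = (AB) + (αB) = 256 + 768 = 1024 and
N = 256 + 768 + 48 + 144 = 1216
Now $\dfrac{(A)(B)}{N} = \dfrac{304 \times 1024}{1216} = 256 = (AB)$
$\therefore (AB) = \dfrac{(A)(B)}{N}$ and hence A and B are independent.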
Problem :
Solution:
(𝛼) = 𝑁 − 𝐴 = 135 − 75 = 60
(𝛽) = 𝑁 − 𝐵 = 135 − 90 = 45
(𝛼𝐵) = 𝐵 − 𝐴𝐵 = 90 − 50 = 40
(A𝛽) = 𝐴 − 𝐴𝐵 = 75 − 50 = 25
(𝛼𝛽) = 𝛼 − 𝛼𝐵 = 60 − 40 = 20
$Q = \dfrac{50 \times 20 - 25 \times 40}{50 \times 20 + 25 \times 40}$
Q = 0
∴ A and B are independent hence failure in physics and chemistry are completely
independent of each other.
Problem :
Solution:
𝐴 𝐵 300 × 400
i) = = 129.03
𝑁 930
𝐴 (𝐵)
Now, 𝛿 = 𝐴𝐵 − 𝑁
= 100.97
Here 𝛿 > 0
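ii) $Q = \dfrac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$
$= \dfrac{327 \times 235 - 545 \times 741}{327 \times 235 + 545 \times 741}$
$= \dfrac{76845 - 403845}{76845 + 403845}$
$= \dfrac{-327000}{480690}$
= −0.6803
Q < 0.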
= 470 + 530
= 1000
= 300 + 150
= 450
= 300- 2155
= - 1825
∴ 𝛿 < 0.
iv. $Q = \dfrac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$
$= \dfrac{66 \times 136 - 88 \times 102}{66 \times 136 + 88 \times 102} = 0$. ∴ A and B are independent.
Problem :
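Calculate the coefficient of association between intelligence of father and son from
the following data.
Intelligent fathers with intelligent sons 200. Intelligent fathers with dull sons 50.
Dull fathers with intelligent sons 110. Dull fathers with dull sons 600. Comment on
the result.
Solution:
Let A denote intelligent fathers and B denote intelligent sons. Then we have
(AB) = 200, (Aβ) = 50, (αB) = 110, (αβ) = 600.
$Q = \dfrac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)} = \dfrac{200 \times 600 - 50 \times 110}{200 \times 600 + 50 \times 110} = \dfrac{114500}{125500}$
= 0.91235
Since Q is positive it means that intelligent fathers are likely to have intelligent sons.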
Problem :
Investigate from the following data the association between inoculation against small pox
and prevention from attack.
(𝛼𝛽) = 160.
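$Q = \dfrac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$
$= \dfrac{25 \times 160 - 220 \times 90}{25 \times 160 + 220 \times 90}$
$= \dfrac{4000 - 19800}{4000 + 19800}$
$= \dfrac{-15800}{23800}$
= −0.6639.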
i.e. “Inoculation” and “attack from small pox” are negatively associated.
Thus inoculation against small pox can be taken as the preventive measure.
Problem :
From the following data compare the association between marks in physics and
chemistry in MKU and MSU
We have,
MKU            MSU
N = 1600       N = 200
(A) = 320      (A) = 80
(B) = 90       (B) = 40
(AB) = 30      (AB) = 20
From the above data we get the rest of the class frequencies for MKU and MSU.
We now find the coefficient of association between A and B for MKU and MSU
3. From the figures given in the following table compare the association between literacy
and un employment in rural and urban areas- and given reasons for the difference if any
Time series:
Definition:
The various forces affecting the values of a phenomenon in a time series may be broadly
classified into the following three categories generally known as the components of a time series.
The general tendency of a time series is to increase or decrease or stagnate over a period
of several years. Such a long run tendency of a time series to increase or decrease over a period
of time is known as secular trend or simply trend. Though the term “long” is a relative term it
depends upon the nature of the series under consideration.
In most of the time series a number of forces repeat themselves periodically over a period
of time preventing the values of the series to move in a particular direction. The variations
caused by such forces are called short term fluctuations. These short term fluctuations may
broadly be classified into (a) seasonal variation (b) cyclical variation
a) Seasonal variation:
b) Cyclical variation
3. Irregular fluctuations
The fluctuation which are purely random and due to unforeseen and unpredictable forces
are called Irregular fluctuations
Measurement of trends
The following are the four methods of measuring the trend in a time series
i) Graphic method
ii) Method of curve fitting by the principles of least squares.
iii) Method of semi averages
iv)Method of moving averages.
i) Graphic Method
This is the simplest method of determining the trend. In this method all values of the time
series are plotted on a graph paper and a smooth curve is drawn by free hand to pass through as
many points as possible. The smoothing of the curve eliminates the other components such as
seasonal, cyclic and random variations.
ii) Method of curve fitting by the principle of least squares
This is the best method of fitting a trend and it is commonly used in practice.
iii) Method of semi averages
In this method the whole time series data is classified into two equal parts with respect to
time. Having divided the given series into two equal parts we calculate the arithmetic mean for
each part. These means are called semi-averages. Then these averages are plotted against the mid
values of the respective periods covered by each part. The line joining these points gives the
straight line trend for the time series.
iv) Method of moving averages
This method for measuring the trend consists of obtaining a series of moving averages of
successive m terms of the time series. This averaging process smoothens the fluctuations and the
ups and downs in the given data. It has been observed and proved mathematically that if the trend
is linear, the period of the moving average should be taken to be the period of oscillation.
There is a simple method for measuring the seasonal variation which involves
simple averages.
Step 1:
Step 2:
Step 3:
$\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_{12}}{12}$
Thus the seasonal index for the ith month $= \dfrac{x_i}{\bar{x}} \times 100$
Problem:
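Use the method of least squares and fit a straight line trend to the following data given from 1982 to
1992. Hence estimate the trend value for 1993.
Take X = x – 1987 and Y = y – 42, and fit the line Y = aX + b.
Year x    X = x – 1987    y     Y = y – 42    XY     X²
1982      -5              45    3             -15    25
1983      -4              46    4             -16    16
1984      -3              44    2             -6     9
1985      -2              47    5             -10    4
1986      -1              42    0             0      1
From the table, -19 = 110a $\Rightarrow a = \dfrac{-19}{110} = -0.17$
17 = 11b $\Rightarrow b = \dfrac{17}{11} = 1.55$
Hence the trend line is y = 42 + 1.55 – 0.17X = 43.55 – 0.17X.
Thus the trend values are 44.40, 44.23, 44.06, 43.89, 43.72, 43.55, 43.38, 43.21, 43.04,
42.87, 42.70
The estimated trend value for 1993 (X = 6) is 43.55 – 0.17 × 6 = 42.53.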
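The fit can be reproduced from the column totals alone. The sketch below assumes the totals used in the solution (ΣXY = -19, ΣX² = 110, ΣY = 17, n = 11, with ΣX = 0 because the origin 1987 is the middle year); the function name `trend` is illustrative.

```python
# Column totals taken from the solution; the full data table is not repeated here.
n = 11
sum_xy, sum_x2, sum_y = -19, 110, 17

# With sum(X) = 0 the normal equations decouple:
#   sum(XY) = a * sum(X^2)   and   sum(Y) = n * b
a = sum_xy / sum_x2
b = sum_y / n

def trend(year, origin=1987, shift=42):
    """Trend value y for a given year, undoing X = x - origin and Y = y - shift."""
    X = year - origin
    return shift + a * X + b

trend_values = [round(trend(year), 2) for year in range(1982, 1993)]
print(round(a, 4), round(b, 4))
print(trend_values)
print(round(trend(1993), 2))
```

Because a and b are not rounded before use here, the printed values can differ from the hand computation in the second decimal place.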
Problem:
Month       I (1991)   II (1992)   III (1993)   IV (1994)   Total   Average   Seasonal index
April       12         14          16           18          60      15        (15/12) x 100 = 125
July        12         14          13           17          56      14        (14/12) x 100 = 116.7
August      9          8           11           16          44      11        (11/12) x 100 = 91.7
December    12         13          15           16          56      14        (14/12) x 100 = 116.7
Total of the monthly averages = 144
Grand average = 144/12 = 12
1992 48 - - -
Problem:
Compute the trend values by the method of A yearly moving average for the data
given in problem 1.
Problem:
Determine the suitable period of moving average for the data given in problem 1
Thus the data show 3 cycles with varying periods 2, 5, 2 respectively.
Hence $\dfrac{2 + 5 + 2}{3} = 3$ is the period of the moving average.
Problem:
Compute the seasonal indices for the following data by simple average method
Solution:
1. From the data given below calculate the seasonal indices assuming that trend is
absent
2. Compute the seasonal index for the following data assuming that there is no need to
adjust the data for the trend
Sampling - Definition - Large samples. Small samples - Population with one samples and
population with two samples - Students – t – test - Applications - chi - square test and goodness
of fit - applications.
INTRODUCTION:
Any statistical investigation usually deals with the study of some characteristics of a
collection of objects
SAMPLING:
Definition:
A finite subset of population is called a sample and the number of objects in a sample is
called the sample size.
Some of the important types of sampling are (i) purposive sampling (ii) Random
sampling (iii) Simple sampling (iv) stratified sampling.
(i)Purposive sampling:
If the sample elements are selected with a definite purpose in mind then the sample
selected is called purposive sample.
(ii) Random sampling:
A random sample is one in which each element of the population has an equal
chance of inclusion in the sample.
(iii) Simple sampling:
Simple sampling is a special type of random sampling in which each element of the
population has an equal and independent chance of being included in the sample.
(iv) Stratified sampling:
The sample which is the aggregate of the sampled individuals of each stratum is called a
stratified sample and the technique of selecting such a sample is called stratified sampling.
(A) (i) Test for single mean if standard deviation of the population σ is
known. (ie) 𝐻0∶ 𝜇 = 𝜇0 , 𝜍 is known.
(B) (ii) Tests for single mean if σ is not known 𝐻0 : 𝜇 = 𝜇0 ,σ is
unknown.
(C) (i) Test for equality of means of 2 normal populations with
Known standard deviations (ie) 𝐻0 : 𝜇1 = 𝜇2 ; 𝜍1 , 𝜍2 is known.
(ii) Test for equality of means of 2 normal populations with same standard
deviation though unknown 𝐻0 : 𝜇1 = 𝜇2, 𝜍1 = 𝜍 = 𝜍2 .
III. Test for standard deviations.
(A) : Test for single standard deviation 𝐻0 : 𝜍 = 𝜍0
(B) : Test for equality for 2 standard deviation (ie)𝐻0 : 𝜍1 = 𝜍2
𝑋 ~𝑁(𝑛𝑝, 𝑛𝑃𝑄)
I (B)Difference of proportions.
Let 𝑋1 be number of persons possessing the attribute A in the first sample and 𝑋2 be the number
of persons possessing the same attribute in the second sample
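$p_1 = \dfrac{X_1}{n_1}$; $p_2 = \dfrac{X_2}{n_2}$
As before E(p1) = P1 and E(p2) = P2 where P1 and P2 are the proportions in
the populations. $V(p_1) = \dfrac{P_1 Q_1}{n_1}$ and $V(p_2) = \dfrac{P_2 Q_2}{n_2}$.
$\therefore Z = \dfrac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \sim N(0,1)$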
Problem:
A coin is tossed 144 times and a person gets 80 heads. Can we say that the coin is unbiased one?
Solution:
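$Z = \dfrac{X - nP}{\sqrt{nPQ}} = \dfrac{80 - 144 \times \frac{1}{2}}{\sqrt{144 \times \frac{1}{2} \times \frac{1}{2}}} = \dfrac{80 - 72}{\sqrt{36}} = \dfrac{8}{6} = 1.33 < 1.96.$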
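The statistic Z = (X − nP)/√(nPQ) used in these problems is a one-line computation. A small sketch, with the illustrative name `z_for_count`, applied to the coin problem above and to the die problem that follows (4200 successes in 10000 throws with P = 1/3):

```python
import math

def z_for_count(x, n, P):
    """Large-sample Z for an observed count x out of n under H0: proportion = P."""
    Q = 1 - P
    return (x - n * P) / math.sqrt(n * P * Q)

z_coin = z_for_count(80, 144, 0.5)       # coin tossed 144 times, 80 heads
z_die = z_for_count(4200, 10000, 1 / 3)  # a throw of 1 or 2 obtained 4200 times

print(round(z_coin, 2), round(z_die, 1))
# |Z| below 1.96 is not significant at the 5% level; |Z| above 3 is highly significant.
```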
Problem:
A die is thrown 10000 times and a throw of 1 or 2 was obtained 4200 times. On the assumption
of random throwing do the data indicate an unbiased die?
Solution:
$\therefore Z = \dfrac{X - nP}{\sqrt{nPQ}} = \dfrac{4200 - 10000 \times \frac{1}{3}}{\sqrt{10000 \times \frac{2}{9}}} = \dfrac{4200 - 3333.3}{47.14} = \dfrac{866.7}{47.14} = 18.4$
Problem:
A manufacturer claimed that at least 95% of the equipment which he supplied to a factory
conformed to specification. An examination of a sample of 200 pieces of
equipment revealed that 18 were faulty. Test his claim at a significant level of (i)5%
(ii)1%.
Solution:
X = 200 − 18 = 182
$p = \dfrac{182}{200} = 0.91$.
Set the null hypothesis H0: P = 0.95, Q = 0.05. H1: P < 0.95.
$Z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \dfrac{0.91 - 0.95}{\sqrt{\dfrac{0.95 \times 0.05}{200}}} = -\dfrac{0.04}{0.0154} = -2.6$
(i) The alternative hypothesis is left tailed and the critical
value of Z at the 5% level of significance for the left tail is -1.645.
Since Z = -2.6 < -1.645, the claim is rejected at the 5% level.
Problem:
A sample of 1000 products from a factory are examined and found to be 2.5% defective. Another
sample of 1500 similar products from another factory are found to have only 2% defective. Can
we conclude that the products of the first factory are inferior to those of the second?
Solution:
Problem:
A machine puts out 16 imperfect articles in a sample of 500 articles. After the machine
is overhauled it puts out 3 defective articles in a sample of 100. Has the machine improved?
Solution:
Consider two different normal populations with means μ1 and μ2 and s.d. σ1 and σ2 respectively. Let a
sample of size n1 be drawn from the first population and an independent sample of size n2 be
drawn from the second population. Let x̄1 be the mean of the sample from the first
population and x̄2 be the mean of the sample from the second population. If the sample sizes
are large we know x̄1 is a normal variate with mean μ1 and variance $\dfrac{\sigma_1^2}{n_1}$ and x̄2 is an independent
normal variate with mean μ2 and variance $\dfrac{\sigma_2^2}{n_2}$.
The test statistic becomes $Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$ which can be tested at any level of
significance.
Problem:
The number of accidents per day were studied for 144 days in Madras city and for 100 days in
Delhi city. The mean numbers of accidents and the s.ds were respectively 4.5 and 1.2 for Madras
city and 5.4 and 1.5 for Delhi city. Is Madras city more prone to accidents than Delhi city?
Solution:
∴ |Z| = 4.99 > 3, so we reject the hypothesis that the two cities have the same
accident rate. Since Delhi city has a higher mean number of accidents than Madras city,
Delhi is the more prone to accidents.
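The value of Z quoted above can be checked with a short computation. This is a sketch with an illustrative function name; the sample s.ds are used in place of the population s.ds, as is usual for large samples.

```python
import math

def z_two_means(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample Z for H0: the two population means are equal."""
    return (mean1 - mean2) / math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Madras: 144 days, mean 4.5, s.d. 1.2.  Delhi: 100 days, mean 5.4, s.d. 1.5.
z_cities = z_two_means(4.5, 1.2, 144, 5.4, 1.5, 100)
print(round(z_cities, 2))

# The same function covers a common s.d., e.g. the rice yields with s.d. 11:
z_rice = z_two_means(210, 11, 100, 220, 11, 150)
print(round(z_rice, 2))
```

The sign of Z only shows which sample mean is larger; the test compares |Z| with the critical value.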
Problem:
The mean yields of rice from two places in a district were 210 kgs and 220 kgs per acre from 100
acres and 150 acres respectively. Can it be regarded that the sample were drawn from the same
district which has the s.d of 11kgs per acre?
Solution:
𝑛1 = 100; 𝑥1 = 210; 𝜍 = 11
𝑛2 = 150; 𝑥2 = 220
Set the null hypothesis 𝐻0 : 𝜇1 = 𝜇2
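$\therefore Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{210 - 220}{11\sqrt{\dfrac{1}{100} + \dfrac{1}{150}}} = -7.04$
|Z| = 7.04 > 3. The value is highly significant and hence we reject the null hypothesis.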
Hence the samples are certainly not from the same district with the s.d 11.
The s.d of weight of all students in a first grade college was found to be 4 kgs. Two samples are
drawn. The s.ds of the weight of 100 undergraduate students is 3.5kgs and 50 post graduate
students are 3 kgs. Test the significance of the difference of standard deviations of the samples at
5% level.
Solution:
$\therefore Z = \dfrac{s_1 - s_2}{\sigma\sqrt{\dfrac{1}{2n_1} + \dfrac{1}{2n_2}}} = \dfrac{3.5 - 3}{4\sqrt{\dfrac{1}{200} + \dfrac{1}{100}}} = 1.02$
Problem:
The mean production of wheat of a sample of 100 plots is 200kgs per acre with s.d of 10 kgs.
Another sample of 150 plots gives the mean production of wheat as 220kgs. With s.d of 12kgs.
Assuming the s.d of the 11kgs for the universe find at 1% level of significance ,whether two
results are consistent.
Solution:
∴ |Z| = 14.1 > 3. Hence the two means differ significantly at the 5% level and even at the 1% level.
For H0: σ1 = σ2,
$Z = \dfrac{s_1 - s_2}{\sigma\sqrt{\dfrac{1}{2n_1} + \dfrac{1}{2n_2}}} = \dfrac{10 - 12}{11\sqrt{\dfrac{1}{200} + \dfrac{1}{300}}} = -1.99$
Hence the difference of s.d. is significant at the 5% level and not significant at the 1% level.
∴ At the 1% level the difference between the s.ds is not significant but that between the means is
significant. Hence we can conclude that at the 1% level the two results are not consistent.
1.Test for the difference between the mean of a sample and that of a population
II. Test for the difference between the means of two samples
II.A. If x̄1 and x̄2 are the means of two independent samples of sizes n1 and n2 from a normal
population with mean µ and standard deviation σ, it is found that $\dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim N(0,1)$.
The test statistic is
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$ ………… (1)
II.B. Suppose the sample sizes are equal, i.e. n1 = n2 = n. Then we have n pairs of values.
Further we assume that the n pairs are independent. Then the test statistic t in (1) becomes
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{n\left(s_1^2 + s_2^2\right)}{2n - 2} \cdot \dfrac{2}{n}}}$.
$\therefore t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{n - 1}}}$ is a Student's t variate with
ν = n + n − 2 = 2n − 2 d.f.
II.(C) Suppose the sample sizes are equal and the n pairs of values in this case are not
independent.
The test statistic $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n - 1}}$ is used to test whether the mean of the differences is
significantly different from zero. In this case the d.f. is n − 1.
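Confidence limits (Fiducial limits). If σ is not known and n is small then
1. 95% confidence limits for µ are $\bar{x} - \dfrac{s\,t_{.05}}{\sqrt{n - 1}}$, $\bar{x} + \dfrac{s\,t_{.05}}{\sqrt{n - 1}}$
2. 99% confidence limits for µ are $\bar{x} - \dfrac{s\,t_{.01}}{\sqrt{n - 1}}$, $\bar{x} + \dfrac{s\,t_{.01}}{\sqrt{n - 1}}$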
A random sample of 10 boys has the following I.Q (intelligence quotients): 70, 120, 110, 101, 88,
83, 95, 98, 107, 100. Do these data support the assumption of a population
mean I.Q of 100?
Solution:
$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n} = \dfrac{1833.60}{10} = 183.36$.
Hence s = 13.54.
∴ │𝑡│ = .62(nearly).
∴ │𝑡│ = .62 < 𝑡.05. Hence the difference is not significant at 5% level. Hence 𝐻0 may
be accepted at 5% level hence the data support the assumption of population mean 100.
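The computation can be checked with the convention used in these notes (s² with divisor n, and t = (x̄ − μ)/(s/√(n − 1))). This is a sketch under one stated assumption: the tenth I.Q value is taken to be 83, the value that reproduces the sum of squares 1833.60 quoted in the solution.

```python
import math

# Assumption: the tenth observation is 83 (it reproduces the quoted 1833.60).
iq = [70, 120, 110, 101, 88, 83, 95, 98, 107, 100]
mu0 = 100  # hypothesised population mean

n = len(iq)
mean = sum(iq) / n
sum_sq = sum((x - mean) ** 2 for x in iq)   # sum of squared deviations
s = math.sqrt(sum_sq / n)                   # sample s.d. with divisor n
t = (mean - mu0) / (s / math.sqrt(n - 1))   # Student's t with n - 1 d.f.

print(round(mean, 1), round(sum_sq, 2), round(s, 2), round(t, 2))
```

The statistic is then compared with the table value of t for 9 d.f. at the chosen level.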
Problem:
It was found that a machine has produced pipes having a thickness of 0.50 mm. To determine whether
the machine is in proper working order a sample of 10 pipes is chosen for which the mean
thickness is 0.53 mm and the s.d is 0.03 mm. Test the hypothesis that the machine is in proper working
order using a level of significance of (1) .05 (2) .01
Solution :
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n - 1}} = \dfrac{0.53 - 0.50}{0.03/\sqrt{9}} = \dfrac{.03 \times 3}{.03} = 3$.
(i) The table value for ν = 9 d.f at 5% level of significance is t.05 = 2.26,
i.e. |t| = 3 > t.05, so the hypothesis is rejected at the 5% level.
(ii) The table value for ν = 9 d.f at 1% level of significance is t.01 = 3.25,
i.e. |t| = 3 < t.01, so the hypothesis is accepted at the 1% level.
Problem:
A group of 10 rats fed on a diet A and another group of 8 rats fed on a different diet B recorded
the following increase in weight in gms.
Diet A    5   6   8   1   12   4    3   9   6   10
Diet B    2   3   6   8   1    10   2   8   -   -
Solution :
Given 𝑛1 = 10; 𝑛2 = 8.
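Mean of the first sample $\bar{x}_1 = \dfrac{5 + 6 + \cdots + 10}{10} = \dfrac{64}{10} = 6.4$.
Mean of the second sample $\bar{x}_2 = \dfrac{2 + 3 + \cdots + 8}{8} = \dfrac{40}{8} = 5.0$.
The variances s1² and s2² of the first and second samples can be found as s1² = 10.24 and s2² = 10.25.
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{6.4 - 5}{\sqrt{\dfrac{10 \times 10.24 + 8 \times 10.25}{10 + 8 - 2}\left(\dfrac{1}{10} + \dfrac{1}{8}\right)}}$
$= \dfrac{1.4}{\sqrt{11.525\,(.1 + .125)}} = 0.87$.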
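The pooled statistic for the two diets can be recomputed from the raw data. The sketch below follows the convention of these notes (sample variances with divisors n1 and n2, pooled with n1 + n2 − 2); the function name is illustrative.

```python
import math

def pooled_t(sample1, sample2):
    """Student's t for the difference of two independent sample means."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    s1_sq = sum((x - m1) ** 2 for x in sample1) / n1   # variance, divisor n1
    s2_sq = sum((x - m2) ** 2 for x in sample2) / n2   # variance, divisor n2
    pooled = (n1 * s1_sq + n2 * s2_sq) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

diet_a = [5, 6, 8, 1, 12, 4, 3, 9, 6, 10]
diet_b = [2, 3, 6, 8, 1, 10, 2, 8]

t_stat, dof = pooled_t(diet_a, diet_b)
print(round(t_stat, 2), dof)
```

The value of t is small compared with the 5% table value for 16 d.f., so the two diets do not differ significantly in this sample.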
Problem:
The table gives the biological values of protein from 6 cows milk and 6 buffalo‟s
milk . Examine whether the differences are significant .
1.8 2.0
2.0 1.8
1.9 1.8
1.6 2.0
1.8 2.1
1.5 1.9
Solution:
𝑠22 = .01.
Set the null hypothesis H0: μ1 = μ2. Under this null hypothesis the test statistic is $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{n - 1}}}$
and the d.f ν = 2n − 2 = 10.
$t = \dfrac{-.1}{\sqrt{\dfrac{.03 + .01}{5}}} = \dfrac{-.1}{\sqrt{\dfrac{.04}{5}}} = -1.12$.
is 2.23.
Problem:
Ten soldiers participated in a shooting competition in the first week. After intensive training they
participated in the competition in the second week. Their scores before and after coaching were
given as follows.
Soldiers 1 2 3 4 5 6 7 8 9 10
Score 67 24 57 55 63 54 56 68 33 43
before(x)
Score 70 38 58 58 56 67 68 75 42 38
after(y)
Do the data indicate that the soldiers have benefited from the training?
Solution:
Here we are concerned with the same set of soldiers in the 2 competitions and their scores,
which are related to each other because of the intensive training. We compute the difference in
their scores z = y − x and calculate the mean z̄ and the s.d of z as follows.
x     y     z = y − x    z − z̄    (z − z̄)²
67    70    3            -2       4
24    38    14           9        81
57    58    1            -4       16
55    58    3            -2       4
63    56    -7           -12      144
54    67    13           8        64
56    68    12           7        49
68    75    7            2        4
33    42    9            4        16
43    38    -5           -10      100
Total       50                    482
$\bar{z} = \dfrac{50}{10} = 5$; $s^2 = \dfrac{\sum (z - \bar{z})^2}{10} = \dfrac{482}{10} = 48.2$
Hence the null hypothesis is accepted .We can conclude that there is no significant
improvement in the training .
INTRODUCTION:
𝝌𝟐 -TEST.
Problem:
A random sample of size 25from a population gives the sample standard deviation 8.5.Test the
hypothesis that the population s.d is 10.
Solution:
Given σ=10,n=25,s=8.5 𝐻0 : 𝜍 = 10
$\chi^2 = \dfrac{n s^2}{\sigma_0^2} = \dfrac{25 \times 8.5^2}{100} = 18.06$.
Test the hypothesis that σ=8 given that s=10 for a random sample of size 51.
Solution:
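Given n = 51, σ = 8, s = 10.
Let H0: σ = 8.
$\chi^2 = \dfrac{n s^2}{\sigma_0^2} = \dfrac{51 \times 10^2}{8^2} = 79.7$.
Since n is large, $Z = \sqrt{2\chi^2} - \sqrt{2n - 1} = \sqrt{2 \times 79.7} - \sqrt{2 \times 51 - 1}$
= 2.58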
II. χ²-TEST TO TEST THE GOODNESS OF FIT
The χ²-distribution can be used to test the goodness of fit. This test can also be applied
to test for compatibility of observed frequencies and theoretical frequencies. Let
o1, o2, …, on be the observed frequencies and e1, e2, …, en be the corresponding expected
frequencies such that $\sum_{i=1}^{n} o_i = N = \sum_{i=1}^{n} e_i$ where N is the number of members in the
population.
Define $\chi^2 = \sum_{i=1}^{n} \dfrac{(o_i - e_i)^2}{e_i}$. It is a χ² variable with n − 1 degrees of freedom.
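The theory predicts that the proportion of an object available in four groups A, B, C, D should be
9:3:3:1. In an experiment among 1600 items of this object the numbers in the four groups were
882, 313, 287 and 118. Use the χ²-test to verify whether the experimental results support the theory.
Solution:
Σ oi = 882 + 313 + 287 + 118 = 1600
The expected frequencies in the ratio 9:3:3:1 are 900, 300, 300 and 100, so that Σ ei = 1600 = Σ oi.
$\therefore \chi^2 = \sum \dfrac{(o_i - e_i)^2}{e_i} = \dfrac{324}{900} + \dfrac{169}{300} + \dfrac{169}{300} + \dfrac{324}{100} = 4.73$.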
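The statistic for this problem can be computed as follows. This is a sketch with an illustrative function name; it takes the fourth observed frequency as 118, the value for which the observed frequencies add up to 1600.

```python
def chi_square_gof(observed, ratio):
    """Chi-square statistic for observed counts against a theoretical ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected

observed = [882, 313, 287, 118]   # groups A, B, C, D
chi2, expected = chi_square_gof(observed, ratio=[9, 3, 3, 1])

print(expected)
print(round(chi2, 2))
# With 4 - 1 = 3 degrees of freedom the 5% table value of chi-square is 7.815,
# so a statistic below that value does not contradict the theory.
```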
Problem:
Fit a poisson distribution for the following data and test the goodness of fit.
x 0 1 2 3 4 5 6 Total
f 273 70 30 7 7 2 1 390
$\bar{x} = \dfrac{\sum f x}{\sum f} = \dfrac{70 + 60 + 21 + 28 + 10 + 6}{273 + 70 + 30 + 7 + 7 + 2 + 1} = \dfrac{195}{390}$
∴ λ = 1/2.
$f(x) = \dfrac{N e^{-\lambda} \lambda^x}{x!} = \dfrac{390}{\sqrt{e}\,x!}\left(\dfrac{1}{2}\right)^x$; x = 0, 1, 2, …, 6.
The expected frequencies are given by $f(0) = \dfrac{390}{\sqrt{e}}\left(\dfrac{1}{2}\right)^0 = 236.4$;
$f(1) = \dfrac{390}{\sqrt{e}\,1!}\left(\dfrac{1}{2}\right)^1 = 118.2$, …………………,
$f(6) = \dfrac{390}{\sqrt{e}\,6!}\left(\dfrac{1}{2}\right)^6 = 0.005$.
x      0       1       2      3     4    5    6    Total
oi     273     70      30     7     7    2    1    390
ei     236.4   118.2   29.5   4.9   .6   .1   0    389.7
Since the sum of the expected frequencies is 389.7, it can be adjusted in the last
frequencies by adding .3. Pooling the classes x = 3, 4, 5, 6 we get
oi     273     70      30     17    390
ei     236.4   118.2   29.5   5.9   390
Degrees of freedom = 7 − 1 − 1 − 3 = 2.
Since χ² = 46.3 > 5.99 = the table value of χ².05, it is highly significant at 5% level of significance.
Hence the hypothesis is rejected at 5% level and hence the Poisson distribution is not a good fit
to the data.
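The fitting and testing steps can be retraced in code. This is a sketch only: pooling the classes x ≥ 3 and assigning the remaining expected frequency to the pooled class are choices made here to mirror the hand computation, and because the expected frequencies are not rounded the value of χ² differs somewhat from the hand value, though the conclusion is the same.

```python
import math

xs = range(7)
freq = [273, 70, 30, 7, 7, 2, 1]

N = sum(freq)
lam = sum(x * f for x, f in zip(xs, freq)) / N   # mean of the data = lambda

# Expected Poisson frequencies N * e^(-lambda) * lambda^x / x!
expected = [N * math.exp(-lam) * lam**x / math.factorial(x) for x in xs]

# Pool the classes x >= 3; the pooled expected frequency takes the remainder
# so that observed and expected totals agree.
obs_pooled = freq[:3] + [sum(freq[3:])]
exp_pooled = expected[:3] + [N - sum(expected[:3])]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
print(lam, [round(e, 1) for e in expected])
print(round(chi2, 1))
# The statistic is compared with 5.99, the 5% table value for 2 d.f.
```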
Index Numbers - Types of index numbers - Tests - Unit test commodity reversal test, time
reversal test, factor reversal test - Chain index numbers - cost of living index – Interpolation -
Finite differences operators - Newton’s forward, backward interpolation formulae, Lagrange’s
formula.
INDEX NUMBERS
Index Numbers :
An index number is a widely used statistical device for comparing the level of a certain
phenomenon with the level of the same phenomenon at some standard period.
In the computation of an index number, if the base year used for comparison is kept constant
throughout, then it is called fixed base method. If on the other hand, for every year the previous
year is used as a base for comparison, then the method is called chain base method.
A) Aggregate method
B) Average of price relatives method.
In this method the total of the current year prices for various commodities is divided by the total of the
base year prices. In symbols, if p0 denotes the price in the base year and p1 the price in the current
year,
$P_{01} = \dfrac{\sum p_1}{\sum p_0} \times 100$, where Σp1 is the total of the current year prices and Σp0 is the total of the base year prices.
From the following data construct the simple aggregative index number for 1992.
Commodities    Price in 1991 (Rs)    Price in 1992 (Rs)
Rice           7                     8
Wheat          3.5                   3.75
Oil            40                    45
Gas            78                    85
Flour          4.5                   5.25
Total          133                   147
Solution :
$P_{01} = \dfrac{147}{133} \times 100$
= 110.5
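Price relatives: Denoting the price of a commodity in the base year by $p_0$ and the price in the
current year by $p_1$, the ratio $\frac{p_1}{p_0}$ is called the price relative.
Index number for the current year:
\[ P_{01} = \frac{p_1}{p_0} \times 100 \]
i) The arithmetic mean index number is
\[ P_{01} = \frac{\sum \left( \frac{p_1}{p_0} \times 100 \right)}{n} \]
ii) The geometric mean index number is
\[ P_{01} = \left[ \prod \left( \frac{p_1}{p_0} \times 100 \right) \right]^{1/n}, \quad \text{where } \prod \text{ denotes the product.} \]
Hence
\[ \log P_{01} = \frac{\sum \log \left( \frac{p_1}{p_0} \times 100 \right)}{n} \]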
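Problem:
For the above example, find the index number from the price relatives, taking 1991 as the base
year, using i) the arithmetic mean ii) the geometric mean.
Solution:
Commodities    Price in 1991 (p0)    Price in 1992 (p1)    p1/p0 × 100    log(p1/p0 × 100)
Rice           7                     8                     114.3          2.0580
Wheat          3.5                   3.75                  107.1          2.0298
Oil            40                    45                    112.5          2.0512
Gas            78                    85                    109.0          2.0374
Flour          4.5                   5.25                  116.7          2.0671
Total                                                      559.6          10.2435
i) Arithmetic mean:
\[ P_{01} = \frac{559.6}{5} = 111.92 \]
ii) Geometric mean:
\[ \log P_{01} = \frac{10.2435}{5} = 2.0487, \quad \text{so } P_{01} = \text{antilog}(2.0487) = 111.9 \]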
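The three unweighted index numbers used so far can be checked with a short Python sketch. The function names are hypothetical and chosen only for illustration. The price relatives are not rounded before averaging, so the last digit may differ slightly from the hand computation.

```python
import math

prices_1991 = {"Rice": 7, "Wheat": 3.5, "Oil": 40, "Gas": 78, "Flour": 4.5}
prices_1992 = {"Rice": 8, "Wheat": 3.75, "Oil": 45, "Gas": 85, "Flour": 5.25}

def simple_aggregate_index(p0, p1):
    """Total of current prices divided by total of base prices, times 100."""
    return sum(p1.values()) / sum(p0.values()) * 100

def price_relatives(p0, p1):
    """Price relative (p1 / p0) x 100 for every commodity."""
    return {name: p1[name] / p0[name] * 100 for name in p0}

def arithmetic_mean_index(p0, p1):
    rel = price_relatives(p0, p1)
    return sum(rel.values()) / len(rel)

def geometric_mean_index(p0, p1):
    # Mean of the logarithms, then the antilogarithm, as in the text.
    rel = price_relatives(p0, p1)
    mean_log = sum(math.log10(r) for r in rel.values()) / len(rel)
    return 10 ** mean_log

aggregate = simple_aggregate_index(prices_1991, prices_1992)
am_index = arithmetic_mean_index(prices_1991, prices_1992)
gm_index = geometric_mean_index(prices_1991, prices_1992)
print(round(aggregate, 1), round(am_index, 1), round(gm_index, 1))
```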
Problem:
From the following data of the wholesale price of rice for 5 years, construct the index
numbers taking (i) 1987 as the base (ii) 1990 as the base.
Years    Price of rice per kg    Index numbers (base 1987)
From the index number table we observe that from 1987 to 1988 there is an increase of
20% in the price of rice per kg; from 1987 to 1989 there is an increase of 30% in the price of
rice per kg, etc.
Commodity    Prices in three successive years
Rice         700    750    825
Wheat        540    575    600
Ragi         300    325    310
Cholam       250    280    295
Flour        320    330    335
Ravai        325    350    360
Price relatives with the first year as base:
Commodity    Prices             Second year                Third year
Rice         700   750   825    750/700 × 100 = 107.1      825/700 × 100 = 117.9
Wheat        540   575   600    575/540 × 100 = 106.5      600/540 × 100 = 111.1
Ragi         300   325   310    325/300 × 100 = 108.3      310/300 × 100 = 103.3
Cholam       250   280   295    280/250 × 100 = 112        295/250 × 100 = 118
Flour        320   330   335    330/320 × 100 = 103.1      335/320 × 100 = 104.7
Ravai        325   350   360    350/325 × 100 = 107.7      360/325 × 100 = 110.8
From the following average prices of the three groups of commodities given in rupees per unit
find (i) fixed base index number (ii) chain base index numbers with 1988 as the base year and
A 2 3 4 5 6
B 8 10 12 15 18
C 4 5 8 10 12
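Solution:
(i) Fixed base index numbers (first year, 1988, as base)
Group    1st year    2nd year              3rd year              4th year              5th year
A        100         3/2 × 100 = 150       4/2 × 100 = 200       5/2 × 100 = 250       6/2 × 100 = 300
B        100         10/8 × 100 = 125      12/8 × 100 = 150      15/8 × 100 = 188      18/8 × 100 = 225
C        100         5/4 × 100 = 125       8/4 × 100 = 200       10/4 × 100 = 250      12/4 × 100 = 300
(ii) Chain base index numbers (each year compared with the previous year)
Group    1st year             2nd year              3rd year                4th year               5th year
A        2/2 × 100 = 100      3/2 × 100 = 150       4/3 × 100 = 133.3       5/4 × 100 = 125        6/5 × 100 = 120
B        8/8 × 100 = 100      10/8 × 100 = 125      12/10 × 100 = 120       15/12 × 100 = 125      18/15 × 100 = 120
C        4/4 × 100 = 100      5/4 × 100 = 125       8/5 × 100 = 160         10/8 × 100 = 125       12/10 × 100 = 120
Total    300                  400                   413.3                   375                    360
Index
number   100                  133.3                 137.8                   125                    120
(AM)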
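The difference between the two methods can be seen in a few lines of Python. This is a sketch only: the helper names are hypothetical, and the index for each year is taken as the arithmetic mean of the three group relatives, as in the table above.

```python
def fixed_base_relatives(prices):
    """Every period is compared with the first (base) period."""
    base = prices[0]
    return [p / base * 100 for p in prices]

def chain_base_relatives(prices):
    """Every period is compared with the period just before it."""
    return [100.0] + [cur / prev * 100 for prev, cur in zip(prices, prices[1:])]

def average_index(rows):
    """Arithmetic mean of the relatives, period by period."""
    return [sum(col) / len(col) for col in zip(*rows)]

groups = {
    "A": [2, 3, 4, 5, 6],
    "B": [8, 10, 12, 15, 18],
    "C": [4, 5, 8, 10, 12],
}

fixed = average_index([fixed_base_relatives(p) for p in groups.values()])
chain = average_index([chain_base_relatives(p) for p in groups.values()])
print("fixed base:", [round(v, 1) for v in fixed])
print("chain base:", [round(v, 1) for v in chain])
```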
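Though there are many formulae to calculate index numbers in this method, we give below
some standard formulae which are very often used.
Laspeyres' index number:
According to Laspeyres' method the prices of the commodities in the base year as well as the
current year are known, and they are weighted by the quantities used in the base year.
\[ LI_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 \]
Paasche's index number:
According to Paasche's method the current year quantities are taken as weights, and
hence Paasche's index number is defined by
\[ PI_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 \]
Marshall-Edgeworth index number:
According to this method the weight is the sum of the quantities of the base
period and the current period.
\[ MI_{01} = \frac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1} \times 100 \]
Bowley's index number:
\[ BI_{01} = \frac{1}{2} \left[ \frac{\sum p_1 q_0}{\sum p_0 q_0} + \frac{\sum p_1 q_1}{\sum p_0 q_1} \right] \times 100 \]
Fisher's ideal index number:
\[ I_{01} = \sqrt{ \frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1} } \times 100 \]
Kelley's index number:
According to Kelley, the weights may be taken as the quantities of a period which is
not necessarily the base year or the current year. The average quantity of two or more years
may be taken as the weight.
\[ KI_{01} = \frac{\sum p_1 q}{\sum p_0 q} \times 100, \quad \text{where } q \text{ is the average quantity of two or more years.} \]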
Example :
Calculate i) Laspeyres' ii) Paasche's iii) Fisher's index number for the data given
below. Hence or otherwise find Edgeworth's and Bowley's index numbers.
               1990            1992
Commodities    p0      q0      p1      q1
A              2       10      3       12
B              5       16      6.5     11
C              3.5     18      4       16
D              7       21      9       25
E              3       11      3.5     20
Solution:
               1990            1992
Commodities    p0      q0      p1      q1      p0q0    p0q1    p1q0    p1q1
A              2       10      3       12      20      24      30      36
B              5       16      6.5     11      80      55      104     71.5
C              3.5     18      4       16      63      56      72      64
D              7       21      9       25      147     175     189     225
E              3       11      3.5     20      33      60      38.5    70
Total                                          343     370     433.5   466.5
i) Laspeyres' index number
\[ = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{433.5}{343} \times 100 = 126.4 \]
ii) Paasche's index number
\[ = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{466.5}{370} \times 100 = 126.1 \]
iii) Fisher's ideal index number
\[ = \sqrt{ \frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1} } \times 100 = \sqrt{ \frac{433.5 \times 466.5}{343 \times 370} } \times 100 = 126.2 \]
iv) Bowley's index number
\[ = \frac{LI_{01} + PI_{01}}{2} = \frac{126.4 + 126.1}{2} = 126.25 \]
v) Edgeworth's index number
\[ = \frac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1} \times 100 = \frac{433.5 + 466.5}{343 + 370} \times 100 = \frac{900}{713} \times 100 = 126.2 \]
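As a check on the arithmetic, the weighted index numbers of this example can be recomputed with the following Python sketch. The function name `weighted_indices` and the tuple layout are assumptions made only for this illustration. Bowley's value is computed here from the unrounded Laspeyres and Paasche values, so it prints as 126.2 rather than 126.25.

```python
import math

# (p0, q0, p1, q1) for commodities A to E
data = [
    (2, 10, 3, 12),
    (5, 16, 6.5, 11),
    (3.5, 18, 4, 16),
    (7, 21, 9, 25),
    (3, 11, 3.5, 20),
]

def weighted_indices(rows):
    """Laspeyres, Paasche, Fisher, Bowley and Marshall-Edgeworth index numbers."""
    p0q0 = sum(p0 * q0 for p0, q0, p1, q1 in rows)
    p0q1 = sum(p0 * q1 for p0, q0, p1, q1 in rows)
    p1q0 = sum(p1 * q0 for p0, q0, p1, q1 in rows)
    p1q1 = sum(p1 * q1 for p0, q0, p1, q1 in rows)
    laspeyres = p1q0 / p0q0 * 100
    paasche = p1q1 / p0q1 * 100
    return {
        "laspeyres": laspeyres,
        "paasche": paasche,
        "fisher": math.sqrt(laspeyres * paasche),   # geometric mean of L and P
        "bowley": (laspeyres + paasche) / 2,        # arithmetic mean of L and P
        "marshall_edgeworth": (p1q0 + p1q1) / (p0q0 + p0q1) * 100,
    }

result = weighted_indices(data)
for name, value in result.items():
    print(name, round(value, 1))
```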
In this method the index number is computed by taking the weighted arithmetic mean of the
price relatives. Thus if P is the price relative and V is the value weight $p_0 q_0$, then the index
number is
\[ P_{01} = \frac{\sum PV}{\sum V} \]
Example :
Commodity    Price in 1990 (p0)    Price in 1992 (p1)    Quantity in 1990 (q0)    P = p1/p0 × 100    V = p0q0    PV
An index number is said to be an ideal index number if it satisfies the following three
tests.
Time reversal test:
Let $I_{01}$ denote the index number of the current year $y_1$ relative to the base year $y_0$,
without considering the percentage, and let $I_{10}$ denote the index number of the base year $y_0$
relative to the current year $y_1$, without considering the percentage. If $I_{01} \times I_{10} = 1$, then
we say that the index number satisfies the time reversal test.
Factor reversal test:
In this test the prices and quantities are interchanged, without considering the percentage,
and the index number must satisfy the relation
\[ I_{pq} \times I_{qp} = \frac{\sum p_1 q_1}{\sum p_0 q_0} \]
where $I_{pq}$ is the price index and $I_{qp}$ is the quantity index obtained by interchanging prices
and quantities.
Commodity reversal test:
The index number should be independent of the order in which the different commodities are
considered. This test is satisfied by almost all index numbers.
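Problem:
Construct Fisher's index number and show that it satisfies both the factor reversal test
and the time reversal test.
Commodity                             A      B      C      D
Base year price in Rupees             5      6      4      3
Base year quantity in Quintals        50     40     120    30
Current year price in Rupees          7      8      5      4
Current year quantity in Quintals     60     50     110    35
Solution:
             Base year       Current year
Commodity    p0      q0      p1      q1      p0q0    p0q1    p1q0    p1q1
A            5       50      7       60      250     300     350     420
B            6       40      8       50      240     300     320     400
C            4       120     5       110     480     440     600     550
D            3       30      4       35      90      105     120     140
Total                                        1060    1145    1390    1510
Fisher's index number is
\[ I_{01} = \sqrt{ \frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1} } \times 100 = \sqrt{ \frac{1390}{1060} \times \frac{1510}{1145} } \times 100 = 131.5 \]
Time reversal test:
Without considering the percentage,
\[ I_{01} = \sqrt{ \frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1} } = \sqrt{ \frac{1390}{1060} \times \frac{1510}{1145} } \]
\[ I_{10} = \sqrt{ \frac{\sum p_0 q_1}{\sum p_1 q_1} \times \frac{\sum p_0 q_0}{\sum p_1 q_0} } = \sqrt{ \frac{1145}{1510} \times \frac{1060}{1390} } \]
\[ \therefore I_{01} \times I_{10} = 1 \]
Factor reversal test:
\[ I_{pq} = \sqrt{ \frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1} } = \sqrt{ \frac{1390}{1060} \times \frac{1510}{1145} } \]
\[ I_{qp} = \sqrt{ \frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1} } = \sqrt{ \frac{1145}{1060} \times \frac{1510}{1390} } \]
\[ \therefore I_{pq} \times I_{qp} = \frac{1510}{1060} = \frac{\sum p_1 q_1}{\sum p_0 q_0} \]
Hence Fisher's index number satisfies both the time reversal test and the factor reversal test.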
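Both tests can also be verified numerically. In the sketch below (helper names are hypothetical), the time reversal test is applied by interchanging the base and current years, and the factor reversal test by interchanging prices and quantities, before recomputing Fisher's formula as a ratio without the factor 100.

```python
import math

# (p0, q0, p1, q1) for commodities A, B, C, D
rows = [(5, 50, 7, 60), (6, 40, 8, 50), (4, 120, 5, 110), (3, 30, 4, 35)]

def sums(rows):
    """Return (sum p0q0, sum p0q1, sum p1q0, sum p1q1)."""
    p0q0 = sum(p0 * q0 for p0, q0, p1, q1 in rows)
    p0q1 = sum(p0 * q1 for p0, q0, p1, q1 in rows)
    p1q0 = sum(p1 * q0 for p0, q0, p1, q1 in rows)
    p1q1 = sum(p1 * q1 for p0, q0, p1, q1 in rows)
    return p0q0, p0q1, p1q0, p1q1

def fisher_ratio(rows):
    """Fisher's ideal index as a ratio (not multiplied by 100)."""
    p0q0, p0q1, p1q0, p1q1 = sums(rows)
    return math.sqrt((p1q0 / p0q0) * (p1q1 / p0q1))

def swap_time(rows):
    """Interchange base year and current year."""
    return [(p1, q1, p0, q0) for p0, q0, p1, q1 in rows]

def swap_factors(rows):
    """Interchange prices and quantities."""
    return [(q0, p0, q1, p1) for p0, q0, p1, q1 in rows]

i01 = fisher_ratio(rows)
i10 = fisher_ratio(swap_time(rows))
iqp = fisher_ratio(swap_factors(rows))
p0q0, _, _, p1q1 = sums(rows)

print("Fisher index:", round(i01 * 100, 1))
print("time reversal holds:", math.isclose(i01 * i10, 1.0))
print("factor reversal holds:", math.isclose(i01 * iqp, p1q1 / p0q0))
```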
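Problem:
Find the missing price in the following data if the ratio between Laspeyres' and Paasche's index
numbers is 25:24.
               Base year             Current year
Commodities    Price    Quantity     Price    Quantity
A              1        15           2        15
B              2        15           ?        30
Solution :
Let the missing price be x.
               Base year       Current year
Commodities    p0      q0      p1      q1      p0q0    p0q1    p1q0       p1q1
A              1       15      2       15      15      15      30         30
B              2       15      x       30      30      60      15x        30x
Total                                          45      75      30 + 15x   30 + 30x
\[ LI_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{30 + 15x}{45} \times 100 \]
\[ PI_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{30 + 30x}{75} \times 100 \]
Given $LI_{01} : PI_{01} = 25 : 24$,
\[ \therefore \frac{30 + 15x}{45} \times 100 : \frac{30 + 30x}{75} \times 100 = 25 : 24 \]
\[ \therefore 24 \left( \frac{30 + 15x}{45} \right) = 25 \left( \frac{30 + 30x}{75} \right) \]
\[ 72 (30 + 15x) = 45 (30 + 30x) \]
\[ 8 (30 + 15x) = 5 (30 + 30x) \]
\[ 8 \times 15 (2 + x) = 5 \times 30 (1 + x) \]
\[ 4 (2 + x) = 5 (1 + x) \]
\[ \therefore 8 + 4x = 5 + 5x \]
\[ 5x - 4x = 8 - 5 \]
\[ x = 3 \]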
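The result can be confirmed with exact rational arithmetic. The sketch below is illustrative only (the names `indices` and `gap` are hypothetical); it uses the fact that the condition 24 L = 25 P is linear in the unknown price x, so two evaluations determine the root.

```python
from fractions import Fraction

# (p0, q0, p1, q1); None marks the missing current-year price of commodity B
ROWS = [(1, 15, 2, 15), (2, 15, None, 30)]

def indices(x):
    """Laspeyres and Paasche ratios (without the factor 100) for a trial price x."""
    rows = [(p0, q0, Fraction(x) if p1 is None else p1, q1)
            for p0, q0, p1, q1 in ROWS]
    p0q0 = sum(p0 * q0 for p0, q0, p1, q1 in rows)
    p0q1 = sum(p0 * q1 for p0, q0, p1, q1 in rows)
    p1q0 = sum(p1 * q0 for p0, q0, p1, q1 in rows)
    p1q1 = sum(p1 * q1 for p0, q0, p1, q1 in rows)
    return Fraction(p1q0) / p0q0, Fraction(p1q1) / p0q1

def gap(x):
    """24 L - 25 P, which must vanish when L : P = 25 : 24."""
    laspeyres, paasche = indices(x)
    return 24 * laspeyres - 25 * paasche

# gap(x) is linear in x, so its root follows from the values at x = 0 and x = 1.
x = -gap(0) / (gap(1) - gap(0))
laspeyres, paasche = indices(x)
print("x =", x, " L/P =", laspeyres / paasche)
```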
Interpolation:
Definition:
Interpolation is the process of finding the most appropriate estimate for missing data. It is the art
of reading between the lines of a table.
We may also require information for a future period, in which case the process of
estimating the most appropriate value is known as extrapolation. There are two methods of
interpolation.
i) Graphic method: a simple method in which we plot the available data on a graph
sheet and read off the value for the missing period from the graph itself.
ii) Algebraic method: there are several algebraic methods of interpolation, of which we deal
with the following.
Finite Differences:
Definition:
Example :
x      0    1     2    3     4
U_x    8    11    9    15    6
The difference table is
x    U_x    ΔU_x    Δ²U_x    Δ³U_x    Δ⁴U_x
0    8      3       -5       13       -36
1    11     -2      8        -23
2    9      6       -15
3    15     -9
4    6
The differences $\Delta U_x$, $\Delta^2 U_x$, etc. are called forward differences. In contrast to the forward
differences we have another kind of differences known as backward differences.
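A forward difference table such as the one above can be generated mechanically. The following Python sketch uses a hypothetical helper `forward_differences`; each pass subtracts neighbouring entries of the previous column.

```python
def forward_differences(values):
    """Return the columns [ΔU, Δ²U, ...] of the forward difference table."""
    table = []
    column = list(values)
    while len(column) > 1:
        # Each new column holds the differences of neighbouring entries.
        column = [b - a for a, b in zip(column, column[1:])]
        table.append(column)
    return table

columns = forward_differences([8, 11, 9, 15, 6])
for order, column in enumerate(columns, start=1):
    print("order", order, column)
```

The first entry of each column is the leading difference at the first tabulated point, which is what Newton's forward formula uses.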
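Theorem:
If $U_x$ is a polynomial of degree n in x, then
\[ \Delta^r U_x = \begin{cases} \text{constant} & \text{if } r = n \\ 0 & \text{if } r > n \end{cases} \]
i.e. the nth order difference of a polynomial of degree n is constant, and differences of
order higher than n are zero.
Proof:
Let $U_x = a_0 x^n + a_1 x^{n-1} + \cdots + a_n$ and let h be the interval of differencing. Then
\[ \Delta U_x = U_{x+h} - U_x = \left[ a_0 (x+h)^n + a_1 (x+h)^{n-1} + \cdots + a_n \right] - \left[ a_0 x^n + a_1 x^{n-1} + \cdots + a_n \right] \]
which is a polynomial of degree n - 1 with leading term $a_0 n h x^{n-1}$. Repeating the argument n times,
\[ \therefore \Delta^n U_x = a_0 \, n (n-1)(n-2) \cdots 2 \cdot 1 \; h^n x^0 = a_0 \, n! \, h^n = \text{constant, and } \Delta^r U_x = 0 \text{ for } r > n. \]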
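Problem:
Find $\Delta U_x$ and $\Delta^2 U_x$ for
i) $U_x = ab^{cx}$   ii) $U_x = \frac{x}{x^2 + 7x + 12}$, taking the interval of differencing as h.
i)
\[ \Delta U_x = U_{x+h} - U_x = ab^{c(x+h)} - ab^{cx} = ab^{cx} b^{ch} - ab^{cx} = ab^{cx} \left( b^{ch} - 1 \right) \]
\[ \Delta^2 U_x = \left( b^{ch} - 1 \right) \Delta \left( ab^{cx} \right) = \left( b^{ch} - 1 \right)^2 ab^{cx} \]
ii)
\[ U_x = \frac{x}{x^2 + 7x + 12} = \frac{4}{x+4} - \frac{3}{x+3} \quad \text{(by partial fractions)} \]
Taking h = 1,
\[ \Delta U_x = \left[ \frac{4}{(x+1)+4} - \frac{3}{(x+1)+3} \right] - \left[ \frac{4}{x+4} - \frac{3}{x+3} \right] \]
\[ = \frac{4}{x+5} - \frac{3}{x+4} - \frac{4}{x+4} + \frac{3}{x+3} \]
\[ = \frac{4}{x+5} - \frac{7}{x+4} + \frac{3}{x+3} \]
Similarly,
\[ \Delta^2 U_x = \frac{4}{x+6} - \frac{11}{x+5} + \frac{10}{x+4} - \frac{3}{x+3} \]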
Problem:
If $U_0 = 1$, $U_1 = 5$, $U_2 = 8$, $U_3 = 3$, $U_4 = 7$, $U_5 = 0$, find $\Delta^5 U_0$.
x    U_x    ΔU_x    Δ²U_x    Δ³U_x    Δ⁴U_x    Δ⁵U_x
0    1      4       -1       -7       24       -61
1    5      3       -8       17       -37
2    8      -5      9        -20
3    3      4       -11
4    7      -7
5    0
Hence $\Delta^5 U_0 = -61$.
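Problem:
Find the missing value in the following table. Explain why the resulting value differs from $3^3$.
x      0    1    2    3    4
U_x    1    3    9    -    81
Let the missing value be a.
x    U_x    ΔU_x      Δ²U_x      Δ³U_x       Δ⁴U_x
0    1      2         4          a - 19      124 - 4a
1    3      6         a - 15     105 - 3a
2    9      a - 9     90 - 2a
3    a      81 - a
4    81
Since four values are known, $U_x$ is assumed to be a polynomial of degree 3, so that the fourth
differences are zero.
In particular $\Delta^4 U_0 = 0$.
Hence 124 - 4a = 0
a = 31
The tabulated values follow $U_x = 3^x$, which is not a polynomial, so the interpolated value 31
differs from $3^3 = 27$.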
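The same idea works for any equally spaced table with one missing entry: expanding $\Delta^n U_0 = (E - 1)^n U_0 = 0$ gives an alternating binomial sum that is linear in the unknown. The Python sketch below is illustrative; `fill_missing` is a hypothetical helper name, and the polynomial assumption is the one made in the solution above.

```python
from fractions import Fraction
from math import comb

def fill_missing(values):
    """Estimate the single missing entry (marked None) of an equally spaced table.

    With n + 1 tabulated positions and n known values, U is assumed to be a
    polynomial of degree n - 1, so the n-th difference of U_0 must vanish:
    sum over k of (-1)^(n-k) * C(n, k) * U_k = 0.
    """
    n = len(values) - 1
    k_missing = values.index(None)
    known = sum((-1) ** (n - k) * comb(n, k) * u
                for k, u in enumerate(values) if u is not None)
    coeff = (-1) ** (n - k_missing) * comb(n, k_missing)
    return Fraction(-known, coeff)

a = fill_missing([1, 3, 9, None, 81])
print("missing value:", a)
```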
Problem:
\[ \Delta^5 U_x = 0 \text{ for all } x \]
In particular, $\Delta^5 U_0 = 0$
\[ (E - 1)^5 U_0 = 0 \]
\[ (E^5 - 5E^4 + 10E^3 - 10E^2 + 5E - 1) U_0 = 0 \]
\[ \therefore U_5 - 5U_4 + 10U_3 - 10U_2 + 5U_1 - U_0 = 0 \]
\[ \therefore 501 - 5 \times 467 + 10a - 10 \times 421 + 5 \times 391 - 363 = 0 \]
\[ \therefore 501 - 2335 + 10a - 4210 + 1955 - 363 = 0 \]
\[ 10a - 4452 = 0 \]
\[ a = 445.2 \text{ lakhs} \]
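Problem:
x      0    5     10    15    20    25
U_x    7    11    ?     18    ?     32
Here two values are missing. Let the missing values be a and b. Since four values are known,
$U_x$ is taken to be a polynomial of degree 3, so that the fourth differences are zero.
In particular, $\Delta^4 U_0 = 0$ and $\Delta^4 U_1 = 0$, where $U_0, U_1, \ldots, U_5$ denote the successive
tabulated values.
\[ (E - 1)^4 U_0 = 0 \]
\[ \therefore (E^4 - 4E^3 + 6E^2 - 4E + 1) U_0 = 0 \]
\[ U_4 - 4U_3 + 6U_2 - 4U_1 + U_0 = 0 \]
\[ b - 72 + 6a - 44 + 7 = 0 \]
\[ \text{i.e. } 6a + b = 109 \quad \ldots (1) \]
Taking $\Delta^4 U_1 = 0$,
\[ (E^4 - 4E^3 + 6E^2 - 4E + 1) U_1 = 0 \]
\[ U_5 - 4U_4 + 6U_3 - 4U_2 + U_1 = 0 \]
\[ 32 - 4b + 108 - 4a + 11 = 0 \]
\[ 4a + 4b = 151 \quad \ldots (2) \]
Solving (1) and (2): a = 14.25, b = 23.5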
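Problem:
Given $U_0 + U_8 = 80$, $U_1 + U_7 = 10$, $U_2 + U_6 = 5$, $U_3 + U_5 = 10$, find $U_4$.
In particular $\Delta^8 U_0 = 0$.
Hence $(E - 1)^8 U_0 = 0$
\[ U_8 - 8U_7 + 28U_6 - 56U_5 + 70U_4 - 56U_3 + 28U_2 - 8U_1 + U_0 = 0 \]
\[ (U_0 + U_8) - 8 (U_1 + U_7) + 28 (U_2 + U_6) - 56 (U_3 + U_5) + 70 U_4 = 0 \]
\[ \therefore 80 - 80 + 140 - 560 + 70 U_4 = 0 \]
\[ \therefore 70 U_4 = 420 \]
\[ \therefore U_4 = 6 \]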
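Problem:
Solution :
Let $U_x = ax^2 + bx + c$
\[ U_1 = a + b + c, \quad U_2 = 4a + 2b + c, \quad U_3 = 9a + 3b + c \]
Given $U_1 + U_2 + U_3 = 25$,
\[ \therefore 14a + 6b + 3c = 25 \quad \ldots (1) \]
Now $U_4 = 29$
\[ \Rightarrow 16a + 4b + c = 29 \quad \ldots (2) \]
\[ U_5 + U_6 = 113 \]
\[ \Rightarrow 61a + 11b + 2c = 113 \quad \ldots (3) \]
Solving (1), (2) and (3): a = 2, b = -1, c = 1
\[ \therefore U_x = 2x^2 - x + 1 \]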
Problem:
If $U_1 = (12 - x)(4 + x)$;
$U_2 = (5 - x)(4 - x)$,
$U_3 = (x + 18)(x + 6)$ and $U_4 = 94$, find x.
Solution :
Taking $\Delta^3 U_1 = 0$,
\[ (E - 1)^3 U_1 = 0 \]
\[ \therefore (E^3 - 3E^2 + 3E - 1) U_1 = 0 \]
\[ U_4 - 3U_3 + 3U_2 - U_1 = 0 \]
\[ \therefore 94 - 3 (x + 18)(x + 6) + 3 (5 - x)(4 - x) - (12 - x)(4 + x) = 0 \]
\[ \text{i.e. } x^2 - 107x - 218 = 0 \]
\[ (x - 109)(x + 2) = 0 \]
\[ x = 109 \text{ or } -2 \]
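Newton's Formula:
Newton's forward interpolation formula:
\[ U_x = U_a + (x - a) \frac{\Delta U_a}{1! \, h} + (x - a)(x - a - h) \frac{\Delta^2 U_a}{2! \, h^2} + \cdots + (x - a)(x - a - h) \cdots \bigl( x - a - (n-1)h \bigr) \frac{\Delta^n U_a}{n! \, h^n} \]
Newton's backward interpolation formula:
\[ U_x = U_{a+nh} + \bigl( x - (a + nh) \bigr) \frac{\nabla U_{a+nh}}{1! \, h} + \bigl( x - (a + nh) \bigr) \bigl( x - (a + (n-1)h) \bigr) \frac{\nabla^2 U_{a+nh}}{2! \, h^2} + \cdots \]
\[ \cdots + \bigl( x - (a + nh) \bigr) \bigl( x - (a + (n-1)h) \bigr) \cdots \bigl( x - (a + h) \bigr) \frac{\nabla^n U_{a+nh}}{n! \, h^n} \]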
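Both formulae are easy to evaluate by machine once the difference table is built. The sketch below is a minimal illustration, not a library routine: the function names are hypothetical, the tabulated points are assumed to be equally spaced, and the substitution $r = (x - a)/h$ (forward) or $r = (x - (a + nh))/h$ (backward) turns each term into $r(r-1)\cdots$ or $r(r+1)\cdots$ divided by a factorial.

```python
def difference_table(values):
    """table[k][i] is the k-th forward difference at the i-th tabulated point."""
    table = [list(values)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table

def newton_forward(xs, us, x):
    """Newton's forward formula, using the leading differences at xs[0]."""
    h = xs[1] - xs[0]
    r = (x - xs[0]) / h
    total, coeff = 0.0, 1.0
    for k, diffs in enumerate(difference_table(us)):
        total += coeff * diffs[0]
        coeff *= (r - k) / (k + 1)   # builds r(r-1)...(r-k) / (k+1)!
    return total

def newton_backward(xs, us, x):
    """Newton's backward formula, using the differences at the last point."""
    h = xs[1] - xs[0]
    r = (x - xs[-1]) / h
    total, coeff = 0.0, 1.0
    for k, diffs in enumerate(difference_table(us)):
        total += coeff * diffs[-1]   # the k-th backward difference at xs[-1]
        coeff *= (r + k) / (k + 1)   # builds r(r+1)...(r+k) / (k+1)!
    return total

# Illustrative data: equally spaced arguments with h = 5
xs = [75, 80, 85, 90]
us = [246, 202, 118, 40]
print(round(newton_forward(xs, us, 79), 3))
print(round(newton_backward(xs, us, 79), 3))
```

Because both formulae describe the same interpolating polynomial, the two calls agree; the forward form is simply more convenient near the start of the table and the backward form near the end.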
Problem:
Solution :
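Here a = 75 and h = 5.
\[ \therefore a + rh = 79 \]
\[ 75 + 5r = 79 \]
\[ r = \frac{4}{5} = 0.8 \]
\[ U_{a+rh} = U_a + \frac{r}{1!} \Delta U_a + \frac{r(r-1)}{2!} \Delta^2 U_a + \cdots \]
x     U_x    ΔU_x    Δ²U_x    Δ³U_x
75    246    -44     -40      46
80    202    -84     6
85    118    -78
90    40
\[ U_{79} = 246 + 0.8 (-44) + \frac{0.8 (0.8 - 1)}{2!} (-40) + \frac{0.8 (0.8 - 1)(0.8 - 2)}{3!} (46) \]
\[ = 246 - 35.2 + 3.2 + 1.472 = 215.47 \]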
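Problem:
By using the Gregory-Newton formula find $U_x$ for the following data. Hence estimate
(i) $U_{1.5}$ (ii) $U_9$
U_0    U_1    U_2    U_3    U_4
1      11     21     28     29
x    U_x    ΔU_x    Δ²U_x    Δ³U_x    Δ⁴U_x
0    1      10      0        -3       0
1    11     10      -3       -3
2    21     7       -6
3    28     1
4    29
Here the third order differences are constant, and hence the required function is a
polynomial of degree 3. With a = 0 and h = 1,
\[ U_x = U_a + (x - a) \frac{\Delta U_a}{1!} + (x - a)(x - a - 1) \frac{\Delta^2 U_a}{2!} + \cdots \]
\[ \therefore U_x = 1 + (x - 0) \times \frac{10}{1!} + x (x - 1) \times \frac{0}{2!} + x (x - 1)(x - 2) \times \frac{-3}{3!} \]
\[ = 1 + 10x - \frac{x (x - 1)(x - 2)}{2} \]
\[ = \frac{1}{2} \left[ 2 + 20x - x^3 + 3x^2 - 2x \right] \]
\[ U_x = \frac{1}{2} \left[ -x^3 + 3x^2 + 18x + 2 \right] \]
i)
\[ \therefore U_{1.5} = \frac{1}{2} \left[ -(1.5)^3 + 3 (1.5)^2 + 18 (1.5) + 2 \right] = \frac{1}{2} \left[ -3.375 + 6.75 + 27 + 2 \right] = 16.188 \]
ii)
\[ U_9 = \frac{1}{2} \left[ -(9)^3 + 3 (9)^2 + 18 (9) + 2 \right] = \frac{1}{2} \left[ -729 + 243 + 162 + 2 \right] = -161 \]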
Problem:
Solution :
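Year x    Population U_x    ΔU_x    Δ²U_x    Δ³U_x    Δ⁴U_x    Δ⁵U_x
1941      2500              300     100      0        50       25
1951      2800              400     100
1961      3200
...
\[ \therefore U_{a+rh} = U_{1945} \]
Hence 1941 + 10r = 1945
r = 0.4
\[ U_{1945} = 2500 + 0.4 \times 300 + \frac{0.4 (0.4 - 1)}{2!} \times 100 + \frac{0.4 (0.4 - 1)(0.4 - 2)}{3!} \times 0 \]
\[ + \frac{0.4 (0.4 - 1)(0.4 - 2)(0.4 - 3)}{4!} \times 50 + \frac{0.4 (0.4 - 1)(0.4 - 2)(0.4 - 3)(0.4 - 4)}{5!} \times 25 \]
\[ = 2500 + 120 - 12 + 0 - 2.08 + 0.75 \]
\[ = 2606.67 \cong 2607 \]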
Problem:
From the following data estimate the number of persons whose daily wage is
between Rs. 40-50.
Daily wage (Rs)    Number of persons
0-20               120
20-40              145
40-60              200
60-80              250
80-100             150
The less than cumulative frequency table of the above data is given by
Wage less than (Rs)    Number of persons
20                     120
40                     265
60                     465
80                     715
100                    865
x      U_x    ΔU_x    Δ²U_x    Δ³U_x    Δ⁴U_x
20     120    145     55       -5       -145
40     265    200     50       -150
60     465    250     -100
80     715    150
100    865
To find $U_{50}$:
\[ U_{50} = U_{a+rh} \]
\[ a = 20, \quad h = 20 \]
\[ 50 = 20 + 20r \]
\[ r = 1.5 \]
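Carrying the substitution through with Newton's forward formula gives the estimate. The working below is a sketch that uses the leading differences at x = 20 from the table above (145, 55, -5, -145) and then subtracts the cumulative frequency at 40.

```latex
\begin{align*}
U_{50} &= U_{20} + r\,\Delta U_{20} + \frac{r(r-1)}{2!}\Delta^2 U_{20}
          + \frac{r(r-1)(r-2)}{3!}\Delta^3 U_{20}
          + \frac{r(r-1)(r-2)(r-3)}{4!}\Delta^4 U_{20} \\
       &= 120 + 1.5(145) + \frac{(1.5)(0.5)}{2}(55)
          + \frac{(1.5)(0.5)(-0.5)}{6}(-5)
          + \frac{(1.5)(0.5)(-0.5)(-1.5)}{24}(-145) \\
       &= 120 + 217.5 + 20.625 + 0.3125 - 3.3984 \\
       &\approx 355 \\
U_{50} - U_{40} &\approx 355 - 265 = 90
\end{align*}
```

So about 90 persons have a daily wage between Rs. 40 and Rs. 50.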
Problem:
The following data gives the melting point of an alloy of lead and zinc. $\theta$ is the
temperature in degrees centigrade and x is the percentage of lead.
x      40     50     60     70     80     90
θ      184    204    226    250    276    304
Find $\theta$ when (i) x = 42 (ii) x = 38
Solution :
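One way to work this out, sketched here with Newton's forward formula, starts from the differences of the tabulated values: the first differences are 20, 22, 24, 26, 28 and the second differences are all 2, so third and higher differences vanish. With a = 40 and h = 10, r = 0.2 for x = 42 and r = -0.2 for x = 38 (the second case is an extrapolation just outside the table).

```latex
\begin{align*}
\theta_{42} &= \theta_{40} + r\,\Delta\theta_{40} + \frac{r(r-1)}{2!}\Delta^2\theta_{40}
             = 184 + 0.2(20) + \frac{(0.2)(-0.8)}{2}(2) \\
            &= 184 + 4 - 0.16 = 187.84 \\
\theta_{38} &= 184 + (-0.2)(20) + \frac{(-0.2)(-1.2)}{2}(2) \\
            &= 184 - 4 + 0.24 = 180.24
\end{align*}
```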
Problem:
The following table gives the census population of a town for the years 1931 - 1971. Estimate
the population (i) for the year 1965, (ii) for the year 1933 by using an appropriate interpolation
formula.
Year                   1931    1941    1951    1961    1971
Population in lakhs    36      66      81      93      101
Year    Population U_x    ∇U_x    ∇²U_x    ∇³U_x    ∇⁴U_x
1931    36
1941    66                30
1951    81                15      -15
1961    93                12      -3       12
1971    101               8       -4       -1       -13
To find $U_{1965}$:
Here a + nh = 1971, h = 10
\[ U_{a+nh+rh} = U_{1965} \]
\[ a + nh + rh = 1965 \]
\[ \therefore 1971 + 10r = 1965 \]
\[ r = -0.6 \]
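Substituting r = -0.6 into Newton's backward formula completes the estimate. The working below is a sketch that uses the backward differences at 1971, namely 8, -4, -1 and -13.

```latex
\begin{align*}
U_{1965} &= U_{1971} + r\,\nabla U_{1971} + \frac{r(r+1)}{2!}\nabla^2 U_{1971}
            + \frac{r(r+1)(r+2)}{3!}\nabla^3 U_{1971}
            + \frac{r(r+1)(r+2)(r+3)}{4!}\nabla^4 U_{1971} \\
         &= 101 + (-0.6)(8) + \frac{(-0.6)(0.4)}{2}(-4)
            + \frac{(-0.6)(0.4)(1.4)}{6}(-1)
            + \frac{(-0.6)(0.4)(1.4)(2.4)}{24}(-13) \\
         &= 101 - 4.8 + 0.48 + 0.056 + 0.4368 \\
         &\approx 97.17 \text{ lakhs}
\end{align*}
```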
Year    Population U_x    ΔU_x    Δ²U_x    Δ³U_x    Δ⁴U_x
1931    36                30      -15      12       -13
1941    66                15      -3       -1
1951    81                12      -4
1961    93                8
1971    101
To find $U_{1933}$:
\[ a = 1931 \text{ and } h = 10 \]
\[ U_{a+rh} = U_{1933} \]
Hence a + rh = 1933
\[ \therefore 1931 + 10r = 1933 \]
\[ r = 0.2 \]
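Substituting r = 0.2 into Newton's forward formula completes the estimate. The working below is a sketch that uses the leading forward differences at 1931, namely 30, -15, 12 and -13.

```latex
\begin{align*}
U_{1933} &= U_{1931} + r\,\Delta U_{1931} + \frac{r(r-1)}{2!}\Delta^2 U_{1931}
            + \frac{r(r-1)(r-2)}{3!}\Delta^3 U_{1931}
            + \frac{r(r-1)(r-2)(r-3)}{4!}\Delta^4 U_{1931} \\
         &= 36 + (0.2)(30) + \frac{(0.2)(-0.8)}{2}(-15)
            + \frac{(0.2)(-0.8)(-1.8)}{6}(12)
            + \frac{(0.2)(-0.8)(-1.8)(-2.8)}{24}(-13) \\
         &= 36 + 6 + 1.2 + 0.576 + 0.4368 \\
         &\approx 44.21 \text{ lakhs}
\end{align*}
```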
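Lagrange's formula
\[ U_x = \frac{(x - a_2)(x - a_3) \cdots (x - a_n)}{(a_1 - a_2)(a_1 - a_3) \cdots (a_1 - a_n)} U_{a_1} + \frac{(x - a_1)(x - a_3) \cdots (x - a_n)}{(a_2 - a_1)(a_2 - a_3) \cdots (a_2 - a_n)} U_{a_2} \]
\[ + \cdots + \frac{(x - a_1)(x - a_2) \cdots (x - a_{n-1})}{(a_n - a_1)(a_n - a_2) \cdots (a_n - a_{n-1})} U_{a_n} \]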
Problem:
Solution :
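Take $a_1 = 1$, $a_2 = 2$, $a_3 = 4$, $a_4 = 7$ and $x = 5$.
\[ U_5 = \frac{(5-2)(5-4)(5-7)}{(1-2)(1-4)(1-7)} \times 4 + \frac{(5-1)(5-4)(5-7)}{(2-1)(2-4)(2-7)} \times 7 \]
\[ + \frac{(5-1)(5-2)(5-7)}{(4-1)(4-2)(4-7)} \times 13 + \frac{(5-1)(5-2)(5-4)}{(7-1)(7-2)(7-4)} \times 30 \]
\[ = \frac{3 \times 1 \times (-2)}{(-1)(-3)(-6)} \times 4 + \frac{4 \times 1 \times (-2)}{1 \times (-2)(-5)} \times 7 + \frac{4 \times 3 \times (-2)}{3 \times 2 \times (-3)} \times 13 + \frac{4 \times 3 \times 1}{6 \times 5 \times 3} \times 30 \]
\[ = \frac{4}{3} - \frac{28}{5} + \frac{52}{3} + 4 \]
\[ = 17.07 \]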
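Lagrange's formula does not need equally spaced arguments, which makes it straightforward to code. The sketch below uses a hypothetical function name `lagrange`; each term multiplies a tabulated value by the product of the ratios (x - a_j)/(a_i - a_j) over all other arguments.

```python
def lagrange(xs, us, x):
    """Evaluate the Lagrange interpolating polynomial through (xs[i], us[i]) at x."""
    total = 0.0
    for i, (xi, ui) in enumerate(zip(xs, us)):
        term = ui
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)  # one factor of the i-th basis polynomial
        total += term
    return total

# Arguments 1, 2, 4, 7 with values 4, 7, 13, 30, evaluated at x = 5
u5 = lagrange([1, 2, 4, 7], [4, 7, 13, 30], 5)
print(round(u5, 2))
```

At a tabulated argument every basis polynomial but one vanishes, so the function returns the tabulated value itself.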
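Problem:
Find the form of the function $U_x$ for the following data. Find $U_3$.
x      0    1    2     5
U_x    2    3    12    147
Solution :
Here $a_1 = 0$; $a_2 = 1$; $a_3 = 2$; $a_4 = 5$
\[ U_x = \frac{(x-1)(x-2)(x-5)}{(0-1)(0-2)(0-5)} \times 2 + \frac{x (x-2)(x-5)}{(1-0)(1-2)(1-5)} \times 3 + \frac{x (x-1)(x-5)}{(2-0)(2-1)(2-5)} \times 12 + \frac{x (x-1)(x-2)}{(5-0)(5-1)(5-2)} \times 147 \]
\[ = -\frac{x^3 - 8x^2 + 17x - 10}{5} + \frac{3 (x^3 - 7x^2 + 10x)}{4} - 2 (x^3 - 6x^2 + 5x) + \frac{x^3 - 3x^2 + 2x}{60} \times 147 \]
\[ = \frac{1}{60} \left[ x^3 (-12 + 45 - 120 + 147) + x^2 (96 - 315 + 720 - 441) + x (-204 + 450 - 600 + 294) + 120 \right] \]
\[ = \frac{1}{60} \left[ 60x^3 + 60x^2 - 60x + 120 \right] \]
\[ \therefore U_x = x^3 + x^2 - x + 2 \]
\[ \therefore U_3 = 3^3 + 3^2 - 3 + 2 = 35 \]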
Age               % number of criminals
Under 25 years    52.0
Under 30 years    67.3
Under 40 years    84.1
Under 50 years    94.4
Solution :
\[ U_{35} = \frac{(35-30)(35-40)(35-50)}{(25-30)(25-40)(25-50)} \times 52.0 + \frac{(35-25)(35-40)(35-50)}{(30-25)(30-40)(30-50)} \times 67.3 \]
\[ + \frac{(35-25)(35-30)(35-50)}{(40-25)(40-30)(40-50)} \times 84.1 + \frac{(35-25)(35-30)(35-40)}{(50-25)(50-30)(50-40)} \times 94.4 \]
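Evaluating the four Lagrange terms gives the estimate for age 35. The working below is a sketch of that arithmetic, taking the first denominator as (25 - 30)(25 - 40)(25 - 50) in line with Lagrange's formula.

```latex
\begin{align*}
U_{35} &= \frac{(5)(-5)(-15)}{(-5)(-15)(-25)} \times 52.0
        + \frac{(10)(-5)(-15)}{(5)(-10)(-20)} \times 67.3
        + \frac{(10)(5)(-15)}{(15)(10)(-10)} \times 84.1
        + \frac{(10)(5)(-5)}{(25)(20)(10)} \times 94.4 \\
       &= -0.2 \times 52.0 + 0.75 \times 67.3 + 0.5 \times 84.1 - 0.05 \times 94.4 \\
       &= -10.4 + 50.475 + 42.05 - 4.72 \\
       &\approx 77.4
\end{align*}
```

So about 77.4% of the criminals are under 35 years of age.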
Prepared by
Ms. C. KANI
Assistant Professor of Mathematics, St. Jude's College,
Thoothoor - 629 176, Kanyakumari District.