15ma305 - Statistics For Information Technology: Dr. N. Balaji Asst. Professor (SG)
Prepared by
PURPOSE The purpose of this course is to help students learn how statistical tools and techniques are applied in different fields.
INSTRUCTIONAL OBJECTIVES                                                      STUDENT OUTCOMES
At the end of the course, the student will be able
1. To gain knowledge in measures of central tendency and dispersion           a  e
2. To learn about methods of studying correlation and regression              a  e
3. To have knowledge about analysis of time series                            a  e
4. To gain knowledge about ANOVA                                              a  e
5. To understand the fundamentals of quality control and the methods used
   to control systems and processes                                           a  e
Session   Description of Topic   Contact hours   C-D-I-O   IOs   Reference
UNIT I: INTRODUCTION TO STATISTICS (numerical problems only)   12
1. Introduction to uni-variate data   1   C,I   1   1-7
2. Measures of central tendency: Arithmetic mean, Median: Definition, Problems   2   C,I   1   1-7
3. Mode, Geometric Mean and Harmonic Mean: Definition, Problems   2   C,I   1   1-7
4. Measures of dispersion: Range, Quartile deviation, Mean deviation: Definition, Problems   2   C,I   1   1-7
5. Standard deviation and Co-efficient of variation: Definition, Problems   2   C,I   1   1-7
6. Skewness: Definition, Problems   1   C,I   1   1-7
7. Kurtosis and Moments: Definition, Problems   2   C,I   1   1-7
UNIT II: CORRELATION AND REGRESSION ANALYSIS   11
20. Method of simple averages (weekly, monthly and quarterly)   2   C,I   3   1,3,4
30. Control charts for variables - Mean and Range chart (X Bar and R)   2   C,I   5   1,3,4
31. Control charts for variables - Mean and Standard deviation chart (X Bar and s)   2   C,I   5   1,3,4
32. Introduction to Attributes Control charts   1   C,I   5   1,3,4
33. Control chart for the number of defectives (np-chart)   2   C,I   5   1,3,4
34. Control chart for the fraction of defectives (p-chart)   2   C,I   5   1,3,4
35. Control chart for the number of defects (c-chart)   2   C,I   5   1,3,4
Solution:
Given data x: 10, 20, 30, 40, 50.
∴ A.M. = $\bar{x} = \frac{1}{N}\sum x = \frac{1}{5}[10 + 20 + 30 + 40 + 50] = \frac{150}{5} = 30$
Solution:
1.1.2 Median
Solution:
First, we rearrange the data in ascending order
Sl.No. 1 2 3 4 5 6 7 8 9 10
Data 384 391 407 522 591 672 753 777 1490 2488
∴ Median = the size of the $\left(\frac{N+1}{2}\right)$th item
= the size of the $\left(\frac{10+1}{2}\right)$th item
= the size of the 5.5th item
= $\frac{\text{5th item} + \text{6th item}}{2} = \frac{591 + 672}{2} = 631.5$
Solution:
Income (Rs.) in ascending order   f   c.f
800 16 16
1000 24 40
1500 26 66
1800 30 96
2000 20 116
2500 6 122
N =122
Here N/2 = 61.
∴ the c.f. just greater than 61 is 66.
∴ Median = 1500.
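The rule used above (take the value whose cumulative frequency is the first to exceed N/2) can be sketched in a few lines of Python. This is only an illustration; the function name `median_from_frequency` is hypothetical and not part of the course material.

```python
def median_from_frequency(values, freqs):
    """Median of a discrete frequency distribution.

    `values` must already be in ascending order. The median is the value
    whose cumulative frequency is the first one greater than N/2.
    """
    n = sum(freqs)
    running = 0
    for value, f in zip(values, freqs):
        running += f              # cumulative frequency (c.f.)
        if running > n / 2:
            return value
    raise ValueError("empty distribution")


incomes = [800, 1000, 1500, 1800, 2000, 2500]
counts = [16, 24, 26, 30, 20, 6]
print(median_from_frequency(incomes, counts))  # 1500
```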
Wages(Rs.) f c.f
49.5 − 59.5 15 15
59.5 − 69.5 40 55
Thus, median=Rs.84.08
1.1.3 Mode
Problem 1 From the following data of the height of 100 persons in a commercial concern determine the modal
height:
x 58 60 61 62 63 64 65 66 68 70
f 4 6 5 10 20 22 24 6 2 1
Solution:
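First, form the grouping table (col. 1: the frequencies themselves; col. 2 and 3: sums in twos, starting from the first and second frequency; col. 4, 5 and 6: sums in threes, starting from the first, second and third frequency) and mark the position of the maximum of each column in the analysis table.
Analysis table:
col.    58  60  61  62  63  64  65  66  68  70
1       -   -   -   -   -   -   1   -   -   -
2       -   -   -   -   1   1   -   -   -   -
3       -   -   -   -   -   1   1   -   -   -
4       -   -   -   1   1   1   -   -   -   -
5       -   -   -   -   1   1   1   -   -   -
6       -   -   -   -   -   1   1   1   -   -
Total   -   -   -   1   3   5   4   1   -   -
The value 64 occurs the maximum number of times, i.e., 5.
Thus the mode is 64 inches.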
Marks       No. of students     Marks        No. of students
Above 0     80                  Above 60     28
Above 10    77                  Above 70     16
Above 20    72                  Above 80     10
Above 30    65                  Above 90     8
Above 40    55                  Above 100    0
Above 50    43
Solution: Since this is cumulative frequency distribution, we first convert into a simple frequency distribution.
Marks       No. of students
0 − 10 3
10 − 20 5
20 − 30 7
30 − 40 10
40 − 50 12
50 − 60 15
60 − 70 12
70 − 80 6
80 − 90 2
90 − 100 8
$Mode = l + \dfrac{h(f_1 - f_0)}{2f_1 - f_0 - f_2}$
Where
l = lower limit of the modal class = 50
h = common width = 10
f1 = frequency of the modal class = 15
f0 = frequency of the class preceding the modal class = 12
f2 = frequency of the class succeeding the modal class = 12
∴ $Mode = l + \dfrac{h(f_1 - f_0)}{2f_1 - f_0 - f_2} = 50 + \dfrac{10(15 - 12)}{2(15) - 12 - 12} = 50 + \dfrac{30}{6} = 50 + 5 = 55.$
Solution
x       log x
85      1.9294
70      1.8451
15      1.1761
75      1.8751
500     2.6990
8       0.9031
45      1.6532
250     2.3979
40      1.6021
36      1.5563
Total   17.6373
∴ $\sum \log x = 17.6373$
∴ G.M. = $\text{Antilog}\left(\dfrac{\sum_{i=1}^{n} \log x_i}{n}\right) = \text{Antilog}\left(\dfrac{17.6373}{10}\right) = \text{Antilog}(1.7637) \approx 58.04$
Solution
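Marks      Mid. x (m)   f    f log m
0 − 10     5            5    3.4950
10 − 20    15           7    8.2327
20 − 30    25           15   20.9685
30 − 40    35           25   38.6025
40 − 50    45           8    13.2256
Total                   60   84.5243
∴ G.M. = $\text{Antilog}\left(\dfrac{\sum f \log m_i}{N}\right) = \text{Antilog}\left(\dfrac{84.5243}{60}\right) = \text{Antilog}(1.4087) \approx 25.63$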
Solution
x        1/x
2574     0.0004
475      0.0021
75       0.0133
5        0.2000
0.8      1.2500
0.08     12.5000
0.005    200.0000
0.0009   1111.1111
Total    1325.0769
∴ H.M. = $\dfrac{N}{\sum (1/x)} = \dfrac{8}{1325.0769} = 0.006$
Problem 2 From the following data compute the Harmonic mean:
Marks:             10   20   25   40   50
No. of students:   20   30   50   15   5
Solution
x f f /x
10 20 2.000
20 30 1.500
25 50 2.000
40 15 0.375
50 5 0.100
120 5.975
∴ H.M. = $\dfrac{N}{\sum (f/x)} = \dfrac{120}{5.975} = 20.08$
Note: Relation among averages: A.M ≥ G.M. ≥ H.M.
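The relation A.M. ≥ G.M. ≥ H.M. can be checked numerically. The sketch below reuses the ten values from the geometric mean problem and the frequency data of the harmonic mean problem above; the function names are illustrative assumptions, and the geometric mean is computed with natural logarithms instead of log tables.

```python
import math


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # antilog of the mean of the logs (all values must be positive)
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs, freqs=None):
    # N / sum(f/x); with no frequencies every value counts once
    if freqs is None:
        freqs = [1] * len(xs)
    return sum(freqs) / sum(f / x for x, f in zip(xs, freqs))


data = [85, 70, 15, 75, 500, 8, 45, 250, 40, 36]
am, gm, hm = arithmetic_mean(data), geometric_mean(data), harmonic_mean(data)
print(round(am, 2), round(gm, 2), round(hm, 2))

# Grouped harmonic mean from the marks problem: 120 / 5.975
print(round(harmonic_mean([10, 20, 25, 40, 50], [20, 30, 50, 15, 5]), 2))
```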
Miscellaneous Problems
Problem 1 Calculate mean, median and mode from the following data:
x 10 − 20 20 − 30 30 − 40 40 − 50 50 − 60 60 − 70 70 − 80 80 − 90
f 4 12 40 41 27 13 9 4
Solution:
C.I.       m    f     d = (m − 45)/10   fd    c.f.
10 − 20    15   4     −3                −12   4
20 − 30    25   12    −2                −24   16
30 − 40    35   40    −1                −40   56
40 − 50    45   41    0                 0     97
50 − 60    55   27    1                 27    124
60 − 70    65   13    2                 26    137
70 − 80    75   9     3                 27    146
80 − 90    85   4     4                 16    150
Total           150                     20
Mean:
∴ $\bar{x} = A + h\,\dfrac{\sum fd}{N} = 45 + 10 \times \dfrac{20}{150} = 45 + 1.333 = 46.333$
Median:
Here N/2 = 75, ∴ Median class = 40 − 50
∴ Median = $l + \dfrac{h}{f}\left(\dfrac{N}{2} - c.f\right) = 40 + \dfrac{10}{41}(75 - 56) = 40 + 4.634 = 44.634$
Mode:
Since the highest frequency is 41, Mode lies in the class 40−50. Here l = 40, h = 10, f1 = 41, f0 = 40, f2 = 27
$Mode = l + \dfrac{h(f_1 - f_0)}{2f_1 - f_0 - f_2} = 40 + \dfrac{10(41 - 40)}{2(41) - 40 - 27} = 40 + 0.67 = 40.67$
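The three grouped-data formulas used in this problem can be collected into one small routine. This is a minimal sketch that assumes equal class widths and a regular distribution (the modal class is simply the class of highest frequency); `grouped_stats` is a hypothetical name.

```python
def grouped_stats(lower_limits, width, freqs):
    """Mean, median and mode of a grouped distribution with equal widths."""
    n = sum(freqs)
    mids = [low + width / 2 for low in lower_limits]
    mean = sum(f * m for f, m in zip(freqs, mids)) / n

    # Median class: first class whose cumulative frequency reaches N/2
    cum = 0
    for low, f in zip(lower_limits, freqs):
        if cum + f >= n / 2:
            median = low + width * (n / 2 - cum) / f
            break
        cum += f

    # Modal class: class with the highest frequency
    k = max(range(len(freqs)), key=lambda i: freqs[i])
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    mode = lower_limits[k] + width * (f1 - f0) / (2 * f1 - f0 - f2)
    return mean, median, mode


lows = [10, 20, 30, 40, 50, 60, 70, 80]
f = [4, 12, 40, 41, 27, 13, 9, 4]
mean, median, mode = grouped_stats(lows, 10, f)
print(round(mean, 3), round(median, 3), round(mode, 2))
```

For irregular distributions the grouping method shown elsewhere in these notes should be used to pick the modal class instead of the plain maximum.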
Problem 2 Compute arithmetic mean, median and mode from the following data:
x   Below 10   Below 20   Below 30   Below 40   Below 50   Below 60   Below 70   Below 80   Below 90   Below 100
f   5          19         48         69         94         115        125        132        147        150
Solution:
This is a cumulative frequency distribution. Let us first convert it to a simple frequency distribution and then
calculate mean, median and mode.
C.I.        m    f     d = (m − 45)/10   fd    c.f.
0 − 10      5    5     −4                −20   5
10 − 20     15   14    −3                −42   19
20 − 30     25   29    −2                −58   48
30 − 40     35   21    −1                −21   69
40 − 50     45   25    0                 0     94
50 − 60     55   21    1                 21    115
60 − 70     65   10    2                 20    125
70 − 80     75   7     3                 21    132
80 − 90     85   15    4                 60    147
90 − 100    95   3     5                 15    150
Total            150                     −4
Mean:
∴ $\bar{x} = A + h\,\dfrac{\sum fd}{N} = 45 - 10 \times \dfrac{4}{150} = 45 - 0.267 = 44.733$
Median:
Here N/2 = 75, ∴ Median class = 40 − 50
∴ Median = $l + \dfrac{h}{f}\left(\dfrac{N}{2} - c.f\right) = 40 + \dfrac{10}{25}(75 - 69) = 40 + 2.4 = 42.4$
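Mode:
Since it is an irregular distribution, we first form the grouping table and then the analysis table:
Analysis table:
col.    0−10  10−20  20−30  30−40  40−50  50−60  60−70  70−80  80−90  90−100
1       -     -      1      -      -      -      -      -      -      -
2       -     -      1      1      -      -      -      -      -      -
3       -     -      -      1      1      -      -      -      -      -
4       -     -      -      1      1      1      -      -      -      -
5       -     1      1      1      -      -      -      -      -      -
6       -     -      1      1      1      -      -      -      -      -
Total   -     1      4      5      3      1      -      -      -      -
The class 30 − 40 occurs the maximum number of times (5), so it is the modal class. Here l = 30, h = 10, f1 = 21, f0 = 29, f2 = 25.
$Mode = l + \dfrac{h(f_1 - f_0)}{2f_1 - f_0 - f_2} = 30 + \dfrac{10(21 - 29)}{2(21) - 29 - 25} = 30 + \dfrac{-80}{-12} = 30 + 6.67 = 36.67$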
Solution:
Range:
Range = G − S = 121 − 46 = 75
Coefficient of range:
Coefficient = $\dfrac{G - S}{G + S} = \dfrac{121 - 46}{121 + 46} = 0.449$
Problem 2 Compute the coefficient of quartile deviation (Q.D.) from the following data.
Marks:             10   20   30   40   50   80
No. of students:   4    7    15   8    7    2
Solution:
x f c.f.
10 4 4
20 7 11
30 15 26
40 8 34
50 7 41
80 2 43
43
To find Q1:
Here Q1 = size of (N/4)th item = size of (10.75)th item = 20.
To find Q3:
Here Q3 = size of (3N/4)th item = size of (32.25)th item = 40.
Q.D. = $\dfrac{Q_3 - Q_1}{2} = \dfrac{40 - 20}{2} = 10$
and
coefficient of Q.D. = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{40 - 20}{40 + 20} = 0.333$
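The quartile lookup in this solution (find the first value whose cumulative frequency reaches the required item number) can be sketched as follows. The helper names `item_size` and `quartile_deviation` are hypothetical, and the N/4 and 3N/4 positions follow the convention used in the solution above.

```python
def item_size(values, freqs, position):
    """Value of the `position`-th item in an ascending frequency table."""
    running = 0
    for value, f in zip(values, freqs):
        running += f
        if running >= position:
            return value
    raise ValueError("position beyond the last item")


def quartile_deviation(values, freqs):
    n = sum(freqs)
    q1 = item_size(values, freqs, n / 4)
    q3 = item_size(values, freqs, 3 * n / 4)
    return q1, q3, (q3 - q1) / 2, (q3 - q1) / (q3 + q1)


marks = [10, 20, 30, 40, 50, 80]
students = [4, 7, 15, 8, 7, 2]
q1, q3, qd, coeff = quartile_deviation(marks, students)
print(q1, q3, qd, round(coeff, 3))  # 20 40 10.0 0.333
```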
Problem 3 Calculate the mean deviation from mean for the following series. Also find out its coefficient.
x 0 − 10 10 − 20 20 − 30 30 − 40 40 − 50
f 5 8 15 16 6
Solution:
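C.I.       m    f    d = (m − 25)/10   fd    |D| = |m − x̄|   f|D|
0 − 10     5    5    −2                −10   22              110
10 − 20    15   8    −1                −8    12              96
20 − 30    25   15   0                 0     2               30
30 − 40    35   16   1                 16    8               128
40 − 50    45   6    2                 12    18              108
Total           50                     10                    472
Mean:
∴ $\bar{x} = A + h\,\dfrac{\sum fd}{N} = 25 + 10 \times \dfrac{10}{50} = 25 + 2 = 27$
M.D. from mean:
∴ M.D. = $\dfrac{\sum f|D|}{N} = \dfrac{472}{50} = 9.44$
and Coefficient of M.D. from mean = $\dfrac{M.D.}{\bar{x}} = \dfrac{9.44}{27} = 0.35$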
x 0 − 10 10 − 20 20 − 30 30 − 40 40 − 50
f 5 8 15 16 6
Solution:
Median:
Here N/2 = 25, ∴ Median class = 20 − 30
∴ Median = Md. = $l + \dfrac{h}{f}\left(\dfrac{N}{2} - c.f\right) = 20 + \dfrac{10}{15}(25 - 13) = 20 + 8 = 28$
C.I.       m    f    c.f.   |D| = |m − Md.|   f|D|
0 − 10     5    5    5      23                115
10 − 20    15   8    13     13                104
20 − 30    25   15   28     3                 45
30 − 40    35   16   44     7                 112
40 − 50    45   6    50     17                102
Total           50                            478
M.D. from median:
∴ M.D. = $\dfrac{\sum f|D|}{N} = \dfrac{478}{50} = 9.56$
and Coefficient of M.D. from median:
∴ Coefficient of M.D. from median = $\dfrac{M.D.}{Md.} = \dfrac{9.56}{28} = 0.34$
Problem 5 Find the standard deviation and coefficient of variation to the following data: 69,66,67,69,64,63,65,68,72
Solution:
$\sum x_i = 69 + 66 + 67 + 69 + 64 + 63 + 65 + 68 + 72 = 603$
$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{603}{9} = 67$
x       (x − x̄)²
69      4
66      1
67      0
69      4
64      9
63      16
65      4
68      1
72      25
Total   64
∴ $\sigma^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} = \dfrac{64}{8} = 8$
∴ S.D. = $\sigma = \sqrt{8} = 2.8284$
Coefficient of Variation:
C.V. = $100 \times \dfrac{\sigma}{\bar{x}} = 100 \times \dfrac{2.8284}{67} = 4.2215$
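A short numerical check of this problem is sketched below. It assumes the divisor n − 1 = 8 used in the worked solution; dividing by n = 9 instead would give a slightly smaller standard deviation. The function name is illustrative only.

```python
import math


def sd_and_cv(xs):
    """Standard deviation (divisor n - 1) and coefficient of variation in %."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)   # sum of squared deviations
    sd = math.sqrt(ss / (n - 1))
    return mean, sd, 100 * sd / mean


mean, sd, cv = sd_and_cv([69, 66, 67, 69, 64, 63, 65, 68, 72])
print(mean, round(sd, 4), round(cv, 4))
```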
Solution:
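Given A = 4, µ′1 = −1.5, µ′2 = 17, µ′3 = −30 and µ′4 = 108.
∴ The first four moments about the mean are given by
$\mu_1 = 0$
$\mu_2 = \mu'_2 - (\mu'_1)^2 = 17 - 2.25 = 14.75$
$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3 = -30 + 76.5 - 6.75 = 39.75$
$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4 = 108 - 180 + 229.5 - 15.1875 = 142.3125$
$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3} = \dfrac{(39.75)^2}{(14.75)^3} = 0.4924$
$\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{142.3125}{(14.75)^2} = 0.6541$
and
$\gamma_1 = \sqrt{\beta_1} = \dfrac{\mu_3}{\mu_2^{3/2}} = 0.7017$
Nature: Since γ1 > 0, the given distribution is positively skewed and since β2 < 3, the given distribution is platykurtic.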
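The conversion from moments about an arbitrary point A to moments about the mean can be sketched as below. The function name `central_from_raw` is a hypothetical helper; the formulas are the standard relations between raw and central moments.

```python
def central_from_raw(m1, m2, m3, m4):
    """Central moments mu2, mu3, mu4 from moments about an arbitrary origin."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4


mu2, mu3, mu4 = central_from_raw(-1.5, 17, -30, 108)
beta1 = mu3 ** 2 / mu2 ** 3      # skewness measure
beta2 = mu4 / mu2 ** 2           # kurtosis measure
gamma1 = mu3 / mu2 ** 1.5        # signed square root of beta1
print(mu2, mu3, mu4)
print(round(beta1, 4), round(beta2, 4), round(gamma1, 4))
```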
Problem 2 Compute the Bowley’s coefficient of skewness from the following data.
Marks:             10   20   30   40   50   80
No. of students:   4    7    15   8    7    2
Solution:
x f c.f.
10 4 4
20 7 11
30 15 26
40 8 34
50 7 41
80 2 43
43
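To find Median: Median = size of ((N + 1)/2)th item = size of 22nd item = 30.
To find Q1: Q1 = size of (N/4)th item = size of (10.75)th item = 20.
To find Q3: Q3 = size of (3N/4)th item = size of (32.25)th item = 40.
Bowley’s coefficient of skewness: $sk = \dfrac{Q_3 + Q_1 - 2Md}{Q_3 - Q_1} = \dfrac{40 + 20 - 2(30)}{40 - 20} = 0$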
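$r = r(x, y) = r_{xy} = \dfrac{cov(x, y)}{\sigma_x \sigma_y}$
where
$cov(x, y) = \dfrac{\sum xy}{n} - \bar{x}\,\bar{y}$
$\sigma_x = \sqrt{\dfrac{\sum x^2}{n} - (\bar{x})^2}$
$\sigma_y = \sqrt{\dfrac{\sum y^2}{n} - (\bar{y})^2}$
n is the number of data
$\bar{x} = \dfrac{\sum x}{n}$
$\bar{y} = \dfrac{\sum y}{n}$
Note:
1. Correlation co-efficient lies between −1 and 1, i.e., −1 ≤ r ≤ 1
2. $r = \dfrac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$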
Problem 1 Calculate Karl Pearson’s co-efficient of correlation for the following data.
x   65   66   67   67   68   69   70   72
y   67   68   65   68   72   72   69   71
Solution:
X       Y     X²      Y²      XY
65      67    4225    4489    4355
66      68    4356    4624    4488
67      65    4489    4225    4355
67      68    4489    4624    4556
68      72    4624    5184    4896
69      72    4761    5184    4968
70      69    4900    4761    4830
72      71    5184    5041    5112
Total 544   552   37028   38132   37560
$r = \dfrac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}} = \dfrac{(8 \times 37560) - (544)(552)}{\sqrt{(8 \times 37028) - (544)^2}\,\sqrt{(8 \times 38132) - (552)^2}} = \dfrac{192}{\sqrt{288}\,\sqrt{352}} = 0.603$
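The sum-based formula for r translates directly into code. The following is a minimal sketch using only the standard library; `pearson_r` is an illustrative name.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r from the raw sums N, ΣX, ΣY, ΣX², ΣY², ΣXY."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return numerator / denominator


x = [65, 66, 67, 67, 68, 69, 70, 72]
y = [67, 68, 65, 68, 72, 72, 69, 71]
print(round(pearson_r(x, y), 3))  # 0.603
```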
Rank correlation
Spearman’s rank correlation coefficient
$\rho = 1 - \dfrac{6\sum d_i^2}{n(n^2 - 1)}$
Where, $d_i = x_i - y_i$
Note: If ranks are repeated, then
$\rho = 1 - \dfrac{6\left[\sum d_i^2 + C.F_1 + C.F_2 + \cdots\right]}{n(n^2 - 1)}$
Where, $d_i = x_i - y_i$
The C.F’s are correction factors, calculated by $C.F = \dfrac{m(m^2 - 1)}{12}$. Here m is the number of times the data value has been repeated.
x   68   64   75   50   64   80   75   40   55   64
y   62   58   68   45   81   60   68   48   50   70
Solution:
In the values of X,
75 is repeated 2 times and occupies ranks 2 and 3. ∴ the rank of 75 = $\dfrac{2 + 3}{2} = 2.5$ and
$C.F_1 = \dfrac{m(m^2 - 1)}{12} = \dfrac{2(2^2 - 1)}{12} = 0.5$
64 is repeated 3 times and occupies ranks 5, 6 and 7. ∴ the rank of 64 = $\dfrac{5 + 6 + 7}{3} = 6$ and
$C.F_2 = \dfrac{m(m^2 - 1)}{12} = \dfrac{3(3^2 - 1)}{12} = 2$
In the values of Y,
68 is repeated 2 times and occupies ranks 3 and 4. ∴ the rank of 68 = $\dfrac{3 + 4}{2} = 3.5$ and
$C.F_3 = \dfrac{m(m^2 - 1)}{12} = \dfrac{2(2^2 - 1)}{12} = 0.5$
∴ $\rho = 1 - \dfrac{6\left[\sum d_i^2 + C.F_1 + C.F_2 + C.F_3\right]}{n(n^2 - 1)} = 1 - \dfrac{6[72 + 0.5 + 2 + 0.5]}{10(10^2 - 1)} = 1 - 0.4545 = 0.5455$
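The tie-handling procedure above (average the ranks of repeated values, then add one correction factor per group of ties) can be sketched as follows. The function names are hypothetical; rank 1 is given to the largest value, as in the worked solution.

```python
from collections import Counter


def average_ranks(data):
    """Rank 1 = largest value; tied values share the mean of their ranks."""
    positions = {}
    for pos, value in enumerate(sorted(data, reverse=True), start=1):
        positions.setdefault(value, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in data]


def correction_factors(data):
    """Sum of m(m^2 - 1)/12 over every value repeated m > 1 times."""
    return sum(m * (m * m - 1) / 12 for m in Counter(data).values() if m > 1)


def spearman_rho(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    total = d2 + correction_factors(xs) + correction_factors(ys)
    return 1 - 6 * total / (n * (n * n - 1))


x = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
print(average_ranks(x))
print(round(spearman_rho(x, y), 4))  # 0.5455
```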
Exercise
Problem 1 Ten competitors in a musical contest were ranked by 3 judges x, y and z. Find out which pair of judges has the nearest approach to common likings in music.
x   1    2    3   4   5   6   7   8   9   10
y   10   6    7   9   5   4   3   2   1   8
z   8    10   9   7   6   5   4   3   2   1
Ans.: For the ranks as tabulated, ρxy = −0.588, ρyz = 0.503 and ρzx = −0.964. The pair with the largest ρ has the nearest likings; ∵ ρyz is greater than ρxy and ρzx, y and z have the nearest likings of music.
Regression
Regression is the mathematical study of the average relationship between two variables x and y.
Line of regression of x on y
$(x - \bar{x}) = b_{xy}(y - \bar{y})$
Line of regression of y on x
$(y - \bar{y}) = b_{yx}(x - \bar{x})$
where bxy and byx are the regression co-efficients, given by
$b_{xy} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}$ and $b_{yx} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
Note:
$r = \pm\sqrt{b_{xy}\,b_{yx}}$, where r takes the common sign of bxy and byx
$b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$
$b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$
The point of intersection of the lines of regression of y on x and x on y is the mean value of x and y.
Problem 1 From the following data find
1. Two lines of regressions
2. Coefficient of correlation between the marks of economics and statistics
3. The most likely marks in statistics when the marks in economics is 30.
Marks in Economics    25   28   35   32   31   36   29   38   34   32
Marks in Statistics   43   46   49   41   36   32   31   30   33   39
Solution: Let x be marks in Economics and y be marks in Statistics
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{320}{10} = 32$ and $\bar{y} = \dfrac{\sum y}{n} = \dfrac{380}{10} = 38$
x     y     (x − x̄)   (y − ȳ)   (x − x̄)²   (y − ȳ)²   (x − x̄)(y − ȳ)
25    43    −7        5         49         25         −35
28    46    −4        8         16         64         −32
35    49    3         11        9          121        33
32    41    0         3         0          9          0
31    36    −1        −2        1          4          2
36    32    4         −6        16         36         −24
29    31    −3        −7        9          49         21
38    30    6         −8        36         64         −48
34    33    2         −5        4          25         −10
32    39    0         1         0          1          0
Total 320   380   0   0   140   398   −93
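$b_{xy} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \dfrac{-93}{398} = -0.2336$
and
$b_{yx} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \dfrac{-93}{140} = -0.6642$
Since both regression co-efficients are negative, the correlation co-efficient is $r = -\sqrt{b_{xy}\,b_{yx}} = -\sqrt{(-0.2336) \times (-0.6642)} = -0.394$
Line of regression of x on y is (x − x̄) = bxy (y − ȳ)
(x − 32) = −0.2336(y − 38)
x − 32 = −0.2336y + 8.8768
x = −0.2336y + 8.8768 + 32
x = −0.2336y + 40.8768 − − − −(1)
Line of regression of y on x is (y − ȳ) = byx (x − x̄)
(y − 38) = −0.6642(x − 32)
y − 38 = −0.6642x + 21.2544
y = −0.6642x + 21.2544 + 38
y = −0.6642x + 59.2544 − − − −(2)
The most likely marks in statistics when the marks in economics are 30 follow from (2): y = −0.6642(30) + 59.2544 = 39.33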
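The two regression coefficients, the correlation coefficient and the prediction asked for in part 3 can be reproduced with a short script. This is a sketch under the deviation-from-mean formulas quoted earlier; the function name `regression_coefficients` is illustrative.

```python
import math


def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, bxy, byx) using deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy / syy, sxy / sxx


economics = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
statistics = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]
mx, my, bxy, byx = regression_coefficients(economics, statistics)

# r takes the common sign of the two regression coefficients
r = math.copysign(math.sqrt(bxy * byx), byx)

# y on x line evaluated at x = 30
predicted = my + byx * (30 - mx)
print(round(bxy, 4), round(byx, 4), round(r, 3), round(predicted, 2))
```

Working with unrounded coefficients gives −0.2337 and −0.6643, which differ from the hand calculation only in the fourth decimal place.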
Problem 2 Two variables x and y have the regression lines 3x + 2y − 26 = 0, 6x + y − 31 = 0 find the
Solution:
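Given 3x + 2y − 26 = 0 (1.1)
6x + y − 31 = 0 (1.2)
First assume (1.1) is the line of regression of x on y and (1.2) is the line of regression of y on x.
$3x + 2y - 26 = 0 \Rightarrow x = -\dfrac{2}{3}y + \dfrac{26}{3}$, ∴ $b_{xy} = -\dfrac{2}{3}$
$6x + y - 31 = 0 \Rightarrow y = -6x + 31$, ∴ $b_{yx} = -6$
$|r| = \sqrt{b_{xy}\,b_{yx}} = \sqrt{\left(-\dfrac{2}{3}\right) \times (-6)} = 2 > 1$
Since the correlation coefficient cannot exceed 1 in magnitude, 3x + 2y − 26 = 0 cannot be the line of regression of x on y and 6x + y − 31 = 0 cannot be the line of regression of y on x. ∴ we have to take 3x + 2y − 26 = 0 as the line of regression of y on x
$3x + 2y - 26 = 0 \Rightarrow 2y = -3x + 26 \Rightarrow y = -\dfrac{3}{2}x + 13$
∴ $b_{yx} = -\dfrac{3}{2}$
and take 6x + y − 31 = 0 as the line of regression of x on y
$6x + y - 31 = 0 \Rightarrow 6x = -y + 31 \Rightarrow x = -\dfrac{1}{6}y + \dfrac{31}{6}$
∴ $b_{xy} = -\dfrac{1}{6}$
$|r| = \sqrt{b_{xy}\,b_{yx}} = \sqrt{\left(-\dfrac{3}{2}\right) \times \left(-\dfrac{1}{6}\right)} = 0.5 < 1$
Since both regression coefficients are negative, r = −0.5.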
UNIT-2
CORRELATION AND REGRESSION
Correlation and regression are concerned with the investigation of the association between two variables.
Correlation describes the strength of the relationship. It is not concerned with 'cause' and 'effect'.
If there appears to be a linear relationship, it can be quantified. A correlation coefficient is calculated as the measure of the strength of this relationship. Its symbol is 'r' and its value lies between -1 and +1.
A positive correlation means that as values of one variable increase, values of the other variable also tend to increase. A small or zero correlation coefficient tells us that the two variables have little or no linear relationship. Finally, a negative correlation coefficient shows an inverse relationship between the variables: as one goes up, the other tends to go down.
Due to the standardization that takes place in the formula, there are a couple of interesting properties of r:
1. −1 ≤ r ≤ 1
2. If the values of either variable are converted to a different scale, r will be the same.
3. If the variables x and y are interchanged, r will be the same.
4. The correlation coefficient r will only measure the strength of a linear relationship. It says nothing about other kinds of relationships, like the temperature data on the previous page.
H0: There is no association between ice-cream sales and average monthly temperature.
H1: There is an association between them.
Critical Value:
5%, 10 degrees of freedom = 0.576
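Simple Correlation
The simple sample correlation coefficient is
$r = \dfrac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$
or
$r = \dfrac{Cov(x, y)}{\sqrt{Var(x)\,Var(y)}} = \dfrac{Cov(x, y)}{\sigma_x\,\sigma_y}$
where
$Cov(x, y) = \dfrac{1}{N}\sum XY - \bar{x}\,\bar{y}$
$\sigma_x^2 = \dfrac{1}{N}\sum X^2 - (\bar{X})^2$
$\sigma_y^2 = \dfrac{1}{N}\sum Y^2 - (\bar{Y})^2$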
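1. Calculate the coefficient of correlation from the following data (X and Y can also be used directly).
Sales (X)   Y
15          50
18          65
25          82
27          95
30          110
35          120
Taking deviations U = X − 25 and V = Y − 87 gives ΣU = 0, ΣV = 0, ΣUV = 985, ΣU² = 278 and ΣV² = 3560, so
$r = \dfrac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}} = \dfrac{N\sum UV - \sum U \sum V}{\sqrt{N\sum U^2 - (\sum U)^2}\,\sqrt{N\sum V^2 - (\sum V)^2}} = \dfrac{6(985) - 0}{\sqrt{6(278)}\,\sqrt{6(3560)}} = 0.99$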
X 43 44 46 40 44 42 45 42 38 40 42 57
Y 29 31 19 18 19 27 27 29 41 30 26 10
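X     Y     U = X − 40   U²    V = Y − 27   V²    UV
43    29    3            9     2            4     6
44    31    4            16    4            16    16
46    19    6            36    −8           64    −48
40    18    0            0     −9           81    0
44    19    4            16    −8           64    −32
42    27    2            4     0            0     0
45    27    5            25    0            0     0
42    29    2            4     2            4     4
38    41    −2           4     14           196   −28
40    30    0            0     3            9     0
42    26    2            4     −1           1     −2
57    10    17           289   −17          289   −289
Total 523   306   43   407   −18   728   −373
The simple sample correlation coefficient is
$r = \dfrac{N\sum UV - \sum U \sum V}{\sqrt{N\sum U^2 - (\sum U)^2}\,\sqrt{N\sum V^2 - (\sum V)^2}} = \dfrac{12(-373) - (43)(-18)}{\sqrt{12(407) - (43)^2}\,\sqrt{12(728) - (-18)^2}} = -0.733$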
Applicant   A   B   C   D   E
Rater 1     4   1   3   2   5
Rater 2     3   2   5   1   4
Test to see how well the ratings agree.
$H_0: \rho_s = 0$
$H_1: \rho_s > 0$
In this case, we have a 1-sided test. Arrange the data in columns.
Applicant   Rater 1 (rx)   Rater 2 (ry)   d     d²
A           4              3              1     1
B           1              2              −1    1
C           3              5              −2    4
D           2              1              1     1
E           5              4              1     1
Note that Σd = 0 and Σd² = 8. Since n = 5,
$r_s = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)} = 1 - \dfrac{6(8)}{5(5^2 - 1)} = 1 - \dfrac{2}{5} = 0.600$
Repeated rank correlation: $R = 1 - \dfrac{6\left[\sum d^2 + \frac{1}{12}(m^3 - m) + \cdots\right]}{N^3 - N} = 0.545$
X                  68   64   75    50   64   80   75    40   55   64
Y                  62   58   68    45   81   60   68    48   50   70
R1                 4    6    2.5   9    6    1    2.5   10   8    6
R2                 5    7    3.5   10   1    6    3.5   9    8    2
D² = (R1 − R2)²    1    1    1     1    25   25   1     1    0    16
5.Ten competitors in a beauty contest are ranked by the judges in the following data
Judge1 1 6 5 10 3 2 4 9 7 8 Total
Judge3 6 4 9 8 1 2 3 10 5 7
D12 4 1 9 36 16 64 4 64 1 1 200
D22 9 1 1 16 36 64 1 81 1 4 214
D32 25 4 16 4 4 0 1 1 4 1 60
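Regression line of Y on X: $Y - \bar{Y} = b_{YX}(X - \bar{X})$, where $b_{YX} = \dfrac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2}$
Regression line of X on Y: $X - \bar{X} = b_{XY}(Y - \bar{Y})$, where $b_{XY} = \dfrac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - (\sum Y)^2}$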
Age of husbands: 26 29 31 33 35 34 38 39 41 45
Age of wives : 22 26 27 31 38 19 29 36 35 46
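Find the regression equations. Find (i) the age of the husband when the wife's age is 30 and (ii) the wife's age when the husband's age is 32.
X     Y     U = X − 35   U²    V = Y − 30   V²    UV
26    22    −9           81    −8           64    72
29    26    −6           36    −4           16    24
31    27    −4           16    −3           9     12
33    31    −2           4     1            1     −2
35    38    0            0     8            64    0
34    19    −1           1     −11          121   11
38    29    3            9     −1           1     −3
39    36    4            16    6            36    24
41    35    6            36    5            25    30
45    46    10           100   16           256   160
Total 351   309   1   299   9   593   328
Linear equation x on y
$X - \bar{X} = b_{XY}(Y - \bar{Y})$, where $b_{XY} = \dfrac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - (\sum Y)^2} = \dfrac{N\sum UV - \sum U \sum V}{N\sum V^2 - (\sum V)^2} = 0.559$
X − 35.1 = 0.559(Y − 30.9)
When the wife's age (Y) is 30, X = 35.1 + 0.559(30 − 30.9) ≈ 34.6, i.e. about 35 yrs
Linear equation y on x
$b_{YX} = \dfrac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2} = \dfrac{N\sum UV - \sum U \sum V}{N\sum U^2 - (\sum U)^2} = 1.09$
Y − 30.9 = 1.09(X − 35.1), i.e. Y = 1.09X − 7.359
When the husband's age (X) is 32, Y = 1.09(32) − 7.359 ≈ 28 yrs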
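ΣX = 12500   ΣX² = 1585000
ΣY = 8000    ΣY² = 648000
ΣXY = 1007425   N = 100
$r = \dfrac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}} = 0.55$
Linear equation y on x
$b_{YX} = \dfrac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2} = \dfrac{742500}{2250000} = 0.33$
Regression equation y on x is Y − 80 = 0.33(X − 125)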
Marks in maths:39 65 62 90 82 75 25 98 36 78
X Y U=X-A(65) U2 V=Y-A(66) V2 UV
x on y equation x = 1.216y-15.236
9.The regression equations 8x-10y+66 = 0 and 40x-18y = -214 Find the mean values of x and y
Find byx and bxy Find the coefficient of correlation [sqrt(bxy byx)]
10.Find the regression equations and also the coefficient of correlation from the following data
11.In a partially destroyed laboratory record of an analysis of correlation data, the following results
only are legible. Variance of X=1, The regression equations are 3X + 2Y =26 and 6x + Y =31, What
were i) the mean values of X and Y ii) the standard deviation of X and Y ? iii) the correlation of X andY
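Solving the two equations simultaneously, the mean is (X̄, Ȳ) = (4, 7).
From 3X + 2Y = 26, bxy = −2/3 (assuming it is the x on y equation).
From 6X + Y = 31, byx = −6. With these values r² = bxy·byx = 4 > 1, so the assumption is wrong.
Taking 3X + 2Y = 26 as the y on x equation gives byx = −3/2, and taking 6X + Y = 31 as the x on y equation gives bxy = −1/6. Then r² = 1/4 and r = −0.5, negative because both regression coefficients are negative.
$\dfrac{\sigma_y^2}{\sigma_x^2} = \dfrac{b_{YX}}{b_{XY}} = 9$, then $\sigma_y^2 = 9\sigma_x^2$
We know the variance of X is given as $\sigma_x^2 = 1$,
hence $\sigma_y^2 = 9(1)$, $\sigma_y = 3$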
12. Of the two regression lines x + 2y − 5 = 0 and 2x + 3y − 8 = 0, find which is the regression line of x on y. Also find the means, the correlation coefficient, bxy, byx, and the equations of x on y and y on x. (Ans. bxy = −3/2, byx = −1/2, mean (1, 2))
13. The equations of 2 regression lines are 3x + 12y = 19 and 3y + 9x = 46. Obtain the correlation coefficient and the mean values of X and Y.
If r12.3 denotes the partial correlation coefficient between X1 and X2 keeping X3 constant, then
$r_{12.3} = \dfrac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$
Suppose we want to find the correlation between Y and X controlling for W. This is called the partial correlation and its symbol is rYX.W (if x, y, z are three variables).
1.Given r12 = 0.70, r13 = 0.61 , r23=0.40 Find i) r23.1 ii)r13.2 iii)r12.3
2.Is it possible to get the following from a set of experimental data the value of r12.3, If r23=0.8,
r13= - 0.5 , r12 = 0.6 Ans (r12.3 = 1.923)
3.From the data relating to the yield of dry back (X1), height(X2) and grown (X3) for 18
cinchona plants, the following correlation coefficients were obtained. (r12.3=0.62)
4.In a certain investigation, the following values are obtained r12=0.6, r13=-0.4 and r23=0.7. Are
these values consistent. (Find r12.3 if it is less than one consistent otherwise inconsistent) (Ans
r12.3 = 1.344)
5.The simple correlation coefficients between temperature (X1) corn yield (X2) and rain fall
(X3) are r12 = 0.59, r13 = 0.46 and r23= 0.77 Calculate the partial correlation coefficients
r12.3,r23.1 and r31.2 (Ans. R12.3=0.42, r23.1=0.69, r31.2=0.019)
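The partial correlation exercises above can be checked with the standard first-order formula r12.3 = (r12 − r13·r23) / √((1 − r13²)(1 − r23²)). The sketch below is an illustration only; `partial_r` is a hypothetical helper, and a result outside [−1, 1] signals that the three given coefficients are mutually inconsistent.

```python
import math


def partial_r(r_ab, r_ac, r_bc):
    """Partial correlation of a and b, holding c constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Exercise 5: r12 = 0.59, r13 = 0.46, r23 = 0.77
r12, r13, r23 = 0.59, 0.46, 0.77
print(round(partial_r(r12, r13, r23), 2))   # r12.3
print(round(partial_r(r23, r12, r13), 2))   # r23.1

# Exercise 2: r12 = 0.6, r13 = -0.5, r23 = 0.8 gives a value above 1,
# so such a set of coefficients cannot come from real data
print(partial_r(0.6, -0.5, 0.8) > 1)        # True
```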
R3.12 (R3.21)
1.The following correlation coefficients are given : r12=0.98, r13=0.44 and r23=0.54 Calculate
multiple correlation coefficient treating first variable as dependent and second and third variables
are independent. (Ans. R1.23 =0.986)
3.If r12 = 0.6, r13=0.7, r23 = 0.65, find R1.23, R3.12, R2.13 Ans(0.73, 0.76, 0.68)
4.If r12 = 0.8 , r13 = 0.5 , r23 = 0.3 find R1.23 (Ans R1.23 = 0.85)
5.Given r12 = 0.77, r13 = 0.72, r23 = 0.52 calculate R1.23 (Ans.R1.23 0.86)
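The multiple correlation answers above agree with the standard formula R1.23 = √((r12² + r13² − 2·r12·r13·r23) / (1 − r23²)). A minimal sketch, with `multiple_r` as a hypothetical helper name:

```python
import math


def multiple_r(r_dep_a, r_dep_b, r_ab):
    """Multiple correlation of a dependent variable on predictors a and b.

    r_dep_a, r_dep_b: correlations of the dependent variable with a and b;
    r_ab: correlation between the two predictors.
    """
    num = r_dep_a ** 2 + r_dep_b ** 2 - 2 * r_dep_a * r_dep_b * r_ab
    return math.sqrt(num / (1 - r_ab ** 2))


# Exercise 1: r12 = 0.98, r13 = 0.44, r23 = 0.54
print(round(multiple_r(0.98, 0.44, 0.54), 3))   # R1.23

# Exercise 3: r12 = 0.6, r13 = 0.7, r23 = 0.65
r12, r13, r23 = 0.6, 0.7, 0.65
print(round(multiple_r(r12, r13, r23), 2))      # R1.23
print(round(multiple_r(r13, r23, r12), 2))      # R3.12
print(round(multiple_r(r12, r23, r13), 2))      # R2.13
```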
Multiple regression is a flexible method of data analysis that may be appropriate whenever a
quantitative variable (the dependent or criterion variable) is to be examined in relationship to any other
factors (expressed as independent or predictor variables). Relationships may be nonlinear, independent
variables may be quantitative or qualitative, and one can examine the effects of a single variable or
multiple variables with or without the effects of other variables taken into account
With two independent variables the prediction of Y is expressed by the following equation:
Note that this transformation is similar to the linear transformation of two variables discussed in the
previous chapter except that the w's have been replaced with b's and the X'i has been replaced with a Y'i.
The "b" values are called regression weights and are computed in a way that minimizes the sum of
squared deviations
1. The owner of a chain of ten stores wishes to forecast net profit with the help of next year's projected sales of food and non-food items. The data about the current year's sales of food items, sales of non-food items and net profit for all the ten stores are available as follows.
Supermarket No.                 1     2     3     4     5     6     7     8     9     10
Net profit in crores (Y)        5.6   4.7   5.4   5.5   5.1   6.8   5.8   8.2   5.8   6.2
Sales of food in crores (X1)    20    15    18    20    16    25    22    30    24    25
Sales of non-food in cr (X2)    5     5     6     5     6     6     4     7     3     4
Y = b0+b1X1+b2X2
where b0, b1 and b2 are found by solving the normal equations
∑Y = n b0 + b1∑X1 + b2∑X2
∑YX1 = b0∑X1 + b1∑X1² + b2∑X1X2
∑YX2 = b0∑X2 + b1∑X1X2 + b2∑X2²
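The three normal equations can be assembled from the sums and solved by elimination. The sketch below uses the supermarket data from the problem above and a small Gauss-Jordan solver written for this illustration; the function names are hypothetical, and no numerical answer is quoted here because the notes do not give one.

```python
def solve_linear(a, b):
    """Solve a small square system a.x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [p - factor * q for p, q in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def fit_two_predictors(y, x1, x2):
    """Coefficients b0, b1, b2 of Y = b0 + b1*X1 + b2*X2 (normal equations)."""
    n = len(y)
    lhs = [[n, sum(x1), sum(x2)],
           [sum(x1), dot(x1, x1), dot(x1, x2)],
           [sum(x2), dot(x1, x2), dot(x2, x2)]]
    rhs = [sum(y), dot(y, x1), dot(y, x2)]
    return solve_linear(lhs, rhs)


profit = [5.6, 4.7, 5.4, 5.5, 5.1, 6.8, 5.8, 8.2, 5.8, 6.2]
food = [20, 15, 18, 20, 16, 25, 22, 30, 24, 25]
non_food = [5, 5, 6, 5, 6, 6, 4, 7, 3, 4]
b0, b1, b2 = fit_two_predictors(profit, food, non_food)
print(b0, b1, b2)
```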
Food expenditure in 1000s (y)   10   12   14   15   10   11
The net income (X1)             25   30   25   32   20   21
No. of members (X2)             5    6    3    6    2    2
Family size                2    2    4    4    5    5    6    6
Family income in lakhs     14   16   14   17   18   21   17   25
Department of Mathematics
UNIT-3
⋆ Method of simple averages (weekly, monthly and quarterly)
⋆ Ratio to trend method
Contents
1 Concept of Time Series
4 Exercise/Practice/Assignment Problems
Dear all, here I have solved only a few problems, and some topics may be missing. Please follow the classwork to cover all the topics for your preparation. Take the exercise problems given at the end for your practice. Apart from the exercises, you can follow any reference book for your practice.
Some of the sections/topics in these units are preliminary ideas, which are the basics needed to do our regular course examples and exercises.
Any data recorded together with its time of occurrence is called a time series. The five yearly output of wheat recorded for the last fifteen years, the weekly average price of groceries recorded for the last 10 weeks, the monthly average sales of any company recorded for the last 25 months or the quarterly average profits recorded for the last 10 quarters etc., are examples of time series data.
In the fields of business and economics, data such as income, imports, exports, production, consumption, and prices depend on time. These data also depend on seasonal changes as well as regular cyclical changes over a time period. The analysis of time series therefore plays an important role in evaluating changes in business and economics. It is necessary to associate the time with time series, because the time is one of the main and basic
The factors that are responsible for bringing about changes in a time series, also called the
components of time series, are as follows:
# Secular Trend (or General Trend)
# Seasonal Movement/Variation
# Cyclical Movement/Variation
# Irregular Fluctuation/Variation
Secular Trend
The secular trend is the main component of a time series which results from long term effects
of socio-economic and political factors. This trend may show the growth or decline in a time
series over a long period. This is the type of tendency which continues to persist for a very long
period. Prices and export and import data, for example, reflect obviously increasing tendencies
over time.
Seasonal Movement
These are short term movements occurring in data due to seasonal factors. The short term is generally considered as a period in which changes occur in a time series with variations in weather or festivities. For example, it is commonly observed that the consumption of ice-cream during summer is generally high and hence an ice-cream dealer’s sales would be higher in some months of the year while relatively lower during winter months. Employment, output, exports, etc., are subject to change due to variations in weather. Similarly, the sale of garments, umbrellas, greeting cards and fire-works are subject to large variations during festivals like Valentine’s Day, Eid, Christmas, New Year’s, etc. These types of variations in a time series are
Cyclic Movement
These are long term oscillations occurring in a time series. These oscillations are mostly observed in economics data and the periods of such oscillations generally extend from five to twelve years or more. These oscillations are associated with the well known business cycles. These cyclic movements can be studied provided a long series of measurements, free from irregular fluctuations, is available.
Irregular Fluctuation
These are sudden changes occurring in a time series which are unlikely to be repeated. They
are components of a time series which cannot be explained by trends, seasonal or cyclic move-
ments. These variations are sometimes called residual or random components. These variations,
though accidental in nature, can cause a continual change in the trends, seasonal and cyclical
oscillations during the forthcoming period. Floods, fires, earthquakes, revolutions, epidemics,
strikes etc., are the root causes of such irregularities.
Methods of Analyzing Trend
A number of different methods are available to estimate the trend; however, the suitability of
these methods largely depends on the nature of the data and the purpose of the analysis. To
measure a trend which can be represented as a straight line or some type of smooth curve, the
following are the commonly employed methods:
(a) Freehand smooth curves
(b) Semi-average method
(c) Moving average method
(d) Mathematical curve fitting
2 Problems on Measuring Secular Trends
2.1 Illustrative Examples on Free-Hand, Semi-Average and Moving Averages
EXAMPLE 2.1
Fit a trend line for the following data by freehand method.
No. of failures f 23 26 28 32 20 12 12 10
Hints/Solution:
EXAMPLE 2.2
Hints/Solution:
Since seven years are given, the middle year can be left out. The average for the first 3 years (1991-1993) is 321/3 = 107 and the average for the last 3 years (1995-1997) is 336/3 = 112. To draw the trend line we use the points (1992, 107) and (1996, 112).
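The semi-average rule can be sketched in a few lines. The data below are an assumption: the statement of Example 2.2 is not reproduced in these notes, so the sales figures of Example 2.3 are used because their first and last three-year totals (321 and 336) match the solution. The function name is hypothetical.

```python
def semi_averages(years, values):
    """Two trend points: (mid-year, average) of the first and last halves.

    With an odd number of years the middle year is left out.
    """
    half = len(values) // 2
    first = (sum(years[:half]) / half, sum(values[:half]) / half)
    last = (sum(years[-half:]) / half, sum(values[-half:]) / half)
    return first, last


years = [1991, 1992, 1993, 1994, 1995, 1996, 1997]
sales = [102, 105, 114, 110, 108, 116, 112]     # assumed data, see lead-in
print(semi_averages(years, sales))  # ((1992.0, 107.0), (1996.0, 112.0))
```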
Figure 2.1: Trend by Free Hand Method.
Note 2.1.1. If an even number of years is given, one can use the first half and the second half, without leaving out any year, to find the semi-averages.
EXAMPLE 2.3
Calculate the i-yearly (i = 3, 4, 5, 7) moving averages for the following data. Also plot
the actual and trend value on a graph.
Year x 1991 1992 1993 1994 1995 1996 1997
Sales f 102 105 114 110 108 116 112
Hints/Solution:
S.No.  Year  f    3-ymt  3-yma   4-ymt  4-yma   5-ymt  5-yma  7-ymt  7-yma
1      1991  102  −      −       −      −       −      −      −      −
2      1992  105  321    107     −      −       −      −      −      −
3      1993  114  329    109.67  431    107.75  539    107.8  −      −
4      1994  110  332    110.67  437    109.25  553    110.6  767    109.57
5      1995  108  334    111.33  448    112     560    112    −      −
6      1996  116  336    112     −      −       −      −      −      −
7      1997  112  −      −       −      −       −      −      −      −
ymt-yearly moving totals yma-yearly moving average
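The moving totals and moving averages in the table can be generated mechanically. A minimal sketch, with `moving_average` as an illustrative name; it returns the totals and averages in order, leaving their placement against the middle year to the reader.

```python
def moving_average(values, k):
    """k-yearly moving totals and moving averages of a series."""
    totals = [sum(values[i:i + k]) for i in range(len(values) - k + 1)]
    averages = [t / k for t in totals]
    return totals, averages


sales = [102, 105, 114, 110, 108, 116, 112]
for k in (3, 4, 5, 7):
    totals, averages = moving_average(sales, k)
    print(k, totals, [round(a, 2) for a in averages])
```

For an even period such as k = 4, each average falls between two years, which is why a centered moving average is usually computed as a further step.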
Note: In the 4-yma, one can find/modify the 4 yearly centered moving average for the better trend. Also som
EXAMPLE 2.4
Calculate the i-yearly (i = 3, 4, 5, 7) moving averages for the following data. Also plot
the actual and trend value on a graph.
Year x 1982 1983 1984 1985 1986 1987 1988 1989
No. of failures f 23 26 28 32 20 12 12 10
Year x 1990 1991 1992 1993 1994 1995 1996 1997
No. of failures f 9 13 11 14 12 9 3 1
Hints/Solution:
S.No.  Year  f   3-ymt  3-yma  4-ymt  4-yma  5-ymt  5-yma  7-ymt  7-yma
1      1982  23  −      −      −      −      −      −      −      −
2      1983  26  77     25.67  −      −      −      −      −      −
3      1984  28  86     28.67  109    27.25  129    25.8   −      −
4      1985  32  80     26.67  106    26.5   118    23.6   153    21.86
5      1986  20  64     21.33  92     23     104    20.8   140    20
6      1987  12  44     14.67  76     19     86     17.2   123    17.57
7      1988  12  34     11.33  54     13.5   63     12.6   108    15.43
8      1989  10  31     10.33  43     10.75  56     11.2   87     12.43
9      1990  9   32     10.67  44     11     55     11     81     11.57
10     1991  13  33     11     43     10.75  57     11.4   81     11.57
11     1992  11  38     12.67  47     11.75  59     11.8   78     11.14
12     1993  14  37     12.33  50     12.5   59     11.8   71     10.14
13     1994  12  35     11.67  46     11.5   49     9.8    63     9
16     1997  1   −      −      −      −      −      −      −      −
ymt-yearly moving totals yma-yearly moving average
EXAMPLE 2.5
Calculate the i-yearly (i = 3, 4, 5, 7) moving averages for the following data. Also plot
the actual and trend value on a graph.
Year x 1985 1986 1987 1988 1989 1990 1991 1992
y 90 110 185 200 195 210 300 450
Hints/Solution:
S.No.  Year  f    3-ymt  3-yma   4-ymt  4-yma   5-ymt  5-yma  7-ymt  7-yma
1      1985  90   −      −       −      −       −      −      −      −
2      1986  110  385    128.33  −      −       −      −      −      −
3      1987  185  495    165     585    146.25  780    156    −      −
4      1988  200  580    193.33  690    172.5   900    180    1290   184.29
5      1989  195  605    201.67  790    197.5   1090   218    1650   235.71
6      1990  210  705    235     905    226.25  1355   271    −      −
7      1991  300  960    320     −      −       −      −      −      −
8      1992  450  −      −       −      −       −      −      −      −
ymt-yearly moving totals yma-yearly moving average
EXAMPLE 2.6
Fit a trend by straight line and a parabola, by the method of least squares to the following data. Also find the short term fluctuations.
Year x    1991  1992  1993  1994  1995  1996  1997
Sales f   102   105   114   110   108   116   112
Year x  y    d   d×y   d²  d²×y  d³   d⁴   yl       yc       yl − y  yc − y
1991    102  −3  −306  9   918   −27  81   104.643  102.5    2.643   0.5
1992    105  −2  −210  4   420   −8   16   106.285  106.285  1.285   1.285
1993    114  −1  −114  1   114   −1   1    107.928  109.214  −6.071  −4.785
1994    110  0   0     0   0     0    0    109.571  111.285  −0.428  1.285
1995    108  1   108   1   108   1    1    111.214  112.5    3.214   4.5
1996    116  2   232   4   464   8    16   112.857  112.857  −3.142  −3.142
1997    112  3   336   9   1008  27   81   114.5    112.357  2.5     0.357
Total   767  0   46    28  3032  0    196  767      767      ≅ 0     ≅ 0
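The column totals in the table feed directly into the least squares normal equations. The sketch below assumes an odd number of equally spaced years, so that the deviations d are centered and Σd = Σd³ = 0; under that assumption the line is y = a + b·d and the parabola is y = a + b·d + c·d². The function name `fit_trends` is hypothetical.

```python
def fit_trends(values):
    """Least squares line and parabola for an odd-length, equally spaced series.

    Returns ((a, b), (a2, b, c)) for y = a + b*d and y = a2 + b*d + c*d**2,
    where d is the deviation from the middle year.
    """
    n = len(values)
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd number of periods")
    d = [i - n // 2 for i in range(n)]
    s2 = sum(t ** 2 for t in d)
    s4 = sum(t ** 4 for t in d)
    sy = sum(values)
    sdy = sum(t * y for t, y in zip(d, values))
    sd2y = sum(t * t * y for t, y in zip(d, values))

    a = sy / n            # from  Σy  = n*a
    b = sdy / s2          # from  Σdy = b*Σd²
    # Parabola: Σy = n*a2 + c*Σd²  and  Σd²y = a2*Σd² + c*Σd⁴
    c = (n * sd2y - s2 * sy) / (n * s4 - s2 ** 2)
    a2 = (sy - c * s2) / n
    return (a, b), (a2, b, c)


sales = [102, 105, 114, 110, 108, 116, 112]
(a, b), (a2, b2, c) = fit_trends(sales)
print(round(a, 3), round(b, 3))                # straight line
print(round(a2, 3), round(b2, 3), round(c, 3)) # parabola
print(round(a2 + b2 * 1 + c * 1, 3))           # parabolic trend value for 1995
```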
IT
H
AT
F
O
S
TE
O
N
E
R
TU
C
LE
Page 9 of 16
15MA305-Statistics for Information Technology
EXAMPLE 2.7
Fit a trend by a straight line and a parabola, by the method of least squares, to the following
data. Also find the short-term fluctuations.
Year x 1982 1983 1984 1985 1986 1987 1988 1989
No. of failures f 23 26 28 32 20 12 12 10
Year x 1990 1991 1992 1993 1994 1995 1996 1997
No. of failures f 9 13 11 14 12 9 3 1
Hints/Solution:
Year x y d d×y d2 d2 × y d3 d4 yl yc yl − y yc − y
1982 23 −7 −161 49 1127 −343 2401 26.411765 27.716912 3.4117647 4.7169118
1983 26 −6 −156 36 936 −216 1296 24.848529 25.631618 −1.1514706 −0.3683824
1984 28 −5 −140 25 700 −125 625 23.285294 23.620903 −4.7147059 −4.3790966
1985 32 −4 −128 16 512 −64 256 21.722059 21.684769 −10.277941 −10.315231
1986 20 −3 −60 9 180 −27 81 20.158824 19.823214 0.1588235 −0.1767857
1987 12 −2 −24 4 48 −8 16 18.595588 18.036239 6.5955882 6.0362395
1988 12 −1 −12 1 12 −1 1 17.032353 16.323845 5.0323529 4.3238445
1989 10 0 0 0 0 0 0 15.469118 14.686029 5.4691176 4.6860294
1990 9 1 9 1 9 1 1 13.905882 13.122794 4.9058824 4.1227941
1991 13 2 26 4 52 8 16 12.342647 11.634139 −0.6573529 −1.3658613
1992 11 3 33 9 99 27 81 10.779412 10.220063 −0.2205882 −0.7799370
1993 14 4 56 16 224 64 256 9.2161765 8.8805672 −4.7838235 −5.1194328
1994 12 5 60 25 300 125 625 7.6529412 7.6156513 −4.3470588 −4.3843487
EXAMPLE 2.8
Fit a trend by a straight line and a parabola, by the method of least squares, to the
following data. Also find the short-term fluctuations.
Year x 1985 1986 1987 1988 1989 1990 1991 1992
y 90 110 185 200 195 210 300 450
Hints/Solution:
Year x y d d×y d2 d2 × y d3 d4 yl yc yl − y yc − y
1985 90 −3 −270 9 810 −27 81 70 112.91667 −20 22.916667
1986 110 −2 −220 4 440 −8 16 112.14286 118.27381 2.1428571 8.2738095
1987 185 −1 −185 1 185 −1 1 154.28571 135.89286 −30.714286 −49.107143
1988 200 0 0 0 0 0 0 196.42857 165.77381 −3.5714286 −34.22619
1989 195 1 195 1 195 1 1 238.57143 207.91667 43.571429 12.916667
1990 210 2 420 4 840 8 16 280.71429 262.32143 70.714286 52.321429
1991 300 3 900 9 2700 27 81 322.85714 328.9881 22.857143 28.988095
1992 450 4 1800 16 7200 64 256 365 407.91667 −85 −42.083333
Total 1740 4 2640 44 12370 64 452 1740 1740 ≅ 0 ≅ 0
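The straight-line column yl of Example 2.8 can be checked with a small Python sketch that solves the two normal equations directly. It assumes the trend is written as y = a + b·d with d = Year − 1988 (the coding used in the table); the names `fit_line`, `a` and `b` are illustrative only.

```python
def fit_line(d, y):
    """Solve  sum(y) = n*a + b*sum(d)  and  sum(d*y) = a*sum(d) + b*sum(d^2)
    for the trend line y = a + b*d."""
    n = len(d)
    sd, sy = sum(d), sum(y)
    sdd = sum(di * di for di in d)
    sdy = sum(di * yi for di, yi in zip(d, y))
    b = (n * sdy - sd * sy) / (n * sdd - sd * sd)
    a = (sy - b * sd) / n
    return a, b


years = list(range(1985, 1993))
y = [90, 110, 185, 200, 195, 210, 300, 450]
d = [year - 1988 for year in years]       # sum(d) = 4, so d is not centred

a, b = fit_line(d, y)
trend = [a + b * di for di in d]           # the yl column
deviations = [t - yi for t, yi in zip(trend, y)]  # trend minus actual, as tabulated

print(round(a, 4), round(b, 4))
print([round(t, 3) for t in trend])
```

Because the number of years is even and d is not centred, Σd is not zero here, so both normal equations have to be solved together rather than read off separately.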
E XAMPLE 2.9
Find a trend by line and a parabola, by the method of least squares to the following data.
x 1996 1997 1998 1999 2000 2001 2002
y 352 356 357 358 360 361 361
$\sum u^2 v = a\sum u^4 + b\sum u^3 + c\sum u^2$ (2.3)
From the given data (with u = x − 1999 and v = y − 357), Σu = 0, Σv = 6, Σu² = 28, Σu³ = 0, Σu⁴ = 196, Σuv = 40
and Σu²v = 6. Solving, we get
$v = -0.21429u^2 + 1.4286u + 1.7143$,
i.e.
$y = -0.21429(x - 1999)^2 + 1.4286(x - 1999) + 358.7143$.
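The parabola of Example 2.9 can be checked numerically. The sketch below assumes the coding u = x − 1999 and v = y − 357, which reproduces the sums quoted above, and fits v = a·u² + b·u + c by solving the three normal equations with a small elimination routine; `solve3` and `power_sum` are hypothetical helper names.

```python
def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


x = [1996, 1997, 1998, 1999, 2000, 2001, 2002]
y = [352, 356, 357, 358, 360, 361, 361]
u = [xi - 1999 for xi in x]
v = [yi - 357 for yi in y]
n = len(u)


def power_sum(p, with_v=False):
    """Sum of u^p, optionally multiplied by v."""
    return sum(ui ** p * (vi if with_v else 1) for ui, vi in zip(u, v))


# Normal equations for v = a*u^2 + b*u + c
matrix = [
    [power_sum(2), power_sum(1), n],
    [power_sum(3), power_sum(2), power_sum(1)],
    [power_sum(4), power_sum(3), power_sum(2)],
]
rhs = [power_sum(0, True), power_sum(1, True), power_sum(2, True)]
a, b, c = solve3(matrix, rhs)
print(round(a, 5), round(b, 4), round(c, 4))
```

Working in the coded variables u and v keeps the sums small; expanding the result in powers of the raw year x produces very large coefficients that nearly cancel, so the fitted curve is easier to use in the form with (x − 1999).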
3 Problems on Measuring Seasonal Variation
EXAMPLE 3.1
The monthly consumption of rice (in kg) in a village during 2004 to 2008
is given below. Find the seasonal variation by the method of monthly averages.
YEAR JAN FEB MAR APR MAY JUNE JULY AUG SEP OCT NOV DEC
2004 318 281 278 250 231 216 223 245 269 302 325 347
2005 342 309 299 268 249 236 242 262 288 321 342 364
2006 367 328 320 287 269 251 259 284 309 345 367 394
2007 392 349 342 311 290 273 282 305 328 364 389 417
2008 420 378 370 334 314 296 305 330 356 396 422 452
Hints/Solution:
Months 2004 2005 2006 2007 2008 Total Average Percentage
JAN 318 342 367 392 420 1839 367.8 116.1351437
FEB 281 309 328 349 378 1645 329 103.8838017
MAR 278 299 320 342 370 1609 321.8 101.6103568
APR 250 268 287 311 334 1450 290 91.56930849
MAY 231 249 269 290 314 1353 270.6 85.44363751
JUNE 216 236 251 273 296 1272 254.4 80.32838649
JULY 223 242 259 282 305 1311 262.2 82.79128513
EXAMPLE 3.2
Hints/Solution:
YEAR 1st Quarter 2nd Quarter 3rd Quarter 4th Quarter
2011 3.7 4.1 3.3 3.5
2012 3.7 3.9 3.6 3.6
2013 4 4.1 3.3 3.1
2014 3.3 4.4 4 4
Total 14.7 16.5 14.2 14.2
Average 3.675 4.125 3.55 3.55
Seasonal Index 98.65771812 110.738255 95.30201342 95.30201342
The average of all the averages = 14.9/4 = 3.725.
Seasonal Index for first quarter = $\frac{\text{Quarterly Average}}{\text{General Average}} \times 100 = \frac{3.675}{3.725} \times 100 = 98.6577$
Similarly, the seasonal indices of the other quarters were calculated and are presented in the table.
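The simple-average seasonal indices of Example 3.2 can be reproduced with the following Python sketch. The variable names are illustrative; the data are the quarterly values from the table above.

```python
data = {
    2011: [3.7, 4.1, 3.3, 3.5],
    2012: [3.7, 3.9, 3.6, 3.6],
    2013: [4.0, 4.1, 3.3, 3.1],
    2014: [3.3, 4.4, 4.0, 4.0],
}

# Column-wise totals and averages (one per quarter)
quarter_totals = [sum(col) for col in zip(*data.values())]
quarter_averages = [total / len(data) for total in quarter_totals]

# General average = average of the quarterly averages
general_average = sum(quarter_averages) / len(quarter_averages)

# Seasonal index = quarterly average / general average * 100
seasonal_indices = [100 * avg / general_average for avg in quarter_averages]
print([round(s, 4) for s in seasonal_indices])
```

By construction the four indices average to 100, which is a quick check on the arithmetic.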
EXAMPLE 3.3
Find seasonal variation by the ratio-to-trend method for the following data.
Year 1st Quarter 2nd Quarter 3rd Quarter 4th Quarter
2005 30 40 35 35
2006 34 52 50 44
2007 40 58 54 48
2008 54 76 68 62
2009 80 92 86 82
Hints/Solution:
For determining the seasonal variation by the ratio-to-trend method, we first determine the trend for
the yearly data by the least squares method and then convert it to quarterly data.
Year Yearly Total Yearly Avg. (y) x = Year − 2007 xy x2 yl
2005 140 35 −2 −70 4 32
2006 180 45 −1 −45 1 44
2007 200 50 0 0 0 56
2008 260 65 1 65 1 68
2009 340 85 2 170 4 80
Total (Σ Year = 10035) 1120 280 0 120 10 280
Calculation of Quarterly Trend Values:
The fitted yearly trend is yl = 56 + 12x, so the yearly increment is 12 and the quarterly
increment is 12/4 = 3. Consider the year 2005: the trend value for the middle of the year (the
midpoint between the 2nd and 3rd quarters) is 32. The 2nd quarter lies half a quarter to the left
of the middle, so its trend value is 32 − 3/2 = 30.5; the 3rd quarter lies half a quarter to the
right, so its trend value is 32 + 3/2 = 33.5. The 1st quarter is one quarter to the left of the
2nd quarter, i.e. 30.5 − 3 = 27.5, and the 4th quarter is one quarter to the right of the 3rd
quarter, i.e. 33.5 + 3 = 36.5. The values for the other years were calculated similarly and are
given in Table 1. The actual values, expressed as percentages of the calculated trend values, are
given in another table.
Table 1: Quarterly Trend Values
YEAR 1st-Quarter 2nd-Quarter 3rd-Quarter 4th-Quarter
2005 27.5 30.5 33.5 36.5
2006 39.5 42.5 45.5 48.5
2007 51.5 54.5 57.5 60.5
2008 63.5 66.5 69.5 72.5
2009 75.5 78.5 81.5 84.5
In the percentage table, total of all the averages=403.079. Since the total is more than 400, an
adjustment is made by multiplying each average by 400/403.079 and then the final indices were
obtained.
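The whole ratio-to-trend calculation of Example 3.3 can be sketched in Python as follows. It takes the yearly trend yl = 56 + 12(Year − 2007) from the solution as given; the variable names are illustrative.

```python
actual = {
    2005: [30, 40, 35, 35],
    2006: [34, 52, 50, 44],
    2007: [40, 58, 54, 48],
    2008: [54, 76, 68, 62],
    2009: [80, 92, 86, 82],
}

a, b = 56, 12                       # yearly trend yl = a + b * (year - 2007)
quarterly_increment = b / 4         # 12 / 4 = 3
offsets = [-1.5, -0.5, 0.5, 1.5]    # quarter midpoints, in quarters from mid-year

# Quarterly trend values (Table 1)
trend = {
    year: [a + b * (year - 2007) + k * quarterly_increment for k in offsets]
    for year in actual
}

# Actual values as percentages of trend, then quarter-wise averages
percent = {
    year: [100 * y / t for y, t in zip(actual[year], trend[year])]
    for year in actual
}
averages = [sum(col) / len(actual) for col in zip(*percent.values())]
total = sum(averages)

# Adjust so that the four indices add up to 400
indices = [avg * 400 / total for avg in averages]
print(round(total, 3), [round(i, 2) for i in indices])
```

The printed total is the figure that the adjustment factor 400/total corrects.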
4 Exercise/Practice/Assignment Problems
1. Calculate the 3, 4, 5, 7 and 9-yearly moving average trend for the time series given below. Also
use the weights 2,1,3 to find the 3-yearly weighted moving average, 2,1,2,2 to find the 4-yearly
weighted moving average, and 2,2,1,3,2 to find the 5-yearly weighted moving average.
Year : 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000
Quantity : 239 242 238 252 257 250 273 270 268 288 284
Year : 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Quantity : 282 300 303 298 313 317 309 329 333 327
2. Fit a line and curve trend for the following data. Also find the short-term fluctuations.
Year : 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000
Quantity : 239 242 238 252 257 250 273 270 268 288 284
Year : 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Quantity : 282 300 303 298 313 317 309 329 333 327
3. Find seasonal variation by simple average and the ratio-to-trend method for the following data.
YEAR 1st-Quarter 2nd-Quarter 3rd-Quarter 4th-Quarter
2011 39 21 52 81
2012 45 23 63 76
2013 44 26 69 75
2014 53 23 64 84
Acknowledgement:
Some portions of this material are taken from various available sources.
I thank the authors who prepared the calculus books and related materials.
Visit: https://sites.google.com/site/lecturenotesofathithans/home
LEAST SQUARES BEST FIT FOR STRAIGHT LINE AND PARABOLA
Principle of least squares: it gives a unique set of values to the constants for the best fit.
To fit a straight line of the form y = aX + b, the normal equations are
a ∑X + nb = ∑Y
a ∑X² + b ∑X = ∑XY
To fit a parabola of the form y = aX² + bX + C, the normal equations are
a ∑X⁴ + b ∑X³ + C ∑X² = ∑X²Y
a ∑X³ + b ∑X² + C ∑X = ∑XY
a ∑X² + b ∑X + nC = ∑Y
1. Fit a straight line to the data given below using Least squares method
x 0 1 2 3 4
2. The following determinations of the specific heat of ethyl alcohol were made in an
investigation of the variation in specific heat with temperature:
Specific heat (y) 0.51 0.55 0.57 0.59 0.62 0.67
Temperature ( x deg) 0 10 20 30 40 50
Calculate the constants of the line y = a + bx that may provide a best fit to the data.
y 15 19 23 26 30
y 2 3 5 8 10
X Y X2 X3 X4 X2Y XY
y 1 5 10 22 38
Null Hypothesis :
A definite statement about the population parameter. Such a hypothesis is usually a hypothesis of no
difference and is denoted by H0.
Alternative Hypothesis
Standard Error
The standard deviation of the sampling distribution of a statistic is known as its standard error. It is denoted
by SE.
Errors in sampling
Type 1 error : Reject H0 when it is true.
Null Hypothesis H0 : µ = µ0
Null Hypothesis H0 : µ = µ0
Sampling
SMALL SAMPLES
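a) Student's t test b) Chi-square test c) F test
i) Test of significance of a mean: $t = \frac{\bar{x} - \mu}{s/\sqrt{n-1}}$
95% confidence limits: $\bar{x} - t_{0.05}\frac{s}{\sqrt{n-1}} \le \mu \le \bar{x} + t_{0.05}\frac{s}{\sqrt{n-1}}$
ii) Test of significance of the difference of means: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
where (big) $S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$ when the means and SDs are given; small $s_1$, $s_2$ are the SDs of sample 1 and sample 2.
If the data are discrete, $s^2 = \frac{\sum (x - \bar{x})^2}{n}$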
One-tailed test at 5%: use the table column for 5% × 2 = 10% (0.10).
1. A sample of 26 bulbs gives a mean life of 990 hours with an SD of 20 hours. The manufacturer
claims that the mean life of the bulbs is 1000 hours. Is the sample not up to the standard?
Sample size n = 26
H0 : µ = 1000
H1 : µ < 1000
Test of significance of a single mean: $t = \frac{\bar{x} - \mu}{s/\sqrt{n-1}} = \frac{990 - 1000}{20/\sqrt{25}} = -2.5$
H0 Rejected
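The single-mean t statistic used in these problems can be checked with a few lines of Python. The sketch follows the convention of these notes, where s is the sample SD computed with divisor n and the statistic therefore uses √(n − 1); the function name is illustrative.

```python
import math


def t_single_mean(sample_mean, mu0, s, n):
    """t = (x_bar - mu0) / (s / sqrt(n - 1)), with s computed using divisor n."""
    return (sample_mean - mu0) / (s / math.sqrt(n - 1))


t_bulbs = t_single_mean(990, 1000, 20, 26)          # problem 1
t_washers = t_single_mean(0.024, 0.025, 0.002, 10)  # problem 2
t_soap = t_single_mean(155, 145, 16, 17)            # problem 3

print(round(t_bulbs, 3), round(t_washers, 3), round(t_soap, 3))
```

The calculated value is then compared with the tabulated t for n − 1 degrees of freedom at the chosen level of significance.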
2. A machine is designed to produce insulating washers for electrical devices of average thickness
0.025 cm. A random sample of 10 washers was found to have an average thickness of 0.024 cm with an
SD of 0.002 cm. Test the significance of the deviation. The value of t for 9 degrees of freedom at the 5% level is
2.262.
SD s = 0.002 cm, df = n − 1 = 9
H0 : µ = 0.025
H1 : µ ≠ 0.025
Test of significance of a single mean: $t = \frac{\bar{x} - \mu}{s/\sqrt{n-1}} = \frac{0.024 - 0.025}{0.002/\sqrt{9}} = -1.5$
|t| = 1.5 < 2.262, so H0 Accepted
3. The mean weekly sales of soap bars in departmental stores is 145 bars per store. After an
advertising campaign, the mean weekly sales in 17 stores for typical week increased to 155 and
showed a standard deviation of 16. Was the advertising campaign successful?
|t |= 2.5
4.Certain pesticide is packed into bags by a machine. A random sample of 10 bags drawn and their
contents are found to weigh in (in kg.) as follows.
X x-X(47.3) (x-X)2
50
49
52
44
45
48
46
45
49
45
473 64.1
$S^2 = \sum (x - \bar{x})^2 / n = 64.1/10 = 6.41$
S = 2.53
Test of significance of a single mean: $t = \frac{\bar{x} - \mu}{s/\sqrt{n-1}} = -3.19$
Tabulated value at 5 % level calculated value
5.The heights of 10 males of a given locality are found to be 70, 67, 62, 68, 61, 68, 70, 64, 64, 66
inches . It is reasonable to believe that the average height is greater than 64 inches . Test at 5 %
significance level
X x-X(66) (x-X)2
70
67
62
68
61
68
70
64
64
66
660 90
Mean = (70 + 67 + … + 66)/10 = 66
$S^2 = \sum (x - \bar{x})^2 / (n - 1) = 90/9 = 9$
S = 3
Null Hypothesis H0 : µ = 64 inches
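n1 = 25, n2 = 25
x̄1 = 200, x̄2 = 250
s1 = 20, s2 = 25
$S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$ (when s1 and s2 are given)
$S^2 = \frac{\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2}{n_1 + n_2 - 2}$ (when the sums of squared deviations are given)
$S^2 = \frac{25(20)^2 + 25(25)^2}{25 + 25 - 2} = 533.85$
S = 23.10
H1 : µ1 ≠ µ2
Test of significance of the difference of means: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = -7.65$
H0 Rejected. Result : The two machines are not equally efficient at the 1% level of significance.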
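The pooled variance and the two-sample t value above can be verified with this Python sketch. It uses the convention of these notes (s1, s2 computed with divisor n, pooled over n1 + n2 − 2); the function names are illustrative.

```python
import math


def pooled_variance(n1, s1, n2, s2):
    """S^2 = (n1*s1^2 + n2*s2^2) / (n1 + n2 - 2)."""
    return (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2)


def t_two_means(mean1, mean2, n1, n2, pooled_var):
    """t = (x1_bar - x2_bar) / sqrt(S^2 * (1/n1 + 1/n2))."""
    return (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


S2 = pooled_variance(25, 20, 25, 25)
t = t_two_means(200, 250, 25, 25, S2)
print(round(S2, 2), round(math.sqrt(S2), 2), round(t, 2))
```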
Diet A 25 32 30 34 24 14 32 24 30 31 35 25
Diet B 44 34 22 10 47 31 40 30 32 35 18 21 35 29 22
Test if the two diets differ significantly as regards their effect on increase in weight
Alternative Hypothesis H1 : µ1 ≠ µ2
25 44
32 34
30 22
34 10
24 47
14 31
32 40
24 30
30 32
31 35
35 18
25 21
35
29
22
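$S^2 = \frac{\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2}{n_1 + n_2 - 2} = 71.6$
(OR)
$S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$, where $s_1^2 = \frac{\sum d_1^2}{n_1} - \left(\frac{\sum d_1}{n_1}\right)^2$, $s_2^2 = \frac{\sum d_2^2}{n_2} - \left(\frac{\sum d_2}{n_2}\right)^2$
Test of significance of the difference of means: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = -0.61$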
Horse A 28 30 32 33 33 29 34
Horse B 29 30 30 24 27 27 -
Test whether you can discriminate between two horses. You can use the fact that 5 % value
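$S^2 = \frac{\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2}{n_1 + n_2 - 2} = 5.29$
(OR)
$S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$, where $s_1^2 = \frac{\sum d_1^2}{n_1} - \left(\frac{\sum d_1}{n_1}\right)^2$, $s_2^2 = \frac{\sum d_2^2}{n_2} - \left(\frac{\sum d_2}{n_2}\right)^2$; this also gives $S^2 = 5.29$
Test of significance of the difference of means: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = 2.70$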
$t = \frac{\bar{d}}{SD/\sqrt{n-1}}$, where $\bar{d}$ is the mean value of the differences
1. The following data relate to the marks obtained by 11 students in 2 tests, one held at the
beginning of the year and the other at the end of the year, after intensive coaching. Do
the data indicate that the students have benefited by coaching?
Test1 : 19 23 16 24 17 18 20 18 21 19 20
Test2 : 17 24 20 24 20 22 20 20 18 22 19
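Test1 (x): Total 215
Test2 (y): Total 226
d = x − y: 2 −1 −4 0 −3 −4 0 −2 3 −3 1; Σd = −11
Σd² = 69
$S = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = 2.296$
H0 : X1 = X2
H1 : X1 < X2
$|t| = \frac{|\bar{d}|}{SD/\sqrt{n-1}} = \frac{1}{2.296/\sqrt{11-1}} = 1.38$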
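The paired t computation for the coaching data can be reproduced with the following Python sketch. It uses the formulas of these notes (SD of the differences with divisor n, and √(n − 1) in the statistic); the variable names are illustrative.

```python
import math

test1 = [19, 23, 16, 24, 17, 18, 20, 18, 21, 19, 20]
test2 = [17, 24, 20, 24, 20, 22, 20, 20, 18, 22, 19]

d = [x - y for x, y in zip(test1, test2)]   # differences d = x - y
n = len(d)
d_bar = sum(d) / n                          # mean difference
s = math.sqrt(sum(v * v for v in d) / n - d_bar ** 2)
t = d_bar / (s / math.sqrt(n - 1))

print(sum(d), sum(v * v for v in d), round(s, 3), round(abs(t), 2))
```

The sign of t only reflects the order of subtraction; the magnitude is what is compared with the table value for n − 1 degrees of freedom.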
2.To verify whether a course in accounting improved performance, a similar test was given to 12
participant both before and after the course. The marks are
Before : 44 40 61 52 32 44 70 41 67 72 53 72
After : 53 38 69 57 46 39 73 48 73 74 60 78
Test1(x) 72 Total
Test2(y) 78
d=x-y -6
d2 36 578
$S = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = 4.81$, $t = \frac{\bar{d}}{SD/\sqrt{n-1}} = 3.44$ (Table value 1.8)
3.A company is testing two machines. A random sample of 8 employees is selected and each
employee uses each machine for one hour. The number of components produced is shown
in the following table.
-23
d2 211
$S = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = 4.255$
H0 : X1 = X2
H1 : X1 < X2
d(bar) = ∑d/n = −23/8 = −2.875
$|t| = \frac{|\bar{d}|}{SD/\sqrt{n-1}} = \frac{2.875}{4.255/\sqrt{8-1}} = 1.787$
Calculated (1.787) < Table value at 5% level (1.90)
H0 Accepted, Result : X1=X2 there is no evidence of difference b/w the machines in the mean
number of components produced.
4. A company arranged an intensive training course for its team of salesmen. A random sample
of 10 salesmen was selected and the value (in '000) of their sales made in the weeks
immediately before and after the course is shown in the following data.
Salesman 1 2 3 4 5 6 7 8 9 10 Total
Sales 12 23 5 18 10 21 19 15 8 14
before
Salesafter 18 22 15 21 13 22 17 19 12 16
-30
d2 196
d̄ = −3, s = 3.26, t = −2.764 (|t| = 2.764). H0 rejected: there is evidence of an increase in sales after training.
Or
to test whether two independent estimates of the population variance are homogeneous (equality of
variances) or not.
$s_1^2 = \frac{\sum (x - \bar{x})^2}{n_1}$, $s_2^2 = \frac{\sum (y - \bar{y})^2}{n_2}$
1. A sample of size 13 gave an estimated population variance of 3.0, while another sample of size 15 gave
an estimate of 2.5. Could both samples be from populations with the same variance?
H0 : The two samples have come from populations with the same variance
$F = S_1^2 / S_2^2 = 3.26/2.64 = 1.234$
Table value at df (n1 − 1, n2 − 1) = (12, 14) is 2.53 > calculated value 1.2
H0 Accepted.
Result : Both samples come from populations with the same variance
1 10 15 90
2 12 14 108
Test whether the samples come from the same normal population
H0 : The two samples have been drawn from the same normal population
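H0 : µ1 = µ2 and $\sigma_1^2 = \sigma_2^2$
n1 = 10, n2 = 12
$\sum (x - \bar{x})^2 = 90$, $\sum (y - \bar{y})^2 = 108$
$s_1^2 = \frac{\sum (x - \bar{x})^2}{n_1} = \frac{90}{10} = 9$
$s_2^2 = \frac{\sum (y - \bar{y})^2}{n_2} = \frac{108}{12} = 9$
$S_1^2 = \frac{n_1 s_1^2}{n_1 - 1} = \frac{90}{9} = 10$, $S_2^2 = \frac{n_2 s_2^2}{n_2 - 1} = \frac{12(9)}{11} = 9.818$
$F = S_1^2 / S_2^2 = 10/9.818 = 1.018$
Table value at df (n1 − 1, n2 − 1) = (9, 11) is 3.07 > calculated value 1.018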
H0 Accepted.
Result : Both samples come from the same populations
H0 Accepted
Result : the given samples drawn from the same normal population
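The variance-ratio (F) computation above can be checked with a short Python sketch. It assumes the usual convention of placing the larger variance estimate in the numerator; the function name is illustrative.

```python
def unbiased_variance(sum_sq_dev, n):
    """S^2 = sum of squared deviations / (n - 1)."""
    return sum_sq_dev / (n - 1)


S1_sq = unbiased_variance(90, 10)    # first sample:  90 / 9
S2_sq = unbiased_variance(108, 12)   # second sample: 108 / 11

# Larger estimate in the numerator, so that F >= 1
F = max(S1_sq, S2_sq) / min(S1_sq, S2_sq)
print(S1_sq, round(S2_sq, 3), round(F, 3))
```

The degrees of freedom for the table lookup follow the same order as the ratio: numerator sample first, denominator sample second.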
3.The nicotine contents in two random samples of tobacco are given below
Sample1 21 24 25 26 27
Sample2 22 27 28 30 31 36
Can you say that the two samples come from the same population? [ xbar = 24.6, y bar =29]
Total
Sample1 21 24 25 26 27 123
2
xx 21.2
Sample2 22 27 28 30 31 36 174
2
y y 108.96
t-test: $S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$, $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = -1.92$; |t| = 1.92 < 2.26 (table value)
H0 Accepted, Could have been drawn from the same normal population
4. Two independent samples of eight and seven items respectively had the following values of the
variable
Sample 1 9 11 13 11 15 9 12 14
Sample 2 10 12 10 14 9 8 10
H0 : $\sigma_1^2 = \sigma_2^2$
H1 : $\sigma_1^2 \ne \sigma_2^2$
n1 = 8, n2 = 7
= 1138
= 4.79
=3.39
$F = S_1^2 / S_2^2 = 4.79/3.96 = 1.21$
Table value at df (n1 − 1, n2 − 1) = (7, 6) is 4.21 > calculated value 1.21
H0 Accepted.
Result : $\sigma_1^2$ and $\sigma_2^2$ do not differ significantly
Applications
To test whether the population variance has the hypothetical value σ² (sigma square)
To test the homogeneity of independent estimates of the population correlation coefficient
i) Goodness of fit: $\chi^2 = \sum \frac{(O - E)^2}{E}$, df = n − 1
1. A company keeps records of accidents. During a recent safety review, a random
sample of 60 accidents was selected and classified by the day of the week on which they
occurred. Test whether the accidents are uniformly distributed over the week.
No.of accidents 8 12 9 14 17
O 8 12 9 14 17 Total
E 60
(O-E)2 54
$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{54}{12} = 4.5$
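The goodness-of-fit statistic can be computed with a few lines of Python. The sketch below checks the accident data (uniform expected frequencies) and, as a second case, the exam-results problem with the ratio 4:3:2:1; the function name is illustrative.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Accidents by day: uniform expectation of 60 / 5 = 12 per day
accidents = [8, 12, 9, 14, 17]
expected_accidents = [sum(accidents) / len(accidents)] * len(accidents)
chi2_accidents = chi_square(accidents, expected_accidents)

# Exam results of 500 students against the ratio 4:3:2:1
results = [200, 170, 90, 40]
ratio = [4, 3, 2, 1]
expected_results = [sum(results) * r / sum(ratio) for r in ratio]
chi2_results = chi_square(results, expected_results)

print(chi2_accidents, round(chi2_results, 2))
```

In both cases the degrees of freedom are the number of classes minus one.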
2.The following observations show a particular data in a telephone directory observed from the data
Number 0 1 2 3 4 5 6 7 8 9
Frequency 115 118 120 140 135 137 139 142 144 150
Avg =134
O 115 118 120 140 135 137 139 142 144 150 Total
(O-E)2
|O-E|2/E 9.73
H0 Accepted
3. A sample analysis of the exam results of 500 students was made. It was found that 200 students had
failed, 170 had secured a 3rd class, 90 had secured a 2nd class and the rest a first class. Do these
figures support the general belief that the above categories are in the ratio 4:3:2:1 respectively?
O Total
E 500
(O-E)2 600
|O-E|2/E 5.66
$\chi^2 = \sum \frac{(O - E)^2}{E} = 5.66$
H0 Accepted
4.The following table gives the number of aircraft accidents that occurred during the various days of the
week . Test whether the accidents are uniformly distributed over the week [chisquare = 2.143]tv=11.07
No.ofAccidents 14 18 12 11 15 14
Digits 0 1 2 3 4 5 6 7 8 9
Frequency 1026 1107 997 966 1075 933 1107 972 964 853
6.Fit a binomial distribution for the following data and also test the goodness of fit
X 0 1 2 3 4 5 6 Total
F 5 18 28 12 7 6 4 80
ANSWER
Fx 0 18 56 36 28 30 24 192
The expected frequencies are given by the terms of $80(0.6 + 0.4)^6$, with n = 6, p = 0.4, q = 0.6, np = 2.4
O 5 18 28 12 7 6 4
(O-E)2
|O-E|2/E 51.538
X 0 1 2 3 4 5 Total
F 142 156 69 27 5 1 400
ANSWER
X=0 =147.15
X=1 =147.15
X=2 =73.58
X=3 =24.53
X=4 =6.13
X=5 =1.23
E 142 147 74 25 6 1 Total
O 142 156 69 27 5 1
(O-E)2
|O-E|2/E 1.39
Goodness of fit: $\chi^2 = \sum \frac{(O - E)^2}{E} = 1.39$
H0
Result
To test the significance of discrepancy( differences) between experimental (practical )values and
theoretical values.
Degrees of freedom
GOODNESS OF FIT
Test of significance: $\chi^2 = \sum \frac{(O - E)^2}{E}$
1. The table given below shows the data obtained during an epidemic of cholera. Test the
effectiveness of inoculation in preventing the attack of cholera.
Inoculated 31 469
H0:
$\chi^2 = \sum \frac{(O - E)^2}{E} = 14.64$
H0 Rejected
$\chi^2 = \sum \frac{(O - E)^2}{E} = 15.237$
$\chi^2 = \sum \frac{(O - E)^2}{E} = 39.59$
4.Given the following contingency table for hair colour and eye colour. Find the value of chi square. Is
there good association between the two
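Eye colour (rows) by hair colour (three columns; column labels not recoverable)
Eye colour Hair colour 1 Hair colour 2 Hair colour 3 Total
BLUE 15 5 20 40
GREY 20 10 20 50
BROWN 25 15 20 60
Total 60 30 60 150
Expected frequencies (row total × column total / grand total):
16 8 16
20 10 20
24 12 24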
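The expected frequencies and the chi-square value for the hair colour and eye colour table can be obtained with this Python sketch. It uses the standard rule expected = row total × column total / grand total; the variable names are illustrative.

```python
observed = [
    [15, 5, 20],   # blue eyes
    [20, 10, 20],  # grey eyes
    [25, 15, 20],  # brown eyes
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency of each cell under independence
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(row_totals) - 1) * (len(col_totals) - 1)
print(expected, round(chi2, 3), dof)
```

The calculated chi-square is compared with the table value for (rows − 1)(columns − 1) degrees of freedom.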
2. In a sample of 1000 people in Maharashtra, 540 are rice eaters and the rest are wheat eaters. Can we
assume that both rice and wheat are equally popular in this state at the 1% level of significance?
3. In a city, a sample of 1000 people was taken, of whom 540 are vegetarians and the rest
non-vegetarians. Can we say that both eating habits (vegetarian and non-vegetarian) are equally popular in the
city at i) 1% LOS ii) 5% LOS?
4. Twenty people were attacked by a disease and only 18 survived. Will you reject the hypothesis that
the survival rate, if attacked by this disease, is 85% in favour of the hypothesis that it is more, at the 5% level?
5. A machine is producing bolts of which a certain fraction is defective. A random sample of 400 is
taken from a large batch and is found to contain 30 defective bolts. Does this indicate that the
proportion of defectives is larger than that claimed by the manufacturer at 5% LOS?
6. A machine puts out 16 imperfect articles in a sample of 500. After the machine is overhauled, it
puts out 3 imperfect articles in a batch of 100. Has the machine improved?
7. A cigarette manufacturing firm claims that its Brand A line of cigarettes outsells its Brand B by 8%.
It is found that 42 out of a sample of 200 smokers prefer Brand A and 18 out of another sample of
100 smokers prefer Brand B. Test whether the 8% difference is a valid claim.
8. In a random sample of 400 students of the University teaching departments, it was found that 300
students failed in the examination. In another random sample of 500 students of the affiliated
colleges, the number of failures in the same examination was found to be 300. Find out the
proportion of failures in the university teaching departments and affiliated colleges taken together.
9. A survey is proposed to be conducted to know the annual earnings of the old statistics graduates of
Delhi University. How large should the sample be in order to estimate the mean annual
earnings within ± Rs. 1000 at the 95% confidence level? The SD of the annual earnings of the
population is known to be Rs. 3000.
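These notes list the proportion problems without a worked solution, so the following Python sketch is only an illustration under stated assumptions: it uses the standard large-sample statistic z = (p − P)/√(PQ/n) for a single proportion and the sample-size rule n = (zσ/E)² for estimating a mean. The function names and the critical values quoted in the comments (1.96 and 2.58) are assumptions, not taken from the notes.

```python
import math


def z_single_proportion(successes, n, p0):
    """Large-sample z statistic for H0: population proportion = p0."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


def sample_size_for_mean(sigma, margin, z_value):
    """Smallest n with z_value * sigma / sqrt(n) <= margin."""
    return math.ceil((z_value * sigma / margin) ** 2)


# Problems 2 and 3: 540 out of 1000, H0: P = 0.5
z = z_single_proportion(540, 1000, 0.5)
# |z| is compared with 2.58 (1% level) or 1.96 (5% level), two-tailed

# Problem 9: SD = 3000, margin = 1000, 95% confidence (z = 1.96)
n_required = sample_size_for_mean(3000, 1000, 1.96)

print(round(z, 2), n_required)
```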
means of a specified classification differ significantly. The ANOVA is classified into two ways
One way Classification : In one way classification the data are classified
according to only one criteria. (based on only one factor)
CF = GT2 / N
Q = ΣΣ Xij2 – CF
Q1= Σ [CT2/ R] – CF
Q2=Q-Q1
(Rows) SSE
c-no.of columns
n-given no.of observations
MSC – Mean squares columns
MSE - Mean squares error
CF – correction factor
GT – Grand total
CT - column total
SSC-Sum of squares columns
SSE-Sum of squares Error
PLOT
Variety of 1 2 3 4 5 Toatl
Wheat
A1 6 8 5 12 9 40
A2 5 3 8 7 7 30
A3 10 7 11 10 12 50
6 36 5 25 10 100
8 64 3 9 7 49
5 25 8 64 11 121
12 144 7 49 10 100
9 81 7 49 12 144
Q2=Q-Q1 =100-40 = 60
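The one-way classification sums of squares for the wheat-variety data can be reproduced with this Python sketch. It follows the formulas above (CF, Q, Q1, Q2); the dictionary layout and the names `MS_between` and `F` are illustrative.

```python
groups = {
    "A1": [6, 8, 5, 12, 9],
    "A2": [5, 3, 8, 7, 7],
    "A3": [10, 7, 11, 10, 12],
}

N = sum(len(g) for g in groups.values())           # number of observations
GT = sum(sum(g) for g in groups.values())          # grand total
CF = GT ** 2 / N                                   # correction factor

Q = sum(x * x for g in groups.values() for x in g) - CF        # total SS
Q1 = sum(sum(g) ** 2 / len(g) for g in groups.values()) - CF   # between-group SS
Q2 = Q - Q1                                                    # error SS

k = len(groups)
MS_between = Q1 / (k - 1)
MSE = Q2 / (N - k)
F = MS_between / MSE
print(CF, Q, Q1, Q2, F)
```

The F ratio is compared with the table value for (k − 1, N − k) degrees of freedom.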
A 10 12 13 11 10 14 15 13
B 9 11 10 12 13
C 11 10 15 14 12 13
H0 : There is no significance difference between samples
H1:
CF = GT² / N = (228)² / 19 = 2736
Q = ΣΣ Xij² – CF = (1224 + 615 + 955) − 2736 = 58
Q1 = Σ [CT² / R] – CF = [98²/8 + 55²/5 + 75²/6] − 2736 = 7
Q2 = Q − Q1 = 51
H0 Accepted, RESULT
A 25 19 21 15
B 18 35 28 23
C 21 30 32 25
D 29 28 23 20
H0: There is no significance difference between 4 salesman in their performance of sales
H1:
X1 X12 X2 X22 X3 X32 X4 X42
25
19
21
15
80 1652 104 2862 108 2990 100 2554
CF = GT² / N = 392² / 16 = 9604
Q = ΣΣ Xij² – CF = (1652 + 2862 + 2990 + 2554) − 9604 = 454
Q1 = Σ [CT² / R] – CF = 116
Q2 = Q − Q1 = 338
MSC = 116/3 = 38.67
MSE = 338/12 = 28.17
RESULT
Perform an analysis of variance test homogeneity of the mean lives of four brands of lamps
X1 X2 X3 X4
1460
1550
1600
1620
1640
1660
1740
1820
11770 19817500 8310 1328100 13090 21503700 9410 14778700
CF = GT² / N = 69732938.26
Q = ΣΣ Xij2 – CF =
Q1= Σ [CT2/ R] – CF =
Q2=Q-Q1 =
(Rows) SSE
F =2.21
H0 Accepted
Result
Q = ΣΣ Xij2 – CF
Q1= Σ [CT2/ R] – CF
Q2= Σ [RT2/ C] – CF
Q3=Q-Q1-Q2
Seasons A B C D
Summer 38 40 41 39
Winter 45 42 49 36
Monsoon 40 38 42 42
H0 : There is no significance difference between 4 salesman
Result1 : All the sales performance is same ( No significant difference b/w sales)
Result2 : All the seasons are same in sales( No significant difference b/w seasons)
2) An experiment was designed to study the performance of 4 different detergents for cleaning fuel
injectors. The following cleanness readings were obtained with specially designed equipment for 12
tanks of gas distributed over 3 different models of engines.
DetergentA 45 43 51
DetergentB 47 46 52
DetergentC 48 50 55
DetergentD 42 37 49
Looking on the detergents as treatments and the engines as blocks, obtain the appropriate ANOVA table
and test at the 1% level of significance.
H0:
RESULT1
RESULT2
Land 1 36 36 21 35 128
Land2 28 29 31 32 120
Land3 26 28 29 29 112
90 93 81 96
H0 :
H1 :
Q = ΣΣ Xij2 – CF =210
36 128
28 120
26 112
SSE 22.67
RESULT1
RESULT2
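The two-way classification sums of squares for the land data can be checked with the following Python sketch. It applies the formulas Q, Q1 (columns), Q2 (rows) and Q3 (error) given above; the variable names are illustrative.

```python
rows = [
    [36, 36, 21, 35],  # Land 1
    [28, 29, 31, 32],  # Land 2
    [26, 28, 29, 29],  # Land 3
]
r, c = len(rows), len(rows[0])
N = r * c

GT = sum(sum(row) for row in rows)
CF = GT ** 2 / N

Q = sum(x * x for row in rows for x in row) - CF        # total SS
Q1 = sum(sum(col) ** 2 for col in zip(*rows)) / r - CF  # between columns
Q2 = sum(sum(row) ** 2 for row in rows) / c - CF        # between rows
Q3 = Q - Q1 - Q2                                        # error SS

MSC = Q1 / (c - 1)
MSR = Q2 / (r - 1)
MSE = Q3 / ((r - 1) * (c - 1))
print(Q, Q1, Q2, Q3, round(MSE, 2))
```

Each mean square is divided by (or divides) MSE, larger over smaller, to form the F ratios for columns and rows.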
Non-parametric tests do not require such assumptions. Hence, non-parametric tests are known as distribution-
free tests. Non-parametric test statistics utilise some simple aspects of sample data such as the signs of
measurements, order relationships or category frequencies. Therefore, stretching or compressing the
scale does not alter them.
The mean and variance of the sampling distribution of U are
Mean = $\frac{n_1 n_2}{2}$
Variance = $\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$
The standard normal variate of U is $z = \frac{U - E[U]}{\sqrt{V(U)}} \sim N(0,1)$
I. Note : 1.Combine all the given samples (from smallest to largest) and the assign ranks to all
these values.
II. Assign the average of the ranks, if the sample values are same
III. Find the sum of the ranks for each of the sample. Let us denote these sums by R1 and R2
1)The nicotine contents of 2 brands of cigarettes (in mg) was found to be as follows :
1 2 3 4 5 6 7 8 9 10.5
0.6 1.6 1.9 2.1 2.2 2.5 3.1 3.3 3.7 4
$U = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1$ (R1 = rank sum of sample 1)
U = 80 + (72/2) − 93 = 23
Variance = $\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$ = [80 (19)]/12 = 126.67
The standard normal variate of U is $|z| = \frac{|U - E[U]|}{\sqrt{V(U)}}$, where E[U] is the mean of U and V(U) is the variance of U
$|z| = \frac{|23 - 40|}{\sqrt{126.67}}$
|z| = 1.51
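The U statistic and its normal approximation can be checked with this short Python sketch. It takes the sample sizes and the rank sum R1 = 93 from the working above as given; the function name is illustrative.

```python
import math


def mann_whitney_z(n1, n2, r1):
    """Return (U, z) using U = n1*n2 + n1*(n1+1)/2 - R1 and the normal approximation."""
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 * (n1 + n2 + 1) / 12
    z = (u - mean_u) / math.sqrt(var_u)
    return u, z


u, z = mann_whitney_z(8, 10, 93)
print(u, round(abs(z), 2))
```

At the 5% level (two-tailed) the magnitude of z is compared with 1.96.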
2. From the following data, test the hypothesis of no difference between Mine I and Mine II
using the Mann-Whitney U test. Use α = 0.05.
Value of 31 25 38 33 42 40 44 26 43 35
mine1
Value of 44 30 34 47 35 32 35 47 48 34
Mine II 46
H0 = µ1 = µ2, (There is no significant difference b/w )
1 2 3 4 5 6 7.5 7.5 10 10
25 26 30 31 32 33 34 34 35 35
$U = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1$ (R1 = rank sum of sample 1)
Variance = $\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$ = [100 (21)]/12 = 175
The standard normal variate of U is $|z| = \frac{|U - E[U]|}{\sqrt{V(U)}}$, where E[U] is the mean of U and V(U) is the variance of U
$|z| = \frac{|61.5 - 50|}{\sqrt{175}}$
|z| = 0.86
Result :
Uses of SQC
1.Improvement in Quality
Control charts
2. To determine whether the goal set is being achieved by finding out whether the production is in control or not
3.CC is a device which helps in attainment of the specified goals by pointing out whether the variations at
Control limits
The central line represents the quality standard to be achieved and is plotted as a dark line.
The UCL (upper control limit) and LCL (lower control limit) are usually plotted as dotted lines.
Process Control
The main objective of any production process is to control and maintain the quality of the manufactured
product so that it conforms to specified quality standards.
Process control: controlling the quality of goods while they are in the process of production. To achieve process
control, repeated random samples are taken from the population of items.
Product control
By product control we mean controlling the quality of the product by critical examination at strategic
(important, danger) points; this is achieved through sampling inspection plans.
Tolerance limits of a quality characteristic are defined as those values between which nearly all the
manufactured items will lie.
In control charts 2 control limits have been set at a distance of 3𝜎 on either side of the mean.
If the measurable quality characteristic X is assumed to be normally distributed with mean µ and SD 𝜎,
the probability that a random observation lies within
µ±3𝜎 is 0.9973. It means that the probability of an observation falling outside these limits is
0.0027 (0.27%). These control limits are also known as tolerance limits.
2.Mean Chart and Range chart are in the category ( X and R chart)
3.Control charts for Attributes :Control charts for attributes the sampled units are divided into 2
categories
Defective and Non defective
(The c chart is used when the number of defects per unit is counted instead of classifying the item as defective or non-
defective, or when the denominator of the proportion is not given.)
1. Variable: continuous data. Things we can measure. Examples include length, weight, time, temperature, diameter, etc.
2. Attribute: discrete data. Things we count. Examples include the number or percentage of defective items in a lot, the number of defects
per item, etc.
3. In quality control, a variable is a characteristic that can be measured; an attribute is a
characteristic that can be counted.
4. All variable control charts must track only one quality characteristic of one product on the same
chart.
Attributes Chart
Operators have a high degree of control over assignable causes.
Assembly operations are complex.
Quality can only be measured in terms of good or bad.
Historical information is needed for management review.
Many characteristics must be measured at one time.
Cost of measurement is high.
Production runs are large.
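np chart (number of defectives)
$n\bar{p} = \frac{\sum np}{N}$, where N = number of samples and n = sample size in each batch; $\bar{p} = \frac{1}{n}\, n\bar{p}$
CL = $n\bar{p}$
UCL = $n\bar{p} + 3\sqrt{n\bar{p}(1 - \bar{p})}$
LCL = $\max\{0,\ n\bar{p} - 3\sqrt{n\bar{p}(1 - \bar{p})}\}$
p chart (fraction defective)
$p = \frac{\text{No. of defectives}}{n}$
CL = $\bar{p}$
UCL = $\bar{p} + 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$
LCL = $\max\{0,\ \bar{p} - 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}\}$
C chart
CL = $\bar{c} = \frac{\sum C_i}{N}$, where N = number of units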
1. You are given the value of sample means mean and range for 10 samples of size 5 each. Draw X bar
and R chart and comment on the state of control of the process
Sample 1 2 3 4 5 6 7 8 9 10
Mean 43 49 37 44 45 37 51 46 43 47
Range 5 6 5 7 7 4 8 6 4 6
2.The following data gives the data of an automobile path 5 samples of 4 items were taken on a random
samples basis. Draw the mean chart and R chart and whether the production process is in control
Sample 1 2 3 4 5
10 10 10 11 12
12 12 10 10 12
Production 10 13 9 9 12
12 13 11 14 12
Example 3 Given below are the values of sample mean X and sample range R for 10 samples, each of size 5.
Draw the appropriate mean and range charts and comment on the state of control of the process.
Sample 1 2 3 4 5 6 7 8 9 10
Mean 43 49 37 44 45 37 51 46 43 47
Range 5 6 5 7 7 4 8 6 4 6
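The control limits for the mean and range charts of this example can be computed with the Python sketch below. The chart constants A2 = 0.577, D3 = 0 and D4 = 2.114 for samples of size 5 are taken from standard control-chart tables and are an assumption here, since the notes do not list them; the variable names are illustrative.

```python
means = [43, 49, 37, 44, 45, 37, 51, 46, 43, 47]
ranges = [5, 6, 5, 7, 7, 4, 8, 6, 4, 6]

# Table constants for sample size n = 5 (assumed from standard tables)
A2, D3, D4 = 0.577, 0.0, 2.114

x_bar_bar = sum(means) / len(means)   # centre line of the mean chart
r_bar = sum(ranges) / len(ranges)     # centre line of the range chart

ucl_x = x_bar_bar + A2 * r_bar
lcl_x = x_bar_bar - A2 * r_bar
ucl_r = D4 * r_bar
lcl_r = D3 * r_bar

# Sample numbers (1-based) falling outside the limits
out_x = [i + 1 for i, m in enumerate(means) if not lcl_x <= m <= ucl_x]
out_r = [i + 1 for i, r in enumerate(ranges) if not lcl_r <= r <= ucl_r]

print(x_bar_bar, r_bar, round(lcl_x, 3), round(ucl_x, 3), round(ucl_r, 3))
print(out_x, out_r)
```

With these constants the range chart shows no points outside its limits, while several sample means fall outside the mean-chart limits, so the process average would not be judged to be in control.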
Example 4 A machine fills boxes with dry cereal. 15 samples of 4 boxes are drawn randomly. The weights of
the sampled boxes are shown as follows. Draw the control charts for the sample mean and sample range
and determine whether the process is in a state of control.
Sample Number 1 2 3 4 5 6 7 8
10.0 10.3 11.5 11.0 11.3 10.7 11.3 12.3
Weights of boxes (X) 10.2 10.9 10.7 11.1 11.6 11.4 11.4 12.1
11.3 10.7 11.4 10.7 11.9 10.7 11.1 12.7
12.4 11.7 12.4 11.4 12.1 11.0 10.3 10.7
9 10 11 12 13 14 15
11.0 11.3 12.5 11.9 12.1 11.9 10.6
13.1 12.1 11.9 12.1 11.1 12.1 11.9
13.1 10.7 11.8 11.6 12.1 13.1 11.7
12.4 11.5 11.3 11.4 11.7 12.0 12.1
1 2 15
X
bar
R
Example5(HW)The following data give the average life in hours and range in hours of 12 samples each of 5
lamps. Construct the control charts for X and R and comment on the state of control.
X : 120 127 152 157 160 134 137 123 140 144 120 127
R: 30 44 60 34 38 35 45 62 39 50 35 41
11 12 13 14 15
14.5 9.5 12.0 10.5 11.5
3.9 5.1 4.7 3.3 3.3
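Formulas: X bar and S chart
x̄ chart control limits:
CL = $\bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \bar{x}_3 + \dots + \bar{x}_N}{N} = 12.36$
LCL = $\bar{\bar{x}} - A_1 \bar{s}\sqrt{\frac{n}{n-1}} = 12.36 - (1.88)(4.02)\sqrt{4/3} = 3.63$
UCL = $\bar{\bar{x}} + A_1 \bar{s}\sqrt{\frac{n}{n-1}} = 21.08$
S chart control limits:
CL = $\bar{S} = \frac{s_1 + s_2 + s_3 + \dots + s_N}{N} = 4.02$
LCL = $B_3 \bar{S} = 0$
UCL = $B_4 \bar{S} = (2.266)(4.02) = 9.109$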
Example8 The following data given the coded measurements of 10 samples each of size 5, drawn from a
production process at intervals of 1 hour. Calculate the sample means and S.D.’s and draw the control charts
for X and s.
Sample 1 2 3 4 5 6 7 8 9 10
Number
Coded meas- 9 10 10 8 7 12 9 15 10 16
urements (X) 15 11 13 13 9 15 9 15 13 14
14 13 8 11 10 7 9 10 14 12
9 6 12 10 4 16 13 13 7 14
13 10 7 13 5 10 5 17 11 14
Avg. 12 10 10 11 7 12 9 14 11 14
x − x̄
∑(x − x̄)²
∑(x − x̄)²/n
S = sqrt[∑(x − x̄)²/n] 2.5 2.3 2.3 1.9 2.3 3.3 2.5 2.4 2.4 1.3
$UCL_p = \bar{p} + 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$
$LCL_p = \bar{p} - 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$
np-chart
Attribute control charts are used when items are compared with some standard and then
classified according to whether they meet that standard or not. The np control chart is used to
determine whether the rate of nonconforming product is stable, and it detects when a deviation from
stability has occurred. Some argue that there should be only an Upper Control
Limit (UCL), and not a Lower Control Limit (LCL), since a rate of nonconforming product
below the LCL is actually a good thing. However, if we treat LCL violations as another
occasion to search for an assignable cause, we may learn what produces the lower nonconformity
rates and perhaps reduce nonconformities further.
Collect the data, recording the number inspected (n) and the number of defective products
(np). Divide the data into subgroups. Usually, the data are grouped by date or by lot
number. The subgroup size (n) should be over 50, and it is strongly recommended that you
keep a constant sample size of 100 for subgroups.
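np chart (n = sample size, N = number of samples):
$\bar{p} = \dfrac{1}{n}\, n\bar{p}$
$CL = n\bar{p} = \dfrac{\sum np}{N}$
$LCL = \max\left(0,\; n\bar{p} - 3\sqrt{n\bar{p}(1-\bar{p})}\right)$
$UCL = n\bar{p} + 3\sqrt{n\bar{p}(1-\bar{p})}$
p chart:
$UCL_p = \bar{p} + 3\sqrt{\dfrac{\bar{p}(1-\bar{p})}{n}}$
$LCL_p = \bar{p} - 3\sqrt{\dfrac{\bar{p}(1-\bar{p})}{n}}$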
Example 10: 15 samples of 200 items each were drawn from the output of a process. The numbers of defective
items in the samples are given below. Prepare a control chart for the fraction defective and comment on the
state of control.
Sample No. (i)         : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
No. of defectives (np) : 12 15 10 8 19 15 17 11 13 20 10 8 9 5 8
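The p-chart limits for this example can be worked out as in the following Python sketch. It assumes the constant sample size n = 200 stated in the problem; the variable names are illustrative, and the lower limit is floored at 0 as is done for the np chart.

```python
import math

# Number of defectives in each of the 15 samples (n = 200 items per sample).
defectives = [12, 15, 10, 8, 19, 15, 17, 11, 13, 20, 10, 8, 9, 5, 8]
n = 200

# p bar = total defectives / total items inspected.
p_bar = sum(defectives) / (n * len(defectives))

# Three-sigma limits for the fraction defective.
sigma_p = math.sqrt(p_bar * (1 - p_bar) / n)
ucl_p = p_bar + 3 * sigma_p
lcl_p = max(0.0, p_bar - 3 * sigma_p)

# Sample fractions and the samples (1-based) falling outside the limits.
fractions = [d / n for d in defectives]
out_of_control = [i + 1 for i, p in enumerate(fractions) if p > ucl_p or p < lcl_p]

print(f"CL = {p_bar:.4f}, LCL = {lcl_p:.4f}, UCL = {ucl_p:.4f}")
print("samples outside the limits:", out_of_control)
```

For these data the centre line is 0.06 and the limits are roughly 0.0096 and 0.1104; every sample fraction lies between them, which points to a process in statistical control.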
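np chart (n = sample size, N = number of samples):
$\bar{p} = \dfrac{1}{n}\, n\bar{p}$
$CL = n\bar{p} = \dfrac{\sum np}{N}$
$LCL = \max\left(0,\; n\bar{p} - 3\sqrt{n\bar{p}(1-\bar{p})}\right)$
$UCL = n\bar{p} + 3\sqrt{n\bar{p}(1-\bar{p})}$
p chart:
$p = \dfrac{\text{No. of defectives in a sample}}{\text{No. of items inspected in the sample}} = \dfrac{np}{n}$
$CL = \bar{p} = \dfrac{\text{Total no. of defectives in all samples}}{\text{Total number of items inspected in all samples}}$
$UCL_p = \bar{p} + 3\sqrt{\dfrac{\bar{p}(1-\bar{p})}{n}}$
$LCL_p = \bar{p} - 3\sqrt{\dfrac{\bar{p}(1-\bar{p})}{n}}$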
Example 11: 10 samples, each of size 50, were inspected and the numbers of defectives found were: 2,
1, 1, 2, 3, 5, 5, 1, 2, 3. Draw the appropriate control chart for defectives.
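Since the sample size is constant here, an np chart is a natural choice. A minimal Python sketch of the np-chart limits for the data of Example 11 follows; the function name `np_chart_limits` is hypothetical and introduced only for illustration.

```python
import math

def np_chart_limits(defectives, n):
    """Return (CL, LCL, UCL) of an np chart for a constant sample size n."""
    np_bar = sum(defectives) / len(defectives)  # CL = sum(np) / N
    p_bar = np_bar / n                          # average fraction defective
    spread = 3 * math.sqrt(np_bar * (1 - p_bar))
    return np_bar, max(0.0, np_bar - spread), np_bar + spread

# Number of defectives in each of the 10 samples of size 50.
defectives = [2, 1, 1, 2, 3, 5, 5, 1, 2, 3]
cl, lcl, ucl = np_chart_limits(defectives, n=50)

# Samples (1-based) whose defective count falls outside the limits.
outside = [i + 1 for i, d in enumerate(defectives) if d < lcl or d > ucl]

print(f"CL = {cl:.2f}, LCL = {lcl:.2f}, UCL = {ucl:.2f}")
print("samples outside the limits:", outside)
```

Here the centre line is 2.5, the lower limit is floored at 0, and the upper limit is about 7.12, so none of the ten counts falls outside the limits.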
Example 12: Construct a control chart for defectives for the following data:
Sample No. : 1 2 3 4 5 6 7 8 9 10
No. inspected : 90 65 85 70 80 80 70 95 90 75
No. of defectives : 9 7 3 2 9 5 3 9 6 7
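In this example the number inspected changes from sample to sample, so the fixed-n formulas cannot be applied directly. One common convention, assumed here and not necessarily the one intended by the course, is a p chart whose limits are recomputed for each sample from its own size n_i. A Python sketch:

```python
import math

# Data of Example 12: items inspected and defectives found in each sample.
inspected = [90, 65, 85, 70, 80, 80, 70, 95, 90, 75]
defectives = [9, 7, 3, 2, 9, 5, 3, 9, 6, 7]

# Overall fraction defective: total defectives / total inspected.
p_bar = sum(defectives) / sum(inspected)

rows = []
for sample_no, (n_i, d_i) in enumerate(zip(inspected, defectives), start=1):
    # Limits depend on the individual sample size n_i (assumed convention).
    spread = 3 * math.sqrt(p_bar * (1 - p_bar) / n_i)
    lcl_i = max(0.0, p_bar - spread)
    ucl_i = p_bar + spread
    p_i = d_i / n_i
    rows.append((sample_no, p_i, lcl_i, ucl_i, lcl_i <= p_i <= ucl_i))

print(f"CL = {p_bar:.4f}")
for sample_no, p_i, lcl_i, ucl_i, inside in rows:
    print(f"{sample_no:2d}  p = {p_i:.4f}  LCL = {lcl_i:.4f}  UCL = {ucl_i:.4f}  in control: {inside}")
```

An alternative is to use the average sample size in the fixed-n formulas; with these data both approaches leave every sample inside its limits.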
Example 14 (HW): On inspection of 10 samples, each of size 400, the numbers of defective articles were:
19, 4, 9, 12, 9, 15, 26, 14, 15, 17.
Draw the np-chart and p-chart and comment on the state of control.
Example 15 (HW): Draw the appropriate control chart for the following data and comment
on the state of control:
Day: 1 2 3 4 5 6 7 8 9 10
No. inspected: 150 184 181 196 180 174 210 210 195 210
No. of defectives: 25 10 3 14 6 15 43 28 39 25
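c chart (N = number of units inspected):
$CL = \bar{c} = \dfrac{\sum C_i}{N}$
$UCL = \bar{c} + 3\sqrt{\bar{c}}$
$LCL = \max\left(0,\; \bar{c} - 3\sqrt{\bar{c}}\right)$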
Example 16: 15 tape-recorders were examined for quality control test. The number
of defects in each tape-recorder is recorded below. Draw the appropriate control
chart and comment on the state of control.
Unit no. (i)       : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
No. of defects (c) : 2 4 3 1 1 2 5 3 6 7 3 1 4 2 1
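The data are counts of defects per unit, so a c chart applies. The following Python sketch computes its limits for the 15 tape-recorders; the variable names are illustrative only.

```python
import math

# Number of defects found in each of the 15 tape-recorders.
defects = [2, 4, 3, 1, 1, 2, 5, 3, 6, 7, 3, 1, 4, 2, 1]

# Centre line: average number of defects per unit.
c_bar = sum(defects) / len(defects)

# Three-sigma limits; the lower limit is floored at zero.
ucl_c = c_bar + 3 * math.sqrt(c_bar)
lcl_c = max(0.0, c_bar - 3 * math.sqrt(c_bar))

# Units (1-based) whose defect count falls outside the limits.
outside = [i + 1 for i, c in enumerate(defects) if c < lcl_c or c > ucl_c]

print(f"CL = {c_bar:.2f}, LCL = {lcl_c:.2f}, UCL = {ucl_c:.2f}")
print("units outside the limits:", outside)
```

With a centre line of 3 and an upper limit of about 8.20, the largest observed count (7) is still inside the limits, so the process appears to be in control.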
Example 17: A plant produces paper for newsprint, and rolls of paper are inspected for defects.
The results of inspection of 20 rolls of paper are given below. Draw the c-chart and comment
on the state of control.
Roll no.       : 11 12 13 14 15 16 17 18 19 20
No. of defects : 16 14 8 7 6 4 5 6 8 9
$CL = \bar{c} = \dfrac{\sum C_i}{N} = \dfrac{220}{20} = 11$
$LCL = \bar{c} - 3\sqrt{\bar{c}} = 1.05$
$UCL = \bar{c} + 3\sqrt{\bar{c}} = 20.95$
2. The following are scores made on a math test 80, 90, 90, 85, 60, 70, 75, 85,
A.14 22 15 15 9 B.14 22 14 15 4
C.3 14 19 25 14 D.25 15 14 3 7
4. The harmonic mean, arithmetic mean and geometric mean are all
considered as
6. In which of the following ways are the geometric mean, harmonic mean and
arithmetic mean related?
A. AM < GM < HM B. AM > GM < HM
C. AM > GM > HM D. AM < GM > HM
10. If the variance of a set of observations is 100, then the SD of the set is
A. (L-S)/(L+S) B. (L+S)/(L-S)
C. (LS)/(L+S) D. (LS)/(L-S)
A. Median=Mode B. Mean=Mode.
C. Mean = Median = Mode D. Mean=Median +Mode
13. If the mode is not well defined then Pearson’s coefficient of skewness is
given by
A. 2(median-mean)/standard deviation
B. 3(median-mean)/standard deviation
C. 2(mean-median)/standard deviation
D. 3(mean-median)/standard deviation
f x x f x x
A. B.
N 2N
2 2
f x x
2
f x 2 x
C. D.
2N N
17. In kurtosis, if β2 is greater than three, the frequency distribution is
referred to as
18. Three times the difference between the mean and the median, divided by the
standard deviation, gives the coefficient of skewness by the method of
19. The variability which is defined as the difference between third and first
quartile is considered as
20. If the distribution is moderately asymmetrical, the mean, median and mode
obey the empirical relationship by Karl Pearson as
21. In a frequency curve of scores, the mode is found to be higher than the
mean, this shows that the distribution is
A. 40.33 B. 30.33
C. 33.33 D. 13.33
A. 15 B. 10
C. 8.5 D. 7.5
25. The measure of central tendency which does not give more weightage to
smaller values is
2. In regression, the equation that describes how the response variable (y) is
4. The two lines of regression are given as x+2y-5 = 0 , 2x+3y = 8. Then the mean values
of x and y are respectively given by
5. The tangent of the angle between two regression lines is given as 0.6 and SD of y
is known to be twice that of x .Then r(x,y) is
8. For calculating rank correlation, the correction factor for repeated rank is
m(m 2 1) m 2 (m 1) m(m 2) m
(i) , (ii) , (iii) , (iv)
12 12 6 2
(i) X and Y are independent variables (ii) X and Y are dependent variables
(iii) X and Y are negatively correlated (iv) X and Y are positively correlated
10.The correlation coefficient (X-independent, Y-dependent) will have the sign when
11.Correlation coefficient
(i) can take any value in between -1 and 1 (ii) is always less than 1
(i) (-1/3 , -3/4) (ii) (-3/4 , -1/3) (iii) (1, 1/3) (iv) (-3/4,1)
(iv) $\tan\theta = \left(\dfrac{1-r^2}{r}\right)\left(\dfrac{\sigma_x + \sigma_y}{\sigma_x^2 + \sigma_y^2}\right)$
17. If the lines of regressions are y = x/4 and x = ( y/9) +1 then r(x,y) is
18. When the relationship among more than two variables is studied, the correlation is
known as
19. The study of correlation between two variables excluding some other variables is
called
(i) 0 and 1 (ii) -1 and 0 (iii) -1 and 1 (iv) –0.5 and 0.5
21. The value of r² for a particular situation is 0.81. What is the coefficient of
correlation?
22. When the number of items is greater than 30 and the ranks are given, the following
method of computing the coefficient of correlation is used
1. If a researcher takes a large enough sample, he/she will almost always obtain:
a. virtually significant results
b. practically significant results
c. consequentially significant results
d. statistically significant results
ANSWER: d
11. The chi-square test is not very effective if the sample is:
a. small
b. large
c. irregular
d. heterogeneous
ANSWER: a
16. _________ are the values that mark the boundaries of the confidence interval.
a. Confidence intervals
b. Confidence limits
c. Levels of confidence
d. Margin of error
ANSWER: b
17. _____ results if you fail to reject the null hypothesis when the null hypothesis is
actually false.
a. Type I error
b. Type II error
c. Type III error
d. Type IV error
ANSWER: b
18. When the researcher rejects a true null hypothesis, a ____ error occurs.
a. Type I
b. Type A
c. Type II
d. Type B
ANSWER: a
a. Type I error
b. Type II error
c. Type A error
d. Type B error
ANSWER: a
20. Which of the following statements is/are true according to the logic of hypothesis
testing?
a. When the null hypothesis is true, it should be rejected
b. When the null hypothesis is true, it should not be rejected
c. When the null hypothesis is false, it should be rejected
d. When the null hypothesis is false, it should not be rejected
e. Both b and c are true
ANSWER: e
26. Student’s t-distribution has (n-1) d.f. when all the n observations in the sample are
(a) Dependent (b) Independent (c) Maximum (d) Minimum
ANSWER: b
30. t- test and F- test are used only for --- samples.
a) Large b) 90 c) small d) 80
6. If b yx 1 , then b xy is
(a) Less than 1 (b) Greater than 1 (c) Equal to 1 (d) Equal to 0
x 1 3 4 5 7 8 10
y 2 6 8 10 14 16 20
(a) Perfect correlation (b) Perfect positive correlation
(c) Perfect negative correlation (d) cannot be determined
11. When the correlation co-efficient r = ±1, then the two regression lines
(a) are perpendicular to each other (b) coincide
(c) are parallel to each other (d) do not exist
12. If the regression co-efficients are b1 and b2, then the correlation co-efficient r is
(a) b1/b2 (b) b2/b1 (c) b1·b2 (d) √(b1·b2)
13. In one-way classification the data are classified according to only --- criterion.
a) two b) one c) five d)six
14. In two-way classification the data are classified according to --- different factors.
a) two b) one c) five d) six
Answers:
1 d
2 b
3 b
4 c
5 a
6 b
7 c
8 a
9 c
10 c
11 c
12 d
13 b
14 a
15
1. In statistical quality control, by quality we mean an attribute of the product that determines
its --- for use.
a) Cost b) price c) manpower d) Fitness
2. Quality control is a powerful --- technique for effective diagnosis of lack of quality in any
of the materials
a) productivity b) quantitative c) non-productivity d) cost
3. By quality of materials, we mean that good quality will result in smooth processing, thereby
reducing the waste and increasing the ---
a) input b) output c) cost d) production cost
4. By quality of manpower, we mean that trained and qualified personnel will give increased
efficiency due to better quality production through the application of skill and reduce the
--- and waste.
a) Production cost b) quantity c) material d) business
5. By quality of machines , we mean a better quality --- which will result in efficient work.
a) Cost b) equipment c) manpower d) production cost
6. Quality control based on process production are classified into --- factors.
a) one b) two c) three d) four
7. SQC is a productivity enhancing and regulatory technique with three factors - management,
methods and ---
a) mathematics b) chemistry c) physics d) biology
10. The main objective in production process is to --- and maintain the quality of the
manufactured products.
a) control b) uncontrol c) assign d) produce
11. Control charts provide criteria for detecting lack of ----- control
a) physical b) statistical c) chemical d) biological
12. 𝑋̅ and R charts are employed to control the mean and --- respectively of the characteristic.
a) median b) mode c) S.D d) skewness
13. Shewhart's control chart for the number of defects per unit is used when the characteristic
representing the quality of a product is --- variable.
a) continuous b) uniform c) discrete d) exponential
15. If 'd' is the number of defectives in a sample of size n then the sample proportion defective
is ---
a) p d b) p d c) p d d) p d
n s n
20. In the control chart, the central line CL is plotted as a ------ line
a) dotted b) scattered c) empty d) bold
21. np-chart and p-chart are used when p ≥ 0.05 and np ≥ ----
a) 1 b) 2 c) 3 d) 4
22. c-chart is used when c ≥ ----
a) 1 b) 2 c) 3 d) 4
UNIT-V ANSWERS
1. d     12. c
2. a     13. c
3. b     14. d
4. a     15. a
5. b     16. c
6. d     17. b
7. a     18. b
8. b     19. c
9. c     20. d
10. a    21. d
11. b    22. d
(OR)
b. A teacher wishes to test three different teaching methods I, II and III. To do this, the teacher
chooses at random three groups of five students each and teaches each group by a different
method. The same examination is then given to all the students and the marks obtained are given
below. Determine whether there is a significant difference between the teaching methods at
α = 0.05.
Method I    78 62 71 58 73
Method II   76 85 77 90 87
Method III  74 79 60 75 80
32. a. Given below are the values of sample mean X bar and sample range R for 10 samples, each of size 5.
Draw the appropriate mean and range charts and comment on the state of control of the process.
Sample No.  1  2  3  4  5  6  7  8  9  10
Mean       43 49 37 44 45 37 51 46 43 47
Range       5  6  5  7  7  4  8  6  4  6
(OR)
b. The specifications for a certain quality characteristic are (60 ± 24) in coded values. The table
given below gives the measurements obtained in 10 samples. Find the tolerance limits for the
process and test if the process meets the specifications.
Sample No.  1 2 3 4 5 6 7 8 9 10
2. For the individual observations, the reciprocal of arithmetic mean is called
(A) Geometric mean (B) Harmonic mean
(C) Deviation square mean (D) Paired mean
3. In a symmetric distribution
(A) Median = Mode (B) Mean = Mode
(C) Mean = Median = Mode (D) Mean = Median + Mode
4. The variability which is defined as the difference between third and first quartile is considered as
(A) Quartile range (B) Deciles range
(C) Percentile range (D) Interquartile range
5. If b1 and b2 are regression coefficients, then the correlation coefficient is
(A) b1/b2 (B) b1 + b2
(C) b1 b2 (D) √(b1 b2)
6. When the correlation coefficient r = ±1, then regression lines
(A) Are perpendicular to each other (B) Coincide
(C) Are parallel to each other (D) Do not exist
7. Correlation coefficient
(A) Can take any value between -1 and +1 (B) Is always less than 1.
13. The mean of t-distribution is
(A) 0 (B) 1
(C) -1 (D) 2
14. The assumptions in analysis of variance are the same as
(A) Chi-square test (B) t-test
(C) F-test (D) Median
15. In RBD, the degrees of freedom for residual (error) is
(A) c-1 (B) r-1
(C) (c-1)(r-1) (D) c-2
16. The range of F-distribution is
(A) 0 to ∞ (B) -∞ to ∞
(C) -1 to ∞ (D) 1 to ∞
17. Quality control based on process production are classified into --- factors.
(A) One (B) Two
(C) Three (D) Four
25. Calculate the trend values by the method of moving averages assuming a four yearly cycle from
the following data:
Year                            1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992
Sugar production (lakh tonnes)  37.4 31.1 38.7 39.5 47.9 42.6 48.4 64.6 58.4 38.6 51.4 84.4
26. Ten cartons are taken at random from an automatic filling machine. The mean net weight of
cartons is 11.8 kg and the standard deviation is 0.15 kg. Does the sample mean differ significantly
from the intended weight of 12 kg? (Given ν = 9, t0.05 = 2.26)
27. 15 tape recorders were examined for quality control test. The number of defects in each tape
recorder is given below. Draw the appropriate chart and comment on the state of control.
Unit no.        1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
No. of defects  2 4 3 1 1 2 5 3 6 7 3 1 4 2 1
PART - C (5 x 12 = 60 Marks)
Answer ALL Questions
22. The weekly salaries of a group of employees are given in the following table. Find the
standard deviation of the salaries.
(OR)
b. Obtain the lines of regression for the following: