


For the students of

M. Com. (Applied Economics) Sem. IV

Paper: Research Methodology (Unit IV)

Note: Study material may be useful for the courses wherever Research Methodology paper is
being taught.

Prepared by:
Dr. Anoop Kumar Singh
Dept. of Applied Economics,
University of Lucknow

Topic: F-TEST and Analysis of Variance (ANOVA)

Introduction

Analysis of variance (ANOVA) is a statistical technique used for analyzing the differences between
the means of more than two samples. It is a parametric test of hypothesis. It is a stepwise
estimation procedure that partitions the observed variation into components (such as the variation
between and within groups) in order to test the equality of two or more population means.

ANOVA was developed by the statistician and eugenicist Ronald Fisher. Many statisticians,
including Fisher, worked on the development of the ANOVA model, but it became widely known
after being included in Fisher's 1925 book “Statistical Methods for Research Workers”. The
ANOVA is based on the law of total variance, where the observed variance in a particular
variable is partitioned into components attributable to different sources of variation. ANOVA
provides an analytical study for testing the differences among group means and thus generalizes
the t-test beyond two means. ANOVA uses F-tests to statistically test the equality of means.

Concept of Variance

Variance is an important tool in the sciences, including statistical science. In probability
theory and statistics, variance is the expectation of the squared deviation of a random
variable from its mean. In practice, it measures the degree to which the data in a series
are scattered around their average value. Variance is widely used in statistics; its applications
range from descriptive statistics to statistical inference and the testing of hypotheses.
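As a quick numerical sketch (Python with NumPy; an illustration added here, not part of the original text), the sample variance is the sum of squared deviations from the mean divided by n − 1:

```python
import numpy as np

data = np.array([2.0, 4.0, 6.0, 8.0])
mean = data.mean()  # 5.0

# Sample variance: sum of squared deviations from the mean, divided by n - 1
var_manual = ((data - mean) ** 2).sum() / (len(data) - 1)

# NumPy equivalent: ddof=1 selects the n - 1 divisor
var_numpy = data.var(ddof=1)

print(var_manual, var_numpy)  # both 6.666...
```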

Relationship Among Variables


Under this analysis, we examine the differences in the mean values of the
dependent variable associated with the effect of the controlled independent variables,
after taking into account the influence of the uncontrolled independent variables.

We take the null hypothesis that there is no significant difference between the means of
different populations. In its simplest form, analysis of variance must have a dependent
variable that is metric (measured using an interval or ratio scale). There must also be
one or more independent variables. The independent variables must be all categorical
(non-metric). Categorical independent variables are also called factors. A particular
combination of factor levels, or categories, is called a treatment.

The type of analysis used for examining the variations depends upon the
number of independent variables taken into account for the study. One-way
analysis of variance involves only one categorical variable, or a single factor. If two or
more factors are involved, the analysis is termed n-way (e.g., two-way, three-way)
analysis of variance.

F Tests

F-tests are named after Sir Ronald Fisher. The F-statistic is simply a ratio of two
variances. Variance is the square of the standard deviation. Standard deviations are
easier to interpret than variances because they are in the same units as the data
rather than squared units. F-statistics are based on the ratio of mean squares. The term “mean
squares” may sound confusing, but it is simply an estimate of population variance that accounts
for the degrees of freedom (df) used to calculate that estimate.

For carrying out the test of significance, we calculate the ratio F, which is defined as:

F = S1² / S2², where

S1² = Σ(X1 − X̄1)² / (n1 − 1) and S2² = Σ(X2 − X̄2)² / (n2 − 1)

It should be noted that S1² is always the larger estimate of variance, i.e., S1² > S2²:

F = Larger estimate of variance / Smaller estimate of variance

ν1 = n1 − 1 and ν2 = n2 − 1, where

ν1 = degrees of freedom for the sample having the larger variance.

ν2 = degrees of freedom for the sample having the smaller variance.

The calculated value of F is compared with the table value for ν1 and ν2 at the 5% or 1% level of
significance. If the calculated value of F is greater than the table value, the F ratio is considered
significant and the null hypothesis is rejected. On the other hand, if the calculated value of F is
less than the table value, the null hypothesis is accepted and it is inferred that both samples
come from populations having the same variance.

Illustration 1: Two random samples were drawn from two normal populations and their values
are:

A 65 66 73 80 82 84 88 90 92
B 64 66 74 78 82 85 87 92 93 95 97

Test whether the two populations have the same variance at the 5% level of significance.

(Given: F = 3.36 at the 5% level for ν1 = 10 and ν2 = 8.)

Solution: Let us take the null hypothesis that the two populations have the same variance.

Applying the F-test:

F = Larger estimate of variance / Smaller estimate of variance

Sample A                           Sample B
 X1    x1 = X1 − X̄1    x1²         X2    x2 = X2 − X̄2    x2²
 65        −15         225         64        −19         361
 66        −14         196         66        −17         289
 73         −7          49         74         −9          81
 80          0           0         78         −5          25
 82          2           4         82         −1           1
 84          4          16         85          2           4
 88          8          64         87          4          16
 90         10         100         92          9          81
 92         12         144         93         10         100
                                   95         12         144
                                   97         14         196
ΣX1 = 720  Σx1 = 0  Σx1² = 798    ΣX2 = 913  Σx2 = 0  Σx2² = 1298

X̄1 = ΣX1/n1 = 720/9 = 80;  X̄2 = ΣX2/n2 = 913/11 = 83

S1² = Σx1²/(n1 − 1) = 798/8 = 99.75

S2² = Σx2²/(n2 − 1) = 1298/10 = 129.8

Since S2² is the larger estimate of variance:

F = S2²/S1² = 129.8/99.75 = 1.301

At the 5 percent level of significance, for ν1 = 10 and ν2 = 8, the table value of F(0.05) = 3.36.

The calculated value of F is less than the table value. The null hypothesis is accepted. Hence the
two populations have the same variance.
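The same computation can be sketched in Python (using NumPy and SciPy, assuming they are available; the critical value is looked up from the F distribution rather than a printed table):

```python
import numpy as np
from scipy import stats

A = np.array([65, 66, 73, 80, 82, 84, 88, 90, 92])
B = np.array([64, 66, 74, 78, 82, 85, 87, 92, 93, 95, 97])

# Sample variances with the n - 1 divisor
s2_a = A.var(ddof=1)  # 798 / 8  = 99.75
s2_b = B.var(ddof=1)  # 1298 / 10 = 129.8

# The larger estimate of variance goes in the numerator
F = max(s2_a, s2_b) / min(s2_a, s2_b)

# Critical value for v1 = 10, v2 = 8 at the 5% level (about 3.35)
F_crit = stats.f.ppf(0.95, dfn=10, dfd=8)

print(round(F, 3), F < F_crit)  # 1.301 True
```

Since F falls below the critical value, the hypothesis of equal population variances is retained at the 5% level.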

TESTING EQUALITY OF POPULATION (TREATMENT) MEANS:

ONE-WAY CLASSIFICATION

In one-way classification, the following steps are carried out for computing the F-ratio through
the most popular method, i.e., the short-cut method:

1. First, obtain the squared value of every observation in each sample (column).

2. Obtain the sum of the sample observations in each column as ΣX1, ΣX2, …, ΣXk.

3. Obtain the sum of the squared values in each column as ΣX1², ΣX2², …, ΣXk².

4. Find the value of T by adding up all the column sums, i.e., T = ΣX1 + ΣX2 + … + ΣXk.

5. Compute the correction factor by the formula:

CF = T²/N

6. Find the total sum of squares (SST) from the squared values and CF:

SST = ΣX1² + ΣX2² + … + ΣXk² − CF

7. Find the sum of squares between the samples (SSC) by the following formula:

SSC = (ΣX1)²/n1 + (ΣX2)²/n2 + … + (ΣXk)²/nk − CF

8. Finally, find the sum of squares within the samples, i.e., SSE:

SSE = SST − SSC
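The steps above can be collected into a small Python function (an illustrative sketch, not part of the original text; it is exercised here on the price data of Illustration 2 below):

```python
def one_way_sums_of_squares(samples):
    """Short-cut method: return (SST, SSC, SSE) for a list of samples."""
    N = sum(len(s) for s in samples)          # total number of observations
    T = sum(sum(s) for s in samples)          # grand total (step 4)
    CF = T ** 2 / N                           # correction factor (step 5)

    # Total sum of squares: all squared observations minus CF (step 6)
    SST = sum(x ** 2 for s in samples for x in s) - CF

    # Sum of squares between samples (step 7)
    SSC = sum(sum(s) ** 2 / len(s) for s in samples) - CF

    # Sum of squares within samples (step 8)
    SSE = SST - SSC
    return SST, SSC, SSE

# Price data from Illustration 2 (Kanpur, Lucknow, Delhi)
sst, ssc, sse = one_way_sums_of_squares([[15, 7, 11, 13],
                                         [14, 10, 10, 6],
                                         [4, 10, 8, 8]])
print(round(sst, 2), round(ssc, 2), round(sse, 2))  # 118.67 32.67 86.0
```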

ANALYSIS OF VARIANCE (ANOVA) TABLE

Source of variation      Sum of squares (SS)   Degrees of freedom (ν)   Mean square (MS)   Variance ratio F
Between samples          SSC                   ν1 = C − 1               MSC = SSC/ν1
(treatments)                                                                               F = MSC/MSE
Within samples (error)   SSE                   ν2 = N − C               MSE = SSE/ν2
Total                    SST                   N − 1

SSC = Sum of squares between the samples (columns)

SST = Total sum of squares

SSE = Sum of squares within the samples

MSC = Mean sum of squares between samples

MSE = Mean sum of squares within samples

Illustration 2: To test the significance of variation in the retail prices of a commodity in three
principal cities, Kanpur, Lucknow, and Delhi, four shops were chosen at random in each city and
the prices observed (in rupees) were as follows:

Kanpur 15 7 11 13
Lucknow 14 10 10 6
Delhi 4 10 8 8

Do the data indicate that the prices in the three cities are significantly different?

Solution: Let us take the null hypothesis that there is no significant difference in the prices of a
commodity in the three cities.

Calculations for the analysis of variance are as under:

Sample 1 (Kanpur)       Sample 2 (Lucknow)       Sample 3 (Delhi)
 x1        x1²           x2        x2²            x3        x3²
 15        225           14        196             4         16
  7         49           10        100            10        100
 11        121           10        100             8         64
 13        169            6         36             8         64
Σx1 = 46  Σx1² = 564    Σx2 = 40  Σx2² = 432     Σx3 = 30  Σx3² = 244

There are r = 3 treatments (samples) with n1 = 4, n2 = 4, n3 = 4, and n = 12.

T= Sum of all the observations in the three samples

= ∑𝑥1 + ∑𝑥2 +∑𝑥3 = 46+40+30 = 116

CF = Correction factor = T²/n = (116)²/12 = 1121.33

SST = Total sum of squares
    = (Σx1² + Σx2² + Σx3²) − CF = (564 + 432 + 244) − 1121.33 = 118.67

SSC = Sum of squares between the samples
    = [(Σx1)²/n1 + (Σx2)²/n2 + (Σx3)²/n3] − CF
    = [(46)²/4 + (40)²/4 + (30)²/4] − 1121.33
    = [2116/4 + 1600/4 + 900/4] − 1121.33
    = 4616/4 − 1121.33 = 32.67

SSE = SST − SSC = 118.67 − 32.67 = 86

Degrees of freedom: df1 = r − 1 = 3 − 1 = 2 and df2 = n − r = 12 − 3 = 9

Thus MSC = SSC/df1 = 32.67/2 = 16.335 and MSE = SSE/df2 = 86/9 = 9.55

ANOVA TABLE

Source of variation   Sum of squares   Degrees of freedom   Mean squares                  Test statistic
Between samples       SSC = 32.67      r − 1 = 2            MSC = SSC/(r − 1)             F = MSC/MSE
                                                                = 32.67/2 = 16.335          = 16.335/9.55 = 1.71
Within samples        SSE = 86         n − r = 9            MSE = SSE/(n − r)
                                                                = 86/9 = 9.55
Total                 SST = 118.67     n − 1 = 11

The table value of F for df1 = 2, df2 = 9, and α = 5% level of significance is 4.26. Since the
calculated value of F is less than its critical (or table) value, the null hypothesis is accepted.
Hence we conclude that the prices of the commodity in the three cities do not differ significantly.
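As a cross-check (a sketch assuming SciPy is available; not part of the original solution), the function scipy.stats.f_oneway reproduces the F ratio computed above:

```python
from scipy import stats

kanpur = [15, 7, 11, 13]
lucknow = [14, 10, 10, 6]
delhi = [4, 10, 8, 8]

# One-way ANOVA on the three city samples
result = stats.f_oneway(kanpur, lucknow, delhi)

# F is about 1.71, below the 5% critical value 4.26, so the
# null hypothesis of equal mean prices is not rejected
print(round(result.statistic, 2), result.pvalue > 0.05)  # 1.71 True
```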

TESTING EQUALITY OF POPULATION (TREATMENT) MEANS: TWO-WAY
CLASSIFICATION

ANOVA TABLE FOR TWO-WAY CLASSIFICATION

Source of variation   Sum of squares   Degrees of freedom   Mean square                  Variance ratio
Between columns       SSC              c − 1                MSC = SSC/(c − 1)            F(treatment) = MSC/MSE
Between rows          SSR              r − 1                MSR = SSR/(r − 1)            F(blocks) = MSR/MSE
Residual error        SSE              (c − 1)(r − 1)       MSE = SSE/[(c − 1)(r − 1)]
Total                 SST              n − 1

Total variation consists of three parts: (i) variation between columns, SSC; (ii) variation between
rows, SSR; and (iii) actual variation due to random error, SSE. That is,

SST = SSC + SSR + SSE.

The degrees of freedom associated with SST are cr-1, where c and r are the number of columns
and rows, respectively.

Degrees of freedom between columns= c-1

Degrees of freedom between rows= r-1

Degrees of freedom for residual error=(c-1)(r-1)

The test-statistic F for analysis of variance is given by:

F(treatment) = MSC/MSE (when MSC > MSE) or MSE/MSC (when MSE > MSC)

F(blocks) = MSR/MSE (when MSR > MSE) or MSE/MSR (when MSE > MSR)

Illustration 3: The following table gives the number of refrigerators sold by 4 salesmen in the
three months March, April and May:

Month Salesman

A B C D
March 50 40 48 39
April 46 48 50 45
May 39 44 40 39

Is there a significant difference in the sales made by the four salesmen? Is there a significant
difference in the sales made during different months?

Solution: Let us take the following null hypothesis:

H01: There is no significant difference in the sales made by the four salesmen.

H02: There is no significant difference in the sales made during different months.

The given data are coded by subtracting 40 from each observation. Calculations for a two-
criteria (month and salesman) analysis of variance are shown below:

Two-way ANOVA Table

                            Salesman                                Row
Month        A (x1)  x1²   B (x2)  x2²   C (x3)  x3²   D (x4)  x4²  sum
March          10    100     0       0     8      64    −1       1   17
April           6     36     8      64    10     100     5      25   29
May            −1      1     4      16     0       0    −1       1    2
Column sum     15    137    12      80    18     164     3      27   48

T = Sum of all the coded observations = 48

CF = Correction factor = T²/n = (48)²/12 = 192

SSC = Sum of squares between salesmen (columns)
    = [(15)²/3 + (12)²/3 + (18)²/3 + (3)²/3] − 192
    = (75 + 48 + 108 + 3) − 192 = 42

SSR = Sum of squares between months (rows)
    = [(17)²/4 + (29)²/4 + (2)²/4] − 192
    = (72.25 + 210.25 + 1) − 192 = 91.5

SST = Total sum of squares
    = (Σx1² + Σx2² + Σx3² + Σx4²) − CF
    = (137 + 80 + 164 + 27) − 192 = 216

SSE = SST − (SSC + SSR) = 216 − (42 + 91.5) = 82.5

The total degrees of freedom are df = n − 1 = 12 − 1 = 11.

So dfc = c − 1 = 4 − 1 = 3; dfr = r − 1 = 3 − 1 = 2; dfe = (c − 1)(r − 1) = 3 × 2 = 6

Thus, MSC= SSC/(c-1) = 42/3=14

MSR= SSR/(r-1)= 91.5/2= 45.75

MSE= SSE/(c-1)(r-1) = 82.5/6=13.75

The ANOVA table is shown below:

Source of variation   Sum of squares   Degrees of freedom   Mean squares                          Variance ratio
Between salesmen      SSC = 42.0       c − 1 = 3            MSC = SSC/(c − 1) = 14.00             F(treatment) = MSC/MSE
                                                                                                     = 14/13.75 = 1.018
Between months        SSR = 91.5       r − 1 = 2            MSR = SSR/(r − 1) = 45.75             F(blocks) = MSR/MSE
                                                                                                     = 45.75/13.75 = 3.327
Residual error        SSE = 82.5       (c − 1)(r − 1) = 6   MSE = SSE/[(c − 1)(r − 1)] = 13.75
Total                 SST = 216        n − 1 = 11

(a) The table value of F = 4.75 for df1 = 3, df2 = 6, and α = 5%. Since the calculated value of
F(treatment) = 1.018 is less than its table value, the null hypothesis is accepted. Hence we
conclude that the sales made by the salesmen do not differ significantly.

(b) The table value of F = 5.14 for df1 = 2, df2 = 6, and α = 5%. Since the calculated value of
F(blocks) = 3.327 is less than its table value, the null hypothesis is accepted. Hence we conclude
that the sales made during different months do not differ significantly.
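The whole two-way computation can be sketched in Python with NumPy (an illustrative reconstruction of the arithmetic above, not part of the original text):

```python
import numpy as np

# Coded sales (original values minus 40): rows = months, columns = salesmen
X = np.array([[10, 0, 8, -1],
              [6, 8, 10, 5],
              [-1, 4, 0, -1]])
r, c = X.shape                                # 3 months, 4 salesmen
n = X.size                                    # 12 observations
CF = X.sum() ** 2 / n                         # correction factor: 48^2/12 = 192

SST = (X ** 2).sum() - CF                     # total sum of squares: 216
SSC = (X.sum(axis=0) ** 2 / r).sum() - CF     # between salesmen (columns): 42
SSR = (X.sum(axis=1) ** 2 / c).sum() - CF     # between months (rows): 91.5
SSE = SST - SSC - SSR                         # residual error: 82.5

MSC = SSC / (c - 1)                           # 14.0
MSR = SSR / (r - 1)                           # 45.75
MSE = SSE / ((c - 1) * (r - 1))               # 13.75

F_treatment = MSC / MSE                       # about 1.018
F_blocks = MSR / MSE                          # about 3.327
print(round(F_treatment, 3), round(F_blocks, 3))  # 1.018 3.327
```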
