ANOVA Unit3 BBA504A
Note: Study material may be useful for the courses wherever Research Methodology paper is
being taught.
Prepared by:
Dr. Anoop Kumar Singh
Dept. of Applied Economics,
University of Lucknow
Introduction
Analysis of variance (ANOVA) is a statistical technique for analyzing the differences between
the means of more than two samples. It is a parametric test of hypothesis. It is a stepwise
estimation procedure (partitioning the "variation" among and between groups) used to test the
equality of two or more population means.
ANOVA was developed by the statistician and eugenicist Ronald Fisher. Although many statisticians,
including Fisher, worked on the development of the ANOVA model, it became widely known
after being included in Fisher's 1925 book "Statistical Methods for Research Workers".
ANOVA is based on the law of total variance, whereby the observed variance in a particular
variable is partitioned into components attributable to different sources of variation. ANOVA
provides an analytical procedure for testing the differences among group means and thus generalizes
the t-test beyond two means. ANOVA uses F-tests to test the equality of means statistically.
Concept of Variance
Variance is an important tool in the sciences, including statistical science. In probability
theory and statistics, variance is the expectation of the squared deviation of a random
variable from its mean. It measures the degree to which the data in a series
are scattered around their average value. Variance is widely used in statistics; its use ranges
from descriptive statistics to statistical inference and the testing of hypotheses.
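As a quick check of this definition, the variance of a small series can be computed directly as the mean squared deviation from the mean (the data values below are made up purely for illustration):

```python
# Population variance as the mean of squared deviations from the mean.
data = [4, 8, 6, 5, 3, 7]          # illustrative values

mean = sum(data) / len(data)        # average value of the series
variance = sum((x - mean) ** 2 for x in data) / len(data)

print(mean, variance)               # 5.5 and 17.5/6 = 2.9166...
```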
ANOVA examines the variation in the dependent variable associated with the effect of the
controlled independent variables, after taking into account the influence of the uncontrolled
independent variables.
We take the null hypothesis that there is no significant difference between the means of
different populations. In its simplest form, analysis of variance must have a dependent
variable that is metric (measured using an interval or ratio scale). There must also be
one or more independent variables. The independent variables must all be categorical
(non-metric). Categorical independent variables are also called factors. A particular
combination of factor levels, or categories, is called a treatment.
The type of analysis used to examine the variation depends on the number of
independent variables taken into account for the study. One-way
analysis of variance involves only one categorical variable, or a single factor. If two or
more factors are involved, the analysis is termed n-way (e.g. two-way, three-way, etc.)
analysis of variance.
F Tests
F-tests are named after Sir Ronald Fisher. The F-statistic is simply a ratio of two
variances. Variance is the square of the standard deviation. For a lay reader, standard
deviations are easier to understand than variances because they are in the same units as the data
rather than squared units. F-statistics are based on the ratio of mean squares. The term "mean
squares" may sound confusing, but it is simply an estimate of population variance that accounts
for the degrees of freedom (df) used to calculate that estimate.
For carrying out the test of significance, we calculate the ratio F, which is defined as:
F = S1²/S2², where S1² = Σ(X1 - X̄1)²/(n1 - 1)

and S2² = Σ(X2 - X̄2)²/(n2 - 1)

It should be noted that S1² is always the larger estimate of variance, i.e. S1² > S2²:

F = Larger estimate of variance / Smaller estimate of variance

with degrees of freedom ν1 = n1 - 1 and ν2 = n2 - 1.
The calculated value of F is compared with the table value for ν1 and ν2 degrees of freedom at
the 5% or 1% level of significance. If the calculated value of F is greater than the table value,
the F-ratio is considered significant and the null hypothesis is rejected. On the other hand, if
the calculated value of F is less than the table value, the null hypothesis is accepted and it is
inferred that both samples have come from populations having the same variance.
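The decision rule above can be sketched in a few lines of Python, assuming the caller looks up the table (critical) value for the appropriate degrees of freedom; `variance_ratio_test` is a hypothetical helper name introduced for this sketch:

```python
# F-test for equality of two population variances (sketch).
# The larger variance estimate always goes in the numerator.
from statistics import variance

def variance_ratio_test(sample1, sample2, critical_value):
    s1, s2 = variance(sample1), variance(sample2)   # n - 1 denominators
    if s1 >= s2:
        F, v1, v2 = s1 / s2, len(sample1) - 1, len(sample2) - 1
    else:
        F, v1, v2 = s2 / s1, len(sample2) - 1, len(sample1) - 1
    reject = F > critical_value     # reject H0 of equal variances
    return F, (v1, v2), reject
```

For example, the samples [1, 2, 3, 4, 5] and [2, 4, 6, 8, 10] have variance estimates 2.5 and 10, giving F = 4 with (4, 4) degrees of freedom, which is below the 5% table value of 6.39, so the null hypothesis would not be rejected.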
Illustration 1: Two random samples were drawn from two normal populations and their values
are:
A 65 66 73 80 82 84 88 90 92
B 64 66 74 78 82 85 87 92 93 95 97
Test whether the two populations have the same variance at the 5% level of significance.
Solution: Let us take the null hypothesis that the two populations have the same variance.
Applying the F-test:

F = S1²/S2²

X̄1 = ΣX1/n1 = 720/9 = 80; X̄2 = ΣX2/n2 = 913/11 = 83

S1² = Σx1²/(n1 - 1) = 798/8 = 99.75

S2² = Σx2²/(n2 - 1) = 1298/10 = 129.8

Since S2² is the larger estimate of variance, it goes in the numerator:

F = 129.8/99.75 = 1.301
At the 5 per cent level of significance, for ν1 = 10 and ν2 = 8, the table value of F0.05 = 3.36.
The calculated value of F is less than the table value, so the null hypothesis is accepted. Hence
the two populations have the same variance.
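Illustration 1 can be verified with a few lines of standard-library Python (the sample variances use n - 1 in the denominator, as in the text):

```python
# Numerical check of Illustration 1.
from statistics import mean, variance

A = [65, 66, 73, 80, 82, 84, 88, 90, 92]
B = [64, 66, 74, 78, 82, 85, 87, 92, 93, 95, 97]

s1, s2 = variance(A), variance(B)       # 99.75 and 129.8
F = max(s1, s2) / min(s1, s2)           # larger estimate in the numerator

print(mean(A), mean(B), round(F, 3))    # 80, 83, F = 1.301
# F < 3.36 (table value for v1 = 10, v2 = 8 at 5%), so the
# null hypothesis of equal variances is not rejected.
```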
ONE-WAY CLASSIFICATION
In one-way classification, the following steps are carried out for computing the F-ratio by the
most popular method, i.e. the short-cut method:
1. First, square all the observations in the different samples (columns).
2. Obtain the total of the sample observations in each column as ΣX1, ΣX2, ..., ΣXk.
3. Obtain the total of the squared values in each column as ΣX1², ΣX2², ..., ΣXk².
4. Find the value of T by adding up all the column totals, i.e. T = ΣX1 + ΣX2 + ... + ΣXk.
5. Work out the correction factor, CF = T²/N, where N is the total number of observations.
6. Find the total sum of squares (SST) from the squared values and CF:
SST = (ΣX1² + ΣX2² + ... + ΣXk²) - CF
7. Find the sum of squares between samples (SSC) by the following formula:
SSC = (ΣX1)²/n1 + (ΣX2)²/n2 + ... + (ΣXk)²/nk - CF
8. Finally, find the sum of squares within samples (SSE) as under:
SSE = SST - SSC
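The steps above can be collected into a small function; `one_way_anova` is a hypothetical name used only for this sketch, and the samples (columns) may be of unequal size:

```python
# Short-cut method for the one-way F-ratio (steps 1-8 above).
def one_way_anova(samples):
    N = sum(len(s) for s in samples)                      # total observations
    T = sum(sum(s) for s in samples)                      # grand total (step 4)
    CF = T ** 2 / N                                       # correction factor (step 5)
    SST = sum(x ** 2 for s in samples for x in s) - CF    # total SS (step 6)
    SSC = sum(sum(s) ** 2 / len(s) for s in samples) - CF # between-samples SS (step 7)
    SSE = SST - SSC                                       # within-samples SS (step 8)
    c = len(samples)
    MSC, MSE = SSC / (c - 1), SSE / (N - c)               # mean squares
    return MSC / MSE                                      # F-ratio
```

Applied to the price data of Illustration 2 below, it returns F of about 1.71.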
Source of variation    Sum of squares   Degrees of freedom   Mean square     F-ratio
Between samples        SSC              ν1 = c - 1           MSC = SSC/ν1    F = MSC/MSE
(treatments)
Within samples         SSE              ν2 = N - c           MSE = SSE/ν2
(error)
Total                  SST              N - 1
Illustration 2: To test the significance of variation in the retail prices of a commodity in three
principal cities, Kanpur, Lucknow, and Delhi, four shops were chosen at random in each city and
the prices observed (in rupees) were as follows:
Kanpur 15 7 11 13
Lucknow 14 10 10 6
Delhi 4 10 8 8
Do the data indicate that the prices in the three cities are significantly different?
Solution: Let us take the null hypothesis that there is no significant difference in the prices of a
commodity in the three cities.
CF = Correction Factor = T²/n = (116)²/12 = 1121.33

SST = ΣX² - CF = 1240 - 1121.33 = 118.67

SSC = Σ(Tj²/nj) - CF = 4616/4 - 1121.33 = 1154 - 1121.33 = 32.67

SSE = SST - SSC = 118.67 - 32.67 = 86.00

ANOVA TABLE
Source of variation     Sum of squares   df   Mean square            F-ratio
Between cities          32.67            2    32.67/2 = 16.34        F = 16.34/9.56 = 1.71
Within cities (error)   86.00            9    86.00/9 = 9.56
Total                   118.67           11

The table value of F for df1 = 2, df2 = 9, and α = 5% level of significance is 4.26. Since the
calculated value of F is less than its critical (or table) value, the null hypothesis is accepted.
Hence we conclude that the prices of the commodity in the three cities do not differ significantly.
TESTING EQUALITY OF POPULATION (TREATMENT) MEANS: TWO-WAY
CLASSIFICATION
Total variation consists of three parts: (i) variation between columns, SSC; (ii) variation between
rows, SSR; and (iii) residual variation due to random error, SSE. That is,
SST = SSC + SSR + SSE.
The degrees of freedom associated with SST are rc - 1, where c and r are the number of columns
and rows, respectively.
Illustration 3: The following table gives the number of refrigerators sold by 4 salesmen in three
months, March, April and May:
Month Salesman
A B C D
March 50 40 48 39
April 46 48 50 45
May 39 44 40 39
Is there a significant difference in the sales made by the four salesmen? Is there a significant
difference in the sales made during different months?
The given data are coded by subtracting 40 from each observation. Calculations for a two-
criteria (month and salesman) analysis of variance are shown below:
CF = Correction Factor = T²/n = (48)²/12 = 192

SSC (between salesmen) = Σ(Cj²)/r - CF = (225 + 144 + 324 + 9)/3 - 192
= (75 + 48 + 108 + 3) - 192 = 234 - 192 = 42

SSR (between months) = Σ(Ri²)/c - CF = (289 + 841 + 4)/4 - 192 = 283.5 - 192 = 91.5

SST = ΣX² - CF = (137 + 80 + 164 + 27) - 192 = 408 - 192 = 216

SSE = SST - SSC - SSR = 216 - 42 - 91.5 = 82.5
The total degrees of freedom are df= n-1=12-1=11.
(a) The table value of F = 4.75 for df1 = 3, df2 = 6, and α = 5%. Since the calculated value of
F(Treatment) = 1.018 is less than its table value, the null hypothesis is accepted. Hence we
conclude that the sales made by the four salesmen do not differ significantly.
(b) The table value of F = 5.14 for df1 = 2, df2 = 6, and α = 5%. Since the calculated value of
F(Block) = 3.327 is less than its table value, the null hypothesis is accepted. Hence we conclude
that the sales made during the different months do not differ significantly.
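The two-way decomposition in Illustration 3 can be verified directly on the coded data (each observation minus 40), again using only the standard library:

```python
# Numerical check of Illustration 3 (two-way classification).
rows = [                 # months x salesmen (A, B, C, D)
    [10, 0, 8, -1],      # March
    [6, 8, 10, 5],       # April
    [-1, 4, 0, -1],      # May
]
r, c = len(rows), len(rows[0])
n = r * c
T = sum(sum(row) for row in rows)                    # grand total = 48
CF = T ** 2 / n                                      # 192
SST = sum(x * x for row in rows for x in row) - CF   # 216
SSC = sum(sum(row[j] for row in rows) ** 2 for j in range(c)) / r - CF  # 42
SSR = sum(sum(row) ** 2 for row in rows) / c - CF    # 91.5
SSE = SST - SSC - SSR                                # 82.5

MSC, MSR, MSE = SSC / (c - 1), SSR / (r - 1), SSE / ((c - 1) * (r - 1))
F_treatment = MSC / MSE      # about 1.018 (salesmen)
F_block = MSR / MSE          # about 3.327 (months)
print(F_treatment, F_block)
```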