RM Note Unit - 4
(C) Student's t-distribution: When the population standard deviation is unknown and the sample size is small (n < 30) we use the t-distribution for the sampling distribution of the mean:
t = (x̄ − μ)/(S1/√n), where S1² = (1/(n − 1)) Σ (xi − x̄)² = sample variance.
The variable t differs from z because we use the sample standard deviation (S1) in the calculation of t, whereas we use the population standard deviation (σ) in the calculation of z. There is a different t-distribution for every possible sample size, i.e., for different degrees of freedom. The degrees of freedom for a sample of size n is n − 1. The larger the sample size, the closer the shape of the t-distribution is to the normal distribution. For sample size n > 30 the t-distribution is close to the normal distribution and we can use the normal to approximate it.
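This convergence is easy to check numerically. The sketch below (assuming Python with scipy is available; it is not part of the original notes) prints two-tailed 5% critical values of t for growing degrees of freedom alongside the normal value:

```python
# Sketch (not from the notes): t critical values approach the normal
# critical value as the degrees of freedom grow. Assumes scipy is installed.
from scipy import stats

z = stats.norm.ppf(0.975)                # two-tailed 5% critical value of N(0,1)
for df in (5, 10, 30, 100):
    t = stats.t.ppf(0.975, df)           # same tail area for t with df degrees of freedom
    print(f"df = {df:>3}: t = {t:.4f}   (z = {z:.4f})")
```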
(D) Chi-square (χ²) distribution: Variances of samples require adding a collection of squared quantities, and this relates to the chi-square distribution. Suppose we have a random sample x1, x2, …, xn from an N(μ, σ) population; then the statistic
χ² = (n − 1)S1²/σ²
has a chi-square distribution with (n − 1) degrees of freedom. The chi-square distribution is used in taking decisions about the variance, in testing independence of attributes and in testing goodness of fit.
(E) Snedecor's F-distribution: Let X and Y be two independent sample statistics such that X and Y both have chi-square distributions with degrees of freedom d1 and d2 respectively. Then the statistic
F = (X/d1)/(Y/d2)
has Snedecor's F-distribution with (d1, d2) degrees of freedom. Alternatively, if we have two independent samples x1, …, xn1 and y1, …, yn2 from normal populations with equal variances, then
F = Sx²/Sy², where Sx² = Σ (xi − x̄)²/(n1 − 1) and Sy² = Σ (yj − ȳ)²/(n2 − 1),
has the F-distribution with (n1 − 1, n2 − 1) degrees of freedom.
(i) Unbiasedness: Consider all 16 possible samples of size 2 drawn with replacement from the population {18, 20, 22, 24}, for which μ = 21 and σ² = 5. For each sample we list the sample mean x̄, the variance s² computed with divisor n, and the variance S1² computed with divisor n − 1.

Sample    x̄     s² (divisor n)   S1² (divisor n − 1)
18,18     18     0                0
20,18     19     1                2
22,18     20     4                8
24,18     21     9                18
18,20     19     1                2
20,20     20     0                0
22,20     21     1                2
24,20     22     4                8
18,22     20     4                8
20,22     21     1                2
22,22     22     0                0
24,22     23     1                2
18,24     21     9                18
20,24     22     4                8
22,24     23     1                2
24,24     24     0                0
Average   21     2.5              5
We see that the average of all sample means is the same as the population mean 21.
The average of all s² values (2.5) is not the same as the population variance, while the average of all S1² values is the same as the population variance 5. This property of an estimate is called unbiasedness.
An estimate θ̂ of a parameter θ is known as an unbiased estimate of θ if the average of all possible values of θ̂ (from all possible samples of a given size) is the same as θ.
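The table above can be reproduced mechanically. A minimal Python sketch (standard library only; not part of the original notes) that enumerates all 16 samples and verifies the three averages:

```python
# Sketch: enumerate all 16 samples of size 2 drawn with replacement
# from {18, 20, 22, 24} and average the three statistics.
from itertools import product
import statistics

population = [18, 20, 22, 24]

means, var_n, var_n1 = [], [], []
for sample in product(population, repeat=2):
    means.append(statistics.mean(sample))
    var_n.append(statistics.pvariance(sample))    # divisor n
    var_n1.append(statistics.variance(sample))    # divisor n - 1

print(statistics.mean(means))    # 21.0 -> sample mean is unbiased for mu
print(statistics.mean(var_n))    # 2.5  -> biased for sigma^2 = 5
print(statistics.mean(var_n1))   # 5.0  -> S1^2 is unbiased for sigma^2
```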
(ii) Consistency: An estimator should approach the value of the population parameter as the sample size becomes larger and larger. This property is referred to as consistency and it is most desirable.
(iii) Sufficiency: An estimator should use as much as possible of the information available from the sample. This is the property of sufficiency.
(iv) Efficiency: An estimator should have a small variance. The most efficient estimator among a group of unbiased estimators is the one with the smallest variance. This is the property of efficiency.
Interval Estimation
In interval estimation we obtain a desired level of confidence, say 99%, meaning the probability that the interval will contain the true unknown parameter is 0.99.
Let θ = unknown parameter and T = unbiased point estimate of θ, i.e. E(T) = θ.
Fix a desired level of confidence (1 − α)100% (usually fixed at 95% or 99%). The value of (1 − α) is called the confidence coefficient. For a 95% confidence level, α = 0.05.
A confidence interval estimate [T − h, T + h] of θ means
P(T − h ≤ θ ≤ T + h) = 1 − α, where h is determined by the critical value.
(i) Estimating the population mean μ when σ is known: To get an interval estimate of a parameter we require an unbiased point estimate of that parameter, the standard error of the point estimate, the confidence coefficient (to be fixed by the researcher) and the critical value.
The unbiased point estimate of the population mean μ is the sample mean x̄. The standard error of x̄ is σ/√n. If the population is normal or the sample size n is large, then the distribution of (x̄ − μ)/(σ/√n) is N(0, 1). Using this distribution, the (1 − α)100% confidence interval for μ is
(x̄ − z(α/2) σ/√n, x̄ + z(α/2) σ/√n).
Ex 9.1: A sample of 11 circuits from a large normal population has a mean of 2.2 ohms. We know from past testing that the population standard deviation is 0.35 ohms. Determine a 95% confidence interval for the true mean resistance of the population.
Solution: x̄ = 2.2, σ = 0.35, n = 11, z(0.025) = 1.96. The 95% confidence interval is
x̄ ± 1.96 × 0.35/√11 = 2.2 ± 0.2068 = (1.9932, 2.4068) ohms.
(ii) Estimating the population mean μ when σ is unknown (small sample): The (1 − α)100% confidence interval is x̄ ± t(α/2, n−1) S1/√n, where the critical value is given by the t-distribution table for (n − 1) degrees of freedom.
Ex. 9.2: A sample of 11 circuits from a large normal population has a mean resistance of 2.2 ohms. The population standard deviation is unknown; the sample standard deviation is S1 = 0.35 ohm. Find a 95% confidence interval for the true mean resistance of the population.
Soln: x̄ = 2.2, S1 = 0.35, n = 11.
With n − 1 = 10 degrees of freedom and α = 0.05, the t-distribution table gives the critical value t(0.025, 10) = 2.228. Then t(0.025, 10) × S1/√n = 2.228 × 0.35/√11 = 0.235.
The required 95% confidence interval is 2.2 ± 0.235 = (1.965, 2.435) ohms.
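Both intervals can be checked with a few lines of Python (a sketch assuming scipy is available; not part of the notes):

```python
# Sketch verifying Ex. 9.1 (sigma known) and Ex. 9.2 (sigma unknown).
from math import sqrt
from scipy import stats

n, xbar = 11, 2.2

sigma = 0.35                                  # Ex. 9.1: known population std. dev.
h = stats.norm.ppf(0.975) * sigma / sqrt(n)   # 1.96 * 0.35/sqrt(11)
print(xbar - h, xbar + h)                     # about (1.993, 2.407)

s1 = 0.35                                     # Ex. 9.2: sample std. dev.
h = stats.t.ppf(0.975, n - 1) * s1 / sqrt(n)  # 2.228 * 0.35/sqrt(11)
print(xbar - h, xbar + h)                     # about (1.965, 2.435)
```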
(iii) Estimating the population variance σ²: The sample variance S1² (divisor n − 1) is an unbiased estimate of σ², but s² (divisor n) is not an unbiased estimate of σ². Both are consistent estimators of σ², and for sufficiently large samples s² ≈ S1².
Case I. Small sample size (n < 30): The statistic (n − 1)S1²/σ² has a chi-square distribution with (n − 1) degrees of freedom provided the parent population is normal. Using this distribution, the (1 − α) × 100% confidence interval for σ² is given by
((n − 1)S1²/χ²(α/2), (n − 1)S1²/χ²(1 − α/2)),
where χ²(α/2) and χ²(1 − α/2) are critical values obtained using the chi-square distribution with (n − 1) degrees of freedom. These critical values can be taken from the chi-square table.
Ex. 9.4: The cholesterol concentration in the yolks of a sample of 18 randomly selected eggs laid by chickens was found to have mean 9.38 mg/g of yolk and standard deviation 1.62 mg/g. Find a 95% confidence interval for the true variance of the cholesterol concentration.
Solution: n = 18, S1 = 1.62. For a 95% confidence interval, α = 1 − 0.95 = 0.05, so each tail has area α/2 = 0.025. With n − 1 = 17 degrees of freedom the chi-square table gives χ²(0.025, 17) = 30.191 and χ²(0.975, 17) = 7.564.
Hence the 95% confidence interval for the true variance of the cholesterol concentration is
((n − 1)S1²/χ²(0.025), (n − 1)S1²/χ²(0.975)) = (17 × 1.62²/30.191, 17 × 1.62²/7.564) = (1.478, 5.898).
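A sketch of the same computation in Python (assuming scipy), using the χ²(17) quantiles directly:

```python
# Sketch for Ex. 9.4: chi-square confidence interval for a variance.
from scipy import stats

n, s1, alpha = 18, 1.62, 0.05
hi = stats.chi2.ppf(1 - alpha / 2, n - 1)     # 30.191
lo = stats.chi2.ppf(alpha / 2, n - 1)         # 7.564
print((n - 1) * s1**2 / hi, (n - 1) * s1**2 / lo)   # about (1.478, 5.898)
```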
Case 2: Large sample (n ≥ 30): By the central limit theorem, for a large sample (n ≥ 30) the sample standard deviation S1 is approximately normally distributed about σ, so the statistic (S1 − σ)/S.E.(S1) is N(0, 1). Using this, a (1 − α) × 100% confidence interval for σ² can be obtained; the critical value is taken from the N(0, 1) distribution.
Ex. 9.5: A technologist collects 50 specimens of product from a new process and determines the percent of water in each. The sample mean is 43.24% and the sample standard deviation is 7.93%. Compute a 95% confidence interval estimate of the true variance of the percentage of water for this new process.
Soln: n = 50, S1 = 7.93, z(0.025) = 1.96. The resulting 95% confidence interval estimate of the true variance is (45.047, 104.111).
Determination of sample size (confidence-level approach): In a sample study there arises some sampling error, which can be controlled by taking a sample of adequate size. The researcher has to specify the precision wanted in the estimates. For example, a desired precision of ±3 for the sample mean means 97 < mean < 103 when the actual sample mean is 100.
(a) Sample size decision while estimating the mean
The confidence interval for the universe mean is x̄ ± z σ/√n. Setting the acceptable error e = z σ/√n and solving for n gives, for a finite population,
n = z²σ²N/((N − 1)e² + z²σ²),
where N = size of population, σ = population standard deviation, z = critical value for the chosen confidence level, and e = acceptable error (the precision or estimation error).
Ex. 9.6: Find the sample size for estimating the true weight of cereal containers for a universe with N = 5000. The variance of weight is 4 ounces from past records. The estimate should be within 0.8 ounces of the true average weight with 99% probability.
Solution: N = 5000, σ = 2 ounces, e = 0.8 ounces; for a 99% confidence level, z = 2.57 from the normal table.
n = z²σ²N/((N − 1)e² + z²σ²) = (2.57² × 4 × 5000)/((4999 × 0.64) + (2.57² × 4)) = 132098/3225.78 = 40.95 ≈ 41.
Hence sample size n = 41 for the given precision and confidence level.
If the population is infinite then n = z²σ²/e² = (2.57² × 4)/0.64 = 41.28 ≈ 41.
(b) Sample size while estimating a proportion: the analogous formula is n = z²pq/e², where p is the anticipated proportion and q = 1 − p; for the worked example accompanying these notes this gives n = 187.78 ≈ 188.
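The arithmetic of Ex. 9.6 in a few lines of plain Python (a sketch, not from the notes):

```python
# Sketch for Ex. 9.6: sample size needed to estimate a mean.
N, sigma, e, z = 5000, 2, 0.8, 2.57           # 99% confidence level

n_finite = z**2 * sigma**2 * N / ((N - 1) * e**2 + z**2 * sigma**2)
n_infinite = (z * sigma / e) ** 2             # no finite population correction
print(n_finite, n_infinite)                   # 40.95 and 41.28 -> take n = 41
```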
Characteristics of a Hypothesis
1. Hypothesis should be clear & precise, otherwise inferences drawn cannot be reliable.
2. It should be capable of being tested.
3. It should state relationship between variables.
4. It should be limited in scope & specific.
5. It should be stated in simple terms.
6. It should be consistent with most known facts.
7. It should be amenable to testing within a reasonable time.
Null hypothesis and alternative hypothesis
To compare method A with method B regarding superiority, if we proceed on the assumption that both methods are equally good, then this assumption is called the null hypothesis. The claim that method A is superior or inferior to B is called the alternative hypothesis. Suppose we want to test the hypothesis that the population mean (μ) equals a hypothesized mean μ_H0 = 100:
Null hypothesis H0: μ = μ_H0 = 100
Alternative hypothesis H1: μ > μ_H0 (population mean > 100)
or H1: μ < μ_H0 (population mean < 100).
The alternative hypothesis is usually the one which one wishes to prove, and the null hypothesis the one which one wishes to disprove.
Thus the null hypothesis represents the hypothesis we are trying to reject, and the alternative hypothesis represents all other possibilities.
Type I & Type II errors: A Type I error means rejecting a true hypothesis. A Type II error means accepting a false hypothesis.

Decision     H0 True                          H0 False
Accept H0    No error (probability = 1 − α)   Type II error (probability = β)
Reject H0    Type I error (probability = α)   No error (probability = 1 − β)

If the Type I error is fixed at 5% it means that there are about 5 chances in 100 that we will reject H0 when H0 is true. We can control the Type I error by fixing it at a lower level (say 1%). With a fixed sample size, when we try to reduce the Type I error, the probability of committing a Type II error increases. Decision makers choose the appropriate level of Type I error by examining the costs or penalties attached to both types of errors.
Level of significance (α): It is the probability of a Type I error. A 5% level of significance means that the researcher is willing to take as much as a 5% risk of rejecting the null hypothesis H0 when it happens to be true. Thus the significance level is the maximum probability of rejecting H0 when it is true, and it is usually fixed in advance, before testing the hypothesis.
Two-tailed and one-tailed tests: We may test
(i) H0: μ = μ_H0 against H1: μ ≠ μ_H0 (two-tailed),
(ii) H0: μ ≤ μ_H0 against H1: μ > μ_H0 (right-tailed),
(iii) H0: μ ≥ μ_H0 against H1: μ < μ_H0 (left-tailed).
In the alternative hypothesis:
when we have a ≠ sign, we have a two-tailed test;
when we have a > sign, we have a right-tailed test;
when we have a < sign, we have a left-tailed test.
Critical value & decision rule
The critical value divides the area under the probability curve of the distribution into the critical (or rejection) region and the acceptance region.
In the case of a two-tailed test, when we test H0: μ = μ_H0 against H1: μ ≠ μ_H0, we have two critical values, as the critical region is divided into two equal parts at the two ends of the probability curve. The size of each part is α/2.
(Figure: two-tailed test, with a rejection region of area α/2 in each tail.)
Procedure of hypothesis testing:
1. Stating the hypotheses: We formally state the null hypothesis H0 and the alternative hypothesis H1.
2. Selecting a significance level: In practice a 5% or 1% level of significance is adopted. The factors that affect the level of significance are the sample size, the magnitude of the difference between sample means, the variability of measurements within samples, and whether the hypothesis is directional or not.
3. Test statistic: We obtain the value of the test statistic using the sample observations selected by the researcher and the hypothetical parameter value stated under the null hypothesis.
4. Critical value: Using the distribution of the test statistic, the level of significance (α) and the type of test, we get the critical value.
5. Decision: Comparing the test statistic value & critical value we reject the null hypothesis when
(a) Value of test statistic > Critical value in a right tailed test
(b) Value of test statistic < Critical value in left tailed test
(c) Value of test statistic < Lower critical value or more than upper critical value in a two tailed test.
HYPOTHESIS TESTING FOR MEAN
Case I: Population standard deviation σ is known
μ_H0 = hypothesized population mean, μ = population mean.
In this case the population should be normal or the sample size should be large (n ≥ 30 in practice).
The test statistic is Zc = (x̄ − μ_H0)/(σ/√n), where x̄ = sample mean and n = sample size.
Ex. 10.1: The average salary was $74914 ten years ago. A researcher would like to test whether this average has increased. A sample of 112 salaries produced a mean of $78695. Given σ = $14530.
Solution: Here we test H0: μ ≤ 74914 against H1: μ > 74914 (right-tailed test).
x̄ = 78695. Let the level of significance be 5%. The researcher need not fix the Type II error; he should try to have an optimum size of Type I error so that the Type II error stays small. As the population standard deviation σ = $14530 is known and the sample size n = 112 is large enough, the test statistic is
Zc = (x̄ − μ_H0)/(σ/√n) = (78695 − 74914)/(14530/√112) = 2.7539.
For this right-tailed test at the 5% level of significance, the critical value from the normal table is 1.645. Since 2.7539 > 1.645, we reject the null hypothesis and conclude that the average salary has increased.
Case II: Population standard deviation is unknown
The test statistic is Tc = (x̄ − μ_H0)/(S1/√n). The distribution of this test statistic is Student's t-distribution with (n − 1) degrees of freedom. The critical value can be obtained from the t-distribution table.
Ex.10.2 A sample of 25 people is taken. The length of time to prepare dinner is recorded in minutes as
given below.
44,51.9,49.7,40,55.5,33,43.4,41.3,45.2,40.7,41.1,49.1,30.9,45.2,55.3, 52.1,
55.1,38.8,43.1,39.2,58.6,49.8,43.2,47.9,46.6
Is there any evidence that the population mean time to prepare dinner is < 48 minutes? Use a level of significance of 0.05.
Solution: We test H0: μ ≥ 48 against H1: μ < 48 (left-tailed test).
The population standard deviation is unknown. From the given data, n = 25, x̄ = 45.628 (calculated sample mean) and sample standard deviation S1 = 6.9587 (calculated). Test statistic Tc = (x̄ − μ_H0)/(S1/√n) = (45.628 − 48)/(6.9587/√25) = −1.70435.
The test statistic follows the t-distribution with n − 1 = 24 degrees of freedom. For a left-tailed test at level of significance 0.05, the t-distribution table gives the critical value −1.711.
Since the calculated value of the test statistic, −1.70435, is greater than the critical value −1.711, we cannot reject the null hypothesis at the given level of significance 0.05 (i.e., the hypothesis is accepted).
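For comparison, scipy's one-sample t-test reproduces this result directly (a sketch assuming scipy is available):

```python
# Sketch for Ex. 10.2: left-tailed one-sample t-test via scipy.
from scipy import stats

times = [44, 51.9, 49.7, 40, 55.5, 33, 43.4, 41.3, 45.2, 40.7, 41.1, 49.1,
         30.9, 45.2, 55.3, 52.1, 55.1, 38.8, 43.1, 39.2, 58.6, 49.8, 43.2,
         47.9, 46.6]
t_stat, p_value = stats.ttest_1samp(times, popmean=48, alternative='less')
print(t_stat, p_value)    # t about -1.70, p just above 0.05 -> do not reject H0
```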
Note: In the case of a finite population (when n/N > 0.05) we should use the finite population correction fpc = √((N − n)/(N − 1)); we multiply the standard error in the denominator of the above test statistics by this factor.
HYPOTHESIS TESTING FOR A PROPORTION
The test statistic is Zc = (p − P)/√(P(1 − P)/n), where p = sample proportion and P = hypothesized population proportion. The distribution of the test statistic is N(0, 1) for a sufficiently large sample; in practice n ≥ 30, with nP ≥ 5 and n(1 − P) ≥ 5.
Ex 10.3: A marketing company claims that it receives 8% responses from its mailing. It seems that the company is giving an exaggerated picture and the said percentage is less. To test this claim, a random sample of 500 was surveyed, yielding 30 responses. Test the company's claim at significance level α = 0.05 against the alternative that the company is giving an exaggerated picture.
Solution: We test H0: P = 0.08 against H1: P < 0.08 (left-tailed test).
The sample proportion is p = 30/500 = 0.06 and the sample size n = 500 (large enough). The test statistic is
Zc = (p − P)/√(P(1 − P)/n) = (0.06 − 0.08)/√(0.08 × 0.92/500) = −0.02/0.01213 = −1.6484.
The distribution of the test statistic is N(0, 1). From the normal table, at the 0.05 level of significance the critical value is −1.645.
At the 5% level of significance, the computed value of the test statistic (−1.6484) < critical value (−1.645), so we reject the null hypothesis at the 5% level of significance.
Note: If we change the level of significance to 2.5% then the critical value will be −1.96 and the null hypothesis will be accepted. So the researcher needs to choose the level of significance carefully. Acceptance of the null hypothesis simply means that the selected sample provides no statistical evidence against it.
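A hand computation of this proportion test in Python (a sketch; scipy is assumed only for the critical value):

```python
# Sketch for Ex. 10.3: left-tailed one-sample proportion z-test.
from math import sqrt
from scipy import stats

P, n = 0.08, 500
p = 30 / 500
z = (p - P) / sqrt(P * (1 - P) / n)
crit = stats.norm.ppf(0.05)      # about -1.645
print(z, z < crit)               # about -1.648 < -1.645 -> reject H0
```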
HYPOTHESIS TESTING FOR VARIANCE
We wish to test any of the following:
(i) H0: σ² = σ0² against H1: σ² ≠ σ0² (two-tailed test)
(ii) H0: σ² = σ0² against H1: σ² > σ0² (right-tailed test)
(iii) H0: σ² = σ0² against H1: σ² < σ0² (left-tailed test)
The test statistic is χ² = (n − 1)S1²/σ0², which has a chi-square distribution with (n − 1) degrees of freedom.
For example, to test whether a new machine produces a standard deviation below 1.6 ml based on a sample of n = 30 fills: the test statistic has a chi-square distribution with n − 1 = 29 degrees of freedom, and it is a left-tailed test at the 5% level of significance. Using the chi-square table, the critical value is 17.708. Since the computed value of the test statistic < critical value, we reject the null hypothesis at the 5% level of significance and conclude that the new machine produces a standard deviation < 1.6 ml.
Note: Table 3 has critical values for the chi-square distribution with 1 to 30 degrees of freedom.
For larger samples the N(0, 1) distribution can be used: after obtaining the value of the test statistic χ², apply the formula Z = √(2χ²) − √(2ν − 1), where ν is the degrees of freedom. The computed value of Z can be compared with critical values obtained using N(0, 1).
HYPOTHESIS TESTING FOR DIFFERENCE OF TWO MEANS
Here we have two samples, possibly drawn from different populations, and based on these two samples we compare the population means.
We have a sample x1, x2, …, xn drawn from a population with mean μx and variance σx², and another sample y1, y2, …, ym drawn from a population with mean μy and variance σy².
We wish to test any of the following:
(i) H0: μx = μy against H1: μx ≠ μy
(ii) H0: μx = μy against H1: μx > μy, or H0: μx ≤ μy against H1: μx > μy
(iii) H0: μx = μy against H1: μx < μy, or H0: μx ≥ μy against H1: μx < μy
Case I: Both samples are independent of each other and both population variances σx² and σy² are known.
Let x̄ and ȳ be the sample means. The parameter of interest is μx − μy. A feasible estimate of this unknown parameter is T = x̄ − ȳ.
From the central limit theorem, for sufficiently large samples the distribution of (T − E(T))/S.E.(T) is N(0, 1), where T is the sample statistic, S.E.(T) is the standard error of T, and E(T) is the expected value (mean) of T.
Using the sampling distribution of the mean we have E(x̄ − ȳ) = μx − μy and S.E.(x̄ − ȳ) = √(σx²/n + σy²/m). Since the value of the parameter μx − μy under the null hypothesis is zero, the test statistic is
Zc = (x̄ − ȳ)/√(σx²/n + σy²/m).
When both populations are normal or both sample sizes are large enough the distribution of this test
statistic is N(0,1).
Ex. 10.5: In a survey of buying habits in supermarket A, 400 women shoppers are chosen at random; their average weekly food expenditure is Rs. 250 with standard deviation Rs. 40. In supermarket B, for 400 women shoppers chosen at random, the average weekly food expenditure is Rs. 220 with standard deviation Rs. 55. Do these two populations have similar shopping habits, i.e., is the average weekly food expenditure of the two populations of shoppers equal? Test at the 5% level of significance.
Solution: We wish to test H0: μx = μy against H1: μx ≠ μy.
Both sample sizes are very large, so the sample variances and population variances are almost the same, and we may treat the population variances as known:
x̄ = 250, ȳ = 220, σx = 40, σy = 55, n = m = 400.
Test statistic: Zc = (x̄ − ȳ)/√(σx²/n + σy²/m) = (250 − 220)/√(40²/400 + 55²/400) = 30/3.4004 = 8.8226.
For a two-tailed test at the 5% significance level the critical values are −1.96 and 1.96, using the N(0, 1) distribution.
As the computed value of the test statistic > upper critical value 1.96, we reject the null hypothesis at the 5% level of significance and conclude that the two populations have different shopping habits.
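The same two-sample z computation as a Python sketch (scipy assumed only for the critical value):

```python
# Sketch for Ex. 10.5: two-sample z-test with (approximately) known variances.
from math import sqrt
from scipy import stats

xbar, ybar, sx, sy, n, m = 250, 220, 40, 55, 400, 400
z = (xbar - ybar) / sqrt(sx**2 / n + sy**2 / m)
crit = stats.norm.ppf(0.975)     # 1.96
print(z, abs(z) > crit)          # about 8.82 -> reject H0
```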
Case II: σx² and σy² are unknown and both samples are independent of each other.
We use Student's t-distribution, which requires two conditions:
(i) both populations have normal distributions, and
(ii) σx² = σy² (both population variances are equal).
The unknown variance is estimated by the pooled sample variance
Sp² = (1/(n + m − 2)) [Σ (xi − x̄)² + Σ (yj − ȳ)²].
The test statistic is
Tc = (x̄ − ȳ)/(Sp √(1/n + 1/m)).
The distribution of Tc is Student's t-distribution with n + m − 2 degrees of freedom.
Ex. 10.6: You are a financial analyst for a brokerage firm. Is there a difference in dividend yield between stocks listed on the NYSE and NASDAQ?
NYSE NASDAQ
No. of stocks 21 25
Sample mean 3.27 2.53
Sample std. Dev. 1.3 1.16
Assuming both populations are approximately normal with equal variances, is there a difference in average yield (α = 0.05)?
Solution: We wish to test H0: μx = μy against H1: μx ≠ μy.
Pooled variance: Sp² = (1/(21 + 25 − 2)) [20 × 1.3² + 24 × 1.16²] = 1.5021.
Test statistic: Tc = (x̄ − ȳ)/(Sp √(1/n + 1/m)) = (3.27 − 2.53)/√(1.5021 × (1/21 + 1/25)) = 0.74/0.3628 = 2.04.
The test statistic has a t-distribution with 21 + 25 − 2 = 44 degrees of freedom. The critical value of the t-distribution can be approximated using the N(0, 1) distribution (for more than 30 degrees of freedom).
The critical values for this two tailed test at 5% level of significance are -1.96 and 1.96. Computed value of
test statistic = 2.04 which is larger than upper critical value 1.96 . So we reject the null hypothesis at 5%
level of significance.
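scipy can run this pooled t-test directly from the summary statistics (a sketch, assuming scipy is available):

```python
# Sketch for Ex. 10.6: pooled two-sample t-test from summary statistics.
from scipy import stats

t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=3.27, std1=1.30, nobs1=21,     # NYSE
    mean2=2.53, std2=1.16, nobs2=25,     # NASDAQ
    equal_var=True)                      # pooled variance, as in the notes
print(t_stat, p_value)                   # t about 2.04, p < 0.05 -> reject H0
```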
Case III: Both samples are related.
Here the two samples are not independent but related. For example, the heights of a group of 15 children before and after they are fed a certain nutritional food product for two months make two samples which are related, and the observations are paired.
We find the differences di = xi − yi for i = 1, 2, …, n. The mean difference is d̄ = (1/n) Σ di and the sample standard deviation is
Sd = √[(1/(n − 1)) Σ (di − d̄)²].
The test statistic is Tc = d̄/(Sd/√n), which has a t-distribution with n − 1 degrees of freedom.
Ex.10.7: You send your sales people to a customer service training workshop. Has the training made a
difference in number of complaints? You collect the following data
Sales person Number of complaints
Before After
A 6 4
B 20 6
C 3 2
D 0 0
E 4 0
Conduct the test assuming normal distribution at 1% level of significance.
Solution: We wish to test H0: μx = μy against H1: μx ≠ μy (equivalently, H0: μd = 0 against H1: μd ≠ 0).
For this paired data we first find the differences (After − Before):

Salesperson:  A    B    C   D   E
di:          −2  −14   −1   0  −4

d̄ = (1/5)(−2 − 14 − 1 + 0 − 4) = −4.2,
Sd = √[(1/(n − 1)) Σ (di − d̄)²] = √[(1/4)((−2 + 4.2)² + (−14 + 4.2)² + (−1 + 4.2)² + (0 + 4.2)² + (−4 + 4.2)²)] = 5.67.
Test statistic: Tc = d̄/(Sd/√n) = −4.2/(5.67/√5) = −1.66.
Using the t-distribution with 4 degrees of freedom at the 1% level of significance, the critical values for this two-tailed test are −4.604 and 4.604.
Since the computed value −1.66 lies between the critical values, i.e. −4.604 (lower critical value) < −1.66 < 4.604 (upper critical value), we accept the null hypothesis.
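The paired test can be reproduced with scipy (a sketch; ttest_rel works on the raw before/after columns):

```python
# Sketch for Ex. 10.7: paired t-test on the complaint counts.
from scipy import stats

before = [6, 20, 3, 0, 4]
after = [4, 6, 2, 0, 0]
t_stat, p_value = stats.ttest_rel(after, before)   # differences d = after - before
print(t_stat, p_value)           # t about -1.66, p > 0.01 -> accept H0
```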
HYPOTHESIS TESTING FOR DIFFERENCE OF TWO PROPORTIONS
The test statistic is
Zc = (p1 − p2)/√(p̂(1 − p̂)(1/n1 + 1/n2)), where p̂ = (n1 p1 + n2 p2)/(n1 + n2)
is the pooled proportion and n1, n2 are the sample sizes. Zc has the N(0, 1) distribution provided both sample sizes are large enough (n1, n2 > 30).
Ex. 10.8: In a random sample of 400 men and 600 women who were asked whether they would like to have a flyover near their residence, 200 men and 325 women were in favour of the proposal. Is the opinion of men and women the same, in general, on this issue?
Solution: We wish to test H0: P1 = P2 against H1: P1 ≠ P2.
p1 = 200/400 = 0.5, p2 = 325/600 = 0.5417, p̂ = (200 + 325)/(400 + 600) = 0.525.
Zc = (0.5 − 0.5417)/√(0.525 × 0.475 × (1/400 + 1/600)) = −1.293.
For this two-tailed test at the 5% level of significance, using the N(0, 1) distribution the critical values are −1.96 and 1.96. Since −1.96 (lower critical value) < computed value −1.293 < 1.96 (upper critical value), we accept the null hypothesis. Thus we cannot say that men and women differ in opinion.
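A Python sketch of the two-proportion z-test (scipy assumed only for the critical value):

```python
# Sketch for Ex. 10.8: two-sample proportion z-test with a pooled proportion.
from math import sqrt
from scipy import stats

n1, x1 = 400, 200                # men in favour
n2, x2 = 600, 325                # women in favour
p1, p2 = x1 / n1, x2 / n2
p = (x1 + x2) / (n1 + n2)        # pooled proportion, 0.525
z = (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
crit = stats.norm.ppf(0.975)
print(z, abs(z) < crit)          # about -1.29, inside (-1.96, 1.96) -> accept H0
```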
HYPOTHESIS TESTING FOR DIFFERENCE OF TWO VARIANCES
To compare two population variances using sample variances, we use Snedecor's F-distribution. The test statistic is Fc = Sx²/Sy², the ratio of the two sample variances.
When both populations are normal, the distribution of Fc is Snedecor's F-distribution with (n − 1, m − 1) degrees of freedom, where n, m are the sample sizes.
Ex. 10.9 NYSE NASDAQ
No. of stocks 21 25
Sample mean 3.27 2.53
Sample std. dev 1.3 1.16
Is there a difference in the variances (or risks) between the NYSE and NASDAQ?
Solution: We wish to test, at the 5% significance level,
H0: σx² = σy² against H1: σx² ≠ σy².
Fc = Sx²/Sy² = 1.3²/1.16² = 1.69/1.3456 = 1.256, with (20, 24) degrees of freedom. Comparing with the two-tailed critical values F(0.975; 20, 24) and F(0.025; 20, 24) from the F-table, the computed value falls in the acceptance region, so we do not reject H0: there is no evidence of a difference in risk.
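A sketch computing the F statistic and exact two-tailed critical values with scipy (so no F-table interpolation is needed):

```python
# Sketch for Ex. 10.9: two-tailed F-test for equality of two variances.
from scipy import stats

n, sx = 21, 1.30                          # NYSE
m, sy = 25, 1.16                          # NASDAQ
f = sx**2 / sy**2                         # about 1.256
lo = stats.f.ppf(0.025, n - 1, m - 1)     # lower critical value
hi = stats.f.ppf(0.975, n - 1, m - 1)     # upper critical value
print(f, lo, hi, lo < f < hi)             # inside the acceptance region -> do not reject H0
```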
CORRELATION
Correlation coefficient: r = Σ (xi − x̄)(yi − ȳ)/√(Σ (xi − x̄)² Σ (yi − ȳ)²) = Cov(x, y)/(σx σy).
The coefficient of correlation is not affected by a change in scale or a change in location, so it can be used to compare the relationships between two pairs of variables. It is a unit-free measure of the relationship between two variables and takes values in [−1, 1]. When r is close to 1 there is a strong positive relationship; when r is close to −1 there is a strong negative relationship. Like covariance, it measures only linear relationship, and the correlation coefficient is zero exactly when the covariance is zero.
Ex.: Find the correlation coefficient between x: 1, 3, 5, 11 and y: 10, 15, 25, 30.
Solution: n = 4, x̄ = 5, ȳ = 20.
Sx² = (1/(n − 1)) Σ (xi − x̄)² = (1/3)[(1 − 5)² + (3 − 5)² + (5 − 5)² + (11 − 5)²] = 18.67, so σx = 4.32.
Sy² = (1/(n − 1)) Σ (yi − ȳ)² = (1/3)[(10 − 20)² + (15 − 20)² + (25 − 20)² + (30 − 20)²] = 83.33, so σy = 9.128.
Cov(x, y) = (1/(n − 1)) Σ (xi − x̄)(yi − ȳ) = (1/3)[40 + 10 + 0 + 60] = 36.67.
r = Cov(x, y)/(σx σy) = 36.67/(4.32 × 9.128) = 0.92977.
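The same coefficient via numpy (a one-line check, not part of the notes):

```python
# Sketch: the correlation coefficient of the example data via numpy.
import numpy as np

x = np.array([1, 3, 5, 11])
y = np.array([10, 15, 25, 30])
print(np.corrcoef(x, y)[0, 1])   # about 0.9298
```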
REGRESSION
After finding a correlation between two variables, we would like to determine a mathematical relationship between them so that we can
(i) predict the value of one variable based on the value of the other variable, and
(ii) explain the impact of changes in the value of one variable on the value of the other.
We regress the dependent variable on the independent variable. The dependent variable is called regressed
(or study) variable, the independent variable is called regressor variable. For example rainfall is independent
variable which affects the yield of a certain crop. Yield becomes dependent variable.
A linear regression model between a single study variable and a single explanatory variable is called a simple linear regression model. The simplest relationship between X and Y is a linear one, Y = a + bX; the exact relationship is Y = a + bX + error.
This error is the difference between the observed value and the predicted value of Y. Using the collected observations (x1, y1), (x2, y2), …, (xn, yn), these errors or residuals can be written as (yi − a − bxi) for i = 1, 2, …, n.
We want to have such values of a and b for which these errors are minimum.
In the least squares method we minimize the sum of squared residuals Σ (yi − a − bxi)². For this we differentiate with respect to a and b separately and equate the derivatives to zero. Solving those equations we get the following estimates of a and b:
b̂ = Σ (xi − x̄)(yi − ȳ)/Σ (xi − x̄)² and â = ȳ − b̂x̄.
Ex. 14.1: A sample of fifteen 10-year-old children was taken. The number of pounds each child was overweight was recorded, along with the number of hours of TV viewing per week. These data are listed below. Fit the regression line and describe what the coefficients tell about the relation between the two variables.
TV:         42 34 25 35 37 38 31 33 19 29 38 28 29 36 18
Overweight: 18  6  0 −1 13 14  7  7 −9  8  8  5  3 14 −7
Solution: We prepare the following table.
TV (xi)  Overweight (yi)  (xi − x̄)  (yi − ȳ)  (xi − x̄)²  (xi − x̄)(yi − ȳ)
42 18 10.5333 12.2667 110.9511 129.2089
34 6 2.5333 0.2667 6.4178 0.6756
25 0 -6.4667 -5.7333 41.8178 37.0756
35 -1 3.5333 -6.7333 12.4844 -23.7911
37 13 5.5333 7.2667 30.6178 40.2089
38 14 6.5333 8.2667 42.6844 54.0089
31 7 -0.4667 1.2667 0.2178 -0.5911
33 7 1.5333 1.2667 2.3511 1.9422
19 -9 -12.4667 -14.7333 155.4178 183.6756
29 8 -2.4667 2.2667 6.0844 -5.5911
38 8 6.5333 2.2667 42.6844 14.8089
28 5 -3.4667 -0.7333 12.0178 2.5422
29 3 -2.4667 -2.7333 6.0844 6.7422
36 14 4.5333 8.2667 20.5511 37.4756
18 -7 -13.4667 -12.7333 181.3511 171.4756
Totals: Σ (xi − x̄)² = 671.73 and Σ (xi − x̄)(yi − ȳ) = 649.87, with x̄ = 31.4667 and ȳ = 5.7333.
Hence b̂ = 649.87/671.73 = 0.9675 and â = ȳ − b̂x̄ = 5.7333 − 0.9675 × 31.4667 = −24.71, so the fitted line is ŷ = −24.71 + 0.9675x. The slope says each extra hour of weekly TV viewing is associated with roughly 0.97 additional pounds overweight; the intercept is the extrapolated value at zero viewing hours.
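The fit can be verified with numpy (a sketch reproducing b̂ and â from the raw data):

```python
# Sketch for Ex. 14.1: least squares fit of overweight (y) on TV hours (x).
import numpy as np

x = np.array([42, 34, 25, 35, 37, 38, 31, 33, 19, 29, 38, 28, 29, 36, 18])
y = np.array([18, 6, 0, -1, 13, 14, 7, 7, -9, 8, 8, 5, 3, 14, -7])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
print(a, b)                      # about -24.71 and 0.9675
```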
(a) Informal experimental designs:
i. Before-and-after without control design
ii. After-only with control design
iii. Before-and-after with control design
(b) Formal experimental designs:
i. Completely randomized design (C.R. Design)
ii. Randomized block design (R.B. Design)
iii. Latin square design (L.S. Design)
iv. Factorial Designs
(a) (i). Before-and-after without control design
A single test group or area is selected and the dependent variable is measured before the introduction of the treatment. The dependent variable is measured again after the treatment has been introduced.
The effect of the treatment = (Y) − (X), where
Y = level of the phenomenon after the treatment, and
X = level of the phenomenon before the treatment.
The difficulty of such a design is that, with the passage of time, considerable extraneous variation may creep into the measured treatment effect.
(ii). After- only with control design: In this design two groups or areas (test area & control area) are
selected & treatment is introduced into test area only. The dependent variable is then measured in
both the areas at the same time.
Treatment impact is assessed by subtracting the value of the dependent variable in the control area
from its value in the test area. We assume that the two areas are identical w.r. to their behavior
towards the phenomenon considered.
(iii). Before and- After with control design
In this design two areas are selected & the dependent variable is measured in both areas for an
identical time period before the treatment. The treatment is then introduced into test area only &
dependent variable is measured in both for an identical time-period after introduction of the
treatment. The treatment effect is determined by subtracting change in dependent variables in the
control area from change in the dependent variable in the test area.
It is superior to above two designs because it avoids extraneous variation resulting both from the
passage of time and from non-comparability of the test & control areas.
COMPLETELY RANDOMIZED DESIGN
The essential characteristic of this design is that subjects are randomly assigned to experimental treatments (or vice versa). One-way ANOVA is used to analyze such a design. It provides the maximum number of degrees of freedom to the error. There are two forms:
(i) Two-group Simple randomized experimental design:
First the population is defined, and then from the population a sample is selected randomly. Items are randomly assigned to the experimental and control groups. It then becomes possible to draw conclusions on the basis of samples that are applicable to the population.
This design of expt. is common in research studies of behavioral sciences. The merit of such a
design is that it is simple & randomizes the differences among the sample items.
Limitation is that individual differences among those conducting the treatments are not eliminated.
(ii) RANDOM REPLICATION DESIGN
There are two populations in the replication design: the population under study and the population of those available to conduct the experiment. A sample is taken randomly from the first population and randomly assigned to four experimental and four control groups. Similarly, a sample is taken randomly from the population available to conduct the experiments, and the eight individuals so selected are randomly assigned to the eight groups.
Variables relating to both population characteristics are assumed to be randomly distributed among the two
groups.
RANDOMIZED BLOCK DESIGN (R.B. DESIGN):
In an R.B. design the principle of local control can be applied along with the other two principles of experimental design. The subjects are first divided into groups known as blocks, such that within each block the subjects are relatively homogeneous in respect of some selected variable. The number of subjects in a given block equals the number of treatments, and one subject in each block is randomly assigned to each treatment. The main feature of the R.B. design is that each treatment appears the same number of times in each block.
For example, suppose four different forms of a test were given to each of five students and the following scores were obtained:

        A (very low IQ)  B (low IQ)  C (avg. IQ)  D (high IQ)  E (very high IQ)
Form 1        82             67          57           71             73
Form 2        90             68          54           70             81
Form 3        86             73          51           69             84
Form 4        93             77          60           65             71
If each student separately randomizes the order in which he takes the four forms (using random numbers or otherwise), we have an R.B. design. The purpose of this randomization is to control extraneous factors such as practice effects from repeated testing.
LATIN SQUARE DESIGN (L.S. DESIGN)
This design is often used in agricultural research. An L.S. design is used when there are two major extraneous factors, such as varying seed varieties and varying soil fertility. The treatments in an L.S. design are allocated among the plots so that no treatment occurs more than once in any one row or any one column.
                  Fertility level
                  I    II   III  IV   V
Seed          X1  A    B    C    D    E
differences   X2  B    C    D    E    A
              X3  C    D    E    A    B
              X4  D    E    A    B    C
              X5  E    A    B    C    D
The field is divided into as many blocks as there are varieties of fertilizers, and then each block is divided into as many parts as there are varieties of fertilizers, such that each fertilizer variety is used in each block only once.
The merit of this design is that it enables differences in fertility gradients in the field to be eliminated when comparing the effects of the different varieties of fertilizers on the yield of the crop.
In an L.S. design we must assume that there is no interaction between treatments and blocking factors. A limitation is that it requires an equal number of rows, columns and treatments, which reduces the utility of this design: if there are, say, 10 treatments, the rows and columns are unlikely to remain homogeneous (as the size of the square becomes large). A small sketch for generating such a square is given below.
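As referenced above, a cyclic construction is the simplest way to generate such a square; the sketch below (plain Python, illustrative only, not from the notes) prints the 5 × 5 layout shown earlier:

```python
# Illustrative sketch: a cyclic n x n Latin square, like the 5 x 5 fertilizer
# layout above -- each treatment appears exactly once in every row and column.
def latin_square(treatments):
    n = len(treatments)
    return [[treatments[(i + j) % n] for j in range(n)] for i in range(n)]

for row in latin_square(["A", "B", "C", "D", "E"]):
    print(" ".join(row))
```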
FACTORIAL DESIGN
Factorial designs are used in experiments where the effects of varying more than one factor are to be determined. They are of two types: simple factorial designs and complex factorial designs.
(i) Simple factorial design: We consider the effects of varying two factors on the dependent variable in a simple factorial design. (When an experiment is done with more than two factors, a complex factorial design is used.) In this design the extraneous variable to be controlled by homogeneity is called the control variable, and the independent variable which is manipulated is called the experimental variable.
There are two treatments of the experimental variable and two levels of the control variable, giving four cells into which the sample is divided. Each of the four combinations provides one treatment or experimental condition. Subjects are assigned at random to each treatment as in a randomized group design.
The means of the different cells represent the mean scores for the dependent variable. The column means are the main effects for treatments, neglecting any differential effect due to the control variable; the row means are the main effects for levels without regard to treatment.
(2 × 2 layout: rows = levels of the control variable, columns = treatments of the experimental variable.)
The means for columns provide the researcher with an estimate of main effects for treatments and
means for rows provide an estimate of main effects for the levels. Such a design enables the
researcher to determine interaction between treatments & levels.
(ii). Complex factorial designs: A design which considers three or more independent variables simultaneously is called a complex factorial design.
For three factors, with one experimental variable having two treatments and two control variables each having two levels, the design used is called a 2 × 2 × 2 complex factorial design.

                              Experimental Variable
                       Treatment A            Treatment B
                    CV 2       CV 2        CV 2       CV 2
                    Level I    Level II    Level I    Level II
Control   Level I   Cell 1     Cell 3      Cell 5     Cell 7
Var. 1    Level II  Cell 2     Cell 4      Cell 6     Cell 8
To determine the main effects of the experimental variable, the researcher must compare the combined mean of the data in cells 1, 2, 3 and 4 (treatment A) with the combined mean of the data in cells 5, 6, 7 and 8 (treatment B). In this way the main effect of the experimental variable, independent of control variables 1 and 2, is obtained. Similarly, the main effect of control variable 1, independent of the experimental variable and control variable 2, is obtained by comparing the combined mean of the data in cells 1, 3, 5 and 7 with the combined mean of the data in cells 2, 4, 6 and 8 of the 2 × 2 × 2 factorial design.
To get a first-order interaction, say EV × CV1 (experimental variable with control variable 1), the researcher must ignore control variable 2, for which purpose he may develop a 2 × 2 design from the 2 × 2 × 2 design by combining the data of the relevant cells of the latter design.
Complex factorial designs can be generalized to any number and combination of experimental and control independent variables. The advantage of factorial designs is that they provide equivalent accuracy with less labour and cost, and they permit various other comparisons of interest.
-------*******-------