
Biometry Handout 2022

1. CHAPTER ONE: INTRODUCTION TO EXPERIMENTS AND RESEARCH


There are several definitions of the term experiment. It can be defined as a planned inquiry to
obtain new facts or to confirm or deny previous findings. An experiment always starts with an
idea, which has to be formulated into objectives. Objectives are formulated into hypotheses to
allow statistical analysis that will support any conclusion drawn.
An experiment is an important tool for research and should have the following important
characteristics: simplicity, high precision in measuring differences, absence of systematic
error, a wide range of validity for its conclusions, and a calculable degree of uncertainty. The
results of experiments are used for decision making, such as recommending a new
product or technology, provided that the new product is economically more useful than the old
version. The new technology (research result) varies with the field of study. In
agriculture, it could be a new input (variety, fertilizer rate/type, pesticide, herbicide, media, feed,
vaccination, tree, etc.), an agronomic practice or post-harvest management. In education, it could
be a teaching method or evaluation technique; in other fields, protocol synthesis, optimization of
temperatures, evaluation of the effectiveness of some medication, and so on.
The core concept underlying all research is its methodology. The research tools are key
elements of the methodology that lead to the ultimate goal of research: to discover something
previously unknown. The physical tools researchers use to achieve their goals differ
distinctly from discipline to discipline. However, the general tools of research that most
researchers use to derive meaningful conclusions are similar. One of the few important tools of
research common to almost all kinds of research is statistics.
Statistics plays a vital role in virtually all professions, and some familiarity with it is an essential
component of work in any higher learning institution or research organization. We understand that
students and scientific personnel have a good scientific background in their respective
disciplines, but their statistical knowledge ranges from very little to long years of practical
experience. Most researchers have had basic statistical courses but usually face difficulty in
applying them to their own practical situations, especially in designing sophisticated research work.
Hence, this module has been prepared to provide researchers with the powerful ideas of modern
statistical methods.
Statistical concepts have also been explained as an essential component during the planning
stage of an experiment when decisions must be made as to the mode and extent of the sampling
process. This is done because a good design for the process of data collection permits efficient
inferences to be made, with straightforward analysis.
This module provides basic principles and advanced procedures on the relative importance of
components of variation, the methods to estimate them and the means to control them, in relation to
specific experimental designs. The principles and statistical methodologies illustrated in this
module indicate the type and amount of data to be collected, the method of organizing the data,
the data analysis, and the proper interpretation of the results obtained. It also provides means to
draw conclusions and assess their strength.
Dear Students, what is statistics?
A statistical expression that says, ‘we calculate statistics from statistics by statistics’ is a good
example to understand the different uses of statistics. It is necessary to know the context within
which these three uses of the term are operating. ‘We calculate statistics…’ what it means here
is the results of computations: the means, variances, graphs, tables, correlation coefficients,
regression lines and so on. ‘…from statistics…’ the term statistics here refers to the raw data to
be analyzed. ‘…by statistics.’ The last use of the term is the process or method that we use to
get the desired results, and that is the focus of this module. Therefore, biometry is the
application of statistical methods to the solution of biological problems: the principles and
practice of statistics in biological research, or statistics applied to biological problems.
Hence it is also called biostatistics.
1.1. Planning of experiments
Planning is the most important aspect of any experiment. What separates competent
investigators from incompetent ones, in terms of statistical skill, is careful planning.
The idea of design should come at the planning phase of the experiment; by the time one is
ready to start the statistical part of the project, the statistical work is in fact largely
finished. If the planning has been done carefully, forming a clear idea of what is to be
investigated, following the layout of an appropriate design and conducting the experiment
accordingly, the analysis and interpretation will be easier.
• Idea Formulation
Experiments are based on some idea that is born in our mind from different sources. These
ideas need to be formulated in the form of a general objective and/or a series of specific
objectives. For example, the idea may be about the low production and productivity of the
tomato crop from the local varieties available in the Gondar area.
• Defining objectives
2
Biometry Handout 2022
Research objectives should be specific, measurable, achievable, realistic, and time-bounded (the
so-called "SMART" objectives).
Example: 'improve our ability to predict nitrogen fertilizer requirements'
How do we measure "ability"? This is an immeasurable objective.
• Keep objectives short and simple, and avoid unnecessary jargon
• Do not let objectives stray from the original concept/idea
• Defining the population
In any experiment, addressing each and every individual of a population is difficult, if not
absolutely impossible; collecting data from the entire population is usually unthinkable.
Rather, sampling is done on a part of the population, called a sample, which is assumed to
represent the entire population. The population of interest needs to be clearly defined before
embarking on the experiment.
Example: A milk yield experiment to compare the benefits of three feed rations for Jersey cattle.
This experiment does not consider the benefits to breeds other than Jersey; it should include
all ages of lactating cows, and an adequate number of cows should be taken for data collection.
• Formulating the hypothesis
A hypothesis is a tentative explanation for a phenomenon. It is used as a basis for further
investigation. It is formulated based on known facts and is yet to be proved or disproved. It is
the restatement, in statistical terms, of the objective of the experiment. It is an assumption to be
accepted or rejected. There are two types of statistical hypothesis for each situation: the null
hypothesis and the alternative hypothesis.
The null hypothesis (denoted by H0) states that the statistic of interest takes a particular
value, always along the lines of 'there is no difference between treatments' or 'there is no
relationship between the measurements'. Whether the null hypothesis is rejected depends on
the probability level (p value). The p value given by the test tells us the probability
of getting such a result if the null hypothesis is true. A high p value indicates that the null
hypothesis could easily be true and, as a result, we should not conclude that there is a
difference. A low p value indicates that there is a difference.
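As a minimal sketch of this decision rule (the 0.05 threshold and the function name are illustrative choices, not part of the handout):

```python
ALPHA = 0.05  # conventional significance level (illustrative choice)

def decide(p_value, alpha=ALPHA):
    """Decide about H0 from the p value returned by a statistical test."""
    if p_value <= alpha:
        return "reject H0"        # low p value: evidence of a difference
    return "do not reject H0"     # high p value: H0 could easily be true

# decide(0.03) -> "reject H0"; decide(0.40) -> "do not reject H0"
```

Note that "do not reject H0" is not proof that H0 is true; it only says the data provide no evidence against it.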
The alternative hypothesis (denoted by H1 or Ha) states that the value of the statistic of interest
deviates from the value specified in the null hypothesis; at least one comparison differs
from the others. It usually takes a form like 'there is a difference between treatments'. The
two hypotheses cover all eventualities. Every objective has one null and one alternative
hypothesis.
3
Biometry Handout 2022
A statistical test uses the data obtained from a sample to make a decision about whether or not
the null hypothesis should be rejected. The numerical value obtained from a statistical test is
called test value.
Example: Consider the hypothesis "there is no difference in the mean yield of sorghum hybrids
A, B, and C". In this hypothesis, the variable is the yield, the comparison is between sorghum
hybrids, and the statistic used is the mean.
• Testing the hypothesis
A hypothesis needs to be tested, by gathering evidence for or against it, so that it can be
accepted or rejected. How? Let us say that a new variety of tomato gives the same yield as the
existing variety (A = B). This is a hypothesis, and it can be tested easily. Such a hypothesis is
called a null hypothesis. At the end of the experiment, if H0 is accepted then H1 is rejected,
and vice versa.
H0 is the working hypothesis, called the null hypothesis because it nullifies the original
hypothesis, H1. H0 says the effect of all treatments is the same. It is the only hypothesis
tested by the researcher. The statistical procedure by which we decide to accept or reject a
null hypothesis is called testing of hypothesis.
In deciding to accept or reject H0, the researcher may make an error, called a decision error:
rejecting a true H0 or accepting a false H0. There are two types of decision error,
Type I and Type II.
A Type I error occurs if one rejects the null hypothesis when it is true (accepting H1 when it
is false).
A Type II error occurs if one does not reject the null hypothesis when it is false (rejecting H1
when it is true).
Power of a test is the probability of accepting H1 when it is true:
Power = 1 - Pr(type II error). Therefore, the objective of statistical tests should be to
try to maximize power.
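The two error rates and the power can be illustrated with a small simulation. The sketch below (all numbers hypothetical: a one-tailed z test of H0: mean = 0 with known standard deviation) repeats a simulated experiment many times; under a true H0 the rejection rate estimates the Type I error, and under a true H1 it estimates the power:

```python
import random
from statistics import NormalDist, mean

random.seed(42)  # fixed seed so the simulation is reproducible
ALPHA = 0.05
crit = NormalDist().inv_cdf(1 - ALPHA)  # one-tailed critical z value

def rejects_h0(true_mean, n=25, sigma=10.0):
    """One simulated experiment testing H0: mu = 0 against H1: mu > 0."""
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    z = mean(sample) / (sigma / n ** 0.5)
    return z > crit

trials = 5000
# H0 true (mean 0): the rejection rate estimates the Type I error (about ALPHA)
type_i = sum(rejects_h0(0.0) for _ in range(trials)) / trials
# H1 true (mean 5): the rejection rate estimates the power
power = sum(rejects_h0(5.0) for _ in range(trials)) / trials
# 1 - power then estimates the Type II error rate
```

With these particular settings the theoretical power is roughly 0.8, while the Type I error stays near the chosen 5% level.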
The level of significance is the maximum probability of committing type I error. This
probability is symbolized by α (Greek letter alpha). That is, P (type I error) = α
Choice of Significance Levels
Basically, one can calculate an exact probability level at which a particular difference is
significant, but in this way there could be a hundred significance levels and it may not be
possible to standardize results. Therefore, statisticians and subject matter experimenters agreed

to set a threshold value of probability below which significance can be declared. This value was
chosen to be 5%, though other probability points below or above 5% are also in use.
The critical value separates the critical region from the non-critical region.
The critical or rejection region is the range of values of the test value that indicates that there
is a significant difference and that the null hypothesis should be rejected.
The non-critical or non-rejection region is the range of values of the test value that indicates
that the difference was probably due to chance and that the null hypothesis should not be
rejected.
A one-tailed test indicates that the null hypothesis should be rejected when the test value is in
the critical region on one side of the mean. It is said to be right-tailed or left-tailed, depending
on the direction of the inequality in the alternative hypothesis.
In a two-tailed test, the null hypothesis should be rejected when the test value is in either of the
two critical regions.
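The critical values that bound these regions can be computed from the standard normal distribution, assuming a z test; this sketch uses Python's standard library:

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# Right-tailed test: the whole critical region (alpha) is in the upper tail
right_tailed_crit = std_normal.inv_cdf(1 - alpha)    # about 1.645

# Two-tailed test: alpha is split between the two critical regions
two_tailed_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.960
```

A test value beyond the critical value falls in the rejection region, so H0 is rejected; note the two-tailed cut-off is farther from the mean because each tail holds only alpha/2.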
1.2. Execution of the experiment
During the execution of an experiment it is very important to use procedures that are free from
personal bias. It is also important to collect accurate and precise data so that variation
associated with the order of collection can be removed from the experimental error.
Experimental design is a method of arranging test materials into experimental units and
setting procedures for how measurements are made on specified character(s) of the test material.
Alternatively, an experimental design is a plan for assigning experimental units to treatment
levels, together with the statistical analysis associated with the plan. The design of an
experiment involves a number of inter-related activities.
1. Formulation of statistical hypotheses that is relevant to the scientific hypothesis. A statistical
hypothesis is a statement about: (a) one or more parameters of a population or (b) the functional
form of a population. Statistical hypotheses are rarely identical to scientific hypotheses—they
are testable formulations of scientific hypotheses.
2. Determination of the treatment levels (independent variable) to be manipulated, the
measurement to be recorded (dependent variable), and the extraneous conditions that must be
controlled.
3. Specification of the number of experimental units required and the population from which
they will be sampled.
4. Specification of the randomization procedure for assigning the experimental units to the
treatment levels.
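Activity 4, the randomization procedure, can be sketched in a few lines. In this illustration (treatment names, replication count and unit labels are all hypothetical) every experimental unit is equally likely to receive any treatment:

```python
import random

random.seed(1)  # fix the seed so the layout is reproducible

treatments = ["A", "B", "C"]  # hypothetical treatment levels
reps = 4                      # each treatment assigned to 4 units

# One label per experimental unit, then shuffled so that every unit is
# equally likely to be assigned to any treatment
labels = treatments * reps
random.shuffle(labels)
layout = {f"unit {i + 1}": t for i, t in enumerate(labels)}
```

Shuffling a fixed multiset of labels, rather than drawing each label independently, guarantees that every treatment ends up with exactly the planned number of replications.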

In summary, an experimental design identifies the independent, dependent, and nuisance
variables and indicates the way in which the randomization and statistical aspects of an
experiment are to be carried out. The primary goal of an experimental design is to
establish a causal connection between the independent and dependent variables. A
secondary goal is to extract the maximum amount of information with the minimum
expenditure of resources.
Some important terms
Experiment - is a systematic and planned enquiry to make a new discovery or to confirm
or deny the results of previous experiments.
Design of experiments is the choice of treatments, assigning treatments to experimental units
and arrangement of experimental units.
Factorial experiment - is an experiment in which the treatments consist of all possible
combinations of the selected levels of two or more factors.
Experimental unit - is the material to which the treatment is applied and on which measurements
are taken (may be field plots, plants, leaves, insects, animals, Petri dishes, pots, DNA kits, etc.).
Treatments - are the specific experimental conditions applied to the experimental units
(specific values, levels); a treatment can be a standard ration, an inoculation, a dose, or a
spraying rate/schedule. In other words, treatments are the things being compared by applying
them to the experimental units.
Factor - is a particular kind of treatment; it can be fertilizer, variety, insecticide, fungicide,
vaccine, hormone, etc.
Why biometry and experimental design?
• To understand the design of experiments and the types of experiment
• To understand the use of randomization and replication in experiments
• To develop methods for the collection and analysis of data that ensure reliable conclusions
and maximize output
• To choose an appropriate design and analysis method for different kinds of research; for
example, methods used in animal research will differ from those used in plant research
• To understand the relationship of different factors in an experiment and maintain
uniformity of all environmental factors that are not part of the treatments being
evaluated
Hence, at the end of this course, your statistical background should enable you to correctly
choose the statistical technique most appropriate for most biological experiments.
Biological research must be designed precisely to solve identified problems, which are
expressed as a statement of hypothesis.
• A hypothesis has to be proved or disproved through experimentation.
• Hypotheses are suggested by past experience, observations and, at times, by theoretical
considerations.
Examples of hypotheses:
• New hybrids are not widely grown because they are susceptible to downy mildew compared
to native varieties.
• Rice crop removes more nitrogen from the soil than is naturally replenished.
After a hypothesis is framed, the next step is to design an experimental procedure for its
verification, which consists of four phases:
1. Selecting appropriate materials to test
2. Specifying the characters to measure
3. Selecting procedures to measure those characters
4. Specifying the statistical procedure to test the hypothesis
Basic Principles of Experimental Design
The main purpose of designing an experiment is to:
• Increase precision
• Reduce experimental error
Increased precision and reduced experimental error are achieved by three
essential components of experimental design:
1. Estimate of error
2. Control of error
3. Proper interpretation of results
1. Estimate of error
Every experiment must be designed to have a measure of the experimental error. The error is
the primary basis for deciding whether an observed difference is real or due to chance. The
difference among experimental plots treated alike is called experimental error.
Example 1: A plant breeder wants to compare the yield of a new maize variety A with a
standard variety B. He/she lays out two plots of equal size, side by side, and sows one to
variety A and the other to variety B. After harvesting the grain from each plot, he/she judges
the variety with the higher yield to be the better one. What is his/her fault?

1. He/she assumed that any difference between the yields of the two plots is caused by the
varieties and nothing else. This certainly is not true. Even if the same variety were planted
on both plots, the yield would differ. Other factors such as soil fertility, moisture and
damage by insects, diseases and birds also affect maize yields.
2. As yields are affected by other factors, a satisfactory evaluation of the two varieties must
involve a procedure that can separate varietal difference from other sources of variation;
i.e., the plant breeder should design an experiment that allows him/her to decide whether
the observed difference is caused by varietal difference or by other factors.
Then what does he/she need to do?
Two maize varieties planted in two adjacent plots will be considered different in their yielding
ability only if the observed yield difference is larger than that expected if both plots were
planted to the same variety. Hence he/she needs to know not only the yield difference between
plots planted to different varieties but also the yield difference between plots planted to the
same variety.
Case 1: Plot 1 = A, Plot 2 = B
Case 2: Plot 1 = A, Plot 2 = A
If the yield of A varies between plots 1 and 2 (case 2), then the yield difference between A and
B (case 1) is not due to variety alone.
Measure of experimental error
A good experimental design includes a means of measuring experimental error. Error is
measured by:
A) Replication
Replication is the appearance of the same treatment more than once, i.e., in two or more plots
treated alike.
Replication is needed:
• To determine the difference among plots treated alike
• To measure experimental error
Experimental error can be measured only if there are at least two plots planted to the same
variety (or receiving the same treatment).

Due to constraints of:
• time
• money
• other resources
usually 3 to 4 replications are recommended. Hence, at least two plots receiving the same
treatment are needed to determine experimental error.
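Once a treatment is replicated, the experimental error can be estimated directly as the variation among plots treated alike. A minimal sketch (the yields below are hypothetical, not from the handout):

```python
import statistics

# Hypothetical yields (t/ha) from four plots all planted to the SAME
# variety, i.e., plots treated alike
plots_treated_alike = [4.2, 4.6, 4.1, 4.5]

# The variation among these plots estimates the experimental error
error_variance = statistics.variance(plots_treated_alike)  # sample variance
error_sd = statistics.stdev(plots_treated_alike)           # its square root
```

With a single plot per treatment this calculation is impossible, which is exactly why at least two plots receiving the same treatment are required.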
R1: A B    R2: A B    R3: A B    (plots laid out along the fertility gradient)
Both varieties A and B in example 1 above are replicated three times.

B) Randomization is the technique of assigning treatments to experimental plots so that a
particular treatment is not consistently favoured or handicapped. The treatments are first
numbered from 1 to k in any order. The units in each block are also numbered conveniently
from 1 to k. The treatments are then allotted at random to the k units in each block. Random
allocation can be made either by consulting a random number table or by other methods.
Randomization:
• Increases the probability of distributing any variation, such as a soil fertility gradient, among
the treatments
• Ensures that each treatment has an equal chance of being assigned to any
experimental plot or experimental site
Getting a valid measure of experimental error thus involves more than simply planting several
plots to the same variety.
• For example, if the area has a unidirectional fertility gradient, so that there is a gradual
reduction of productivity from left to right, variety B would be handicapped if it were
always on the right side of variety A, and hence always in the relatively less fertile area.
The comparison between the yield performance of varieties A and B would then be biased in
favour of A: part of the yield difference between the two varieties would be due to the
difference in fertility levels and not to varietal difference.
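The block-by-block random allotment described above can be sketched as follows (treatment names and block count are hypothetical); each block gets its own independent shuffle of the k treatments:

```python
import random

random.seed(7)  # fixed seed so the randomization is reproducible

treatments = ["T1", "T2", "T3", "T4"]  # k = 4 hypothetical treatments
n_blocks = 3

# Allot the k treatments at random to the k units of each block,
# separately and independently for every block
layout = {}
for b in range(1, n_blocks + 1):
    order = treatments[:]   # copy, so each block is shuffled afresh
    random.shuffle(order)
    layout[f"block {b}"] = order
```

Because every block contains each treatment exactly once in a random order, no treatment can be consistently favoured by a position effect such as a fertility gradient.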


2. Control of error
The ability to detect differences among treatments increases if the size of the experimental error
decreases. Hence a good experimenter should minimize the experimental error.
There are three commonly used techniques for minimizing experimental error in
agricultural research:
A. Blocking B. Proper plot technique C. Data analysis
A. Blocking
The concept of blocking in experiments is the equivalent of stratifying in surveys. It is a
mechanism in which similar experimental units are put together in the same group (block), and
all treatments are then assigned within each block separately and independently. Thus variation
among blocks can be measured and removed from the experimental error.
Blocking:
• Reduces experimental error where substantial variation within the experimental field is
expected.
• Reduces experimental error by eliminating the contribution of known sources of variation
among experimental units.
Considerations needed during blocking
a) Variability within each block should be minimized and that between blocks should be
maximized, since only variation within blocks becomes part of the experimental error.
b) The experimental area should have a predictable pattern of variability, as blocking is
ineffective against multidirectional variability. Hence knowing the source of variability is the
basis for blocking. An ideal source of variation to use as a basis for blocking is one that is
large and highly predictable. Examples of such variation are:
• Soil heterogeneity, in a fertilizer or variety trial where yield data is the primary character of
interest
• Direction of insect migration, in an insecticide trial where insect infestation is the primary
character of interest
• Shape of the field, in a study of plant response to water stress
c) Block size, shape and orientation must be selected to maximize variability among blocks;
experimental plots within the same block must be kept uniform.
• If the gradient in variability is unidirectional, long and narrow blocks are preferred. The
blocks should be oriented lengthwise, perpendicular to the direction of the gradient

• If the fertility gradient occurs in two directions, with one gradient much stronger than the
other, ignore the weaker gradient and orient the blocks against the strong gradient
• If the variability gradient occurs in two directions of equal strength, perpendicular to each
other, one of these alternatives can be chosen:
  • Square blocks
  • Long and narrow blocks with their length perpendicular to the direction of one of the
gradients, using the covariance technique to take care of the other gradient
  • A Latin square design with two-way blocking, one for each gradient
d) If the variability pattern is not predictable, square blocks should be used

N.B: whenever blocking is used, the identity of the blocks and the purpose for their use must be
consistent throughout the experiment. Variation within blocks must be controlled under all
circumstances.
For example:
• If certain operations, such as weeding or data collection, cannot be completed for the whole
experiment in one day, the task should be completed for all plots of the same block on the
same day and, if possible, under the same conditions.
• If more than one weeder or data collector is involved in the trial, the same person should be
assigned to manage all plots of the same block.
Advantages of blocking
• Increases the precision of the experiment by reducing experimental error
• More efficient than the CRD
• Flexible: any number of replications can be used
• A large number of treatments can be included if large homogeneous units are available
• Statistical analysis is easy
Disadvantages of blocking
• Large block sizes increase experimental error
• May not be suitable for a large number of treatments
B. Proper plot technique
Keeping all factors other than those considered as treatments uniform for all
experimental units can minimize experimental error.

Example: In variety trials where the treatments consist solely of the test varieties, it is required
that all other factors such as soil nutrients, solar energy, plant population, pest incidence are
maintained uniformly for all plots in the experiments.
Though it is difficult to keep all factors uniform across plots, it is essential that the most
important ones be watched closely to ensure that variability among experimental plots is
minimized. This is the primary concern of good plot technique.
For field experiments with crops, the important sources of variability among plots treated alike
are soil heterogeneity, competition effects and mechanical errors.
C. Data Analysis
Data analysis can control experimental error that cannot be controlled adequately by blocking
alone; hence the proper choice of data analysis can help greatly. Covariance analysis is most
commonly used for this purpose. In covariance analysis, one or more covariates are measured.
Covariates are characters whose functional relationship to the character of primary interest is
known. Analysis of covariance can reduce the variability among experimental units by
adjusting their values to a common value of the covariates.
Examples:
1. In an animal feeding trial, the initial weights of the animals differ. Using initial weight
as the covariate, the final weights after the animals are subjected to the various feeds can be
adjusted to the values that would have been attained had all experimental animals started
with the same body weight.
2. In a wheat field experiment where rats damaged some of the test plots, covariance analysis
with rat damage as the covariate can adjust plot yields to the level they would have reached
with no rat damage in any plot.
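The adjustment in example 1 can be sketched numerically. All weights below are hypothetical; the adjustment subtracts, from each final weight, the part explained by the animal's deviation from the mean initial weight, using the regression slope of final on initial weight:

```python
import statistics

# Hypothetical feeding trial: initial weight (covariate x, kg) and
# final weight (response y, kg) for six animals
initial = [20.0, 25.0, 22.0, 30.0, 27.0, 24.0]
final = [38.0, 46.0, 41.0, 55.0, 49.0, 44.0]

x_bar = statistics.mean(initial)
y_bar = statistics.mean(final)

# Slope of the regression of y on x: b = Sxy / Sxx
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(initial, final))
sxx = sum((x - x_bar) ** 2 for x in initial)
b = sxy / sxx

# Adjust each final weight to the value expected had every animal
# started at the mean initial weight
adjusted = [y - b * (x - x_bar) for x, y in zip(initial, final)]
```

The adjusted values have the same mean as the raw final weights, but their variability is much smaller because the part attributable to different starting weights has been removed.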
3. Proper interpretation of results
Interpretation of experimental results should reflect the real conditions for which the
experiment was designed. Keeping conditions uniform is vital for measuring and reducing
experimental error, which is an advantage. However, uniformity limits the applicability and
generalizability of research results to a wide range of conditions. This limitation is troublesome
because most agricultural research is done on experiment stations where average productivity is
higher than on ordinary farms, and the environment surrounding a single experiment can hardly
represent the variation over space and time that is so typical of commercial farms.

Consider a plant breeder comparing two varieties A and B. Management practices such as
fertilization and weed control, the site, and the crop season will greatly affect the relative
performance of the two varieties. Improved varieties are greatly superior to native varieties
when both are grown in a good environment with good management, but improved varieties
may be no better, or even poorer, when both are grown under traditional farmers' practices.
Therefore, proper interpretation of the results is important to avoid experimental bias.
Analysis of Variance (ANOVA)
The concept of variation is so fundamental to scientific experimentation that anyone who comes
in contact with experimental results must appreciate the universality of variation. In field
experiments, the sources of variation and the relative importance of the different causes of
variation are often of interest in themselves. In such experiments, variation tends to obscure
the effects of treatments, making comparison difficult. This in turn can lead to mistakes in
interpreting the results of experiments and to wrong conclusions about the best treatments to
recommend for commercial use.
To handle such sources of variations, earlier scientists introduced the analysis of variance
techniques. The analysis of variance (ANOVA) was introduced by Ronald Fisher and is
essentially an arithmetic process of partitioning a total sum of squares into components
associated with recognized sources of variations. It has been used in all fields of research where
data are measured quantitatively.
ANOVA is a procedure that can be used to analyze the results from both simple and complex
experiments. It is one of the most important statistical techniques available and provides a link
between the design of experiments and data analysis. Assumptions of ANOVA include
randomness (sampling of individuals be at random), independence (the errors are independently
distributed), normality (the errors are randomly, independently and normally distributed), and
homogeneity (the error terms are homogeneous/equal variance). The ANOVA has its origin in
biology, or at least agriculture, since the methods were specifically developed to deal with the
type of variable response that is common in field experiments but currently it has a much wider
application.
The structure of the various component parts differs from design to design. In the words of
Fisher, ''Analysis of variance is a tool by which the total variation may be split up into several
physically assignable components''. It sorts out and estimates the various components and
provides a way for the test of significance. ANOVA is used to test the hypothesis of whether
the means of several normally distributed populations are the same or not.
In your statistics course, methods were introduced to test hypotheses about one or two
population means. The t-test is suitable when there are one or two means; if there are more
than two means, it is not advisable to use the t-test. When several means are compared,
ANOVA is the most powerful test.
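Why not simply run a t-test on every pair of means? A small simulation makes the problem concrete. In this sketch (a simplified setting with known standard deviation, so each pairwise comparison is a two-tailed z test rather than a t test; all numbers are hypothetical), three samples are repeatedly drawn from the same population and all three pairs are compared at the 5% level:

```python
import random
from statistics import NormalDist, mean

random.seed(3)  # fixed seed so the simulation is reproducible
SIGMA, N, ALPHA = 1.0, 20, 0.05
crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-tailed critical z

def any_pair_significant():
    """Three samples from the SAME population, compared pairwise."""
    g = [[random.gauss(0.0, SIGMA) for _ in range(N)] for _ in range(3)]
    se = SIGMA * (2 / N) ** 0.5  # standard error of a difference in means
    pairs = [(0, 1), (0, 2), (1, 2)]
    return any(abs(mean(g[i]) - mean(g[j])) / se > crit for i, j in pairs)

trials = 4000
# Probability of at least one false 'significant' pair: well above 0.05
family_error = sum(any_pair_significant() for _ in range(trials)) / trials
```

Even though every individual comparison is run at the 5% level, the chance of declaring at least one spurious difference is roughly doubled or tripled; a single ANOVA F test of all means at once avoids this inflation.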
ANOVA is a powerful statistical technique developed for analyzing measurements that
depend on several kinds of effects operating simultaneously, to decide which kinds of
effects are important and to estimate them. It allows analysis and interpretation of
observations from several populations. This versatile statistical tool partitions the total
variation in a data set according to the sources of variation that are present.
For two groups, the ANOVA is identical to the t test, so it is fair to say that
ANOVA is an extension of the t test to handle more than two independent groups. The
theoretical basis for performing the ANOVA test is the partitioning of the total variance of
all observations into two sources of variation: variation between the group means and
variation within each of the groups. The sampling distribution used for testing these means is
not the t distribution but the F distribution (named in honor of R.A. Fisher, who
developed the F statistic).
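This partitioning can be carried out by hand. The sketch below (yields are hypothetical, extending the sorghum hybrid example from earlier) splits the total sum of squares into the between-group and within-group parts and forms the F statistic:

```python
import statistics
from itertools import chain

# Hypothetical yields (t/ha) for three sorghum hybrids, four plots each
groups = {
    "A": [4.2, 4.6, 4.1, 4.5],
    "B": [5.1, 5.4, 5.0, 5.3],
    "C": [4.4, 4.8, 4.3, 4.7],
}

all_obs = list(chain.from_iterable(groups.values()))
grand_mean = statistics.mean(all_obs)
k = len(groups)   # number of treatments
N = len(all_obs)  # total number of observations

# Partition the total sum of squares into between- and within-group parts
ss_between = sum(len(v) * (statistics.mean(v) - grand_mean) ** 2
                 for v in groups.values())
ss_within = sum((x - statistics.mean(v)) ** 2
                for v in groups.values() for x in v)

ms_between = ss_between / (k - 1)  # treatment mean square, df = k - 1
ms_within = ss_within / (N - k)    # error mean square, df = N - k
f_stat = ms_between / ms_within    # compared against the F distribution
```

Note that `ss_between + ss_within` reproduces the total sum of squares exactly, which is the partitioning Fisher described; a large F (between-group variation big relative to within-group, i.e. experimental, error) is evidence against the null hypothesis of equal means.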
2. CHAPTER TWO: SINGLE-FACTOR EXPERIMENTS
Experiments in which only a single factor varies while all others are kept constant are called
single-factor experiments. In such experiments, the treatments consist solely of the different
levels of the single variable factor. All other factors are applied uniformly to all plots at a single
prescribed level. For example, most crop variety trials are single-factor experiments in which
the single variable factor is variety and the factor levels (i.e., treatments) are the different
varieties. Only the variety planted differs from one experimental plot to another and all
management factors, such as fertilizer, insect control, and water management, are applied
uniformly to all plots. Other examples of single-factor experiment are:

 Fertilizer trials where several rates of a single fertilizer element are tested.
 Insecticide trials where several insecticides are tested
 Drug test/ vaccines on d/t dos
 Temperature for bacterial growth
 Hormone on d/t concentration
 Plant-population trials where several plant densities are tested.

14
Biometry Handout 2022
2.1. Types of Experimental Designs
Type of experimental designs to be chosen varies depending on:
 Type and level of experimental factors (single-factor, two-factor, three-factor, etc.)
 Conditions in which the experiment conducted (laboratory vs. field)
 Number and situation of known sources of variation in the experimental site (bidirectional
variation is better handled by a Latin square design than by RCBD)
 Number of treatments (large vs. small) and level of precision to be attained (a lattice
design with incomplete blocks is suitable in the case of a large number of treatments)
There are two groups of experimental design that are applicable to a single-factor experiment.
One group is the family of complete block designs, which is suited for experiments with a
small number of treatments and is characterized by blocks, each of which contains at least one
complete set of treatments. The other group is the family of incomplete block designs, which is
suited for experiments with a large number of treatments and is characterized by blocks, each
of which contains only a fraction of the treatments to be tested.

We have three complete block designs (completely randomized, randomized complete block,
and Latin square designs) and two incomplete block designs (lattice and group balanced block
designs). For each design, we illustrate the procedures for randomization, plot layout, and
analysis of variance with actual experiments.
2.1.1. Completely Randomized Design (CRD)
Completely randomized design (CRD) is the simplest and least restrictive experimental design.
In CRD the treatments are assigned to the experimental units without restriction. That is, with
CRD every plot is equally likely to be assigned to any treatment. The CRD is used where the
experimental site is relatively uniform, when some plots are likely to be missing during the
course of the experiment, and when the number of experimental units is limited, so as to get the
maximum degrees of freedom for error.
In this design treatments are assigned completely at random so that each experimental unit has
the same chance of receiving any one treatment. For the CRD, any difference among units
receiving the same treatment is considered as experimental error. It is appropriate only for
experiments with homogeneous units such as laboratory, greenhouse experiments where
environmental effects are relatively easy to control.
Dear students, what are the advantages and disadvantages of CRD?
Randomization and Layout of CRD


There is no restriction on the assignment of treatments to experimental units (plots) in CRD. The
only requirement is that treatments be assigned to plots at random. For t treatments each
replicated r times, there will be n = t × r plots.
Steps for randomization and layout
Assume that a researcher conducted a lab experiment with five treatments A, B, C, D and E,
each replicated four times.
Fig. 1. A sample layout of a CRD with five treatments, each replicated four times
1. Determine the total number of experimental plots, n, as the product of the number of
treatments (t) and the number of replications (r).
2. Assign a plot number to each experimental plot in any convenient manner, usually
consecutively from 1 to n (here 1 to 20):

 1A   2C   3B   4D   5E
10B   9E   8A   7C   6D
11C  12D  13E  14B  15A
20E  19B  18D  17A  16C
3. Treatments are assigned to the experimental plots using either random numbers, drawing
cards (not applicable when the number of plots exceeds 52), or coin tossing.
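The randomization steps above can be sketched as follows; the treatment labels, replication count and the fixed random seed are illustrative assumptions:

```python
import random

# Sketch of CRD randomization: t treatments, each replicated r times,
# assigned completely at random to n = t * r plots.
treatments = ["A", "B", "C", "D", "E"]   # t = 5 (labels are illustrative)
r = 4                                     # replications

plots = treatments * r                    # n = 20 plot assignments
random.seed(1)                            # fixed seed so the layout is repeatable
random.shuffle(plots)                     # completely random assignment

for plot_no, trt in enumerate(plots, start=1):
    print(f"plot {plot_no:2d}: treatment {trt}")
```

Because the shuffle is unrestricted, every plot is equally likely to receive any treatment, which is exactly the defining property of the CRD.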
Analysis of variance (ANOVA)
The analysis can be started by calculating the total amount of variability in the data,
irrespective of the treatments. The best way to do this is to calculate a sum of squares (SS) as a
measure of variability. There are two sources of variation among the n observations obtained
from a CRD trial: treatment variation and experimental error. The relative size of the
two is used to indicate whether the observed difference among treatments is real or is due to
chance. The difference is real if the treatment variation is sufficiently larger than the
experimental error.
The total variability, measured by the Total SS, can be partitioned into two components: the
explained and the unexplained variability. The explained variability is that represented by

differences between the treatments. This variability is explained because there is a potential
explanation for it, i.e. the treatments applied to the experimental units were not the same. The
unexplained variability is that within the treatments, i.e. differences between replicates. This
variability is not caused by the experimental treatments. Hence, it is often called experimental
error. Thus, the following important relationship holds:
Total SS = Explained SS + Unexplained SS
Since the values of two elements in the summation are known, the third value can be obtained
by subtraction. If Total SS = Explained SS + Unexplained SS then unexplained SS = Total SS -
explained SS.

The model is indicated as Yij = μ + Ti + Eij, where Yij = the jth observation of the ith
treatment, μ = the overall mean, Ti = the ith treatment effect (μi − μ) and Eij = the random
error of the jth observation of the ith treatment; i = 1…t, j = 1…r.
The following assumption must be made concerning the random error in order to use the one-
way ANOVA technique:
a. Each treatment population has the same variance, that is, the random errors have the same
variance in experiments designed to compare treatment means,
b. The error terms are statistically independent, that is, the magnitude of the error in one
observation is not influenced by the magnitude of the error in another observation,
c. The errors are normally distributed.
A major advantage of CRD is the simplicity of the computation of its analysis of variance
especially when the number of replications is not uniform for all treatments.
ANOVA with equal replication
Steps:
a) Group the data by treatments and calculate the treatment totals (T) and grand total (G).

Calculate also the correction factor as C.F. = G²/n, where n = rt is the total number of observations.

b) Construct the outline of the analysis of variance table as:

Source of variation  Degree of freedom  Sum of squares  Mean square  Computed F  Tabulated F (5%, 1%)
Treatment
Error
Total
c) Represent treatments by t and replications by r and determine the degree of freedom (d.f)
for each source of variation as:

Total d.f = rt − 1
Treatment d.f = t − 1
Error d.f = t(r − 1)

The error d.f can also be obtained through subtraction as:
Error d.f = Total d.f − Treatment d.f = (rt − 1) − (t − 1) = t(r − 1)

Degree of freedom refers to the maximum number of logically independent values, which are
values that have the freedom to vary, in the data sample.
d) Calculate the correction factor and the various sums of squares:

C.F. = G²/(rt)
Total SS = ΣΣX²ij − C.F.
Treatment SS = ΣT²i/r − C.F.
Error SS = Total SS − Treatment SS
e) Calculate the mean square (MS) for each source of variation by dividing each sum of
squares by its degree of freedom:

Treatment MS = Treatment SS/(t − 1)
Error MS = Error SS/[t(r − 1)]

f) Calculate the F value for testing the significance of the treatment difference as
F = Treatment MS/Error MS.

The F value should be computed only when the error d.f is six or more, for a reliable estimate
of the error variance.
g) Obtain the tabular F-values from appendix table, with f1= treatment d.f = t-1 and f2= error
d.f = t(r-1). Enter all values in ANOVA table.
h) Compare the computed F value with the tabular F values, and decide on the significance of
the difference among treatments using the following rules.

h1) If the computed F value is larger than the tabular F value at the 1% level of significance,
the treatment difference is said to be highly significant. Such a result is indicated by placing
two asterisks (**) on the computed F value in the analysis of variance.
h2) If the computed F value is larger than the tabular F value at the 5% level of significance
but smaller than or equal to the tabular F value at the 1% level of significance, the treatment
difference is said to be significant. Such a result is indicated by placing one asterisk (*) on
the computed F value in the analysis of variance.
h3) If the computed F value is smaller than or equal to the tabular F value at the 5% level of
significance, the treatment difference is said to be non-significant. Such a result is indicated
by placing ns on the computed F value in the analysis of variance.
Note that a non-significant F-test in the ANOVA indicates the failure of the experiment to
detect any difference among treatments. It could be the result of either a very small or nil
treatment difference, a very large experimental error, or both. Thus, whenever the F-test is
non-significant, the researcher should examine the size of the experimental error and the
difference among treatment means.
i) Compute the grand mean and the coefficient of variation (CV) as follows:

Grand mean = G/n
CV = (√Error MS / grand mean) × 100

CV is calculated to determine the reliability of the experiment, i.e. how much variability exists
within the experiment, and it is always expressed in %.
The CV indicates the degree of precision with which the treatments are compared.
o It is a good index of the reliability of the experiment.
o It expresses the experimental error as a percentage of the mean.
o The higher the CV value, the lower is the reliability of the experiment.
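Steps (a) through (i) can be condensed into a short sketch for the equal-replication case; the observations used here are purely illustrative:

```python
import math

# A minimal sketch of the CRD ANOVA steps for equal replication.
# `data` maps each treatment to its r observations (values are illustrative).
data = {
    "T1": [19, 16, 16, 20],
    "T2": [26, 24, 27, 22],
    "T3": [22, 25, 24, 20],
}
t = len(data)
r = len(next(iter(data.values())))
G = sum(sum(obs) for obs in data.values())          # grand total
CF = G ** 2 / (t * r)                               # correction factor
total_ss = sum(x ** 2 for obs in data.values() for x in obs) - CF
trt_ss = sum(sum(obs) ** 2 for obs in data.values()) / r - CF
error_ss = total_ss - trt_ss                        # by subtraction

trt_ms = trt_ss / (t - 1)                           # mean squares
error_ms = error_ss / (t * (r - 1))
F = trt_ms / error_ms                               # computed F

grand_mean = G / (t * r)
cv = math.sqrt(error_ms) / grand_mean * 100         # coefficient of variation, %
print(f"F = {F:.2f}, CV = {cv:.1f}%")
```

The computed F would then be compared with the tabular F at (t − 1) and t(r − 1) degrees of freedom, exactly as in step (h).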
Treatment Means Comparison/ Mean Separation
There are different kinds of mean comparison procedures. Mean separation procedures are
designed to make statistical inferences concerning a given set of treatment means. There are
two categories of mean separation procedures, Category I and Category II. The first category
consists of a group of studentized range-based procedures to test all possible pairs of mean
differences. This procedure can be subdivided into multiple comparisons (LSD and Tukey’s
test) and multiple range tests (Student-Newman-Keuls’ test and Duncan’s New Multiple Range
Test). Under this category, for an experiment with t treatments, t(t−1)/2 possible
comparisons can be made. In the multiple comparison method a single critical value is
used, while in the multiple range test procedure two or more critical values are used.
The purpose of the second category of mean separation is to test mean difference between any
treatment and a specified treatment usually a control or standard check. In this case, there are
only t-1 paired comparisons (Peterson, 1977). This category includes preplanned contrast and
preplanned comparison to a check or control.
A. Least significant difference method
The least significant difference (LSD) procedure, also known as Fisher’s LSD, compares all
possible t(t−1)/2 pairs of treatment means using a standard t-test. The LSD procedure is only
used when the treatment source of variation is found to be significant by the F-statistic. The
purpose of testing each mean difference is to test the null hypothesis that the corresponding two
treatment means are equal.
The critical value for conducting each test is the same for each individual comparison of paired
means, and it depends on the number of degrees of freedom for the error, as well as the
significance level of the test, α. It should be noted that α reflects the probability of a Type-I
error for each individual comparison, and therefore cannot be applied to the entire group of
individual comparison since these are not mutually independent. Thus, the t-tests of all possible
hypotheses are not independent.
The LSD method differs from others in that it has a comparison-wise Type-I error, α, over all
repetitions of the experiments (Steel and Torrie, 1980). This means that the α risk will be
inflated when using the LSD procedure. As the number of treatments increases, the Type-I
error for the experiment becomes large. Due to this fact, a situation may arise whereby the F-statistic
in the ANOVA is significant, yet the LSD procedure fails to find any pair of treatment means
which differ significantly from one another. This occurs since the F-statistic is considering all
possible comparisons between treatment means simultaneously, not in a pair-wise manner as
the LSD procedure does (Cochran and Cox, 1957).

LSD = (tα/2, error d.f.) × √(2MSE/r) for equal replication, and

LSD = (tα/2, error d.f.) × √(MSE(1/ri + 1/rj)) for unequal replication.

Assuming two-sided alternatives, a pair of means would be declared significantly different if
the difference between the two means is greater than the LSD value.

Taking five treatments, for example, the process can be explained with T1, T2, T3, T4 and T5
having mean values of 9, 10, 16, 17 and 22, respectively, with α = 0.05, error degrees of freedom
20, and MSE for the five treatments = 8.06; the critical difference is then 4.37. From these
statistics one can see that treatment 5 had the highest mean, and the two pairs of means that do
not significantly differ from each other are T1 and T2, and T3 and T4.
T1  T2  T3  T4  T5
 9  10  16  17  22
The overall α risk may be considerably inflated using this method. Specifically, as t gets
larger, the Type-I error of the experiment becomes larger (Steel and Torrie, 1980).
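As a hedged illustration of the LSD procedure, the following sketch reuses the five means above (9, 10, 16, 17 and 22) with MSE = 8.06 and 20 error d.f.; the replication number r = 5 and the tabular t(0.025, 20 d.f.) = 2.086 are assumptions for illustration:

```python
import math
from itertools import combinations

# LSD sketch for the five-treatment example. The means, MSE and error d.f.
# come from the example above; r = 5 and t_tab = 2.086 are assumed here.
means = {"T1": 9, "T2": 10, "T3": 16, "T4": 17, "T5": 22}
mse, r, t_tab = 8.06, 5, 2.086

lsd = t_tab * math.sqrt(2 * mse / r)    # equal-replication LSD
for a, b in combinations(means, 2):     # all t(t-1)/2 = 10 pairs
    diff = abs(means[a] - means[b])
    verdict = "significant" if diff > lsd else "ns"
    print(f"{a} vs {b}: |diff| = {diff:2d} -> {verdict}")
```

Under these assumptions only the T1 vs T2 and T3 vs T4 pairs fall below the critical difference, matching the conclusion stated above.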
B. Tukey’s Test
Tukey (1953) proposed a procedure for testing hypotheses for which the overall significance
level is exactly α when the sample sizes are equal and at most α when the sample sizes are
unequal. His procedure can also be used to contract confidence intervals on the differences in
all pairs of means. For these intervals, the simultaneous confidence level is 100(1- α) percent
when the sample sizes are equal and at least 100(1- α) percent when the sample sizes are
unequal. This is good procedure when interest focuses on pairs of means. Tukey’s procedure
makes use of the distribution of the studentized range statistic

q = (ȳmax − ȳmin)/√(MSE/r),

where ȳmax and ȳmin are the largest and smallest sample means. Appendix Table A.8 of Steel
and Torrie (1980) contains values of qα(p, d.f.), the upper α percentage points of q, where d.f.
is the degrees of freedom for error. For equal sample sizes, Tukey’s test declares two means
significantly different from each other if the absolute value of their sample difference exceeds

Tα = qα(t, d.f.) × √(MSE/r).

In the same way, we can construct a set of 100(1 − α) percent confidence intervals for all pairs of
means. For the previous example, with means 9, 10, 16, 17 and 22, with α = 0.05 and error
degrees of freedom 20, there will be q0.05(5, 20) = 4.23 and, with MSE for the five treatments
= 8.06,

T0.05 = q0.05(5, 20) × √(8.06/5) = 4.23 × 1.27 = 5.37.

Hence, any pair of treatment means that differ in absolute value by more than 5.37 would imply
that the corresponding pair of population means is significantly different.
T1  T2  T3  T4  T5
 9  10  16  17  22

C. Duncan’s New Multiple Range Test (DNMRT)


A widely used procedure for comparing all pairs of means is the DNMRT. To apply DNMRT
with equal sample sizes, the t treatment means are arranged in ascending order and the
standard error of each mean is determined as

SE(m) = √(MSE/r).

From Duncan’s table of significant ranges, Appendix Table A.7 of Steel and Torrie (1980), rα
(p, d.f.) values are obtained for p = 2, 3, …, t, where α is the significance level and d.f. is the error
degrees of freedom. These ranges are converted into a set of t − 1 least significant ranges (Rp)
by calculating
Rp = rα(p, d.f.) × SE(m) for p = 2, 3, …, t
Then, the observed differences between means are tested, beginning with the largest versus
smallest, which would be compared with the least significant ranges Rt. Next, the difference of
the largest and second smallest is computed and compared with the least significant range Rt-1.
Such comparisons are continued until all means have been compared with the largest mean.
Finally, the difference between the second largest mean and the smallest is compared against
the least significant range Rt-1. This process is continued until the differences between all
possible t(t−1)/2 pairs of means have been considered. If the observed difference is greater
than the corresponding least significant range, we conclude that the pair of means in question is
significantly different. To prevent contradictions, no differences between a pair of means are
considered significant if the two means involved fall between two other means that do not
differ significantly.

DNMRT can be applied to the previous example. Recalling that MSE = 8.06, n = 5 and error
degrees of freedom = 20, the treatment means can be ordered in ascending order as T1 = 9,
T2 = 10, T3 = 16, T4 = 17 and T5 = 22. The SE(m) = √(MSE/n) = √(8.06/5) = 1.27.

From the table of significant ranges in Appendix Table A.7 (Steel and Torrie, 1980) for 20
degrees of freedom and 0.05 level of probability, we obtain r0.05(2, 20) = 2.95, r0.05 (3, 20) =
3.10, r0.05(4, 20) = 3.18 and r0.05(5, 20) = 3.25. Hence, the least significant ranges are
R2 = r 0.05(2, 20) SE (m) = 3.75
R3 = r 0.05(3, 20) SE (m) = 3.94
R4 =r 0.05(4, 20) SE (m) = 4.04
R5 = r 0.05(5, 20) SE (m) = 4.13
The comparisons show that there are significant differences between all pairs of means
except treatments 1 and 2, and treatments 3 and 4. In this example, DNMRT and the LSD method produced
the same result, leading to identical conclusions. DNMRT requires greater observed differences
to declare pairs of means significantly different the farther apart they stand in the ranking. For
two means the critical values of DNMRT and the LSD are exactly equal. Generally, DNMRT is
quite powerful; that is, it is very effective at detecting differences between means when real
differences exist, and for this reason it is popular.
D. The Newman-Keuls Test
The procedure, originally devised by Newman and later revised by Keuls, is usually called the
Student-Newman-Keuls’ test. Operationally, the procedure is similar to DNMRT, except that
the critical differences between means are calculated somewhat differently. Specifically, we
compute a set of critical values Kp = qα(p, d.f.) × SE(m), p = 2, 3, …, t, where qα(p, d.f.) is the
upper α percentage point of the studentized range for groups of means of size p and d.f. error
degrees of freedom. Once the value of Kp is computed, extreme pairs of means in groups of
size p are compared with Kp exactly as in DNMRT.
Which comparison method is the best?
A logical question at this point is which one of these methods should be used? Unfortunately,
there is no clear-cut answer to this question, and professional statisticians often disagree over
the utility of the various procedures. Carmer and Swanson (1973) have conducted simulation
studies of a number of multiple comparison procedures, including others not discussed here.
They reported that the least significant difference method is a very effective test for detecting

true differences in means if it is applied only after the F test is significant in the analysis of
variance. They also reported good performance in detecting true differences with DNMRT.
Some statisticians use Tukey’s method as it does control the overall error rate. The Newman-
Keuls Test is more conservative and the power of the test is less than DNMRT. There are other
multiple comparison procedures and further references can be obtained in Steel and Torrie
(1980) and other texts of experimental designs.
Therefore, the comparison between the test methods can be summarized by suggesting that the
use of either LSD or DNMRT suffices for present purposes. Besides, the latest statistical
packages, such as SAS, are programmed to produce mean-comparison results by these two
statistics (LSD and DNMRT).
Numerical example 1
A company would like to put together a computer system for sale to businesses. Upon
manufacturing, they do not wish to take time to develop a laser quality printer (LQP) to go with
their system. Hence, they planned to find the best reasonably priced LQP available to subcontract
as a system option. To determine the difference between machines, six units of the requested
models (replications) were taken from three randomly chosen manufacturers (treatments). The
data, in hours to first failure, are given below (Table 1) for the three models (manufacturers’ printers).
Table 1. Data on first failure in LQP of three manufacturers

I II III IV V VI Treatment Total (Ti)

Model-I 60 45 72 68 71 52 368
Model-II 102 96 105 99 103 95 600
Model-III 121 132 118 128 131 126 756
Grand Total 1724

In this example, shortcut methods will be introduced as they are of practical utility. The
stepwise computations of the different components are as follows.
Grand total (GT) = sum of all observations.
GT = 368 + 600 + 756 = 1724
Once the grand total is obtained, the next step is to compute the correction factor.

Correction factor (C.F.) = G²/(rt), where r = number of replications = 6 and t = number of
treatments = 3:
C.F. = (1724)²/(6 × 3) = 165120.89

1) Total sum of squares (Total SS) = sum of squares of each observation − C.F.
Total SS = ΣΣX²ij − C.F. = (60)² + (45)² + … + (126)² − 165120.89 = 13547.11
2) Sum of squares of treatments (SSt) = ΣT²i/r − C.F.
SSt = (368² + 600² + 756²)/6 − 165120.89 = 12705.78
3) Sum of squares of error (SSE) = Total SS − SSt = 13547.11 − 12705.78 = 841.33
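The shortcut computations above can be checked with a few lines of code (the dictionary keys are just our own labels for the three models):

```python
# Checking the shortcut computations for the printer data (Table 1).
failures = {
    "Model-I":   [60, 45, 72, 68, 71, 52],
    "Model-II":  [102, 96, 105, 99, 103, 95],
    "Model-III": [121, 132, 118, 128, 131, 126],
}
t, r = 3, 6
G = sum(sum(v) for v in failures.values())              # grand total = 1724
CF = G ** 2 / (t * r)                                   # correction factor
total_ss = sum(x ** 2 for v in failures.values() for x in v) - CF
trt_ss = sum(sum(v) ** 2 for v in failures.values()) / r - CF
error_ss = total_ss - trt_ss
print(round(CF, 2), round(total_ss, 2), round(trt_ss, 2), round(error_ss, 2))
# → 165120.89 13547.11 12705.78 841.33
```

The four values reproduce C.F., Total SS, SSt and SSE exactly as computed by hand.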
Table 2. Analysis of variance for CRD with three treatments and six repeats

Source of Variation  d.f.  SS        MS       F-cal     F-tab 0.05  0.01
Treatments           2     12705.78  6352.89  113.26**  3.68        6.36
Error                15    841.33    56.09
Total                17    13547.11

**, Significant at 0.01 level of probability


There is a significant effect of the treatments (there is significant difference among the models)
at 0.01 level of probability (Table 2). Hence, the hypothesis that says all treatments (models)
have the same mean values (hours to first failure) is rejected. The computation for standard
error of means, standard error of difference and the critical values are the same as in the
previous example.
The formula to be used will be the one for an equal number of repeats. A common standard
error of difference will be calculated to serve all treatment means; with unequal replication, a
standard error of difference would have to be computed for each pair of treatment means.
4) Standard error of means, SE(m)

SE(m) = ±√(MSE/r) = ±√(56.09/6) = 3.058

5) Standard error of differences, SE(d)

SE(d) = ±√(2MSE/r) = ±√(2 × 56.09/6) = 4.324

6) Critical difference, CD, or least significant difference (LSD)
CD = SE(d) × t0.05 at 15 error d.f. = 4.324 × 2.131 = 9.214
Mean differences
Mean of Model-I − Mean of Model-II = 61.33 − 100 = −38.67
Mean of Model-I − Mean of Model-III = 61.33 − 126 = −64.67
Mean of Model-II − Mean of Model-III = 100 − 126 = −26
Treatment means can be compared as before.
Table 3. Mean comparison

Paired treatments        Mean difference  CD/LSD value  Status
Model-I and Model-II     38.67            9.214         *
Model-I and Model-III    64.67            9.214         *
Model-II and Model-III   26.00            9.214         *

The conclusion from this mean comparison is that all models are statistically different, with the
highest mean time to first failure due to Model-III. Hence, the company can subcontract
Model-III as its system option, as it runs longest before its first failure.
Example 2
A person wanted to compare the effectiveness of different doses of a drug on time of
recovery. These drug doses are represented by D1, D2, D3, D4 and D5. The data generated are
presented below. Now test the hypothesis of no difference in time of recovery among
the different drug doses.
D1(19) D3(22) D4(20) D1(20)
D5(29) D2(24) D5(30) D3(24)
D2(26) D4(25) D1(16) D2(22)
D5(28) D3(25) D5(31) D4(28)
D4(27) D1(16) D2(27) D3(20)

Treatment   Rep1  Rep2  Rep3  Rep4  Treatment total  Treatment mean
D1 19 16 16 20 71 17.75
D2 26 24 27 22 99 24.75
D3 22 25 24 20 91 22.75
D4 27 25 20 28 100 25
D5 29 28 30 31 118 29.5
Grand total 479
Grand mean 23.95
Steps:
1. Calculate the correction factor and the various sums of squares (SS) as:
C.F. = G²/n = (479)²/20 = 11472.05
Total SS = ΣΣX²ij − C.F. = 11847 − 11472.05 = 374.95
Treatment SS = ΣT²i/r − C.F. = (71² + 99² + 91² + 100² + 118²)/4 − 11472.05 = 289.70
Error SS = Total SS − Treatment SS = 374.95 − 289.70 = 85.25
2. Calculate the mean squares (MS) for each source of variation:
Treatment MS = 289.70/4 = 72.42 and Error MS = 85.25/15 = 5.68
3. Calculate F = 72.42/5.68 = 12.75
ANOVA table

Source of variation  DF  S.S     M.S    F-comp   F-tab 5%  1%
Drug dose            4   289.70  72.42  12.75**  3.06      4.89
Error                15  85.25   5.68
Total                19  374.95
**denotes highly significant (at 1% level of significance)
The table value of F at the 1% level of significance for 4 and 15 d.f is 4.89. Thus, the calculated
F value of 12.75 shows that drug dose had a highly significant effect on time of recovery. This
indicates that the hypothesis that the time of recovery obtained with the different drug doses is
the same has to be rejected.
CV (coefficient of variation) = (√5.68/23.95) × 100 = 9.95%
Mean separation/mean comparison


Means of the different doses are indicated below.
D1     D2     D3     D4     D5
17.75  24.75  22.75  25.00  29.50
SE(m) = √(MSE/r) and SE(d) = √(2MSE/r)
Least significant difference (LSD) (CD) = SE(d) × tα
LSD (1%) = 2.947 × √(2 × 5.68/4) = 2.947 × 1.685 = 4.97
LSD (5%) = 2.131 × √(2 × 5.68/4) = 2.131 × 1.685 = 3.59
     D1       D2      D3      D4
D5   11.75**  4.75*   6.75**  4.50*
D4   7.25**   0.25ns  2.25ns  −
D3   5.00**   2.00ns  −       −
D2   7.00**   −       −       −
Dear students, interpret the results and draw your conclusion
 Assignment on CRD with Unequal Replication

2.1.2. Randomised Complete Block Design (RCBD)


Randomized complete block design is one of the most widely used designs. The principle
behind this design is that the experimental area is divided into groups called blocks, each block
containing all treatments. The term ‘block design’ originated from the design of agricultural
field experiment, where ‘block’ refers to a group of adjacent plots. The essence of blocking is
to minimize the within block variation, experimental error. The block simply represents one
restriction on complete randomization due to the environment in which the experiment is
conducted. When the experiment is conducted a uniform technique should be followed for all
experimental units within the block.
In CRD no local control measures are adopted except that the experimental units are
homogeneous. When the number of treatments is large, adequate precision cannot be attained
with CRD. Improvement can be provided by the error control measure in RCBD.

 The RCBD is one of the most widely used experimental designs in agricultural research.
 The design is suited for field experiments where the number of treatments is not large and
the experimental area has predictable source of variation.
 The presence of blocks of equal size each of which contains all the treatments is the main
feature of RCB design.
Blocking technique
The primary purpose of blocking is to reduce experimental error by eliminating the contribution

of known sources of variation among experimental units. This is done by grouping the
experimental units into blocks such that variability within each block is minimized and
variability among blocks is maximized. Because only the variation within a block becomes part
of the experimental error, blocking is most effective when the experimental area has a
predictable pattern of variability. With a predictable pattern, plot shape and block orientation
can be chosen so that much of the variation is accounted for by the difference among blocks,
and experimental plots within the same block are kept as uniform as possible. There are two
important decisions that have to be made in arriving at an appropriate and effective blocking
technique. These are:
 The selection of the source of variability to be used as the basis for blocking.
 The selection of the block shape and orientation.
Randomization and layout
Randomization for RCBD is applied separately and independently to each of the blocks.
Steps
 Divide the experimental area into r equal blocks where r is the number of replications
 Subdivide the first block into t equal experimental plots, where t is the number of treatments.
Number the t plots consecutively from 1 to t, and assign the t treatments at random to the t
plots using random numbers or drawing lots. Numbering should begin from the left side
of the block.
 Repeat step 2 completely for each of the remaining blocks.
 In CRD, randomization is done without any restriction, but for RCBD, all treatments must
appear in each block and randomization is done separately for each block.
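The block-by-block randomization can be sketched as follows; the treatment count, block count and fixed seed are illustrative assumptions:

```python
import random

# Sketch of RCBD randomization: each block holds one complete set of
# treatments, shuffled independently of every other block.
t, r = 12, 3                               # treatments and blocks (illustrative)
random.seed(7)                             # fixed seed for a repeatable layout

layout = []
for block in range(1, r + 1):
    plots = list(range(1, t + 1))          # treatments numbered 1..t
    random.shuffle(plots)                  # independent shuffle per block
    layout.append(plots)
    print(f"Block {block}: {plots}")
```

Contrast this with the CRD sketch earlier: there a single shuffle covered all plots, whereas here the shuffle is restricted so every treatment appears exactly once per block.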

Sample field layout of RCBD with 3 replications (blocks) and 12 treatments. In each block the
top row gives the treatment number and the bottom row the plot number (plots are numbered
serpentine-fashion across blocks):

Block 1:  06 08 12 04 05 10 11 02 09 01 07 03
Plot no:   1  2  3  4  5  6  7  8  9 10 11 12

Block 2:  04 11 12 09 02 10 06 01 08 03 05 07
Plot no:  24 23 22 21 20 19 18 17 16 15 14 13

Block 3:  10 07 06 01 08 02 05 12 03 04 11 09
Plot no:  25 26 27 28 29 30 31 32 33 34 35 36
When the number of treatments is large, RCBD is not usually suitable because it is often
difficult to get homogeneous groups of units of such a large size. If, however, homogeneous
groups can be formed with a larger number of units, RCBD can still be adopted with a large
number of treatments.
Analysis
The data collected from experiments with RCBD form a two-way classification. There are tr
cells in the two-way table with one observation in each cell. The data are orthogonal and
therefore the design is an orthogonal design.
There are three sources of variation in RCB design: treatment, replication (or block) and
experimental error. The model is

Yij = μ + ti + rj + eij

where Yij denotes a random variable observation from the ith treatment in the jth block; μ, ti and rj are,
respectively, the general mean, the effect of the ith treatment and the effect of the jth block. These effects
are fixed, and eij is the error component, which is a random variable. The errors are assumed to be
normally and independently distributed with zero mean and constant variance.
Step 1. Group the data by treatments and replications, calculate treatment totals (T), replication
totals (R), grand total (G)

Step 2. Outline the ANOVA table
Source of variation  D.F.  Sum of squares  Mean square  Computed F  Tabulated F (5%, 1%)
Replication
Treatment
Exp. error
Total
Step 3. Using r = number of replications and t = number of treatments, determine the d.f for each source:
Total d.f = rt − 1, Treatment d.f = t − 1, Replication d.f = r − 1, Error d.f = (r − 1)(t − 1)
Error d.f can also be obtained by subtraction: Error d.f = Total d.f − Replication d.f − Treatment d.f
Step 4. Calculate the correction factor and the various sums of squares.
Let ΣYij = Ti (i = 1, 2, …, t) be the observation total of the ith treatment
and ΣYij = Bj (j = 1, 2, …, r) be the observation total of the jth block; these are the marginal totals of
the two-way data table. Then:

C.F. = G²/(rt)
Total SS = ΣΣY²ij − C.F.
Treatment SS = ΣT²i/r − C.F.
Block (replication) SS = ΣB²j/t − C.F.
Error SS = Total SS − Treatment SS − Block SS
Step 5. Compute the mean squares for each source of variation by dividing each sum square by
its d.f.

Step 6. Compute the F values for testing the treatment and block differences as

F (treatment) = Treatment MS/Error MS and F (replication) = Replication MS/Error MS

The F value computed for blocks measures blocking efficiency (whether blocking maximized the
difference among blocks and minimized the difference among plots within each block). Blocking is
effective in reducing experimental error if F (replication) is significant (i.e. when the computed
F is greater than the tabulated F value).
Compare the computed F value with the tabular F value (from Appendix E) and make conclusions
on the results.
Step 7. Compute the coefficient of variation (used to test the reliability of the experimental
conditions):

CV = (√Error MS / grand mean) × 100
The relative efficiency (R.E.) of blocking in reducing experimental error can be computed as:

R.E. = [(r − 1)Eb + r(t − 1)Ee] / [(rt − 1)Ee]

where Eb is the block mean square and Ee is the error mean square in the RCB analysis of variance.
If the error d.f is less than 20, the R.E. must be multiplied by a correction factor k defined as:

k = [(r − 1)(t − 1) + 1][t(r − 1) + 3] / {[(r − 1)(t − 1) + 3][t(r − 1) + 1]}
Note: Ee in the denominator is the error for the RCB design and the numerator is the
comparable error had the CRD been used. The value of the relative efficiency is indicative of
the gain in precision due to blocking. Example: if the R.E. value is 1.60, it shows that the use of
RCB design instead of CRD design increased experimental precision by 60%.
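A sketch of the relative-efficiency computation, using the formulation common in agricultural statistics texts (e.g. Gomez and Gomez); treat the exact expressions, and the illustrative mean-square values, as assumptions:

```python
# Relative efficiency of RCBD over CRD (a sketch; the formula follows the
# formulation common in agricultural statistics texts such as Gomez and Gomez).
def relative_efficiency(Eb, Ee, r, t):
    re = ((r - 1) * Eb + r * (t - 1) * Ee) / ((r * t - 1) * Ee)
    error_df = (r - 1) * (t - 1)
    if error_df < 20:                      # apply the correction factor k
        k = (((r - 1) * (t - 1) + 1) * (t * (r - 1) + 3)) / \
            (((r - 1) * (t - 1) + 3) * (t * (r - 1) + 1))
        re *= k
    return re

# Illustrative values: block MS = 1.609, error MS = 0.190, r = 5 blocks, t = 3.
print(round(relative_efficiency(1.609, 0.190, 5, 3), 2))  # → 2.96
```

A value near 3 would mean that blocking made the design roughly three times as precise as a CRD with the same number of plots.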
Step 8. Enter all values computed in the ANOVA
Source of Var. D.F S.S M.S F-comp F-tab
Replication r-1
Treatments t-1
Error (r-1) (t-1)
Total rt – 1
 The hypothesis that the treatments have equal effects is tested by the F-test, where F is the ratio
St²/Se² with (t − 1) and (r − 1)(t − 1) d.f.
 If the F is not significant the data do not suggest that the treatment effects are different.
 When F is significant, we conclude that the treatment effects are different. We may then be
interested to either compare the treatments in pairs or evaluate special contrasts depending
on the objectives of the experiments.
Example

Table 4. Data collected on bacterial counts from an experiment with three treatments in five
replications (blocks).

              Block 1   Block 2   Block 3   Block 4   Block 5   ∑ Treatment   Treatment mean
Control       3.74      4.58      4.58      4.47      4.79      22.16         4.432
Treatment 1   4.47      6.78      5.19      5.19      6.85      28.48         5.696
Treatment 2   5.65      7.00      6.08      5.74      7.55      32.02         6.404
∑ Blocks      13.86     18.36     15.85     15.40     19.19     G = 82.66
a) Test the hypothesis that the mean number of bacteria is the same across treatments
b) Test the reliability of the experimental conditions
Answer:
ANOVA table for the bacterial counts of three treatments in five replications
S.V D.F S.S M.S F-com F-tab
Blocks 4 6.434 1.609 8.47 3.84
Treatments 2 9.980 4.990 26.26 4.46
Error 8 1.523 0.190
Total 14 17.937
For example, the F test for treatment effects (H0: ti = 0, i = 1, 2, 3) is F = MSt/S²e = 4.990/0.190
= 26.26.
When compared with F (0.95; 2, 8) = 4.46, this result causes us to reject the null hypothesis.
The mean number of bacterial counts differs from treatment to treatment.
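The full set of Step 4 to Step 7 computations for this example can be sketched in pure Python (variable names are illustrative; the data are those of Table 4):

```python
# Sketch of the RCBD ANOVA computations for the bacterial-count example.
data = {  # treatment -> observations in blocks 1..5
    "Control":     [3.74, 4.58, 4.58, 4.47, 4.79],
    "Treatment 1": [4.47, 6.78, 5.19, 5.19, 6.85],
    "Treatment 2": [5.65, 7.00, 6.08, 5.74, 7.55],
}
t = len(data)                        # number of treatments
r = len(next(iter(data.values())))   # number of blocks (replications)

grand = sum(sum(row) for row in data.values())
cf = grand ** 2 / (r * t)            # correction factor G^2/rt
total_ss = sum(y ** 2 for row in data.values() for y in row) - cf
trt_ss = sum(sum(row) ** 2 for row in data.values()) / r - cf
block_totals = [sum(row[j] for row in data.values()) for j in range(r)]
block_ss = sum(b ** 2 for b in block_totals) / t - cf
error_ss = total_ss - trt_ss - block_ss

trt_ms = trt_ss / (t - 1)
error_ms = error_ss / ((r - 1) * (t - 1))
f_treatment = trt_ms / error_ms
cv = 100 * (error_ms ** 0.5) / (grand / (r * t))
print(round(total_ss, 2), round(trt_ss, 2), round(block_ss, 2))
print(round(f_treatment, 1), round(cv, 1))
```

The printed sums of squares (17.94, 9.98, 6.43) agree with the ANOVA table above up to the hand-calculation rounding used in the handout.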
Treatments Means
Control 4.432
Treatment 1 5.696
Treatment 2 6.404
LSD at the 1% level = t(1-α/2; error d.f.) x √(2s²/r), with error d.f. = (r-1)(t-1) = 8
LSD = 3.355 x √(2 x 0.190/5) = 0.925

                  T1 (5.696)   T2 (6.404)
Control (4.432)   1.264*       1.972*
T1 (5.696)        -            0.708
Two comparisons out of three were significantly different.
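The LSD comparison just made can be sketched as follows. The t value 3.355 (two-sided 1% level, 8 error d.f.) is read from the t table, not computed:

```python
# Sketch of the LSD pairwise comparison at the 1% level for the bacterial
# example (error MS and treatment means from the ANOVA above).
from itertools import combinations

means = {"Control": 4.432, "Treatment 1": 5.696, "Treatment 2": 6.404}
error_ms, r, t_tab = 0.190, 5, 3.355

lsd = t_tab * (2 * error_ms / r) ** 0.5          # least significant difference
significant = {(a, b): abs(means[a] - means[b]) > lsd
               for a, b in combinations(means, 2)}
print(round(lsd, 3), sum(significant.values()))  # -> 0.925 2
```

Two of the three pairwise differences exceed the LSD of 0.925, matching the table above.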
CV (%) = (√Error MS / grand mean) x 100 = (√0.190 / 5.511) x 100 = 7.91%
Experimental conditions are reliable.
 Dear students, interpret the results and draw your conclusion
Advantages of RCBD are:-
 Simple, easy for lay out and calculations
 It can remove one source of variation from experimental error and thus increase precision.
 This design allows any number of treatments and replications; hence, it is flexible.
However, when the number of treatments is large (more than about 12), the efficiency of the
design decreases because the blocks become larger and less homogeneous.
 Soil heterogeneity between the blocks can be estimated and this helps to reduce the error
variance
 The individual replication can be arranged in any manner according to soil homogeneity.
 Statistical analysis of the results is 'fairly simple'.
 If part of the experiment is damaged, it is possible to discard that replication and go ahead
with data analysis without discarding the entire experiment.
 When the information from some of the individual units is missing, it is easy to find out
missing plot value by applying missing plot technique developed by Yates. This design is
especially more suited when soil heterogeneity is in one direction.
Disadvantages of RCBD are:-
 Missing data can cause some difficulty in the analysis. One or two missing plots can be
handled fairly easily but numerous missing data can cause real problems.
 Assignment of treatments by mistake to plots in the wrong block can lead to problems in
the analysis.
 The design is less efficient than others in the presence of more than one source of unwanted
variation.
 If the plots are uniform, the RCBD is less efficient than the CRD, because the number of
error degree of freedom is less in RCBD than in CRD.
 This design is not efficient when soil heterogeneity is in two directions.
 Not suitable when two or more factors are to be tested at a time
Uses of RCBD are:-
 It can be used to eliminate one source of unwanted variation. Often, it provides satisfactory
precision without the need for a more complex design.
 It provides unbiased estimates of the means of the blocking factor. Hence these means can
be estimated using the randomized complete block design.
 As the design is flexible, it is commonly used in the field experimentation with any number
of treatments and replications
 Most suited when the soil heterogeneity is in one direction in the field.
2.1.3. Latin Square Designs (LSD)
Latin square design is used when there are two known sources of variation among experimental
units. The experimental unit is divided into rows and columns, and the treatments are allocated
such that each treatment occurs only once in each row-block and once in each column-block.
Features of LSD
 The number of rows and columns are equal
 Each row and each column is a complete block or replication
 It can handle two known sources of variation among experimental units.
 It considers two independent blocking criteria known as row-blocking and column
blocking.
 Helps to estimate variation among row-blocks as well as among column-blocks and to
remove them from experimental error.
 The number of treatment and replication must be equal.
Examples where LS design can be used are:
 Field trials in which the experimental area has two fertility gradients running perpendicular
to each other
 Insecticide field trials where the insect migration has a predictable direction perpendicular
to the dominant fertility gradient.
 Green house trails in which the experimental pots are arranged in straight line perpendicular
to the glass or screen walls, such that the difference among rows of pots (treatment
difference) and the distance from the glass wall (light difference) are expected to be the two
major source of variability among the experimental pots.
 Laboratory trials with replication over time: differences among experimental units
conducted at the same time and among those conducted over time constitute the two known
sources of variability.

N.B: When the number of treatments is large the design becomes impractical because of the
large number of replications required.

On the other hand, when the number of treatments is small, the error degree of freedom
becomes too small for the error to be reliably estimated.
Advantages
 It allows the experimenter to control two sources of variation, rows and columns, and to
remove both from the experimental error; the design is more efficient when the soil is more
heterogeneous.
 When the data from some of the plots are missing, missing plot technique can be used to
find the missed values of the plots.
 More efficient than CRD and RCBD
Limitations
 It is only applicable when the number of treatments is not less than four and not more
than eight. When the number of treatment is small the DF associated with the
experimental error becomes too small for the error to be reliably estimated.
 Number of treatments is limited to number of rows and columns
 It is not flexible as RCBD
 The design is impractical when the number of treatments is large, as a large number of
replications is needed. Because of this limitation, the LS design has not been widely used in
agricultural experiments despite its great potential for controlling experimental error. The
statistical analysis is also complicated by missing plots and misassigned treatments.
Uses
 It is useful when two sources of variation must be controlled.
 The design is more suited for multi locational trials and the soil variation is quite different
in minor localities.
 For practical purposes its use is restricted to trials with more than four but fewer than 10
treatments. In field experiments, the Latin square could be used to remove the effect of two
gradients, say, slope and fertility, at right angles to each other. Here the best field plan
would be square, with square plots. The Latin square design can also be used to impose a
double control on a linear gradient. We can group the plots into p groups of p plots each
along the gradient for the row classification.
Randomization and layout

Randomization and layout for a LS design is shown below for an experiment with five treatments A, B,
C, D and E.
Steps:
1. Select a Latin square plan with five treatments using Appendix K of Gomez and Gomez. For
our example, the selected 5 x 5 Latin square plan is:
1 2 3 4 5
1 A B C D E
2 B A E C D
3 C D A E B
4 D E B A C
5 E C D B A
2. Randomize the row arrangement of the plan selected in step 1, following one of the
randomization schemes described (use random numbers or draw lots):
Random no. Sequence Rank
C D A E B 628 1 3
D E B A C 846 2 4
B A E C D 475 3 2
E C D B A 902 4 5
A B C D E 452 5 1
NB: Use the rank to represent existing row number of the selected plan and the sequence to
represent the row number of the new plan.
3. Randomize the column arrangement using the same procedure used for row arrangement in
step 2. The final layout becomes that of the experiment layout.
1 2 3 4 5 Random no. Sequence Rank
1 E C B A D 792 1 4
2 A D C B E 032 2 1
3 C B D E A 947 3 5
4 B E A D C 293 4 3
5 D A E C B 196 5 2
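A minimal sketch of this row-and-column randomization in Python. For simplicity it starts from a cyclic standard square rather than one drawn from Appendix K; the function name is illustrative:

```python
# Sketch of Latin square randomization: build a standard k x k square,
# then shuffle whole rows and whole columns. Each treatment still occurs
# exactly once in every row and every column.
import random

def randomized_latin_square(treatments, seed=None):
    rng = random.Random(seed)
    k = len(treatments)
    # cyclic base square: row i is the treatment list rotated by i
    base = [[treatments[(i + j) % k] for j in range(k)] for i in range(k)]
    rng.shuffle(base)                         # randomize the row order
    cols = list(range(k))
    rng.shuffle(cols)                         # randomize the column order
    return [[row[c] for c in cols] for row in base]

plan = randomized_latin_square(list("ABCDE"), seed=1)
for row in plan:
    print(" ".join(row))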
Analysis of variance
There are four sources of variation in LSD. These are variation due to row, column, treatment
and experimental error.
The model is:
Yijk = μ + ri + cj + tk + eijk
Where Yijk denotes the random variable corresponding to the observation yijk in the ith row, jth
column and under the kth treatment; μ, ri, cj and tk (i, j, k = 1, 2, …, K) are fixed effects denoting,
in order, the general mean, the row, the column and the treatment effects. The eijk is the error
component, assumed to be independently and normally distributed with zero mean and a
constant variance.
It is assumed that each observation Yijk follows the model above (i, j, k = 1, 2, …, K).
Data collected from this design are analysed as three-way data. There are k² observations,
even though there are three factors each at k levels. Because there is only one observation
per cell, instead of the k per cell of the usual three-way classified data, we can obtain only the
sums of squares due to each of the three factors and the error sum of squares.
The analysis of variance is conducted by following a similar procedure as described for the
analysis of two-way classified data. The different sums of squares are obtained as below:
Steps:
1. First arrange the data in a row and column table
2. Calculate the row totals (R) and column totals (C) and the grand total (G), Compute the
treatment totals (T).
Let the data be arranged first in a row x column table such that yij denotes the observations of
the ith and jth cells of the table.
Let Ri = Σ Yij = ith row total (i = 1, 2,…,K),
Cj = Σ Yij = jth row total (j = 1, 2,…, K),
Tk = sum of those observations which come from the kth treatment
= kth treatment observation total (s= 1, 2,…,K)
G = Σ Yi = grand total

3. Out line the Analysis of variance of a k x k Latin square design


Source of var.     D.F             S.S     M.S     F-comp.     F-tab
---------------------------------------------------------------------------------------------------------
Rows               k-1
Columns            k-1
Treatment          k-1
Error              (k-1)(k-2)
Total              k²-1

4. Determine the d.f for each source of variation.


Rows k-1
Columns k-1
Treatment k-1
Error (k-1) (k-2)
5. Compute the correction factor and the various sums of squares (see step 3):
C.F. = G²/k²
Total SS = ΣΣY²ij − C.F.
Row SS = (ΣR²i)/k − C.F.
Column SS = (ΣC²j)/k − C.F.
Treatment SS = (ΣT²k)/k − C.F.
Error SS = Total SS − Row SS − Column SS − Treatment SS
6. Compute the mean squares for each source of variation by dividing each SS by its d.f. (see step 3)
7. Compute the F value for testing treatment effects as F (treatment) = Treatment MS / Error MS
8. Calculate the coefficient of variation: CV (%) = (√Error MS / grand mean) x 100
The hypothesis of equal treatment effects is tested by F-test where F is the ratio of treatment
mean squares to error mean squares. If F is not significant, treatment effects do not differ
significantly among themselves. If F is significant, further studies to test the significance of any
treatment contrast (mean comparison) can be made in exactly the same way as discussed for
RCBD.

Example
An experiment to test the response of tef to five rates of fertilizer combination was conducted
in an area which has two fertility gradients perpendicular to each other. The table below gives
the yield (in qt/ha) obtained from each rate of fertilizer combination.
Columns
1 2 3 4 5
Rows
1 B (19.5) E(21.7) A(18.1) D(14.8) C(13.7)
2 D(16.2) B(19.0) C(16.3) A(17.9) E(17.5)
3 A (20.6) D (16.5) E (19.5) C (15.2) B(14.1)
4 E (22.5) C (18.5) D (15.7) B (16.7) A (16.0)
5 C (20.5) A (19.5) B (15.6) E (18.7) D (12.7)
a) Test the hypothesis that the different rates of fertilizer combination have no differential effect
on the yield of tef.
b) Test the reliability of the experimental conditions.
c) Are both row-blocking and column-blocking efficient in this experiment?
d) Test the efficiency of the design used in this experiment over RCBD.
Answer: A 5x5 Latin square design could then be employed, using each fertilizer exactly once
in each row and exactly once in each column.
Table 5. Column and row totals
row Column Row Total
1 2 3 4 5 (R)
1 19.5 21.7 18.1 14.8 13.7 87.8
2 16.2 19.0 16.3 17.9 17.5 86.9
3 20.6 16.5 19.5 15.2 14.1 85.9
4 22.5 18.5 15.7 16.7 16.0 89.4
5 20.5 19.5 15.6 18.7 12.7 87
Column total (C)   99.3   95.2   85.2   83.3   74.0
Table 6. Treatment and grand totals. Master plan/treatment wise arrangement of the results is as
follows:

Treatment Replication Trt total(T)


1 2 3 4 5
A 20.6 19.5 18.1 17.9 16.0 92.1
B 19.5 19.0 15.6 16.7 14.1 84.9
C 20.5 18.5 16.3 15.2 13.7 84.2
D 16.2 16.5 15.7 12.7 14.8 75.9
E 22.5 21.7 19.5 18.7 17.5 99.9
Grand total (G) 437.0
Computation of MS
Row MS = Row SS / d.f. row
Column MS = Column SS / d.f. column
Treatment MS = Treatment SS / d.f. treatment
Error MS = Error SS / d.f. error
Computation of F
F (treatment) = Treatment MS / Error MS
F (row) = Row MS / Error MS
F (column) = Column MS / Error MS
ANOVA TABLE
S. V         df    SS       MS      F-comp     F-tab 5% (1%)
Rows         4     1.36     0.34    0.63ns     3.26 (5.41)
Columns      4     80.73    20.18   37.37**    3.26 (5.41)
Fertilizer   4     65.42    16.35   30.28**    3.26 (5.41)
Error        12    6.43     0.54
Total        24    153.94
Interpretation (Class discussion)
CV (%) =√EMS /grand mean x 100 = √0.54 /17.48 x 100 = 4.20
Experimental conditions are reliable.
SE of a treatment mean = √(0.54/5) = 0.33
LSD at the 1% level = t(0.995; 12 d.f.) x √(2 x 0.54/5) = 3.055 x 0.465 = 1.42
Treatment means (qt/ha), in descending order: E 19.98, A 18.42, B 16.98, C 16.84, D 15.18

            E (19.98)   A (18.42)   B (16.98)   C (16.84)
D (15.18)   4.80*       3.24*       1.80*       1.66*
C (16.84)   3.14*       1.58*       0.14
B (16.98)   3.00*       1.44*
A (18.42)   1.56*
Nine comparisons out of ten are significantly different.
Dear students, what is your Interpretation and conclusion
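For checking the arithmetic of this example, the Latin square sums of squares can be sketched in Python, entering the layout as (treatment, yield) pairs exactly as given in the field plan:

```python
# Sketch of the k x k Latin square ANOVA sums of squares for the tef trial.
layout = [
    [("B", 19.5), ("E", 21.7), ("A", 18.1), ("D", 14.8), ("C", 13.7)],
    [("D", 16.2), ("B", 19.0), ("C", 16.3), ("A", 17.9), ("E", 17.5)],
    [("A", 20.6), ("D", 16.5), ("E", 19.5), ("C", 15.2), ("B", 14.1)],
    [("E", 22.5), ("C", 18.5), ("D", 15.7), ("B", 16.7), ("A", 16.0)],
    [("C", 20.5), ("A", 19.5), ("B", 15.6), ("E", 18.7), ("D", 12.7)],
]
k = len(layout)
grand = sum(y for row in layout for _, y in row)
cf = grand ** 2 / k ** 2                      # correction factor G^2/k^2

row_tot = [sum(y for _, y in row) for row in layout]
col_tot = [sum(layout[i][j][1] for i in range(k)) for j in range(k)]
trt_tot = {}
for row in layout:
    for trt, y in row:
        trt_tot[trt] = trt_tot.get(trt, 0.0) + y

total_ss = sum(y ** 2 for row in layout for _, y in row) - cf
row_ss = sum(t ** 2 for t in row_tot) / k - cf
col_ss = sum(t ** 2 for t in col_tot) / k - cf
trt_ss = sum(t ** 2 for t in trt_tot.values()) / k - cf
error_ss = total_ss - row_ss - col_ss - trt_ss

error_ms = error_ss / ((k - 1) * (k - 2))
f_trt = (trt_ss / (k - 1)) / error_ms
print(round(row_ss, 2), round(col_ss, 2), round(trt_ss, 2), round(error_ss, 2))
```

The printed values (1.36, 80.73, 65.42, 6.43) reproduce the ANOVA table, and F (treatment) works out to about 30.5 before rounding of the mean squares.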

Reading Assignments on
Mean comparison using DNMRT
Missing data estimation technique from single and two plots
3. CHAPTER THREE: TWO OR MORE FACTOR EXPERIMENTS
In chapter two we have seen experiments with a single factor and designs to be used in such
single factor experiments. This chapter illustrates the different two or more factor experiments,

designs to be employed in such experiments and the effect of main factors and their interaction
on the performance of the test materials.
 Two factor experiments
Organisms are simultaneously exposed to many growth factors during their lifetime. Because
an organism's response to any single factor may vary with the levels at which the other factors
are present, single-factor experiments are often criticized for their narrowness. The result of a
single-factor experiment is applicable only to the particular levels at which the other factors
were maintained in the trial.
The differential response of organisms to the factor of interest under different levels of the
other factors encourages the use of factorial experiments, which are designed to handle two or
more variable factors simultaneously, over single-factor experiments. In factorial experiments
the treatments are the combinations of two or more levels of more than one factor; that is,
factorial experiments involve more than one factor, each at two or more levels. If the number
of levels is the same for every factor, the experiment is called a symmetrical factorial; if the
number of levels of one factor differs from that of the other(s), it is called an asymmetrical or
sometimes mixed factorial.
Example of an experiment involving two factors:
a) Nitrogen fertilizer at two levels, denoted N0 and N1
b) Irrigation at three levels (FC, 75%FC and 50%FC)
This is an asymmetrical factorial, since the number of irrigation levels is greater than the
number of fertilizer levels. We can make the following six combinations, which form the
treatments of the factorial experiment.

Table 7. A 2x3 factorial combination of nitrogen fertilizer and irrigation level


Nitrogen Level Irrigation level
FC 75%FC 50%FC
0 (N0) N0FC N075%FC N050%FC
40 (N1) N1FC N175%FC N150%FC
Interaction between two factors

Two factors are said to interact if the effect of one factor changes as the level of the other factor
changes. Defining and describing the different effects of factors (simple, main and interaction
effects) is very important for understanding the effect of each factor, and of their interaction,
on the test material.
A) Simple effect: the effect of changing the level of one factor at a given level of the other factor.
For example, from Table 7, the simple effects of nitrogen fertilizer at given irrigation levels are:
 Simple effect of N at FC = N1FC − N0FC
 Simple effect of N at 75%FC = N1 75%FC − N0 75%FC
Similarly, the simple effects of irrigation level at a given rate of fertilizer are:
 Simple effect of 75%FC at N1 = 75%FC N1 − FC N1
 Simple effect of 50%FC at N1 = 50%FC N1 − FC N1
B) Main effect: the average of the simple effects of one factor over the different levels of the other factor.
For example, the main effect of N fertilizer is
Main effect of N = (1/3)[(N1FC − N0FC) + (N1 75%FC − N0 75%FC) + (N1 50%FC − N0 50%FC)]
Similarly, the main effect of an irrigation level (e.g. 75%FC relative to FC) is the average of its
simple effects over the two nitrogen levels:
Main effect of 75%FC = (1/2)[(75%FC N0 − FC N0) + (75%FC N1 − FC N1)]
C) Interaction effect: measures how the simple effect of factor A changes across the levels of factor B,
or equivalently how the simple effect of factor B changes across the levels of factor A. It is commonly
computed as half the difference between these simple effects. For example, the interaction between
nitrogen and irrigation (taking the FC and 75%FC levels) is expressed as:
N x irrigation = (1/2)[(N1 75%FC − N0 75%FC) − (N1FC − N0FC)]
Or
= (1/2)[(75%FC N1 − FC N1) − (75%FC N0 − FC N0)]
Note:

1. An interaction effect between two factors can be measured only if the two factors are tested
together in the same experiment.
2. When interaction is absent the simple effect of a factor is the same for all levels of the other
factor and equals the main effect.
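A small numeric sketch of these definitions for a 2x2 case. The yield figures are made up for illustration, and the interaction is taken as half the difference of the simple effects, a common convention:

```python
# Sketch of simple, main and interaction effects for two varieties (X, Y)
# at two nitrogen levels (N0, N1). Yields (t/ha) are illustrative only.
yield_ = {("X", "N0"): 2.0, ("X", "N1"): 3.0,
          ("Y", "N0"): 2.5, ("Y", "N1"): 4.5}

simple_N_at_X = yield_[("X", "N1")] - yield_[("X", "N0")]   # effect of N on X
simple_N_at_Y = yield_[("Y", "N1")] - yield_[("Y", "N0")]   # effect of N on Y
main_N = (simple_N_at_X + simple_N_at_Y) / 2                # average simple effect
interaction = (simple_N_at_Y - simple_N_at_X) / 2           # half the difference
print(simple_N_at_X, simple_N_at_Y, main_N, interaction)    # -> 1.0 2.0 1.5 0.5
```

Because the two simple effects differ (1.0 vs 2.0), the interaction is non-zero and the main effect no longer equals either simple effect.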

A graphical representation of the different magnitudes of interaction between factors (yield in
t/ha plotted against nitrogen levels N0 and N1 for two varieties X and Y) would show:
a) parallel response lines, i.e. no interaction between variety (X and Y) and fertilizer rate (N0 and N1);
b) and c) lines with unequal slopes, i.e. intermediate interactions; and
d) crossing lines, i.e. high interaction.
When interaction is present the simple effect of a factor changes as the level of the other factor
changes. Consequently, the main effect is different from the simple effect.

Notation of factorial experiments


A factorial experiment consists of all possible combinations of selected levels of two or more
factors. Factors are generally denoted by the letters A, B, C, etc., unless otherwise specified,
and several types of notation for the levels of the factors are in use. These notations are always
in codes: for example, a0 and a1 are the level notations for factor A, and similarly for the other
factors. They can represent both qualitative and quantitative levels, with 0 and 1 used as codes.
Factors A and B, each at two levels, can therefore be denoted and written either as
 a0b0, a0b1, a1b0 and a1b1, or
 00, 01, 10 and 11 (written in code)
An experiment involving two factors, each at two levels, such as two varieties and two nitrogen
rates, is expressed as a 2x2 or 2² factorial experiment. For example, if we consider two
varieties of wheat (A and B) and two rates of N fertilizer (0 kg/ha and 60 kg/ha), the four possible
combinations to be derived are:
Table 8. A 2x2 factorial combination of variety and nitrogen rate
Treatment No. Treatment combination
Variety N rate, kg/ha
1 A 0
2 A 60
3 B 0
4 B 60
This is a 2² factorial experiment.
If a third factor is introduced into this experiment, say supplementary irrigation at two
levels, the experiment becomes a 2x2x2 or 2³ factorial experiment with eight possible treatment
combinations.
Table 9. A 2³ factorial combination of variety, nitrogen rate and supplementary irrigation
Treatment No. Treatment combination
Variety N rate, kg/ha Supplementary irrigation
1 A 0 With
2 A 0 Without
3 A 60 With
4 A 60 Without
5 B 0 With
6 B 0 Without
7 B 60 With
8 B 60 Without
N.B. The term factorial describes the way in which treatments are formed; it does not refer to
the design of an experiment.
A full and correct description of a factorial experiment therefore includes the design used, for
example, a 2³ factorial experiment in a randomized complete block design. Such a description
also indicates the total number of treatments in the factorial experiment: for example,
the expression 2³ shows that the total number of treatments is eight in this case.
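Generating the complete set of treatment combinations of a factorial experiment can be sketched as follows (the factor names and levels are those of the 2x2x2 example in Table 9):

```python
# Sketch: enumerating all treatment combinations of a 2x2x2 factorial
# experiment with itertools.product.
from itertools import product

variety = ["A", "B"]
n_rate = [0, 60]                    # kg/ha
irrigation = ["with", "without"]    # supplementary irrigation

treatments = list(product(variety, n_rate, irrigation))
print(len(treatments))              # -> 8 treatment combinations
for trt in treatments:
    print(trt)
```

The number of combinations is always the product of the numbers of levels, so a 3x5 factorial (as in the next section) would give 15 treatments.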
A model representing a two-factor factorial experiment in RCBD has the form:
Yijk = μ + ρk + αi + βj + (αβ)ij + εijk
where ρk is the block effect, αi and βj are the main effects of factors A and B, (αβ)ij is their
interaction effect and εijk is the random error.
Advantages of factorial experiments
 Individual effects of each factor & their interaction effects can be studied simultaneously
 More information is obtained
 Can save the experimental resources
Disadvantage
When the number of factors or levels of factors or both are increased the number of treatment
combinations will also increase. Thus,
 More factors and levels increase treatment size
 Decrease homogeneity of experimental material
 Increased experimental error and loss of precision
 Execution of experiment and analysis is complex
3.2. Experimental Designs Used in Two or More Factorial Experiments
For most factorial experiments, the number of treatments is usually too large for an efficient
use of a complete block design. Incomplete block designs such as lattice design, which
sufficiently used for single factor experiments, are also not appropriate for such factorial
experiments. Hence special type of designs, developed specifically for factorial experiment that
are comparable to the incomplete block designs for single factor experiments are needed.
3.2.1. Factorials in Complete Block Designs
The complete block designs discussed for single-factor experiments are applicable to factorial
experiments. During randomization and layout, ignore the factor composition of the factorial
treatments and consider all the treatment combinations as if they were unrelated. In the
analysis of variance for such factorial experiments, partitioning the treatment sum of squares
into factorial components corresponding to the main effects of the individual factors and to
their interaction needs special attention.
A step-by-step procedure for the analysis of variance of a two-factor experiment in RCBD
will be illustrated with an experiment involving five rates of nitrogen fertilizer and three rice
varieties, each treatment combination replicated four times.
Step 1. Denote the different factors with appropriate levels.
Factors involved and their level are:
 Variety (V1, V2, V3)
 Nitrogen level (0, 40, 70, 100, 130) kg/ha
Table 10. The 3x5 factorial treatment combinations of three rice varieties and five nitrogen rates
Nitrogen Level Varieties
V1 V2 V3
0 (N0) N0V1 N0V2 N0V3
40 (N1) N1V1 N1V2 N1V3
70 (N2) N2V1 N2V2 N2V3
100 (N3) N3V1 N3V2 N3V3
130 (N4) N4V1 N4V2 N4V3
Table 11. Grain yield of three rice varieties tested with five levels of Nitrogen in a RCB design.
Grain yield t/ha
N level, kg/ha   B1      B2      B3      B4      Treatment total (T)
V1
N0               3.852   2.606   3.144   2.894   12.496
N1               4.788   4.936   4.562   4.608   18.894
N2 4.576 4.454 4.884 3.924 17.838
N3 6.034 5.276 5.906 5.652 22.868
N4 5.874 5.916 5.984 5.518 23.292
V2
N0 2.846 3.794 4.108 3.444 14.192
N1 4.956 5.128 4.150 4.990 19.224
N2 5.928 5.698 5.810 4.308 21.744
N3 5.664 5.362 6.458 5.474 22.958
N4 5.458 5.546 5.786 5.932 22.722
V3
N0 4.192 3.754 3.738 3.428 15.112
N1 5.250 4.582 4.896 4.286 19.014
N2 5.822 4.848 5.678 4.932 21.280
N3 5.888 5.524 6.042 4.756 22.210
N4 5.864 6.264 6.056 5.362 23.546
Rep. total 76.992 73.688 77.202 69.508
Grand Tot 297.390

Step 2. Construct the sketch outline of the analysis of variance as:


Source of var. DF Sum of Mean Computed Tabulated F at
squares square F 5% 1%
Replication r-1
Treatment ab-1
Variety (A) a-1
Nitrogen(B) b-1
A x B           (a-1)(b-1)
Error           (r-1)(ab-1)
Total           rab-1
Step 3. Compute the treatment totals (T), replication (block) totals (R) and grand total (G),
then the total SS, replication SS, treatment SS and error SS.

Step 4. Following the same procedure as in Chapter Two, compute the different mean squares
and the F values.
Source of var. Degree of Sum of Mean Computed F Tabulated F at
freedom squares square 5% 1%
Replication 3 2.599 0.866 5.74** 2.83 4.29
Treatment 14 44.578 3.184 21.09** 1.94 2.54
Variety (A) 2
Nitrogen(B) 4
AxB 8
Error 42 6.353 0.151
Total 59 53.53
**=The variation between replication and treatments is significant at 1% level
Step 5. Construct the factor A x factor B two-way table of totals.
Table 12. The variety x Nitrogen table of totals from data in table 11
Variety total (AB) Nitrogen total
Nitrogen V1 V2 V3 (B)
N0 12.496 14.192 15.112 41.800
N1 18.894 19.224 19.014 57.132
N2 17.838 21.744 21.280 60.862
N3 22.868 22.958 22.210 68.036
N4 23.292 22.722 23.546 69.560
Variety total(A) 95.388 100.840 101.162 297.390
Step 6. Compute the three factorial components of the treatment sum of squares from Table 12 as:
A (variety) SS = (ΣA²)/(rb) − C.F.
B (nitrogen) SS = (ΣB²)/(ra) − C.F.
A x B SS = (Σ(AB)²)/r − C.F. − A SS − B SS
where A and B denote the variety and nitrogen totals, and AB the variety x nitrogen treatment totals.
Step 7. Compute the mean square for each source of variation by dividing its SS by its
respective d.f.

Step 8. Compute the F value for each of the three factorial components as
F (A) = A MS / Error MS, F (B) = B MS / Error MS, F (A x B) = A x B MS / Error MS
Step 9. Enter values computed in step 6 to 8 into table under step 4

Source of var. Degree of Sum of Mean Computed F Tabulated F at


freedom squares square 5% 1%
Replication 3 2.599 0.866 5.74** 2.83 4.29
Treatment 14 44.578 3.184 21.09** 1.94 2.54
Variety (A) 2 1.052 0.526 3.48* 3.22 5.15
Nitrogen(B) 4 41.234 10.308 68.26** 2.59 3.80
AxB 8 2.292 0.286 1.89ns 2.17 2.96
Error 42 6.353 0.151
Total 59 53.53

CV (%) =7.8

**=The variation between replication and treatments is significant at 1% level, *= significant
at 5% level and ns= not significant
Step 10. Compute the coefficient of variation:
CV (%) = (√Error MS / grand mean) x 100 = (√0.151 / 4.957) x 100 = 7.8
Summary of results: there is a substantial yield difference as the rate of nitrogen fertilizer
varies, and the response of the three rice varieties to a particular rate of nitrogen is also
substantial. The yield of each variety increases as the fertilizer rate increases from 0 to
130 kg/ha. However, from the overall F test alone it is difficult to say whether there is a real
difference between particular rates of fertilizer in their effect on yield. Hence, in this case it
is advisable to conduct further analysis using orthogonal comparisons or contrast analysis.
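The Step 6 partition can be checked numerically with a short Python sketch built from the Table 12 treatment totals (r = 4 replications; variable names are illustrative):

```python
# Sketch of the factorial partition of the treatment SS for the 3x5
# variety x nitrogen trial, computed from the variety x nitrogen totals.
ab_totals = {  # (variety, nitrogen): treatment total over 4 blocks
    ("V1", "N0"): 12.496, ("V1", "N1"): 18.894, ("V1", "N2"): 17.838,
    ("V1", "N3"): 22.868, ("V1", "N4"): 23.292,
    ("V2", "N0"): 14.192, ("V2", "N1"): 19.224, ("V2", "N2"): 21.744,
    ("V2", "N3"): 22.958, ("V2", "N4"): 22.722,
    ("V3", "N0"): 15.112, ("V3", "N1"): 19.014, ("V3", "N2"): 21.280,
    ("V3", "N3"): 22.210, ("V3", "N4"): 23.546,
}
r, a, b = 4, 3, 5
grand = sum(ab_totals.values())
cf = grand ** 2 / (r * a * b)            # correction factor

v_tot, n_tot = {}, {}                    # marginal totals per factor level
for (v, n), tot in ab_totals.items():
    v_tot[v] = v_tot.get(v, 0.0) + tot
    n_tot[n] = n_tot.get(n, 0.0) + tot

a_ss = sum(t ** 2 for t in v_tot.values()) / (r * b) - cf
b_ss = sum(t ** 2 for t in n_tot.values()) / (r * a) - cf
trt_ss = sum(t ** 2 for t in ab_totals.values()) / r - cf
ab_ss = trt_ss - a_ss - b_ss
print(round(trt_ss, 2), round(a_ss, 2), round(b_ss, 2), round(ab_ss, 2))
```

The printed values (44.58, 1.05, 41.23, 2.29) match the ANOVA table in Step 9 up to rounding.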

3.2.2. Split-Plot Design


Split-plot design is specifically suited to a two-factor experiment with more treatments than
can be accommodated by a complete block design. The experimental units are divided at two
levels, the main plot and the subplot, and the treatments allocated to them are called the
main-plot factor and the subplot factor, respectively. Each main plot thus becomes a block for
the subplot treatments (i.e. for the levels of the subplot factor).
Precision of split-plot design
With this design, the precision for the measurement of the effects of the main plot factor is
sacrificed to improve that of the subplot factor precision. Split-plot design measures the main
effect of the subplot factor and its interaction with the main plot factor more precisely than
RCBD. However, effect of main plot factor (i.e. the levels of the main plot factor) is more
precisely measured by Randomizes complete block design. This is because plot size and
precision of measurements of the effect are not the same for both factors with split-plot design.
Because of this difference in precision, assignment of factor to main plot and subplot is very
crucial in split plot design.
Considerations during factors assignment to experimental plots
I. Degree of precision
The level of precision required for a particular factor varies with the purpose of the trial. For
example, in an experiment involving varieties and fertilizer rates, a plant breeder would assign
variety to the subplot, wishing greater precision for varietal comparison than for fertilizer
response. On the other hand, an agronomist who wishes to study the fertilizer response of some
promising crop varieties would probably want greater precision for the fertilizer response than
for the varietal effect, and would assign fertilizer to the subplot.
II. Relative size of the main effects
If the main effect of factor A is expected to be much larger and easier to detect than that of
factor B, factor A can be assigned to the main plot and factor B to the subplot. This increases
the chance of detecting the difference among factor B which has a smaller effect.
Example: in water stress level x variety experiment, variety will be assigned to the subplot and
stress level to the main plot.
III. Management practices
The cultural practices required by a factor may also dictate the use of large plots. For instance,
in an experiment to evaluate water management x variety, it may be desirable to assign water
management to the main plot and variety to the subplot. Such an arrangement is used to:
 minimize water movement between adjacent plots
 facilitate simulation of the required water level
 reduce border effects
Randomization and layout
In a split-plot design, both the randomization and the analysis of variance involve two separate
processes: one for the main plot and another for the subplot.

During randomization:
1. Main plot treatments are first assigned to the main plot in each replications
2. The subplot treatments are then assigned randomly to the subplot
3. Randomization is done following one of the randomization methods described in RCBD.
4. Represent the number of main plot treatments by a, number of subplot treatments by b
and replication numbers by r. These labels should be used consistently throughout the
experiment.
The steps in the randomization and layout of a split-plot design are shown below using a as the
number of main plot treatments, b as the number of subplot treatments and r as the number of
replications.

Example: A researcher initiated an experiment to evaluate the combined effect of N fertilizer
and amount of irrigation water on the performance of maize. He used four irrigation levels and
six nitrogen levels, each treatment combination replicated three times. The maize yield
recorded at the end of the experiment is given in Table 13.
 Decide on the allocation of treatments to the main plots and subplots
 Randomize the treatments in both the main plots and the subplots
 Test the hypothesis that the nitrogen use efficiency of the crop is not affected by soil moisture
content
 Test the reliability of the experimental conditions

Table 13. Grain yield data of maize from factorial experiment involving four irrigation level
and six levels of Nitrogen in a split-plot design with three replications
Grain yield, t/ha
Rep 1 Rep 2 Rep 3
Nitrogen level I0
N0 4.430 4.478 3.850
N1 3.944 5.314 3.660
N2 3.464 2.944 3.142
N3 4.126 4.482 4.836
N4 5.418 5.166 6.432
N5 6.502 5.858 5.586
I1
N0 4.768 6.004 5.556
N1 5.192 4.604 4.652
N2 6.076 6.420 6.704
N3 6.008 6.127 6.642
N4 6.244 5.724 6.014
N5 4.546 5.744 4.146
I2
N0 6.462 7.056 6.680
N1 7.139 6.982 6.564
N2 5.792 5.880 6.370
N3 2.774 5.036 3.638
N4 7.290 7.848 7.552
N5 7.682 6.594 6.576
I3
N0 7.080 6.662 6.320
N1 1.414 1.960 2.766
N2 8.452 8.832 8.818
N3 6.228 7.387 6.006
N4 5.594 7.122 5.480
N5 2.248 1.380 2.014
Approaches to the problem
1. Divide the experimental area into r = 3 blocks, each of which is further divided into a =
4 main plots. Here irrigation is assigned to the main plots because its effect is expected to be
larger than that of fertilizer and its management requires larger plots; nitrogen, for which
greater precision is desired, is assigned to the subplots.
2. Following the RCB randomization procedure for a = 4 treatments, randomly assign the 4
irrigation-level treatments to the 4 main plots within each of the 3 blocks. One possible
outcome (the arrangement carried into step 3) is:
Rep I: I3 I2 I0 I1     Rep II: I1 I0 I2 I3     Rep III: I0 I1 I3 I2

3. Divide each of the (r)(a) = 12 main plots into b = 6 subplots and, following the RCB
randomization procedure for b = 6 treatments and (r)(a) = 12 replications, randomly assign the
six nitrogen levels to the six subplots in each of the 12 main plots.
I3 I2 I0 I1     I1 I0 I2 I3     I0 I1 I3 I2

N2 N3 N1 N4     N4 N2 N3 N1     N3 N5 N4 N1
N5 N4 N2 N1     N1 N5 N4 N2     N4 N0 N1 N2
N3 N0 N0 N5     N3 N2 N5 N0     N2 N1 N3 N5
N0 N1 N3 N2     N1 N0 N2 N5     N5 N2 N0 N3
N1 N5 N4 N3     N5 N4 N0 N3     N0 N3 N2 N4
N4 N2 N5 N0     N0 N3 N1 N4     N1 N4 N5 N0
   Rep I           Rep II          Rep III
Note: Important features of field layout of split-plot design
1. The size of the main plot is b times the size of the subplot. In this example the size of
the main plot is 6 times the size of the subplot
2. Each main-plot treatment is tested r = 3 times, while each subplot treatment is tested (a)(r)
= 12 times, so a subplot treatment is always tested more often than a main-plot treatment. This
is the reason for the higher precision of subplot treatment comparisons relative to main-plot
treatment comparisons.
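The randomization steps above can be sketched in Python. This is a minimal illustrative sketch, not part of the handout: the function name and treatment labels are assumptions, and the main plots follow an RCB randomization within each block while each main plot gets its own independent subplot randomization.

```python
import random

def split_plot_layout(main_treats, sub_treats, reps, seed=1):
    """Randomize a split-plot layout: main-plot treatments follow an
    RCB randomization within each replication, and subplot treatments
    are randomized independently within every main plot."""
    random.seed(seed)
    layout = []
    for rep in range(1, reps + 1):
        mains = main_treats[:]
        random.shuffle(mains)          # RCB randomization of main plots
        block = []
        for m in mains:
            subs = sub_treats[:]
            random.shuffle(subs)       # fresh randomization per main plot
            block.append((m, subs))
        layout.append((rep, block))
    return layout

# Example: 4 irrigation levels as main plots, 6 N levels as subplots, 3 reps
plan = split_plot_layout(["I0", "I1", "I2", "I3"],
                         ["N0", "N1", "N2", "N3", "N4", "N5"], 3)
for rep, block in plan:
    print("Rep", rep, [(m, " ".join(subs)) for m, subs in block])
```

Every replication contains each irrigation level once, and every main plot contains each nitrogen level once, mirroring the layout rules above.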
Analysis of variance for split-plot design
Analysis of variance for a split-plot design has two components: the main-plot analysis
and the subplot analysis.
Approaches to analysis of variance
Denote the main plot factor as A and subplot factor as B and do the different
computations.

1. Construct an outline of the analysis of variance for a split-plot design as:

Source of variation     DF             Sum of squares   Mean square   Computed F   Tab F 5%   Tab F 1%
Replication             r-1
Main-plot factor (A)    a-1
Error (a)               (r-1)(a-1)
Subplot factor (B)      b-1
A x B                   (a-1)(b-1)
Error (b)               a(r-1)(b-1)
Total                   rab-1
Note: The ANOVA of a split plot design is divided into main-plot analysis and subplot analysis

a) Construct tables of totals: the main-plot factor against replications, and the main-plot
factor against the subplot factor.
Table 14. The replication x irrigation level table of yield totals computed from data in Table 13

Irrigation              Yield total (RA), t/ha             Irrigation total (A)
                        Rep I      Rep II     Rep III
I0                      27.884     28.242     27.506       83.632
I1                      32.834     34.623     33.714       101.171
I2                      37.139     39.396     37.380       113.915
I3                      31.016     33.343     31.404       95.763
Replication total (R)   128.873    135.604    130.004
Grand total                                                394.481
b) Construct a table of totals of factor A and factor B

Irrigation           Yield total (AB), t/ha                                    Irrigation total (A)
                     N0        N1        N2        N3        N4        N5
I0                   12.758    12.918    9.550     13.444    17.016    17.946   83.632
I1                   16.328    14.448    19.200    18.777    17.982    14.436   101.171
I2                   20.198    20.685    18.042    11.448    22.690    20.852   113.915
I3                   20.062    6.140     26.102    19.621    18.196    5.642    95.763
Nitrogen total (B)   69.346    54.191    72.894    63.290    75.884    58.876
Grand total                                                                     394.481

1. Compute the correction factor and the sums of squares for the main-plot analysis as:
CF = G2/rab
Total SS = ∑X2 - CF
Replication SS = (∑R2)/ab - CF
A SS = (∑A2)/rb - CF
Error (a) SS = (∑(RA)2)/b - CF - Replication SS - A SS
where G is the grand total, R a replication total, A a main-plot treatment total and RA a
replication x main-plot total (Table 14).


2. Compute the sums of squares for the subplot analysis as:
B SS = (∑B2)/ra - CF
A x B SS = (∑(AB)2)/r - CF - A SS - B SS
Error (b) SS = Total SS - (the sum of all other SS)
where B is a subplot treatment total and AB an A x B treatment total.

3. For each source of variation, compute the mean square by dividing the SS by its
corresponding d.f.:

4. Compute the F-value for each effect that needs to be tested by dividing each mean square
by its corresponding error term:
F(A) = A MS/Error (a) MS
F(B) = B MS/Error (b) MS
F(A x B) = A x B MS/Error (b) MS

Obtain the corresponding tabular F-values from Appendix E (Gomez and Gomez, 1984).
5. Calculate the coefficients of variation for the main-plot and subplot analyses:
cv(a) = (√Error (a) MS/grand mean) x 100
cv(b) = (√Error (b) MS/grand mean) x 100

6. Fill the ANOVA table with the values calculated for the main-plot and subplot analyses.

Source of variation     DF    Sum of squares   Mean square   Computed F   Tab F 5%   Tab F 1%
Replication             2     1.082            0.541
Main-plot factor (A)    3     25.291           8.430         193.8**      8.94       27.91
Error (a)               6     0.262            0.044
Subplot factor (B)      5     29.465           5.893         3.99ns       4.46       9.29
A x B                   15    133.904          8.926         6.05**       2.21       3.12
Error (b)               40    14.745           1.474
Total                   71    204.748
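The sum-of-squares partition walked through above can be cross-checked with a short NumPy sketch. This is an illustrative helper, not part of the handout: `split_plot_ss` is a hypothetical function name, and storing the yields in an array of shape (r, a, b) is an assumption about the data layout.

```python
import numpy as np

def split_plot_ss(y):
    """Sum-of-squares partition for a split-plot design.
    y has shape (r, a, b): replications x main-plot factor (A) x subplot factor (B)."""
    r, a, b = y.shape
    G = y.sum()
    cf = G ** 2 / (r * a * b)                     # correction factor
    total = (y ** 2).sum() - cf
    rep = (y.sum(axis=(1, 2)) ** 2).sum() / (a * b) - cf
    A = (y.sum(axis=(0, 2)) ** 2).sum() / (r * b) - cf
    err_a = (y.sum(axis=2) ** 2).sum() / b - cf - rep - A    # from RA totals
    B = (y.sum(axis=(0, 1)) ** 2).sum() / (r * a) - cf
    AB = (y.sum(axis=0) ** 2).sum() / r - cf - A - B         # from AB totals
    err_b = total - (rep + A + err_a + B + AB)
    return {"Rep": rep, "A": A, "Error(a)": err_a,
            "B": B, "AxB": AB, "Error(b)": err_b, "Total": total}
```

By construction, the six component sums of squares add up to the total SS, which is a useful check on hand computations like those in the table above.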
Result summary: For each irrigation level, increasing the nitrogen rate did not necessarily
increase the productivity of maize, as the main effect of nitrogen is non-significant. The
recorded yields were inconsistent as the fertilizer rate increased within each irrigation level,
and so was the overall yield response to nitrogen. Some of the yield figures suggest that errors
were committed during data recording, which is supported by the large cv(b) value (22.2%),
indicating unreliable experimental conditions in the subplots.

Dear students, discuss the results by comparing the treatments and draw your conclusions.
Exercise:
The response of six varieties of lettuce, grown in frames, to various uncovering dates was
investigated in a split-plot experiment with four blocks. The main-plot treatments were three
uncovering dates, and each main plot was split into six subplots for the six varieties. The final
recorded yield is given in Table 15.

Table 15. Yield data of lettuce from a factorial experiment involving three uncovering dates
and six lettuce varieties in a split-plot design with four blocks.

Uncovering date   Variety   Block
                            I       II      III     IV
X                 A         11.8    7.5     9.7     6.4
                  B         8.3     8.4     11.8    8.5
                  C         9.2     10.6    11.4    7.2
                  D         15.6    10.8    10.3    14.7
                  E         16.2    11.2    14.0    11.5
                  F         9.9     10.8    4.8     9.8
Y                 A         9.7     8.8     12.5    9.4
                  B         5.4     12.9    11.2    7.8
                  C         12.1    15.6    7.6     9.4
                  D         13.2    11.3    11.0    10.7
                  E         16.5    11.1    10.8    8.5
                  F         12.5    14.3    15.9    7.5
Z                 A         7.0     9.1     7.1     6.3
                  B         5.7     8.4     6.1     8.8
                  C         3.3     6.9     1.0     2.6
                  D         12.6    15.4    14.2    11.3
                  E         12.6    12.3    14.4    14.1
                  F         10.2    11.6    10.4    12.2
I. Formulate a possible hypothesis for the research
II. Do the analysis of variance for this experiment and test your hypothesis
III. Test the reliability of the experimental conditions
IV. Write a short summary of your final results
4. CHAPTER FOUR: INCOMPLETE BLOCK DESIGNS
Theoretically, complete block designs, such as the CRD, RCBD and LSD, are applicable to
experiments with any number of treatments. However, these complete block designs become
less efficient as the number of treatments increases, primarily because block size increases
proportionally with the number of treatments, and the homogeneity of experimental plots
within a large block is difficult to maintain. That is, the experimental error of a complete block
design is generally expected to increase with the number of treatments.
An alternative set of designs for single factor experiments having a large number of treatments
is incomplete block designs, one of which is the lattice design. As the name implies, each block
in an incomplete block design does not contain all treatments and a reasonably small block size
can be maintained even if the number of treatment is large. With smaller blocks, the
homogeneity of experimental plots or units in the same block is easier to maintain and a higher
degree of precision can generally be expected.
The improved precision with the use of an incomplete block design is achieved with some
disadvantages (costs). The major ones are:
 Inflexible number of treatments or replications or both
 Unequal degree of precision in the comparison of treatment means
 Complex data analysis, especially data with missing value.
Although there is no concrete rule as to how large the number of treatments should be before
the use of an incomplete block design should be considered, the following guidelines may be
helpful:
Variability in the Experimental Material: The advantage of an incomplete block design over
complete block design is enhanced by an increased variability in the experimental material. In
general, whenever block size in
RCB design is too large to maintain a reasonable level of uniformity among experimental units
within the same block, the use of an incomplete block design should be seriously considered.
Computing Facilities and Services: Data analysis for an incomplete block design is more
complex than that for a complete block design. Thus, in situations where adequate computing

facilities and services are not easily available, incomplete block designs may have to be
considered only as a last resort.
In general, an incomplete block design with its reduced block size is expected to give a higher
degree of precision than complete block design. Thus the use of an incomplete block design
should generally be preferred so long as the resources required for its use (example more
replications, inflexible number of treatments, and more complex data analysis) can be satisfied
1. Lattice Design
The lattice design is the incomplete block design most commonly used in field research. There
is sufficient flexibility in the design to make its application simpler than most of other
incomplete block designs. This section is primarily devoted to two of the most commonly used
lattice designs, the balanced and partially balanced lattice designs. Both require that number of
treatment must be a perfect square.
1.1 Balanced Lattice Design

The balanced lattice design is characterized by the following basic features:


1. The number of treatments (t) must be a perfect square (i.e., t = k2), such as 25,
36, 49, 64, 81, 100, etc. Although this requirement may seem restrictive at first, it is usually
easy to satisfy in practice. As the number of treatments becomes large, adding a few more or
eliminating some less important treatments is usually easy to accomplish. For example, if a
plant breeder wishes to test the performance of 80 varieties in a balanced lattice design, all he
needs to do is add one more variety to make a perfect square; or if he has 82 or 83 varieties, he
can easily eliminate one or two varieties.
2. The block size (k) is equal to the square root of the number of treatments (i.e., k = t1/2).
3. The number of replications (r) is one more than the block size (i.e., r = k+1).
That is, the number of replications required is 6 for 25 treatments, 7 for 36 treatments, 9 for
64 treatments, and so on.
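The three constraints above can be sketched as a small Python check; the function name is illustrative, not part of the handout.

```python
import math

def balanced_lattice_params(t):
    """Return (block size k, number of replications r) for a balanced
    lattice design with t treatments: t must be a perfect square
    (t = k**2), k = sqrt(t), and r = k + 1."""
    k = math.isqrt(t)
    if k * k != t:
        raise ValueError(f"t = {t} is not a perfect square; "
                         "add or drop a few treatments")
    return k, k + 1

print(balanced_lattice_params(25))   # 25 treatments: blocks of 5, 6 replications
```

For example, the plant breeder with 80 varieties would get a ValueError here, which is the signal to add one variety and reach the perfect square 81.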

Randomization: here we illustrate the randomization of nine treatments in four
replications, each consisting of three incomplete blocks, with each block containing three
experimental units.
In this example the basic plan for a 3x3 balanced lattice design is shown
below (Appendix L of the Gomez and Gomez book was used as a reference).


Basic plan of a 3x3 balanced lattice design involving 9 treatments in blocks of three and four
replications

Incomplete block   Treatment number
                   REP 1    REP 2    REP 3    REP 4
1                  1,2,3    1,4,7    1,5,9    1,6,8
2                  4,5,6    2,5,8    2,6,7    2,4,9
3                  7,8,9    3,6,9    3,4,8    3,5,7
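The defining "balanced" property of this plan is that every pair of treatments occurs together in the same incomplete block exactly once. A small Python sketch (illustrative, not part of the handout) can verify this for the basic plan:

```python
from itertools import combinations

# Basic 3x3 balanced lattice plan from the text:
# four replications, each of three incomplete blocks of three treatments.
plan = [
    [(1, 2, 3), (4, 5, 6), (7, 8, 9)],   # Rep 1
    [(1, 4, 7), (2, 5, 8), (3, 6, 9)],   # Rep 2
    [(1, 5, 9), (2, 6, 7), (3, 4, 8)],   # Rep 3
    [(1, 6, 8), (2, 4, 9), (3, 5, 7)],   # Rep 4
]

# Count how many times each pair of treatments shares a block
pair_counts = {}
for rep in plan:
    for block in rep:
        for pair in combinations(sorted(block), 2):
            pair_counts[pair] = pair_counts.get(pair, 0) + 1

# In a balanced lattice every pair occurs together exactly once
print(all(count == 1 for count in pair_counts.values()))  # True
```

All 36 possible pairs of the 9 treatments are covered exactly once across the 12 incomplete blocks, which is what makes all treatment comparisons equally precise.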

The final field plan (lay out) of a 3x3 quadruple lattice design

Rep I              Rep II
Block 1: 5 2 8     Block 1: 5 4 6
Block 2: 6 9 3     Block 2: 2 3 1
Block 3: 4 1 7     Block 3: 8 9 7

Rep III            Rep IV
Block 1: 8 1 6     Block 1: 9 5 1
Block 2: 9 2 4     Block 2: 4 3 8
Block 3: 7 3 5     Block 3: 2 7 6

If you want to know more on the randomization and lay out procedures you can refer to Gomez
and Gomez text book.
Analysis of Variance: There are four sources of variations that can be accounted for in a
balanced lattice design: replication, treatment, incomplete block and experimental error.
Relative to the RCBD the incomplete block is an additional source of variation and reflects the
differences among incomplete blocks of the same replication.
1.2 Partially Balanced Lattice Design
The partially balanced lattice design is similar to the balanced lattice design but allows a
more flexible choice of the number of replications. While the partially balanced lattice
design requires that the number of treatments be a perfect square and that the block
size equal the square root of the number of treatments, the number of replications is not
prescribed as a function of the number of treatments. In fact, any number of replications can
be used in a partially balanced lattice design.

With two replications, the partially balanced lattice design is referred to as a simple lattice
design; with three replications, a triple lattice design; with four replications, a quadruple
lattice design; and so on.
Randomization: The randomization procedure is similar to that used in the balanced
lattice design, except for the modification of the number of replications. Any number of
replications can be used in a partially balanced lattice.
Analysis of Variance: The analysis of variance aims at partitioning the total sum of squares
into its possible sources of variation and testing the significance of treatment means. The
procedure of analysis of variance in a partially balanced lattice is similar to that of the
balanced lattice design. The sources of variation in a partially balanced lattice design are
replications, treatments (unadjusted), treatments (adjusted), blocks (adjusted) and
experimental error.
2. Group Balanced Block Design
The primary feature of the group balanced block design is the grouping of treatments into
homogeneous blocks based on selected characteristics of the treatments. Whereas the lattice
design achieves homogeneity within blocks by grouping experimental plots based on some
known pattern of heterogeneity in the experimental area, the group balanced block design
achieves the same objective by grouping treatments based on some known characteristics of
the treatments.
In a group balanced block design, treatments belonging to the same group are always tested in
the same block, but those belonging to different groups are never tested together in the same
block. Hence, the precision with which the different treatments are compared is not the same
for all comparisons. Treatments belonging to the same group are compared with a higher
degree of precision than those belonging to different groups. The group balanced block design
is commonly used in variety trials where varieties with similar morphological characters are put
together in the same group. Two of the most commonly used criteria for grouping of varieties
are:
 Plant height, in order to avoid the expected large competition effects when plants with
widely different height are grown in adjacent plots.
 Growth duration, in order to minimize competition effects and to facilitate harvest
operation

Another type of trial using group balanced block design is that involving chemical insect
control in which treatments may be subdivided into similar spray operations to facilitate the
field applications of chemicals.
Analysis of variance - Analysis of variance in a group balanced block design follows a similar
procedure to that of the other designs discussed so far.

5. CHAPTER FIVE: MANAGEMENT OF PROBLEM DATA


5.1. Missing Data
Analysis of variance is valid only if the basic research data satisfy certain conditions. Some
of these conditions are implied, others are specified. In field experiments, for example, it is
implied that all plots are grown successfully and that all the necessary data are taken and
recorded. In addition, it is specified that the data satisfy all the mathematical assumptions
underlying the analysis of variance. In this lecture, we examine a data problem that is
commonly encountered in agricultural research: missing data.
A missing data situation occurs whenever a valid observation is not available for any one of
the experimental units. The occurrence of missing data results in two major difficulties: loss
of information and non-applicability of the standard analysis of variance.
Common causes of missing data:
Though data gathering in field experiments is usually done with extreme care, numerous factors
beyond the researcher’s control can contribute to missing data. Some of these causes are:
1. Improper treatment: declared when an experiment has one or more experimental plots
that do not receive the intended treatment. Non-application, application of an incorrect dose,
and wrong timing of application are common cases of improper treatment. Thus any
observation made on a plot where the treatment has not been properly applied should be
considered invalid.
2. Destruction of experimental plants: most field experiments aim for a perfect stand in all
experimental plots, but this is not always achieved due to poor germination, physical damage
during crop culture and pest damage. In such cases, the affected plots should be treated as
missing plots.
3. Loss of harvested samples: during or after harvesting, one or more plot samples may be
lost due to improper handling of the research data in the field, store or laboratory.

4. Illogical data: illogical values may appear in the data set due to incorrect recording,
transcription, encoding or decoding of data. Where this occurs, such illogical data should be
discarded and the corresponding plots treated as missing values.
5.1.1. Missing Data Technique in RCBD
In randomized complete block design: the missing data is estimated as:
X = (rR’ + tT’ –G’)/ ((r-1) (t-1)) N.B- (r-1) (t-1) means DF for Error
Where, X - is missing value estimator,
“r”- is no. of replications,
“t” -is no. of treatments,
R’ -is total of available values of the block or replication that contains the missing value,
T’ -is total of available values of the treatment that contains the missing value and
G’- is grand total of all available values
Note: Modifications
1. SS computation is carried out as usual, but the Total SS and Treatment SS are to be
corrected by subtracting the upward bias (denoted as B), where
B = (R' - (t-1)X)2/(t(t-1))
2. The Total and Error DF are corrected by subtracting one from the actual DF Total and DF
Error, i.e., (rt-1)-1 and ((t-1)(r-1))-1, respectively
3. LSD/CD = tα at error DF x ((EMS/r)(2 + t/((r-1)(t-1))))1/2
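The estimator and bias formulas above can be wrapped in a short Python function. This is an illustrative sketch, not part of the handout; the function name and argument names are assumptions.

```python
def estimate_missing_rcbd(r, t, R_obs, T_obs, G_obs):
    """Estimate a single missing value in an RCBD.
    r, t: numbers of replications and treatments; R_obs, T_obs: totals of
    the observed values in the block and treatment containing the missing
    value; G_obs: grand total of all observed values.
    Returns (estimate X, upward bias B)."""
    X = (r * R_obs + t * T_obs - G_obs) / ((r - 1) * (t - 1))
    B = (R_obs - (t - 1) * X) ** 2 / (t * (t - 1))
    return X, B

# Figures from the worked RCBD example: r = 4, t = 5,
# R' = 135.1, T' = 89.5, G' = 590.2
X, B = estimate_missing_rcbd(4, 5, 135.1, 89.5, 590.2)
print(round(X, 2), round(B, 2))
```

Run against the worked example that follows, this reproduces the estimate X = 33.14 and a bias of about 0.32.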
Example 1: Consider data generated using RCBD and given below with the value for T2 is
missing in rep III. Analyze the data and give your interpretation.
Steps in analysis
1. Calculate treatment and replication totals that contain the missing value & G’.
Treatments          Replications                                Treatment Total
                    I        II       III          IV
1                   22.9     25.9     39.1         33.9         121.8
2                   29.5     30.4     X            29.6         T' = 89.5
3                   28.8     24.4     32.1         28.6         113.9
4                   47.0     40.9     42.8         32.1         162.8
5                   28.9     20.4     21.1         31.8         102.2
Replication Total   157.1    142.0    R' = 135.1   156.0        G' = 590.2

2. Estimate the missing value (X)


X = ((rR’ + tT’) –G’)/DF Error
= ((4 x 135.1 + 5 x 89.5) – 590.2)/12= 33.14

3. Substitute the estimated value X into Treatment 2 and Rep III and find the new T2, RIII and G totals
T2 = 89.5 + 33.14 = 122.64
RIII= 135.1 + 33.14 = 168.24
G = 590.2 + 33.14 = 623.34

Treatments          Replications                          Treatment Total
                    I        II       III      IV
1                   22.9     25.9     39.1     33.9       121.8
2                   29.5     30.4     33.14    29.6       122.64
3                   28.8     24.4     32.1     28.6       113.9
4                   47.0     40.9     42.8     32.1       162.8
5                   28.9     20.4     21.1     31.8       102.2
Replication Total   157.1    142.0    168.24   156.0      G = 623.34
4. Construct ANOVA Table
5. Calculate DF (Here subtract one from the total and error DF).
Total DF = rt-1-1= rt-2 = (4x5)-2 = 18
Treatment DF = t-1 = 5-1 = 4
Replication DF = r-1 = 4-1 = 3
Error DF = ((r-1) (t-1)) – 1 = 12 -1 = 11
6. Determine the upward bias (B)
B = (R' - (t-1)X)2/(t(t-1))
= (135.1 - 4 x 33.14)2/(5 x 4) = 0.322
7. Calculate SS
CF = G2/N = (623.34)2/20 = 19427.64
Total SS = ∑ (yij) 2 - CF – B
= 20367.12 -19427.64 – 0.322 = 939.16
Treatment SS = ∑ (Ti) 2/r - CF – B
= (79797.7/4)-19427.64-0.322 = 521.46
Replication SS = ∑ (Bi) 2/t – CF
=19497.02-19427.64 = 69.38
Error SS = Total SS – Treatment SS – Replication SS = 348.32
8. Calculate MS
Treatment MS = Treatment SS/Treatment DF = 130.37
EMS = ESS/EDF = 31.66
9. F calculated for the treatments = Treatment MS/EMS = 4.12
10. F tabulated for the treatments = 3.36 (5%) and 5.67 (1%)
11. Compute the ANOVA table

Source of Variation   DF   SS       MS       F-Calculated   F-Tab 5%   F-Tab 1%
Treatment             4    521.46   130.37   4.12*          3.36       5.67
Replication           3    69.38
Error                 11   348.32   31.66
Total                 18   939.16
12. Decision: Reject H0 (accept H1) since there is a significant difference between the
treatments at the 5% level of significance.
13. LSD5% = tα at error DF x ((EMS/r)(2 + t/((t-1)(r-1))))1/2 = 10.36
14. Compare treatment means
Treatment   5        3        1        2         4
Total       102.2    113.9    121.8    122.64    162.8
Mean        25.55    28.48    30.45    30.66     40.70
Treatment 4 is significantly different from the other treatments.
5.1.2. Missing Data Techniques in LSD
The missing data in a Latin square design is estimated as:
X = (t(Ro + Co + To) - 2Go)/((t-1)(t-2))
Where t= number of treatments
Ro = total of observed values of the row that contains the missing data
Co = total of observed values of the column that contains the missing data
To = total of observed values of the treatment that contains the missing data
Go = grand total of all observed values
Modifications
Subtract one from both the total and the error degrees of freedom.
Compute the correction factor for bias B as:
B = [Go - Ro - Co - (t-1)To]2/[(t-1)(t-2)]2
and subtract this computed B value from the treatment SS and the total SS.
Mean comparison
To compare the mean of the missing treatment with other treatments:
CD = tα at 5% x √(EMS(2/t + 1/((t-1)(t-2))))
To compare the means of the other treatments:
CD = tα at 5% x √(2EMS/t)
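As with the RCBD case, the Latin square estimator and bias formulas can be sketched as a small Python function (an illustrative helper, not part of the handout):

```python
def estimate_missing_latin(t, Ro, Co, To, Go):
    """Estimate a single missing value in a t x t Latin square design.
    Ro, Co, To: observed totals of the row, column and treatment that
    contain the missing value; Go: grand total of all observed values.
    Returns (estimate X, bias correction B)."""
    X = (t * (Ro + Co + To) - 2 * Go) / ((t - 1) * (t - 2))
    B = (Go - Ro - Co - (t - 1) * To) ** 2 / ((t - 1) * (t - 2)) ** 2
    return X, B

# Figures from the finger millet example that follows:
# t = 6, Ro = 117.89, Co = 104.13, To = 118.39, Go = 714.23
X, B = estimate_missing_latin(6, 117.89, 104.13, 118.39, 714.23)
print(round(X, 2), round(B, 2))
```

With the totals from the worked example below, this reproduces the estimate X = 30.7 and bias B = 24.87.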
Example 1: The following data are observations corresponding to six varieties (V1, V2, V3,
V4, V5 and V6) of finger millet tested at Gambella agricultural research center for their yield
differences using a Latin square design (LSD). One of the observations, in column 3 and row 2,
was completely damaged by root aphid and was considered as missing data. Estimate this
missing value and analyze the data to arrive at a valid conclusion.
Rows/Columns   1           2           3             4           5           6           Row total
1              V3(21.53)   V1(14.83)   V4(11.28)     V6(28.13)   V2(18.26)   V5(10.14)   104.17
2              V2(28.23)   V6(33.16)   V3(X)         V5(18.21)   V1(24.16)   V4(14.13)   Ro = 117.89
3              V5(12.46)   V3(31.13)   V6(28.22)     V2(13.14)   V4(17.16)   V1(20.22)   122.30
4              V4(13.33)   V2(24.12)   V5(18.26)     V1(16.13)   V3(24.43)   V6(23.12)   119.39
5              V1(18.12)   V5(11.13)   V2(20.13)     V4(22.42)   V6(24.16)   V3(17.13)   113.09
6              V6(26.13)   V4(23.26)   V1(26.24)     V3(24.17)   V5(19.43)   V2(18.13)   137.36
Column total   119.80      137.63      Co = 104.13   122.20      127.60      102.87      Go = 714.23
Solutions
Given:
t = 6, Ro = 117.89, Co = 104.13, To = 118.39, G0 = 714.23
X = (t(Ro + Co + To) - 2Go)/((t-1)(t-2))
= (6 (117.89 + 104.13 + 118.39) – 2 (714.23))/ (6-1) (6-2)
= (2042.46 – 1428.46)/20 = 30.7
B = [Go-Ro-Co-(t-1) To] 2/ [(t-1) (t-2)] 2
= [714.23-117.89-104.13-(6-1)118.39]2/[(6-1)(6-2)]2
= 9948.06/400 = 24.87

Data after the missing plot value is estimated

Rows/Columns   1           2           3           4           5           6           Row total
1              V3(21.53)   V1(14.83)   V4(11.28)   V6(28.13)   V2(18.26)   V5(10.14)   104.17
2              V2(28.23)   V6(33.16)   V3(30.70)   V5(18.21)   V1(24.16)   V4(14.13)   148.59
3              V5(12.46)   V3(31.13)   V6(28.22)   V2(13.14)   V4(17.16)   V1(20.22)   122.30
4              V4(13.33)   V2(24.12)   V5(18.26)   V1(16.13)   V3(24.43)   V6(23.12)   119.39
5              V1(18.12)   V5(11.13)   V2(20.13)   V4(22.42)   V6(24.16)   V3(17.13)   113.09
6              V6(26.13)   V4(23.26)   V1(26.24)   V3(24.17)   V5(19.43)   V2(18.13)   137.36
Column total   119.80      137.63      134.83      122.20      127.60      102.87      G = 744.93

Treatment: V1 V2 V3 V4 V5 V6
Treatment Total: 119.70 122.01 149.09 101.58 89.63 162.92
Treatment Mean: 19.95 20.33 24.84 16.93 14.94 27.15
Analysis:
DF rows = DF columns = DF treatments = t-1 = 6-1 = 5
DF total = t2-1 = 62-1 = 35 (since the data have a missing value, the DF becomes 35-1 = 34)
DF error = DF total - (DF rows + DF columns + DF treatments)
= 35 - (5 + 5 + 5)
= 20 (since the data have a missing value, the DF becomes 20-1 = 19)
CF = (GT)2/t2
= (744.93)2/(6 x 6) = 15,414.46
Row SS = (∑Ri2)/t - CF; where i = 1, 2, ..., r
= ((104.17)2 + (148.66)2 + ... + (137.36)2)/6 - 15,414.46 = 192.46
Column SS = (∑Cj2)/t - CF; where j = 1, 2, ..., c
= ((119.80)2 + (137.63)2 + ... + (102.87)2)/6 - 15,414.46 = 130.54
Treatment SS = (∑Tk2)/t - CF - B; where k = 1, 2, ..., t
= ((119.70)2 + (122.01)2 + ... + (162.92)2)/6 - 15,414.46 - 24.87 = 616.89
Total SS = ∑(Yijk)2 - CF - B; where i = 1, 2, ..., r; j = 1, 2, ..., c; k = 1, 2, ..., t
= (21.53)2 + (14.83)2 + ... + (18.13)2 - 15,414.46 - 24.87

= 16713.2-15,414.46-24.87 = 1,273.87
Error SS = Total SS – (Column SS + Row SS + Treatment SS)
= 1,273.87 – (130.54 + 192.46 + 616.89) = 333.98
ANOVA Table
Sources of variation   DF    SS         MSS      F calculated   F tab 5%   F tab 1%
Between rows           5     192.46     38.49    2.19NS
Between columns        5     130.54     26.11    1.49NS
Between treatments     5     616.89     123.38   7.02**         2.74       4.17
Error                  19*   333.98     17.58
Total                  34*   1,273.87
**there is a highly significant difference between the treatments
NS there is no significant difference between the rows and the columns
Decision
F cal > F table for the treatments, thus there is a significant difference between the
treatments.
There is no significant difference between both rows and columns.
Mean comparison for the treatments
SE = √(EMS(2/t + 1/((t-1)(t-2))))
= √(17.58(2/6 + 1/((6-1)(6-2))))
= 2.60
CV = (SE/GM) x 100, where GM = GT/N = 744.93/36 = 20.69
= (2.60/20.69) x 100
= 12.56%
CD
To compare the mean of the missing treatment with others:
CD = tα at 5% x √(EMS(2/t + 1/((t-1)(t-2))))
= 2.093 x 2.60
= 5.44, any treatment mean differing by more than 5.44 from the treatment
with the missing value is significantly different from this treatment.
To compare the means of the other treatments:
CD = tα at 5% x √(2EMS/t)
= 2.093 x √((2 x 17.58)/6)
= 5.06, any treatment mean differing by an amount more than 5.06 is significantly different.
Treatment means are arranged from the smallest to the largest value as follows:
Treatment:        V5      V4       V1       V2       V3       V6
Treatment Total:  89.63   101.58   119.70   122.01   149.09   162.92
Treatment Mean:   14.94   16.93    19.95    20.33    24.84    27.15
Conclusion: There is no significant difference between V5, V4 and V1; V4, V1 and V2; V1,
V2 and V3; V3 and V6. V6 is significantly different from V2, V1; V4 and V5.
5.2. Data Transformation
When data violate the assumptions of the analysis of variance (normality, independence,
randomness, additivity and homogeneity), there are two possibilities: either transform the data
to fulfill the assumptions or use non-parametric statistics. The disadvantage of non-parametric
statistics is that they are less powerful. Moreover, for most biological science experiments
involving two-way and three-way analysis of variance and regression, there is no
straightforward non-parametric equivalent (Townend, 2002). Therefore, transforming the data
is often the only option, although interpretation of the results of transformed data is more
difficult.
To transform the data, it is important to start with the measured values (x) and perform
transformation to get new set of values (x’). The analysis of variance or regression is then
carried out on these new sets of values. Some of the most commonly used transformations are
described below.
5.2.1. Log Transformation
It is the most commonly used method. It is usually used for data that violate normality
(positively skewed distributions). The new value is
x' = log10(x+1).
If the measurement has zero values, it is necessary to add one to all values before applying the
log transformation, because the logarithm of zero is undefined. Adding one to each value does
not affect the outcome (variance), but it is necessary to deduct the same value from the mean
at the end of the analysis.
5.2.2. The square root transformation
It is given by x' = √x. This kind of transformation is used for data from a Poisson
distribution (count data).

5.2.3. Angular or Arcsine Transformation
It is often used when each value in the data set is a proportion or percentage of something. It
only applies where the minimum and maximum values are 0 and 1 (or 0 and 100%). If the
original data is expressed as percentages, then x must be converted to values between 0 and 1
before transforming (eg. 60% = 0.6). Then,
x’ = sin-1√x or x’ = arcsin√x.
5.2.4. Box-Cox Transformation
It is given by x’ = (xλ-1)/λ (if λ≠0) or x’ = ln x (if λ=0). It looks unfriendly but it has the
advantage that it is possible to calculate the value of λ which best achieves a normal
distribution. This is not a straightforward to do by hand, but a function is available on some
computer packages to calculate the best value of λ. Once the value of λ is obtained, it is fairly
easier to put it in the above formula to get the transformed values.
After the transformed values are generated, it is necessary to check that the new values meet
the assumptions, for example by graphing for normality. If the transformed values do not meet
the assumptions, it is better to go back to the original data and try another type of
transformation. If none of the above transformations works, it is worth trying another simple
formula such as the cube root transformation x' = x1/3, 1/x or 1/x2. It is not always possible to
achieve a normal distribution using a simple transformation, especially when the data have
more than one peak (a multi-modal distribution). This probably indicates that the population is
actually made up of two or more populations, which would be better treated as separate
populations in the analysis.
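The transformations described in this section can be sketched in Python; the function names are illustrative, not part of the handout, and the Box-Cox λ is supplied by the user rather than estimated.

```python
import math

def log_transform(x):
    """log10(x + 1): for positively skewed data; the +1 handles zeros."""
    return math.log10(x + 1)

def sqrt_transform(x):
    """Square root: for count data (Poisson-like distributions)."""
    return math.sqrt(x)

def arcsine_transform(p):
    """Arcsine of the square root: p must be a proportion between 0 and 1,
    so a percentage such as 60% is first converted to 0.6."""
    return math.asin(math.sqrt(p))

def boxcox_transform(x, lam):
    """Box-Cox: (x**lam - 1)/lam, with the natural-log limit at lam = 0."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

counts = [0, 3, 8, 24]
print([round(log_transform(c), 3) for c in counts])
print([round(sqrt_transform(c), 3) for c in counts])
print(round(arcsine_transform(0.6), 3))
```

The transformed values, not the originals, are then carried into the analysis of variance or regression, and any constant added beforehand is deducted from the mean afterwards.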

6. CHAPTER SIX: CORRELATION AND REGRESSION


Correlation is used to study the relationship between two types of measurements made on the
same individuals. Correlation should not be used without first examining the data using a
scatter diagram. The scatter diagram provides a visual impression of the nature of relation
between two variables (x and y) in a bivariate data set. In many cases, the points appear to band

around a straight line. The visual impression of the closeness of the scatter to a linear relation
can be quantified by calculating a numerical measure, called correlation coefficient.
The magnitude and direction of the relationship between two or more variables are measured
by a statistical tool known as correlation. To say there is a correlation between variables, the
change in one variable must be associated with a change in the other variable(s). When the change in the
variables is in the same direction (increase/decrease of one variable is associated with increase/
decrease of the other variable) the variables are said to be positively correlated (there is a
direct correlation). While when the increase/decrease of one variable is associated with
decease/increase of the other variable, the variables are said to be negatively correlated
(inverse correlation). For example income and expenditure are positively correlated while pest
incidence and yield of a crop are inversely correlated.
Correlation is the degree of relationship between variables. There are two broad categories of
variables. They are independent variables (treatments and extraneous factors) and dependent
variables (response variables). Correlation explains the relationship/association between these
variables. In general, we can say correlation is the association between two variables (one is
dependent and another is independent variable). The correlation may be positive or negative.
For instance, the number of wheat grains varies with the length of the ear head, i.e. longer ear
heads (spikes) generally yield more grains per ear head and smaller ear heads yield
fewer grains. Or, in estimating the moisture content of grains, it is observed to vary
with temperature: a greater percentage of moisture at lower temperatures and a lower percentage of
moisture at higher temperature. The two characteristics in the former example clearly bear a
positive relationship to each other (between size of ear head and number of grains per ear
head); while in the latter example, the two characteristics bear an opposite relationship to each
other (i.e. there is negative relationship between the two variables).
Based on the number of variables involved there are two types of correlation. When only two
variables are involved, we speak of simple correlation and when more than two variables are
involved, we speak of multiple correlations. Based on the type of the changes in the variables
there are also two types of correlation. When the change in the variables is constant the
relationship is said to be linear otherwise non-linear. The scope of this chapter is to address
simple linear correlation and regression analysis.
Multiple correlation: Multiple correlation is useful in prediction problems, for example when
one is interested in predicting the yield from simultaneous knowledge of several
characteristics of the plants, such as plant height, number of productive tillers, type of
variety, etc.
Biometry Handout 2022
Correlation Coefficient- A correlation coefficient is a statistical measure which indicates both
nature and degree of linear relationship between two measurable characteristics (variables). The
two variables are denoted by X and Y. The correlation coefficient for a sample is denoted by
the symbol ‘r’ and is called Pearson’s correlation coefficient, named after Karl Pearson, who
developed the formula.
The value of r is always between –1 and +1. The magnitude of r indicates the strength of a
linear relation, whereas its sign indicates the direction. More specifically, r >0 if the pattern of
x and y values is a band that runs from the lower left to upper right, r < 0 if the pattern of x and
y values is a band that runs from the upper left to lower right, r = +1 if all x and y values lie
exactly on a straight line with positive slope (perfect positive relation) and r = -1 if all x and y
values lie exactly on a straight line with negative slope (perfect negative relation). A high
numerical value of r, closer to either +1 or -1 represents a strong relationship. A value of r
closer to zero means that the linear association is very weak or nil.
The correlation coefficient is close to zero when there is no visible pattern of relation; that is,
when the y values do not change in any consistent direction as the x values change. A value
near zero can also occur when the points band around a curve that is far from linear (the
relationship may be curvilinear), since r measures only linear association.
When pairs of measurements (X, Y) are available for each individual, the value of ‘r’ is given
by the formula
r = [ΣXY – (ΣX)(ΣY)/n] / √[(ΣX² – (ΣX)²/n)(ΣY² – (ΣY)²/n)]
Where,
r is the simple correlation coefficient
ΣX is the sum of X values
ΣY is the sum of Y values
ΣX² is the sum of squares of X values
ΣY² is the sum of squares of Y values
ΣXY is the sum of products of X and Y values
(ΣX)² is the square of the sum of X values
(ΣY)² is the square of the sum of Y values
n is the number of pairs of observations
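As an illustration, the raw-sum formula above can be translated directly into a few lines of Python (a minimal sketch; the function name pearson_r is our own, not part of any library):

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient via the raw-sum formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y)) - sx * sy / n   # ΣXY − (ΣX)(ΣY)/n
    sxx = sum(a * a for a in x) - sx * sx / n              # ΣX² − (ΣX)²/n
    syy = sum(b * b for b in y) - sy * sy / n              # ΣY² − (ΣY)²/n
    return sxy / math.sqrt(sxx * syy)

# A perfect positive linear relation gives r = +1
print(pearson_r([1, 2, 3], [2, 4, 6]))   # 1.0
```

Any pairs of measurements can be passed in; the only requirement is that x and y have the same length.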

Fig 2: Zero Correlation (r = 0)   Fig 3: Negative Correlation (r = –1)   Fig 4: Positive Correlation (r = +1)
Procedure for test of significance of “r” value
Step 1: Compute ∑X, ∑Y, ∑X², ∑Y², ∑XY, (∑X)² and (∑Y)²
Step 2: Compute the simple linear correlation coefficient “r” by using the formula
Step 3: Test the significance of the simple linear correlation coefficient by comparing the
computed “r” value of step 2 to the tabular “r” value of the appendix with (n-2) degrees of
freedom. The simple linear correlation coefficient is declared significant at the α level of
significance if the absolute value of the computed “r” value is greater than the corresponding
tabular “r” value at α level of significance.
Measurement on two variables (x and y) for calculating correlation, r

x      y      x²     y²     xy
2      5      4      25     10
1      3      1      9      3
5      6      25     36     30
0      2      0      4      0
Total  8      16     30     74     43
       ∑x     ∑y     ∑x²    ∑y²    ∑xy

sxx = ∑x² – (∑x)²/n = 30 – (8)²/4 = 14
syy = ∑y² – (∑y)²/n = 74 – (16)²/4 = 10
sxy = ∑xy – (∑x)(∑y)/n = 43 – (8)(16)/4 = 11

r = sxy/√(sxx × syy) = 11/√(14 × 10) = 0.93
OR a second option to compute r is shown below for this data

x      y      x–x̄    y–ȳ    (x–x̄)²  (y–ȳ)²  (x–x̄)(y–ȳ)
2      5      0      1      0       1       0
1      3      –1     –1     1       1       1
5      6      3      2      9       4       6
0      2      –2     –2     4       4       4
Total  8      16     0      0       14      10      11
       x̄ = 2  ȳ = 4                 sxx     syy     sxy

r = sxy/√(sxx × syy) = 11/√(14 × 10) = 0.93
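The deviation-from-the-mean computation above can be checked with a short Python sketch using the same four data pairs:

```python
import math

x = [2, 1, 5, 0]
y = [5, 3, 6, 2]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n                       # 2 and 4
sxx = sum((v - xbar) ** 2 for v in x)                     # 14
syy = sum((v - ybar) ** 2 for v in y)                     # 10
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))  # 11
r = sxy / math.sqrt(sxx * syy)                            # 11/√140
print(round(r, 2))                                        # 0.93
```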
Interpretation:
The correlation is high and positive. That means the variables X and Y have a close and direct
relationship; as variable X increases, variable Y increases, and vice versa. For a statistical
conclusion, the calculated ‘r’ value should be compared with the tabular ‘r’ value at n – 2
degrees of freedom. In this example n = 4, so df = n – 2 = 2 and the tabular ‘r’ value at the 5%
level is 0.950. Since the calculated ‘r’ value (+0.93) is smaller than 0.950, the correlation,
although numerically high, is not statistically significant with so few pairs of observations
(with df = 5, for example, the tabular value would be 0.754 and the same ‘r’ would be declared
significant).
Note: The correlation coefficient test is made without considering the sign of r i.e., one has to
compare calculated r value with table r value without considering the sign of r.
Note on correlation

 Correlation does not tell anything about cause and effect (dependent and independent
variables).
 Both variables may be affected by a third variable which is not known to us.
 The sample size is advised to be more than ten.
Regression analysis is a statistical technique for investigating and modeling the relationship
between variables. Regression is similar to correlation as it is used for testing a linear
relationship between two types of measurements made on the same individuals. However,
regression goes further in that it is also possible to produce an equation describing the line of
best fit through the points on the graph. When using regression analysis, unlike in correlation,
the two variables have different roles. Regression is used when the value of one of the variables
is considered to be dependent on the other, or at least reliably predicted from the other. The
dependent variable is denoted by y and the independent variable by x. This distinction is
important because the regression of y on x is not the same as the regression of x on y.
Regression analysis concerns the study of the relationships between variables with the objective
of identifying, estimating and validating the relationship. The objective of regression analysis is
the development of a statistical model that can predict the values of a variable based upon the
values of another variable. In this Section the subject will be presented with specific reference
to the straight-line model. Then, on the basis of the model, it is possible to test whether one
variable actually influences the other or not.

Regression analysis is one of the most widely used techniques for analyzing multifactor data.
Its usefulness results from the logical process of using an equation to express the relationship
between a variable of interest and a set of related predictor variables. For example, an
experimental study of the relation between two variables is often motivated by a need to predict
one variable from the other. The director of a job-training programme may wish to study the
relation between the duration of training and the score of the trainee on subsequent skill test. A
forester may wish to estimate the timber volume of a tree from the measurement of the trunk
diameter a few meters above the ground (breast height). An agronomist may be interested in
predicting the grain yield of maize at different levels of nitrogen fertilizer. A medical
technologist may be interested in predicting the blood alcohol measurement from the read-out
of a newly devised breath analyzer.
In such context as these, the predictor or input variable is denoted by x, and the response or
output variable is labeled y. The objective is to find the nature of relation between x and y
from experimental data and use the relation to predict the response variable y from the input
variable x. The first step in such a study is to plot and examine the scatter diagram. If a linear
relation emerges, the calculation of the numerical value of r will confirm the strength of the
linear relation. Its value indicates how effectively y can be predicted from x by fitting a straight
line to the data.
A regression problem involving a single predictor (also called simple regression) arises when
there is a need to study the relation between two variables x and y and to use it to predict y
from x. The variable x acts as an independent variable whose values are controlled by the
experimenter. The variable y depends on x and is also subject to unaccountable variations or
errors.
For any experiment, n is used to denote the sample size or number of runs of the experiment.
Each run gives a pair of observations (x, y) in which x is the fixed setting of the independent
variable and y denotes the corresponding response.
The general simple linear regression model for a population is given by the equation Y = α +
βX + ε, where α is the intercept, β denotes the regression coefficient (the slope of the
regression line, i.e. the amount of change in Y for each unit change in X) and ε is the error
term. For any given value of X, the sum of the error terms is zero. Hence, the mean value of ε
is zero and the model becomes Y = α + βX. β may be negative, positive or zero. The value of Y
when X = 0 is called the Y-intercept, denoted by α. The Y-intercept α and the slope β are
population parameters and are estimated from samples. The estimators of α and β are denoted
by a and b, respectively, and hence the fitted model becomes y = a + bx. a and b are unbiased
estimators of α and β, respectively. The least squares method estimates the parameters α and β
through a and b.
The best estimate of β is given by:
b = [ƩXY – (ƩX)(ƩY)/n] / [ƩX² – (ƩX)²/n]
The best estimate of α is given by:
a = mean of Y – b (mean of X)
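The two least-squares estimates above can be sketched in Python (a minimal illustration; the function name fit_line is our own):

```python
def fit_line(x, y):
    """Least-squares estimates of the intercept a and slope b of y = a + bx."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(p * q for p, q in zip(x, y)) - sx * sy / n   # ΣXY − (ΣX)(ΣY)/n
    sxx = sum(p * p for p in x) - sx * sx / n              # ΣX² − (ΣX)²/n
    b = sxy / sxx                                          # slope estimate
    a = sy / n - b * sx / n                                # mean(Y) − b·mean(X)
    return a, b

# Points lying exactly on y = 1 + 2x are recovered exactly
print(fit_line([0, 1, 2], [1, 3, 5]))   # (1.0, 2.0)
```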
Note that the assumption of linear regression are the same as ANOVA and we test the
hypothesis H0: β = 0 and H1 : β ≠ 0. There is a need to distinguish between dependent and
independent variables in regression analysis as opposed to correlation analysis.
The regression line indicated the average value of the dependent variable, Y associated with a
particular value of independent variable, X. The slope, b, hereafter referred to as a regression
coefficient, indicates the change in Y with a one unit change in X.
Example: A chemist wishes to study the relation between the drying time of a paint and a
concentration of a base solvent that facilitates a smooth application. The data of concentration
setting x and the observed drying times y are recorded in the first two columns (Table 20). Plot
the data, calculate r and determine the fitted line.

Table 20: Data on concentration (x) and drying time (y) in minutes
Concentration, x   Drying time, y   x²   y²   xy
0 1 0 1 0
1 5 1 25 5
2 3 4 9 6
3 9 9 81 27
4 7 16 49 28
Total 10 25 30 165 66
To calculate r and determine the equation of the fitted line, the basic quantities (the means of
x and y, the corrected sums of squares Sxx and Syy, and the corrected sum of cross-products
Sxy) should be calculated first using the totals in Table 20.

Mean of x = 10/5 = 2
Mean of y = 25/5 = 5
Sxx = 30 – (10)²/5 = 10
Syy = 165 – (25)²/5 = 40
Sxy = 66 – (10)(25)/5 = 16

Then, r = Sxy/√(Sxx × Syy) = 16/√(10 × 40) = 0.8
b = Sxy/Sxx = 16/10 = 1.6
a = (mean of y) – b × (mean of x) = 5 – (1.6)(2) = 1.8, then the equation of the fitted line is
ŷ = 1.8 + 1.6x.
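The worked example can be verified numerically; the sketch below reproduces r, the slope and the intercept, and then uses the fitted line to predict the drying time at an assumed concentration of 2.5:

```python
import math

x = [0, 1, 2, 3, 4]          # concentration
y = [1, 5, 3, 9, 7]          # drying time
n = len(x)
sxx = sum(v * v for v in x) - sum(x) ** 2 / n                 # 10
syy = sum(v * v for v in y) - sum(y) ** 2 / n                 # 40
sxy = sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y) / n  # 16
r = sxy / math.sqrt(sxx * syy)                                # 0.8
b = sxy / sxx                                                 # 1.6
a = sum(y) / n - b * sum(x) / n                               # 1.8
print(round(r, 2), round(b, 2), round(a, 2))                  # 0.8 1.6 1.8
y_hat = a + b * 2.5           # predicted drying time at x = 2.5
```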
Fig 5: Scatter plot of the data with the fitted line ŷ = 1.8 + 1.6x
Exercise 1: Suppose that we had the following results from an experiment in which we
measured the growth of a cell culture (as optical density) at different pH levels.

pH Optical density
3 0.1


4 0.2
4.5 0.25
5 0.32
5.5 0.33
6 0.35
6.5 0.47
7 0.49
7.5 0.53
Compute the coefficient of correlation and also find out whether the association is significant
or not at the 5% level of significance.
Exercise 2: Consider the relationship between the number of branches on the plants and the
number of seed pods it produces.
No. of branches/plant (X): 14
No. of seed/ pods (Y): 50 60 70 100 120
Based on the given data, answer the following questions.
1. Fit a simple linear regression of Y on X,
2. Estimate the number of seeds/ pod (Y) when the number of branches/plant (X) is 1
7. CHAPTER SEVEN: ANALYSIS OF COVARIANCE
The analysis of covariance is concerned with two or more measured variables, where some
measurable independent variable is not set at predetermined levels as it would be in a factorial
experiment.
The covariance between two random variables is a measure of the nature of the association
between the two. It makes use of the concept of both analysis of variance and of regression, i.e.,
it detects the variances and covariances of specific variables to estimate treatment effect more
accurately than the use of ANOVA alone.
The application of covariance analysis can be extended to any number of covariates and to any
functional relationship between variables. However, this manual is limited to a single covariate
that has linear relationship with the trait of primary interest. The importance of covariance
analysis in improving precision has been emphasized by several authors (Cochran and Cox,
1957; Steel and Torrie, 1980; Gomez and Gomez, 1984; and Walpole et al., 2002). The core
remarks of the authors can be summarized as follows. The most important uses of covariance
analysis are to: control error (increase precision), adjust treatment means; assist in the
interpretation of data, partition the total covariance into component parts and estimate missing
data.

The use of covariance to control error works by means of regression: it removes certain
recognized effects that cannot be controlled effectively by the experimental design. The main
task here is to control the variation due to the physical conduct of the experiment, but not,
usually, inherent variation. An example of variation that can be controlled by covariance
analysis is:
when crop varieties with different number of plants per plot are evaluated together for yield
(number of plants per unit area and yield are positively associated in most cases) we can use
number of plants per plot as a covariate and there by estimate the true treatment effect.
Estimation of the true treatment effect will enable us to adjust treatment means. The adjusted
treatment means in turn help in proper interpretation of data.
By measuring additional variable (covariate, X) that is known to be associated with the primary
variable (Y) linearly, the source of variation associated with the covariate can be deducted from
experimental error. After that the primary variable can be adjusted linearly upward or
downward, depending on the relative size of its respective covariate. By doing so, the treatment
mean is adjusted and the experimental error is reduced and the precision for comparing
treatment means is increased. In this case the covariate must not be affected by treatments
being tested.
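The adjustment just described can be sketched as follows (a minimal illustration, assuming a single linear covariate; the adjusted mean of treatment i is ȳᵢ − b_w(x̄ᵢ − x̄), where b_w is the pooled within-treatment slope and x̄ is the grand mean of the covariate):

```python
def adjusted_means(groups):
    """Covariate-adjusted treatment means: ybar_i - bw * (xbar_i - grand xbar),
    where bw is the pooled within-treatment regression slope of Y on X."""
    exx = exy = 0.0
    all_x, means = [], {}
    for name, (xs, ys) in groups.items():
        n = len(xs)
        xb, yb = sum(xs) / n, sum(ys) / n
        means[name] = (xb, yb)
        exx += sum((x - xb) ** 2 for x in xs)                    # within-group SS of X
        exy += sum((x - xb) * (y - yb) for x, y in zip(xs, ys))  # within-group SP of XY
        all_x.extend(xs)
    bw = exy / exx                                               # pooled slope
    gx = sum(all_x) / len(all_x)                                 # grand mean of X
    return {name: yb - bw * (xb - gx) for name, (xb, yb) in means.items()}

# Two treatments with y = 2x plus a true treatment effect of +10 for A.
# The unadjusted means differ by 6; adjusting for the covariate recovers 10.
data = {"A": ([1, 2, 3], [12, 14, 16]), "B": ([3, 4, 5], [6, 8, 10])}
print(adjusted_means(data))   # {'A': 16.0, 'B': 6.0}
```

In this toy data the treatments happen to have different covariate means, so the raw treatment means are biased; the adjustment removes that bias, which is exactly the purpose described above.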
Covariance Analysis and Blocking
Covariance analysis should be considered in experiments in which blocking cannot adequately
reduce experimental error. This is true because blocking is done before the start of the
experiment; it can be used only to cope with sources of variation that are known or predictable.
Analysis of covariance can take care of unexpected sources of variation that occurs during the
process of experimentation. Hence, covariance analysis is a supplementary procedure to take
care of sources of variation that cannot be accounted for by blocking.
Covariance analysis is essentially an extension of the analysis of variance; hence, all the
assumptions for a valid analysis of variance that were mentioned in the ANOVA chapter are
also important here. Besides, covariance analysis requires that the covariate be fixed
(measured without error and independent of treatment), that the regression of the dependent
variable (Y) on the covariate (X) after removal of block and treatment differences be linear,
and that the errors be normally and independently distributed with mean zero and a common
variance.
The identification of covariate is an important task in the application of covariance analysis.
Assigning the covariate is highly determined by the purpose for which the covariance technique
is applied. When there is irregular stand establishment in field experiments, the number of
plants per plot becomes an important source of variation. Here, covariance analysis can be used
with stand count as a covariate. Another example, in animal science, could be the
determination of the rate of body weight gain of different animals fed a certain ration; since
the experimental animals vary in age and body weight gain is associated with age, the initial
age of the animals can be used as the covariate.

8. APPENDIX: STATISTICAL TABLES AND PROCEDURES
