Characteristic # 1. Reliability:
The dictionary meaning of reliability is consistency, dependence or trust. In measurement, therefore, reliability is the consistency with which a test yields the same result in measuring whatever it does measure. A test score is called reliable when we have reason to believe that the score is stable and trustworthy. Stability and trustworthiness depend upon the degree to which the score is an index of the pupil's true ability and is free from chance error. Therefore reliability can be defined as the degree of consistency between two measurements of the same thing.
For example, suppose we administered an achievement test to Group A and found a mean score of 55. Three days later we administered the same test to Group A and again found a mean score of 55. This indicates that the measuring instrument (the achievement test) is providing a stable or dependable result. On the other hand, if the second measurement had yielded a mean score of around 77, we would say that the test scores are not consistent.
According to Ebel and Frisbie (1991) “the term reliability means the consistency with
which a set of test scores measure whatever they do measure.”
Theoretically, reliability is defined as the ratio of true-score variance to observed-score variance:
Reliability = (variance of true scores) / (variance of observed scores).
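This ratio can be illustrated with a short simulation. The sketch below is not from the original text; it simply assumes that each observed score is a true score plus random error, uses invented numbers, and checks that the variance ratio behaves as the definition says.

```python
# A minimal Python sketch (illustrative only): reliability as the ratio of
# true-score variance to observed-score variance. All numbers are invented.
import numpy as np

rng = np.random.default_rng(0)

true_scores = rng.normal(loc=60, scale=10, size=1000)   # hypothetical "true" abilities
error       = rng.normal(loc=0,  scale=5,  size=1000)   # random measurement error
observed    = true_scores + error                       # obtained score = true score + error

reliability = true_scores.var() / observed.var()
print(f"Estimated reliability = {reliability:.2f}")      # roughly 100 / (100 + 25) = 0.80
```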
In practical terms, reliability answers questions such as: How similar are the test scores if two equivalent forms of the test are administered? To what extent do the scores on an essay test differ when it is scored by different teachers?
It is not always possible to obtain perfectly consistent results, because several factors such as physical health, memory, guessing, fatigue and forgetting may affect the results from one measurement to another. These extraneous variables may introduce some error into our test scores. This error is called measurement error. So while determining the reliability of a test we must take into consideration the amount of error present in the measurement.
Nature of Reliability:
1. Reliability refers to the consistency of the results obtained with an instrument, not to the instrument itself.
2. Reliability refers to a particular interpretation of test scores. For example, a test score which is reliable over a period of time may not be reliable from one test to another equivalent test. Therefore reliability cannot be treated as a general characteristic.
3. Reliability is a necessary but not a sufficient condition for validity. A test which is not reliable cannot be valid, but it does not follow that a test with high reliability will possess high validity, because a highly consistent test may measure something other than what we intend to measure.
Methods of Determining Reliability:
For most educational tests the reliability coefficient provides the most revealing statistical index of quality that is ordinarily available. Estimates of the reliability of a test provide essential information for judging its technical quality and for motivating efforts to improve it. The consistency of a test score is expressed either in terms of shifts in an individual's relative position in the group or in terms of the amount of variation in an individual's score.
When consistency is expressed in terms of the amount of variation, reliability is stated in terms of the standard error of measurement, which indicates how much an individual's score may be expected to vary.
Two sets of scores for correlation may be obtained in the following ways:
(i) The same test may be administered twice to the same individuals, with a time interval between the two administrations.
(ii) Two separate but equivalent forms of the test may be administered to the same individuals.
(iii) The test items of a single test are divided into two separate sets and the scores on the two sets are correlated.
The methods are similar in that all of them involve correlating two sets of data, obtained
either from the same evaluation instrument or from equivalent forms of the same
procedure. This reliability coefficient must be interpreted in terms of the types of
consistency being investigated.
Test-Retest Method:
A high coefficient of correlation between the two sets of scores indicates high stability of the test scores. In the words of Gronlund, measures of stability in the .80s and .90s are commonly reported for standardized tests over occasions within the same year. But this method suffers from some serious drawbacks, the first being the question of what the interval between the two administrations should be.
If the test is administered within a short interval, say a day or two, the pupils will recall their first answers and spend their time on new material, which tends to raise their scores in the second administration. If the interval is too long, say one year, the effect of maturation will influence the retest scores and will also tend to raise them. In both cases this tends to lower the reliability. What the time gap between the two administrations should be therefore depends largely on the use and interpretation of the test scores. The difficulty of controlling the conditions which influence retest scores limits the use of the test-retest method in estimating the reliability coefficient.
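As a rough illustration, the test-retest coefficient is simply the correlation between the scores of the two administrations. The Python sketch below is not part of the original text and uses invented scores.

```python
# Hedged sketch: test-retest reliability estimated as the Pearson correlation
# between two administrations of the same test (scores are invented).
import numpy as np

first_admin  = np.array([55, 62, 48, 71, 66, 53, 59, 75, 44, 68])
second_admin = np.array([57, 60, 50, 73, 64, 55, 58, 77, 46, 65])

r_test_retest = np.corrcoef(first_admin, second_admin)[0, 1]
print(f"Test-retest reliability = {r_test_retest:.2f}")
```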
Equivalent Forms Method:
Both the forms selected for administration should be parallel in terms of content, difficulty, format and length. When a time gap is provided between the administrations of the two forms, the correlation between the two sets of scores provides a measure of both stability and equivalence. But the major drawback of this method is obtaining two truly parallel forms of the test. When the forms are not exactly equal in terms of content, difficulty and length, comparison between the scores obtained from them may lead to erroneous decisions.
Split-Half Method:
The common procedure for splitting the test is to place all the odd-numbered items (1, 3, 5, etc.) in one half and all the even-numbered items (2, 4, 6, 8, etc.) in the other half. The scores on the two halves are then correlated, and the resulting coefficient is stepped up to the full test length by using the Spearman-Brown formula.
By using formula (5.1), the Spearman-Brown formula, we can get the reliability coefficient of the full test as:
Reliability of full test = 2 × (correlation between half tests) / (1 + correlation between half tests).
Thus, when the coefficient of correlation between the half tests is .70, the reliability coefficient of the full test is 2 × .70 / 1.70 = .82. This coefficient indicates the extent to which the sample of test items is a dependable sample of the content being measured, that is, its internal consistency.
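The odd-even split and the Spearman-Brown step-up can be sketched in a few lines of Python. The item data below are invented, and the code is only an illustration of the procedure described above.

```python
# Hedged sketch of the split-half method: correlate odd-item and even-item half
# scores, then apply the Spearman-Brown formula r_full = 2*r_half / (1 + r_half).
import numpy as np

# item_scores[i, j] = score of pupil i on item j (invented 0/1 data, 8 items).
item_scores = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 1, 1, 1],
    [1, 0, 1, 1, 0, 1, 0, 1],
])

odd_half  = item_scores[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
even_half = item_scores[:, 1::2].sum(axis=1)   # items 2, 4, 6, 8

r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = 2 * r_half / (1 + r_half)             # Spearman-Brown correction
print(f"Half-test r = {r_half:.2f}, full-test reliability = {r_full:.2f}")
```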
Standard Error of Measurement:
An individual's obtained score on a test may be regarded as the sum of his true score and an error of measurement. This relationship between the true score, the obtained score and the error can be expressed as follows:
Obtained score = True score + Error of measurement.
We can find the standard error of measurement (SE) when the reliability coefficient and the standard deviation of the score distribution are given:
SE = SD × √(1 − reliability coefficient).
No obtained score tells us exactly what the true score is, but knowledge of the SE indicates how far the obtained score is likely to differ from the true score. When the SE is small, the true score lies close to the obtained score; the SE also indicates whether the difference between the scores of two individuals is a real difference or one due to errors of measurement.
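For illustration, the computation of the SE and of a rough band around an obtained score can be sketched as follows; the standard deviation, reliability coefficient and pupil's score used here are assumed values, not figures from the text.

```python
# Hedged sketch: standard error of measurement SE = SD * sqrt(1 - reliability),
# used as a rough band around an obtained score (all values are assumed).
import math

sd          = 10.0   # standard deviation of the score distribution (assumed)
reliability = 0.82   # reliability coefficient (assumed)
obtained    = 55.0   # a pupil's obtained score (assumed)

se = sd * math.sqrt(1 - reliability)
print(f"Standard error of measurement = {se:.2f}")
print(f"The true score probably lies between {obtained - se:.1f} and {obtained + se:.1f}")
```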
The major factors which affect the reliability of test scores can be categorized under three headings:
1. Factors related to the test.
The spread of scores in the group tested also influences reliability. For example, suppose the students in Group A have secured marks ranging from 30 to 80 and the students in Group B have secured marks ranging from 65 to 75. If we administer the test a second time to Group A, the test scores of individuals could vary by several points with very little shifting in the relative positions of the group members, because the spread of scores in Group A is large.
On the other hand, the scores in Group B are more likely to shift positions on a second administration of the test. As the spread of scores is just 10 points from the highest to the lowest score, a change of a few points can bring radical shifts in the relative positions of individuals. Thus, the greater the spread of scores, the higher the reliability.
Following are some of the important factors related to the testee which affect test reliability:
(a) Heterogeneity of the group:
When the group tested is homogeneous, the spread of the test scores is likely to be small; when the group tested is heterogeneous, the spread of scores is likely to be large. Therefore the reliability coefficient for a heterogeneous group will be higher than that for a homogeneous group.
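This effect of the spread of scores can be demonstrated with a small simulation. The sketch below is not from the original text; it assumes a simple model in which each administration adds random error to a pupil's true ability, uses invented numbers, and shows that the same amount of error produces a lower test-retest correlation in a homogeneous group.

```python
# Hedged sketch (simulated, invented numbers): a fixed amount of measurement error
# shuffles relative positions more when the group's true scores are bunched together,
# so the test-retest correlation is lower for a homogeneous group.
import numpy as np

rng = np.random.default_rng(1)

def test_retest_r(ability_sd, error_sd=5.0, n=2000):
    ability = rng.normal(60, ability_sd, n)          # spread of true ability in the group
    score_1 = ability + rng.normal(0, error_sd, n)   # first administration
    score_2 = ability + rng.normal(0, error_sd, n)   # second administration
    return np.corrcoef(score_1, score_2)[0, 1]

print(f"Heterogeneous group (ability SD = 15): r = {test_retest_r(15.0):.2f}")
print(f"Homogeneous group   (ability SD = 3):  r = {test_retest_r(3.0):.2f}")
```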
When one is choosing a standardized test or interpreting its results, it is not sufficient to look only at the numerical value of the reliability estimate; one must also take into account how that estimate was obtained. Gronlund (1976) has remarked on the significance of the method used to estimate reliability.
According to him “the split-half method gives the largest numerical values to the
reliability coefficient. Equivalent forms method and test retest tend to give lower
numerical value to the reliability coefficient. Typically these two methods provide
medium to large reliability coefficient. Equivalent forms method typically provides
smallest reliability coefficient for a given test.”
Therefore it may be said that the teacher should seek a standardized test whose reliability is as high as possible, but he must interpret the reliability coefficient in the light of the group of pupils on which it is based, the variability of that group, and the method used to estimate reliability.
Characteristic # 2. Validity:
“In selecting or constructing an evaluation instrument, the most important question is: To what extent will the results serve the particular uses for which they are intended? This is the essence of validity.” —GRONLUND
Validity is the most important characteristic of an evaluation programme, for unless a test is valid it serves no useful function. Psychologists, educators and guidance counsellors use test results for a variety of purposes. Obviously, no purpose can be fulfilled, even partially, if the tests do not have a sufficiently high degree of validity. Validity means the truthfulness of a test: the extent to which the test measures what the test maker intends it to measure.
Ebel and Frisbie (1991)—”The term validity, when applied to a set of test scores, refers to
the consistency (accuracy) with which the scores measure a particular cognitive ability
of interest.”
C.V. Good (1973), in the Dictionary of Education, defines validity as the “extent to which a test or other measuring instrument fulfils the purpose for which it is used.”
Anne Anastasi (1969) writes “the validity of a test concerns what the test measures and
how well it does so.”
According to Davis (1964), “validity is the extent to which the rank order of the scores of examinees for whom a test is appropriate is the same as the rank order of the same examinees in the property or characteristic that the test is being used to measure. This property or characteristic is called the criterion. Since any test may be used for many different purposes, it follows that it may have many validities, one corresponding to each criterion.”
Freeman (1962) defines, “an index of validity shows the degree to which a test measures
what it purports to measure, when compared with accepted criteria.”
Lindquist (1942) has said, “validity of a test may be defined as the accuracy with which it
measures that which it is intended to measure, or as the degree to which it approaches
infallibility in measuring what it purports to measure.”
From the above definitions it is clear that validity of an evaluation device is the degree to
which it measures what it is intended to measure. Validity is always concerned with the
specific use of the results and the soundness of our proposed interpretation.
It is not necessarily true that a test which is reliable is also valid. For example, suppose a clock is set forward ten minutes. If the clock is a good timepiece, the time it tells us will be reliable, because it gives a constant result; but it will not be valid as judged by standard time. This illustrates the concept that reliability is a necessary but not a sufficient condition for validity.
Nature of Validity:
1. Validity refers to the appropriateness of the test results, not to the instrument itself.
2. Tests are not valid for all purposes. Validity is always specific to a particular interpretation. For example, the results of a vocabulary test may be highly valid for testing vocabulary but may not be equally valid for testing the composition ability of the student.
(iii) Response set: a consistent tendency to follow a certain pattern in responding to the items.
Gronlund and Linn (1995) state that “objectivity of a test refers to the degree to which equally competent scorers obtain the same results.” A test is therefore considered objective when it eliminates the scorer's personal opinion and biased judgement. In this context there are two aspects of objectivity which should be kept in mind while constructing a test.