
KMC COLLEGE OF NURSING, MEERUT

Nursing Education

Assignment On
ITEM ANALYSIS

Submitted to: Ms. Natasha Verma, Lecturer, KMC College of Nursing
Submitted by: Mrs. Jyoti Katiyar, M.Sc. (N) 1st Year

ITEM ANALYSIS

Definition of Item Analysis.


Item analysis is a process which examines student responses to
individual test items (questions) in order to assess the quality of those
items and of the test as a whole. Item analysis is especially valuable in
improving items which will be used again in later tests, but it can also be
used to eliminate ambiguous or misleading items in a single test
administration. In addition, item analysis is valuable for increasing
instructors’ skills in test construction, and identifying specific areas of
course content which need greater emphasis or clarity.
Item Statistics.
Item statistics are used to assess the performance of individual test items
on the assumption that the overall quality of a test derives from the
quality of its items.
Item Number.
This is the question number taken from the student answer sheet, and the
Key Sheet. Up to 150 items can be scored on the Standard Answer
Sheet.
Mean and Standard Deviation.
The mean is the “average” student response to an item. It is computed by
adding up the number of points earned by all students on the item, and
dividing that total by the number of students.
The standard deviation, or S.D., is a measure of the dispersion of student
scores on that item. That is, it indicates how “spread out” the responses
were. The item standard deviation is most meaningful when comparing
items which have more than one correct alternative and when scale
scoring is used. For this reason it is not typically used to evaluate
classroom tests.
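For illustration only, a short Python sketch (not part of the original text) of how an item mean and standard deviation might be computed from hypothetical per-student item scores:

# Hypothetical per-student scores on one item (1 = correct, 0 = incorrect).
scores = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

# Item mean: total points earned on the item divided by the number of students.
mean = sum(scores) / len(scores)

# Item standard deviation: how "spread out" the responses are.
variance = sum((s - mean) ** 2 for s in scores) / len(scores)
std_dev = variance ** 0.5

print(f"item mean = {mean:.2f}, item S.D. = {std_dev:.2f}")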
Item Difficulty.
For items with one correct alternative worth a single point, the item
difficulty is simply the percentage of students who answer an item
correctly. In this case, it is also equal to the item mean. The item
difficulty index ranges from 0 to 100; the higher the value, the easier the
question. When an alternative is worth other than a single point, or when
there is more than one correct alternative per question, the item
difficulty is the average score on that item divided by the highest
number of points for any one alternative. Item difficulty is relevant for
determining whether students have learned the concept being tested. It
also plays an important role in the ability of an item to discriminate
between students who know the tested material and those who do not.
The item will have low discrimination if it is so difficult that almost
everyone gets it wrong or guesses, or so easy that almost everyone gets
it right.
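As a rough Python sketch (the responses and key below are hypothetical), the difficulty index for a single-point, single-key item could be computed as follows:

# Hypothetical responses to one multiple-choice item and its keyed answer.
responses = ["B", "C", "B", "B", "A", "B", "D", "B", "B", "C"]
key = "B"

# Difficulty index: percentage of students answering correctly (0-100);
# the higher the value, the easier the question.
difficulty = 100 * sum(r == key for r in responses) / len(responses)
print(f"difficulty index = {difficulty:.0f}")

# With differently weighted or multiple correct alternatives, one would instead
# divide the average item score by the highest weight of any one alternative.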
Item Discrimination.
Item discrimination refers to the ability of an item to differentiate among
students on the basis of how well they know the material being tested.
Various hand calculation procedures have traditionally been used to
compare item responses to total test scores using high and low scoring
groups of students. Computerized analyses provide more accurate
assessment of the discrimination power of items because they take into
account responses of all students rather than just high and low scoring
groups.
The item discrimination index provided by ScorePak® is a Pearson
Product Moment correlation between student responses to a particular
item and total scores on all other items on the test. This index is the
equivalent of a point-biserial coefficient in this application. It provides
an estimate of the degree to which an individual item is measuring the
same thing as the rest of the items.
Because the discrimination index reflects the degree to which an item
and the test as a whole are measuring a unitary ability or attribute, values
of the coefficient will tend to be lower for tests measuring a wide range
of content areas than for more homogeneous tests. Item discrimination
indices must always be interpreted in the context of the type of test
which is being analysed. Items with low discrimination indices are often
ambiguously worded and should be examined. Items with negative
indices should be examined to determine why a negative value was
obtained. For example, a negative value may indicate that the item was
mis-keyed, so that students who knew the material tended to choose an
unkeyed, but correct, response option.
Tests with high internal consistency consist of items with mostly
positive relationships with total test score. In practice, values of the
discrimination index will seldom exceed .50 because of the differing
shapes of item and total score distributions. ScorePak® classifies item
discrimination as “good” if the index is above .30; “fair” if it is between
.10 and .30; and “poor” if it is below .10.
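A simplified Python sketch of this idea (not ScorePak’s actual implementation; the score matrix is made up): correlate each student’s score on one item with their total score on the remaining items, then apply the classification above:

# Pearson correlation between item scores and total score on all other items.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical score matrix: rows are students, columns are items (1 = correct).
matrix = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]

item = 0  # analyse the first item
item_scores = [row[item] for row in matrix]
rest_totals = [sum(row) - row[item] for row in matrix]  # total minus this item

d = pearson(item_scores, rest_totals)
label = "good" if d > 0.30 else "fair" if d >= 0.10 else "poor"
print(f"discrimination = {d:.2f} ({label})")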
Alternate Weight.
This column shows the number of points given for each response
alternative. For most tests, there will be one correct answer which will
be given one point, but ScorePak® allows multiple correct alternatives,
each of which may be assigned a different weight.
Means.
The mean total test score (minus that item) is shown for students who
selected each of the possible response alternatives. This information
should be looked at in conjunction with the discrimination index; higher
total test scores should be obtained by students choosing the correct, or
most highly weighted alternative. Incorrect alternatives with relatively
high means should be examined to determine why “better” students
chose that particular alternative.
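For illustration with hypothetical data, the mean total score (minus the item itself) for the students choosing each alternative could be tabulated like this in Python:

# Each record: (chosen alternative, total test score, points earned on this item).
records = [
    ("A", 32, 0), ("B", 45, 1), ("B", 41, 1), ("C", 30, 0),
    ("B", 44, 1), ("A", 28, 0), ("D", 35, 0), ("B", 40, 1),
]

by_alt = {}
for alt, total, item_points in records:
    by_alt.setdefault(alt, []).append(total - item_points)  # total minus this item

for alt in sorted(by_alt):
    vals = by_alt[alt]
    print(f"alternative {alt}: mean = {sum(vals) / len(vals):.1f}  (n = {len(vals)})")

# The keyed alternative (assumed here to be B) should show the highest mean.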
Frequencies and Distribution.
The number and percentage of students who choose each alternative are
reported. The bar graph on the right shows the percentage choosing each
response; each “#” represents approximately 2.5%. Frequently chosen
wrong alternatives may indicate common misconceptions among the
students.
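A rough Python sketch of such a frequency report (the responses are hypothetical), with one “#” for roughly every 2.5% of respondents as in the ScorePak graph:

responses = ["B", "C", "B", "B", "A", "B", "D", "B", "B", "C", "A", "B"]

for alt in sorted(set(responses)):
    n = responses.count(alt)
    pct = 100 * n / len(responses)
    bar = "#" * round(pct / 2.5)        # one "#" per ~2.5% of respondents
    print(f"{alt}: {n:2d} ({pct:5.1f}%) {bar}")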
Difficulty and Discrimination Distributions.
At the end of the Item Analysis report, test items are listed according to
their degrees of difficulty (easy, medium, and hard) and discrimination
(good, fair, and poor). These distributions provide a quick overview of
the test, and can be used to identify items which are not performing well
and which can perhaps be improved or discarded.
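As a hedged illustration (the easy/medium/hard cut-offs below are assumptions for the sketch, not ScorePak’s published values), such distributions could be tallied as follows:

from collections import Counter

# Hypothetical (difficulty index, discrimination index) pairs for five items.
items = [(85, 0.12), (60, 0.45), (35, 0.28), (92, 0.05), (55, 0.38)]

def difficulty_band(p):                 # assumed cut-offs, for illustration only
    return "easy" if p > 75 else "hard" if p < 40 else "medium"

def discrimination_band(d):             # cut-offs from the text above
    return "good" if d > 0.30 else "fair" if d >= 0.10 else "poor"

print(Counter(difficulty_band(p) for p, _ in items))
print(Counter(discrimination_band(d) for _, d in items))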
Test Statistics.
Two statistics are provided to evaluate the performance of the test as a
whole.
Standard Error of Measurement.
The standard error of measurement is directly related to the reliability of
the test. It is an index of the amount of variability in an individual
student’s performance due to random measurement error. If it were
possible to administer an infinite number of parallel tests, a student’s
score would be expected to change from one administration to the next
due to a number of factors. For each student, the scores would form a
“normal” (bell-shaped) distribution. The mean of the distribution is
assumed to be the student’s “true score,” and reflects what he or she
“really” knows about the subject. The standard deviation of the
distribution is called the standard error of measurement and reflects the
amount of change in the student’s score which could be expected from
one test administration to another.
Whereas the reliability of a test always varies between 0.00 and 1.00, the
standard error of measurement is expressed in the same scale as the test
scores. For example, multiplying all test scores by a constant will
multiply the standard error of measurement by that same constant, but
will leave the reliability coefficient unchanged.
A general rule of thumb to predict the amount of change which can be
expected in individual test scores is to multiply the standard error of
measurement by 1.5. Only rarely would one expect a student’s score to
increase or decrease by more than that amount between two such similar
tests. The smaller the standard error of measurement, the more accurate
the measurement provided by the test.
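For illustration, a small Python sketch using the standard relationship SEM = SD × √(1 − reliability); the test figures below are hypothetical:

# Hypothetical whole-test figures.
test_sd = 8.0        # standard deviation of total test scores
reliability = 0.84   # reliability coefficient (between 0.00 and 1.00)

# Standard error of measurement, expressed in the same scale as the test scores.
sem = test_sd * (1 - reliability) ** 0.5
print(f"SEM = {sem:.2f}")

# Rule of thumb from the text: a score would rarely change by more than
# about 1.5 * SEM between two similar administrations.
observed = 62
band = 1.5 * sem
print(f"expected range: {observed - band:.1f} to {observed + band:.1f}")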
A Caution in Interpreting Item Analysis Results.
Each of the various item statistics provided by ScorePak® provides
information which can be used to improve individual test items and to
increase the quality of the test as a whole. Such statistics must always be
interpreted in the context of the type of test given and the individuals
being tested. W. A. Mehrens and I. J. Lehmann provide the following set
of cautions in using item analysis results (Measurement and Evaluation
in Education and Psychology. New York: Holt, Rinehart and Winston,
1973, 333-334).

 Item analysis data are not synonymous with item validity. An
external criterion is required to accurately judge the validity of test
items. By using the internal criterion of total test score, item
analyses reflect internal consistency of items rather than validity.
 The discrimination index is not always a measure of item quality.
There is a variety of reasons an item may have low discriminating
power: (a) extremely difficult or easy items will have low ability to
discriminate but such items are often needed to adequately sample
course content and objectives; (b) an item may show low
discrimination if the test measures many different content areas
and cognitive skills. For example, if the majority of the test
measures “knowledge of facts,” then an item assessing “ability to
apply principles” may have a low correlation with total test score,
yet both types of items are needed to measure attainment of course
objectives.
 Item analysis data are tentative. Such data are influenced by the
type and number of students being tested, instructional procedures
employed, and chance errors. If repeated use of items is possible,
statistics should be recorded for each administration of each item.
