
Psychological Assessment

Test Development (8)
Source: Cohen & Swerdlik (2018), Kaplan & Saccuzzo (2018)

Test Conceptualization

L.L. Thurstone - widely considered to be one of the primary architects of modern factor analysis.

- Credited for being at the forefront of test development.

Test Development - an umbrella term for all that goes into the process of creating a test.

Test Conceptualization - brainstorming ideas about what kind of test a developer wants to publish.

Questions to ponder when conceptualizing a new test:

1. What is it designed to measure?
2. What is the objective?
3. Is there a need for this kind of test?
4. Who will use the test?
5. Who will take the test?
6. What content will the test cover?
7. How will the test be administered?
8. What is the ideal format of the test?
9. Should more than one form of the test be developed?
10. What special training will be required of test users for administering or interpreting the test?
11. What types of responses will be required of test takers?
12. Who benefits from the administration of this test?
13. Is there potential harm?
14. How will meaning be attributed to scores on this test?

Criterion-Referenced Test - known to be used more often in board exams.

Pilot Work/Pilot Study/Pilot Research - preliminary research surrounding the creation of a prototype of a test.

- A trial run.
- Attempts to determine how to measure a targeted construct.

Test Construction - stage in the process that entails writing test items, revisions, formatting, and setting scoring rules.

Scaling - process of setting rules for assigning numbers in measurement.

- Aims to be objective.
- Process by which a measuring device is designed and calibrated and by which numbers (scale values) are assigned to different amounts of the trait, attribute, or characteristic being measured.

Scaling Methods:

● Age-Based - age is of critical interest.
● Grade-Based - grade is of critical interest.
● Stanine - all raw scores of the test are transformed into scores that range from 1-9 (a conversion sketch follows the scaling lists below).
● Unidimensional - only one dimension is presumed to underlie the ratings.
● Multidimensional - more than one dimension is presumed to underlie the ratings.

Comparative and Categorical:

● Rating Scale - grouping of words, statements, or symbols on which judgements of the strength of a particular trait are indicated by the test taker.
● Summative Scale - final score is obtained by summing the ratings across all the items.
● Likert Scale - scales attitudes; usually reliable.
  - Example: MDBS-R.
● Thurstone Scale - involves the collection of a variety of different statements about a phenomenon, which are ranked by an expert panel in order to develop the questionnaire.
● Method of Paired Comparisons - produces ordinal data by presenting test takers with pairs of stimuli which they are asked to compare.
● Comparative Scaling - entails judgements of a stimulus in comparison with every other stimulus on the scale.
● Categorical Scaling - stimuli are placed into one of two or more alternative categories that differ quantitatively with respect to some continuum.
● Guttman Scale - yields ordinal-level measures.
  - Items range from weaker to stronger expressions.
  - Considered the most difficult to construct.
  - Used to determine the next course of action.
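
To make the stanine transformation above concrete, here is a minimal sketch (not from the source texts) that converts raw scores to stanines using the classic 4-7-12-17-20-17-12-7-4 percent bands; the function name and the data are hypothetical:

```python
from typing import List

# Cumulative percentage cutoffs separating stanines 1-8; scores above the
# 96th percentile fall in stanine 9 (the 4-7-12-17-20-17-12-7-4 bands).
CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]

def to_stanines(raw_scores: List[float]) -> List[int]:
    """Transform raw scores into stanines (1-9) by percentile rank."""
    n = len(raw_scores)
    stanines = []
    for x in raw_scores:
        below = sum(1 for s in raw_scores if s < x)
        ties = sum(1 for s in raw_scores if s == x)
        pct = 100.0 * (below + 0.5 * ties) / n   # midpoint percentile rank
        stanines.append(1 + sum(1 for c in CUTOFFS if pct > c))
    return stanines

print(to_stanines([33, 40, 55, 62, 70, 70, 77, 81, 90, 95]))
```
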
Item Pool - reservoir or well from which items will or will not be drawn for the final version of the test.

★ Items required × number of forms = number of test items (e.g., 50 items per form × 2 forms = 100 items to write).

Item Format - form, plan, structure, arrangement, and layout of individual test items.

● Selected-Response Format - requires test takers to select a response from a set of alternative responses.
  1. Multiple-Choice Format - has three elements: a stem (question), a correct option, and several incorrect alternatives (distractors or foils).
     - Easiest for the test takers, but not for the test developers.
  2. Matching Item - the test taker is presented with two columns: premises and responses.
  3. Binary Choice - true or false.
     - Easiest.
     - Disadvantage: a test taker has a 50% chance of answering correctly by guessing.
● Constructed-Response Format - requires test takers to supply or create the correct answer, not merely select it.
  1. Completion Item - requires the examinee to provide a word or phrase that completes a sentence.
     - Fill in the blank.
  2. Short Answer Item - should be written clearly enough that the test taker can respond succinctly, with a short answer.
  3. Essay Item - the test taker responds by writing a composition.
     - Allows creative integration and expression of the material.

Writing items for computer administration:

Item Banks - relatively large and easily accessible collection of test questions.

Computerized Adaptive Testing (CAT) - refers to an interactive, computer-administered test-taking process wherein items presented to the test taker are based in part on the test taker's performance on previous items.

- Easy for test developers, but not for the test takers.
- Rules:
  - The administrator stops the test taker from answering further items once the allowable number of errors has been reached.
  - When the test taker struggles with a run of consecutive items, the administrator draws from the item bank/item pool and substitutes new questions for the ones the test taker found difficult.
- Advantage: reduces manpower.

● Floor Effects - occur when there is some lower limit on a survey or questionnaire and a large percentage of respondents score near this lower limit (test takers have low scores).
● Ceiling Effects - occur when there is some upper limit on a survey or questionnaire and a large percentage of respondents score near this upper limit (test takers have high scores).

Item Branching - ability of the computer to tailor the content and order of presentation of items on the basis of responses to previous items.
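
As an illustration of the item-branching idea behind CAT, here is a toy sketch assuming a simple step-up/step-down rule and an error-limit stop rule like the ones described above; the item pool, level names, and stop threshold are all hypothetical:

```python
import random

# Toy sketch of item branching: the next item depends on whether the
# previous response was correct; testing stops at a preset error count.
item_pool = {
    "easy":   ["E1", "E2", "E3"],
    "medium": ["M1", "M2", "M3"],
    "hard":   ["H1", "H2", "H3"],
}
levels = ["easy", "medium", "hard"]

def next_level(current: str, was_correct: bool) -> str:
    """Branch upward after a correct answer, downward after an error."""
    i = levels.index(current)
    i = min(i + 1, len(levels) - 1) if was_correct else max(i - 1, 0)
    return levels[i]

level, errors, MAX_ERRORS = "medium", 0, 3
for _ in range(10):                        # administer at most 10 items
    item = random.choice(item_pool[level])
    was_correct = random.random() < 0.5    # stand-in for a real response
    errors += not was_correct
    if errors >= MAX_ERRORS:               # stop rule: error limit reached
        break
    level = next_level(level, was_correct)
```
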
Scoring Items:

Cumulative Scoring - the higher the score achieved on the test, the higher the test taker stands on the ability that the test purports to measure.

- Test items are added together to get the final score.

Class Scoring/Category Scoring - test taker responses earn credit toward placement in a particular class or category alongside other test takers whose pattern of responses is presumably similar in some way.

Ipsative Scoring - comparing a test taker's score on one scale within a test to another scale within that same test (contrasted with cumulative scoring in the sketch below).

Semantic Differential Rating Technique - measures an individual's unique, perceived meaning of an object, a word, or an individual; the target is typically rated on scales anchored by bipolar adjectives (e.g., good-bad).
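
A minimal sketch contrasting cumulative and ipsative scoring, with hypothetical scale names and ratings:

```python
# Hypothetical two-scale test, each item rated 1-5 by one test taker.
responses = {
    "scale_a": [4, 5, 3],   # e.g., assertiveness items
    "scale_b": [2, 1, 2],   # e.g., agreeableness items
}

# Cumulative scoring: item ratings are summed into a scale score, so a
# higher total means more of the measured attribute.
scale_scores = {scale: sum(items) for scale, items in responses.items()}

# Ipsative scoring: one scale is interpreted relative to another scale
# from the SAME test taker, not relative to other test takers.
print(scale_scores, "a minus b:", scale_scores["scale_a"] - scale_scores["scale_b"])
```
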
Test Tryout

★ The test should be tried out on people who are similar in critical respects to the people for whom the test was designed.
★ An informal rule of thumb: no fewer than 5 and preferably as many as 10 test takers for each item (the more, the better).

Pseudobulbar Affect (PBA) - neurological disorder characterized by frequent involuntary outbursts of laughing or crying that may or may not be appropriate to the situation.

Empirical Criterion Keying - administering a large pool of test items to a sample of individuals who are known to differ on the construct being measured.

Item Analysis

- Statistical procedure used to analyze items.
- Separates the good items from the bad items.
Item Difficulty - defined by the number of people who get a particular item correct.

Item-Difficulty Index - calculated as the proportion of the total number of test takers who answered the item correctly.

● Item-Endorsement Index - the equivalent index in personality testing.

Item-Reliability Index - provides an indication of the internal consistency of a test.

Item-Validity Index - designed to provide an indication of the degree to which a test measures what it purports to measure.

Item-Discrimination Index - measure of item discrimination.

- In this context, a multiple-choice item on an achievement test is a good item if most of the high scorers answer correctly and most of the low scorers answer incorrectly.

● Extreme Group Method - compares people who have done well with those who have done poorly on the test.
● Discrimination Index - difference between these proportions.
● Point-Biserial Method - another way to examine the discriminability of items is to find the correlation between performance on the item and performance on the total test.
  - Correlation between a dichotomous variable and a continuous variable.
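
A minimal sketch, on hypothetical data, of the indices just listed: the item-difficulty index p, an extreme-group discrimination index d, and the point-biserial correlation. Taking the top and bottom thirds as the extreme groups is one common convention, an assumption here rather than something the notes fix:

```python
import statistics

# Hypothetical data: 1 = correct, 0 = incorrect on one item, alongside
# each test taker's total test score.
item = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
total = [48, 45, 30, 42, 28, 50, 25, 33, 40, 47]

# Item-difficulty index p: proportion of test takers answering correctly.
p = sum(item) / len(item)

# Discrimination index d (extreme group method): proportion correct among
# the top scorers minus proportion correct among the bottom scorers.
pairs = sorted(zip(total, item))
k = len(pairs) // 3                      # top and bottom thirds
d = sum(i for _, i in pairs[-k:]) / k - sum(i for _, i in pairs[:k]) / k

# Point-biserial correlation between the dichotomous item score and the
# continuous total score: (M1 - M0) / s * sqrt(p * (1 - p)).
m1 = statistics.mean(t for t, i in zip(total, item) if i == 1)
m0 = statistics.mean(t for t, i in zip(total, item) if i == 0)
s = statistics.pstdev(total)
r_pb = (m1 - m0) / s * (p * (1 - p)) ** 0.5

print(f"p = {p:.2f}, d = {d:.2f}, r_pb = {r_pb:.2f}")
```
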
Item Characteristic Curve - graphic representation of item difficulty and discrimination.
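
The notes do not specify how such a curve is computed; in item response theory an ICC is often drawn with a logistic model. The two-parameter logistic form below is therefore an illustrative assumption, with a controlling discrimination (steepness) and b controlling difficulty (location):

```python
import math

def icc(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic ICC: probability of a correct response at
    ability level theta, where b sets difficulty and a sets discrimination."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Tabulate one hypothetical item: difficulty b = 0.5, discrimination a = 1.2.
for theta in [-2, -1, 0, 1, 2]:
    print(theta, round(icc(theta, a=1.2, b=0.5), 2))
```
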
Guessing - a problem that has eluded any universally accepted solution.

Considerations in correcting for guessing:

1. A correction for guessing must recognize that, when a respondent guesses at an answer on an achievement test, the guess is not typically made on a totally random basis.
2. A correction for guessing must also deal with the problem of omitted items. Sometimes, instead of guessing, the test taker will simply omit a response to an item.
3. Just as some people may be luckier than others in front of a Las Vegas slot machine, some test takers may be luckier than others in guessing the choices that are keyed correct. Any correction for guessing may seriously underestimate or overestimate the effects of guessing for lucky and unlucky test takers.
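
The notes list caveats rather than a formula; for reference, the classic correction assumes purely random guessing: corrected score = R - W/(k - 1), where R is the number right, W the number wrong (omitted items excluded), and k the number of options per item. A minimal sketch:

```python
def corrected_score(num_right: int, num_wrong: int, k: int) -> float:
    """Classic correction for guessing: R - W / (k - 1).
    Omitted items count as neither right nor wrong. Note the caveats
    above: real guessing is rarely totally random, so this is rough."""
    return num_right - num_wrong / (k - 1)

# 40 right, 12 wrong, 8 omitted on a 4-option multiple-choice test.
print(corrected_score(40, 12, k=4))   # -> 36.0
```
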
Qualitative Methods - techniques of data generation and analysis that rely primarily on verbal rather than statistical procedures.

Qualitative Item Analysis - various nonstatistical procedures designed to explore how individual test items work.
Test Revision

- Characterize each item according to its strengths and weaknesses.

Cross Validation - revalidation of a test on a sample of test takers other than those on whom test performance was originally found to be a valid predictor of some criterion.

- Often results in validity shrinkage.

Validity Shrinkage - decrease in item validities that inevitably occurs after cross validation.

Co-Validation - test validation conducted on two or more tests using the same sample of test takers.

Co-Norming - co-validation used in conjunction with the creation of norms or the revision of existing norms.

Anchor Protocol - test protocol scored by a highly authoritative scorer that is designed as a model for scoring and a mechanism for resolving scoring discrepancies.

Scoring Drift - discrepancy between the scoring in an anchor protocol and the scoring of another protocol.

Differential Item Functioning (DIF) - an item functions differently in one group of test takers relative to another group known to have the same level of the underlying trait.

DIF Analysis - test developers scrutinize group-by-group item response curves, looking for DIF items.

DIF Items - items for which respondents from different groups, at the same level of the underlying trait, have different probabilities of endorsement as a function of their group membership.
