
PSY 414 Psychological Testing & Construction 10.03.2021


PSY 414

Psychological Testing & Construction
PSYCHOLOGICAL TESTING:
• Psychological testing is the administration of psychological tests,
which are designed to be ‘an objective and standardised measure
of a sample of behaviour’
• Psychological testing is the branch of psychology in which we use
and construct standardised tests in order to understand individual
differences.
• Psychological testing is a term that refers to the use of psychological
tests. It refers to all the possible uses, applications, and underlying
concepts of psychological tests.
TEST:
• Test - is a measurement device or technique used to quantify
behaviour or aid in the understanding and prediction of behaviour.
‒ It samples behaviour rather than fully measuring one’s understanding of the material.
ITEM:
• Item - a specific stimulus to which a person responds overtly; this
response can be scored or evaluated.
PSYCHOLOGICAL TEST:
• A psychological test is an objective and standardised
measure of a sample of behaviour
OR
• A psychological test is a standardised measure,
quantitative or qualitative, of one or more aspects of a
trait, obtained by means of a sample of verbal or non-
verbal behaviour.
OR
• Psychological tests are written, visual, or verbal
evaluations administered to assess the cognitive and
emotional functioning of children and adults.
OR
• Psychological test is a set of items that are designed to
measure characteristics of human beings that pertain to
behavior.
Types of Behaviors:
1. Overt behavior - individual’s observable activity.
2. Covert behavior - behaviors that take place within an
individual and cannot be directly observed.

SCALES:
• Scales - relate raw scores on test items to some defined
theoretical or empirical distribution.
PURPOSE OF PSYCHOLOGICAL TESTS
Psychological tests are used to assess a variety of mental
abilities and attributes, including achievement and ability,
personality, and neurological functioning.
1. A psychological test attempts to compare the same
individual on two or more traits.
2. Two or more persons may be compared on the same trait.
Such a measurement may be either quantitative or
qualitative.
TYPES OF PSYCHOLOGICAL TESTS
1. Individual test – test that can be given to only one person
at a time.
- Test administrator – the person giving the test.
2. Group test – administered to more than one person at a
time by a single examiner.
TYPES OF PSYCHOLOGICAL TESTS
ACCORDING TO TYPE OF BEHAVIOUR
1. Personality tests
• measure typical behaviour - traits, temperaments, and dispositions
• evaluate the thoughts, emotions, attitudes, and behavioural traits that
comprise personality.
• The results of these tests can help determine personality strengths
and weaknesses, and may identify certain disturbances in personality.

2. Ability tests
• contain items that can be scored in terms of speed, accuracy, or both.
• are designed to assess what a person is capable of doing.
• represent a person’s level of competency to perform a certain type
of task.
Types of Personality Tests
1. Structured (objective) - which provides a self-report
statement to which the person responds ‘true’ or ‘false’, ‘yes’
or ‘no’.
2. Projective - which provides an ambiguous test stimulus,
response requirements are unclear.
Types of Ability Tests
1. Intelligence tests - which measure potential to solve
problems, adapt to changing circumstances, and profit from
experiences.
2. Aptitude tests - which measure potential for acquiring a
specific skill.
3. Achievement tests - which measure previous learning.
QUALITIES OF A GOOD PSYCHOLOGICAL TEST
The four qualities of a good test are as follows:
1. Validity
2. Reliability
3. Objectivity
4. Usability
1. Validity
• Validity means the truthfulness of a test: the extent to which
the test measures what the test maker intends it to measure.
• A valid measure is one that measures what it is intended to
measure.
It includes two aspects: (i) what is measured and (ii) how
consistently it is measured.
1. It is not a test characteristic, but it refers to the meaning of
the test scores and the ways we use the scores to make
decisions.
2. Validity is always concerned with the specific use of the
results and the soundness of our proposed interpretation.
Factors Affecting Validity
1. Factors in the test:
• Unclear directions to those who are supposed to respond to the test.
• Difficulty of the reading vocabulary and sentence structure.
• Too easy or too difficult test items.
2. Factors in Test Administration and Scoring
• Unfair aid to individual respondents who ask for help.
• Cheating by the respondents during testing.
• Unreliable scoring of essay type answers.
3. Factors related to Testee (Respondent)
• Test anxiety of the respondent.
• Physical and psychological state of the respondent.
• Response set - a consistent tendency to follow a certain pattern in
responding to the items.
2. Reliability
• A test score is called reliable when we have reason to believe the
score is stable and trustworthy. Stability and trustworthiness
depend upon the degree to which the score is an index of true
ability and is free from chance error.
• Therefore, reliability can be defined as the degree of consistency
between two measurements of the same thing.
• A reliable measure is one that measures a construct consistently
across time, individuals, and situations.
• Reliability is necessary, but not sufficient, for validity.
Reliability falls into two general classifications:
1. Relative Reliability or Reliability Coefficient: Here reliability is stated
in terms of a coefficient of correlation, known as the reliability
coefficient. It tells us how much the relative position of an
individual’s score shifts between measurements.
2. Absolute Reliability or Standard Error of Measurement: Here,
reliability is stated in terms of the standard error of measurement,
which indicates the amount of variation to expect in an individual’s score.
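
The two classifications are connected by a standard formula: the standard error of measurement is usually obtained from the reliability coefficient as SEM = SD * sqrt(1 - r). A minimal sketch, with hypothetical values:

```python
# Standard error of measurement (SEM) from a reliability coefficient.
# A minimal sketch; the score SD and reliability below are hypothetical.
import math

sd = 15.0           # standard deviation of the test scores
reliability = 0.91  # reliability coefficient (e.g., from test-retest)

sem = sd * math.sqrt(1 - reliability)  # SEM = SD * sqrt(1 - r)
print(f"SEM = {sem:.2f}")  # 4.50 score points

# A rough 95% band around an observed score of 100:
observed = 100
print(f"95% band: {observed - 1.96 * sem:.1f} to {observed + 1.96 * sem:.1f}")
```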
Methods of determining reliability coefficient
1. Test-Retest method.
2. Equivalent forms/Parallel forms method.
3. Split-half method.
4. Rational Equivalence/Kuder-Richardson method.
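
As an illustration of method 4, a minimal sketch of the Kuder-Richardson formula 20 (KR-20) for dichotomously scored items, r = (k / (k - 1)) * (1 - sum(pq) / variance of total scores), computed on a hypothetical response matrix:

```python
# KR-20 reliability for dichotomously scored (0/1) items.
# A minimal sketch with a hypothetical 5-person x 4-item response matrix.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]

n = len(responses)
k = len(responses[0])  # number of items

# Proportion passing each item (p) and its complement (q = 1 - p).
p = [sum(row[j] for row in responses) / n for j in range(k)]
pq_sum = sum(pj * (1 - pj) for pj in p)

# Population variance of the total scores.
totals = [sum(row) for row in responses]
mean_total = sum(totals) / n
var_total = sum((t - mean_total) ** 2 for t in totals) / n

kr20 = (k / (k - 1)) * (1 - pq_sum / var_total)
print(f"KR-20 = {kr20:.3f}")  # about 0.41 for this tiny illustrative matrix
```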
Factors which affect the reliability of a test
The major factors which affect the reliability of test scores
can be categorised into three headings:
1. Factors related to test
• Length and content of the test
• Characteristics of items
• Spread of Scores
2. Factors related to testee (respondent)
• Heterogeneity of the group
• Test experience of the respondents
• Motivation of the students
3. Factors related to testing procedure
• Time Limit of test
• Cheating opportunity given to the respondents
3. Objectivity
• Objectivity of a measuring instrument means the degree to
which different persons scoring the answer sheet arrive
at the same result.
• Objectivity in testing is the extent to which the instrument
is free from personal error (personal bias), that is,
subjectivity on the part of the scorer.
Aspects of objectivity
• Objectivity of Scoring means that the same person or different
persons scoring the test at any time arrive at the same
result without much chance error.

• Objectivity of Test Items means that each item must call for a
definite single answer. Well-constructed test items should
lend themselves to one and only one interpretation by
respondents who know the material involved. In other words, the
test items should be free from ambiguity.
4. Usability
Usability is another important characteristic of measuring
instruments, because the practical considerations of
evaluation instruments cannot be neglected. The test must
have practical value from the point of view of time, economy, and
administration. This is termed usability.

Aspects of Usability
• Ease of Administration
• Time required for administration
• Ease of Interpretation and Application
• Availability of Equivalent Forms
• Cost of Testing
Other Things to Consider

• Scorability – easy to score


• Interpretability – test results can be properly
interpreted and serve as a major basis for making
sound educational decisions
• Economical – the test can be reused without
compromising the validity and reliability
STEPS IN TEST CONSTRUCTION
1. Planning

2. Item Writing

3. PRELIMINARY ADMINISTRATION of the test

4. RELIABILITY of the final test

5. VALIDITY of the final test

6. Establishment of NORMS for the final test

7. Preparation of TEST MANUAL and reproduction of the test


1. PLANNING THE TEST
• Planning the test is the first important step in test
construction. The main goal of the testing process is to collect
valid, reliable and useful data. At this stage, the test constructor
addresses the following:
1. DEFINITION OF THE CONSTRUCT - Define the construct to
be measured by the proposed test. Describe exactly what the
test is intended to measure.
2. OBJECTIVE OF THE TEST - The author has to spell out the
broad and specific objectives of the test in clear terms.
o The purpose or purposes for which they will use the test.
o The prospective user(s) of the test - for example,
Industrial/Organisational Psychologist, Vocational Counsellor,
Clinical Psychologist, Educationalist, etc.
…PLANNING THE TEST
3. POPULATION - What will be the appropriate age range,
educational level and cultural background of the examinees
(respondents), who will find it desirable to take the test?
4. CONTENT OF THE TEST - What will be the content of the
test? Is this content coverage different from that of existing
tests developed for the same or similar purposes?
5. TEST FORMAT - The author has to decide what will be the
nature of items and response formats i.e. if the test will be
multiple choice, true / false, inventive response or some
other form.
6. TYPE OF INSTRUCTION - What will be the type of
instruction? Will it be written or delivered orally?
…PLANNING THE TEST
7. TEST ADMINISTRATION - Will the test be administered
individually or in a group? Will it be designed or modified
for computer administration? Detailed arrangements for
preliminary and final administration should be considered.
8. USER QUALIFICATION AND PROFESSIONAL COMPETENCE -
What special training or qualification will be necessary for
administering or interpreting the test?
9. PROBABLE LENGTH OF TIME - The test constructor has to
decide about the probable length and time for completion
of the test.
10. METHOD OF SAMPLING - What will be the method of
sampling: will it be random or selective?
…PLANNING THE TEST
11. ETHICAL AND SOCIAL CONSIDERATION - Is there any
potential harm for the examinees (respondents) resulting
from the administration of this test? Are there any
safeguards built into the recommended testing procedure to
prevent any sort of harm to anyone involved in the use of
this test?
12. INTERPRETATION OF SCORES - How will the scores be
interpreted? Will the score of an examinee (respondent) be
compared to others in the criteria group, or will it be used
to assess mastery of a specific content area? To answer this
question the author has to decide whether the proposed
test will be criterion-referenced or norm-referenced.
13. MANUAL AND REPRODUCTION OF TEST - Planning also
includes the total number of copies to be reproduced and
the preparation of a manual.
2. ITEM WRITING
• Items are specific questions or problems that make up a
test.

• An item is a specific stimulus to which a person responds
overtly (i.e. in a way that can be observed); this response can
be scored or evaluated, for example on a scale or by a grade.

• A test is a measurement device or technique used to
quantify behaviour or help in the understanding and prediction
of behaviour. A test can also be described as a collection of items.
Pre-requisites for Item Writing
Though there are no set rules for writing of good items (a lot depends
upon the writer’s intuition, imagination, experience and ingenuity),
there are some essential prerequisites which must be met if the item
writer wants to write good and appropriate items.
1. Command of Subject Matter: The item writer must have thorough
knowledge and complete mastery of the subject matter, and be fully
acquainted with all the facts, principles, misconceptions and fallacies
in the particular field.
2. Full Awareness of the Population: Knowing the intelligence level
of the people helps in manipulating the difficulty level of items
so as to suit the people’s ability level.
3. Command of Language: The item writer must have a large
vocabulary, knowing the different meanings of a word, so as to avoid
confusion. He must also be able to convey the meaning of the
items in the simplest possible language.
…Pre-requisites for Item Writing
4. Expert Opinion: After the items are written down, they must be
submitted to a group of subject experts for criticism and
suggestions, in light of which the items must then be duly modified.
5. Cultivating a Rich Source of Ideas: This is important as ideas
are not produced in mind automatically but rather require
certain factors or stimuli. Common sources of such factors
are textbooks, journals, discussions, questions for interview,
course outlines and other instructional materials.
Characteristics of a Good ITEM
1. Clarity: An item should be phrased in a manner that is devoid of
ambiguity regarding its meaning, both to the item writer and to the
respondents who take the test.
2. Moderately Difficult: The item should be neither too easy nor too difficult.
3. Discriminating Power: The item must have discriminating power,
i.e. it must clearly distinguish between those who possess the trait
and those who do not.
4. To the Point: It must measure only the significant aspects of
knowledge or understanding, not trivial aspects of the subject
matter.
5. No Room for Guesswork: As far as possible, it should not
encourage guesswork by the subjects.
6. Clear in Reading: It should not present any difficulty in reading.
7. Independent in its Meaning: It should be written in such a way that it can
be answered on its own, without depending on or referring to another
item.
Steps in Item Writing
Item writing involves a number of steps:
1. Define clearly what you want to measure.
2. Generate an item pool
3. Avoid exceptionally long items
4. Keep the level of reading difficulty appropriate for those
who will complete the scale
5. Avoid double-barrelled items that convey two or more
ideas at the same time
6. Consider mixing positively and negatively worded items
(negatively worded items are then reverse-scored, as in the sketch below)
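
A minimal sketch of that reverse-scoring step, assuming a 1-5 Likert-type scale; the items q1-q3 and the responses are hypothetical:

```python
# Reverse-scoring negatively worded items before computing a scale total.
# A minimal sketch; items and responses are hypothetical, on a 1-5 scale.
SCALE_MAX = 5

responses = {"q1": 4, "q2": 2, "q3": 5}  # raw answers from one respondent
negatively_worded = {"q2"}               # items to reverse-score

def score(item, value):
    # On a 1..SCALE_MAX scale, the reverse of v is (SCALE_MAX + 1) - v.
    return (SCALE_MAX + 1 - value) if item in negatively_worded else value

total = sum(score(item, value) for item, value in responses.items())
print(f"Scale total = {total}")  # 4 + (6 - 2) + 5 = 13
```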
Item formats
1. Dichotomous format: takes the form of True or false
tests/ Yes or No Test.
• This format offers two alternatives for each item. A test
taker is awarded a point for selecting the keyed (correct)
alternative.
2. Polychotomous Format resembles the dichotomous
format except that it has more than two alternatives. A point
is given for selecting the correct alternative but not for
selecting any other choice.
• In a polychotomous examination, the test taker has to
determine which alternative is correct. Incorrect alternatives
are called distractors.
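
A minimal sketch of scoring in these selected-response formats: each response is compared to an answer key, and a point is awarded only for the keyed alternative (the key and responses below are hypothetical):

```python
# Scoring a multiple-choice (polychotomous) test against an answer key.
# A minimal sketch; the key and one respondent's answers are hypothetical.
key = ["B", "D", "A", "C", "B"]
answers = ["B", "D", "C", "C", "A"]

score = sum(1 for given, correct in zip(answers, key) if given == correct)
print(f"Score = {score} / {len(key)}")  # 3 / 5
```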
3. The Likert Format
• This format requires that a respondent indicate the degree of agreement
with a particular attitudinal statement.
• It is very popular with personality and attitude scales.
• This scale is non-comparative and measures only a single trait.
• The respondent is asked to indicate their level of agreement with a given
statement by way of an ordinal scale.
• It is often expressed as a five-point scale ranging from strongly agree,
agree, neutral, disagree, to strongly disagree; four- and six-point versions
are also used. The more points the scale has, the less likely the respondent
is to be neutral.
4. The Category Format
• It is similar to the Likert scale but uses an even greater number of choices
than the Likert scale.
• Although it may seem similar to the Likert format, the category scale
uses a defined point rating system.
• Test takers are required to rate a given item or scenario on a scale within a
category range. For example, one may use a scale of 1 to 5 or 1 to 10,
where 1 is the lowest score and 5 or 10 the highest, respectively.
The numbers assigned when using the rating scale
are sometimes influenced by the context in which the items are rated.
5. Essay Type Items
• The essential characteristics of the task set by an essay test are that
each respondent:
1. Organises his own answers, with a minimum of constraint.
2. Uses his own words (usually his own handwriting).
3. Answers a small number of questions.
4. Produces answers having all degrees of completeness and
accuracy.
3. PRELIMINARY ADMINISTRATION OF TEST
• This is the phase after development of test items.
Steps in Preliminary Administration of Test
• Before proceeding toward administration of the test, the test
must be reviewed by at least three (3) experts.
• When the test has been written down and modified in the
light of the suggestions and criticisms given by the experts, the
test is said to be ready for experimental try-out.
1. The Experimental Try-Out/ Pre-Try-Out:
- The first administration of the test is called EXPERIMENTAL TRY-
OUT or PRE-TRY-OUT.
- The sample size for experimental try out should be 100.
- According to Conrad (1951), the main purpose of the
experimental tryout of any psychological and educational test is
as follows:
a. Determines vagueness and weaknesses - This has to do with
finding out the major weaknesses, omissions, ambiguities and
inadequacies of the items.
…Steps in Preliminary Administration of Test
b. Determines difficulty level of each item - Experimental try-out
helps in determining the difficulty level of each item, which in
turn helps in their proper distribution in the final form.
c. Determines time limit - It helps in determining a reasonable
time limit for the test.
d. Determines appropriate length of a test - This determines the
appropriate length of the tests i.e., it helps in determining the
number of items to be included in the final form.
e. Identifying weaknesses in directions - This identifies any
weaknesses and vagueness in directions or instructions of the
test.
2. Proper Try-Out:
- The second preliminary administration is called the PROPER
TRY-OUT.
- At this stage the test is administered to a sample of 400
respondents, who must be similar to those for whom the test is intended.
…Steps in Preliminary Administration of Test
- The proper try-out is carried out for the item analysis.
- ITEM ANALYSIS is the technique of selecting discriminating
items for the final composition of the test. It aims at obtaining
three kinds of information regarding the items (illustrated in the
sketch after this list):
i. ITEM DIFFICULTY: Item difficulty is the proportion or
percentage of the respondents or individuals who answer the
item correctly.
ii. DISCRIMINATORY POWER OF THE ITEMS: The discriminatory
power of the items refers to the extent to which any given item
discriminates successfully between those who possess the trait
in larger amounts and those who possess the same trait in the
least amount.
iii. EFFECTIVENESS OF DISTRACTORS: This determines the non-
functional distractors. Non-functioning distractors (NFDs) are
options in a test that are selected by less than 5% of the
respondents. These NFDs may have no connection to the item, or
may contain clues that are not directly related to the correct
answer.
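
A minimal sketch of all three kinds of information computed on one hypothetical multiple-choice item; the upper-lower 27% method for discrimination and the sample data are common-practice illustrations, not prescriptions from the text:

```python
# Item analysis for one hypothetical multiple-choice item: difficulty,
# discrimination (upper-lower groups method), and distractor functioning.
# All data are illustrative; "A" is the keyed (correct) option.
respondents = [  # (total test score, option chosen on this item)
    (95, "A"), (90, "A"), (88, "A"), (80, "B"), (75, "A"),
    (70, "A"), (65, "C"), (60, "B"), (50, "B"), (45, "C"),
]
KEY = "A"
n = len(respondents)

# i. Item difficulty: proportion answering the item correctly.
p = sum(1 for _, opt in respondents if opt == KEY) / n
print(f"Difficulty p = {p:.2f}")  # 0.50

# ii. Discrimination: proportion correct in the top ~27% of total scorers
#     minus the proportion correct in the bottom ~27%.
ranked = sorted(respondents, key=lambda r: r[0], reverse=True)
g = max(1, round(0.27 * n))
p_upper = sum(1 for _, opt in ranked[:g] if opt == KEY) / g
p_lower = sum(1 for _, opt in ranked[-g:] if opt == KEY) / g
print(f"Discrimination D = {p_upper - p_lower:.2f}")  # 1.00 here

# iii. Non-functioning distractors: incorrect options chosen by < 5%.
for option in "BCD":
    share = sum(1 for _, opt in respondents if opt == option) / n
    if share < 0.05:
        print(f"Option {option} is a non-functioning distractor ({share:.0%})")
```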
…Steps in Preliminary Administration of Test
3. Final Try-Out:
- The third preliminary administration is called Final Try-Out.
- The sample for final administration should be at least 100.
- At this stage the items selected after item analysis
constitute the test in its final form.
- It is carried out to detect minor defects that may
not have been discovered by the first two preliminary
administrations.
- The final administration indicates how effective the test
will be when it is administered to the sample for
which it is really intended. Thus, the preliminary
administration is a kind of ‘DRESS REHEARSAL’
providing a final check on the procedure of
administration of the test and its time limit.
- After final tryout, expert opinion should be considered
again.
4. VALIDITY OF FINAL TEST
• Validity refers to the extent to which an assessment
measures what it is intended to measure
• The validity of the final test is concerned with how correctly
the items of the test, as conducted, measure what they are
supposed to measure - which makes the results of the test
valid.

Validity has a number of sub-categories, all of which need to be
met for a test to be considered a legitimate psychometric
measuring tool.
Types of Validity
• Internal validity refers to whether the effects observed in a
study are due to the manipulation of the independent
variable and not some other factor. In other words, there is
a causal relationship between the independent and
dependent variable. Internal validity can be improved by
controlling extraneous variables, using standardised
instructions, counterbalancing, and eliminating demand
characteristics and investigator effects.
• External validity refers to the extent to which the results of
a study can be generalised to other settings (ecological
validity), other people (population validity) and over time
(historical validity). External validity can be improved by
setting experiments in a more natural setting and using
random sampling to select participants.
Face Validity
• Face validity is simply whether the test appears to measure
what it claims to.
• A direct measurement of face validity is obtained by asking
people to rate the validity of a test as it appears to them.
This rater could use a Likert scale to assess face validity. For
example:
1. the test is extremely suitable for a given purpose
2. the test is very suitable for that purpose
3. the test is adequate
4. the test is inadequate
5. the test is irrelevant and therefore unsuitable
Construct Validity
• This type of validity refers to the extent to which a test captures
a specific theoretical construct or trait, and it overlaps with some
of the other aspects of validity.
• To test for construct validity it must be demonstrated that the
phenomenon being measured actually exists. So, the construct
validity of a test for intelligence, for example, is dependent on a
model or theory of intelligence. Construct validity entails
demonstrating the power of such a construct to explain a
network of research findings and to predict further relationships.
Concurrent validity
• This is the degree to which a test corresponds to an external
criterion that is known concurrently (i.e. occurring at the same
time). If the new test is validated by a comparison with a
currently existing criterion, we have concurrent validity. Very
often, a new IQ or personality test might be compared with an
older but similar test known to have good validity already.
Predictive Validity
• This is the degree to which a test accurately predicts a criterion
that will occur in the future. For example, a prediction may be
made on the basis of a new intelligence test that high scorers at
age 12 will be more likely to obtain university degrees several
years later. If the prediction is borne out, then the test has
predictive validity.
5. RELIABILITY OF THE FINAL TEST
• Reliability is the degree to which an assessment tool produces stable and
consistent results.
• Reliability refers to how dependably or consistently a test measures a
characteristic. If a person takes the test again, will he or she get a similar
test score, or a much different score? A test that yields similar scores for
a person who repeats the test is said to measure a characteristic reliably.
• Reliability in psychometrics is the overall consistency of a measure.
Key terms
• A measure: is said to have a high reliability if it produces similar results
under consistent conditions.
• Consistency: The quality of always being the same, doing things in the
same way, having the same standards.
• A Test Score: is a piece of information, usually a number, that conveys the
performance of a respondent on a test. One formal definition is that it is
‘a summary of the evidence contained in a respondent’s responses to the
items of a test that are related to the construct or constructs being
measured.’
Types of Reliability
1. Test-retest reliability is a measure of reliability obtained by
administering the same test twice over a period of time to a
group of individuals. The scores from Time 1 and Time 2 can
then be correlated in order to evaluate the test for stability over
time.
• Example: A test designed to assess student learning in
psychology could be given to a group of students twice, with
the second administration perhaps coming a week after the
first. The obtained correlation coefficient would indicate the
stability of the scores.
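
A minimal sketch of the correlation step in that example, with hypothetical Time 1 and Time 2 scores; the Pearson r obtained is the stability coefficient:

```python
# Test-retest reliability: correlate Time 1 and Time 2 scores for the same
# group. A minimal sketch with hypothetical scores for six students.
time1 = [78, 85, 62, 90, 70, 74]
time2 = [80, 83, 65, 92, 68, 75]

n = len(time1)
mean1, mean2 = sum(time1) / n, sum(time2) / n

cov = sum((x - mean1) * (y - mean2) for x, y in zip(time1, time2))
var1 = sum((x - mean1) ** 2 for x in time1)
var2 = sum((y - mean2) ** 2 for y in time2)

r = cov / (var1 * var2) ** 0.5  # Pearson correlation coefficient
print(f"Test-retest r = {r:.3f}")
```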
2. Parallel forms reliability is a measure of reliability obtained by
administering different versions of an assessment tool (both
versions must contain items that probe the same construct,
skill, knowledge base, etc.) to the same group of
individuals. The scores from the two versions can then be
correlated in order to evaluate the consistency of results across
alternate versions.
…Types of Reliability
3. Inter-rater reliability is a measure of reliability used to assess the
degree to which different judges or raters agree in their assessment
decisions.
4. Internal consistency reliability is a measure of reliability used to
evaluate the degree to which different test items that probe the
same construct produce similar results.
i. Average inter-item correlation is a subtype of internal consistency
reliability. It is obtained by taking all of the items on a test that probe the
same construct (e.g., reading comprehension), determining the correlation
coefficient for each pair of items, and finally taking the average of all of
these correlation coefficients. This final step yields the average inter-item
correlation.
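
A minimal sketch of that procedure on a hypothetical matrix of three items (columns) answered by five respondents (rows):

```python
# Average inter-item correlation: correlate every pair of items probing the
# same construct, then average the coefficients. Data are hypothetical.
from itertools import combinations

data = [  # rows = respondents, columns = items
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 2, 3],
    [1, 2, 2],
]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

items = list(zip(*data))  # one tuple of scores per item
rs = [pearson(items[i], items[j])
      for i, j in combinations(range(len(items)), 2)]
print(f"Average inter-item correlation = {sum(rs) / len(rs):.3f}")
```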
ii. Split-half reliability is another subtype of internal consistency reliability. The
process of obtaining split-half reliability begins by ‘splitting in half’ all items
of a test that are intended to probe the same area of knowledge (e.g., World
War II) in order to form two ‘sets’ of items. The entire test is administered to
a group of individuals, the total score for each ‘set’ is computed, and finally
the split-half reliability is obtained by determining the correlation between
the two total ‘set’ scores.
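
A minimal sketch of this procedure, splitting hypothetical 0/1 item scores into odd and even halves and correlating the half totals. The Spearman-Brown step-up at the end, which corrects for the halved test length, is standard practice but an addition beyond the text above:

```python
# Split-half reliability: split items into two halves (here odd/even),
# total each half, and correlate the totals. Data are hypothetical 0/1
# scores for six respondents on a six-item test.
data = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

half_a = [sum(row[0::2]) for row in data]  # items 1, 3, 5
half_b = [sum(row[1::2]) for row in data]  # items 2, 4, 6

n = len(half_a)
ma, mb = sum(half_a) / n, sum(half_b) / n
cov = sum((a - ma) * (b - mb) for a, b in zip(half_a, half_b))
va = sum((a - ma) ** 2 for a in half_a)
vb = sum((b - mb) ** 2 for b in half_b)
r_half = cov / (va * vb) ** 0.5

r_full = (2 * r_half) / (1 + r_half)  # Spearman-Brown correction
print(f"Half-test r = {r_half:.3f}, corrected full-test r = {r_full:.3f}")
```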
6. ESTABLISHMENT OF NORMS
• When psychologists design a test to be used in a variety of settings,
they usually set up a scale for comparison by establishing norms.
• Norm is defined as the average performance or scores of a large
sample representative of a specified population.
• Norms are prepared so that the scores obtained on the test can be
meaningfully interpreted, for, as we know, the obtained scores
themselves convey no meaning regarding the ability or trait being
measured. However, when they are compared with the norms, a
meaningful inference can immediately be drawn.
Types of norms:
– Age norms
– Grade norms
– Percentile norms
– Standard scores norms
• Not all of these types of norms are suited to every type of test. Keeping
in view the purpose and type of the test, the test constructor develops
a suitable norm for the test.
Age Norms
• Age norms indicate the average performance of different samples of test
takers who were at various ages at the time the test was administered.
• If the measurement under consideration is height in inches, for example, we
know that scores (heights) for children will gradually increase at various
rates as a function of age up to the middle-to-late teens.
• The child of any chronological age whose performance on a valid test of
intellectual ability indicated that he or she had intellectual ability similar to
that of the average child of some other age was said to have the mental age
of the norm group in which his or her test score fell.
• The reasoning here was that irrespective of chronological age, children with
the same mental age could be expected to read the same level of material,
solve the same kinds of math problems, and reason with a similar level of
judgment.
• But some have complained that the concept of mental age is too broad and
that although a 6-year-old might, for example, perform intellectually like a
12-year-old, the 6-year-old might not be very similar at all to the average
12-year-old socially, psychologically and otherwise.
Grade Norm
• Grade norms are designed to indicate the average test performance
of test takers in a given school grade. Grade norms are developed by
administering the test to representative samples of children over a
range of consecutive grade levels.
• Like age norms, grade norms have widespread application with
children of elementary school age; the thought here is that children
learn and develop at varying rates, but in ways that are in some
respects predictable.
• One drawback in grade norms is that they are useful only with
respect to years and months of schooling completed. They have little
or no applicability to children who are not yet in school or who are
out of school.
Percentile Norm
• A percentile is an expression of the percentage of people whose
score on a test or measure falls below a particular raw score. For
example, the 20th percentile is the value (or score) below which 20%
of the observations may be found.
• A percentile is a converted score that refers to a percentage of test
takers.
• Percentage correct, by contrast, refers to the distribution of raw scores:
more specifically, to the number of items that were answered correctly,
multiplied by 100 and divided by the total number of items.
• Because percentiles are easily calculated, they are a popular way of
organising test data and are very adaptable to a wide range of tests.
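
A minimal sketch contrasting the two ideas, with a hypothetical norm group: the percentile rank depends on the comparison group, while percentage correct depends only on the test itself:

```python
# Percentile rank of a raw score: the percentage of scores in the norm
# group that fall below it. The norm-group scores are hypothetical.
norm_group = [12, 15, 18, 18, 20, 22, 25, 27, 30, 33]

def percentile_rank(score, group):
    below = sum(1 for s in group if s < score)
    return 100 * below / len(group)

print(f"Percentile rank of 25 = {percentile_rank(25, norm_group):.0f}")  # 60

# Percentage correct, by contrast, uses only the test itself:
items_correct, total_items = 25, 40
print(f"Percentage correct = {items_correct * 100 / total_items}%")  # 62.5%
```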
Standard Score Norms
• When a raw score is converted by a formula into units of the score
distribution (for example, standard-deviation units), it becomes a standard
score.
• For example, marks obtained in a paper out of 100 are applicable
only in a specific area, but when they are converted to a GPA they
become a standard score.
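
A minimal sketch of the usual conversion: the raw score’s distance from the group mean, expressed in standard-deviation units (a z-score), optionally rescaled to another standard-score scale. The group statistics and the mean-100 / SD-15 rescaling are illustrative assumptions:

```python
# Converting a raw score to standard scores. The group statistics and the
# mean-100 / SD-15 rescaling below are illustrative assumptions.
raw = 68.0
group_mean = 60.0
group_sd = 8.0

z = (raw - group_mean) / group_sd  # distance from the mean in SD units
scaled = 100 + 15 * z              # rescaled standard score
print(f"z = {z:.2f}, scaled score = {scaled:.0f}")  # z = 1.00, scaled = 115
```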
Steps To Generate a Norm
Test norms are generated during the process of test construction
and test standardisation.
1. Identify the population of interest.
2. Identify the most critical statistics that will be computed for
the sample data (e.g., mean, standard deviation, percentile
ranks).
3. Decide on the tolerable amount of sampling error (the
discrepancy between the sample estimate and the population
parameter) for one of the statistics in step 2. (Frequently the
sampling error of the mean is specified.)
4. Devise a procedure for drawing a sample from the population
of interest.
5. Estimate the minimum sample size required to hold the
sampling error within the specified limits (a minimal sketch
follows this list).
…Steps To Generate a Norm
6. Draw the sample and collect the data. Document the
reasons for any attrition which may occur. If substantial
attrition occurs (e.g., failure of an entire school to
participate after it has been selected into the sample), it
may be necessary to replace this unit with another chosen
by the same sampling procedure.
7. Compute the values of the group statistics of interest and
their standard errors.
8. Identify the types of normative scores that will be needed
and prepare the normative conversion tables.
9. Prepare written documentation of the norming
procedure and guidelines for interpretation of the
normative scores.
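
A minimal sketch of step 5, assuming the statistic being controlled is the sampling error of the mean, for which a usual estimate is n = (z * SD / E)^2; all values are hypothetical:

```python
# Estimating the minimum sample size needed to hold the sampling error of
# the mean within a limit E at a given confidence level. Values are
# hypothetical; the formula assumes simple random sampling.
import math

sd_estimate = 15.0  # anticipated standard deviation of the scores
tolerance_e = 1.0   # largest acceptable error of the mean (score points)
z = 1.96            # z-value for 95% confidence

n = math.ceil((z * sd_estimate / tolerance_e) ** 2)
print(f"Minimum sample size n = {n}")  # (1.96 * 15 / 1)^2 = 864.36 -> 865
```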
Checking for a Good Normative Sample
• A normative sample is a set of participants selected by a test
developer to create a reference group against which other test
takers’ scores will be compared and interpreted. This group is
called the norm reference group.
• How large is the normative sample?
• When was the sample gathered?
• Where was the sample gathered?
• How were individuals identified and selected for the sample?
• Who tested the sample? (professional or student).
• How did the examiner or examiners qualify to do the testing?
• What was the composition of the normative sample (age, sex,
ethnicity, race, linguistic background, education, geographic
distribution, any other relevant variable)?
Moving From Raw Data To Normed Data
Raw data, norms, and normed data are three different, albeit
related things:
• Raw datum/data - the outcome or numerical result of
some measurement - sometimes referred to as a raw score.
• Norm - a collection of raw scores which are processed and
grouped together to serve as a comparison group.
• Normed data - the result of mathematically comparing raw
data against a norm.
7. PREPARATION OF TEST MANUAL
• The last step in test construction is the preparation of a manual of the
test.
• In the manual the test constructor reports the psychometric
properties of the
– test,
– norms and
– references.
• This gives a clear indication regarding the procedures of the
– test administration,
– scoring methods and
– time limits (if any) of the test.
• It also includes instructions as well as the details of
arrangement of materials i.e., whether items have been
arranged in random order or in any other order.
• The test constructor finally orders the printing of the test and the
manual.
…PREPARATION OF TEST MANUAL
The following criteria are important in test manual development:
• The manual should be drafted well ahead of time; whenever it is done in
haste it is usually filled with errors.
• The test manual should be reviewed by colleagues and experts to
ascertain its quality.
• Copies should not be produced in the exact number required; excess
copies should be available to meet any need.
• The test manual should be clearly worded, with no ambiguous presentation,
and should have only one meaning.
• The test manual should not contain too many words; rather, it should be
concise, precise and straight to the point.
• The test manual should be built around what the test taker should know or
be able to make meaning of. It should be within the limits of the scheme of
work.
• Appropriate stimuli should be chosen. For instance, a primary-school child
will understand illustrations and diagrams better than verbal materials, and
the choice of whether items will be objective or essay will be an important
consideration.
• The test manual should not be worded in such a manner that the response
could easily be deduced from the item itself. It should rather be intellectually
challenging to the test taker.
TEST ADMINISTRATION
• Test administration refers to the procedures developed for an exam
/ assessment programme in order to help reduce
measurement error and to increase the likelihood of fair,
valid, and reliable assessment. Specifically, appropriate
standardised procedures improve measurement by
increasing consistency and test security.
• Test administration can also refer to the process of
testing, which involves administering, scoring and
interpreting psychological test scores / performance
under a controlled and conducive test environment.
General Guidelines for Administrators to Follow
1. Provide ample time for test / exam.
2. Allow sufficient practice on sample items.
3. Use short testing periods if possible.
4. Make arrangements for deficits in visual, auditory, and
other sensory-motor systems.
5. Be aware of fatigue and test anxiety, and take them
into account when interpreting scores.
6. Use encouragement and positive reinforcement
whenever possible.
7. Do not force respondents / examinees to respond
when they repeatedly decline to do so.
Administrator’s Post-test Duties
• Collecting all exam material and ensuring that:
i. All tests have been handed in.
ii. Respondents know when grades or test papers can be collected
or will be posted.
iii. The test room is returned to its pre-test set-up.
Scoring The Exam / Test
• The administrator may be responsible for scoring the exams him / herself,
or may mail them to a scoring service or bring them to a computer grading
service.
• Score all answers of a specific essay at one time.
• Score all answers to a specific essay within one test scoring period.
• If both writing quality and essay content are to be graded, they should
be assigned separate grades before being combined.
• Have two readers score each essay, and let the final grade be the
average of the two scores given to a particular essay.
• Write comments next to the examinee’s responses, and correct errors
on their papers.
TESTING IN CLINICAL AND COUNSELLING SETTINGS
Clinicians
• The professionals who identify themselves as clinicians work in
such settings as hospitals, community clinics, mental health
centers, counseling centers and private practice.
• The clinicians most likely to make use of psychological tests are
psychologists, psychiatrists, and clinical or psychiatric social
workers.
Uses of Tests in Clinical Settings
Tests can uncover problems that a mental health professional may
not detect until much later. This allows the clinician to focus on the
appropriate treatment more quickly, thereby saving time and money
for the patient or client. Once a course of treatment has begun, tests
can help the clinician monitor the effectiveness of the treatment as it
proceeds.
Psychological testing and assessment
Both are methods for collecting information, but assessment is
much broader than psychological testing.
• Psychological tests are only one tool in the assessment process
• Psychological testing has to do with the instrumentation part of
assessment - using psychometric tools for information gathering
• Tests are important because they provide a more standardised
procedure for gathering and interpreting relevant data, compared to
techniques like observation and interview
• Psychological assessment is a powerful tool, but its effectiveness
depends upon the skill and knowledge of the person interpreting
the test. When used wisely and in a cautious manner, psychological
assessment can help a person learn more about themselves and
gain valuable insights.
• Good psychologists know this and will take great care in writing up
a psychological assessment report, communicating in careful and
cautious language.
Frequently used tests by clinical psychologists
• Wechsler Adult Intelligence Scale
• Minnesota Multiphasic Personality Inventory
• Bender Visual Motor Gestalt Test
• Rorschach Inkblot Test
• Thematic Apperception Test
• Wechsler Intelligence Scale for Children—Revised
• Peabody Picture Vocabulary Test
• Sentence Completion Tests (all kinds)
• House-Tree-Person Test
• Draw-a-Person Test
Primary types of psychological testing
The Clinical Interview
• Structured: predetermined set of questions. May be scored and leads to
diagnosis.
• Nondirective clinical interview: Nondirective interview is an interview in
which questions are not prearranged. Unstructured or nondirective
interviews generally have no set format.
• Semi-structured: some predetermined questions, some open-ended, plus
follow-up questions.
Intellectual functioning
• Electroencephalogram (EEG): a test or record of brain activity.
Electroencephalography is a neurological test that uses an electronic
monitoring device to measure and record electrical activity in the brain.
• Event-related potential (ERP): Event-related brain potentials (ERPs) are a
non-invasive method of measuring brain activity during cognitive
processing
• Imaging techniques, e.g., PET, MRI: Imaging is the technique and process
of creating visual representations of the interior of a body for clinical
analysis and medical intervention, as well as visual representations of the
function of some organs or tissues (physiology).
…Primary types of psychological testing
Objective personality testing:
• Objective tests are psychological tests that measure an
individual’s characteristics in a way that is independent of rater
bias or the individual’s own beliefs
• An objective test is a type of paper-and-pencil assessment, often in
multiple-choice or true/false format.
Projective Test:
• A projective test, in psychology, is a personality test designed to
allow a person to respond to ambiguous stimuli, presumably
revealing hidden emotions and internal conflicts. This is different
from an "objective test" in which responses are analysed
according to a universal standard (for example, a multiple choice
exam) rather than an individual’s judgment.
• The best known projective test is the Rorschach inkblot test in
which a patient is shown an irregular spot of ink and asked to
explain what they see.
Counsellors and what they do
• A counselling psychologist typically has a master’s or doctoral degree in
counselling psychology, which may be offered either through a psychology or
an education department. Other individuals who may use the term counselling
include social workers who have specialised in psychiatric social work, people
with master’s degrees in education who specialised in either school or non-
school counselling, and people with bachelor’s degrees who have received
additional training in substance abuse, marriage, or family counselling.
What they do
• Counsellors generally put their focus on what’s happening to you in the
present. This could be difficulties at work/home, one specific traumatic event
such as a bad break up or losing your job, or even just feeling more stressed
than usual.
• A counsellor will look at your immediate presenting symptoms and behaviour
(e.g. feeling more anxious than usual) and how that’s impacting your life,
rather than delving deeper into your childhood or past.
• Looking at these symptoms, the counsellor will focus on equipping you with
workable, short-term tools that can help you break out of negative thoughts
and habits.
Tests used by counselling psychologists
• Due to their focus on vocational and personal problems,
counseling psychologists tend to use psychological tests
that assess people’s abilities, personalities, and interests
• Tests commonly used are;
• Strong-Campbell Interest Inventory
• Wechsler Adult Intelligence Scale
• Nelson-Denny Reading Test
• Sixteen Personality Factor Questionnaire
What makes a good test?
An employment test is considered ‘good’ if the following can be said about
it:
• The test measures what it claims to measure consistently or reliably.
This means that if a person were to take the test again, the person
would get a similar test score.
• The test measures what it claims to measure. For example, a test of
mental ability does in fact measure mental ability, and not some other
characteristic.
• The test is job-relevant. In other words, the test measures one or more
characteristics that are important to the job.
• By using the test, more effective employment decisions can be made
about individuals. For example, an arithmetic test may help you to
select qualified workers for a job that requires knowledge of arithmetic
operations.
• The degree to which a test has these qualities is indicated by two
technical properties: reliability and validity
RESPONSE BIAS, INTELLIGENCE & THEORIES OF
INTELLIGENCE
RESPONSE BIAS
• Response bias (also called survey bias) refers to the tendency
of a person to answer questions on a survey untruthfully or
misleadingly. It occurs when people answer test items in ways
that do not align with their true attitudes, beliefs, thoughts,
or behaviors.
• Response bias can reduce the reliability and validity of a
measurement, since questions are not being answered
truthfully. This can also make psychological tests and
assessments virtually useless, since any conclusions drawn
from test responses would be inaccurate.
Types of Response Bias
1. Social Desirability Bias: is the most common response bias. It refers to
people’s tendency to respond in ways that are more acceptable to others,
regardless of the truth. For example, respondents may be asked ‘How
often have you yelled at your partner?’ People may respond to this
question by under-reporting how often they actually yell in order to
make themselves appear more favorable. Another item asks, ‘How often
do you hug your partner?’ People may respond to this question by over-
reporting how often they hug, in an attempt to make themselves look
good.
2. Acquiescence Bias: refers to people’s tendency to agree with statements
regardless of what they mean. It can occur in any item in which
respondents are asked to confirm a statement, and is especially a problem
when respondents are presented with items requiring them to either agree
or disagree with a statement. For example, respondents may be asked to
agree or disagree with the following statement: ‘You should never go to
bed angry at your partner.’ People may answer this question by stating
that they agree, but not because they truly agree with the statement;
they are merely being agreeable.
…Types of Response Bias
3. Extreme Responding Bias: as is quite clear from the name itself,
extreme responding refers to giving extreme answers to questions -
extremely negative or extremely positive. E.g., respondents asked to
rate or give feedback on a scale of 1-5 may choose either 5 (the
highest) or 1 (the lowest).
4. Midpoint or Moderate Responding Bias: is the tendency for people to
use the middle rating on a scale regardless of the content. Some of the
reasons people choose the midpoint include not wanting to answer
truthfully for some reason, not being sure about their true opinion,
and having little or no interest in the issue. For example, suppose five
items require respondents to select a rating from one (never) to seven
(always). A person with extreme responding bias would select one or
seven for most of the statements, while a person with moderate
responding bias would select four (the midpoint).
Causes of Response Bias
Some conditions or factors that arise during the process of responding
to surveys affect the way responses are provided, as follows:
• Unfamiliar content: the person may not have the background knowledge
to fully understand the question.
• Response bias is influenced by the participants being part of a study.
In an experimental setting, they tend to adopt behaviour they think is
correct instead of being themselves, thereby leading to biased responses.
• Another reason is the setting where the experiment is conducted.
• Fatigue: giving a survey when a person is tired or ill may affect their
responses.
• Faulty recall: asking a person about an event that happened in the
distant past may result in erroneous responses
TEST BIAS & TESTING SPECIAL POPULATIONS
• Test bias occurs when a test causes participants of similar abilities to
perform differently because of their socio-economic status, culture,
religion, gender, etc.
Types of Test Bias:
1. Cultural
2. Socioeconomic
3. Gender
4. Item
5. Construct
6. Sampling
7. Language
8. Examiner bias
• Individual and group tests are suitable for persons with normal or
near-normal capacities in speech, hearing, vision, movement, and
general intellectual ability. However, not every respondent (examinee)
falls within the ordinary spectrum of physical and mental abilities.
• Some people who have physical / mental challenges might need
special accommodations during testing to ensure that their test scores
are accurate. These special populations are categorised into three groups
as follows:
1. Sensory Impairments – includes blindness and deafness.
2. Motor Impairments – includes disabilities such as paralysis
and missing limbs.
3. Cognitive Impairments – includes mental retardation,
learning disabilities, and traumatic brain injuries.
TESTING IN BUSINESS & INDUSTRY
Testing in business and industry is the administration of psychological tests
which are designed and standardised to measure behavioural constructs
in industries or organisations.

Purpose of Psychological Tests in Business and Industry

• The purpose of psychological testing in business and industry is to
make more informed hiring decisions; it is used to determine the
ability of potential employees to work under stressful
conditions and how to handle the job effectively under those
conditions.

Uses of tests in business and industry


• Selection of new employees
• Evaluation of current employees
• Evaluation of programmes and products
Thank YOU
