
Split-Half Method



This method treats the two halves of a measure as alternate forms.


It involves:
- Administering a test to a group of individuals
- Splitting the test in half
- Correlating scores on one half of the test with scores on the other half of the test
The correlation between these two split halves is used in estimating the reliability of the test. This half-test reliability estimate is then stepped up to the full test length using the Spearman-Brown prediction formula: the predicted full-test reliability is 2r / (1 + r), where r is the correlation between the two halves.
There are several ways of splitting a test to estimate reliability. For example, a 40-item
vocabulary test could be split into two subtests, the first one made up of items 1 through 20 and
the second made up of items 21 through 40. However, responses on the first half may be systematically different from responses on the second half due to increasing item difficulty and respondent fatigue.
In splitting a test, the two halves would need to be as similar as possible, both in terms of their
content and in terms of the probable state of the respondent. The simplest method is to adopt an
odd-even split, in which the odd-numbered items form one half of the test and the even-
numbered items form the other. This arrangement guarantees that each half will contain an equal
number of items from the beginning, middle, and end of the original test.
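
To make the procedure concrete, here is a minimal Python sketch of the odd-even split followed by the Spearman-Brown step-up. The response matrix is simulated, and every number in it is a hypothetical illustration, not data from any real test.

    import numpy as np

    def split_half_reliability(items):
        # items: one row per respondent, one column per item
        # (1 = correct, 0 = incorrect, or any numeric item score).
        items = np.asarray(items, dtype=float)
        odd_half = items[:, 0::2].sum(axis=1)   # items 1, 3, 5, ...
        even_half = items[:, 1::2].sum(axis=1)  # items 2, 4, 6, ...
        r_half = np.corrcoef(odd_half, even_half)[0, 1]
        # Spearman-Brown prediction formula for doubling the test length:
        # r_full = 2 * r_half / (1 + r_half)
        return 2 * r_half / (1 + r_half)

    # Hypothetical responses: 50 respondents x 40 items
    rng = np.random.default_rng(0)
    ability = rng.normal(size=(50, 1))
    responses = (ability + rng.normal(size=(50, 40)) > 0).astype(int)
    print(f"split-half reliability: {split_half_reliability(responses):.2f}")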

Types of Validity
The concept of validity applies to both whole studies (often called inference validity) and the measurement of individual
variables (often called construct validity).

Inference Validity

Inference validity refers to the validity of a research design as a whole. It refers to whether you can trust the conclusions of
a study.

1. Internal validity (largely about interpretability)

Refers to whether claimed conclusions, especially relating to causality, are consistent with research results (e.g., statistical
results) and research design (e.g., presence of appropriate control variables, use of appropriate methodology).

An obvious example of failing internal validity is when a researcher misinterprets a statistical result.
Internal validity can sometimes be checked via simulation, which can tell you whether a given theorized process could in
fact yield the outcomes that you claim it does.
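
As a hedged sketch of that idea: encode the theorized process as a data-generating model, run it many times, and check whether the claimed pattern actually appears. The effect size, sample size, and noise level below are all hypothetical assumptions chosen purely for illustration.

    import numpy as np

    # Hypothetical theorized process: network centrality raises job
    # performance. Simulate it repeatedly and check whether the claimed
    # positive correlation shows up under the assumed design (n = 150).
    rng = np.random.default_rng(1)

    def simulate_once(n=150, effect=0.5):
        centrality = rng.normal(size=n)
        performance = effect * centrality + rng.normal(size=n)
        return np.corrcoef(centrality, performance)[0, 1]

    corrs = np.array([simulate_once() for _ in range(1000)])
    print("share of runs with a positive correlation:", np.mean(corrs > 0))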

2. External validity (generalizability)

This refers to the generalizability of results. Does the study say anything outside of the particular case? For example, in
your study of 150 workers in a consulting company's IT department, you find that the more central they are in the
friendship network, the better they do their jobs. To what extent can you say this is true of other workers?

A carpenter, a school teacher, and a scientist were traveling by train through Scotland when they saw a black sheep through
the window of the train. "Aha," said the carpenter with a smile, "I see that Scottish sheep are black." "Hmm," said the
school teacher, "You mean that some Scottish sheep are black." "No," said the scientist, "All we know is that there is at
least one sheep in Scotland, and that at least one side of that one sheep is black."

Three strategies for strengthening external validity:


- Sampling. Select cases from a known population via a probability sample (e.g., a simple random sample). This provides a strong basis for claiming the results apply to the population as a whole.
- Representativeness. Show the similarities between the cases you studied and a population you wish your results to be applied to, and argue that the correlations you found in your study will also hold in the other setting.
- Replication. Repeat the study in multiple settings and use meta-analytic statistics to evaluate the results across studies (see the sketch below). Although journal reviewers don't always agree, consistent results across many settings with small samples are more powerful evidence than a large sample from a single setting.
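
For the replication strategy, here is a minimal sketch of one standard way to combine results across studies: an inverse-variance (fixed-effect) meta-analytic average. The per-study effect estimates and standard errors below are hypothetical.

    import numpy as np

    # Hypothetical effect estimates and standard errors from five
    # small-sample replications of the same study.
    effects = np.array([0.31, 0.24, 0.40, 0.28, 0.35])
    std_errors = np.array([0.12, 0.15, 0.14, 0.11, 0.13])

    # Fixed-effect (inverse-variance) combination: each study is weighted
    # by 1 / SE^2, so more precise studies count for more.
    weights = 1.0 / std_errors**2
    pooled = np.sum(weights * effects) / np.sum(weights)
    pooled_se = np.sqrt(1.0 / np.sum(weights))
    print(f"pooled effect = {pooled:.3f}, standard error = {pooled_se:.3f}")

Consistent positive estimates with a precise pooled value are exactly the kind of cross-setting evidence the replication strategy relies on.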

Construct Validity

Construct validity refers to the validity of a variable that is being measured. There are many subtypes that have been
defined. One should not get too hung up on the exact terminology because there is a lot of variation in usage. The
breakdown below is Trochim's version.

1. Translation Validity
Subjective evaluation of whether a measure matches the construct it is meant to measure.

1.1 Face validity is often used to mean 'does it pass the test of common sense?' Does the measure mean the same thing as the concept? E.g., if you want to know whether someone is a liberal, asking "Are you a liberal?" has a lot of face validity. Asking "Do you have a precocious child?" has low face validity (but might predict well). 

1.2 Content validity. Do all of the elements of the measure seem connected in the right direction to the concept? E.g., in determining whether there is fire, asking: Is there smoke? Destruction? Heat? Ash? Burnt stuff? 

2. Criterion Validity

How well the measure relates to other measures and characteristics.

2.1 Predictive validity. Ability to predict future events. E.g., an attitudinal scale that measures risk of divorce actually predicts divorces, or intent-to-buy attitude scales that actually predict future purchases. 

2.2 Concurrent validity. Ability to discriminate between groups. For example, if you are testing math, engineers should do
better on the test than poets. 
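
As a hedged illustration of a concurrent-validity check, the sketch below runs an independent-samples t-test (via SciPy) on hypothetical math-test scores for the two groups; the test should clearly separate them.

    import numpy as np
    from scipy import stats

    # Hypothetical math-test scores for two groups expected to differ.
    engineers = np.array([82, 78, 90, 85, 76, 88, 80, 84])
    poets = np.array([65, 70, 58, 72, 61, 68, 63, 66])

    # Concurrent-validity evidence: the measure discriminates between groups.
    t_stat, p_value = stats.ttest_ind(engineers, poets)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")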

2.3 Convergent validity. Does the measure correlate positively with other measures of the same construct, or measures of
very similar constructs? e.g., a new, easier-to-administer scale of organizational commitment should correlate strongly
with the old, longer scale that it is intended to replace.

2.4 Discriminant validity. The measure should correlate poorly with measures of different constructs. E.g., we don’t want
our emotional intelligence measure to correlate too well with self-monitoring. It should be measuring something different.
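
A minimal sketch of checking convergent and discriminant validity together, on simulated (hypothetical) scale scores: the new measure should correlate strongly with an established measure of the same construct and only weakly with a measure of a different construct.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 200

    # Hypothetical scores for the same respondents.
    trait = rng.normal(size=n)
    new_scale = trait + 0.5 * rng.normal(size=n)   # new, shorter scale
    old_scale = trait + 0.5 * rng.normal(size=n)   # established scale
    unrelated = rng.normal(size=n)                 # different construct

    convergent = np.corrcoef(new_scale, old_scale)[0, 1]    # should be high
    discriminant = np.corrcoef(new_scale, unrelated)[0, 1]  # should be low
    print(f"convergent r = {convergent:.2f}, discriminant r = {discriminant:.2f}")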

What makes a good psychological test? In this lesson, we'll discuss the test construction process
and essential steps in the item writing and item analysis phases of this process.
Psychological Testing
Randall is a researcher who wants to conduct a study on how motivation affects behavior. Before
he can begin collecting data for his study, Randall will need to develop a psychological test.
A psychological test is an objective and standardized measure of a sample of behavior.
Randall is an experienced researcher, so he knows that there are four main characteristics of a
good psychological test.
First, a good test must have objectivity, meaning that it's free from subjective elements
regarding the interpretation of items and the scoring of the test.
A good test must also have reliability, meaning that the test obtains consistent results when it's
administered.
Randall's test should also demonstrate validity, which indicates the extent to which the test
measures what it intends to measure.
Finally, Randall's test needs to be practicable, in that the test isn't too lengthy and the scoring
method isn't too difficult.
The Test Construction Process
Now it's time for Randall to begin developing his test. He knows that the development of a good
psychological test requires six essential steps:
1. Planning
2. Writing items for the test
3. Preliminary administration of the test
4. Checking the reliability of the final test
5. Checking the validity of the final test
6. Preparation of the test manual and reproduction of the test
The first step in this process—careful planning—is the most important. Randall will have to
make important decisions about test content, format, instructions, length, and scoring.
Item Writing
As Randall begins step two—item writing—he knows that a good test item must have the
following characteristics:
- Clarity in meaning
- Clarity in reading
- Not too easy or too difficult
- Doesn't encourage guesswork
- Gets to the point
To make sure that his test items meet these standards, Randall follows some general guidelines
for item writing:
- He avoids using non-functional words, or words that make no contribution towards the appropriate and correct choice of a response.
- He writes items that are adaptable to the level of understanding of different types of respondents.
- He avoids using stereotyped words.
- He avoids items that provide irrelevant clues, such as items that always place the correct answer in the same position.
- He avoids items that can be answered only by referring to other items.
Randall must also decide how many items his test will include. This can be a tricky decision,
since there's no hard and fast rule for this. He'll need to consider several different factors,
including the number of items that can be answered within a given time limit and the number of
items that will be needed for statistical analyses.
After weighing these factors, Randall decides that his final test should include 35 items.
However, Randall knows that many of his initial items will be dropped from the test after item
analysis. He anticipates that only about half of his initial items will be retained in the final test,
so he decides to write 70 items for his preliminary test.
After writing all of his items, Randall arranges them so that they are in an increasing order of
difficulty and so that items dealing with the same topic are placed together.
Item Analysis
Now it's time for Randall to begin a process known as item analysis, in which he'll examine
responses to his individual test items in order to assess the quality of those items and of the test
as a whole. This begins by administering his preliminary test to a pilot group. Next, he'll use
these results to check the reliability and validity of his test.
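
As a hedged sketch of what such an item analysis might compute, here are two standard per-item statistics on hypothetical pilot data: item difficulty (the proportion answering correctly) and a corrected item-total correlation as a discrimination index.

    import numpy as np

    def item_analysis(responses):
        # responses: 0/1 scores, one row per pilot respondent,
        # one column per preliminary item (all values hypothetical).
        responses = np.asarray(responses, dtype=float)
        difficulty = responses.mean(axis=0)  # proportion correct per item
        total = responses.sum(axis=1)
        # Corrected item-total correlation: correlate each item with the
        # total computed from the remaining items.
        discrimination = np.array([
            np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
            for j in range(responses.shape[1])
        ])
        return difficulty, discrimination

    rng = np.random.default_rng(3)
    ability = rng.normal(size=(50, 1))
    pilot = (ability + rng.normal(size=(50, 10)) > 0).astype(int)
    difficulty, discrimination = item_analysis(pilot)
    print("difficulty:", np.round(difficulty, 2))
    print("discrimination:", np.round(discrimination, 2))

Items that nearly everyone passes (or fails), or that correlate weakly with the rest of the test, are the usual candidates for removal.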
Recall that reliability means that a test obtains consistent results. There are a few different
methods that Randall can use to check reliability. Two of the most common are the test-retest
method and the alternate form method.
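
Both methods come down to correlating two sets of scores from the same people. A minimal sketch with hypothetical totals from two administrations (for the alternate form method, the second array would hold scores on a parallel form instead):

    import numpy as np

    # Hypothetical total scores from the same pilot group tested twice.
    time1 = np.array([31, 25, 28, 34, 22, 27, 30, 26])
    time2 = np.array([30, 24, 29, 33, 23, 25, 31, 27])

    # Test-retest reliability: correlation between the two administrations.
    print(f"test-retest reliability: {np.corrcoef(time1, time2)[0, 1]:.2f}")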

The four types of validity


Published on September 6, 2019 by Fiona Middleton. Revised on June 19, 2020.

In quantitative research, you have to consider the reliability and validity of your methods and measurements.

Validity tells you how accurately a method measures something. If a method measures what it
claims to measure, and the results closely correspond to real-world values, then it can be
considered valid. There are four main types of validity:

- Construct validity: Does the test measure the concept that it’s intended to measure?
- Content validity: Is the test fully representative of what it aims to measure?
- Face validity: Does the content of the test appear to be suitable to its aims?
- Criterion validity: Do the results correspond to a different test of the same thing?

Note that this article deals with types of test validity, which determine the accuracy of the actual
components of a measure. If you are doing experimental research, you also need to
consider internal and external validity, which deal with the experimental design and the
generalizability of results.

Table of contents

1. Construct validity
2. Content validity
3. Face validity
4. Criterion validity

Construct validity
Construct validity evaluates whether a measurement tool really represents the thing we are
interested in measuring. It’s central to establishing the overall validity of a method.

What is a construct?
A construct refers to a concept or characteristic that can’t be directly observed, but can be
measured by observing other indicators that are associated with it.

Constructs can be characteristics of individuals, such as intelligence, obesity, job satisfaction, or
depression; they can also be broader concepts applied to organizations or social groups, such as
gender equality, corporate social responsibility, or freedom of speech.

Example

There is no objective, observable entity called “depression” that we can measure directly. But
based on existing psychological research and theory, we can measure depression based on a
collection of symptoms and indicators, such as low self-confidence and low energy levels.

What is construct validity?


Construct validity is about ensuring that the method of measurement matches the construct you
want to measure. If you develop a questionnaire to diagnose depression, you need to know: does
the questionnaire really measure the construct of depression? Or is it actually measuring the
respondent’s mood, self-esteem, or some other construct?

To achieve construct validity, you have to ensure that your indicators and measurements are
carefully developed based on relevant existing knowledge. The questionnaire must include only
relevant questions that measure known indicators of depression.

The other types of validity described below can all be considered as forms of evidence for
construct validity.

Content validity
Content validity assesses whether a test is representative of all aspects of the construct.

To produce valid results, the content of a test, survey or measurement method must cover all
relevant parts of the subject it aims to measure. If some aspects are missing from the
measurement (or if irrelevant aspects are included), the validity is threatened.

Example

A mathematics teacher develops an end-of-semester algebra test for her class. The test should
cover every form of algebra that was taught in the class. If some types of algebra are left out,
then the results may not be an accurate indication of students’ understanding of the subject.
Similarly, if she includes questions that are not related to algebra, the results are no longer a
valid measure of algebra knowledge.

Face validity
Face validity considers how suitable the content of a test seems to be on the surface. It’s similar
to content validity, but face validity is a more informal and subjective assessment.

Example
You create a survey to measure the regularity of people’s dietary habits. You review the survey
items, which ask questions about every meal of the day and snacks eaten in between for every
day of the week. On its surface, the survey seems like a good representation of what you want to
test, so you consider it to have high face validity.
As face validity is a subjective measure, it’s often considered the weakest form of validity.
However, it can be useful in the initial stages of developing a method.

Criterion validity
Criterion validity evaluates how closely the results of your test correspond to the results of a
different test.

What is a criterion?
The criterion is an external measurement of the same thing. It is usually an established or widely used test that is already considered valid.

What is criterion validity?


To evaluate criterion validity, you calculate the correlation between the results of your
measurement and the results of the criterion measurement. If there is a high correlation, this
gives a good indication that your test is measuring what it intends to measure.

Example

A university professor creates a new test to measure applicants’ English writing ability. To
assess how well the test really does measure students’ writing ability, she finds an existing test
that is considered a valid measurement of English writing ability, and compares the results when
the same group of students take both tests. If the outcomes are very similar, the new test has a
high criterion validity.
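
Following the professor's example, a minimal sketch of the computation: correlate the new test's scores with the established criterion test's scores for the same students. All numbers below are hypothetical.

    import numpy as np

    # Hypothetical scores: the same students take both writing tests.
    new_test = np.array([72, 65, 88, 54, 79, 91, 60, 83])
    criterion = np.array([70, 62, 90, 58, 75, 94, 63, 80])

    # Criterion validity is typically reported as this correlation.
    validity = np.corrcoef(new_test, criterion)[0, 1]
    print(f"criterion validity coefficient: {validity:.2f}")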
