Subtest Origins
This manuscript has been accepted for publication in its current form. Please cite as:
Gibbons, A., & Warne, R. T. (2019). First publication of subtests in the Stanford-Binet 5,
doi:10.1016/j.intell.2019.02.005
Abstract
In this article we describe the origins of the subtests that appear on the modern Stanford-Binet
Intelligence Scales (SB5), Wechsler Preschool and Primary Scale of Intelligence (WPPSI-IV),
Wechsler Intelligence Scale for Children (WISC-V), and Wechsler Adult Intelligence Scale
(WAIS-IV). We found that the majority of these subtest formats were first created in 1908 or
earlier and that only three have been created since 1980. We discuss the implications of these
findings, which are that (1) many subtests have lengthy research histories that support their use
in measuring intelligence; (2) many subtests have formats that predate modern theories of test
creation, cognitive psychology, and intelligence; and (3) the history of many subtests is more
One of the first successes in applied psychology was the development of intelligence
tests. Early tests in the 1910s and 1920s found rapid, widespread acceptance, with millions of
American examinees tested every year (Cronbach, 1975; Thorndike, 1975; Yerkes, 1921). The
use of these tests persists today, and in the 21st century the most popular individually
administered intelligence tests are the Stanford-Binet Intelligence Scale (SB5) and the Wechsler
Intelligence Scales, the latter of which are the Wechsler Adult Intelligence Scale (WAIS-IV), the
Wechsler Intelligence Scale for Children (WISC-V), and the Wechsler Preschool and Primary
Scale of Intelligence (WPPSI-IV). These instruments have dominated intelligence testing for
decades. The original version of the Stanford-Binet scale was first published over 100 years ago
(Terman, 1916), though many of the items were direct translations or close adaptations of items
from Binet’s 1905, 1908, and 1911 intelligence scales. Ironically, Binet and Terman had
opposite goals in their work on intelligence testing. Binet aimed to identify children who were
struggling academically (Wolf, 1973), while Terman had an interest in identifying gifted
children—an interest which started with his dissertation (Terman, 1905) and lasted until his
death. Indeed, Terman’s research on gifted children is the work for which Terman is best remembered
today (Warne, 2019). The Stanford-Binet has been revised several times since 1916, with the
The first Wechsler scale appeared in 1939 as the Wechsler-Bellevue, an intelligence test
designed for adult examinees (see description in Wechsler, 1944), as opposed to the child
examinees that Terman designed the Stanford-Binet for. Wechsler disapproved of the heavily
verbal content of the early versions of the Stanford-Binet and of the test’s ability to produce a
FIRST PUBLICATIONS OF SB & WECHSLER TESTS 4
global IQ as the only measure of a person’s intellectual level (Wechsler, 1944). Therefore, he
designed his test to produce verbal and performance (i.e., non-verbal) IQ scores. To create the
Wechsler-Bellevue, Wechsler evaluated item formats that appeared on prior scales and selected
the ones which he thought were the best measures of intelligence, based on his research (Boake,
2002; Wechsler, 1944) and his experience administering the Army Alpha and Army Beta in
Texas during World War I (Yerkes, 1921, pp. 40, 80). As he wrote, “Our aim was not to produce
a set of brand new tests but to select, from whatever source available, such a combination of
them as would best meet the requirements of an effective adult scale” (Wechsler, 1944, p. 76).
Wechsler favored test formats and items that (a) showed high discrimination in intelligence
across much of the continuum of ability, (b) produced scores with high reliability, (c) correlated
strongly with other widely accepted measures of intelligence, and (d) correlated with
“pragmatic” subjective ratings of intelligence from people who knew the examinee—such as a
work supervisor (Wechsler, 1944). These criteria led Wechsler to believe that, for example, an
information subtest was effective but that the Army Beta’s cube analysis subtest was not
(because the latter was incapable of discriminating among people with intellectual disabilities).
The success of the scale led Wechsler to create a separate test for children (the WISC) in 1949
and another for preschool children (the WPPSI) in 1967. All Wechsler tests have been revised
Throughout the years, however, psychologists have updated these tests with new analyses
and norm samples, while also adding or removing subtests. Despite the revisions that have
occurred over the decades, the revisers of the Wechsler scales or the Stanford-Binet have never
completely replaced every subtest when updating an intelligence scale. The result is that
contemporary versions of these tests are an amalgamation of old subtest formats and modern test
construction methods.
It is the legacy of these old subtests on modern tests that intrigued us. Knowing that many
subtests on the Stanford-Binet or the Wechsler scales long predate the current versions of these
tests, we investigated the origins of these subtests, hoping to find the earliest publication of the
subtest format in the scholarly literature. Throughout the history of the changes to the subtests,
there has never been a compilation of the origins of the subtests on popular intelligence scales.
Considering many of the subtests that have long been part of the SB or the Wechsler scales are
still in use today, it is important to understand where they came from. The origin of these
subtests provides valuable information about the creation of the SB and Wechsler scales and may
shed light on test theory and test score interpretation. We believed that understanding the history
of subtests would lead intelligence test users to have a greater appreciation of these subtests.
Moreover, we have engaged in this historical research with the goal of correcting
misconceptions that psychologists have about the origin of frequently used intelligence subtests.
For example, in one article the authors claimed that Corsi invented the block tapping task in
1972 (Wongupparaj, Wongupparaj, Kumari, & Morris, 2017, p. 72). In reality, we show below
that the task was invented in 1913. Likewise, we found multiple sources (e.g., Boake, 2002;
Frank, 2013) that stated that the picture completion subtest (found on the WAIS-IV) originated
with Healy (1914), but we discovered that Healy’s task is different from the modern subtest,
which originated with Binet (see below). We believe that such misconceptions are probably
common. An incorrect understanding of the origin of a subtest may limit the thoroughness of
literature searches about psychometric validity or the Flynn effect. Finally, research about the
subtests’ psychometric properties outside of the context of the SB or Wechsler scales can
Search Procedures
The task of identifying the origin of subtests may seem easy at first glance, but there are
circumstances that make the task difficult. When the SB or Wechsler scales were first created or
later updated, the test creators or revisers often did not state any origins of the subtests on their
scales, let alone provide any citations for the first description of subtests. Modern test manuals
for these tests are silent on the issue of the origin of their subtests, probably because many
readers do not find information on the origins as important as technical data (e.g., validity of
Lastly, throughout their history, many of these subtests have been known by different names or
were changed slightly (e.g., from written format to oral). These changes sometimes made it hard
Our search procedures for these tests started with a careful reading of lengthy accounts of
the early history of intelligence testing (Boake, 2002; Matarazzo, 1972; Peterson, 1926/1969;
Wolf, 1973; Young, 1924). When these works discussed a particular subtest that resembled a
subtest on a modern Wechsler scale or the Stanford-Binet 5, we investigated literature that the
author cited so that we could track down the original source of the subtest. We did the same for
two sources about the history of a specific subtest (Richardson, 2005, 2011). Additionally, we
consulted the manuals for the original Stanford-Binet (Terman, 1916), the Army Alpha and
Army Beta (Yerkes, 1921), and we read English translations of Binet’s reports of his original
scales (Binet, 1911/1916; Binet & Simon, 1905a/1916, 1908/1916) to understand which subtests
appeared on these influential instruments and to try to link them with modern subtests. Finally,
we conducted searches of each subtest’s names in Google Scholar and PsycInfo in an effort to
find any mentions of the subtests earlier than those we had already found.
We conducted the searches separately for each individual subtest, and we never searched for
multiple subtests’ origins at the same time. Once we identified an early use of a subtest, we
verified that the description did indeed correspond to modern intelligence subtests. (This was
important because sometimes a verbal description, such as “picture completion,” did not
correspond to a modern subtest.) In an effort to verify that we had indeed found the earliest
publication or description of a test, we would then search for earlier sources than what we had
found. To do this, we first searched the source’s citations in an effort to find any earlier
indications of the subtest’s use in the scholarly literature. We also conducted searches of
scholarly databases using terminology we found in the article to look for earlier sources. When
we exhausted these avenues and failed to find any earlier sources, we stopped the search for an
Subtests
Table 1 lists all of the subtests found in the Stanford-Binet 5, the WAIS-IV, the WISC-V,
and the WPPSI-IV. Subtests with very similar formats are combined into a single row. For
example, the WISC-V Digit Span, WAIS-IV Digit Span, WISC-V Picture Span, and WISC-V
Letter-Number Sequencing all require examinees to repeat in order a sequence of stimuli that
have been presented. Although the stimuli and/or difficulty differ, the required tasks are all
sufficiently similar that we saw the later subtests (e.g., WISC-V Letter-Number Sequencing) as
an adaptation of the original test (i.e., the Digit Span subtest). Thus, for each row in Table 1 we
only searched for a single subtest origin. Finally, readers should note that the subtests in Table 1
and in the rest of this section are listed in alphabetical order; when subtests are combined, we
listed the most widely known name for the subtest first.
subtraction, multiplication, and division and have been on intelligence tests for a long time.
According to Wechsler, arithmetic items were used as a “rough and ready measure of
intelligence” as early as the late 1800s (1944, p. 82). Arithmetic items were common on
academic achievement tests before Binet, but these questions would not have been standardized
among different tests. Not only were arithmetic tests found on the original Wechsler scales, but
these types of items can also be found on the Army Alpha (Yoakum & Yerkes, 1920), Binet’s
scales (e.g., “Counting 4 single sous” on the 1908 scale; Binet &
Simon, 1908/1916), and on a test of reasoning ability created by Bonser (1910), though these do
not predate Binet and Simon’s (1908) use. Stone (1908) also created a standardized arithmetic
test that resembles items found on early intelligence tests, though it is unclear whether his work
had any influence on the creators or revisers of the Stanford-Binet or Wechsler tests.
The verbal quantitative reasoning subtest found on the Stanford-Binet 5 consists of items
where subjects are asked to count, perform addition and subtraction problems, and name
numbers. This test is extremely similar to arithmetic, so we believe that it has the same origin as
Due to the recent demand for more nonverbal items on intelligence scales, a
nonverbal quantitative reasoning subtest was created for the SB5. The main difference between
nonverbal and verbal quantitative reasoning is that the verbal version of the test has the questions
written in words and numbers, while the nonverbal version uses pictures to ask arithmetic
questions. Even though the nonverbal quantitative reasoning subtest uses pictures instead of
words, its origins can be traced back to the same place as arithmetic items (which use words to
ask math questions). Verbal arithmetic-like items have not only been found on the early Binet
scales (Binet & Simon, 1908/1916), but have also been used to quickly measure intelligence
even before psychometrics was developed (Wechsler, 1944). It is likely that these types of items
were also used on exams such as academic achievement tests. Nonverbal forms of arithmetic
questions may have been used for other academic purposes as well, but the first time nonverbal
quantitative reasoning items appeared on an intelligence scale was in the SB5, the latest version
of the Stanford-Binet.
Block Design
In the block design subtest on the Wechsler tests, the examinee recreates a picture or
model they have seen with blocks. This subtest was first published in Kohs’s 1920 article, “The
Block-Design Tests.” Kohs stated in the opening paragraphs of his article that his goal was to
create a performance task that could measure intelligence without using language in the
instructions or in executing the task. According to Boake (2002), Kohs based his cube task on a
game of the time named Color Cubes, which were being used already in classrooms to teach
children to imitate visual designs and learn colors (e.g., The Special Class Teachers’ Club,
1917).
Block Span
In the block span subtest, an examinee is shown an array of blocks, which the examiner
taps in a predetermined order. The examinee must then repeat the sequence of taps. (In some
intelligence test batteries, this test is called the Corsi block test; we call it “block span” because
this is its name on the SB5.) Although the block span subtest resembles the more common digit
span subtest (see below), the two subtests have different origins. The block span task can be
traced back to Knox’s (1913) Cube Imitation Test, which was designed as part of a non-verbal
test battery to identify immigrants at Ellis Island who had intellectual disabilities (Richardson,
2011). In Knox’s original version, the examiner would tap a series of larger cubes in a
predetermined order with a smaller cube; the examinee was to then use the smaller cube to repeat
the sequence. In later versions, the smaller cube was replaced with another object, generically
called a “pawn” (Richardson, 2005). In the modern SB5 block span subtest, the pawn has been
removed from the test, with the examinee instead using their fingers to tap the blocks in the order
they are shown. It is interesting to note that because screening immigrants was a task for
physicians, Knox had a distinctly medical viewpoint of intellectual disabilities and recommended
that only physicians administer his test battery, including the cube imitation test.
Cancellation
In the cancellation subtest, examinees are given a paper with a wide variety of random
symbols on it (e.g., jumbled letters of the alphabet). The examinee is told to cross out every
example of a particular target symbol that they can find (e.g., every “B”). The earliest mention
we can find of this test comes from Peterson (1926/1969, pp. 79-80), who stated that Oehrn
reported in his 1889 dissertation a cancellation test of sorts that would ask the subjects to find
certain letters. Oehrn’s cancellation test was one of three that he used to measure “perception,”
the others requiring examinees to count the number of letters printed randomly on a page and to
Coding/Animal Coding
In the coding subtest, examinees are given a key for symbols and are asked to translate a
message using that key. For example, the key could be as simple as 1 = A, 2 = B, etc.
With this sort of code, the examinee would be asked to encode a message, like converting
“cat” to “3-1-20.” An influential use of the coding subtest was the “digit symbol subtest” found
on the Army Beta (Yerkes, 1921; Yoakum & Yerkes, 1920), which required examinees to
convert numbers to geometric symbols. The Army Beta creators credited the first appearance of
a coding subtest to Pyle’s (1913) book, where he called it the “substitution test.” However,
Dearborn (1910) seems to be the first researcher to use this type of subtest. In his study,
Dearborn gave different tasks to college students in their classroom. One of them was “The
Practice Experiment,” which strongly resembles the modern-day coding subtest. Dearborn
(1910) believed that this task could measure the speed at which a person could master a new
piece of information (i.e., the code) and reproduce it. However, it is important to note that
Dearborn administered the task across multiple days, whereas modern tests only administer
Comprehension
During a comprehension subtest, the subject is asked to produce the answer to a question
that is not considered a “fact,” but which can be answered using previously learned informal
knowledge. Wechsler (1944) acknowledged that comprehension questions predate the creation of
his instruments. The original questions seem to appear on Binet’s original scale (Binet & Simon,
1905a/1916) under the section titled “Reply to an Abstract Question,” as well as the 1908 scale
(under the subtest titled “Comprehension Questions”; see Binet & Simon, 1908/1916). One
example from the third version of Binet’s test is, “When one breaks something belonging to
another what must one do?” (Binet, 1911/1916, p. 224). In his original scale, Binet stated that,
“This test is one of the most important of all, for the diagnosis of mental debility” (Binet &
Simon, 1905a/1916, p. 65). Wechsler (1944) also mentioned that comprehension tests are those
that involve common sense, knowledge of practical information, and the ability to use past
Delayed Response
In the delayed response subtest on the SB5 there are three cups, one of which has a toy under it.
After the examiner mixes the three cups around, the examinee tries to select the cup that has the toy underneath it.
According to Roid and Barram (2004), this subtest is based on the “classic shell game” (p. 39),
and is a measure of short-term memory. The “classic shell game” has been used in criminal
activity (e.g., three-card monte) and as a magic trick. (However, it is important to note that the
delayed response subtest found on the SB5 lacks the deception of the classic shell game or a
sleight-of-hand trick.) Though the SB5 seems to be the first time that the shell game has
appeared on an intelligence scale, this task long predates intelligence tests. Apparently, this game
came to America in the 18th century from England as a variant called “thimble-rig” (Maurer,
1947). Though thimble-rig was played with thimbles instead of cups, it still had the basic
concept of the subject determining which cover had the object underneath it. Different versions
of thimble-rig are still played today and have been for centuries.
In the digit span subtest, the examiner verbally gives a series of one-digit numbers which
the subject must repeat. In some variants (often called backward digit span or reverse digit span),
the subject must repeat the sequence backwards. The picture span subtest is extremely similar;
during the task, the examinee is shown a set of pictures and then the subject must select the
pictures (preferably in a specified order) from a different array of images. The letter-number
sequencing subtest consists of verbally giving the examinee a set of numbers and letters and then
asking the examinee to repeat them back in alphabetical and numerical order.
The origin of the digit span subtest was an article by Jacobs (1887) in which he described
studies on school-age children in which the examiner read numerals out loud twice and then
required the subjects to repeat the numbers (either aloud or written on paper). Jacobs was
inspired by Ebbinghaus’s research on memorizing nonsense syllables, and he believed that the digit
span task would have more accessible stimuli as a test of short-term memory capacity. This task
found its way to the 1905 version of Binet’s scale, where it was named “Repetition of Three
Figures” (Binet & Simon, 1905a/1916) because Binet had his examinees repeat back three
numbers (figures) that the examiner gave orally. It is not clear whether Binet was aware of
Jacobs’s (1887) article, but Jacobs showed that performance on digit span was better for
successively older children. This age progression in performance was a characteristic that Binet
saw as desirable in a task because he believed that intellectual ability increased (on average) with
age in children.
Digit span is a perennially popular subtest on intelligence tests, and innovations are not
unusual. Terman (1916, p. 207) credited Bobertag with inventing the backward digit span test in
1911. Blair (1957) suggested a nonverbal task which we see as the precursor to the modern
picture span subtest on the WAIS-IV and WISC-V. Blair’s task was designed to measure
memory span in deaf and hearing children by showing a young examinee a series of cards with
visual stimuli; the child must then point to the stimuli in the same order on a set of identical
response cards. Modern users of the WISC-V and WAIS-IV are familiar with the letter-number
sequencing subtest, which requires the series of stimuli to be repeated in either ascending order
(for numbers) or alphabetical order (for letters). This subtest appeared on the WAIS-III
(Wechsler, 1997) and the version of the Wechsler Memory Scale that appeared the same year.
We see all of these tasks as adaptations of the original digit span task that Jacobs (1887)
proposed.
Early Reasoning
The SB5 includes the early reasoning subtest, which requires the young examinee to use
pictorial stimuli and tell a story about the image based on visual cues. The second Binet scale
(Binet & Simon, 1908/1916) is a clear predecessor for this test; the 1908 scale has three images,
each containing at least one human figure. The child then was asked to describe the picture, and
more complex responses based on interpretation (rather than simply naming objects in the
image) were viewed as indicative of greater intellectual ability. Binet found this subtest so useful
when diagnosing intellectual disabilities that he wrote, “Very few tests yield so much
information as this one. . . . We place it above all the others, and if we were obliged to retain only
one, we should not hesitate to select this one” (Binet & Simon, 1908/1916, p. 189).
Figure Weights
In the figure weights subtest found on the WISC-V and WAIS-IV, the examinee is shown
an image of scales with different weights on both sides. The examinee then chooses what type of
weights would balance a third scale. The figure weights task seems to be most similar to the
original Piagetian balance beam task, which Inhelder and Piaget introduced in 1955 as a task that
can indicate whether a child has reached the formal operational stage of reasoning (de
Ribaupierre & Lecerf, 2006). The main difference between the two is that figure weights is two-
dimensional (on paper) and is only based on different colors and shapes while the Piagetian
balance beam task uses actual weights in the item administration process. Even so, the Piagetian
The form board and form patterns subtests contain tasks that ask the examinee to match
geometric shapes to other geometric shapes. Early versions of the form board resembled modern
puzzles for young children, with wooden pieces that had to be placed into matching shapes that
were cut into a wooden board. Form boards are among the oldest subtests still in use today; Jean
Marc Gaspard Itard was the first to use a form board-like task when he studied and educated a
young boy found in the wild (named the “wild boy of Aveyron”) in 1798. Itard’s successor and
colleague, Édouard Séguin, made more permanent versions of the same test, and Séguin’s widely
read descriptions of form boards resulted in their popular usage among psychologists and
physicians studying and training children and individuals with intellectual disabilities
The very similar visual puzzles and object assembly subtests have an origin in the puzzles
used for entertainment and geography education, which were first created in the 1750s in
England and were in widespread use in the early 20th century when the first intelligence tests
were being created (Norgate, 2007). Both form boards and visual puzzles/object assembly were
incorporated into nonverbal testing settings (Richardson, 2011), and a paper-and-pencil version
of object assembly—in which the examinee must divide a square to show how a set of two or
three shapes can form the entire square—was present on the Army Beta (Yoakum & Yerkes,
1920).
Information
information have, for a long time, been the stock in trade of psychiatric examinations, and prior
to the introduction of standardized intelligence tests they were widely used by psychiatrists in
estimating the intellectual level of patients” (p. 77). The original Binet scale also had information
items, such as, “Giving the name of four common coins,” and asking the examinee to give their
age (Binet & Simon, 1905a/1916). Binet and Simon (1905b/1916) gave explicit credit to the
French physicians Blin (1902) and Damaye (1903) for these items, which cover topics that a
person exposed to a given culture can reasonably be expected to know. The original items that
inspired Binet covered a variety of topics, including questions about the body and age in general;
they are clear sources for the information subtest on modern intelligence tests. However, Hall
(1893, pp. 16-22) published a series of items administered to children in two cities. While most
of these were vocabulary items or asked whether the child had seen certain items or events (e.g., seen
a watchmaker at work, or seen an axe), some of them were information-type items. Examples
of these include whether they know “That leathern things come from animals,” “What bricks are
made of,” and “Origin of butter” (all examples from Hall, 1893, p. 20). Hall (1893) believed that
effective teaching required relating new information to what the child already knew; therefore,
an understanding of children’s vocabulary and information about the world around them would
be pedagogically useful. While the information items Hall (1893) used do not seem to have
influenced Binet, Terman did acknowledge their influence on early Stanford-Binet information
Many later psychologists created their own information items that were culturally
appropriate for their examinees. An example of this is on the Army Alpha, which has the
following information item: “The pitcher has an important place in a) tennis b) football c)
Last Word
In the last word subtest on the SB5 the examinees are (1) asked a question, (2) prompted
to answer, and then (3) asked to remember the last word of the question. Roid and Barram (2004,
p. 49) say that the last word subtest is based on a task reported in Daneman and Carpenter’s
(1980) study, making it one of the newest subtests found on the SB5 and the modern versions of
the Wechsler scales. Daneman and Carpenter (1980) created this task to measure working
It is important to note the differences between the modern SB5 last word subtest and
Daneman and Carpenter’s (1980) task. The earlier authors had their subjects read different
sentences aloud and then recall the last word of each sentence (in the order that the sentences
were presented) after reading the final sentence. Moreover, Daneman and Carpenter (1980) did
not require examinees to answer a question. Despite the differences, the connection with
While taking the matrices test, the subject is shown a pattern of geometric figures and is
asked to complete the pattern. Although the SB5 and all the modern Wechsler tests contain
matrix items, the best-known tests to use matrix items are the Raven’s tests (e.g., the Raven’s Coloured
Matrices), a series of nonverbal matrices tests that are extremely good measures of fluid
intelligence. Because the Raven name has always been associated with matrix items, the origin
of this subtest is not obscure. Penrose and Raven (1936) were the first to describe a matrix,
though Raven’s (1939) article is a more widely known early report of matrix items. Matrix tasks
were designed to be a measure of “innate mental capacity” (Penrose & Raven, 1936, p. 7) that
In the memory for sentences subtest, the examinee must repeat back a sentence that is
read to them. The 1905 Binet scale included a subtest titled “Repetition of 15 Word Sentences,” in which
the examinee was also supposed to repeat back a sentence (Binet & Simon, 1905a/1916). But
according to Wolf (1973, pp. 86-87), Binet and Victor Henri used a memory for sentences test as
early as 1892, and this earlier work presaged Binet’s use of the subtest on his first intelligence
scale. Consistent with Binet’s emphasis on studying complex mental capacities instead of simple
abilities, Binet believed that memory of entire sentences was more useful as a measure of
cognitive development than memory for isolated words. Memory for sentences had a sharper age
performance gradient than memory for isolated words, which Binet saw as a useful
Picture Absurdities
In a picture absurdities subtest, an examinee is shown a picture that has something wrong
or absurd in it. The examinee then must explain what is “absurd” about the picture. For example,
a picture might depict a firefighter holding a hose, but with flowers, instead of water, emerging from
it. Although conceptually similar to the verbal absurdities subtest (see below), the
picture absurdities subtest emerged independently years later. The earliest description of this
subtest containing images, such as “A man with three legs,” and a man smoking an upside-down pipe.
Terman claimed in this article that he was inspired in 1914 by a “picture puzzle” in a children’s
magazine in which many objects within a picture were “. . . so drawn as to contain an absurdity”
(Terman & Chamberlain, 1918, pp. 347-348). However, in this description, Terman also
mentioned that he was unaware of Rossolimo’s “test of this kind.” No exact citation to
Rossolimo’s test is given, but we found two references in 1911 to works by Rossolimo that could
have been Terman’s sources. One article in English contains an offhand mention of a test of
second, more detailed article in German describes a picture absurdities subtest containing 30
images (10 for children, 10 for uneducated adults, and 10 for educated adults) as part of a mental
test battery (Rossolimo, 1911b, pp. 278-279). This article does not reproduce any of the images,
but the descriptions clearly describe images that would be on a picture absurdities subtest. For
example, one item for children consisted of, “A lady is reading a book with her eyes blindfolded,
with glasses put over the bandage” (Rossolimo, 1911b, p. 278, translated via Google Translate).
These two articles by Rossolimo are the earliest record of a picture absurdities subtest we have
found.
Picture Completion
During the picture completion subtest, the examinee is shown pictures that have
something missing. The examinee is then asked to fill in whatever is missing in the picture. The
picture completion subtest’s origin can be traced to Binet’s 1908 subtest called “Unfinished
Pictures” (Binet & Simon, 1908/1916). Both the original subtest and its modern equivalent on
the WAIS-IV have similar instructions, though the artwork is much more sophisticated in the
modern subtest. Figure 1 shows some examples of the “Unfinished Pictures” task found on the
1908 Binet-Simon scale. Picture completion items were also found on the Army Beta (Yoakum
During the picture memory subtest, examinees are briefly shown some images and
are afterwards asked to choose, from a new group of pictures, the images they had previously
seen. Binet’s original scale has a subtest titled “Exercise of Memory of Pictures” (Binet &
Simon, 1905a/1916), which he called “. . . a test of attention and visual memory” (p. 60). The
original version of this task shares the same format of presenting a pictorial stimulus and then
The position and direction subtest (found on the SB5) asks the examinee to move items
based on commands that include words related to spatial position (e.g., “on,” “inside”). More
difficult items designed for older age groups ask the examinee to imagine rotating in various
directions sequentially (e.g., “left,” “right,” “north,” “south”) and then to state what direction the
examinee would face after the hypothetical sequence is completed. Both types of items appear on
the 1937 Stanford-Binet (Terman & Merrill, 1937), but had been reported nearly twenty years
before (Terman & Chamberlain, 1918, pp. 344-345) as part of a pilot procedure for 23 subtests,
Procedural Knowledge
In the procedural knowledge subtest, the subject is shown a series of cards and is asked to
describe either how they would use the object or how they would perform the task shown on the card. In Wolf’s 1973
biography of Binet, she mentioned that Damaye’s (1903) study of normal cognitive development
included several questions that resemble information items. One specific type of item they
mentioned was asking the child about an object, especially asking them to describe the use of it
(p. 173). This description closely resembles procedural knowledge. Although the modern
procedural knowledge subtest differs slightly in that it uses cards and pictures,
Blin and Damaye’s question is the oldest description of an item resembling procedural
knowledge.
Similarities
In the modern similarities subtest, the examinee is asked to describe qualitatively the
relationship between a pair of words provided to them. The 1905 version of Binet’s
scale includes a subtest called “Resemblances of Several Known Objects Given from Memory”
(Binet & Simon, 1905a/1916). This subtest examined the subject’s ability to compare common
objects and state how they are similar. Earlier than this test, though, Binet and Henri suggested
1926/1969, p. 89).
In the picture concepts subtest, the examinee chooses which images among those
provided all share a common characteristic. This focus on commonalities makes this subtest
greatly resemble the similarities subtest, which requires examinees to identify how objects
presented verbally are similar. We see the picture concepts subtest as having the same origin,
which is Binet’s 1905 subtest, “Resemblances of Several Known Objects Given from Memory” (Binet
& Simon, 1905a/1916). As far as we could find, the first time this picture form of a similarities
In the symbol search subtest, the examinee is shown one or two target images and is then
supposed to determine whether there is a matching symbol in another set of images. The
WPPSI’s bug search is a similar test, but with target images that are cartoon bugs (as age-
appropriate stimuli). These subtests are similar to matching activities designed for children. As
formatted for an intelligence test, though, it was first seen on the WISC-III (Wechsler, 1991),
and it is seen as a measure of “perceptual organization, fluid intelligence, and planning and
learning ability” (Weiss, Saklofske, Holdnack, & Prifitera, 2016, pp. 13-14).
Verbal Absurdities
The verbal absurdities subtest consists of statements which have something false in them
that the examinee is supposed to identify. In Binet’s second test, he had a subtest named
“Criticism of Sentences” (Binet & Simon, 1908/1916, pp. 227-229). A memorable (though
gruesome) example can be found on the 1908 Binet scale: “Yesterday they found on the
fortification the body of an unfortunate girl, cut into eighteen pieces. It is believed that she killed
herself” (Binet & Simon, 1908/1916, p. 228). The same subtest (retaining some of Binet’s
sentences, including the one we quote) was called “Absurdities” in the 1916 Stanford-Binet
(Terman, 1916). According to Wolf (1973, pp. 147-148), Binet suggested this type of test much
Verbal Analogies
The current verbal analogies subtest first gives the examinee two words whose relationship the
examinee must understand. Based on this relationship, the examinee is then
supposed to generate a fourth word that has the same relationship with a given third word that the
first two words have with one another. The earliest mention we have been able to find of verbal
analogy items is in an 1894 article by Binet when he suggested items that asked examinees to
define the relationship between two words (Wolf, 1973). Using this item format, Binet believed
that “. . . one would certainly arrive at a test of judgment and of other complex functions” (as
quoted in Wolf, 1973, p. 93). For example, in Binet’s version of analogies, the examinee would
be asked to explain the relationship between the words spoon and soup. Though Binet’s 1894
suggestion for a test is not exactly the same as the current version of verbal analogies, it still asks
the examinee to discover the relationship between two words. We see this as a clear precursor to
Vocabulary subtest and receptive vocabulary subtest items require the examinee to define
a word from a standardized list, whether in a multiple-choice or free-response format. These
types of items are on the current version of the Stanford-Binet and all three of the Wechsler
intelligence scales, but predate the original versions of these modern tests. The first time that
vocabulary items were seen on an intelligence test specifically was in Binet’s original 1905 test
(Binet & Simon, 1905a/1916). Both Wolf (1973, p. 84) and Matarazzo (1972, p. 32) stated that
Binet published a vocabulary test in 1890, a full 15 years before his first intelligence scale.
However, vocabulary tests were part of educational testing in the 19th century and were not
unique to the realm of intelligence testing (e.g., see Hall’s, 1893, summary of an 1869 German
report of the performance of 10,000 children on a German vocabulary test). These vocabulary
tests often functioned as academic achievement tests, though Hall (1893) saw understanding a
child’s vocabulary as serving as a foundation for future teaching (see the subsection on the origin
of the information subtest).
Zoo Location
A new subtest on the WPPSI-IV is the zoo location subtest. For this task, the examinee
is shown zoo animals at specific locations on a simple two-dimensional map. Later, the
examiner asks the child to place the zoo animals in the locations where they previously were. A
precursor of this subtest is the 7/24 test (originally created by Barbizet & Cany, 1968), which
required examinees to reproduce from recall a random pattern of 7 dots by placing round tokens
on a 24-square grid.
Discussion
Tracing the origins of all subtests found on the SB5, WAIS-IV, WISC-V, and WPPSI-IV
is a project that has not been undertaken before. We wrote this article to give intelligence test
users an appreciation for the history of these subtests and also explain more about the creation of
the original scales. Perhaps with a knowledge that most subtests have been in use for many years,
practitioners can have more confidence in their use of these item formats because they can know
that these subtests have been subjected to repeated investigation for several decades.
Many of the origins of the modern subtests date back over a century. Indeed, the median
year of publication for the subtests was 1908, and only three subtests originated after 1980. Most
subtests on the Wechsler tests and SB5 have withstood the test of time and have accumulated a
large body of validity research, proving their utility in measuring intelligence. Moreover,
continuity in subtests gives researchers tools for longitudinal testing, research into development
and aging, and the investigation of population-level trends (e.g., the Flynn effect). Using the
same subtest formats over the decades also permits research and knowledge about how these
items function to accumulate. A timeline showing our proposed candidates for the first
One discovery that we found striking was the diverse sources of inspiration for subtests.
While the majority did have roots in the creation of cognitive tests, others have their origin in
games (the delayed response subtest, the object assembly subtest), classroom lessons (the block
design subtest), the study of a feral child (form boards and related subtests), school assessments
(vocabulary subtest) and more. To us, this means that items on intelligence tests often have a
connection with the real world—even when they are presented in a standardized, acontextual
testing setting. Additionally, this undercuts the suggestion, often made by critics of intelligence
testing, that intelligence test items are meaningless tasks that are divorced from any relationship to
On the other hand, one criticism of intelligence tests seems justified from our study:
subtests that appear on popular intelligence tests have changed little in the past century (Linn,
1986). While one could argue that the enduring appeal of these subtests is due to their high
performance in measuring intelligence, the fact remains that many of these subtests were
created with little guiding theory or understanding of how the brain and mind work to solve
problems (Naglieri, 2007). While sophisticated theories regarding test construction and the
interrelationships of cognitive abilities have developed in recent decades (e.g., Carroll, 1993), it
is often not clear exactly how the tasks on modern intelligence tests lead examinees to use their
It is apparent from our research that as the creators and revisers of the Stanford-Binet and
Wechsler tests have considered new item formats, they have taken inspiration (in a very direct
way) from pre-existing item formats. From a pragmatic perspective, this makes sense; using an
existing item format is easier than inventing a new one. Moreover, these item formats often had
research supporting their use in intelligence testing, whereas a new subtest would not.
Additionally, copyright laws only protect the exact item content—not the general format of a
subtest. Therefore, we believe that most psychologists creating or revising the Wechsler or
Stanford-Binet tests found it expedient to reuse existing item and subtest formats, which is why
most subtest formats on modern intelligence tests are over 100 years old.
We also wish to draw the reader’s attention to the fact that there have been many subtests
developed that do not appear on current intelligence scales. An example of this is Porteus’s
(1915) maze task, which appeared on the Army Beta and in some iterations of the WISC.
Though the test functioned well as a non-verbal test of intelligence (see Porteus, 1965, for a
review), other non-verbal tasks have surpassed it, and it does not appear on any Wechsler test
today or the SB5. This demonstrates that intelligence testing is not a static technology. Rather,
test creators and revisers frequently re-examine subtests to determine the best available tasks to
include on an intelligence test. Likewise, some tests, such as the Woodcock-Johnson IV,
the Differential Ability Scales II, and the Kaufman Assessment Battery for Children II,
are all more popular than the SB5. While there is overlap between these tests’ subtests and
the subtests we explored in this article, an exploration of the origin of the item formats used on
Our work highlights the influence of a small number of intelligence tests: Binet’s scales,
the Army Alpha and Army Beta, the original Stanford-Binet, and the early Wechsler tests. Many
of the subtests we examined either originated in these scales or reached a large audience of
psychologists via inclusion in these tests. Other tasks are often capable of measuring
intelligence, such as executive functioning tasks (Brydges, Reid, Fox, & Anderson, 2012), but
these rarely find a place on the Wechsler or Stanford-Binet tests. Although we admire the work
of the early pioneers of intelligence testing (Warne, 2019; Warne, Burton, Gibbons, & Melendez,
2019), we believe that modern test revisers and creators would benefit from examining the work
We also wish to emphasize that even though subtest or item formats in some cases have
been consistent through the decades, this does not imply that item content has remained
unchanged. For example, words on vocabulary or information subtests have often changed as
tests have been revised. Modern examinees who take these subtests do not receive the same
questions as examinees did a hundred years ago; rather, what has stayed the same is the general format
of the test item and what the examinee is asked to do. Test creators and revisers regularly use the
framework of a subtest’s item format to create new stimuli to probe examinees’ mental abilities.
Indeed, test publishers now commonly revise intelligence tests at regular intervals to combat
breaches in item confidentiality, the Flynn effect, and item drift that may develop over time.
It is important to recognize that many subtests on other intelligence scales can also be
traced back to the early 20th century. For example, Binet and Simon’s (1905a/1916) original scale contained
a paper cutting task (suggested by Henri in 1898, according to Wolf, 1973, p. 150) in which the
examiner folded a paper and then cut a portion out. The examinee then had to draw what the
paper would look like unfolded. The SB5 and current Wechsler scales do not contain this subtest,
but the current version of the Cognitive Abilities Test (Lohman & Lakin, 2017) does. Lumosity,
a company that creates “brain training” computer programs, even calls this task “Thurstone’s
punched holes,” despite the fact that Thurstone did not invent the task (Simons et al., 2016).
Likewise, Ramful, Lowrie, and Logan (2016) stated that this test had a 1970s origin—misstating
the true creation of the test by over seven decades. Thus, it is likely that many other subtests have
Our historical research showed that some of the item formats found on the original Binet-
Simon scale predate their inclusion on the famous 1905 Binet intelligence scale. Binet’s first
article on the cognitive development of children was published in 1890 (Wolf, 1973, p. 81), and
he pursued this line of work in almost total seclusion for over a decade (Wolf, 1973). Binet’s
lengthy practice in testing children’s cognition provided knowledge and experience that he drew
upon when he created his scales—a fact that has been noted by others scrutinizing the historical
record (e.g., Nicolas, Andrieu, Croizet, Sanitioso, & Burman, 2013). This contrasts with the
overly simplistic version of the history of testing that psychology students are often exposed to in
measurement (e.g., Coaley, 2014; Kaplan & Saccuzzo, 2018). This version of history also
ignores the contributions of individuals like Itard, Séguin, Henri, Damaye, and others who
provided Binet with item formats that he would use in his scales.
Despite our best efforts to identify the origins of different intelligence scale subtests, we
do not claim to have read every scholarly article about intelligence or cognitive testing that was
published in the early decades of this field. Although we did successfully trace most subtests to
publications that predate their first appearance on the SB or a Wechsler test, there is still the
intelligence test, the majority of the subtests on the current versions of the Wechsler and
Stanford-Binet intelligence scales have origins dating back more than a century. We encourage
psychologists who use, study, or revise these tests—and other intelligence tests—to be aware of
this lengthy history. For many of these subtests the psychometric literature extends far beyond
the test manuals for the Wechsler and Stanford-Binet tests. Additionally, understanding the
origins of the subtests on modern intelligence tests can also help psychologists appreciate the
References
Barbizet, J., & Cany, E. (1968). Clinical and psychometrical study of a patient with memory
Binet, A. (1911/1916). New investigations upon the measure of the intellectual level among
school children (E. S. Kite, trans.). In A. Binet & T. Simon, The development of
intelligence in children (the Binet-Simon Scale) (pp. 274-329). Baltimore, MD: Williams
& Wilkins.
Binet, A., & Simon, T. (1905a/1916). New methods for the diagnosis of the intellectual level of
subnormals (E. S. Kite, trans.). In A. Binet & T. Simon, The development of intelligence
in children (the Binet-Simon Scale) (pp. 9-36). Baltimore, MD: Williams & Wilkins.
Binet, A., & Simon, T. (1905b/1916). Upon the necessity of establishing a scientific diagnosis of
inferior states of intelligence (E. S. Kite, trans.). In A. Binet & T. Simon, The
Binet, A., & Simon, T. (1908/1916). The development of intelligence in the child (E. S. Kite,
trans.). In A. Binet & T. Simon, The development of intelligence in children (the Binet-
Blair, F. X. (1957). A study of the visual memory of deaf and hearing children. American Annals
Boake, C. (2002). From the Binet-Simon to the Wechsler-Bellevue: Tracing the history of
doi:10.1076/jcen.24.3.383.981
Bonser, F. G. (1910). The reasoning ability of children of the fourth, fifth, and sixth school
grades (Teachers College Contributions to Education, no. 37). New York, NY: Teachers
Brydges, C. R., Reid, C. L., Fox, A. M., & Anderson, M. (2012). A unitary executive function
doi:10.1016/j.intell.2012.05.006
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York,
Cronbach, L. J. (1975). Five decades of public controversy over mental testing. American
Damaye, H. (1903). Essai de diagnostic entre les états de débilités mentales. Paris, France:
Steinheil.
Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and
doi:10.1016/S0022-5371(80)90312-6
388. doi:10.1037/h0073531
de Ribaupierre, A., & Lecerf, T. (2006). Relationships between working memory and
109–137. doi:10.1080/09541440500216127
Frank, G. (1983). The Wechsler enterprise: An assessment of the development, structure, and use
Hall, G. S. (1893). The contents of children’s minds on entering school. New York, NY: E. L.
doi:10.1037/h0075712
Kaplan, R. M., & Saccuzzo, D. P. (2018). Psychological testing: Principles, applications, and
Knox, H. A. (1913). The differentiation between moronism and ignorance. New York Medical
10.1037/h0074466
Linn, R. L. (1986). Educational testing and assessment: Research needs and policy issues.
Lohman, D. F., & Lakin, J. M. (2017). Cognitive Abilities Test (Form 8). Boston, MA:
Houghton-Mifflin Harcourt.
Matarazzo, J. D. (1972). Wechsler’s measurement and appraisal of adult intelligence (5th ed.).
Maurer, D. W. (1947). The argot of the three-shell game. American Speech, 22, 161-170.
doi:10.2307/3181790
Naglieri, J. A. (2007). Traditional IQ: 100 years of misconception and its relationship to minority
with gifted and talented students (pp. 67-88). Waco, TX: Prufrock Press.
Nicolas, S., Andrieu, B., Croizet, J.-C., Sanitioso, R. B., & Burman, J. T. (2013). Sick? Or slow?
doi:10.1016/j.intell.2013.08.006
Norgate, M. (2007). Cutting borders: Dissected maps and the origins of the jigsaw puzzle.
Penrose, L. S., & Raven, J. C. (1936). A new series of perceptual tests: Preliminary
8341.1936.tb00690.x
Peterson, J. (1969). Early conceptions and tests of intelligence. Westport, CT: Greenwood Press,
Porteus, S. D. (1915). Mental tests for feeble-minded: A new series. Journal of Psycho-
Porteus, S. D. (1965). Porteus maze test: Fifty years' application. Palo Alto, CA: Pacific Books.
Pyle, W. H. (1913). The examination of school children: A manual of directions and norms. New
Ramful, A., Lowrie, T., & Logan, T. (2016). Measurement of spatial ability: Construction and
validation of the spatial reasoning instrument for middle school students. Journal of
Raven, J. C. (1939). The R.E.C.I. series of perceptual tests: An experimental survey. British
Richardson, J. T. E. (2005). Knox’s cube imitation test: A historical review and an experimental
Roid, G. H., & Barram, R. A. (2004). Essentials of Stanford-Binet intelligence scales (SB5)
211-214.
Rossolimo, G. (1911b). Die psychologischen Profile. Klinik für psychische und nervöse
Krankheiten, 6, 249-326.
Simons, D. J., Boot, W. R., Charness, N., Gathercole, S. E., Chabris, C. F., Hambrick, D. Z., &
The Special Class Teacher’s Club. (1917). The Boston way: Plans for the development of the
Stone, C. W. (1908). Arithmetical abilities and some factors determining them (Contributions to
education Teachers College series No. 19). New York, NY: Teachers College, Columbia
University.
Sylvester, R. H. (1913). The form board test. Princeton, NJ: Psychological Review Company.
for the use of the standard revision and extension of The Binet-Simon Intelligence Scale.
Terman, L. M. (1924). The mental test as a psychological method. Psychological Review, 31, 93-
117. doi:10.1037/h0070938
Terman, L. M., & Chamberlain, M. B. (1918). Twenty three serial tests of intelligence and their
Terman, L. M., & Merrill, M. A. (1937). Measuring intelligence: A guide to the administration
Thorndike, R. L. (1975). Mr. Binet's test 70 years later. Educational Researcher, 4(5), 3-7.
doi:10.2307/1174855
Warne, R. T. (2019). An evaluation (and vindication?) of Lewis Terman: What the father of
gifted education can teach the 21st century. Gifted Child Quarterly, 63, 3-21.
doi:10.1177/0016986218799433
Warne, R. T., Burton, J. Q., Gibbons, A., & Melendez, D. A. (2019). Stephen Jay Gould’s
analysis of the Army Beta test in The Mismeasure of Man: Distortions and
doi:10.3390/jintelligence7010006
Wechsler, D. (1944). The measurement of adult intelligence (3rd ed.). Baltimore, MD: Williams
& Wilkins.
Wechsler, D. (1991). The Wechsler Intelligence Scale for Children (3rd ed.). San Antonio, TX:
Wechsler, D. (2002). The Wechsler Preschool and Primary Scale of Intelligence (3rd ed.). San
Wechsler, D. (2012). The Wechsler Preschool and Primary Scale of Intelligence (4th ed.). San
Weiss, L. G., Saklofske, D. H., Holdnack, J. A., & Prifitera, A. (2016). WISC-V assessment and
Wolf, T. H. (1973). Alfred Binet. Chicago, IL: The University of Chicago Press.
Wongupparaj, P., Wongupparaj, R., Kumari, V., & Morris, R. G. (2017). The Flynn effect for
Yerkes, R. M. (1921). Psychological examining in the United States army. Washington, DC:
Yoakum, C. S., & Yerkes, R. M. (1920). Army mental tests. New York, NY: Henry Holt and
Company.
Young, K. (1924). The history of mental testing. The Pedagogical Seminary, 31, 1-48.
doi:10.1080/08919402.1924.10532922
Table 1
Subtests on the Current Versions of the Stanford-Binet and Wechsler Tests
Subtest Names (a) SB5 WPPSI-IV WISC-V WAIS-IV
Arithmetic/Verbal and Nonverbal Quantitative Reasoning X X X
Block Design X X X
Block Span X
Cancellation X X X
Coding/Animal Coding X X X
Comprehension X X X
Delayed Response X
Digit Span/Picture Span/Letter-Number Sequencing X X
Early Reasoning X
Figure Weights X X
Form Board and Form Patterns/Visual Puzzles/Object Assembly X X X X
Information X X X
Last Word X
Matrices/Object Series/Matrix Reasoning X X X X
Memory for Sentences X
Picture Absurdities X
Picture Completion X
Picture Concepts X X
Picture Memory/Picture Naming X
Position and Direction X
Procedural Knowledge X
Similarities X X X
Symbol Search/Bug Search X X X
Verbal Analogies X
Verbal Absurdities X
Vocabulary/Receptive Vocabulary X X X X
Zoo Locations X
(a) Some subtests are listed as having multiple names because subtests from different
instruments may have different names but formats that are so similar that we
considered the subtests to be the same.
(b) Subtests are listed alphabetically.
Figure 1. Examples of the “Unfinished Pictures” task, the forerunner of the modern WAIS-IV
Figure 2. Timeline of our proposed candidates for the first known publication of Stanford-Binet 5 and modern Wechsler subtests.