Reading Materials For Learning TOEIC Vocabulary Based On Corpus Data
TANIMURA Midori
UTIYAMA Masao
National Institute of Information and Communications Technology
1. Introduction
It is agreed that presenting words in context rather than isolating them from context
enhances learners’ acquisition of vocabulary. For example, several recent studies have
found positive evidence supporting the use of explicit vocabulary instruction in
conjunction with reading (Paribakht & Wesche 1997). A great deal of vocabulary
learning material which follows this idea has also been provided by publishers or on the
Web. However, the selection of most words is mainly based on material writers’
experience and intuition. Kennedy (1987a, 1987b) points out that corpora are
useful tools for looking critically at how existing language teaching materials, such as
ESL (English as a second language) textbooks, treat ways of expressing quantification
and frequency. Given the recent remarkable progress of
computer technology, it is now possible to select vocabulary statistically and to create
materials objectively on the basis of vast amounts of corpus data.
The learners we principally have in mind are those who have mastered senior
high school level vocabulary and want to continue to increase their vocabulary. We
attempt to bridge the gap between the English vocabulary used in Japanese senior high
school textbooks and that used in international communication through reading.
2. Literature review
What are the factors that should be taken into account when creating materials for
learning vocabulary based on corpus data? We suggest that there are three
major factors to be considered: (1) recycling and attention; (2) words in context; and (3)
word selection by corpus data.
that learners have the opportunity to keep meeting words that they have met before and
these words need to be reinforced by another meeting in limited time (Paribakht &
Wesche 1997). Indeed, there is interesting research done by Rott (1999) about the
number of times a learner needs to be exposed to unknown words to acquire and retain
them. The results indicate that six exposures produced significantly greater vocabulary gains.
Other researchers also have found similar results.
On the other hand, according to some psychologists (Craik & Lockhart, 1972; Craik
& Tulving, 1975), repetition is not an important factor in vocabulary learning; rather, the
attention that is given to an item decides whether it will be remembered or not. Ellis
(1995) and Robinson (1995) also indicate that vocabulary learning requires attention, and
that this attention should be directed to both meaning and form. Furthermore, in Fraser’s study
(1999), retention rates were higher when words were noticed and potentially highly
salient. She notes that this occurs not only with L1 cognates but also with L2 word
associations and frequently encountered words.
depending on which corpus is used. As Nation and Newton (1997) mention, every
field has its own technical vocabulary, which is used within a narrow range and
normally used within a specialized field. In other words, in a particular field it may
occur frequently, but in other fields it may scarcely occur. In fact, Grabe (1991) argues
that in specialized academic settings, infrequent words may be the most important for
L2 learners to know, because they may be closely connected to the topic being
discussed.
The studies above reveal three points: (1) vocabulary should be recycled and it
should obtain learners’ attention; (2) vocabulary should be presented in contexts; and (3)
vocabulary for specific areas should be selected using appropriate corpus data. However,
there do not seem to be any traditional materials which meet all three criteria at the
same time. In this paper, we propose new vocabulary learning materials for the
TOEIC test in the form of reading texts. In order to examine their usefulness and effectiveness, they were
used in class for a year and compared with a vocabulary list book in which new
words and their meanings were isolated from context and which was based mainly on
material writers’ experience.
To assess the validity of our reading texts, the following research questions were
examined:
1. Which material is more effective and efficient to retain the meanings of
vocabulary, texts or word lists?
2. Does general reading ability improve after learning words for the TOEIC
test?
Step 1. Gather the texts (= articles) containing as many new and recycled words
(that are shared with the 642 words) as possible.
Step 2. Sort the texts by the number of new and recycled words they contain.
In the end, 116 texts which together contained the 642 words were compiled as reading material.
Due to the algorithm, the texts selected at the earlier stage of this process had more new
words than the ones selected at the later stage. That is, the number of new words
gradually decreases as learners progress in their study. For example, the first text
included 43 new words that need to be learned for the TOEIC test, while the tenth
text included 16. This means that most words in the TOEIC
vocabulary were covered in the earlier texts. Thus, learners can learn vocabulary
efficiently even if not all reading texts can be covered in the lessons. The reading
material was also designed to contain as many recycled words out of the 642 words as
possible so that learners meet the same words repeatedly through reading the texts. In
this study, 60 texts out of 116 were used as reading texts through the academic year.
The following statistics are thus based on the words in 60 texts.
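The selection procedure described above can be read as a greedy ordering in which the text that adds the most new target words is picked first. The sketch below is one possible reading under that assumption and is not the implementation used in the study; the function name `greedy_order`, the tie-breaking rule (more recycled words, then text identifier) and the toy data are all hypothetical.

```python
def greedy_order(texts, targets):
    """Order texts so that each pick adds the most not-yet-seen target words.

    texts:   dict mapping a text id to an iterable of the words it contains
    targets: iterable of target words (e.g. the TOEIC word list)
    Returns a list of (text id, new target words, recycled target words).
    """
    targets = set(targets)
    # Keep only the target words found in each text.
    pool = {tid: set(words) & targets for tid, words in texts.items()}
    seen = set()
    plan = []
    while pool and seen != targets:
        # Assumed ranking: most new words first, then most recycled words;
        # remaining ties go to the alphabetically first text id.
        best = max(sorted(pool),
                   key=lambda t: (len(pool[t] - seen), len(pool[t] & seen)))
        new = pool[best] - seen
        if not new:
            break  # no remaining text adds a new target word
        plan.append((best, sorted(new), sorted(pool[best] & seen)))
        seen |= new
        del pool[best]
    return plan


if __name__ == "__main__":
    toy_texts = {
        "a": ["bank", "cost", "firm"],
        "b": ["bank", "price"],
        "c": ["cost"],
    }
    for text_id, new, recycled in greedy_order(toy_texts, {"bank", "cost", "firm", "price"}):
        print(text_id, "new:", new, "recycled:", recycled)
```

With such an ordering the number of new words per text cannot increase from one pick to the next, which matches the decreasing pattern reported for the first and tenth texts.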
The following is a sample of an English text. The third text is chosen here because
the first and second texts do not include recycled words. In the original text of the
English version, three different colors were used to distinguish new words and two
kinds of recycled words. New words were colored in red, recycled words which
occurred in previous texts were colored in blue and recycled words which occurred in
the same text were colored in orange. In the following example, new words are
underlined, recycled words in previous texts are in italics, and recycled words in the
same text are boxed.
It is assumed that those colored words work as markers to make learners conscious of
them and get their attention.
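The three-way marking of target words can be sketched as a small tagging routine. This is only an illustration: the function name `tag_words`, the simple regular-expression tokenizer, and the decision to keep marking a word as "previous" when it was met in an earlier text and then repeats within the current text are assumptions, not details given in the study.

```python
import re

# Colors as described for the original English version of the texts.
COLORS = {"new": "red", "previous": "blue", "same": "orange"}


def tag_words(text, targets, seen_before):
    """Return (token, color) pairs for the target words found in a text.

    targets:     set of lowercase target words
    seen_before: set of target words that occurred in previous texts
    """
    seen_here = set()
    tagged = []
    for token in re.findall(r"[A-Za-z]+", text):
        word = token.lower()
        if word not in targets:
            continue  # non-target words are left unmarked
        if word in seen_before:
            label = "previous"   # recycled from an earlier text
        elif word in seen_here:
            label = "same"       # recycled within the same text
        else:
            label = "new"        # first meeting of this word
            seen_here.add(word)
        tagged.append((token, COLORS[label]))
    return tagged


if __name__ == "__main__":
    sample = "The bank said the company and the bank agree on cost."
    print(tag_words(sample, {"bank", "company", "cost"}, {"company"}))
```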
Table 1 and Table 2 present how the target (TOEIC) words were recycled in the
reading material. Table 1 shows the high token frequency words. For example,
“company,” the top-ranking word, occurs 50 times throughout the texts. A word may
occur in the same text more than once. The word “according” in rank 4 is often used
as part of “according to,” but phrases are not considered in this study.
Table 2 Number of Texts Where Each Type Occurs (the Top 20)
Ranking  Words          Number of Occurrences
1        official       31
2        company        21
3        expect         14
3        according      14
5        number         12
6        employee       11
6        receive        11
6        service        11
9        million        10
9        area           10
9        offer          10
12       current        9
12       provide        9
12       bank           9
15       firm           8
15       concern        8
15       agency         8
15       cost           8
15       organization   8
15       industry       8
15       price          8
15       fiscal         8
As shown in Tables 1 and 2, target words are recycled, and consequently learners are
likely to be exposed to them often enough to acquire the target vocabulary efficiently.
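The difference between the two counts, token frequency across all texts and the number of texts in which a word occurs, can be illustrated with a short sketch. The function name `recycling_counts`, the lowercase regular-expression tokenizer and the absence of lemmatization (so that "officials" would not count as "official") are assumptions made for illustration only.

```python
import re
from collections import Counter


def recycling_counts(texts, targets):
    """Count target words by total tokens and by number of texts.

    texts:   list of text bodies (strings)
    targets: iterable of target words
    Returns (token_freq, text_freq) as two Counter objects.
    """
    targets = {w.lower() for w in targets}
    token_freq = Counter()  # every occurrence counts
    text_freq = Counter()   # each text counts at most once per word
    for body in texts:
        tokens = [t for t in re.findall(r"[a-z]+", body.lower()) if t in targets]
        token_freq.update(tokens)
        text_freq.update(set(tokens))
    return token_freq, text_freq


if __name__ == "__main__":
    toy = [
        "The company said the company will cut cost.",
        "Officials expect the company to grow.",
    ]
    tokens, in_texts = recycling_counts(toy, {"company", "cost", "expect"})
    print(tokens.most_common())
    print(in_texts.most_common())
```

A word that is repeated inside one text raises its token frequency but not its text count, which is why the two rankings can differ.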
4. Procedure
4.1. Participants
Ninety-nine Japanese learners of English aged between 18 and 20
participated in this study. There were three groups of participants, classes A, B and C,
who were placed according to their entrance exam scores. They were first-year
students majoring in science, marketing, architecture and engineering at a university.
At this university, first-year students are required to take two English classes taught by the same
Japanese instructors and one English conversation class taught by foreign teachers every week.
A total of 60 lessons across the first and second semesters taught by a Japanese
teacher were targeted for this study.
4.2. Instruction
An instruction sheet was delivered to the learners. To access the reading texts on
the Web, learners were supplied with the URL, a username and a password. Learners
were told that the lessons using these reading texts were part of reading materials
development for vocabulary learning. Then learners were instructed on how to follow
the Web page.
The following is part of the page where the reading texts are listed. Learners started from text 1,
“Streamlining to cost NTT over 1.4 tril. yen.” This text contains 296 words and was
issued on November 9th 2001. It contains 43 words shared with the TOEIC vocabulary.
All of these words were new, so there were no recycled words. Printed materials were
also prepared for the learners who had difficulty accessing a computer.
1.Streamlining to cost NTT over 1.4 tril. yen (296 words - 2001/11/09)
new: account agreement allowance amount approve bank committee company compensation
convention corporate cost current demand due earnings employee expect expense expensive
old:
4.3. Treatment
The treatment in this study was carried out from May 2004 to January 2005. Two
kinds of materials were used as supplements for learning vocabulary for the TOEIC test:
one was our reading texts and the other was a commercially available vocabulary list book
which learners were required to purchase and study in each lesson. In this vocabulary
list, each word with a pronunciation guide followed by its meanings is presented on the
left-hand side, and some specific words are accompanied by typical usage with the
Japanese translation on the right-hand side. This vocabulary list book, however, does
not present example sentences of all the words.
Learners were required to study both vocabulary learning materials as homework,
and they were tested in each lesson. During the summer holidays, the learners were
also required to study both sets of vocabulary. Through the year, 60 out of the 116
reading texts were covered, while all words in the vocabulary list book were covered.
The reason not all the words in the reading texts were covered was the logistics of the
lesson schedule, but 559 words out of 642 were covered, which amounts to 87% of the
important words.
4.4. Measure
Two kinds of tests were conducted: (a) a vocabulary retention test; and (b) reading
comprehension tests following the TOEIC format, which consisted of 40 multiple-choice
questions. The retention test (see 4.5 test designs) was done to examine which vocabulary
learning material was more effective for recall. Learners were asked to write
Japanese translations of the words listed on the test sheet. Since many
words have multiple senses and belong to different parts of speech (e.g. nouns, adjectives and
verbs), learners were instructed to write as many meanings as possible. If one of the
meanings a learner listed was compatible with the one in the reading texts or the
vocabulary list book, a point was given. The reading part of the TOEIC test was used
to assess learners’ initial and final reading levels on the first and the last day of the
semester. At the end of the semester a questionnaire was administered to examine what
the students thought of the reading texts and the Web system they actually used for
self-study.
typed and stored as a text file. Next, each word in the vocabulary list book and in the
reading texts was given its ranking in JACET 8000. It should be noted that the words in
JACET 8000 are categorized by part of speech and thus the same word can occur several
times with different rankings. As Table 3 shows, it turned out that only the words
“upward” and “forward” occurred more than once.
In this study, the higher ranking was adopted: 3902 for “upward” and 642 for “forward”.
The original ranking was used for the other words, since they occur only once. All the
words in the reading texts and the vocabulary list book were then combined, and finally
three types of combined word rank lists were made: (1) words which occur in the
reading texts but not in the vocabulary list book; (2) words which do not occur in the
reading texts but occur in the vocabulary list book; and (3) words which occur in both
materials. The left column of Table 4 shows the total number of words which occur in
each type. The right column shows the number of the words which do not occur in
JACET 8000. In order to select the words for the retention test, those words which do not occur
in JACET 8000 were excluded.
Twenty words of each type, (1), (2) and (3), and thus a total of sixty words, were selected
for the retention test. To keep the level of difficulty comparable across the types, the ranking distribution in
JACET 8000 was examined, as shown in Figures 1-3.
[Figures 1-3: distribution of JACET 8000 rankings (horizontal axis, 0 to 8000) against number of words (vertical axis) for each word type. Recovered panel titles: “Texts ○ - Lists ○” and “Texts ○ - Lists ×”.]
In JACET 8000, ranking 1 means the most frequent word and ranking 8000 means
the least frequent. While Figures 1 and 2 show that more than 10 words occur at the 5000 word
level, Figure 3 shows that most words are distributed from the 0 to the 3000 word level and not
many words occur above the 4000 word level. Therefore, five words were selected
from each level, from the 0 word level (from 1 to 999) to the 3000 word level (from 3000 to 3999), for
each type (see appendix).
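The test design described in this section (one ranking per word, three word types, words outside JACET 8000 excluded, five words per 1000-word level) can be sketched as follows. The function names, the random choice within each level with a fixed seed, and the toy rankings other than 642 for “forward” are assumptions for illustration; the paper does not state how the five words in each level were picked.

```python
import random


def best_ranks(entries):
    """Collapse (word, rank) entries to one rank per word.

    The numerically lowest rank, i.e. the more frequent one, is kept.
    """
    ranks = {}
    for word, rank in entries:
        ranks[word] = min(rank, ranks.get(word, rank))
    return ranks


def build_test_items(text_words, list_words, ranks,
                     per_level=5, levels=(0, 1, 2, 3), seed=0):
    """Pick test words for types (1), (2) and (3), level by level."""
    text_words, list_words = set(text_words), set(list_words)
    types = {
        1: text_words - list_words,  # in the reading texts only
        2: list_words - text_words,  # in the vocabulary list book only
        3: text_words & list_words,  # in both materials
    }
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    items = {}
    for type_id, words in types.items():
        ranked = sorted(w for w in words if w in ranks)  # drop unranked words
        chosen = []
        for level in levels:
            # Level 0 covers ranks 1-999, level 1 covers 1000-1999, and so on.
            in_level = [w for w in ranked if ranks[w] // 1000 == level]
            chosen.extend(rng.sample(in_level, min(per_level, len(in_level))))
        items[type_id] = chosen
    return items


if __name__ == "__main__":
    ranks = best_ranks([("forward", 642), ("forward", 2500), ("deal", 300),
                        ("public", 150), ("figure", 200)])
    print(build_test_items({"deal", "figure", "forward"}, {"public", "figure"},
                           ranks, per_level=1, levels=(0,)))
```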
It was predicted that the words in type (3) would be retained best, those in type (1)
less well and those in type (2) least well, because words in type (3) were the most
recycled and words in type (2) were the least recycled.
5. Results
5.1. Results of research questions
Research question 1: Which material is more effective and efficient to retain the
meanings of vocabulary, texts or word lists?
Table 6 presents the means and standard deviations for the retention test scores
according to the three types. The average score of type (1) is the highest and
that of type (2) is the lowest.
The results in Table 8 can be summarized as follows: type (1) > type (2); type (1) >
type (3); and type (3) > type (2). As might be expected, learners’ scores in type (1)
were higher than in type (2) and the scores in type (3) were higher than in type (2).
Words in types (1) and (3) are recycled, while the words in type (2) occur only in the
vocabulary list book. Thus
these results indicate that learning recycled words in reading texts is a more effective
way to remember vocabulary than learning from a vocabulary list book. In other
words, the reading texts make it possible for the learners to pay more attention to
the target words which they need to learn, and the contexts help the learners to
understand the meaning more easily.
In contrast, the score in type (1) is higher than that in type (3). There is a significant
difference between type (1) and type (3), which was not expected. Looking at the
average scores in Table 7, however, the difference between the two types was 1.2, and
thus it is not as big as the difference between types (1) and (2) or types (2) and (3). Even
though the level of difficulty was carefully matched when selecting words for the test,
learners may have already known some of the words. Overall, this result indicates that texts are
more effective and efficient for retaining the meanings of vocabulary than word lists are.
Research question 2: Does general reading ability improve after learning words for
the TOEIC test?
Table 9 presents the means and standard deviations for the pre- and post-tests of a
TOEIC reading section. A t-test shows that the pre- and post-test scores are significantly different
(t = -2.37, df = 76, p = 0.02).
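For readers who want to see how such a statistic is obtained, the sketch below computes a paired t statistic and its degrees of freedom from pre- and post-test scores. That the reported test was a paired one is an assumption here (df = 76 would then correspond to 77 learners), and the scores in the example are invented toy values, not data from the study. Differences are taken as pre minus post, so an improvement gives a negative t.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t, df) for a paired t-test on two equal-length score lists."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    diffs = [a - b for a, b in zip(pre, post)]  # pre minus post
    n = len(diffs)
    # t is the mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    toy_pre = [10, 12, 14, 16]    # hypothetical scores
    toy_post = [12, 13, 17, 18]   # hypothetical scores
    t, df = paired_t(toy_pre, toy_post)
    print(f"t = {t:.2f}, df = {df}")
```

The p value would then be read from the t distribution with the given degrees of freedom, which is left out of this sketch.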
Whereas there is a significant difference between the pre- and post-tests, the difference
between the scores is small. Compared to the development of vocabulary skills,
reading skills show a low rate of increase. Thus it would be difficult to
conclude that the development of vocabulary skills correlates with reading skills, although
the result indicates that vocabulary learning would have a positive effect on general
reading ability. It should be noted that there are still many words that the students need
to know in order to understand the reading texts. Moreover, reading involves different
types of knowledge, cognitive and linguistic skills and strategies. Vocabulary is
Question                                                                       Scale                          Average score
(1) Do you think it was good to have a vocabulary check test in each class?    very good - not good           4.1
(2) Do you think your vocabulary size for TOEIC increased?                     increased - not increased      3.4
(3) Is reading texts more fun than previous learning styles?                   very fun - not fun             3
(5) Do you think reading texts is easy and appropriate?                        appropriate - not appropriate  3.4
(6) Do you want to use reading texts next year?                                very much - not very much      3.5
(7) Do you think there are better materials than reading texts?                many - not many                3.5
The average scores of the questions were between 3 and 4. Question
(1) had the highest score, 4.1; a substantial majority felt that learning vocabulary
was unavoidable and that they needed to be tested to remember vocabulary. The
second highest score was 3.7, for question (4). Learners felt that these reading texts
were fairly useful, but at the same time question (3) had the lowest score,
suggesting that learning with these reading texts might not be enjoyable. In terms
of the necessity of increasing vocabulary size, there is no doubt that learners felt they
needed to learn vocabulary and that the materials had the capacity to arouse their motivation.
But in terms of reading for pleasure, it is not surprising that the learners found the
reading difficult, because the texts used here were newspaper articles which included
TOEIC vocabulary, and their topics were not very familiar. The results of
questions (6) and (7) showed some discrepancy. The figures suggest that whereas
learners wanted to use these texts next year, they also felt that there must be better
materials.
The learners were also asked to answer the following questions from (8) to (10) and
give their reasons and comments.
The figures in (8) and (9) showed that the learners studied about 30
of the 60 texts seriously and spent about 13 minutes reading each text. This result suggests
that they focused especially on the colored words rather than on all the
sentences, because it would be difficult to understand the whole passage in such a short time.
The answers to (10) showed that three times as many learners chose (b) reading texts
as chose (a) vocabulary list book, but more than half the learners chose (c) difficult
to answer. The main comments by the learners who chose reading texts and those who
chose vocabulary list book are shown in Table 13.
In a practical sense there may be pros and cons to both materials. Comments
indicate that learners who chose (a) vocabulary list book tend to use it as a dictionary.
On the other hand, learners who chose (b) reading texts tend to use them as vocabulary
learning aids and for improving general English proficiency. Learners who chose (c)
difficult to answer seem to use the materials flexibly depending on their purposes.
One of the learners who chose (c) difficult to answer commented that both materials are
helpful for learning vocabulary.
The intended readers of these reading materials are people who have learned a basic
level of high frequency words and have already acquired reading skills. For learners
who have just started learning English, word lists may be a useful tool for learning new
words.
6. Conclusion
This research investigated the effect of corpus-based reading texts on learners’
vocabulary development for the TOEIC test. The approach we used was to recycle and
reinforce frequent vocabulary in context to get learners’ attention. We showed the
syllabus design, that is, how we used the reading texts in and outside of the classroom. We
also presented the test design, that is, how we selected the words for the vocabulary retention
test. The results of the pre/post reading tests suggested that vocabulary learning has a positive effect on
general reading comprehension. The results of the retention test showed that the reading
texts are more efficient and effective for remembering vocabulary than a
vocabulary list book. The results of the questionnaires showed that the reading texts
promote a positive attitude toward reading even though they are not very easy. The
challenges that teachers and learners may encounter when using the reading texts and
how they are dealt with were also addressed.
This way of creating vocabulary learning materials and presenting them on the Web
can be applied to any kind of specialized field, such as medicine,
engineering, architecture and chemistry, and it would contribute to making learners’
vocabulary learning more efficient. Considering the burden that learners feel when
learning English, this way of creating materials based on statistical and objective corpus
data is absolutely necessary, and it is the role of teachers and material writers to select
vocabulary and reduce the learning burden which many learners have.
Our vocabulary learning Web site, VOCABRIDGE, is now available on the internet.
References
Chujo, K. (2003). Eigo shokyuushamuke TOEIC goi 1 & 2 no sentei to sono kouka
[Selecting “TOEIC vocabulary 1 & 2” for beginning level students and measuring its
effect on a sample TOEIC test]. Journal of the College of Industrial Technology, Nihon
University, 36, 27-42.
Chujo, K., Ushida, A., Yamazaki, M., Genung, A., Uchibori, A., & Nishigaki, C.
(2004). Bijuaru beishikku niyoru TOEIC-yoo goiryoku yoosei sofutowuea no shisaku
(3) [The development of English CD-ROM material to teach vocabulary for the TOEIC
test (utilizing Visual Basic): Part 3]. Journal of the College of Industrial Technology,
Nihon University, 37, 29-43.
Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A framework for
memory research. Journal of Verbal Learning and Verbal Behavior, 11, 671-684.
Craik, F. I. M., & Tulving, E. (1975). Depth of processing and the retention of words
in episodic memory. Journal of Experimental Psychology, 104, 268-284.
Ellis, R. (1995). Modified oral input and the acquisition of word meanings. Applied
Linguistics, 16, 409-441.
Fanselow, J. (1992). Try the opposite. Tokyo: Simul Press.
Fraser, C. A. (1999). Lexical processing strategy use and vocabulary learning
through reading. Studies in Second Language Acquisition, 21, 225-241.
Gardner, D. (2004). Vocabulary input through extensive reading: A comparison of
words found in children’s narrative and expository reading materials. Applied
Linguistics, 25, 1-37.
Grabe, W. (1991). Current developments in second language reading research.
TESOL Quarterly, 25, 375-406.
Kennedy, G. (1987a). Expressing temporal frequency in academic English. TESOL
Quarterly, 21, 69-86.
Kennedy, G. (1987b). Quantification and the use of English: A case study of one
aspect of the learner's task. Applied Linguistics, 8, 264-286.
Laufer, B. (1997). The lexical plight in second language reading: Words you don’t
know, words you think you know, words you can’t guess. In J. Coady and T. Huckin
(Eds.), Second language vocabulary acquisition (pp. 20-34). Cambridge: Cambridge
University Press.
Laufer, B., & Nation, I. S. P. (1995). Vocabulary size and use: Lexical richness in L2
written production. Applied Linguistics, 16, 307-322.
Nagy, W. (1997). On the role of context in first- and second-language vocabulary
learning. In N. Schmitt and M. McCarthy (Eds.), Vocabulary: Description, acquisition
Appendix
JACET Rank  Words          Lists  Texts
0           DEAL           ×      ○
0           AHEAD          ×      ○
0           TOTAL          ×      ○
0           DIRECTOR       ×      ○
0           GUIDE          ×      ○
1000        SECURITY       ×      ○
1000        QUARTER        ×      ○
1000        NEARBY         ×      ○
1000        CHAIN          ×      ○
1000        AIRPORT        ×      ○
2000        SPECIALIST     ×      ○
2000        USER           ×      ○
2000        PRIORITY       ×      ○
2000        CONSULTANT     ×      ○
2000        WEEKLY         ×      ○
3000        REQUIREMENT    ×      ○
3000        CONVERT        ×      ○
3000        ALLOWANCE      ×      ○
3000        INADEQUATE     ×      ○
3000        ANALYST        ×      ○
0           PUBLIC         ○      ×
0           DESIGN         ○      ×
0           COMMUNITY      ○      ×
0           SPREAD         ○      ×
0           FREEDOM        ○      ×
1000        BREATH         ○      ×
1000        DESPITE        ○      ×
1000        IMPRESSION     ○      ×
1000        PERSUADE       ○      ×
1000        DISTANT        ○      ×
2000        AFTERWARD      ○      ×
2000        COMPOSE        ○      ×
2000        EMPHASIZE      ○      ×
2000        LITERALLY      ○      ×
2000        AWKWARD        ○      ×
3000        ABUSE          ○      ×
3000        EXPENDITURE    ○      ×
3000        ACCUSE         ○      ×
3000        GENUINE        ○      ×
3000        COMPROMISE     ○      ×
0           FIGURE         ○      ○
0           INDIVIDUAL     ○      ○
0           EXPRESS        ○      ○
0           CROWD          ○      ○
0           OPERATION      ○      ○
1000        CHARGE         ○      ○
1000        RESPONSIBLE    ○      ○
1000        EXPORT         ○      ○
1000        ORGANIZE       ○      ○
1000        OBTAIN         ○      ○
2000        TRANSFER       ○      ○
2000        APPLICATION    ○      ○
2000        URBAN          ○      ○
2000        APPROVE        ○      ○
2000        PARTICIPATE    ○      ○
3000        PROCEDURE      ○      ○
3000        REASONABLE     ○      ○
3000        WITHDRAW       ○      ○
3000        APPROXIMATELY  ○      ○
3000        ANTICIPATE     ○      ○