A Picture Is Worth A Thousand Words: The Use of Videos in Vocabulary Acquisition
The use of videos in English as a Foreign Language (EFL) classes is becoming increasingly
common, and many scholars have highlighted the value of audiovisual materials for enhancing
the teaching of foreign languages (Canning-Wilson, 2000;
Jones, 2004; Hai-peng & Li-jing, 2007). In the past, video material was very limited and
it was not easy to find useful material that could support the teaching of specific
aspects of language and complement teachers' instruction in the EFL classroom.
However, the 21st century has brought many technological advances.
A wide variety of multimedia material has been developed and designed for use in the
classroom. Within this new spectrum of instructional aids, the Internet has become the core
real-life listening situations, in which images and movement are usually present, in the
form of videos, interactive multimedia material or CD-ROMs. This new tendency to
introduce audiovisual material provides learners with more concrete information and
annotations that make vocabulary processing and acquisition more effective.
A study carried out by Canning-Wilson (2000) on the practical aspects of using videos
in foreign language instruction showed that audiovisual aids enhance the information
that students of foreign languages receive. With video, students receive the message
through two different channels: the oral channel, since information is conveyed through
words, and the visual channel, through the information that the images offer.
Improvements in learning with multimedia resources stem from such dual coding
(dual-channel input). The effectiveness of multimedia in teaching was mainly
attributed to its structural similarity to information processing theory, which
postulates (Treichler, 1967) that people generally remember 10% of what they read,
20% of what they hear, 30% of what they see, and 50% of what they hear and see at the
same time. Dual coding therefore aids the retention of information because processing
takes place through two generally independent channels simultaneously. Furthermore,
it has an additional effect, since the learner creates more cognitive paths that can be
followed to retrieve information. These cognitive paths become a schema (pattern) that
places less load on working memory and thus facilitates understanding (Bagui, 1998).
Canning-Wilson (2000) stated that images create a context for what is being said, and that
gestures and visualized words make students' understanding more successful. Indeed, in
multimedia environments, spoken utterances are backed up by visual aids (Baltova,
1994). Finally, Najjar (1996) suggested that different types of information have to be
coded in different media for people to learn them most effectively.
Multimedia goes beyond dual-channel input, since the perception of the message is
multiplied by the number of ways in which the information is delivered to the
user. So, the more channels involved in the instructional process, the better students learn.
Jones (2003) found that when a written or spoken passage is unclear in the context in
which it is presented, deeper processing of the information may fail to support students'
learning and may lead them to learn words incorrectly.
Thus, multiple-channel support can be crucial in such situations to help students
understand successfully.
At a conference in 2000, Canning-Wilson claimed that, more importantly, video can be
used to help distinguish items on a listening comprehension test, aid recall,
help to sequence events, and be adapted, edited or changed to meet the
needs of the language learner. Thus, visual support helps not only students'
understanding, perception and acquisition of new information, but also its retention and
recall. Therefore, one of the purposes of this article was to analyze whether using
videos affects vocabulary retention and recognition immediately, two weeks and four
weeks after the instruction.
Visual support enhances students' learning, but some characteristics should be taken
into account to make the most of this type of material: materials should contain
descriptive pictures; they should illustrate the target language; distractors should be
avoided; authentic material should be used when available; and, finally, short video
fragments should be used to avoid loss of attention.
So far, much research has been done on L2 instruction with multimedia support in
university contexts, but very little in high schools. This study therefore aims to examine
whether the same results apply to high school students of EFL.
Vocabulary Learning
Applying multimedia technologies in teaching can enhance and facilitate students'
comprehension and, at the same time, improve their vocabulary learning.
For many years, it was believed that vocabulary learning merely involved memorization
and monotonous practice, an individual task students should do on their own.
However, the use of communicative strategies suggests that other factors also take part
in the process of vocabulary acquisition and use. In fact, incidental vocabulary
learning occurs when students acquire vocabulary while reading or listening for
comprehension rather than focusing solely on memorizing lists of words (Jones, 2004,
p. 123). A study by Hai-peng and Li-jing (2007) on vocabulary acquisition in a
multimedia environment showed the benefits of multimedia
support in vocabulary instruction. They found that, thanks to multimedia
resources, "learners respond to multimedia in a complex way and give the feeling of
experiencing information instead of simply acquiring it" (p. 56). That is, students see
the actual use of the language in a real context and in a natural way; besides,
"students feel more fun from multimedia, and learning becomes a happy process" (p.
56). So, multimedia also improves students' motivation towards vocabulary learning.
Jones (2003) showed that dual-channel input was more effective in both quantitative
and qualitative terms. Her quantitative results showed that students with visual input
experienced more incidental learning than those without visual input. When both visual
and verbal annotations were present, students' performance was highest, and the
difference from students with only verbal annotations available was statistically
significant (p. 55). In qualitative terms, students expressed a more supportive view of
the helpfulness of multiple modes, images, and interaction for aural comprehension. They
believed that pictures demanded deeper processing (p. 124). Jones (2004), building on
Kellogg and Howe's (1971) suggestions, showed that foreign words associated with visual
imagery or actual objects were learned more easily than words presented without such
comprehension aids. Regarding the retrieval of the newly learned words, she concluded
that students from the visual and verbal group could select from, and make two or even
three connections between, the verbal, visual, and aural mental representations to help
them build the meaning of those words.
Several relevant themes emerged in that study, including the helpfulness of interaction
with annotations, the supportive nature of multimodal materials for vocabulary
acquisition, and students' beliefs concerning the amount of invested mental effort
(Salomon, 1983) needed to process verbal or visual annotations. Such effort, however,
could be lightened by multimedia instruction: "vocabulary acquisition under
multi-media environment can improve the vocabulary teaching efficiency and extend
students' vocabulary" (Hai-peng & Li-jing, 2007, p. 59).
Research seems to have proved the effectiveness of multimedia materials in vocabulary
acquisition, but it has concentrated on reading activities; a notable exception is Jones'
study, which focused on the effect of multimedia input in listening activities. Our work
follows her research and aims to find out whether vocabulary acquisition is also
improved when multimedia support is used in listening activities.
Vocabulary Recognition
Recognition and recall tests are two different forms of testing that demand separate
processing strategies. Recognition tests usually consist of multiple-choice activities
in which learners have to select or guess the correct answer from the options given,
whereas recall tests demand the production from memory of the correct translation of a
list of given words. Recall is thus more difficult than recognition, because learners must
search for the correct option within their mental representation of the vocabulary item
(Jones, 2004). Recognition is the testing method used in the present study,
because our goal was to analyze passive vocabulary acquisition first; productive
vocabulary would be the next step and may be affected by many more variables than
simply the kind of input.
Many researchers have investigated the use of pictorial and written annotations when
testing either reading or listening comprehension in L2. Research has shown that
students who have access to pictorial and written annotations in written production tests
outperform those who do not have access to pictorial annotations, because the
combination allows for more than one retrieval route to the information in long-term
memory (Jones, 2004).
Jones (2004) also predicted that students with access to pictorial and written
annotations during an L2 listening comprehension activity would "[later] recognize more
written translations and pictorial representations of keywords on written vocabulary
recognition posttests" (p. 133). Although all groups performed equally well on the first
post-test regardless of the kind of annotation (written only, written and pictorial, or none),
overall the group without any kind of support performed worst, because
the difficulty of the aural text prevented students from building enough
contextual knowledge and, therefore, they were less able to understand and learn the
vocabulary. Students who had access to pictorial and written support acquired the
vocabulary items better and more consistently and, additionally, could establish direct
connections between the L1 and L2 vocabulary and the corresponding images. Many
other researchers (Baggett, 1989; Kozma, 1991; Oxford & Crookall, 1990) have also
argued that images carry a structural message that complements the language
presented and that the pictorial mode facilitates vocabulary learning.
Our study therefore aims to contribute to this line of research and to help make
findings in this area more consistent.
Research Questions
In the present study, we analyze how audiovisual materials support vocabulary
instruction in English as a Foreign Language (EFL) classrooms in Spanish high schools. On
the basis of the theoretical background presented above, the following research
questions were formulated:
1. Do students perform better on vocabulary post-tests with visual and audio input
or with audio-only input?
2. Is vocabulary retention in the short, medium and long term enhanced when
audiovisual materials have been used?
3. Are images better than words for vocabulary retrieval?
This research was carried out at Jaso Ikastola, a semi-private school that belongs
to the Euskal Herriko Ikastolen Elkartea (the association of Basque private schools of
the Basque Country), located in the Mendebaldea neighborhood of Iruña. It
was created as a social initiative in 1980 and is a cooperative managed by students'
parents. The school is a public-private partnership, i.e., a private school that
receives public funding. Jaso draws students from the city as well as from
the surrounding areas and currently has around 700 students.
Its main goal is to provide students with a comprehensive, multilingual education.
Students start pre-school at the age of 2 and leave at 16, when they finish
Compulsory Secondary Education. It is a Model D school; therefore, Basque is the
principal (vehicular) language of instruction. However, Jaso follows an innovative and
distinctive curriculum.
Jaso Ikastola, together with other schools within the Euskal Herriko Ikastolen Elkartea,
takes part in a European project to establish a multilingual model of education that
introduces English from the early stages of education, at the age of 4. The project
receives financial support from the European Commission's Socrates Program (Lingua
Action D). For this purpose, original materials have been created and developed in
English, translated and adapted into Dutch, Italian and Irish, and subsequently piloted in
secondary schools in eight European countries.
The school has three different multilingual projects: Euskaraz bizi, Eleanitz English and
Eleanitz Français.
Jaso Ikastola promotes English from the very beginning of education. Students have two
and a half hours of English instruction per week until they reach Primary Education;
in Primary Education they study English 3 hours per week, and in
Secondary Education one more hour per week is added to their timetables. Moreover,
in the 3rd and 4th years of Compulsory Secondary Education, Geography and Social Sciences,
respectively, are taught in English, i.e., English becomes the medium of instruction.
This study was carried out with a group of 16 students in the 2nd year of Compulsory
Secondary Education. Most of them were 13-14 years old and, owing to the
aforementioned multilingual projects developed by Jaso Ikastola, had been studying
English for longer than students in other schools. Thus, students at the
age of 14 in this school had a higher level of English than other students of the same
age. In fact, according to their teachers' judgment, students had an average B1 level of
English on the Common European Framework of Reference. This characteristic
was beneficial for the study because it made it possible to use an authentic piece of news as
one of the instruments.
The class actually had 26 students, but some of them had a resit exam at
the same time and were therefore not included in this experiment. The group was split
into two sub-groups, the experimental group (henceforth group A) and the
control group (henceforth group B). Group A originally consisted of 9 students and
B of 8. However, as one of the students from A could not take the second post-test, her
results have not been taken into consideration. Thus, both groups had the same number
of subjects, 8.
The tools used to carry out the study were a video and an audio fragment, a pre-test and
a post-test.
Firstly, the video clip was chosen: a news piece about banning fast food in the
UK. It was chosen because it related to the topic students were studying in class at the time.
The video included audio, the video images and several written annotations.
Secondly, a pre-test based on the video clip was designed in order to build a corpus of
unknown vocabulary items to be taught. The pre-test consisted of a news text
containing some of the vocabulary that appeared in the video; the words unknown to
every student would be extracted from it in order to be taught and tested afterwards.
Finally, the post-test was designed. It consisted of a box containing the vocabulary items;
students were asked to match each original English word with its translation in some cases
and with an image in others, in order to test whether retrieval was
better promoted by words or by images. Both channels of information (words and
images) were thus used. Extra help was given by indicating the word class (verb [v] or noun
[n]) of each word.
In session one, the pre-test was administered. As students did not underline enough
words to build a corpus, the researcher wrote on the blackboard words from the
text that were considered unknown to the students, and students had to write the
translation or a synonym, or explain the meaning, of those words. In this way, a
corpus of words unknown to all students was created.
In session two, two weeks after the pre-test was administered, the class was
split into two groups: 8 students in group A and 8 in group B. For the treatment, a news video
about doctors calling for a ban on fast food in the UK was used. This video was related
to the unit students were studying, Teenage Health. Group A watched the
video twice, whereas group B listened to the isolated audio from the news piece the
same number of times. Then, the new vocabulary items that appeared in the news piece were
introduced and explained to all students. After the instruction, the class was split again
and each group was played its respective fragment twice more. Finally, students
from both groups were gathered together in the classroom and the post-test was
administered immediately after the treatment. Students took the same post-test
two more times, two weeks and four weeks after the instruction.
Data Analysis
The three post-tests were corrected and analyzed to obtain the data. The first values
obtained were the raw numbers of correct answers of each student in every
post-test. The results were then entered into SPSS and analyzed with a one-way
analysis of variance (ANOVA) to identify statistically significant differences in the data.
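As a rough illustration of what a one-way ANOVA computes, the following Python sketch derives the F ratio by hand. It is a minimal, textbook between-groups version written for this explanation, not the SPSS procedure used in the study; the function name `one_way_anova_f` and the scores are hypothetical, and the sketch returns only F and its degrees of freedom (a p-value would additionally need the F distribution, for example via `scipy.stats.f_oneway`).

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Between-groups one-way ANOVA: return (F, df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-groups sum of squares: how far each group mean lies from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: how far each score lies from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    # F is the ratio of the between-groups and within-groups mean squares
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical toy scores for illustration only (not the study's data)
video_scores = [1, 2, 3]
audio_scores = [4, 5, 6]
print(one_way_anova_f(video_scores, audio_scores))  # prints (13.5, 1, 4)
```

With only two groups, F equals the square of the pooled-variance independent-samples t statistic, which is one way to relate ANOVA output to the t values reported below.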
The results were compared, first, according to the method of instruction, i.e., video
or audio. Retention was examined next by analyzing the results of the three
post-tests, which were compared first all together and then group by group, to see to
what extent the number of vocabulary items retained depended on the method and
which method better supported long-term retention and recognition.
Finally, to answer the third research question, the answers were analyzed according
to the type of annotation used in the recognition tests: images or words.
Results and Discussion
Table 1: General results (Tests 1, 2 and 3)
In general, regardless of the method of instruction used during the sessions, students
obtained better results immediately after the instruction. The longer the lapse
of time between the instruction and the post-test, the fewer words
students recognized. So, in both groups there was a general tendency to forget words as
time went by. Moreover, the differences in word recognition between post-tests were
significant (post-test 1 vs. post-test 2 [t= 5.201, p < .001]; post-test 2 vs. post-test 3 [t= 2.423,
p= .029]; post-test 1 vs. post-test 3 [t= 5.210, p < .001]).
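The post-test comparisons above are reported as t statistics. Assuming they were paired comparisons (the same students sat all three post-tests; the article does not name the exact test), a paired-samples t statistic can be sketched in Python as follows. The function name `paired_t` and the example scores are hypothetical; `scipy.stats.ttest_rel` returns the same statistic together with a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Paired-samples t statistic and degrees of freedom for two score lists
    obtained from the same students (e.g. post-test 1 vs. post-test 2)."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # Mean of the per-student differences divided by its standard error
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical scores for four students (not the study's data)
post_test_1 = [10, 9, 8, 9]
post_test_2 = [7, 7, 6, 6]
t, df = paired_t(post_test_1, post_test_2)
print(round(t, 3), df)  # prints 8.66 3
```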
Regarding the type of instruction used in each group, the study found differences
between the performance of the experimental and the control group (see Figure 2).
Figure 2: Average results by group (Group A, Group B; Tests 1 to 3)
The largest difference between groups A and B was found in the first post-test (9.13
vs. 8.50), although it was not statistically significant. This difference
narrowed after two weeks, as the second post-test showed (6.63 vs. 6.38),
and in the last post-test it disappeared altogether: both groups scored
equally (5.75). Therefore, the answer to the first research question appears to be that
students perform better when visual support is provided, but only immediately after the
instruction. Our results are consistent with dual-coding theory, as the experimental group's
higher scores in the immediate post-test suggest that students in this group retained more
information immediately after the instruction. Students received the information
through three different channels (visual, audio and, in some cases, written) and, besides,
they created a connection among the words, the concepts and the images from the video
fragment. This finding supports previous research and suggests that visual
information simply carries more information than text does and allows for greater
comprehension and retention: "When we examine the recall protocol measures, students
who accessed visual annotations understood and retained their knowledge of the
passage best because the dense/deep quality of images allowed residual memory to
remain" (Jones, 2004, p. 134). Moreover, this superior performance immediately after
multimedia support seems to support the view that retrieving words is more difficult than
inputting them and
annotations, can facilitate this process since the organization and integration of two
different forms of mental representations enhance retrieval performance by providing
multiple retrieval cues (Plass et al., 1998, p. 34).
The second research question inquired about whether the retention rate would be higher
when audio was complemented with visual support during the sessions.
As already stated, in both groups students tended to forget words as time
elapsed, and in group A the decrease was significant between post-tests 1 and 2 (t=
4.396, p= .017) and between post-tests 1 and 3 (t= 5.940, p= .017). However, the comparison
between post-tests 2 and 3 was not significant for this group (t= 2.173, p= .155). The
same statistical pattern was found for group B: there was a statistically significant
difference between the group's results in post-tests 1 and 2 (t= 3.167, p= .002) and 1 and
3 (t= 3.994, p= .001), but not between post-tests 2 and 3 (t= 1.391, p= .095).
Consequently, we must recognize that instruction with audiovisual support seems to be
more effective only in the short term, because both groups'
scores decreased in the delayed post-tests. Furthermore, even though
group A outperformed group B in the immediate and 2-week post-tests, final
scores in the 4-week post-test were equal.
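The statement that the experimental group lost more vocabulary can be checked with simple arithmetic on the group means reported in this section. The Python sketch below (the helper name `losses_from_immediate` is ours, for illustration only) computes the absolute and relative decline from the immediate post-test; because it works only on the published means, it describes the trend but cannot replace the significance tests.

```python
# Group means reported in this section (post-tests 1, 2 and 3)
group_means = {
    "A (audio + video)": [9.13, 6.63, 5.75],
    "B (audio only)": [8.50, 6.38, 5.75],
}


def losses_from_immediate(scores):
    """Absolute and percentage loss from the immediate post-test to each delayed one."""
    immediate = scores[0]
    return [
        (round(immediate - s, 2), round(100 * (immediate - s) / immediate, 1))
        for s in scores[1:]
    ]


for group, scores in group_means.items():
    print(group, losses_from_immediate(scores))
```

On these means, group A loses 3.38 points (about 37%) by the 4-week post-test and group B 2.75 points (about 32%).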
Thus, the results showed that in both cases the vocabulary retained after 2 and 4 weeks
decreased, and students in the audiovisual condition also tended to forget the vocabulary
they had previously learned. In fact, the loss was greater in the experimental group: it
started with higher scores, yet in the 4-week delayed post-test no difference was found
between the two conditions, audio-only input and audio and visual input. It could
therefore be concluded not only that audio and visual input does not contribute more than
audio-only input to the transfer of information to long-term memory, but also that the
loss was bigger than with audio-only input. There are two possible explanations for this.
First, students in the experimental group may have worked in an automatic manner, in which
little conscious effort was utilized and experience with the material was fast and
effortless, and therefore learned less (Cohen, 1987, p. 45). The subjects in our study
may have employed fewer mental resources to understand the piece of news because
images lightened the load of the audio, while the control group would have needed more
mental resources and attention to comprehend the audio-only input. Besides, according
to Jones (2003), students' views of the easy or difficult nature of annotations determine
the amount of mental effort students invest in a given task. The amount of
mental effort students invest in learning is influenced by how they perceive the source
of the information. That is, if they perceive that a given task is difficult, they will use
more mental effort and therefore more nonautomatic energy to process the material. If
they perceive that a given task is easy, they will invest less mental effort and may
potentially learn or retain less as a result (Jones, 2003, pp. 57-58). Cennamo (1993) also
found that learners' perception of television as an easy medium actually interfered with
their ability to learn from it. And Salomon (1984) stated that when learners perceived
television to be easier than print (or, in our case, audio-only material), they invested less
mental effort in learning from television and therefore learned less. In short, the
"rate of forgetting [was] thus a function of the lack of depth and analysis" (Cohen, 1987,
p. 45).
Our results seem to support this view, as subjects in the audiovisual
environment tended to forget more vocabulary items than students in
the audio-only condition.
The second possible explanation for this lack of superior long-term retention, directly
opposed to the first, has also been put forward by previous researchers
on the topic (Jones, 2004; Baggett, 1989), who argued that multimedia
materials sometimes deliver too much information for students to process, so the brain
makes such an effort to process every piece of information that ultimately the information
is not acquired properly. Baggett (1989) also argued for this information-overload view,
stating that:
There are more connections in the memory representation when the input is
visual. Brown leaf presented verbally creates the instance of leaf
connected with the concept brown. But showing a picture of brown leaf
causes one to create the concept of leaf connected with concepts of brown,
olive, rust, burgundy, etc, not to mention its shape, size, environment, etc.
In the verbal presentation there is one sure connection: leaf with brown. (p.
By contrast, the control group students received the same information but
through a single channel. That might have helped them, because the processing effort was
lower, so attention could be more focused; therefore, even though these students
remembered fewer items immediately, those items seemed to be retrieved more
consistently. Consequently, it seems plausible that the vocabulary items they
learned were successfully transferred to long-term memory, as the difference
between the immediate and final post-test results for this group was
smaller (2.75 vs. 3.38 points) than the difference experienced by the experimental group.
Regarding the third question, whether images would be better than words for
retrieving vocabulary, the results showed that, in general, students recognized more
vocabulary items when images were provided than when a translation was provided in
the post-test.
Figure 3: General results by annotation type (Tests 1 to 3)
In post-test 1, students recognized more items with images than with words (4.81 vs. 4).
In post-test 2, the difference widened (4.19 vs. 2.31) and, in the 4-week delayed
post-test, it narrowed slightly but was still prominent (3.69 vs. 2.06).
Moreover, the difference between annotation types was statistically
significant in every post-test (post-test 1 [t= 2.282, p= .038]; post-test 2 [t= 3.529, p=
.003]; post-test 3 [t= 4.333, p= .001]), and most strongly so in the most delayed post-test.
The greater rate of loss in the association between words and translations should also be
noted. As time elapsed, students tended to recognize fewer words when they were
linked to their translation (written annotation) than to an image (visual annotation). In fact,
from post-test 1 to post-test 2, students' retrieval with written annotations fell by more
than 40% (post-test 1 [4], post-test 2 [2.31]), whereas the decrease in image
recognition was much smaller (post-test 1 [4.81], post-test 2 [4.19]).
Within the groups, there was a significant difference in group A between the annotation
types in each of the three post-tests in favor of images (post-test 1 [t= 3.862, p= .006];
post-test 2 [t= 2.762, p= .028]; post-test 3 [t= 3.742, p= .007]). In group B, except in the
first post-test, students performed significantly better with images too (post-test 1 [t=
1.070, p= .320]; post-test 2 [t= 2.517, p= .040]; post-test 3 [t= 2.376, p= .049]).
The underlying reason for these results may be that information supported by
images seems to be retrieved more consistently than information supported by
words. The connection seems to be stronger when students are given a word and an
image of that word than when they are given a word and its translation, because the
mental representation of the word is stronger when it is linked to an image than to a
translation, under both audiovisual and audio-only conditions. This idea was
first put forward by a pioneer in the field, Omaggio (1979), and since then many other
researchers have found positive effects of visual imagery on L2 vocabulary
recognition and learning. Oxford and Crookall (1990) explained that most learners are
capable of associating new information with concepts in memory by means of meaningful
visual images, and that visual images make learning more efficient. (...) Moreover, the
pictorial-verbal combination involves many parts of the brain, thus providing greater
cognitive power (p. 17). The binding of form (unknown L2 vocabulary) to meaning
(visual representations) is the most effective way for learners to acquire concrete ideas
and references. Texts are symbolic representations of information that are processed
sequentially, whereas pictures are analog representations of information that may be
mapped directly onto the mental model. This is because both visual images
and the mental model represent content in the same fashion, i.e., both rely on
analogies (Chun & Plass, 1997). In short, the effectiveness of images for retrieving
the meaning of L2 words may lie in the fact that the image and the mental
representation of the word are coded in the same (analog) format.
Conclusion and Future Research
One limitation of this research may be the number of participants. Due to external
factors such as resit exams and the Easter holidays, it was impossible to reschedule
the sessions in order to obtain a larger sample of students. The research would have
been more complete and exhaustive if a third group had been added
with no annotations at all, only the instruction. This should be considered in future
research.
The analyzed data showed that using audiovisual materials instead of the regular
audio-only listening was more effective for vocabulary retention immediately
after exposure. Nevertheless, the effect of this kind of input was not sustained over time,
since both the experimental and control groups did equally well in the 4-week delayed
post-test.
Vocabulary recognition results with pictorial vs. written annotations seemed to point to
the superiority of images for word recognition. Students from both groups
recognized the original English words better with images than with translations.
These results therefore support dual-coding theory. It seems to be easier
to retain and recognize an item when a connection is made among the word, the aural
mental representation and the L1 word. Our findings also suggest that pictorial
representations of words make it easier to recognize the concept, not only for students
who have received instruction with images but also for students who have received no
such visual input but nevertheless appear to have created their own mental
representations, which may correspond to the pictures in the post-test.
Whether this greater recognition is sustained in the long term when there is only one
type of annotation remains a question for further research. Researchers should explore
whether testing students with only one type of annotation would
produce the same results as when both types of annotations are present in the same test.
Finally, this study has been based on vocabulary recognition activities, but further
investigation could be carried out to see whether these results also
apply to recall activities, which demand the production of responses rather than
the retrieval of knowledge from given images or translations.
