Ed 613938
What is it?
When users talk into an ASR-enabled application, the speech signal is recorded as
an audio file that is first filtered for background noise and then segmented into
phonemes, the smallest sound units in a language: the word ‘push’, for example,
has three phonemes (‘p’, ‘u’, and ‘sh’). Using statistical probability, the ASR
system analyzes the phoneme sequences it ‘recognizes’ and infers the words that
best match those sound strings. The auto-generated text can then be ‘read’ by a
machine to perform other tasks.
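The last step described above, picking the words that best match a phoneme string, can be illustrated with a toy noisy-channel decoder in Python. This is a minimal sketch, not how any particular ASR product works: the three-word lexicon, the word priors, and the 0.9/0.1 match probabilities are all invented for illustration, and real systems score continuous audio with trained acoustic and language models. Phonemes are spelled informally, as in the ‘push’ example.

```python
import math

# Hypothetical toy pronunciation lexicon: each word maps to its phoneme string,
# spelled informally as in the text ('p', 'u', 'sh').
LEXICON: dict[str, tuple[str, ...]] = {
    "push": ("p", "u", "sh"),
    "bush": ("b", "u", "sh"),
    "puss": ("p", "u", "s"),
}

# Hypothetical prior probability of each word (a stand-in for a language model).
WORD_PRIOR: dict[str, float] = {"push": 0.5, "bush": 0.3, "puss": 0.2}

# Hypothetical acoustic scores: how likely the recognizer is to output a heard
# phoneme that matches (or does not match) the intended one.
MATCH_PROB = 0.9
MISMATCH_PROB = 0.1


def word_log_score(heard: tuple[str, ...], word: str) -> float:
    """Log prior of `word` plus the log likelihood of the heard phonemes given it."""
    expected = LEXICON[word]
    if len(expected) != len(heard):
        return float("-inf")  # this toy model ignores insertions and deletions
    score = math.log(WORD_PRIOR[word])
    for h, e in zip(heard, expected):
        score += math.log(MATCH_PROB if h == e else MISMATCH_PROB)
    return score


def decode(heard: tuple[str, ...]) -> str:
    """Return the lexicon word with the highest combined (prior + acoustic) score."""
    return max(LEXICON, key=lambda w: word_log_score(heard, w))


print(decode(("p", "u", "sh")))  # push
print(decode(("b", "u", "sh")))  # bush
print(decode(("p", "u", "x")))   # push: 'x' matches no final phoneme, so the prior breaks the tie with 'puss'
```

Working in log space turns the product of probabilities into a sum, which avoids numerical underflow when longer phoneme strings are scored.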
1. University of St. Thomas, Saint Paul, Minnesota, United States; pere9775@stthomas.edu; https://orcid.org/0000-0001-7543-4506
How to cite: Pérez Castillejo, S. (2021). Automatic speech recognition: can you understand me? In T. Beaven & F. Rosell-Aguilar (Eds), Innovative language pedagogy report (pp. 121-126). Research-publishing.net. https://doi.org/10.14705/rpnet.2021.50.1246
Examples
An emerging ASR application is the use of Virtual Assistants (VA) such as Alexa
or Siri (Istrate, 2019; see also Underwood, this volume). The communicative
functions that VA tasks elicit include issuing commands (“Alexa, play some
music!”) and asking factual questions (“Siri, what is the weather like in Tokyo
today?”). Successfully getting a VA to perform the desired action or provide
the needed information requires not only accurate pronunciation but also some
knowledge of L2 vocabulary and sentence structure, because learners are not
reading or repeating model sentences. If the task involves asking questions
and using the information obtained, learners also practice listening
comprehension.
Susana Pérez Castillejo
Benefits
Using ASR for pronunciation training may encourage learner autonomy: the
immediate feedback provided by the software, in the form of a transcript or an
accuracy score, makes learners more aware of their progress, and the ability to
carry out the exercises without the teacher gives them more control over their
practice.
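The chapter does not say how ASR apps compute their accuracy scores. One widely used way to compare an auto-generated transcript with a target sentence is word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the target into the transcript, divided by the number of target words. The Python sketch below assumes the learner is repeating a known model sentence; the 0-100 `accuracy_score` built on top of WER is a hypothetical scale, not any app's actual formula.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dist[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j transcript words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)


def accuracy_score(target: str, transcript: str) -> int:
    """Hypothetical 0-100 score: 100 * (1 - WER), floored at 0."""
    return max(0, round(100 * (1 - word_error_rate(target, transcript))))


# 'sat' heard as 'sad' (substitution) and the second 'the' dropped (deletion):
# 2 edits over 6 target words gives WER = 1/3.
print(accuracy_score("the cat sat on the mat", "the cat sad on mat"))  # 67
```

Because WER can exceed 1 when the transcript contains many extra words, the score is floored at 0 rather than allowed to go negative.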
Speaking tasks with VAs also increase speaking opportunities beyond the
classroom. VAs are not yet suitable for conversational practice, but producing
the short action-oriented or information-seeking utterances typical of these
tasks is still a good proficiency-building exercise that can prepare learners for
more complex oral discourse. In fact, frequent use of VAs for independent
practice has been linked to significant improvements in L2 speaking proficiency
(Dizon, 2020).
Potential issues
Chapter 19. Automatic speech recognition
Auto-generated transcripts that remain highly accurate with novice learners’
speech will be a welcome grading aid for teachers. Reading is faster than
listening, particularly if the audio file is riddled with the long pauses typical
of low-proficiency speech. While auto-generated fluency scores can indicate
progress on the temporal aspects of speech (frequency and mean duration of
pauses, percentage of speaking time), transcripts can help teachers give faster
feedback on lexical and syntactic accuracy.
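The temporal measures mentioned above can be computed from the start and end times of the stretches of speech in a recording. The Python sketch that follows assumes those timestamps are already available (for example, from a voice-activity detector) and treats any silent gap of at least 0.25 s as a pause; that threshold is a common choice in fluency research but is an assumption here, and tools differ. The function name `fluency_measures` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FluencyMeasures:
    pauses_per_minute: float
    mean_pause_s: float
    speaking_time_pct: float


def fluency_measures(
    segments: list[tuple[float, float]],
    total_duration_s: float,
    min_pause_s: float = 0.25,  # assumed pause threshold, in seconds
) -> FluencyMeasures:
    """Compute temporal fluency measures from speech-segment timestamps.

    `segments` are (start, end) times in seconds, assumed sorted and
    non-overlapping; `total_duration_s` must be positive. Gaps between
    consecutive segments of at least `min_pause_s` count as pauses.
    """
    pauses = [
        next_start - prev_end
        for (_, prev_end), (next_start, _) in zip(segments, segments[1:])
        if next_start - prev_end >= min_pause_s
    ]
    speaking_s = sum(end - start for start, end in segments)
    minutes = total_duration_s / 60
    return FluencyMeasures(
        pauses_per_minute=len(pauses) / minutes,
        mean_pause_s=sum(pauses) / len(pauses) if pauses else 0.0,
        speaking_time_pct=100 * speaking_s / total_duration_s,
    )


# Gaps of 1.5 s, 0.125 s and 2.0 s: only two exceed the threshold.
# 2 pauses in 0.25 min -> 8.0 per minute; mean pause 1.75 s; about 55.8% speaking time.
example = fluency_measures(
    [(0.0, 2.0), (3.5, 5.0), (5.125, 8.0), (10.0, 12.0)], total_duration_s=15.0
)
print(example)
```

Silence before the first and after the last segment is not counted as a pause but still belongs to the total duration, so it lowers the speaking-time percentage.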
References
Resource
For some advice on which ASR apps to try out, see: https://www.techradar.com/news/best-speech-to-text-app
Published by Research-publishing.net, a not-for-profit association
Contact: info@research-publishing.net
Typeset by Research-publishing.net
Cover layout by © 2021 Raphaël Savina (raphael@savina.net)
Photo by Digital Buggu from Pexels (CC0)
Legal deposit, France: Bibliothèque Nationale de France, March 2021.