English mispronunciation detection module using a Transformer network integrated into a chatbot
Keywords:
Mispronunciation detection, Automatic speech recognition, Transformer network
Abstract
Companies need up-to-date information to stay competitive. Applications based on speech recognition allow users to access data stored in databases. However, these applications work properly only when the user's pronunciation is good, a skill that most people lack. This paper proposes the architecture of an English mispronunciation detection module integrated into a chatbot. Users submit audio recordings of the phrases whose pronunciation they want to evaluate, and the module returns the mispronounced words, helping them practice their English pronunciation. The proposed architecture consists of an Automatic Speech Recognition (ASR) model based on a Transformer network, which converts the audio signal to text, and a string alignment algorithm that identifies mispronounced words using the Levenshtein distance. The Transformer network was trained on the LibriSpeech and L2-ARCTIC datasets. The module reached an accuracy of 90% and a Character Error Rate (CER) of 9.5%. Its performance was also evaluated with a group of real users, showing promising results.
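The alignment step can be illustrated with a minimal sketch. It assumes that the phrase the user intended to say is available as text and that the ASR transcript is compared with it word by word; the abstract does not specify the alignment at this level of detail, so the function names (`levenshtein_table`, `mispronounced_words`, `character_error_rate`) and the word-level granularity are illustrative assumptions rather than the module's actual implementation.

```python
def levenshtein_table(ref, hyp):
    """Fill the dynamic-programming table of edit distances between two sequences."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every remaining reference item
    for j in range(cols):
        dist[0][j] = j  # insert every remaining hypothesis item
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist


def mispronounced_words(target, recognized):
    """Return the target words that the transcript substituted or dropped.

    Assumption: a target word that the ASR output does not reproduce
    exactly is treated as mispronounced.
    """
    ref = target.lower().split()
    hyp = recognized.lower().split()
    dist = levenshtein_table(ref, hyp)
    flagged = []
    i, j = len(ref), len(hyp)
    # Walk back from the bottom-right corner to recover one optimal alignment.
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and ref[i - 1] == hyp[j - 1]
                and dist[i][j] == dist[i - 1][j - 1]):
            i, j = i - 1, j - 1          # words match
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            flagged.append(ref[i - 1])   # substitution: word recognized as another
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            flagged.append(ref[i - 1])   # deletion: word missing from transcript
            i -= 1
        else:
            j -= 1                       # insertion: extra word in transcript
    flagged.reverse()
    return flagged


def character_error_rate(target, recognized):
    """Character-level edit distance divided by the target length."""
    ref, hyp = list(target.lower()), list(recognized.lower())
    if not ref:
        raise ValueError("target text must not be empty")
    return levenshtein_table(ref, hyp)[-1][-1] / len(ref)


print(mispronounced_words("the quick brown fox", "the quick brow fox"))  # ['brown']
print(round(character_error_rate("the quick brown fox", "the quick brow fox"), 3))  # 0.053
```

Backtracking through the table, rather than using the final distance alone, is what makes it possible to name the specific words that differ. The same table, applied to characters instead of words, yields the Character Error Rate.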