This study assessed the accuracy of Automatic Speech Recognition (ASR), a technology that enables a computer to identify words or sounds spoken into a microphone or telephone. Linguistic interest in phonemic variation in second language acquisition, combined with computer science questions about speech recognition, motivated this research. Native English speech and non-native English speech from citizens of the People’s Republic of China comprised the data used in the study. The Hidden Markov Model Toolkit (HTK) was used to train two Hidden Markov Models (HMMs) of speech, one for each speech group. Testing data from both groups were then evaluated by both models. Although the speech recognition models had a 75% success rate in correctly assessing the individual frames of native and non-native English utterances, they failed to reliably recognize complete utterances of native English speech. This can be accounted for by the miscalculation of utterance length, a crucial step in the HTK. Further investigation would need relevant, screened data to improve the accuracy of the ASR program.
This work focuses on the development of an Assamese online character recognition system using HMM and SVM, and compares the recognition performance of the two models. Recognition models are generated with HTK (HMM Toolkit) and LIBSVM (SVM Toolkit) by training on 181 different Assamese strokes. Stroke-level and Akshara-level testing are performed separately. In stroke-level testing, the confusion patterns of the test strokes from the HMM and SVM classifiers are compared. In Akshara-level testing, a GUI (provided by CDAC-Pune) is used; it is integrated with the HTK/LIBSVM binaries and with language rules, which store the sets of valid strokes that make up a character. Manual testing is done with native writers to measure the Akshara-level performance of both models. Experimental results show that the SVM classifier outperforms the HMM classifier.
Chapter 14 describes the methods of automatic speech recognition. Consecutive parts of the chapter are devoted, inter alia, to methods of speech feature extraction, the dynamic time warping (DTW) method, and speech recognition using hidden Markov models (HMM). The content described in the chapter is supported by programs written in MATLAB which are available for download on the website. (book in Polish)
This paper describes preliminary research results on Polish speech recognition using the DTW method aided by HMM. The HMM method is used as a preclassifier that searches for the n best hypotheses of the recognized words, while DTW serves as a precise final classifier. Results of experiments on large-vocabulary isolated word recognition show a slight improvement in recognition accuracy compared with the DTW and HMM methods used alone.
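The two-stage idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the HMM preclassifier is not modelled and is assumed to have already produced the n-best word list, and the names `dtw_distance`, `rescore_with_dtw`, and `templates` are hypothetical. Feature vectors are plain lists of floats compared with Euclidean distance, which is also an assumption.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b: a advances, b stays
                cost[i][j - 1],      # stretch a: b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def rescore_with_dtw(utterance, n_best, templates):
    """Pick the final word among the n-best candidates (assumed to come
    from an HMM preclassifier) by smallest DTW cost to a stored template."""
    return min(n_best, key=lambda word: dtw_distance(utterance, templates[word]))


# Hypothetical one-dimensional templates, for illustration only.
templates = {
    "tak": [[0.0], [1.0], [2.0]],
    "nie": [[5.0], [5.0], [5.0]],
}
utterance = [[0.0], [0.0], [1.0], [2.0]]
best = rescore_with_dtw(utterance, ["tak", "nie"], templates)
```

Restricting DTW to the n-best list keeps the expensive template comparison to a handful of candidates instead of the whole vocabulary, which is presumably why the preclassifier is useful for large vocabularies.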
Preliminary research results on choosing optimal parameter values for speech recognition using Hidden Markov Models (HMMs) are presented. The influence of the tree-based clustering log-likelihood threshold Pq_min, the sampling frequency, the kind of utterance, and speaker sex on recognition was tested. Experiments were carried out on a closed set of 365 isolated utterances taken from the Polish speech database CORPORA. The HTK software was used in the experiments. The results show that the optimal recognition accuracy depends on the tested speech recognition parameters, mainly on the category of utterance. Optimal values of the parameter Pq_min and of the sampling frequency were determined with weighted cepstral coefficients used as features. The optimal sampling frequency depends on speaker sex.
This work describes the construction of PADAS, “Phonetics Arabic Database Automatically segmented”, based on a data-driven Markov process. A segmented database is necessary in speech synthesis and speech recognition. Manual segmentation is precise but inconsistent, since it is often produced by more than one labeler, and it requires time and money. MAUS segmentation and labeling exist for German and other languages but not for Arabic, so MAUS must be modified to build a segmental database for Arabic. The speech corpus contains a total of 600 sentences recorded by 3 native Arabic speakers from Tunisia (2 male and 1 female), 200 sentences each.