Abstract
Audio-visual dialogue is an appealing tool for natural interfaces with computers, and lip-reading is an important part of it. In this paper, we propose using a self-organizing feature map (SOM) and a hierarchical SOM, the Hypercolumn model (HCM), as the phoneme feature space construction module of an HMM-based lip-reading system. These SOMs alleviate many of the difficulties associated with feature space construction. On-line systems, however, must reduce the feature extraction time to within normal video camera frame rates. To achieve this, a randomization technique is introduced. Experimental results show the performance of the SOMs for Japanese lip-reading.
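To make the role of a SOM as a feature-space module more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It trains a plain single-layer Kohonen SOM (not the hierarchical HCM) on flattened lip-region vectors and reduces each video frame to the index of its best-matching unit (BMU), which could serve as a discrete observation symbol for an HMM; the BMU's grid coordinates could be used as a continuous feature instead. The abstract does not say what the randomization technique is, so the `sample_size` option here is purely an assumption: the BMU distance is computed over a random subset of input components, which cuts the per-frame search cost roughly in proportion to `sample_size / dim`. All function names, parameters, and the synthetic demo data are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x, dims=None):
    """Return (row, col) of the SOM unit closest to x (squared Euclidean).

    If `dims` is given, the distance uses only those input components
    (a hypothetical randomized approximation of the full search)."""
    if dims is None:
        dims = range(len(x))
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((x[k] - w[k]) ** 2 for k in dims)
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a 2-D Kohonen SOM on a list of equal-length feature vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise each unit's weight vector as a copy of a random sample.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total
            lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood radius
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighbourhood on the map grid around the BMU.
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2.0 * sigma * sigma))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def frames_to_symbols(weights, frames, sample_size=None, seed=0):
    """Map each frame to a discrete symbol: the flattened index of its BMU.

    With `sample_size` set, each frame's BMU search uses a fresh random
    subset of that many input components (assumed speed-up, see lead-in)."""
    rng = random.Random(seed)
    cols = len(weights[0])
    dim = len(frames[0])
    symbols = []
    for x in frames:
        dims = None if sample_size is None else rng.sample(range(dim), sample_size)
        r, c = best_matching_unit(weights, x, dims)
        symbols.append(r * cols + c)
    return symbols


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical stand-ins for flattened lip images: two well-separated clusters.
    closed = [[0.1 + rng.uniform(-0.05, 0.05) for _ in range(16)] for _ in range(20)]
    opened = [[0.9 + rng.uniform(-0.05, 0.05) for _ in range(16)] for _ in range(20)]
    som = train_som(closed + opened, rows=3, cols=3)
    print(frames_to_symbols(som, closed[:3] + opened[:3]))
    print(frames_to_symbols(som, closed[:3] + opened[:3], sample_size=4))
```

The symbol sequence produced per utterance is what a discrete-observation HMM would consume; the HMM itself is not shown. Sampling a fresh subset of components per frame keeps the approximation unbiased across frames, at the cost of occasional BMU mismatches when two units are nearly equidistant.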
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tsuruta, N., Iuchi, H., El Sagheer, A., El Tobely, T. (2003). Self-Organizing Feature Maps for HMM Based Lip-Reading. In: Palade, V., Howlett, R.J., Jain, L. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2003. Lecture Notes in Computer Science, vol 2774. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45226-3_23
DOI: https://doi.org/10.1007/978-3-540-45226-3_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40804-8
Online ISBN: 978-3-540-45226-3
eBook Packages: Springer Book Archive