Abstract
We summarize our research results on an innovative approach to making smart meeting rooms accessible to hands-free users. Specifically, we developed an autodirective system to acquire speech in a noisy room using a microphone array, and to identify the speech from a privileged speaker among others in real time. We successfully established that a commercial speaker-dependent speech recognition product could recognize beamformed speech acquired using our autodirective algorithm. We used the NIST Smart Flow System and the Mk-III microphone array developed by the National Institute of Standards and Technology to conduct our experiments.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Xu, R. et al. (2006). Speaker Identification and Speech Recognition Using Phased Arrays. In: Cai, Y., Abascal, J. (eds) Ambient Intelligence in Everyday Life. Lecture Notes in Computer Science, vol 3864. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11825890_11
DOI: https://doi.org/10.1007/11825890_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37785-6
Online ISBN: 978-3-540-37788-7
eBook Packages: Computer Science, Computer Science (R0)