Speaker Identification and Speech Recognition Using Phased Arrays

Xu, Roger; Mei, Gang; Ren, ZuBing; Kwan, Chiman; Aube, Julien; Rochet, Cedrick; Stanford, Vincent

doi:10.1007/11825890_11

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3864)

Abstract

We summarize our research results on an innovative approach to making smart meeting rooms accessible to hands-free users. Specifically, we developed an autodirective system to acquire speech in a noisy room using a microphone array, and to identify the speech from a privileged speaker among others in real time. We successfully established that a commercial speaker-dependent speech recognition product could recognize beamformed speech acquired using our autodirective algorithm. We used the NIST Smart Flow System and the Mk-III microphone array developed by the National Institute of Standards and Technology to conduct our experiments.
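The abstract does not spell out the autodirective algorithm, so the following is only a minimal sketch of a generic delay-and-sum beamformer with a steered-power bearing search, which is one common way a microphone array can be pointed at a talker. It is not the authors' method. The function names, the linear array geometry, the 16 kHz sample rate, the candidate angles, and the speed-of-sound constant are all assumptions made for illustration.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def steering_delays(mic_positions, angle_deg, sample_rate):
    """Integer sample delays of a far-field plane wave arriving from
    angle_deg (0 = broadside) at each microphone of a linear array.
    Delays are shifted so the earliest microphone has delay 0."""
    sin_a = math.sin(math.radians(angle_deg))
    raw = [x * sin_a / SPEED_OF_SOUND * sample_rate for x in mic_positions]
    earliest = min(raw)
    return [round(r - earliest) for r in raw]


def delay_and_sum(channels, delays):
    """Advance each channel by its delay and average the aligned samples."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]


def steered_power(channels, mic_positions, angle_deg, sample_rate):
    """Mean output power of the beamformer steered to angle_deg."""
    delays = steering_delays(mic_positions, angle_deg, sample_rate)
    out = delay_and_sum(channels, delays)
    return sum(v * v for v in out) / len(out)


def estimate_bearing(channels, mic_positions, sample_rate, candidates):
    """Pick the candidate angle whose steered beam has the most power."""
    return max(
        candidates,
        key=lambda a: steered_power(channels, mic_positions, a, sample_rate),
    )


# Hypothetical setup: 4 microphones spaced 10 cm apart, 16 kHz sampling,
# and a synthetic broadband source arriving from 30 degrees.
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(2000)]
mics = [0.0, 0.1, 0.2, 0.3]
rate = 16000
true_delays = steering_delays(mics, 30.0, rate)
# Each microphone hears the source delayed by its own arrival time.
channels = [[0.0] * d + source for d in true_delays]

estimated = estimate_bearing(channels, mics, rate, [-60, -30, 0, 30, 60])
beamformed = delay_and_sum(channels, steering_delays(mics, estimated, rate))
```

Integer sample delays keep the sketch short. A practical system would typically use fractional delays or frequency-domain steering, and would have to cope with reverberation and competing talkers, none of which is modeled here.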

Author information

Authors and Affiliations

Intelligent Automation, Inc., 7519 Standish Place, Rockville, Maryland, USA
Roger Xu, Gang Mei, ZuBing Ren, Chiman Kwan, Julien Aube & Cedrick Rochet
The National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland, USA
Vincent Stanford

Editor information

Editors and Affiliations

Ambient Intelligence Lab, Carnegie Mellon University, CIC-2218, 4720 Forbes Avenue, 15213, Pittsburgh, PA, USA
Yang Cai
Informatika Fakultatea, University of the Basque Country, Manuel Lardizabal 1, 20018, Donostia, Spain
Julio Abascal

Cite this chapter

Xu, R. et al. (2006). Speaker Identification and Speech Recognition Using Phased Arrays. In: Cai, Y., Abascal, J. (eds) Ambient Intelligence in Everyday Life. Lecture Notes in Computer Science, vol 3864. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11825890_11

DOI: https://doi.org/10.1007/11825890_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37785-6
Online ISBN: 978-3-540-37788-7
eBook Packages: Computer Science, Computer Science (R0)
