Speaker Identification and Speech Recognition Using Phased Arrays

  • Chapter
Ambient Intelligence in Everyday Life

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3864)

Abstract

We summarize our research results on an innovative approach to making smart meeting rooms accessible to hands-free users. Specifically, we developed an autodirective system to acquire speech in a noisy room using a microphone array, and to identify the speech from a privileged speaker among others in real time. We successfully established that a commercial speaker-dependent speech recognition product could recognize beamformed speech acquired using our autodirective algorithm. We used the NIST Smart Flow System and the Mk-III microphone array developed by the National Institute of Standards and Technology to conduct our experiments.
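
The abstract does not describe the autodirective algorithm itself. As a rough illustration of what beamforming with a microphone array involves, the following is a minimal sketch of plain delay-and-sum beamforming for a linear array under a far-field assumption. The function names, the array geometry, the sign convention for the steering angle, and the rounding of delays to whole samples are all illustrative assumptions, not the chapter's method or the NIST Smart Flow API.

```python
import math


def steering_lags(mic_positions_m, angle_deg, sample_rate_hz, speed_of_sound=343.0):
    """Per-microphone arrival lags, in whole samples, for a far-field source.

    Assumed convention (hypothetical): microphones lie on the x axis, the
    angle is measured from broadside, and a positive angle means the source
    is toward +x, so microphones with larger x hear the wavefront earlier.
    """
    theta = math.radians(angle_deg)
    arrivals = [-x * math.sin(theta) / speed_of_sound for x in mic_positions_m]
    earliest = min(arrivals)
    # Express each arrival as a non-negative lag relative to the earliest one.
    return [round((t - earliest) * sample_rate_hz) for t in arrivals]


def delay_and_sum(channels, lags):
    """Advance each channel by its lag and average the aligned samples.

    Samples that would fall outside a channel are treated as zero.
    """
    n = len(channels[0])
    output = []
    for i in range(n):
        total = 0.0
        for channel, lag in zip(channels, lags):
            j = i + lag
            if 0 <= j < n:
                total += channel[j]
        output.append(total / len(channels))
    return output


if __name__ == "__main__":
    # Three microphones receive the same unit pulse 0, 2, and 4 samples late.
    true_lags = [0, 2, 4]
    channels = [[1.0 if i == 5 + lag else 0.0 for i in range(16)] for lag in true_lags]
    steered = delay_and_sum(channels, true_lags)
    unsteered = delay_and_sum(channels, [0, 0, 0])
    print("peak when steered at the source:", max(steered))
    print("peak when not steered:", max(unsteered))
```

Steering at the correct direction adds the pulse coherently across channels, while a mismatched direction spreads it over several samples at lower amplitude. A real system would typically need fractional delays and some way to estimate the talker's direction, neither of which this sketch attempts.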

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Xu, R. et al. (2006). Speaker Identification and Speech Recognition Using Phased Arrays. In: Cai, Y., Abascal, J. (eds) Ambient Intelligence in Everyday Life. Lecture Notes in Computer Science, vol 3864. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11825890_11

  • DOI: https://doi.org/10.1007/11825890_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37785-6

  • Online ISBN: 978-3-540-37788-7

  • eBook Packages: Computer Science, Computer Science (R0)
