Abstract
We describe the latest version of the SRI-ICSI meeting and lecture recognition system, as was used in the NIST RT-07 evaluations, highlighting improvements made over the last year. Changes in the acoustic preprocessing include updated beamforming software for processing of multiple distant microphones, and various adjustments to the speech segmenter for close-talking microphones. Acoustic models were improved by the combined use of neural-net-estimated phone posterior features, discriminative feature transforms trained with fMPE-MAP, and discriminative Gaussian estimation using MPE-MAP, as well as model adaptation specifically to nonnative and non-American speakers. The net effect of these enhancements was a 14-16% relative error reduction on distant microphones, and a 16-17% error reduction on close-talking microphones. Also, for the first time, we report results on a new “coffee break” meeting genre, and on a new NIST metric designed to evaluate combined speech diarization and recognition.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Stolcke, A., Wooters, C., Mirghafori, N., Pirinen, T., Bulyko, I., Gelbart, D., Graciarena, M., Otterson, S., Peskin, B., Ostendorf, M.: Progress in meeting recognition: The ICSI-SRI-UW Spring 2004 evaluation system. In: Proceedings NIST ICASSP 2004 Meeting Recognition Workshop, Montreal, National Institute of Standards and Technology (2004)
Stolcke, A., Anguera, X., Boakye, K., Çetin, Ö., Grézl, F., Janin, A., Mandal, A., Peskin, B., Wooters, C., Zheng, J.: Further progress in meeting recognition: The ICSI-SRI Spring 2005 speech-to-text evaluation system. In: Proceedings of the Rich Transcription 2005 Spring Meeting Recognition Evaluation, Edinburgh, National Institute of Standards and Technology, pp. 39–50 (2005)
Janin, A., Stolcke, A., Anguera, X., Boakye, K., Çetin, Ö., Frankel, J., Zheng, J.: The ICSI-SRI Spring 2006 meeting recognition system. In: Renals, S., Bengio, S., Fiscus, J.G. (eds.) MLMI 2006. LNCS, vol. 4299, pp. 444–456. Springer, Heidelberg (2006)
Zheng, J., Cetin, O., Hwang, M.Y., Lei, X., Stolcke, A., Morgan, N.: Combining discriminative feature, transform, and model training for large vocabulary speech recognition. In: Proc. ICASSP, Honolulu, vol. 4, pp. 633–636 (2007)
Lamel, L., Schiel, F., Fourcin, A., Mariani, J., Tillman, H.: The translingual English database (TED). In: Proc. ICSLP, Yokohama, pp. 1795–1798 (1994)
Adami, A., Burget, L., Dupont, S., Garudadri, H., Grezl, F., Hermansky, H., Jain, P., Kajarekar, S., Morgan, N., Sivadas, S.: Qualcomm-ICSI-OGI features for ASR. In: Hansen, J.H.L., Pellom, B. (eds.) Proc. ICSLP, Denver, vol. 1, pp. 4–7 (2002)
Anguera, X., Wooters, C., Pardo, J.M.: Robust speaker diarization for meetings: ICSI-SRI RT-06S meetings evaluation system. In: Renals, S., Bengio, S., Fiscus, J.G. (eds.) MLMI 2006. LNCS, vol. 4299. Springer, Heidelberg (2006)
Anguera, X.: Beamformit (the fast and robust acoustic beamformer) (2006), http://www.icsi.berkeley.edu/~xanguera/beamformit/
Boakye, K., Stolcke, A.: Improved speech activity detection using cross-channel features for recognition of multiparty meetings. In: Proc. ICSLP, Pittsburgh, PA, pp. 1962–1965 (2006)
Vergyri, D., Stolcke, A., Gadde, V.R.R., Ferrer, L., Shriberg, E.: Prosodic knowledge sources for automatic speech recognition. In: Proc. ICASSP, Hong Kong, vol. 1, pp. 208–211 (2003)
Povey, D., Woodland, P.C.: Minimum phone error and I-smoothing for improved discriminative training. In: Proc. ICASSP, Orlando, FL, vol. 1, pp. 105–108 (2002)
Graciarena, M., Franco, H., Zheng, J., Vergyri, D., Stolcke, A.: Voicing feature integration in SRI’s Decipher LVCSR system. In: Proc. ICASSP, Montreal, vol. 1, pp. 921–924 (2004)
Kumar, N.: Investigation of Silicon-Auditory Models and Generalization of Linear Discriminant Analysis for Improved Speech Recognition. PhD thesis, Johns Hopkins University, Baltimore (1997)
Morgan, N., Chen, B.Y., Zhu, Q., Stolcke, A.: TRAPping conversational speech: Extending TRAP/Tandem approaches to conversational telephone speech recognition. In: Proc. ICASSP, Montreal, vol. 1, pp. 536–539 (2004)
Zhu, Q., Stolcke, A., Chen, B.Y., Morgan, N.: Using MLP features in SRI’s conversational speech recognition system. In: Proc. Interspeech, Lisbon, pp. 2141–2144 (2005)
Jin, H., Matsoukas, S., Schwartz, R., Kubala, F.: Fast robust inverse transform SAT and multi-stage adaptation. In: Proceedings DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, VA, pp. 105–109. Morgan Kaufmann, San Francisco (1998)
Stolcke, A., Anguera, X., Boakye, K., Çetin, Ö., Grézl, F., Janin, A., Mandal, A., Peskin, B., Wooters, C., Zheng, J.: Further progress in meeting recognition: The ICSI-SRI Spring 2005 speech-to-text evaluation system. In: Renals, S., Bengio, S. (eds.) MLMI 2005. LNCS, vol. 3869, pp. 463–475. Springer, Heidelberg (2006)
Povey, D., Kingsbury, B., Mangu, L., Saon, G., Soltau, H., Zweig, G.: fMPE: Discriminatively trained features for speech recognition. In: Proc. ICASSP, Philadelphia, vol. 1, pp. 961–964 (2005)
Zheng, J., Stolcke, A.: Improved discriminative training using phone lattices. In: Proc. Interspeech, Lisbon, pp. 2125–2128 (2005)
Zheng, J., Stolcke, A.: fMPE-MAP: Improved discriminative adaptation for modeling new domains. In: Proc. Interspeech, Antwerp, pp. 1573–1576 (2007)
Çetin, Ö., Stolcke, A.: Language modeling in the ICSI-SRI Spring 2005 meeting speech recognition evaluation system. Technical Report TR-05-06, International Computer Science Institute, Berkeley, CA (2005)
Wooters, C., Huijbregts, M.: The ICSI RT 2007 speaker diarization system. LNCS, vol. 4625, pp. 509–519. Springer, Heidelberg (2008)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stolcke, A. et al. (2008). The SRI-ICSI Spring 2007 Meeting and Lecture Recognition System. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. RT CLEAR 2007 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_42
Download citation
DOI: https://doi.org/10.1007/978-3-540-68585-2_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68584-5
Online ISBN: 978-3-540-68585-2
eBook Packages: Computer ScienceComputer Science (R0)