Enriching Confusion Networks for Post-processing

Ghannay, Sahar; Estève, Yannick; Camelin, Nathalie

doi:10.1007/978-3-319-68456-7_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10583)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing

Abstract

The paper proposes a new approach for the a posteriori enrichment of automatic speech recognition (ASR) confusion networks (CNs). CNs are usually used to decrease word error rate and to compute confidence measures, but they also serve in many ways to improve the post-processing of ASR outputs. For instance, they can be used to propose alternative word hypotheses when ASR outputs are corrected by a human during post-editing. However, CN bins do not have a fixed length and sometimes contain only one or two word hypotheses: in this case the number of alternatives available to correct a misrecognized word is very low, which reduces the chance of helping the human annotator.

Our approach for CN enrichment is based on a new similarity measure presented in this paper, computed from acoustic and linguistic word embeddings, that allows us to take into consideration both acoustic and linguistic similarities between two words.
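The abstract does not give the formula of the similarity measure, so the following is only a minimal sketch of one way such a measure could be built. It assumes a linear interpolation of two cosine similarities, one over acoustic embeddings and one over linguistic embeddings. The function names, the weight `alpha`, and the toy two-dimensional vectors are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(w1, w2, acoustic, linguistic, alpha=0.5):
    """Blend acoustic and linguistic similarity between two words.

    ASSUMPTION: the blend is a linear interpolation with weight `alpha`;
    the measure actually used in the paper may be defined differently.
    """
    acoustic_sim = cosine(acoustic[w1], acoustic[w2])
    linguistic_sim = cosine(linguistic[w1], linguistic[w2])
    return alpha * acoustic_sim + (1.0 - alpha) * linguistic_sim


# Toy embeddings invented for illustration only.
# "their" and "there" sound alike; "their" and "his" are used alike.
acoustic = {"their": [1.0, 0.0], "there": [1.0, 0.0], "his": [0.0, 1.0]}
linguistic = {"their": [1.0, 0.0], "there": [0.0, 1.0], "his": [1.0, 0.0]}

if __name__ == "__main__":
    for other in ("there", "his"):
        score = combined_similarity("their", other, acoustic, linguistic)
        print("their", other, score)
```

With `alpha` close to 1 the score favours words that sound alike, and with `alpha` close to 0 it favours words that appear in similar contexts. In this sketch the weight would have to be tuned on held-out data.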

Experimental results show that our approach is relevant: enriched CNs (for a bin size equal to 6) increase the potential correction of erroneous words by 23% compared with the initial CNs produced by an ASR system. In our experiments, a spoken language understanding task is also targeted.
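To make the idea of enriching a bin up to a fixed size concrete, here is a hedged sketch rather than the authors' procedure. It assumes that a bin is an ordered list of word hypotheses, that the first word of the bin serves as the query, and that a single embedding table stands in for the similarity measure. The helper `enrich_bin`, the vocabulary, and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def enrich_bin(bin_words, embeddings, vocabulary, target_size=6):
    """Pad a CN bin with the words most similar to its first hypothesis.

    ASSUMPTIONS: the first word of the bin is used as the query, and
    existing hypotheses keep their positions; the paper may differ.
    """
    if not bin_words or len(bin_words) >= target_size:
        return list(bin_words)
    query = embeddings[bin_words[0]]
    # Only words that are not already in the bin can be added.
    candidates = [w for w in vocabulary if w not in bin_words and w in embeddings]
    # Most similar candidates first.
    candidates.sort(key=lambda w: cosine(query, embeddings[w]), reverse=True)
    missing = target_size - len(bin_words)
    return list(bin_words) + candidates[:missing]


# Toy embeddings invented for illustration only.
embeddings = {
    "whether": [1.0, 0.1],
    "weather": [1.0, 0.0],
    "wetter": [0.8, 0.6],
    "banana": [0.0, 1.0],
}
vocabulary = ["banana", "wetter", "weather", "whether"]

if __name__ == "__main__":
    # A bin with a single hypothesis is padded to three hypotheses.
    print(enrich_bin(["whether"], embeddings, vocabulary, target_size=3))
```

Bins that already reach the target size are returned unchanged, so in this sketch enrichment only affects the short bins that offer a human corrector few alternatives.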


Notes

1. http://www.icsi.berkeley.edu/Speech/docs/sctk-1.2/sclite.htm


Author information

Authors and Affiliations

LIUM - Le Mans University, Le Mans, France
Sahar Ghannay, Yannick Estève & Nathalie Camelin

Corresponding author

Correspondence to Sahar Ghannay.

Editor information

Editors and Affiliations

University of Le Mans, Le Mans, France
Nathalie Camelin
University of Le Mans, Le Mans, France
Yannick Estève
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide

About this paper

Cite this paper

Ghannay, S., Estève, Y., Camelin, N. (2017). Enriching Confusion Networks for Post-processing. In: Camelin, N., Estève, Y., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2017. Lecture Notes in Computer Science, vol 10583. Springer, Cham. https://doi.org/10.1007/978-3-319-68456-7_10

DOI: https://doi.org/10.1007/978-3-319-68456-7_10
Published: 27 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68455-0
Online ISBN: 978-3-319-68456-7
eBook Packages: Computer Science, Computer Science (R0)
