
ASR Independent Hybrid Recurrent Neural Network Based Error Correction for Dialog System Applications

  • Conference paper
Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction (MA3HMI 2014)

Abstract

We propose an automatic speech recognition (ASR) error correction method for dialog system applications that combines word sequence matching with a recurrent neural network. ASR errors are first corrected by word sequence matching; the remaining out-of-vocabulary (OOV) errors are then corrected by a secondary method based on recurrent neural network syllable prediction. We evaluated the method on a Korean parallel test corpus consisting of ASR results and their correct transcriptions. The results indicate that the method effectively decreases the word error rate of the ASR results. The proposed method can correct ASR errors using only a text corpus, without the corresponding speech recognition results, which means that the method is independent of the ASR engine. The method is general and can be applied to any speech-based application, such as spoken dialog systems.
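
The abstract reports its result as word error rate (WER). As a minimal sketch of how that metric is commonly computed, the following function takes the word-level edit distance between a reference transcription and a hypothesis and divides it by the number of reference words. The function name, the whitespace tokenization, and the example sentences are assumptions made for illustration; the paper's own scoring setup is not described on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(word_error_rate("turn on the light", "turn on light"))
```

An error correction step is effective under this metric when the WER of its output against the correct transcription is lower than the WER of the raw ASR result.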

Notes

  1. In some cases, the syllable prediction length is longer than the number of syllables of the detected erroneous word. Then, to predict a syllable, the method must predict a phoneme first.

  2. The number of syllables and phonemes is constant for Korean.
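
One way to see why the inventory in Note 2 is fixed is the Unicode encoding of precomposed Hangul syllables: each syllable is an arithmetic combination of one of 19 initial consonants, one of 21 vowels, and one of 28 finals (counting "no final"), which gives 11,172 syllables. The sketch below splits a syllable into those three phoneme indices and recombines them. It assumes the Unicode inventory as the unit set; whether the authors' model uses exactly this inventory is not stated here, and the function names are illustrative.

```python
from typing import Tuple

HANGUL_BASE = 0xAC00  # code point of the first precomposed syllable
NUM_INITIAL = 19      # initial consonants
NUM_MEDIAL = 21       # vowels
NUM_FINAL = 28        # final consonants, index 0 meaning "no final"
NUM_SYLLABLES = NUM_INITIAL * NUM_MEDIAL * NUM_FINAL  # 11172


def decompose_syllable(ch: str) -> Tuple[int, int, int]:
    """Return (initial, medial, final) indices of one Hangul syllable."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    index = ord(ch) - HANGUL_BASE
    if not 0 <= index < NUM_SYLLABLES:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(index, NUM_MEDIAL * NUM_FINAL)
    medial, final = divmod(rest, NUM_FINAL)
    return initial, medial, final


def compose_syllable(initial: int, medial: int, final: int) -> str:
    """Inverse of decompose_syllable."""
    return chr(HANGUL_BASE + (initial * NUM_MEDIAL + medial) * NUM_FINAL + final)


if __name__ == "__main__":
    parts = decompose_syllable("한")
    print(parts, compose_syllable(*parts))
```

Under this encoding, predicting a syllable phoneme by phoneme amounts to choosing three indices from small, closed sets.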

Acknowledgements

This work was partly supported by the ICT R&D program of MSIP/IITP [14-824-09-014, Basic Software Research in Human-level Lifelong Machine Learning (Machine Learning Center)] and by the National Research Foundation of Korea (NRF) [NRF-2014R1A2A1A01003041].

Author information

Corresponding author

Correspondence to Junhwi Choi.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Choi, J. et al. (2015). ASR Independent Hybrid Recurrent Neural Network Based Error Correction for Dialog System Applications. In: Böck, R., Bonin, F., Campbell, N., Poppe, R. (eds) Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction. MA3HMI 2014. Lecture Notes in Computer Science, vol 8757. Springer, Cham. https://doi.org/10.1007/978-3-319-15557-9_7

  • DOI: https://doi.org/10.1007/978-3-319-15557-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-15556-2

  • Online ISBN: 978-3-319-15557-9

  • eBook Packages: Computer Science, Computer Science (R0)
