DOI: 10.1609/aaai.v33i01.33016351
Adapting translation models for transcript disfluency detection

Published: 27 January 2019

Abstract

Transcript disfluency detection (TDD) is an important component of real-time speech translation systems and has attracted growing interest in recent years. This paper presents our study on adapting neural machine translation (NMT) models to TDD. We propose a general training framework for rapidly adapting NMT models to the TDD task. In this framework, the main structure of the model mirrors that of an NMT model, and several extension modules and training techniques that are independent of the particular NMT architecture are added to improve performance, including constrained decoding, denoising-autoencoder initialization, and a TDD-specific training objective. With this framework we achieve significant improvements, but decoding is too slow to be practical. To build a feasible, production-ready solution for TDD, we therefore propose a fast non-autoregressive TDD model, following the non-autoregressive NMT models that have emerged recently. Although the framework does not assume a specific NMT architecture, we build our TDD model on the Transformer, the state-of-the-art NMT model. We conduct extensive experiments on the publicly available Switchboard corpus and an in-house Chinese dataset. Experimental results show that the proposed model significantly outperforms previous state-of-the-art models.
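The constrained decoding mentioned in the abstract exploits a structural property of TDD: the fluent output must be an ordered subsequence of the noisy input transcript, so the decoder may only copy or skip source tokens, never invent new ones. A minimal sketch of that constraint, assuming hypothetical per-token keep probabilities (in the paper these would come from the trained model, not hand-set values as here):

```python
def constrained_decode(tokens, keep_scores, threshold=0.5):
    """Toy constrained decoder for transcript disfluency detection (TDD).

    Because the fluent output must be an ordered subsequence of the
    transcript, decoding reduces to a per-token keep/delete decision:
    the decoder may only copy the next source token or skip it, and can
    never emit a word that does not appear in the source.

    keep_scores stands in for a trained model's per-token probabilities;
    the values used below are illustrative, not real model outputs.
    """
    return [tok for tok, score in zip(tokens, keep_scores) if score >= threshold]


# "to boston um" is a repair disfluency: the speaker corrects the destination.
tokens = "i want a flight to boston um to denver".split()
scores = [0.9, 0.9, 0.9, 0.9, 0.2, 0.1, 0.05, 0.8, 0.9]
print(constrained_decode(tokens, scores))  # ['i', 'want', 'a', 'flight', 'to', 'denver']
```

The copy-or-skip restriction is what makes a fast non-autoregressive variant plausible: each keep/delete decision depends only weakly on the others, unlike open-vocabulary translation.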


Cited By

  • (2024) Regularizing cross-attention learning for end-to-end speech translation with ASR and MT attention matrices. Expert Systems with Applications 247:C. DOI: 10.1016/j.eswa.2024.123241
  • (2022) Towards improving Disfluency Detection from Speech using Shifted Delta Cepstral Coefficients. Proceedings of the 2022 Fourteenth International Conference on Contemporary Computing, 350-355. DOI: 10.1145/3549206.3549269
  • (2021) Combining Self-supervised Learning and Active Learning for Disfluency Detection. ACM Transactions on Asian and Low-Resource Language Information Processing 21(3):1-25. DOI: 10.1145/3487290
  • (2021) Just Speak It: Minimize Cognitive Load for Eyes-Free Text Editing with a Smart Voice Assistant. The 34th Annual ACM Symposium on User Interface Software and Technology, 910-921. DOI: 10.1145/3472749.3474795
  • "Do this instead" – Robots that Adequately Respond to Corrected Instructions. ACM Transactions on Human-Robot Interaction. DOI: 10.1145/3623385


Published In

AAAI'19/IAAI'19/EAAI'19: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence
January 2019, 10088 pages
ISBN: 978-1-57735-809-1

          Sponsors

          • Association for the Advancement of Artificial Intelligence

          Publisher

          AAAI Press

