DOI: 10.1609/aaai.v33i01.33016351
Adapting translation models for transcript disfluency detection

Published: 27 January 2019

Abstract

Transcript disfluency detection (TDD) is an important component of real-time speech translation systems and has attracted growing interest in recent years. This paper presents our study on adapting neural machine translation (NMT) models to TDD. We propose a general training framework for rapidly adapting NMT models to the TDD task. In this framework, the main structure of the model mirrors that of an NMT model, and several extension modules and training techniques that are independent of the particular NMT architecture are added to improve performance, including constrained decoding, denoising-autoencoder initialization, and a TDD-specific training objective. With this framework we achieve significant improvements, but decoding is too slow to be practical. To build a feasible, production-ready solution for TDD, we therefore propose a fast non-autoregressive TDD model, following the non-autoregressive NMT models that have emerged recently. Although the framework does not assume a specific NMT architecture, we build our TDD model on the Transformer, the state-of-the-art NMT model. We conduct extensive experiments on the publicly available Switchboard corpus and an in-house Chinese dataset. Experimental results show that the proposed model significantly outperforms previous state-of-the-art models.
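The constrained decoding mentioned in the abstract exploits a structural property of TDD: the fluent output must be an ordered subsequence of the noisy input transcript, so the decoder may only copy or skip source tokens, never invent new ones. A minimal sketch of that constraint, assuming hypothetical per-token keep probabilities (in the paper these would come from the trained model, not hand-set values as here):

```python
def constrained_decode(tokens, keep_scores, threshold=0.5):
    """Toy constrained decoder for transcript disfluency detection (TDD).

    Because the fluent output must be an ordered subsequence of the
    transcript, decoding reduces to a per-token keep/delete decision:
    the decoder may only copy the next source token or skip it, and can
    never emit a word that does not appear in the source.

    keep_scores stands in for a trained model's per-token probabilities;
    the values used below are illustrative, not real model outputs.
    """
    return [tok for tok, score in zip(tokens, keep_scores) if score >= threshold]


# "to boston um" is a repair disfluency: the speaker corrects the destination.
tokens = "i want a flight to boston um to denver".split()
scores = [0.9, 0.9, 0.9, 0.9, 0.2, 0.1, 0.05, 0.8, 0.9]
print(constrained_decode(tokens, scores))  # ['i', 'want', 'a', 'flight', 'to', 'denver']
```

The copy-or-skip restriction is what makes a fast non-autoregressive variant plausible: each keep/delete decision depends only weakly on the others, unlike open-vocabulary translation.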


Cited By

  • (2024) Regularizing cross-attention learning for end-to-end speech translation with ASR and MT attention matrices. Expert Systems with Applications 247:C. DOI: 10.1016/j.eswa.2024.123241
  • (2022) Towards improving Disfluency Detection from Speech using Shifted Delta Cepstral Coefficients. Proceedings of the 2022 Fourteenth International Conference on Contemporary Computing, 350-355. DOI: 10.1145/3549206.3549269
  • (2021) Combining Self-supervised Learning and Active Learning for Disfluency Detection. ACM Transactions on Asian and Low-Resource Language Information Processing 21(3):1-25. DOI: 10.1145/3487290
  • (2021) Just Speak It: Minimize Cognitive Load for Eyes-Free Text Editing with a Smart Voice Assistant. The 34th Annual ACM Symposium on User Interface Software and Technology, 910-921. DOI: 10.1145/3472749.3474795
  • "Do this instead" – Robots that Adequately Respond to Corrected Instructions. ACM Transactions on Human-Robot Interaction. DOI: 10.1145/3623385


Published In

AAAI'19/IAAI'19/EAAI'19: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence
January 2019, 10088 pages
ISBN: 978-1-57735-809-1

          Sponsors

          • Association for the Advancement of Artificial Intelligence

          Publisher

          AAAI Press

