A Multi-layered Approach to Evaluating Speech Translation Performance of Meetings

Nicmanis, Dāvis; Bergmanis, Toms; Salimbajevs, Askars; Pinnis, Mārcis

doi:10.1007/978-3-031-73110-5_10

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 1154)

Included in the following conference series:

Proceedings of the Future Technologies Conference

Abstract

Evaluating only the final output of a cascaded speech translation system reveals little about how each component in the cascade performs, which makes it hard to identify and improve the problematic ones. To address this issue, we present a multi-layer evaluation suite for automatic speech translation of meetings. Our data features public-domain English, Latvian, and Lithuanian recordings, augmented with multiple annotation layers ranging from raw speech transcription to translation. We further show how to use our data sets and annotations to evaluate the components involved in cascaded speech translation systems: speaker diarisation, speech segmentation, automatic speech recognition, punctuation restoration and sentence splitting, speech normalisation, and machine translation. We also demonstrate an ablation study that allows us to analyse each component's contribution to the overall speech translation error. We publish our data and evaluation scripts, making our evaluation suite the first of its kind for the Latvian and Lithuanian languages.
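The abstract does not spell out how the ablation is carried out. One common way to attribute error in a cascade is oracle substitution: rerun the pipeline with one component's output replaced by the gold annotation for that layer and see how much the final error drops. The Python sketch below illustrates that idea only; it is an assumption, not the authors' published scripts. The stage names, the toy stage functions, and the use of word error rate as a stand-in for a translation metric are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Stage = Tuple[str, Callable[[str], str]]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / max(len(ref), 1)


def run_cascade(source: str, stages: List[Stage], gold: Dict[str, str],
                oracle_stage: Optional[str] = None) -> str:
    """Run all stages in order; the oracle stage emits its gold layer instead."""
    data = source
    for name, fn in stages:
        data = gold[name] if name == oracle_stage else fn(data)
    return data


def ablation(source: str, stages: List[Stage], gold: Dict[str, str],
             final_reference: str) -> Dict[str, float]:
    """Report baseline error and the error reduction from each oracle swap."""
    baseline = word_error_rate(final_reference, run_cascade(source, stages, gold))
    report = {"baseline": baseline}
    for name, _ in stages:
        output = run_cascade(source, stages, gold, oracle_stage=name)
        report[name] = baseline - word_error_rate(final_reference, output)
    return report


# Hypothetical toy cascade: every function here is a placeholder.
stages: List[Stage] = [
    ("asr", lambda _audio: "helo world"),   # toy recogniser with one typo
    ("punctuation", lambda text: text),     # toy restorer that does nothing
    ("mt", lambda text: text.upper()),      # toy "translation": upper-casing
]
gold = {"asr": "hello world", "punctuation": "Hello world.", "mt": "HELLO WORLD."}

report = ablation("meeting.wav", stages, gold, final_reference="HELLO WORLD.")
print(report)
```

In this sketch, swapping in a gold layer also erases every error made upstream of it, so the reported reductions are cumulative rather than strictly per component. Isolating a single component would require feeding it gold input as well as comparing against its gold output.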

Acknowledgments

This research has been supported by the ICT Competence Center (www.itkc.lv) within the project 2.2 Mākslīgais intelekts reālā laika subtitrēšanai un dublēšanai tiešraidēm (2.2 Artificial intelligence for real time subtitling and dubbing in live streams) of EU Structural funds, ID no 5.1.1.2.i.0/1/22/A/CFLA/008.

Author information

Authors and Affiliations

Tilde SIA, Vienibas ave. 75a, Riga, 1004, Latvia
Dāvis Nicmanis, Toms Bergmanis, Askars Salimbajevs & Mārcis Pinnis
University of Latvia, Raina blvd. 19, Riga, 1050, Latvia
Toms Bergmanis, Askars Salimbajevs & Mārcis Pinnis

Corresponding author

Correspondence to Dāvis Nicmanis.

Editor information

Editors and Affiliations

Saga University, Saga, Japan
Kohei Arai

About this paper

Cite this paper

Nicmanis, D., Bergmanis, T., Salimbajevs, A., Pinnis, M. (2024). A Multi-layered Approach to Evaluating Speech Translation Performance of Meetings. In: Arai, K. (eds) Proceedings of the Future Technologies Conference (FTC) 2024, Volume 1. FTC 2024. Lecture Notes in Networks and Systems, vol 1154. Springer, Cham. https://doi.org/10.1007/978-3-031-73110-5_10

DOI: https://doi.org/10.1007/978-3-031-73110-5_10
Published: 05 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73109-9
Online ISBN: 978-3-031-73110-5
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)