
A Multi-layered Approach to Evaluating Speech Translation Performance of Meetings

  • Conference paper
Proceedings of the Future Technologies Conference (FTC) 2024, Volume 1 (FTC 2024)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 1154)

Abstract

Evaluating only the final output of a cascaded speech translation system reveals little about how each component in the cascade performs, which makes it hard to identify and improve the problematic components. To address this issue, we present a multi-layer evaluation suite for automatic speech translation of meetings. Our data consist of public-domain English, Latvian, and Lithuanian recordings, augmented with multiple annotation layers ranging from raw speech transcription to translation. We further show how to use our data sets and annotations to evaluate the components involved in cascaded speech translation systems: speaker diarisation, speech segmentation, automatic speech recognition, punctuation restoration and sentence splitting, speech normalisation, and machine translation. We also present an ablation study that lets us analyse each component’s contribution to the overall speech translation error. We publish our data and evaluation scripts, making our evaluation suite the first of its kind for Latvian and Lithuanian.
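
The ablation idea in the abstract can be illustrated with a minimal sketch. This preview does not show the procedure the paper actually uses, so everything below is an assumption: the cascade is reduced to three toy string functions (`toy_asr`, `toy_punct`, `toy_mt`, all hypothetical), the metric is a plain word-level error rate rather than whichever metrics the authors report, and error is attributed by "oracle substitution", that is, by replacing the output of each stage in turn with its gold annotation layer and measuring how much the end-to-end error drops.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Stage = Tuple[str, Callable[[str], str]]


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Token-level Levenshtein distance between reference and hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref: str, hyp: str) -> float:
    """Word-level error rate: edit distance divided by reference length."""
    ref_tokens = ref.split()
    return edit_distance(ref_tokens, hyp.split()) / max(len(ref_tokens), 1)


def run_cascade(stages: List[Stage], source: str,
                gold: Dict[str, str] = None, oracle_stage: str = None) -> str:
    """Run the stages in order. If `oracle_stage` is given, the output of
    that stage is replaced by its gold annotation, which removes every
    error introduced up to and including that stage."""
    value = source
    for name, fn in stages:
        value = fn(value)
        if name == oracle_stage:
            value = gold[name]
    return value


def ablation(stages: List[Stage], source: str,
             gold: Dict[str, str], final_ref: str) -> Dict[str, float]:
    """Attribute the end-to-end error to stages: a stage's share is the
    error drop observed when its output is also replaced by gold."""
    previous = error_rate(final_ref, run_cascade(stages, source))
    report = {"end_to_end": previous}
    for name, _ in stages:
        current = error_rate(final_ref,
                             run_cascade(stages, source, gold, name))
        report[name] = previous - current
        previous = current
    return report


# Hypothetical toy components, for illustration only.
def toy_asr(audio: str) -> str:
    return audio.replace("meeting", "meat")   # misrecognises one word


def toy_punct(text: str) -> str:
    return text.capitalize()                  # forgets the final period


def toy_mt(text: str) -> str:
    return text.upper()                       # stand-in for translation


STAGES = [("asr", toy_asr), ("punct", toy_punct), ("mt", toy_mt)]
GOLD = {
    "asr": "the meeting starts now",
    "punct": "The meeting starts now.",
    "mt": "THE MEETING STARTS NOW.",
}

report = ablation(STAGES, "the meeting starts now", GOLD, GOLD["mt"])
print(report)
```

In this toy setup the per-stage shares sum to the end-to-end error, because the last gold layer is the final reference itself. With real components and metrics such as BLEU or COMET, the differences need not be additive in this way, so the numbers are best read as indicative rather than as an exact decomposition.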

Acknowledgments

This research has been supported by the ICT Competence Center (www.itkc.lv) within project 2.2, "Mākslīgais intelekts reālā laika subtitrēšanai un dublēšanai tiešraidēm" (Artificial intelligence for real-time subtitling and dubbing in live streams), of EU Structural Funds, ID no. 5.1.1.2.i.0/1/22/A/CFLA/008.

Author information

Corresponding author

Correspondence to Dāvis Nicmanis.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Nicmanis, D., Bergmanis, T., Salimbajevs, A., Pinnis, M. (2024). A Multi-layered Approach to Evaluating Speech Translation Performance of Meetings. In: Arai, K. (ed.) Proceedings of the Future Technologies Conference (FTC) 2024, Volume 1. FTC 2024. Lecture Notes in Networks and Systems, vol 1154. Springer, Cham. https://doi.org/10.1007/978-3-031-73110-5_10
