Abstract
Evaluating only the final output of a cascaded speech translation system reveals little about how each component in the cascade performs, which makes it hard to identify and improve the problematic ones. To address this issue, we present a multi-layer evaluation suite for automatic speech translation of meetings. Our data comprises public-domain English, Latvian, and Lithuanian recordings, augmented with multiple annotation layers ranging from raw speech transcription to translation. We further show how to use our data sets and annotations to evaluate the components of cascaded speech translation systems: speaker diarisation, speech segmentation, automatic speech recognition, punctuation restoration and sentence splitting, speech normalisation, and machine translation. We also present an ablation study that analyses each component’s error contribution to the overall speech translation error. We publish our data and evaluation scripts, making our evaluation suite the first of its kind for the Latvian and Lithuanian languages.
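The abstract does not say which metrics or procedure the ablation uses, so the following is only a minimal sketch of one common reading: score the full cascade, then re-score it with a single component's output replaced by the matching gold annotation layer, and treat the drop in error as that component's contribution. Word error rate stands in for the metric here as an assumption, and the names `word_error_rate` and `error_contributions` are hypothetical, not taken from the published evaluation scripts.

```python
from typing import Dict, Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)


def error_contributions(
    baseline_error: float, oracle_errors: Dict[str, float]
) -> Dict[str, float]:
    """Error removed when one component is replaced by its gold annotation.

    oracle_errors maps a component name (assumed labels such as "asr" or
    "punctuation") to the end-to-end error measured with only that
    component's output swapped for the reference annotation layer.
    """
    return {name: baseline_error - err for name, err in oracle_errors.items()}
```

A larger value returned by `error_contributions` would indicate a component whose errors cost more downstream. Any error metric where lower is better could be substituted for word error rate in this sketch.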
Acknowledgments
This research has been supported by the ICT Competence Center (www.itkc.lv) within the project 2.2 Mākslīgais intelekts reālā laika subtitrēšanai un dublēšanai tiešraidēm (2.2 Artificial intelligence for real-time subtitling and dubbing in live streams) of EU Structural funds, ID no. 5.1.1.2.i.0/1/22/A/CFLA/008.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Nicmanis, D., Bergmanis, T., Salimbajevs, A., Pinnis, M. (2024). A Multi-layered Approach to Evaluating Speech Translation Performance of Meetings. In: Arai, K. (eds) Proceedings of the Future Technologies Conference (FTC) 2024, Volume 1. FTC 2024. Lecture Notes in Networks and Systems, vol 1154. Springer, Cham. https://doi.org/10.1007/978-3-031-73110-5_10
DOI: https://doi.org/10.1007/978-3-031-73110-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73109-9
Online ISBN: 978-3-031-73110-5
eBook Packages: Intelligent Technologies and Robotics (R0)