Abstract
In the Digital Library of Polish and Poland-related Ephemeral Prints from the 16th, 17th and 18th Centuries, a small fraction of items contains manually created Latin–Polish dictionaries explaining Latin fragments inserted into the Polish text. At the same time, the rapid development of machine translation opens up the possibility of creating such dictionaries automatically. In this paper, we verify whether existing translation solutions are already capable of producing useful results for the Latin–Polish pair. We investigate two systems that support this language pair: Google's neural machine translation engine and the GPT-3 model. We test the translation of both isolated and context-embedded phrases and evaluate the results with an automatic metric (BLEU) and with human judgement (White's 5-point scales of adequacy and fluency).
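The abstract names BLEU as the automatic metric. The sketch below shows roughly how that score is computed, in the corpus-level form introduced by Papineni et al. (2002). It takes clipped ("modified") n-gram precisions for n = 1..4, combines them with a geometric mean, and multiplies the result by a brevity penalty. The sketch assumes pre-tokenized input and one reference translation per segment. The paper does not say which BLEU implementation or settings were used, so this is an illustration and not the authors' evaluation code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with a single reference per segment (illustrative sketch).

    hypotheses, references: lists of token lists, aligned by index.
    Returns a score in [0, 1]; 0.0 if some n-gram order has no matches at all.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # n-grams proposed by the hypotheses, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Modified precision: each hypothesis n-gram counts at most
            # as many times as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions with uniform weights 1/max_n.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

Dictionary-style glosses are often only one or two words long. For segments that short, 4-gram matches can be missing altogether, and at the sentence level they usually are. Unsmoothed BLEU then drops to zero. This is one reason toolkits such as sacreBLEU offer smoothing options and standardized tokenization.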
The work was financed by a research grant from the Polish Ministry of Science and Higher Education under the National Programme for the Development of Humanities for the years 2019–2023 (grant 11H 18 0413 86, grant funds received: 1,797,741 PLN).
Notes
- 2. Calculated based on a sample of 18 transcribed prints containing Latin–Polish dictionaries: 1895 out of 24506 words (about 7.7%).
- 5. See also https://korba.edu.pl/overview?lang=en.
- 6. See e.g. https://cbdu.ijp.pan.pl/id/eprint/3760/, https://cbdu.ijp.pan.pl/id/eprint/3770/ and https://cbdu.ijp.pan.pl/id/eprint/3780/, where the content is available for the first one and the dictionary is present only for the last one; a similar case is https://cbdu.ijp.pan.pl/id/eprint/13880/ and https://cbdu.ijp.pan.pl/id/eprint/13890/.
- 7. With the fuzzy_index function from the Text::Fuzzy Perl module.
- 16. See e.g. its openly accessible “playground” https://beta.openai.com/playground.
- 18. See their review, e.g., in the Related Work section of [3].
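Note 7 mentions fuzzy_index from the Perl module Text::Fuzzy, which finds the closest approximate occurrence of one string inside a longer text. The Python function below is a hypothetical stand-in that illustrates the same idea using Sellers' dynamic-programming algorithm. This is edit distance in which the match may start and end anywhere in the longer text. The name `fuzzy_find`, its `(start, end, distance)` return value, and its tie-breaking are assumptions made for this sketch. It does not reproduce the module's implementation or its return values.

```python
def fuzzy_find(needle, haystack):
    """Locate the substring of `haystack` closest to `needle` in edit distance.

    Returns (start, end, distance) such that haystack[start:end] is a best match.
    Hypothetical helper: a Sellers-style DP sketch, not Text::Fuzzy's code.
    Each DP cell holds (distance, start of the matched substring).
    """
    # Row 0: an empty needle matches the empty substring at every position j,
    # so a match may begin anywhere in the haystack at no cost.
    prev = [(0, j) for j in range(len(haystack) + 1)]
    for i in range(1, len(needle) + 1):
        # Column 0: needle[:i] against the empty prefix costs i deletions.
        cur = [(i, 0)]
        for j in range(1, len(haystack) + 1):
            cost = 0 if needle[i - 1] == haystack[j - 1] else 1
            cur.append(min(
                (prev[j - 1][0] + cost, prev[j - 1][1]),  # match or substitution
                (prev[j][0] + 1, prev[j][1]),              # needle char left unmatched
                (cur[j - 1][0] + 1, cur[j - 1][1]),        # extra haystack char
            ))
        prev = cur
    # A match may end anywhere: take the first end position with minimal cost.
    end = min(range(len(haystack) + 1), key=lambda j: prev[j][0])
    distance, start = prev[end]
    return start, end, distance
```

For example, `fuzzy_find("vale", "salve et vale")` returns `(9, 13, 0)`. A misspelled form still gets located with a small distance; for instance, `"vale"` against `"valle"` gives distance 1. The table has one cell per pair of characters, so the cost is proportional to the product of the two string lengths. That is unproblematic for short Latin phrases.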
References
[1] Agirre, E., Gonzalez-Agirre, A., Lopez-Gazpio, I., Maritxalar, M., Rigau, G., Uria, L.: SemEval-2016 Task 2: interpretable semantic textual similarity. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pp. 512–524. Association for Computational Linguistics, San Diego, California (2016). https://aclanthology.org/S16-1082
[2] Doddington, G.: Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In: Proceedings of the Second International Conference on Human Language Technology Research, pp. 138–145. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2002). https://aclanthology.org/www.mt-archive.info/HLT-2002-Doddington.pdf
[3] Freitag, M., Foster, G., Grangier, D., Ratnakar, V., Tan, Q., Macherey, W.: Experts, errors, and context: a large-scale study of human evaluation for machine translation. Trans. Assoc. Comput. Linguist. 9, 1460–1474 (2021). https://doi.org/10.1162/tacl_a_00437
[4] Gruszczyński, W., Adamiec, D., Bronikowska, R., Wieczorek, A.: Elektroniczny Korpus Tekstów Polskich z XVII i XVIII w. – problemy teoretyczne i warsztatowe. Poradnik Językowy 777(8), 32–51 (2020). https://doi.org/10.33896/porj.2020.8.3
[5] Gruszczyński, W., Ogrodniczuk, M.: Cyfrowa Biblioteka Druków Ulotnych Polskich i Polski dotyczących z XVI, XVII i XVIII w. w nauce i dydaktyce (Digital Library of Poland-related Old Ephemeral Prints in research and teaching, in Polish). In: Materiały konferencji Polskie Biblioteki Cyfrowe 2010 (Proceedings of the Polish Digital Libraries 2010 conference), pp. 23–27. Poznań, Poland (2010)
[6] Kocmi, T., Federmann, C., Grundkiewicz, R., Junczys-Dowmunt, M., Matsushita, H., Menezes, A.: To ship or not to ship: an extensive evaluation of automatic metrics for machine translation. In: Proceedings of the Sixth Conference on Machine Translation, pp. 478–494. Association for Computational Linguistics (2021). https://aclanthology.org/2021.wmt-1.57
[7] Lavie, A., Agarwal, A.: METEOR: an automatic metric for MT evaluation with high levels of correlation with human judgments. In: Proceedings of the Second Workshop on Statistical Machine Translation, pp. 228–231. Association for Computational Linguistics, Prague, Czech Republic (2007). https://aclanthology.org/W07-0734
[8] Maučec, M.S., Donaj, G.: Machine translation and the evaluation of its quality. In: Sadollah, A., Sinha, T.S. (eds.) Recent Trends in Computational Intelligence, chap. 8. IntechOpen, Rijeka (2019). https://doi.org/10.5772/intechopen.89063
[9] Ogrodniczuk, M., Gruszczyński, W.: Digital library of Poland-related old ephemeral prints: preserving multilingual cultural heritage. In: Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage, pp. 27–33. Hissar, Bulgaria (2011). http://www.aclweb.org/anthology/W11-4105
[10] Ogrodniczuk, M., Gruszczyński, W.: Digital library 2.0 – source of knowledge and research collaboration platform. In: Calzolari, N., et al. (eds.) Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), pp. 1649–1653. European Language Resources Association, Reykjavík, Iceland (2014). http://www.lrec-conf.org/proceedings/lrec2014/pdf/14_Paper.pdf
[11] Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318. Association for Computational Linguistics, Philadelphia, Pennsylvania, USA (2002). https://aclanthology.org/P02-1040
[12] Popović, M.: CHRF: character n-gram F-score for automatic MT evaluation. In: Proceedings of the Tenth Workshop on Statistical Machine Translation, pp. 392–395. Association for Computational Linguistics, Lisbon, Portugal (2015). https://aclanthology.org/W15-3049
[13] Przepiórkowski, A., Bańko, M., Górski, R.L., Lewandowska-Tomaszczyk, B. (eds.): Narodowy Korpus Języka Polskiego. Wydawnictwo Naukowe PWN, Warsaw (2012)
[14] Snover, M., Dorr, B., Schwartz, R., Micciulla, L., Makhoul, J.: A study of translation edit rate with targeted human annotation. In: Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers, pp. 223–231. Association for Machine Translation in the Americas, Cambridge, Massachusetts, USA (2006). https://aclanthology.org/2006.amta-papers.25
[15] Vilar, D., Xu, J., D’Haro, L.F., Ney, H.: Error analysis of statistical machine translation output. In: Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006), pp. 697–702. European Language Resources Association (ELRA), Genoa, Italy (2006). http://www.lrec-conf.org/proceedings/lrec2006/pdf/413_pdf.pdf
[16] White, J.S., O’Connell, T.A., O’Mara, F.E.: The ARPA MT evaluation methodologies: evolution, lessons, and future approaches. In: Proceedings of the First Conference of the Association for Machine Translation in the Americas, pp. 193–205. Columbia, Maryland, USA (1994). https://aclanthology.org/1994.amta-1.25
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ogrodniczuk, M., Kryńska, K. (2022). Evaluating Machine Translation of Latin Interjections in the Digital Library of Polish and Poland-related News Pamphlets. In: Tseng, YH., Katsurai, M., Nguyen, H.N. (eds) From Born-Physical to Born-Virtual: Augmenting Intelligence in Digital Libraries. ICADL 2022. Lecture Notes in Computer Science, vol 13636. Springer, Cham. https://doi.org/10.1007/978-3-031-21756-2_34
DOI: https://doi.org/10.1007/978-3-031-21756-2_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21755-5
Online ISBN: 978-3-031-21756-2
eBook Packages: Computer Science, Computer Science (R0)