Evaluating Machine Translation of Latin Interjections in the Digital Library of Polish and Poland-related News Pamphlets

Ogrodniczuk, Maciej; Kryńska, Katarzyna

doi:10.1007/978-3-031-21756-2_34

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13636)

Included in the following conference series: International Conference on Asian Digital Libraries

Abstract

In the Digital Library of Polish and Poland-related Ephemeral Prints from the 16th, 17th and 18th Centuries, a small fraction of items contain manually created Latin–Polish dictionaries explaining Latin fragments injected into Polish content. At the same time, the rapid development of machine translation creates new opportunities for creating such dictionaries automatically. In this paper, we verify whether existing translation solutions are already capable of generating useful results in this Latin–Polish setting. We investigate two systems available for this language pair: the familiar Google neural engine and the GPT-3 model. We test the translation of isolated and context-embedded phrases and evaluate the results with both automatic and human metrics: BLEU and White’s 5-point scale of adequacy and fluency.
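As a rough illustration of the automatic metric named above, the sketch below computes an unsmoothed sentence-level BLEU score in the spirit of Papineni et al. It is a minimal sketch, not the evaluation setup used in the paper: the function names `ngrams` and `sentence_bleu`, the whitespace tokenisation, the single reference, and the capping of the n-gram order at the candidate length are all assumptions made here for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU of one candidate against one reference translation.

    Both arguments are strings that are split on whitespace.
    Assumption: max_n is capped at the candidate length so that very short
    phrases are not scored as zero merely for lacking 4-grams.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    max_n = min(max_n, len(cand))
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0:
            return 0.0  # one zero precision makes the geometric mean zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalise candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the modified precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)
```

For instance, `sentence_bleu("dla dobra wspólnego", "dla dobra publicznego", max_n=2)` scores an invented candidate against an invented reference rendering of a phrase such as "pro publico bono". A real evaluation would more typically rely on an established implementation and on corpus-level scores.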

The work was financed by a research grant from the Polish Ministry of Science and Higher Education under the National Programme for the Development of Humanities for the years 2019–2023 (grant 11H 18 0413 86, grant funds received: 1,797,741 PLN).

Notes

1. See https://cbdu.ijp.pan.pl/.
2. Calculated based on a sample of 18 transcribed prints containing Latin-Polish dictionaries; 1895 words out of 24506.
3. See https://cbdu.ijp.pan.pl/id/eprint/700/ and https://cbdu.ijp.pan.pl/id/eprint/2250/.
4. See https://cbdu.ijp.pan.pl/id/eprint/4210/ and https://cbdu.ijp.pan.pl/id/eprint/4220/.
5. See also https://korba.edu.pl/overview?lang=en.
6. See e.g. https://cbdu.ijp.pan.pl/id/eprint/3760/, https://cbdu.ijp.pan.pl/id/eprint/3770/ and https://cbdu.ijp.pan.pl/id/eprint/3780/ with the content available for the first one and the dictionary present only for the last one or a similar case with https://cbdu.ijp.pan.pl/id/eprint/13880/ and https://cbdu.ijp.pan.pl/id/eprint/13890/.
7. With the fuzzy_index function from the Text::Fuzzy Perl module.
8. https://translate.google.pl/?sl=la&tl=pl.
9. https://translateking.com/.
10. https://livetranslatehub.com/.
11. https://translatiz.com/.
12. https://translate.yandex.com/?lang=la-pl.
13. https://www.contdict.com.
14. https://www.latin-online-translation.com/.
15. https://lingvanex.com/demo/.
16. See e.g. its openly accessible “playground” https://beta.openai.com/playground.
17. See e.g. chapter 6 of [8] for more examples and [6] for evaluation of the correlation of various metrics with human judgements.
18. See their review e.g. in the Related Work section of [3].

References

1. Agirre, E., Gonzalez-Agirre, A., Lopez-Gazpio, I., Maritxalar, M., Rigau, G., Uria, L.: SemEval-2016 Task 2: interpretable semantic textual similarity. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pp. 512–524. Association for Computational Linguistics, San Diego, California (2016). https://aclanthology.org/S16-1082
2. Doddington, G.: Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In: Proceedings of the Second International Conference on Human Language Technology Research, pp. 138–145. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2002). https://aclanthology.org/www.mt-archive.info/HLT-2002-Doddington.pdf
3. Freitag, M., Foster, G., Grangier, D., Ratnakar, V., Tan, Q., Macherey, W.: Experts, errors, and context: a large-scale study of human evaluation for machine translation. Trans. Assoc. Comput. Linguist. 9, 1460–1474 (2021). https://doi.org/10.1162/tacl_a_00437
4. Gruszczyński, W., Adamiec, D., Bronikowska, R., Wieczorek, A.: Elektroniczny Korpus Tekstów Polskich z XVII i XVIII w. - problemy teoretyczne i warsztatowe. Poradnik Językowy 777(8), 32–51 (2020). https://doi.org/10.33896/porj.2020.8.3
5. Gruszczyński, W., Ogrodniczuk, M.: Cyfrowa Biblioteka Druków Ulotnych Polskich i Polski dotyczących z XVI, XVII i XVIII w. w nauce i dydaktyce (Digital Library of Poland-related Old Ephemeral Prints in research and teaching; in Polish). In: Materiały konferencji Polskie Biblioteki Cyfrowe 2010 (Proceedings of the Polish Digital Libraries 2010 conference), pp. 23–27. Poznań, Poland (2010)
6. Kocmi, T., Federmann, C., Grundkiewicz, R., Junczys-Dowmunt, M., Matsushita, H., Menezes, A.: To ship or not to ship: an extensive evaluation of automatic metrics for machine translation. In: Proceedings of the Sixth Conference on Machine Translation, pp. 478–494. Association for Computational Linguistics (2021). https://aclanthology.org/2021.wmt-1.57
7. Lavie, A., Agarwal, A.: METEOR: an automatic metric for MT evaluation with high levels of correlation with human judgments. In: Proceedings of the Second Workshop on Statistical Machine Translation, pp. 228–231. Association for Computational Linguistics, Prague, Czech Republic (2007). https://aclanthology.org/W07-0734
8. Maučec, M.S., Donaj, G.: Machine translation and the evaluation of its quality. In: Sadollah, A., Sinha, T.S. (eds.) Recent Trends in Computational Intelligence, chap. 8. IntechOpen, Rijeka (2019). https://doi.org/10.5772/intechopen.89063
9. Ogrodniczuk, M., Gruszczyński, W.: Digital library of Poland-related old ephemeral prints: preserving multilingual cultural heritage. In: Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage, pp. 27–33. Hissar, Bulgaria (2011). http://www.aclweb.org/anthology/W11-4105
10. Ogrodniczuk, M., Gruszczyński, W.: Digital library 2.0 – source of knowledge and research collaboration platform. In: Calzolari, N., et al. (eds.) Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), pp. 1649–1653. European Language Resources Association, Reykjavík, Iceland (2014). http://www.lrec-conf.org/proceedings/lrec2014/pdf/14_Paper.pdf
11. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318. Association for Computational Linguistics, Philadelphia, Pennsylvania, USA (2002). https://aclanthology.org/P02-1040
12. Popović, M.: CHRF: character n-gram F-score for automatic MT evaluation. In: Proceedings of the Tenth Workshop on Statistical Machine Translation, pp. 392–395. Association for Computational Linguistics, Lisbon, Portugal (2015). https://aclanthology.org/W15-3049
13. Przepiórkowski, A., Bańko, M., Górski, R.L., Lewandowska-Tomaszczyk, B. (eds.): Narodowy Korpus Języka Polskiego. Wydawnictwo Naukowe PWN, Warsaw (2012)
14. Snover, M., Dorr, B., Schwartz, R., Micciulla, L., Makhoul, J.: A study of translation edit rate with targeted human annotation. In: Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers, pp. 223–231. Association for Machine Translation in the Americas, Cambridge, Massachusetts, USA (2006). https://aclanthology.org/2006.amta-papers.25
15. Vilar, D., Xu, J., D’Haro, L.F., Ney, H.: Error analysis of statistical machine translation output. In: Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006), pp. 697–702. European Language Resources Association (ELRA), Genoa, Italy (2006). http://www.lrec-conf.org/proceedings/lrec2006/pdf/413_pdf.pdf
16. White, J.S., O’Connell, T.A., O’Mara, F.E.: The ARPA MT evaluation methodologies: evolution, lessons, and future approaches. In: Proceedings of the First Conference of the Association for Machine Translation in the Americas, pp. 193–205. Columbia, Maryland, USA (1994). https://aclanthology.org/1994.amta-1.25

Author information

Authors and Affiliations

Institute of Computer Science, Polish Academy of Sciences, Jana Kazimierza 5, 01-248, Warszawa, Poland
Maciej Ogrodniczuk
Institute of Polish Language, Polish Academy of Sciences, al. Mickiewicza 31, 31-120, Kraków, Poland
Katarzyna Kryńska

Corresponding author

Correspondence to Maciej Ogrodniczuk.

Editor information

Editors and Affiliations

National Taiwan Normal University, Taipei, Taiwan
Yuen-Hsien Tseng
Doshisha University, Kyoto, Japan
Marie Katsurai
VNU University of Engineering and Technology, Hanoi, Vietnam
Hoa N. Nguyen

About this paper

Cite this paper

Ogrodniczuk, M., Kryńska, K. (2022). Evaluating Machine Translation of Latin Interjections in the Digital Library of Polish and Poland-related News Pamphlets. In: Tseng, YH., Katsurai, M., Nguyen, H.N. (eds) From Born-Physical to Born-Virtual: Augmenting Intelligence in Digital Libraries. ICADL 2022. Lecture Notes in Computer Science, vol 13636. Springer, Cham. https://doi.org/10.1007/978-3-031-21756-2_34

DOI: https://doi.org/10.1007/978-3-031-21756-2_34
Published: 07 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21755-5
Online ISBN: 978-3-031-21756-2
eBook Packages: Computer Science, Computer Science (R0)
