
Evaluating Machine Translation of Latin Interjections in the Digital Library of Polish and Poland-related News Pamphlets

  • Conference paper
  • In: From Born-Physical to Born-Virtual: Augmenting Intelligence in Digital Libraries (ICADL 2022)

Abstract

In The Digital Library of Polish and Poland-related Ephemeral Prints from the 16th, 17th and 18th Centuries, a small fraction of items contains manually created Latin–Polish dictionaries explaining Latin fragments injected into the Polish content. At the same time, the rapid development of machine translation creates new opportunities for creating such dictionaries automatically. In this paper, we verify whether existing translation solutions are already capable of generating useful results in this Latin–Polish setting. We investigate two systems available for this language pair, the Google neural translation engine and the GPT-3 model. We then test the translation of isolated and context-embedded phrases and evaluate the results with both automatic and human metrics: BLEU and White’s 5-point scale of adequacy and fluency.
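
As a concrete illustration of the automatic side of such an evaluation, the sketch below scores a few hypothetical Latin-to-Polish system outputs against reference translations with the sacreBLEU library; the example phrases, the reference strings and the choice of sacrebleu are assumptions made for this sketch, not material from the paper. The human side of the evaluation would additionally assign each output separate 1-5 adequacy and fluency scores following White’s scale.

    # Minimal sketch of corpus-level BLEU scoring with sacreBLEU
    # (pip install sacrebleu); the example sentences are invented placeholders.
    import sacrebleu

    # System outputs (e.g. from Google Translate or GPT-3) for a few Latin phrases.
    hypotheses = [
        "w obecnym stanie rzeczy",  # candidate translation of "in statu quo"
        "z urzedu",                 # candidate translation of "ex officio"
    ]

    # One reference translation per segment (more reference lists may be added).
    references = [
        ["w obecnym stanie rzeczy", "z urzędu"],
    ]

    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    print(f"BLEU = {bleu.score:.2f}")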

The work was financed by a research grant from the Polish Ministry of Science and Higher Education under the National Programme for the Development of Humanities for the years 2019–2023 (grant 11H 18 0413 86, grant funds received: 1,797,741 PLN).

Notes

  1. See https://cbdu.ijp.pan.pl/.

  2. Calculated based on a sample of 18 transcribed prints containing Latin-Polish dictionaries: 1,895 words out of 24,506 (about 7.7%).

  3. See https://cbdu.ijp.pan.pl/id/eprint/700/ and https://cbdu.ijp.pan.pl/id/eprint/2250/.

  4. See https://cbdu.ijp.pan.pl/id/eprint/4210/ and https://cbdu.ijp.pan.pl/id/eprint/4220/.

  5. See also https://korba.edu.pl/overview?lang=en.

  6. See e.g. https://cbdu.ijp.pan.pl/id/eprint/3760/, https://cbdu.ijp.pan.pl/id/eprint/3770/ and https://cbdu.ijp.pan.pl/id/eprint/3780/, where the content is available for the first item and the dictionary only for the last one, or the similar case of https://cbdu.ijp.pan.pl/id/eprint/13880/ and https://cbdu.ijp.pan.pl/id/eprint/13890/.

  7. With the fuzzy_index function from the Text::Fuzzy Perl module; a rough sketch of this kind of approximate lookup is given after these notes.

  8. https://translate.google.pl/?sl=la&tl=pl.

  9. https://translateking.com/.

  10. https://livetranslatehub.com/.

  11. https://translatiz.com/.

  12. https://translate.yandex.com/?lang=la-pl.

  13. https://www.contdict.com.

  14. https://www.latin-online-translation.com/.

  15. https://lingvanex.com/demo/.

  16. See e.g. its openly accessible “playground”, https://beta.openai.com/playground; a minimal API sketch is given after these notes.

  17. See e.g. chapter 6 of [8] for more examples and [6] for an evaluation of the correlation of various metrics with human judgements.

  18. See their review e.g. in the Related Work section of [3].
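
Note 7 above refers to the fuzzy_index function of Perl’s Text::Fuzzy module, used to locate dictionary entries approximately within the transcribed text. The following rough Python sketch of the same idea (approximate substring lookup by minimal edit distance) is illustrative only; the sample strings are invented and this is not the authors’ code.

    # Rough equivalent of an approximate-substring lookup (the paper used
    # Perl's Text::Fuzzy fuzzy_index); the sample strings below are invented.
    def fuzzy_index(needle: str, haystack: str) -> tuple[int, int]:
        """Return (end_position, edit_distance) of the closest match of
        `needle` inside `haystack`; the match may start anywhere because
        the first row of the DP table is initialised to zero."""
        n, m = len(needle), len(haystack)
        prev = [0] * (m + 1)
        for i in range(1, n + 1):
            curr = [i] + [0] * m
            for j in range(1, m + 1):
                cost = 0 if needle[i - 1] == haystack[j - 1] else 1
                curr[j] = min(prev[j] + 1,         # skip a needle character
                              curr[j - 1] + 1,     # skip a haystack character
                              prev[j - 1] + cost)  # match or substitute
            prev = curr
        best_end = min(range(m + 1), key=lambda j: prev[j])
        return best_end, prev[best_end]

    # Example: find a Latin phrase despite old-orthography differences.
    pos, dist = fuzzy_index("in statu quo", "y tak in ſtatu quo rzeczy zoſtały")
    print(pos, dist)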
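
Note 16 points to the GPT-3 “playground”; the same model family could also be queried programmatically through OpenAI’s legacy Completions API, roughly as in the sketch below. The model name, prompt wording and parameters are assumptions for illustration and do not claim to reproduce the paper’s setup.

    # Minimal sketch of asking GPT-3 for a Latin-to-Polish gloss via the
    # legacy OpenAI Completions API (openai<1.0); prompt and parameters
    # are illustrative assumptions, not the configuration used in the paper.
    import os
    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    def translate_latin_to_polish(phrase: str, context: str = "") -> str:
        prompt = (
            "Translate the Latin phrase into Polish.\n"
            + (f"Context: {context}\n" if context else "")
            + f"Latin: {phrase}\nPolish:"
        )
        response = openai.Completion.create(
            model="text-davinci-002",  # a GPT-3 model available in 2022
            prompt=prompt,
            temperature=0,             # deterministic output for evaluation
            max_tokens=64,
        )
        return response.choices[0].text.strip()

    # Isolated phrase vs. the same phrase embedded in its sentence context.
    print(translate_latin_to_polish("ex officio"))
    print(translate_latin_to_polish("ex officio", context="Sędzia działał ex officio."))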

References

  1. Agirre, E., Gonzalez-Agirre, A., Lopez-Gazpio, I., Maritxalar, M., Rigau, G., Uria, L.: SemEval-2016 Task 2: interpretable semantic textual similarity. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pp. 512–524. Association for Computational Linguistics, San Diego, California (2016). https://aclanthology.org/S16-1082

  2. Doddington, G.: Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In: Proceedings of the Second International Conference on Human Language Technology Research, pp. 138–145. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2002). https://aclanthology.org/www.mt-archive.info/HLT-2002-Doddington.pdf

  3. Freitag, M., Foster, G., Grangier, D., Ratnakar, V., Tan, Q., Macherey, W.: Experts, errors, and context: a large-scale study of human evaluation for machine translation. Trans. Assoc. Comput. Linguist. 9, 1460–1474 (2021). https://doi.org/10.1162/tacl_a_00437

  4. Gruszczyński, W., Adamiec, D., Bronikowska, R., Wieczorek, A.: Elektroniczny Korpus Tekstów Polskich z XVII i XVIII w. - problemy teoretyczne i warsztatowe. Poradnik Językowy 777(8), 32–51 (2020). https://doi.org/10.33896/porj.2020.8.3

  5. Gruszczyński, W., Ogrodniczuk, M.: Cyfrowa Biblioteka Druków Ulotnych Polskich i Polski dotyczących z XVI, XVII i XVIII w. w nauce i dydaktyce (Digital Library of Poland-related Old Ephemeral Prints in research and teaching; in Polish). In: Materiały konferencji Polskie Biblioteki Cyfrowe 2010 (Proceedings of the Polish Digital Libraries 2010 conference), pp. 23–27. Poznań, Poland (2010)


  6. Kocmi, T., Federmann, C., Grundkiewicz, R., Junczys-Dowmunt, M., Matsushita, H., Menezes, A.: To ship or not to ship: an extensive evaluation of automatic metrics for machine translation. In: Proceedings of the Sixth Conference on Machine Translation, pp. 478–494. Association for Computational Linguistics (2021). https://aclanthology.org/2021.wmt-1.57

  7. Lavie, A., Agarwal, A.: METEOR: an automatic metric for MT evaluation with high levels of correlation with human judgments. In: Proceedings of the Second Workshop on Statistical Machine Translation, pp. 228–231. Association for Computational Linguistics, Prague, Czech Republic (2007). https://aclanthology.org/W07-0734

  8. Maučec, M.S., Donaj, G.: Machine translation and the evaluation of its quality. In: Sadollah, A., Sinha, T.S. (eds.) Recent Trends in Computational Intelligence, chap. 8. IntechOpen, Rijeka (2019). https://doi.org/10.5772/intechopen.89063

  9. Ogrodniczuk, M., Gruszczyński, W.: Digital library of Poland-related old ephemeral prints: preserving multilingual cultural heritage. In: Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage, pp. 27–33. Hissar, Bulgaria (2011). http://www.aclweb.org/anthology/W11-4105

  10. Ogrodniczuk, M., Gruszczyński, W.: Digital library 2.0 – source of knowledge and research collaboration platform. In: Calzolari, N., et al. (eds.) Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), pp. 1649–1653. European Language Resources Association, Reykjavík, Iceland (2014). http://www.lrec-conf.org/proceedings/lrec2014/pdf/14_Paper.pdf

  11. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318. Association for Computational Linguistics, Philadelphia, Pennsylvania, USA (2002). https://aclanthology.org/P02-1040

  12. Popović, M.: CHRF: character n-gram F-score for automatic MT evaluation. In: Proceedings of the Tenth Workshop on Statistical Machine Translation, pp. 392–395. Association for Computational Linguistics, Lisbon, Portugal (2015). https://aclanthology.org/W15-3049

  13. Przepiórkowski, A., Bańko, M., Górski, R.L., Lewandowska-Tomaszczyk, B. (eds.): Narodowy Korpus Języka Polskiego. Wydawnictwo Naukowe PWN, Warsaw (2012)


  14. Snover, M., Dorr, B., Schwartz, R., Micciulla, L., Makhoul, J.: A study of translation edit rate with targeted human annotation. In: Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers, pp. 223–231. Association for Machine Translation in the Americas, Cambridge, Massachusetts, USA (2006). https://aclanthology.org/2006.amta-papers.25

  15. Vilar, D., Xu, J., D’Haro, L.F., Ney, H.: Error analysis of statistical machine translation output. In: Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006), pp. 697–702. European Language Resources Association (ELRA), Genoa, Italy (2006). http://www.lrec-conf.org/proceedings/lrec2006/pdf/413_pdf.pdf

  16. White, J.S., O’Connell, T.A., O’Mara, F.E.: The ARPA MT evaluation methodologies: evolution, lessons, and future approaches. In: Proceedings of the First Conference of the Association for Machine Translation in the Americas, pp. 193–205. Columbia, Maryland, USA (1994). https://aclanthology.org/1994.amta-1.25

Author information

Corresponding author

Correspondence to Maciej Ogrodniczuk.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ogrodniczuk, M., Kryńska, K. (2022). Evaluating Machine Translation of Latin Interjections in the Digital Library of Polish and Poland-related News Pamphlets. In: Tseng, YH., Katsurai, M., Nguyen, H.N. (eds) From Born-Physical to Born-Virtual: Augmenting Intelligence in Digital Libraries. ICADL 2022. Lecture Notes in Computer Science, vol 13636. Springer, Cham. https://doi.org/10.1007/978-3-031-21756-2_34

  • DOI: https://doi.org/10.1007/978-3-031-21756-2_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21755-5

  • Online ISBN: 978-3-031-21756-2

  • eBook Packages: Computer Science (R0)
