DOI: 10.1145/3632754.3633076
Extended abstract

CIRAL at FIRE 2023: Cross-Lingual Information Retrieval for African Languages

Published: 12 February 2024

Abstract

This paper provides a short overview of the CIRAL track at the Forum for Information Retrieval Evaluation (FIRE) 2023. CIRAL focused on cross-lingual information retrieval (CLIR) between English and four African languages: Hausa, Somali, Swahili, and Yoruba. To promote CLIR research for African languages and curate a test collection for them, community evaluations were carried out via pooling. We briefly discuss the task, dataset, relevance assessment, and results from the track.

Cited By

  • (2024) Shaping the Future of Endangered and Low-Resource Languages---Our Role in the Age of LLMs: A Keynote at ECIR 2024. ACM SIGIR Forum 58:1 (1-13). DOI: 10.1145/3687273.3687280. Online publication date: 7-Aug-2024

    Published In

    FIRE '23: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation
    December 2023
    170 pages
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Extended-abstract
    • Research
    • Refereed limited

    Conference

    FIRE 2023

    Acceptance Rates

    Overall Acceptance Rate 19 of 64 submissions, 30%
