DOI: 10.1145/3626772.3657943
Short Paper | Open Access

Language Fairness in Multilingual Information Retrieval

Published: 11 July 2024

Abstract

Multilingual information retrieval (MLIR) considers the problem of ranking documents in several languages for a query expressed in a language that may differ from any of those languages. Recent work has observed that approaches such as combining monolingual ranked lists (one per document language) or using multilingual pretrained language models exhibit a preference for one language over others, resulting in systematic unfair treatment of documents in different languages. This work proposes a language fairness metric that evaluates whether documents across different languages are fairly ranked, using statistical equivalence testing with the Kruskal-Wallis test. In contrast to most prior work on group fairness, we do not consider any language to be an unprotected group. Our proposed measure, PEER (Probability of Equal Expected Rank), is thus the first fairness metric specifically designed to capture the language fairness of MLIR systems. We demonstrate the behavior of PEER on artificial ranked lists. We also evaluate real MLIR systems on two publicly available benchmarks and show that PEER scores align with prior analytical findings on MLIR fairness. Our implementation is compatible with ir-measures and is available at http://github.com/hltcoe/peer_measure.
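
To make the metric concrete, the sketch below shows how a PEER-style score could be computed for a single ranked list in Python. It assumes, based only on the abstract, that PEER corresponds to the p-value of a Kruskal-Wallis test applied to the ranks at which relevant documents of each language are retrieved; the function name peer_sketch and its input format are illustrative assumptions, and the authors' ir-measures-compatible implementation at the repository above remains the authoritative definition.

# Illustrative sketch only: assumes PEER is (approximately) the p-value of a
# Kruskal-Wallis test over the ranks that relevant documents from each
# language receive in one ranked list. See github.com/hltcoe/peer_measure
# for the authors' actual implementation.
from collections import defaultdict
from scipy.stats import kruskal

def peer_sketch(ranking, doc_lang, relevant):
    """ranking: doc ids ordered best-first; doc_lang: doc id -> language code;
    relevant: set of relevant doc ids. Returns a p-value in [0, 1]: high
    values mean no evidence that any language's relevant documents are
    systematically ranked better or worse than another's."""
    ranks_by_lang = defaultdict(list)
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            ranks_by_lang[doc_lang[doc]].append(rank)
    groups = [ranks for ranks in ranks_by_lang.values() if ranks]
    if len(groups) < 2:  # nothing to compare across languages
        return 1.0
    _, p_value = kruskal(*groups)
    return p_value

# Example: relevant Persian documents consistently outrank Russian ones,
# so the estimated probability of equal expected rank is low.
ranking = ["fa1", "fa2", "ru1", "fa3", "ru2", "ru3"]
doc_lang = {d: ("fas" if d.startswith("fa") else "rus") for d in ranking}
print(peer_sketch(ranking, doc_lang, relevant=set(ranking)))

Reading the score as a p-value gives a number in [0, 1] without designating any language as the protected group, which matches the framing in the abstract; a low value signals that at least one language is treated systematically differently.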


Published In

SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2024, 3164 pages
ISBN: 9798400704314
DOI: 10.1145/3626772
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. language fairness
  2. multilingual retrieval
  3. statistical testing

Qualifiers

  • Short-paper

Conference

SIGIR 2024

Acceptance Rates

Overall Acceptance Rate: 792 of 3,983 submissions (20%)
