short-paper

Explain Like I am BM25: Interpreting a Dense Model's Ranked-List with a Sparse Approximation

Authors:

Michael Llordes,

Debasis Ganguly,

Chirag AgarwalAuthors Info & Claims

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 1976 - 1980

https://doi.org/10.1145/3539618.3591982

Published: 18 July 2023 Publication History

Abstract

Neural retrieval models (NRMs) have been shown to outperform their statistical counterparts owing to their ability to capture semantic meaning via dense document representations. These models, however, suffer from poor interpretability as they do not rely on explicit term matching. As a form of local per-query explanations, we introduce the notion of equivalent queries that are generated by maximizing the similarity between the NRM's results and the result set of a sparse retrieval system with the equivalent query. We then compare this approach with existing methods such as RM3-based query expansion and contrast differences in retrieval effectiveness and in the terms generated by each approach.

References

[1]

Avishek Anand, Lijun Lyu, Maximilian Idahl, Yumeng Wang, Jonas Wallat, and Zijian Zhang. 2022. Explainable Information Retrieval: A Survey. https://doi.org/10.48550/ARXIV.2211.02405

[2]

Anirban Chakraborty, Debasis Ganguly, and Owen Conlan. 2020. Retrievability based Document Selection for Relevance Feedback with Automatically Generated Query Variants. In CIKM. ACM, 125--134.

[3]

Jianbo Chen, Le S., Martin W., and Michael Jordan. 2018. Learning to Explain: An Information-Theoretic Perspective on Model Interpretation. In Proc. of ICML'19, Vol. 80. 882--891.

[4]

Ioannis Chios and Suzan Verberne. 2021. Helping results assessment by adding explainable elements to the deep relevance matching model. 3rd International Workshop on ExplainAble Recommendation and Search (EARS 2020). https://doi.org/10.48550/ARXIV.2106.05147

[5]

Jaekeol Choi, Jungin Choi, and Wonjong Rhee. 2020. Interpreting Neural Ranking Models using Grad-CAM. https://doi.org/10.48550/ARXIV.2005.05768

[6]

Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Ellen M Voorhees, and Ian Soboroff. 2021. TREC deep learning track: Reusable test collections in the large data regime. In Proceedings of the 44th international ACM SIGIR conference on research and development in information retrieval. 2369--2375.

Digital Library

[7]

Zhuyun Dai and Jamie Callan. 2020. Context-Aware Term Weighting For First Stage Passage Retrieval. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR '20). Association for Computing Machinery, New York, NY, USA, 1533--1536. https://doi.org/10.1145/3397271.3401204

Digital Library

[8]

Zeon Trevor Fernando, Jaspreet Singh, and Avishek Anand. 2019. A Study on the Interpretability of Neural Retrieval Models Using DeepSHAP (SIGIR'19). Association for Computing Machinery, New York, NY, USA, 1005--1008. https://doi.org/10.1145/3331184.3331312

Digital Library

[9]

Diane Kelly and Leif Azzopardi. 2015. How many results per page?: A Study of SERP Size, Search Behavior and User Experience. In SIGIR. ACM, 183--192.

[10]

Omar Khattab and Matei Zaharia. 2020. Colbert: Efficient and effective passage search via contextualized late interaction over bert. In Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval. 39--48.

Digital Library

[11]

Victor Lavrenko and W. Bruce Croft. 2001. Relevance Based Language Models. In Proc. of SIGIR '01. 120--127.

[12]

Hang Li, Shuai Wang, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, and Guido Zuccon. 2022. To Interpolate or not to Interpolate: PRF, Dense and Sparse Retrievers. In SIGIR. ACM, 2495--2500.

[13]

Scott M Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems, I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf

[14]

Ali Montazeralghaem, Hamed Zamani, and James Allan. 2020. A Reinforcement Learning Framework for Relevance Feedback. In SIGIR. ACM, 59--68.

[15]

Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. MS MARCO: A human generated machine reading comprehension dataset. choice, Vol. 2640 (2016), 660.

[16]

Rodrigo Nogueira, Jimmy Lin, and AI Epistemic. 2019. From doc2query to docTTTTTquery. Online preprint, Vol. 6 (2019).

[17]

Ronak Pradeep, Rodrigo Nogueira, and Jimmy Lin. 2021. The expando-mono-duo design pattern for text ranking with pretrained sequence-to-sequence models. arXiv preprint arXiv:2101.05667 (2021).

[18]

Jiaming Qu, Jaime Arguello, and Yue Wang. 2021. A Deep Analysis of an Explainable Retrieval Model for Precision Medicine Literature Search. In Advances in Information Retrieval: 43rd European Conference on IR Research, ECIR 2021.

Digital Library

[19]

Razieh Rahimi, Youngwoo Kim, Hamed Zamani, and James Allan. 2021. Explaining Documents' Relevance to Search Queries. https://doi.org/10.48550/ARXIV.2111.01314

[20]

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. ?Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Francisco, California, USA) (KDD '16). Association for Computing Machinery, New York, NY, USA, 1135--1144.

Digital Library

[21]

S.E. Robertson, S. Walker, M.M. Beaulieu, M. Gatford, and A. Payne. 1996. Okapi at TREC-4.

[22]

Dwaipayan Roy, Debasis Ganguly, Mandar Mitra, and Gareth J. F. Jones. 2016. Word Vector Compositionality based Relevance Feedback using Kernel Density Estimation. In CIKM. ACM, 1281--1290.

[23]

Stuart J. Russell and Peter Norvig. 2009. Artificial Intelligence: a modern approach 3 ed.). Pearson.

Digital Library

[24]

Procheta Sen, Debasis Ganguly, Manisha Verma, and Gareth J. F. Jones. 2020. The Curious Case of IR Explainability: Explaining Document Scores within and across Ranking Models. In SIGIR. ACM, 2069--2072.

[25]

Procheta Sen, Sourav Saha, Debasis Ganguly, Manisha Verma, and Dwaipayan Roy. 2022. Measuring and Comparing the Consistency of IR Models for Query Pairs with Similar and Different Information Needs. In CIKM. ACM, 4449--4453.

[26]

Suzan Verberne. 2018. Explainable IR for Personalizing Professional Search. In Joint Proceedings of the First International Workshop on Professional Search (ProfS2018); the Second Workshop on Knowledge Graphs and Semantics for Text Retrieval, Analysis, and Understanding (KG4IR); and the International Workshop on Data Search (DATA:SEARCH'18) Co-located with (ACM SIGIR 2018), Ann Arbor, Michigan, USA, July 12, 2018 (CEUR Workshop Proceedings, Vol. 2127), Laura Dietz, Laura Koesten, and Suzan Verberne (Eds.). CEUR-WS.org, 35--42. http://ceur-ws.org/Vol-2127/paper4-profs.pdf

[27]

Manisha Verma and Debasis Ganguly. 2019. LIRME: Locally Interpretable Ranking Model Explanation. In Proc. of SIGIR 2019. 1281--1284.

Digital Library

[28]

Michael Völske, Alexander Bondarenko, Maik Fröbe, Benno Stein, Jaspreet Singh, Matthias Hagen, and Avishek Anand. 2021. Towards Axiomatic Explanations for Neural Ranking Models. In Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval (Virtual Event, Canada) (ICTIR '21). Association for Computing Machinery, New York, NY, USA, 13--22. https://doi.org/10.1145/3471158.3472256

Digital Library

[29]

William Webber, Alistair Moffat, and Justin Zobel. 2010a. A Similarity Measure for Indefinite Rankings. ACM Trans. Inf. Syst., Vol. 28, 4, Article 20 (2010).

Digital Library

[30]

William Webber, Alistair Moffat, and Justin Zobel. 2010b. A Similarity Measure for Indefinite Rankings. ACM Trans. Inf. Syst., Vol. 28, 4, Article 20 (nov 2010), 38 pages. https://doi.org/10.1145/1852102.1852106

Digital Library

[31]

Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul Bennett, Junaid Ahmed, and Arnold Overwijk. 2020. Approximate nearest neighbor negative contrastive learning for dense text retrieval. arXiv preprint arXiv:2007.00808 (2020).

[32]

Chengxiang Zhai and John Lafferty. 2001. A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '01). Association for Computing Machinery, New York, NY, USA, 334--342.

Digital Library

[33]

Yongfeng Zhang, Yi Zhang, and Min Zhang. 2018. SIGIR 2018 Workshop on ExplainAble Recommendation and Search (EARS 2018). In The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '18). ACM.

Index Terms

Explain Like I am BM25: Interpreting a Dense Model's Ranked-List with a Sparse Approximation
1. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection
    2. Retrieval models and ranking

Recommendations

Sparse reconstruction by separable approximation

Finding sparse approximate solutions to large under-determined linear systems of equations is a common problem in signal/image processing and statistics. Basis pursuit, the least absolute shrinkage and selection operator (LASSO), wavelet-based ...
Sparse Approximation via Penalty Decomposition Methods

In this paper we consider sparse approximation problems, that is, general $l_0$ minimization problems with the $l_0$-``norm” of a vector being a part of constraints or objective function. In particular, we first study the first-order optimality conditions ...
Exact Sparse Approximation Problems via Mixed-Integer Programming: Formulations and Computational Performance

Sparse approximation addresses the problem of approximately fitting a linear model with a solution having as few non-zero components as possible. While most sparse estimation algorithms rely on suboptimal formulations, this work studies the performance ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2023

3567 pages

ISBN:9781450394086

DOI:10.1145/3539618

General Chairs:
Hsin-Hsi Chen
National Taiwan University
,
Wei-Jou (Edward) Duh
National Taiwan University
,
Hen-Hsen Huang
Academia Sinica
,
Program Chairs:
Makoto P. Kato
Spotify
,
Josiane Mothe
Universite de Toulouse
,
Barbara Poblete
University of Chile and Amazon Visiting Academic

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 July 2023

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Short-paper

Conference

SIGIR '23

Sponsor:

SIGIR

SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 23 - 27, 2023

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
161
Total Downloads

Downloads (Last 12 months)91
Downloads (Last 6 weeks)8

Reflects downloads up to 27 Jan 2025

Other Metrics

View Author Metrics

Citations

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten