DOI: 10.1145/3539813.3545144

The Role of Complex NLP in Transformers for Text Ranking

Published: 25 August 2022
Abstract

Even though term-based methods such as BM25 provide strong baselines in ranking, under certain conditions they are dominated by large pre-trained masked language models (MLMs) such as BERT. To date, the source of their effectiveness remains unclear. Is it their ability to truly understand meaning by modeling syntactic aspects? We answer this by manipulating the input order and position information in a way that destroys the natural sequence order of query and passage, and show that the model still achieves comparable performance. Overall, our results highlight that syntactic aspects do not play a critical role in the effectiveness of re-ranking with BERT. We point to other mechanisms, such as query-passage cross-attention and richer embeddings that capture word meanings based on aggregated context regardless of word order, as the main contributors to its superior performance.
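To make the manipulation concrete, here is a minimal sketch (not the authors' code; the checkpoint name and example texts are assumptions) of the kind of order ablation the abstract describes: score a query-passage pair with an off-the-shelf BERT-style cross-encoder re-ranker via the HuggingFace Transformers library, then score the same pair again after shuffling the passage words, and compare.

import random

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical choice of checkpoint; any BERT-based MS MARCO cross-encoder
# re-ranker would serve the same illustrative purpose.
MODEL_NAME = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def score(query: str, passage: str) -> float:
    # Relevance score the cross-encoder assigns to the (query, passage) pair.
    inputs = tokenizer(query, passage, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits.squeeze().item()

def shuffle_words(text: str, seed: int = 0) -> str:
    # Destroy the natural word order while keeping the bag of words intact.
    words = text.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

query = "what causes tides"
passage = ("Tides are caused by the gravitational pull of the moon "
           "and the sun acting on the oceans.")

print("original order:", score(query, passage))
print("shuffled order:", score(query, shuffle_words(passage)))

If word order were critical to the re-ranking score, the two numbers would diverge sharply; the paper's finding is that such order-destroying manipulations (evaluated over full ranking runs, and including manipulations of position information) leave effectiveness largely intact.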





      Published In

      ICTIR '22: Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval
      August 2022
      289 pages
      ISBN:9781450394123
      DOI:10.1145/3539813
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 25 August 2022


      Author Tags

      1. analysis
      2. neural bag-of-words
      3. neural re-ranking
      4. nlp in ranking
      5. transformers

      Qualifiers

      • Research-article

      Funding Sources

      • NWO

      Conference

      ICTIR '22

      Acceptance Rates

ICTIR '22 Paper Acceptance Rate: 32 of 80 submissions, 40%
Overall Acceptance Rate: 209 of 482 submissions, 43%


Cited By

• Revisiting Bag of Words Document Representations for Efficient Ranking with Transformers. ACM Transactions on Information Systems 42, 5 (2024), 1-27. https://doi.org/10.1145/3640460
• Efficient Neural Ranking Using Forward Indexes and Lightweight Encoders. ACM Transactions on Information Systems 42, 5 (2024), 1-34. https://doi.org/10.1145/3631939
• Injecting the score of the first-stage retriever as text improves BERT-based re-rankers. Discover Computing 27, 1 (2024). https://doi.org/10.1007/s10791-024-09435-8
• Explainability of Text Processing and Retrieval Methods. In Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation (2023), 153-157. https://doi.org/10.1145/3632754.3632944
