DOI: 10.1145/3539813.3545144

The Role of Complex NLP in Transformers for Text Ranking

Published: 25 August 2022
Abstract

Even though term-based methods such as BM25 provide strong baselines in ranking, under certain conditions they are dominated by large pre-trained masked language models (MLMs) such as BERT. To date, the source of their effectiveness remains unclear. Is it their ability to truly understand meaning by modeling syntactic aspects? We answer this by manipulating the input order and position information in a way that destroys the natural sequence order of query and passage, and show that the model still achieves comparable performance. Overall, our results highlight that syntactic aspects do not play a critical role in the effectiveness of re-ranking with BERT. We point to other mechanisms, such as query-passage cross-attention and richer embeddings that capture word meanings based on aggregated context regardless of word order, as the main contributors to its superior performance.
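To make the manipulation concrete, here is a minimal sketch (not the authors' code; the checkpoint name and example texts are assumptions) of the kind of order ablation the abstract describes: score a query-passage pair with an off-the-shelf BERT-style cross-encoder re-ranker via the HuggingFace Transformers library, then score the same pair again after shuffling the passage words, and compare.

import random

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical choice of checkpoint; any BERT-based MS MARCO cross-encoder
# re-ranker would serve the same illustrative purpose.
MODEL_NAME = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def score(query: str, passage: str) -> float:
    # Relevance score the cross-encoder assigns to the (query, passage) pair.
    inputs = tokenizer(query, passage, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits.squeeze().item()

def shuffle_words(text: str, seed: int = 0) -> str:
    # Destroy the natural word order while keeping the bag of words intact.
    words = text.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

query = "what causes tides"
passage = ("Tides are caused by the gravitational pull of the moon "
           "and the sun acting on the oceans.")

print("original order:", score(query, passage))
print("shuffled order:", score(query, shuffle_words(passage)))

If word order were critical to the re-ranking score, the two numbers would diverge sharply; the paper's finding is that such order-destroying manipulations (evaluated over full ranking runs, and including manipulations of position information) leave effectiveness largely intact.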





      Published In

      ICTIR '22: Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval
      August 2022
      289 pages
      ISBN:9781450394123
      DOI:10.1145/3539813
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 25 August 2022


      Author Tags

      1. analysis
      2. neural bag-of-words
      3. neural re-ranking
      4. nlp in ranking
      5. transformers

      Qualifiers

      • Research-article

      Funding Sources

      • NWO

      Conference

      ICTIR '22

      Acceptance Rates

ICTIR '22 Paper Acceptance Rate: 32 of 80 submissions, 40%
Overall Acceptance Rate: 209 of 482 submissions, 43%


Cited By

• Revisiting Bag of Words Document Representations for Efficient Ranking with Transformers. ACM Transactions on Information Systems 42, 5 (2024), 1-27. https://doi.org/10.1145/3640460
• Efficient Neural Ranking Using Forward Indexes and Lightweight Encoders. ACM Transactions on Information Systems 42, 5 (2024), 1-34. https://doi.org/10.1145/3631939
• Injecting the score of the first-stage retriever as text improves BERT-based re-rankers. Discover Computing 27, 1 (2024). https://doi.org/10.1007/s10791-024-09435-8
• Explainability of Text Processing and Retrieval Methods. In Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation (2023), 153-157. https://doi.org/10.1145/3632754.3632944
