Article

MIRS: [MASK] Insertion Based Retrieval Stabilizer for Query Variations

Published: 28 August 2023

Abstract

Pre-trained Language Models (PLMs) have greatly pushed the frontier of document retrieval tasks. Recent studies, however, show that PLMs are vulnerable to query variations, i.e., queries containing misspellings, word re-orderings of the original queries, and the like. Despite the increasing interest in robustifying retriever performance, the impact of query variations has not been fully explored. To effectively address this problem, this paper revisits Masked-Language Modeling (MLM) and proposes a robust fine-tuning algorithm, termed the [MASK] Insertion based Retrieval Stabilizer (MIRS). The proposed algorithm differs from existing methods in that it injects [MASK] tokens into query variations and further encourages representation similarity between each original query and its variation. In contrast to MLM, the traditional [MASK] substitution-then-prediction is de-emphasized in MIRS. Additionally, an in-depth analysis of the algorithm reveals that: (1) the latent representation (or semantics) of the original query forms a convex hull, and the impact of a query variation can be quantified as a “distortion” of this hull through the deviation of its vertices; and (2) the inserted [MASK] tokens play a significant role in enlarging the intersection between the newly formed hull (after variation) and the original one, thereby preserving more of the original query's semantics. With the proposed [MASK] injection, MIRS improves retrieval accuracy by 1.8 absolute MRR@10 points on average, verified against 5 baselines on 3 public datasets with 4 types of query variations. We also provide extensive ablation studies to investigate hyperparameter sensitivity, to break down the model into individual components and manifest their efficacy, and, further, to evaluate out-of-domain model generalizability.
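The abstract describes the core mechanism: inject [MASK] tokens into a query variation and pull its representation toward that of the original query. The following is a minimal sketch of that idea, assuming a BERT-style encoder with mean pooling and a random insertion policy; the helper names (insert_masks, mirs_similarity_loss), the insertion ratio, and the cosine-similarity loss are illustrative assumptions and not the authors' exact formulation.

    # Sketch of the [MASK]-insertion idea from the abstract (illustrative only).
    import random
    import torch
    import torch.nn.functional as F
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    encoder = AutoModel.from_pretrained("bert-base-uncased")

    def insert_masks(query: str, ratio: float = 0.15) -> str:
        # Inject [MASK] tokens at random positions of a query variation.
        # Unlike MLM, no token is replaced and no prediction head is trained here;
        # the [MASK] tokens are simply inserted (the insertion policy is assumed).
        words = query.split()
        n_insert = max(1, int(len(words) * ratio))
        for _ in range(n_insert):
            pos = random.randint(0, len(words))
            words.insert(pos, tokenizer.mask_token)
        return " ".join(words)

    def encode(texts):
        # Mean-pooled encoder representations (pooling choice is an assumption).
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        hidden = encoder(**batch).last_hidden_state          # (B, L, H)
        mask = batch["attention_mask"].unsqueeze(-1)         # (B, L, 1)
        return (hidden * mask).sum(1) / mask.sum(1)          # (B, H)

    def mirs_similarity_loss(original: str, variation: str) -> torch.Tensor:
        # Encourage the representation of the masked variation to stay close
        # to the representation of the original query (1 - cosine similarity).
        reps = encode([original, insert_masks(variation)])
        return 1.0 - F.cosine_similarity(reps[0:1], reps[1:2]).squeeze()

    # Example: a misspelled variation of an original query.
    loss = mirs_similarity_loss(
        "what is the capital of france",
        "wht is the captal of france",
    )
    print(float(loss))

In a full fine-tuning setup this similarity term would be combined with the usual retrieval (contrastive) objective; that combination and its weighting are not specified in the abstract and are left out of the sketch.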


Cited By

  • (2024) Masking the unknown. In: Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, pp. 3997–4010. https://doi.org/10.5555/3702676.3702863. Online publication date: 15 July 2024


Published In

Database and Expert Systems Applications: 34th International Conference, DEXA 2023, Penang, Malaysia, August 28–30, 2023, Proceedings, Part I
August 2023, 560 pages
ISBN: 978-3-031-39846-9
DOI: 10.1007/978-3-031-39847-6

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 28 August 2023

Author Tags

  1. Document Retrieval
  2. Masked-Language Modeling
  3. Model Robustness
  4. Query Variations
  5. Query Representation

Qualifiers

  • Article

