A contrastive neural disentanglement approach for query performance prediction

Machine Learning

Abstract

We propose contrastive disentangled representation for query performance prediction (CoDiR-QPP), an approach that estimates search query performance by disentangling query content semantics from query difficulty. The approach uses neural disentanglement to isolate the information need expressed in a search query from the complexities that affect retrieval performance. Motivated by empirical observations that different query formulations of the same information need can significantly change retrieval outcomes, we hypothesize that separating content semantics from query difficulty can improve query performance prediction. Using contrastive learning, CoDiR-QPP distinguishes between well-performing and poorly performing query variants, which facilitates estimating the performance of a given query. Our extensive experiments on four standard benchmark datasets demonstrate that CoDiR-QPP outperforms state-of-the-art baselines in predicting query performance, offering improved semantic similarity computation, higher Kendall \(\tau\) and Spearman \(\rho\) correlations, and lower scaled Mean Absolute Ranking Error (sMARE).
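
The evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper's code: the function names and toy values are hypothetical, scores are assumed to contain no ties, Kendall \(\tau\) is computed in its tau-a form, and sMARE is taken to be the mean over queries of |predicted rank - actual rank| / |Q|, following the definition usually attributed to Faggioli et al.

```python
from itertools import combinations


def ranks(scores):
    """Rank positions (1 = highest score); assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    position = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        position[idx] = rank
    return position


def kendall_tau(predicted, actual):
    """Kendall tau-a: (concordant - discordant) / number of query pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(predicted)), 2):
        sign = (predicted[i] - predicted[j]) * (actual[i] - actual[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(predicted) * (len(predicted) - 1) / 2
    return (concordant - discordant) / n_pairs


def spearman_rho(predicted, actual):
    """Spearman rho for untied data: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(predicted)
    d2 = sum((rp - ra) ** 2 for rp, ra in zip(ranks(predicted), ranks(actual)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def smare(predicted, actual):
    """Mean over queries of |predicted rank - actual rank| / |Q| (lower is better)."""
    n = len(predicted)
    total = sum(abs(rp - ra) for rp, ra in zip(ranks(predicted), ranks(actual)))
    return total / (n * n)


# Hypothetical toy data: one predicted score and one true effectiveness value per query.
predicted_scores = [0.9, 0.4, 0.6, 0.1]
actual_effectiveness = [0.50, 0.30, 0.20, 0.05]

print(kendall_tau(predicted_scores, actual_effectiveness))
print(spearman_rho(predicted_scores, actual_effectiveness))
print(smare(predicted_scores, actual_effectiveness))
```

Higher \(\tau\) and \(\rho\) indicate that the predictor orders queries more like their true effectiveness does, whereas sMARE is an error and a better predictor yields a smaller value.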

Data availability

All data and code needed to reproduce the results are available at https://github.com/sara-salamat/DisentangledQPP

Notes

  1. https://github.com/sara-salamat/DisentangledQPP

  2. https://huggingface.co/datasets/PhilipMay/stsb_multi_mt

Author information

Contributions

Sa.S. ran experiments and wrote the methodology and experiment sections. N.A. ran baselines and helped in writing other parts. E.B. and M.Z. supervised the research and writing of this project. Sh.S. helped with ideation and writing. A.B. helped with ideation. All authors reviewed the manuscript.

Corresponding author

Correspondence to Sara Salamat.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Editors: Longbing Cao, David Anastasiu, Qi Zhang, Xiaolin Huang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Salamat, S., Arabzadeh, N., Seyedsalehi, S. et al. A contrastive neural disentanglement approach for query performance prediction. Mach Learn 114, 109 (2025). https://doi.org/10.1007/s10994-025-06752-x

  • DOI: https://doi.org/10.1007/s10994-025-06752-x
