Abstract
Simulating user interactions enables a more user-oriented evaluation of information retrieval (IR) systems. While user simulations are cost-efficient and reproducible, many approaches lack fidelity with respect to real user behavior. Most notably, current user models neglect the user's context, which is the primary driver of perceived relevance and of the interactions with the search results. To this end, this work introduces the simulation of context-driven query reformulations. The proposed query generation methods build upon recent Large Language Model (LLM) approaches and consider the user's context throughout the simulation of a search session. Compared to simple context-free query generation approaches, these methods show better effectiveness and allow the simulation of more efficient IR sessions. Similarly, our evaluations consider more interaction context than current session-based measures and reveal complementary insights beyond the established evaluation protocols. We conclude with directions for future work and provide an entirely open experimental setup.
Acknowledgements
This work was supported by Klaus Tschira Stiftung (JoIE - 00.003.2020) and Deutsche Forschungsgemeinschaft (RESIRE - 509543643).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Engelmann, B., Breuer, T., Friese, J.I., Schaer, P., Fuhr, N. (2024). Context-Driven Interactive Query Simulations Based on Generative Large Language Models. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14609. Springer, Cham. https://doi.org/10.1007/978-3-031-56060-6_12
Print ISBN: 978-3-031-56059-0
Online ISBN: 978-3-031-56060-6