Abstract
Protecting the privacy of users who query an Information Retrieval (IR) system is of utmost importance. The problem is exacerbated when the IR system does not cooperate in satisfying the user’s privacy requirements. To address this, obfuscation techniques split the user’s sensitive query into multiple non-sensitive queries that can be safely transmitted to the IR system. To generate such queries, current approaches rely on lexical databases, such as WordNet, or on word co-occurrence heuristics. Meanwhile, advances in Natural Language Processing (NLP) have shown the power of Differential Privacy (DP) in releasing privacy-preserving text for entirely different purposes, such as spam detection and sentiment analysis. We investigate, for the first time, whether DP mechanisms originally designed for specific NLP tasks can be used effectively in IR to obfuscate queries, and we compare their performance with state-of-the-art IR obfuscation techniques. Our empirical evaluation shows that the Vickrey DP mechanism based on the Mahalanobis norm, with privacy budget \(\epsilon \in [10, 12.5]\), achieves state-of-the-art privacy protection and improved effectiveness. Furthermore, unlike previous approaches, which are essentially on/off, DP lets users tune their desired level of privacy protection by changing the privacy budget \(\epsilon\), offering a trade-off between effectiveness and privacy.
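To make the idea of embedding-based DP obfuscation concrete, the following is a minimal sketch of the simplest mechanism of this family: each query term's embedding is perturbed with noise whose density is proportional to \(\exp(-\epsilon \lVert z \rVert)\), and the term is replaced by the vocabulary word nearest to the noisy vector. This is not the Vickrey mechanism with the Mahalanobis norm evaluated in the paper; it uses plain Euclidean noise and nearest-neighbour replacement. The toy vocabulary, the embedding values, the function names, and the choice to drop out-of-vocabulary terms are all assumptions made for illustration.

```python
import math
import random

# Toy 3-dimensional "embeddings"; hypothetical values for illustration only.
EMBEDDINGS = {
    "cancer":    (0.9, 0.1, 0.0),
    "tumor":     (0.8, 0.2, 0.1),
    "disease":   (0.6, 0.4, 0.1),
    "treatment": (0.4, 0.7, 0.2),
    "weather":   (0.0, 0.1, 0.9),
}


def sample_noise(dim, epsilon, rng):
    """Draw a noise vector with density proportional to exp(-epsilon * ||z||)."""
    # Direction: a normalised Gaussian vector is uniform on the unit sphere.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    # Magnitude: Gamma distribution with shape=dim and scale=1/epsilon.
    magnitude = rng.gammavariate(dim, 1.0 / epsilon)
    return [magnitude * x / norm for x in direction]


def nearest_word(vector, embeddings):
    """Return the vocabulary word whose embedding is closest (Euclidean)."""
    return min(embeddings, key=lambda w: math.dist(vector, embeddings[w]))


def obfuscate_query(query, epsilon, embeddings=EMBEDDINGS, rng=None):
    """Replace each in-vocabulary term with the word nearest to its noisy embedding.

    Assumption of this sketch: out-of-vocabulary terms are dropped rather
    than sent unchanged, so that no unperturbed term leaves the client.
    """
    rng = rng or random.Random()
    obfuscated = []
    for term in query.lower().split():
        if term not in embeddings:
            continue
        vec = embeddings[term]
        noise = sample_noise(len(vec), epsilon, rng)
        noisy = [v + n for v, n in zip(vec, noise)]
        obfuscated.append(nearest_word(noisy, embeddings))
    return " ".join(obfuscated)


if __name__ == "__main__":
    for eps in (1.0, 10.0, 100.0):
        print(eps, obfuscate_query("cancer treatment", eps))
```

A smaller \(\epsilon\) yields larger noise, so terms are more likely to be replaced by distant words; a larger \(\epsilon\) tends to leave the query unchanged. The scale at which this transition happens depends on the geometry of the embedding space, so the \(\epsilon\) values in this toy example are not comparable to the range reported above.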
Acknowledgments
This work has received support from CAMEO, PRIN 2022 n. 2022ZLL7MW.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Faggioli, G., Ferro, N. (2024). Query Obfuscation for Information Retrieval Through Differential Privacy. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14608. Springer, Cham. https://doi.org/10.1007/978-3-031-56027-9_17
DOI: https://doi.org/10.1007/978-3-031-56027-9_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56026-2
Online ISBN: 978-3-031-56027-9
eBook Packages: Computer Science (R0)