
Query Obfuscation for Information Retrieval Through Differential Privacy

  • Conference paper
Advances in Information Retrieval (ECIR 2024)

Abstract

Protecting the privacy of a user querying an Information Retrieval (IR) system is of utmost importance. The problem is exacerbated when the IR system is not cooperative in satisfying the user’s privacy requirements. To address this, obfuscation techniques split the user’s sensitive query into multiple non-sensitive ones that can be safely transmitted to the IR system. To generate such queries, current approaches rely on lexical databases, such as WordNet, or on word co-occurrence heuristics. At the same time, advances in Natural Language Processing (NLP) have shown the power of Differential Privacy (DP) in releasing privacy-preserving text for completely different purposes, such as spam detection and sentiment analysis. We investigate for the first time whether DP mechanisms, originally designed for specific NLP tasks, can be used effectively in IR to obfuscate queries. We also compare their performance with that of state-of-the-art obfuscation techniques in IR. Our empirical evaluation shows that the Vickrey DP mechanism based on the Mahalanobis norm with privacy budget \(\epsilon \in [10, 12.5]\) achieves state-of-the-art privacy protection and improved effectiveness. Furthermore, unlike previous approaches, which are essentially on/off, DP lets users adjust their desired level of privacy protection by changing the privacy budget \(\epsilon\), offering a trade-off between effectiveness and privacy.
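To make the idea of DP-based query obfuscation concrete, the following is a minimal sketch of a word-level mechanism in the style of the calibrated multivariate perturbations used in the NLP literature: each query term's embedding is perturbed with noise whose density is proportional to \(\exp(-\epsilon \lVert z \rVert_2)\), and the noisy vector is mapped back to the nearest vocabulary word. This is the simpler Euclidean-norm variant, not the Vickrey mechanism with the Mahalanobis norm evaluated in the paper. The vocabulary, the 3-dimensional embedding values, and the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

# Toy vocabulary with made-up 3-dimensional embeddings (assumption: a real
# system would use pretrained word embeddings instead).
VOCAB = ["flu", "fever", "cough", "car", "engine", "road"]
EMB = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.7, 0.3, 0.0],
    [0.0, 0.9, 0.8],
    [0.1, 0.8, 0.9],
    [0.0, 0.7, 0.7],
])


def sample_noise(dim, epsilon, rng):
    """Draw z in R^dim with density proportional to exp(-epsilon * ||z||_2)."""
    # Uniform direction on the unit sphere.
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    # The norm of such a vector follows Gamma(shape=dim, scale=1/epsilon).
    magnitude = rng.gamma(shape=dim, scale=1.0 / epsilon)
    return magnitude * direction


def obfuscate_word(word, epsilon, rng):
    """Perturb the word's embedding and return the nearest vocabulary word."""
    vec = EMB[VOCAB.index(word)]
    noisy = vec + sample_noise(EMB.shape[1], epsilon, rng)
    nearest = np.argmin(np.linalg.norm(EMB - noisy, axis=1))
    return VOCAB[nearest]


def obfuscate_query(query, epsilon, seed=None):
    """Obfuscate each in-vocabulary term of a whitespace-tokenised query."""
    rng = np.random.default_rng(seed)
    # Simplification of this sketch: out-of-vocabulary terms pass through
    # unchanged, which a real deployment would have to handle differently.
    return " ".join(
        obfuscate_word(w, epsilon, rng) if w in VOCAB else w
        for w in query.split()
    )


if __name__ == "__main__":
    for eps in (0.5, 5.0, 50.0):
        print(eps, obfuscate_query("flu fever cough", eps, seed=0))
```

In this sketch a smaller \(\epsilon\) injects more noise, so terms are replaced more often and by more distant words, while a larger \(\epsilon\) tends to leave the query closer to the original. That is the effectiveness/privacy dial the abstract refers to, shown here only on toy data.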



Acknowledgments

This work has received support from CAMEO, PRIN 2022 n. 2022ZLL7MW.

Author information

Corresponding author

Correspondence to Guglielmo Faggioli.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Faggioli, G., Ferro, N. (2024). Query Obfuscation for Information Retrieval Through Differential Privacy. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14608. Springer, Cham. https://doi.org/10.1007/978-3-031-56027-9_17

  • DOI: https://doi.org/10.1007/978-3-031-56027-9_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56026-2

  • Online ISBN: 978-3-031-56027-9

  • eBook Packages: Computer Science, Computer Science (R0)
