DOI: 10.1145/3366030.3366081

A Privacy-Preserving Similarity Search Scheme over Encrypted Word Embeddings

Published: 22 February 2020 Publication History

Abstract

The recent evolution of cloud computing platforms has attracted more data than ever before. Today, even the most sensitive data are outsourced, so protection is essential to ensure that privacy is not traded for the convenience that cloud platforms provide. Traditional symmetric encryption schemes provide good protection; however, they undermine the merits of cloud computing. Attempts have been made to obtain a scheme that achieves both functionality and protection. However, the features of existing searchable encryption schemes tend to lag behind the latest findings in information retrieval (IR).
In this study, we propose a privacy-preserving similar-document search system based on Simhash. Our scheme is open to the latest machine-learning-based IR schemes, and its performance is tuned with a VP-tree-based index that is optimized for security. Analysis and various tests on real-world datasets demonstrate the scheme's security and efficiency.
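
As a rough illustration of the Simhash idea the abstract refers to, the following Python sketch hashes a document vector into a 64-bit fingerprint using random hyperplanes and compares fingerprints by Hamming distance. It is only a plaintext sketch under stated assumptions, not the paper's construction: it involves no encryption and no VP-tree, and the names `token_vector`, `document_vector`, `simhash`, and `hamming`, the dimension of 16, and the hash-seeded pseudo-random vectors that stand in for real word embeddings are all hypothetical.

```python
import hashlib
import random
from typing import List, Sequence


def token_vector(token: str, dim: int = 16) -> List[float]:
    """Hypothetical stand-in for a word embedding.

    A real system would look the token up in a trained embedding model;
    here a deterministic pseudo-random vector is derived from a hash.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def document_vector(tokens: Sequence[str], dim: int = 16) -> List[float]:
    """Average the token vectors into a single document vector."""
    total = [0.0] * dim
    for token in tokens:
        for i, value in enumerate(token_vector(token, dim)):
            total[i] += value
    count = max(len(tokens), 1)
    return [value / count for value in total]


def simhash(vector: Sequence[float], bits: int = 64, seed: int = 0) -> int:
    """Random-hyperplane hash: one sign bit per hyperplane.

    The same seed must be used for every vector so that all fingerprints
    are produced by the same set of hyperplanes.
    """
    rng = random.Random(seed)
    fingerprint = 0
    for _ in range(bits):
        plane = [rng.gauss(0.0, 1.0) for _ in vector]
        dot = sum(p * x for p, x in zip(plane, vector))
        fingerprint = (fingerprint << 1) | (1 if dot >= 0 else 0)
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    doc_a = document_vector(["encrypted", "cloud", "search"])
    doc_b = document_vector(["encrypted", "cloud", "storage"])
    print(hamming(simhash(doc_a), simhash(doc_b)))
```

Vectors that point in similar directions tend to fall on the same side of most hyperplanes, so a small Hamming distance between fingerprints suggests similar documents. Because Hamming distance is a metric, fingerprints of this kind can be organized in a metric-space index such as a VP-tree.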

Cited By

  • (2023) Private Web Search with Tiptoe. Proceedings of the 29th Symposium on Operating Systems Principles, 396-416. DOI: 10.1145/3600006.3613134. Online publication date: 23-Oct-2023
  • (2023) Privacy-Preserving Retrieval Scheme Over Encrypted Medical Records with Relevance Ranking. Parallel and Distributed Computing, Applications and Technologies, 81-92. DOI: 10.1007/978-981-99-8211-0_9. Online publication date: 29-Nov-2023

Published In

iiWAS2019: Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services
December 2019
709 pages
In-Cooperation

  • JKU: Johannes Kepler Universität Linz
  • @WAS: International Organization of Information Integration and Web-based Applications and Services

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. LSH
  2. VP-tree
  3. cloud computing
  4. searchable encryption
  5. similarity search

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

iiWAS2019

Article Metrics

  • Downloads (last 12 months): 130
  • Downloads (last 6 weeks): 10

Reflects downloads up to 25 Dec 2024
