DOI: 10.1145/3127479.3127485

DLSH: a distribution-aware LSH scheme for approximate nearest neighbor query in cloud computing

Published: 24 September 2017

Abstract

Cloud computing systems must process and analyze massive high-dimensional data in real time. Approximate queries can return timely results with acceptable accuracy, which avoids consuming large amounts of resources. Locality Sensitive Hashing (LSH) preserves data locality and supports approximate queries. However, because LSH chooses its hash functions randomly, it needs many functions to guarantee query accuracy, and the extra computation and storage overheads degrade its performance in practice. To reduce these overheads and deliver high performance, we propose a distribution-aware scheme, called DLSH, that offers a cost-effective approximate nearest neighbor query service for cloud computing. The idea of DLSH is to use the principal components of the data distribution as the projection vectors of the hash functions in LSH, and then to quantify the weight of each hash function and adjust the interval value in each hash table. We then refine the queried result set based on hit frequency, which significantly decreases the time overhead of distance computation. Extensive experiments on a large-scale cloud computing testbed demonstrate significant improvements across multiple system performance metrics. We have released the source code of DLSH for public use.
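
The following is a minimal sketch of the pipeline outlined in the abstract, not the released DLSH code. It assumes one hash table per principal direction, an interval width proportional to the standard deviation along that direction, and a hash-function weight equal to that direction's share of the variance; the abstract does not specify these rules. The names `principal_directions`, `build_index`, `query`, `base_width`, and `top` are hypothetical.

```python
import math
import random
from collections import Counter


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def principal_directions(points, k, iters=200, seed=0):
    """Top-k principal directions and variances via power iteration."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    result = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [dot(row, v) for row in cov]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:  # no variance left to capture
                break
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in cov])
        result.append((v, lam))
        # Deflate so the next pass finds the next direction.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return result


def build_index(points, k=2, base_width=1.0):
    """One table per principal direction: (vector, width, weight, buckets)."""
    dirs = principal_directions(points, k)
    total = sum(lam for _, lam in dirs) or 1.0
    tables = []
    for v, lam in dirs:
        # Assumed rule: interval scales with the spread along v.
        width = base_width * math.sqrt(max(lam, 1e-12))
        buckets = {}
        for idx, p in enumerate(points):
            buckets.setdefault(math.floor(dot(v, p) / width), []).append(idx)
        # Assumed rule: weight is the direction's share of the variance.
        tables.append((v, width, lam / total, buckets))
    return tables


def query(points, tables, q, top=3):
    """Rank candidates by weighted hit frequency, then check distances."""
    hits = Counter()
    for v, width, weight, buckets in tables:
        for idx in buckets.get(math.floor(dot(v, q) / width), []):
            hits[idx] += weight
    candidates = [idx for idx, _ in hits.most_common(top)]
    if not candidates:
        return None
    # Exact distances are computed only for the shortlisted candidates.
    return min(candidates, key=lambda i: math.dist(points[i], q))
```

Calling `tables = build_index(points)` and then `query(points, tables, q)` returns the index of an approximate nearest neighbor, or `None` when no bucket matches. A smaller `top` saves distance computations at the cost of a higher chance of missing the true neighbor.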

Cited By

  • (2019) Bandwidth and Energy Efficient Image Sharing for Situation Awareness in Disasters. IEEE Transactions on Parallel and Distributed Systems 30:1 (15-28). DOI: 10.1109/TPDS.2018.2859930. Online publication date: 1-Jan-2019
  • (2019) Towards Faster Distributed Deep Learning Using Data Hashing Techniques. 2019 IEEE International Conference on Big Data (Big Data) (6189-6191). DOI: 10.1109/BigData47090.2019.9006378. Online publication date: Dec-2019

Published In

SoCC '17: Proceedings of the 2017 Symposium on Cloud Computing
September 2017
672 pages
ISBN: 9781450350280
DOI: 10.1145/3127479
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. approximate nearest neighbor query
  2. cloud computing
  3. locality sensitive hashing
  4. storage systems

Qualifiers

  • Research-article

Funding Sources

  • National Key Research and Development Program of China
  • State Key Laboratory of Computer Architecture

Conference

SoCC '17: ACM Symposium on Cloud Computing
September 24 - 27, 2017
Santa Clara, California

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%
