Exploiting accelerators for efficient high dimensional similarity search

Authors:

Sandeep R. Agrawal,

Christopher M. Dee,

Alvin R. Lebeck

PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Article No.: 3, Pages 1 - 12

https://doi.org/10.1145/2851141.2851144

Published: 27 February 2016

Abstract

Similarity search finds the most similar matches in an object collection for a given query, which makes it an important problem across a wide range of disciplines such as web search, image recognition and protein sequencing. Practical implementations of High Dimensional Similarity Search (HDSS) search across billions of possible solutions for multiple queries in real time, so performance and efficiency are a significant challenge. Existing clusters and datacenters perform search on commercial multicore hardware, which may not provide optimal performance or performance per watt.
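To make the problem concrete, here is a minimal sketch of the simplest form of similarity search: an exhaustive scan that scores every object in the collection against one query and keeps the k best matches. It assumes objects and queries are sparse vectors stored as `{feature: weight}` dicts and uses cosine similarity as the measure; both are illustrative choices rather than the paper's data layout or metric, and `cosine` and `top_k` are hypothetical names.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {feature: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector for the dot product
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, collection, k):
    """Exhaustive (brute-force) search: score every object, keep the k best.

    Returns a list of (score, object_id) pairs, highest score first.
    """
    scored = ((cosine(query, vec), obj_id) for obj_id, vec in collection.items())
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    # Toy bag-of-words collection (hypothetical data).
    docs = {
        "doc1": {"gpu": 1.0, "search": 1.0},
        "doc2": {"cpu": 1.0},
        "doc3": {"gpu": 1.0},
    }
    for score, doc_id in top_k({"gpu": 1.0, "search": 1.0}, docs, k=2):
        print(f"{doc_id}: {score:.3f}")
```

Every query touches every object, so the cost of this naive approach grows linearly with the collection size for each query; at billions of candidates that per-query scan is what makes real-time HDSS expensive.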

This work explores the performance, power and cost benefits of using throughput accelerators such as GPUs to perform similarity search for query cohorts, even under tight deadlines. We propose optimized implementations of similarity search for both the host and the accelerator. Augmenting existing Xeon servers with accelerators yields a 3× improvement in throughput per machine, which translates into a more than 2.5× reduction in cost of ownership, even for discounted Xeon servers. Replacing a Xeon-based cluster with an accelerator-based cluster for similarity search reduces the total cost of ownership by more than 6× to 16×, while consuming significantly less power than an ARM-based cluster.
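One way to read "query cohorts" is batching: instead of scanning the collection once per query, a cohort of queries is answered in a single pass, so each object is read once and scored against every query in the cohort that shares a feature with it. The sketch below assumes that interpretation, cosine similarity over `{feature: weight}` dicts, and a small inverted index built over the cohort; the scores it accumulates are the nonzero entries of the sparse product of the normalized query matrix and the transposed object matrix. The names `normalize` and `cohort_top_k` are hypothetical, and this is not presented as the paper's implementation.

```python
import heapq
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse {feature: weight} vector to unit L2 norm."""
    n = math.sqrt(sum(w * w for w in vec.values()))
    return {f: w / n for f, w in vec.items()} if n else {}


def cohort_top_k(queries, collection, k):
    """Answer a whole cohort of queries with one pass over the collection.

    queries:    {query_id: {feature: weight}}
    collection: {object_id: {feature: weight}}
    Returns {query_id: [(score, object_id), ...]} best-first, where score is
    cosine similarity. Objects sharing no feature with a query are skipped.
    """
    if k <= 0:
        return {qid: [] for qid in queries}

    # Inverted index over the cohort: feature -> [(query_id, weight), ...].
    index = defaultdict(list)
    for qid, q in queries.items():
        for f, w in normalize(q).items():
            index[f].append((qid, w))

    heaps = {qid: [] for qid in queries}  # per-query min-heaps, size <= k
    for obj_id, vec in collection.items():  # single streaming pass
        scores = defaultdict(float)
        for f, w in normalize(vec).items():
            for qid, qw in index.get(f, ()):
                scores[qid] += qw * w  # one nonzero term of Q * D^T
        for qid, s in scores.items():
            heap = heaps[qid]
            if len(heap) < k:
                heapq.heappush(heap, (s, obj_id))
            elif (s, obj_id) > heap[0]:
                heapq.heapreplace(heap, (s, obj_id))  # evict current worst
    return {qid: sorted(h, reverse=True) for qid, h in heaps.items()}


if __name__ == "__main__":
    docs = {
        "doc1": {"gpu": 1.0, "search": 1.0},
        "doc2": {"cpu": 1.0},
        "doc3": {"gpu": 1.0},
    }
    cohort = {"q1": {"gpu": 1.0, "search": 1.0}, "q2": {"cpu": 1.0}}
    for qid, hits in cohort_top_k(cohort, docs, k=2).items():
        print(qid, [(doc_id, round(s, 3)) for s, doc_id in hits])
```

Batching amortizes the pass over the collection across the whole cohort, which suits throughput-oriented hardware; the trade-off is that a query may wait for its cohort to form, which is why meeting tight deadlines is part of the problem.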



Index Terms

1. Computer systems organization
  1. Architectures
    1. Distributed architectures
      1. Client-server architectures
2. Information systems
  1. Data management systems
    1. Middleware for databases
      1. Application servers
      2. Database web servers


Information

Published In


PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 2016

420 pages

ISBN: 9781450340922

DOI: 10.1145/2851141

General Chair: Rafael Asenjo, University of Málaga, Spain

Program Chair: Tim Harris, Oracle Labs, Cambridge, UK

ACM SIGPLAN Notices Volume 51, Issue 8
PPoPP '16
August 2016
405 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3016078
Editor: Matthew Fluet

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

PPoPP '16

PPoPP '16: 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

March 12 - 16, 2016

Barcelona, Spain

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Bibliometrics & Citations

Article Metrics

Total Citations: 7

Total Downloads: 600

Downloads (last 12 months): 56

Downloads (last 6 weeks): 8

Reflects downloads up to 24 Jan 2025.

Citations

Cited By

Caminal H, Chronis Y, Wu T, Patel J, Martínez J, Salapura V, Zahran M, Chong F, Tang L (2022). Accelerating database analytic query workloads using an associative processor. Proceedings of the 49th Annual International Symposium on Computer Architecture, pp. 623-637. Online publication date: 18-Jun-2022. https://dl.acm.org/doi/10.1145/3470496.3527435

Adavally S (2021). Improving Memory Performance for Both High Performance Computing and Embedded/Edge Computing Systems. Online publication date: Dec-2021. https://doi.org/10.12794/metadc1873542

Nagasaka Y, Matsuoka S, Azad A, Buluç A (2018). High-Performance Sparse Matrix-Matrix Products on Intel KNL and Multicore Architectures. Workshop Proceedings of the 47th International Conference on Parallel Processing, pp. 1-10. Online publication date: 13-Aug-2018. https://dl.acm.org/doi/10.1145/3229710.3229720

Agrawal S, Idicula S, Raghavan A, Vlachos E, Govindaraju V, Varadarajan V, Balkesen C, Giannikis G, Roth C, Agarwal N, Sedlar E, Hunter H, Moreno J, Emer J, Sanchez D (2017). A many-core architecture for in-memory data processing. Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 245-258. Online publication date: 14-Oct-2017. https://dl.acm.org/doi/10.1145/3123939.3123985

Govindaraju V, Idicula S, Agrawal S, Vardarajan V, Raghavan A, Wen J, Balkesen C, Giannikis G, Agarwal N, Sedlar E (2017). Big Data Processing: Scalability with Extreme Single-Node Performance. 2017 IEEE International Congress on Big Data (BigData Congress), pp. 129-136. Online publication date: Jun-2017. https://doi.org/10.1109/BigDataCongress.2017.26

Xu L, Lim S, Li M, Butt A, Kannan R (2017). Scaling up data-parallel analytics platforms: Linear algebraic operation cases. 2017 IEEE International Conference on Big Data (Big Data), pp. 273-282. Online publication date: Dec-2017. https://doi.org/10.1109/BigData.2017.8257935

Xu L, Lim S, Butt A, Sukumar S, Kannan R (2016). Fatman vs. littleboy. Proceedings of the 1st Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems, pp. 25-30. Online publication date: 13-Nov-2016. https://dl.acm.org/doi/10.5555/3019046.3019051

Xu L, Lim S, Butt A, Sukumar S, Kannan R (2016). FatMan vs. LittleBoy: Scaling Up Linear Algebraic Operations in Scale-Out Data Platforms. 2016 1st Joint International Workshop on Parallel Data Storage and data Intensive Scalable Computing Systems (PDSW-DISCS), pp. 25-30. Online publication date: Nov-2016. https://doi.org/10.1109/PDSW-DISCS.2016.009

Sun G, Jun S (2020). Bandwidth Efficient Near-Storage Accelerator for High-Dimensional Similarity Search. 2020 International Conference on Field-Programmable Technology (ICFPT), pp. 129-138. Online publication date: Dec-2020. https://doi.org/10.1109/ICFPT51103.2020.00026
