DOI: 10.1145/2851141.2851144
Research article

Exploiting accelerators for efficient high dimensional similarity search

Published: 27 February 2016

Abstract

Similarity search finds the most similar matches in an object collection for a given query, making it an important problem across a wide range of disciplines such as web search, image recognition and protein sequencing. Practical implementations of High Dimensional Similarity Search (HDSS) search across billions of possible solutions for multiple queries in real time, making performance and efficiency a significant challenge. Existing clusters and datacenters use commercial multicore hardware to perform search, which may not provide optimal performance or performance per Watt.
This work explores the performance, power and cost benefits of using throughput accelerators like GPUs to perform similarity search for query cohorts, even under tight deadlines. We propose optimized implementations of similarity search for both the host and the accelerator. Augmenting existing Xeon servers with accelerators yields a 3× improvement in throughput per machine and a more than 2.5× reduction in cost of ownership, even for discounted Xeon servers. Replacing a Xeon-based cluster with an accelerator-based cluster for similarity search reduces the total cost of ownership by more than 6× to 16× while consuming significantly less power than an ARM-based cluster.
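
To make the problem statement concrete, the following is a minimal sketch of similarity search over a cohort of queries. It is not the implementation proposed in the paper: it assumes dense vectors, cosine similarity as the measure, and a brute-force scan of the collection. The names `cosine_similarity`, `search_cohort`, `collection` and `queries` are hypothetical and chosen only for illustration.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def search_cohort(collection, queries, k):
    """For each query in the cohort, return the indices of the k most
    similar objects in the collection, best match first."""
    results = []
    for query in queries:
        # Score every object against this query (brute-force scan).
        scored = (
            (cosine_similarity(query, obj), idx)
            for idx, obj in enumerate(collection)
        )
        # Keep only the k highest-scoring (score, index) pairs.
        top = heapq.nlargest(k, scored)
        results.append([idx for _, idx in top])
    return results


# Toy data (assumed for illustration): four objects in three dimensions.
collection = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# A cohort of two queries processed together.
queries = [[1.0, 0.1, 0.0], [0.0, 0.2, 1.0]]

top_matches = search_cohort(collection, queries, k=2)
print(top_matches)  # [[0, 2], [3, 1]]
```

Every query in the cohort scans the same collection, so the work per object is independent across queries and across objects. That independence is the kind of data parallelism a throughput accelerator can exploit, although a real system at the scale the abstract describes would need far more than this sketch.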



Published In

PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2016
420 pages
ISBN: 9781450340922
DOI: 10.1145/2851141
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. GPGPU
  2. energy efficiency
  3. high throughput
  4. total cost of ownership

Qualifiers

  • Research-article

Conference

PPoPP '16

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%


Cited By

  • (2022) Accelerating database analytic query workloads using an associative processor. Proceedings of the 49th Annual International Symposium on Computer Architecture, pages 623-637. DOI: 10.1145/3470496.3527435. Online publication date: 18-Jun-2022
  • (2021) Improving Memory Performance for Both High Performance Computing and Embedded/Edge Computing Systems. DOI: 10.12794/metadc1873542. Online publication date: Dec-2021
  • (2018) High-Performance Sparse Matrix-Matrix Products on Intel KNL and Multicore Architectures. Workshop Proceedings of the 47th International Conference on Parallel Processing, pages 1-10. DOI: 10.1145/3229710.3229720. Online publication date: 13-Aug-2018
  • (2017) A many-core architecture for in-memory data processing. Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, pages 245-258. DOI: 10.1145/3123939.3123985. Online publication date: 14-Oct-2017
  • (2017) Big Data Processing: Scalability with Extreme Single-Node Performance. 2017 IEEE International Congress on Big Data (BigData Congress), pages 129-136. DOI: 10.1109/BigDataCongress.2017.26. Online publication date: Jun-2017
  • (2017) Scaling up data-parallel analytics platforms: Linear algebraic operation cases. 2017 IEEE International Conference on Big Data (Big Data), pages 273-282. DOI: 10.1109/BigData.2017.8257935. Online publication date: Dec-2017
  • (2016) Fatman vs. littleboy. Proceedings of the 1st Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems, pages 25-30. DOI: 10.5555/3019046.3019051. Online publication date: 13-Nov-2016
  • (2016) FatMan vs. LittleBoy: Scaling Up Linear Algebraic Operations in Scale-Out Data Platforms. 2016 1st Joint International Workshop on Parallel Data Storage and data Intensive Scalable Computing Systems (PDSW-DISCS), pages 25-30. DOI: 10.1109/PDSW-DISCS.2016.009. Online publication date: Nov-2016
  • (2020) Bandwidth Efficient Near-Storage Accelerator for High-Dimensional Similarity Search. 2020 International Conference on Field-Programmable Technology (ICFPT), pages 129-138. DOI: 10.1109/ICFPT51103.2020.00026. Online publication date: Dec-2020
