Navigating big data with high-throughput, energy-efficient data partitioning

Authors:

Raymond J. Barker,

Kenneth A. Ross

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture

Pages 249 - 260

https://doi.org/10.1145/2485922.2485944

Published: 23 June 2013

Abstract

The global pool of data is growing at 2.5 quintillion bytes per day, with 90% of it produced in the last two years alone [24]. There is no doubt the era of big data has arrived. This paper explores targeted deployment of hardware accelerators to improve the throughput and energy efficiency of large-scale data processing. In particular, data partitioning is a critical operation for manipulating large data sets. It is often the limiting factor in database performance and represents a significant fraction of the overall runtime of large data queries.

To accelerate partitioning, this paper describes a hardware accelerator for range partitioning, or HARP, and a hardware-software data streaming framework. The streaming framework offers a seamless execution environment for streaming accelerators such as HARP. Together, HARP and the streaming framework provide an order of magnitude improvement in partitioning performance and energy. A detailed analysis of a 32nm physical design shows 7.8 times the throughput of a highly optimized and optimistic software implementation, while consuming just 6.9% of the area and 4.3% of the power of a single Xeon core in the same technology generation.
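For readers unfamiliar with the operation being accelerated, the following is a minimal software sketch of range partitioning, assuming integer keys and a sorted list of splitter values. It is not HARP's hardware design or the paper's software baseline; the function name `range_partition` and the `splitters` parameter are hypothetical names chosen for illustration.

```python
from bisect import bisect_right
from typing import List, Sequence


def range_partition(keys: Sequence[int], splitters: Sequence[int]) -> List[List[int]]:
    """Distribute keys into len(splitters) + 1 partitions by key range.

    Assumption for this sketch: `splitters` is sorted ascending.
    Partition i receives every key k with splitters[i-1] <= k < splitters[i];
    partition 0 takes keys below the first splitter, and the last partition
    takes keys at or above the final splitter.
    """
    if list(splitters) != sorted(splitters):
        raise ValueError("splitters must be sorted in ascending order")

    partitions: List[List[int]] = [[] for _ in range(len(splitters) + 1)]
    for key in keys:
        # Binary search over the splitters picks the destination partition.
        partitions[bisect_right(splitters, key)].append(key)
    return partitions


if __name__ == "__main__":
    parts = range_partition([5, 17, 42, 8, 23, 99, 10], splitters=[10, 20, 50])
    for index, part in enumerate(parts):
        print(index, part)
```

Each key costs one binary search plus one scattered write in this sketch, which gives a rough sense of why partitioning can dominate query time in software and why the paper targets it for acceleration.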

References

[1]

A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. DBMSs on a modern processor: Where does time go? In VLDB, 1999.

[2]

S. Blanas, Y. Li, and J. M. Patel. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In SIGMOD, 2011.

[3]

Bluespec, Inc. Bluespec Core Technology. http://www.bluespec.com.

[4]

H. Boral and D. J. DeWitt. Database machines: an idea whose time has passed? In IWDM, 1983.

[5]

R. D. Cameron and D. Lin. Architectural support for SWAR text processing with parallel bit streams: the inductive doubling principle. In ASPLOS, 2009.

[6]

Centrum Wiskunde and Informatica. http://www.monetdb.org.

[7]

S. Chakraborty and L. Thiele. A new task model for streaming applications and its schedulability analysis. In DATE, 2005.

[8]

D. Chatziantoniou and K. A. Ross. Partitioned optimization of complex queries. Information Systems (IS), 32(2):248--282, 2007.

[9]

J. Cieslewicz and K. A. Ross. Data partitioning on chip multiprocessors. In DaMoN, 2008.

[10]

S. Ciricescu, R. Essick, B. Lucas, P. May, K. Moat, J. Norris, M. Schuette, and A. Saidi. The reconfigurable streaming vector processor (RSVP™). In MICRO, 2003.

[11]

B. F. Cooper and K. Schwan. Distributed stream management using utility-driven self-adaptive middleware. In CAC, 2005.

[12]

Q. Deng, D. Meisner, L. Ramos, T. F. Wenisch, and R. Bianchini. Memscale: active low-power modes for main memory. In ASPLOS, 2011.

[13]

M. Duller, J. S. Rellermeyer, G. Alonso, and N. Tatbul. Virtualizing stream processing. In Middleware, 2011.

[14]

E. Ebrahimi, R. Miftakhutdinov, C. Fallin, C. J. Lee, J. A. Joao, O. Mutlu, and Y. N. Patt. Parallel application memory scheduling. In MICRO, 2011.

[15]

B. Flachs et al. A streaming processing unit for a CELL processor. In ISSCC, 2005.

[16]

S. C. Goldstein, H. Schmit, M. Moe, M. Budiu, S. Cadambi, R. R. Taylor, and R. Laufer. PipeRench: a coprocessor for streaming multimedia acceleration. In ISCA, 1999.

[17]

M. I. Gordon, W. Thies, and S. Amarasinghe. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. In ASPLOS, 2006.

[18]

N. K. Govindaraju and D. Manocha. Efficient relational database management using graphics processors. In DaMoN, 2005.

[19]

V. Govindaraju, C.-H. Ho, and K. Sankaralingam. Dynamically specialized datapaths for energy efficient computing. In HPCA, 2011.

[20]

G. Graefe and P.-A. Larson. B-tree indexes and CPU caches. In ICDE, 2001.

[21]

N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki. Toward dark silicon in servers. IEEE Micro, 31(4), 2011.

[22]

HP Labs. http://www.hpl.hp.com/research/cacti/.

[23]

IBM. DB2 Partitioning Features. http://www.ibm.com/developerworks/data/library/techarticle/dm-0608mcinerney.

[24]

IBM. IBM What is big data? Bringing big data to enterprise. http://www-01.ibm.com/software/data/bigdata/.

[25]

Intel Corporation. Intel® Xeon® Processor E5620. http://ark.intel.com/products/47925.

[26]

E. Ipek, O. Mutlu, J. F. Martínez, and R. Caruana. Self-optimizing memory controllers: A reinforcement learning approach. In ISCA, 2008.

[27]

N. Jain, L. Amini, H. Andrade, R. King, Y. Park, P. Selo, and C. Venkatramani. Design, implementation, and evaluation of the linear road benchmark on the stream processing core. In SIGMOD, 2006.

[28]

N. P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In ISCA, 1990.

[29]

C. Kim, E. Sedlar, J. Chhugani, T. Kaldewey, A. D. Nguyen, A. D. Blas, V. W. Lee, N. Satish, and P. Dubey. Sort vs. hash revisited: Fast join implementation on modern multi-core CPUs. PVLDB, 2(2):1378--1389, 2009.

[30]

C. Kozyrakis, A. Kansal, S. Sankar, and K. Vaid. Server engineering insights for large-scale online services. IEEE Micro, 30(4), July/August 2010.

[31]

J. Krueger, C. Kim, M. Grund, N. Satish, D. Schwalb, J. Chhugani, H. Plattner, P. Dubey, and A. Zeier. Fast updates on read-optimized databases using multi-core CPUs. PVLDB, 5(1):61--72, Sept. 2011.

[32]

D. Lin, N. Medforth, K. S. Herdy, A. Shriraman, and R. Cameron. Parabix: Boosting the efficiency of text processing on commodity processors. In HPCA, 2012.

[33]

K. T. Malladi, F. Nothaft, K. Periyathambi, B. C. Lee, C. Kozyrakis, and M. Horowitz. Towards energy-proportional datacenter memory with mobile dram. In ISCA, 2012.

[34]

Microsoft. Microsoft SQL Server 2012. http://technet.microsoft.com/en-us/sqlserver/ff898410.

[35]

C. Mohan. Impact of recent hardware and software trends on high performance transaction processing and analytics. In TPCTC, 2011.

[36]

R. Müller and J. Teubner. FPGAs: a new point in the database design space. In EDBT, 2010.

[37]

MySQL. Date and time datatype representation. http://dev.mysql.com/doc/internals/en/date-and-time-data-type-representation.html.

[38]

C. Natarajan, B. Christenson, and F. Briggs. A study of performance impact of memory controller features in multi-processor server environment. In WMPI, 2004.

[39]

L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed stream computing platform. In ICDMW, 2010.

[40]

Oracle. Oracle Database 11g: Partitioning. http://www.oracle.com/technetwork/database/options/partitioning/index.html.

[41]

N. Rafique, W.-T. Lim, and M. Thottethodi. Effective Management of DRAM Bandwidth in Multicore Processors. In PACT, 2007.

[42]

S. Rixner. Memory controller optimizations for web servers. In MICRO, 2004.

[43]

K. A. Ross and J. Cieslewicz. Optimal splitters for database partitioning with size bounds. In ICDT, pages 98--110, 2009.

[44]

P. Saab. Scaling memcached at Facebook, Dec 2008. https://www.facebook.com/note.php?note_id=39391378919.

[45]

V. Salapura, T. Karkhanis, P. Nagpurkar, and J. Moreira. Accelerating business analytics applications. In HPCA, 2012.

[46]

B. Schlegel, R. Gemulla, and W. Lehner. k-ary search on modern processors. In DaMoN, 2009.

[47]

J. Shao and B. Davis. A burst scheduling access reordering mechanism. In HPCA, 2007.

[48]

H. Subramoni, F. Petrini, V. Agarwal, and D. Pasetto. Intra-socket and inter-socket communication in multi-core systems. IEEE Computer Architecture Letters, 9:13--16, January 2010.

[49]

Synopsys, Inc. 32/28nm Generic Library for IC Design, Design Compiler, IC Compiler. http://www.synopsys.com.

[50]

L. Tang, J. Mars, N. Vachharajani, R. Hundt, and M. L. Soffa. The impact of memory subsystem resource sharing on datacenter applications. In ISCA, 2011.

[51]

Transaction Processing Performance Council. http://www.tpc.org/tpch/default.asp.

[52]

M. A. Watkins and D. H. Albonesi. ReMAP: A reconfigurable heterogeneous multicore architecture. In MICRO, 2010.

[53]

L. Woods, J. Teubner, and G. Alonso. Complex event detection at wire speed with FPGAs. PVLDB, 3(1):660--669, 2010.

[54]

Y. Ye, K. A. Ross, and N. Vesdapunt. Scalable aggregation on multicore processors. In DaMoN, 2011.

[55]

J. Zhou and K. A. Ross. Implementing database operations using SIMD instructions. In SIGMOD, 2002.

Index Terms

Navigating big data with high-throughput, energy-efficient data partitioning
1. Computer systems organization
  1. Embedded and cyber-physical systems
  2. Real-time systems

Information & Contributors

Information

Published In

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture

June 2013

686 pages

ISBN:9781450320795

DOI:10.1145/2485922

General Chair:
Avi Mendelson
Technion

ACM SIGARCH Computer Architecture News Volume 41, Issue 3
ISCA '13
June 2013
666 pages
ISSN:0163-5964
DOI:10.1145/2508148

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IEEE CS

In-Cooperation

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ISCA'13

ISCA'13: The 40th Annual International Symposium on Computer Architecture

June 23 - 27, 2013

Tel-Aviv, Israel

Acceptance Rates

ISCA '13 Paper Acceptance Rate 56 of 288 submissions, 19%;

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 81
Total Downloads: 1,169
Downloads (Last 12 months): 38
Downloads (Last 6 weeks): 3

Reflects downloads up to 03 Feb 2025

Citations

Cited By

Najafi M, Qadah T, Sadoghi M, Jacobsen H (2024). DIBA: A Re-Configurable Stream Processor. IEEE Transactions on Knowledge and Data Engineering, 36:9, pp. 4550-4566. Online publication date: Sep-2024.
https://doi.org/10.1109/TKDE.2024.3381192
Gonzalez A, Kolli A, Khan S, Liu S, Dadu V, Karandikar S, Chang J, Asanovic K, Ranganathan P, Solihin Y, Heinrich M (2023). Profiling Hyperscale Big Data Processing. Proceedings of the 50th Annual International Symposium on Computer Architecture, pp. 1-16. Online publication date: 17-Jun-2023.
https://dl.acm.org/doi/10.1145/3579371.3589082
Lin P, Kuo Y, Lai B (2022). A Highly Parallel Fine-Grained Sort-Merge Join on Near Memory Computing. 2022 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 2566-2570. Online publication date: 28-May-2022.
https://doi.org/10.1109/ISCAS48785.2022.9937941
Najafi M, Sadoghi M, Jacobsen H (2020). Scalable Multiway Stream Joins in Hardware. IEEE Transactions on Knowledge and Data Engineering, 32:12, pp. 2438-2452. Online publication date: 1-Dec-2020.
https://doi.org/10.1109/TKDE.2019.2916860
Asmussen N, Roitzsch M, Härtig H, Dan T, Dahlia M (2019). M3X. Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference, pp. 617-631. Online publication date: 10-Jul-2019.
https://dl.acm.org/doi/10.5555/3358807.3358859
Fang Y, Zou C, Chien A (2019). Accelerating raw data analysis with the ACCORDA software and hardware architecture. Proceedings of the VLDB Endowment, 12:11, pp. 1568-1582. Online publication date: 1-Jul-2019.
https://dl.acm.org/doi/10.14778/3342263.3342634
Rheindt S, Fried A, Lenke O, Nolte L, Wild T, Herkersdorf A (2019). NEMESYS. Proceedings of the International Symposium on Memory Systems, pp. 3-18. Online publication date: 30-Sep-2019.
https://dl.acm.org/doi/10.1145/3357526.3357545
Lottarini A, Cerqueira J, Repetti T, Edwards S, Ross K, Seok M, Kim M, Manne S, Hunter H, Altman E (2019). Master of none acceleration. Proceedings of the 46th International Symposium on Computer Architecture, pp. 762-773. Online publication date: 22-Jun-2019.
https://dl.acm.org/doi/10.1145/3307650.3322220
Drumond M, Daglis A, Mirzadeh N, Ustiugov D, Picorel J, Falsafi B, Grot B, Pnevmatikatos D (2018). Algorithm/Architecture Co-Design for Near-Memory Processing. ACM SIGOPS Operating Systems Review, 52:1, pp. 109-122. Online publication date: 28-Aug-2018.
https://dl.acm.org/doi/10.1145/3273982.3273992
Balkesen C, Kunal N, Giannikis G, Fender P, Sundara S, Schmidt F, Wen J, Agrawal S, Raghavan A, Varadarajan V, Viswanathan A, Chandrasekaran B, Idicula S, Agarwal N, Sedlar E, Das G, Jermaine C, Bernstein P (2018). RAPID. Proceedings of the 2018 International Conference on Management of Data, pp. 1407-1419. Online publication date: 27-May-2018.
https://dl.acm.org/doi/10.1145/3183713.3190655