DOI: 10.1145/2485922.2485944

Navigating big data with high-throughput, energy-efficient data partitioning

Published: 23 June 2013

Abstract

The global pool of data is growing at 2.5 quintillion bytes per day, with 90% of it produced in the last two years alone [24]. There is no doubt the era of big data has arrived. This paper explores targeted deployment of hardware accelerators to improve the throughput and energy efficiency of large-scale data processing. In particular, data partitioning is a critical operation for manipulating large data sets. It is often the limiting factor in database performance and represents a significant fraction of the overall runtime of large data queries.
To accelerate partitioning, this paper describes a hardware accelerator for range partitioning, or HARP, and a hardware-software data streaming framework. The streaming framework offers a seamless execution environment for streaming accelerators such as HARP. Together, HARP and the streaming framework provide an order of magnitude improvement in partitioning performance and energy. A detailed analysis of a 32nm physical design shows 7.8 times the throughput of a highly optimized and optimistic software implementation, while consuming just 6.9% of the area and 4.3% of the power of a single Xeon core in the same technology generation.
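For readers unfamiliar with the operation being accelerated, range partitioning in software can be sketched as follows. This is only an illustrative baseline, not the HARP design or the optimized software implementation evaluated in the paper. The function name `range_partition`, the sorted `splitters` list, and the boundary convention (a key equal to a splitter goes to the lower partition) are assumptions made for this example.

```python
from bisect import bisect_left


def range_partition(keys, splitters):
    """Distribute keys into len(splitters) + 1 partitions.

    `splitters` must be sorted in ascending order. Assumed convention:
    partition i holds keys k with splitters[i-1] < k <= splitters[i].
    """
    partitions = [[] for _ in range(len(splitters) + 1)]
    for key in keys:
        # bisect_left returns the index of the first splitter >= key,
        # which is the partition number under the assumed convention.
        partitions[bisect_left(splitters, key)].append(key)
    return partitions


if __name__ == "__main__":
    # Two splitters yield three partitions.
    print(range_partition([5, 42, 17, 99, 10, 63], [10, 50]))
```

Each key costs one binary search over the splitters plus one append to an output partition, so the loop streams through the input once. That access pattern is what makes partitioning a natural fit for a streaming accelerator.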

References

[1]
A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. DBMSs on a modern processor: Where does time go? In VLDB, 1999.
[2]
S. Blanas, Y. Li, and J. M. Patel. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In SIGMOD, 2011.
[3]
Bluespec, Inc. Bluespec Core Technology. http://www.bluespec.com.
[4]
H. Boral and D. J. DeWitt. Database machines: an idea whose time has passed? In IWDM, 1983.
[5]
R. D. Cameron and D. Lin. Architectural support for SWAR text processing with parallel bit streams: the inductive doubling principle. In ASPLOS, 2009.
[6]
Centrum Wiskunde and Informatica. http://www.monetdb.org.
[7]
S. Chakraborty and L. Thiele. A new task model for streaming applications and its schedulability analysis. In DATE, 2005.
[8]
D. Chatziantoniou and K. A. Ross. Partitioned optimization of complex queries. Information Systems (IS), 32(2):248--282, 2007.
[9]
J. Cieslewicz and K. A. Ross. Data partitioning on chip multiprocessors. In DaMoN, 2008.
[10]
S. Ciricescu, R. Essick, B. Lucas, P. May, K. Moat, J. Norris, M. Schuette, and A. Saidi. The reconfigurable streaming vector processor (RSVP™). In MICRO, 2003.
[11]
B. F. Cooper and K. Schwan. Distributed stream management using utility-driven self-adaptive middleware. In CAC, 2005.
[12]
Q. Deng, D. Meisner, L. Ramos, T. F. Wenisch, and R. Bianchini. MemScale: active low-power modes for main memory. In ASPLOS, 2011.
[13]
M. Duller, J. S. Rellermeyer, G. Alonso, and N. Tatbul. Virtualizing stream processing. In Middleware, 2011.
[14]
E. Ebrahimi, R. Miftakhutdinov, C. Fallin, C. J. Lee, J. A. Joao, O. Mutlu, and Y. N. Patt. Parallel application memory scheduling. In MICRO, 2011.
[15]
B. Flachs et al. A streaming processing unit for a CELL processor. In ISSCC, 2005.
[16]
S. C. Goldstein, H. Schmit, M. Moe, M. Budiu, S. Cadambi, R. R. Taylor, and R. Laufer. PipeRench: a coprocessor for streaming multimedia acceleration. In ISCA, 1999.
[17]
M. I. Gordon, W. Thies, and S. Amarasinghe. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. In ASPLOS, 2006.
[18]
N. K. Govindaraju and D. Manocha. Efficient relational database management using graphics processors. In DaMoN, 2005.
[19]
V. Govindaraju, C.-H. Ho, and K. Sankaralingam. Dynamically specialized datapaths for energy efficient computing. In HPCA, 2011.
[20]
G. Graefe and P.-A. Larson. B-tree indexes and CPU caches. In ICDE, 2001.
[21]
N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki. Toward dark silicon in servers. IEEE Micro, 31(4), 2011.
[22]
HP Labs. http://www.hpl.hp.com/research/cacti/.
[23]
IBM. DB2 Partitioning Features. http://www.ibm.com/developerworks/data/library/techarticle/dm-0608mcinerney.
[24]
IBM. IBM What is big data? Bringing big data to enterprise. http://www-01.ibm.com/software/data/bigdata/.
[25]
Intel Corporation. Intel® Xeon® Processor E5620. http://ark.intel.com/products/47925.
[26]
E. Ipek, O. Mutlu, J. F. Martínez, and R. Caruana. Self-optimizing memory controllers: A reinforcement learning approach. In ISCA, 2008.
[27]
N. Jain, L. Amini, H. Andrade, R. King, Y. Park, P. Selo, and C. Venkatramani. Design, implementation, and evaluation of the Linear Road benchmark on the stream processing core. In SIGMOD, 2006.
[28]
N. P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In ISCA, 1990.
[29]
C. Kim, E. Sedlar, J. Chhugani, T. Kaldewey, A. D. Nguyen, A. D. Blas, V. W. Lee, N. Satish, and P. Dubey. Sort vs. hash revisited: Fast join implementation on modern multi-core CPUs. PVLDB, 2(2):1378--1389, 2009.
[30]
C. Kozyrakis, A. Kansal, S. Sankar, and K. Vaid. Server engineering insights for large-scale online services. IEEE Micro, 30(4), July/August 2010.
[31]
J. Krueger, C. Kim, M. Grund, N. Satish, D. Schwalb, J. Chhugani, H. Plattner, P. Dubey, and A. Zeier. Fast updates on read-optimized databases using multi-core CPUs. PVLDB, 5(1):61--72, Sept. 2011.
[32]
D. Lin, N. Medforth, K. S. Herdy, A. Shriraman, and R. Cameron. Parabix: Boosting the efficiency of text processing on commodity processors. In HPCA, 2012.
[33]
K. T. Malladi, F. Nothaft, K. Periyathambi, B. C. Lee, C. Kozyrakis, and M. Horowitz. Towards energy-proportional datacenter memory with mobile DRAM. In ISCA, 2012.
[34]
Microsoft. Microsoft SQL Server 2012. http://technet.microsoft.com/en-us/sqlserver/ff898410.
[35]
C. Mohan. Impact of recent hardware and software trends on high performance transaction processing and analytics. In TPCTC, 2011.
[36]
R. Müller and J. Teubner. FPGAs: a new point in the database design space. In EDBT, 2010.
[37]
MySQL. Date and time datatype representation. http://dev.mysql.com/doc/internals/en/date-and-time-data-type-representation.html.
[38]
C. Natarajan, B. Christenson, and F. Briggs. A study of performance impact of memory controller features in multi-processor server environment. In WMPI, 2004.
[39]
L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed stream computing platform. In ICDMW, 2010.
[40]
Oracle. Oracle Database 11g: Partitioning. http://www.oracle.com/technetwork/database/options/partitioning/index.html.
[41]
N. Rafique, W.-T. Lim, and M. Thottethodi. Effective Management of DRAM Bandwidth in Multicore Processors. In PACT, 2007.
[42]
S. Rixner. Memory controller optimizations for web servers. In MICRO, 2004.
[43]
K. A. Ross and J. Cieslewicz. Optimal splitters for database partitioning with size bounds. In ICDT, pages 98--110, 2009.
[44]
P. Saab. Scaling memcached at Facebook, Dec 2008. https://www.facebook.com/note.php?note_id=39391378919.
[45]
V. Salapura, T. Karkhanis, P. Nagpurkar, and J. Moreira. Accelerating business analytics applications. In HPCA, 2012.
[46]
B. Schlegel, R. Gemulla, and W. Lehner. k-ary search on modern processors. In DaMoN, 2009.
[47]
J. Shao and B. Davis. A burst scheduling access reordering mechanism. In HPCA, 2007.
[48]
H. Subramoni, F. Petrini, V. Agarwal, and D. Pasetto. Intra-socket and inter-socket communication in multi-core systems. IEEE Computer Architecture Letters, 9:13--16, January 2010.
[49]
Synopsys, Inc. 32/28nm Generic Library for IC Design, Design Compiler, IC Compiler. http://www.synopsys.com.
[50]
L. Tang, J. Mars, N. Vachharajani, R. Hundt, and M. L. Soffa. The impact of memory subsystem resource sharing on datacenter applications. In ISCA, 2011.
[51]
Transaction Processing Performance Council. http://www.tpc.org/tpch/default.asp.
[52]
M. A. Watkins and D. H. Albonesi. ReMAP: A reconfigurable heterogeneous multicore architecture. In MICRO, 2010.
[53]
L. Woods, J. Teubner, and G. Alonso. Complex event detection at wire speed with FPGAs. PVLDB, 3(1):660--669, 2010.
[54]
Y. Ye, K. A. Ross, and N. Vesdapunt. Scalable aggregation on multicore processors. In DaMoN, 2011.
[55]
J. Zhou and K. A. Ross. Implementing database operations using SIMD instructions. In SIGMOD, 2002.

      Published In

      ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
      June 2013
      686 pages
      ISBN: 9781450320795
      DOI: 10.1145/2485922

      ACM SIGARCH Computer Architecture News, Volume 41, Issue 3 (ISCA '13)
      June 2013
      666 pages
      ISSN: 0163-5964
      DOI: 10.1145/2508148

      Sponsors

      • IEEE CS

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. accelerator
      2. data partitioning
      3. microarchitecture
      4. specialized functional unit
      5. streaming data

      Qualifiers

      • Research-article

      Acceptance Rates

      ISCA '13 Paper Acceptance Rate 56 of 288 submissions, 19%;
      Overall Acceptance Rate 543 of 3,203 submissions, 17%

      Cited By

      • (2024) DIBA: A Re-Configurable Stream Processor. IEEE Transactions on Knowledge and Data Engineering 36:9 (4550-4566). DOI: 10.1109/TKDE.2024.3381192. Online publication date: Sep-2024.
      • (2023) Profiling Hyperscale Big Data Processing. Proceedings of the 50th Annual International Symposium on Computer Architecture (1-16). DOI: 10.1145/3579371.3589082. Online publication date: 17-Jun-2023.
      • (2022) A Highly Parallel Fine-Grained Sort-Merge Join on Near Memory Computing. 2022 IEEE International Symposium on Circuits and Systems (ISCAS) (2566-2570). DOI: 10.1109/ISCAS48785.2022.9937941. Online publication date: 28-May-2022.
      • (2020) Scalable Multiway Stream Joins in Hardware. IEEE Transactions on Knowledge and Data Engineering 32:12 (2438-2452). DOI: 10.1109/TKDE.2019.2916860. Online publication date: 1-Dec-2020.
      • (2019) M3X. Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference (617-631). DOI: 10.5555/3358807.3358859. Online publication date: 10-Jul-2019.
      • (2019) Accelerating raw data analysis with the ACCORDA software and hardware architecture. Proceedings of the VLDB Endowment 12:11 (1568-1582). DOI: 10.14778/3342263.3342634. Online publication date: 1-Jul-2019.
      • (2019) NEMESYS. Proceedings of the International Symposium on Memory Systems (3-18). DOI: 10.1145/3357526.3357545. Online publication date: 30-Sep-2019.
      • (2019) Master of none acceleration. Proceedings of the 46th International Symposium on Computer Architecture (762-773). DOI: 10.1145/3307650.3322220. Online publication date: 22-Jun-2019.
      • (2018) Algorithm/Architecture Co-Design for Near-Memory Processing. ACM SIGOPS Operating Systems Review 52:1 (109-122). DOI: 10.1145/3273982.3273992. Online publication date: 28-Aug-2018.
      • (2018) RAPID. Proceedings of the 2018 International Conference on Management of Data (1407-1419). DOI: 10.1145/3183713.3190655. Online publication date: 27-May-2018.
