DOI: 10.1145/2485922.2485945
Research article

LINQits: big data on little clients

Published: 23 June 2013

    Abstract

    We present LINQits, a flexible hardware template that can be mapped onto programmable logic or ASICs in a heterogeneous system-on-chip for a mobile device or server. Unlike fixed-function accelerators, LINQits accelerates a domain-specific query language called LINQ. LINQits does not provide coverage for all possible applications; however, existing applications (re-)written with LINQ in mind benefit extensively from hardware acceleration. Furthermore, the LINQits framework offers a graceful and transparent migration path from software to hardware.
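    To make "a domain-specific query language called LINQ" concrete: in LINQ (a .NET API), a query is written as a chain of higher-order operators such as Where, Select, GroupBy, and Aggregate, each parameterized by a user-supplied function. The sketch below mimics four of those operators in Python. The function names, the sensor data, and the query are hypothetical illustrations; this is neither the LINQ API itself nor the LINQits hardware interface.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")
K = TypeVar("K")
R = TypeVar("R")
A = TypeVar("A")


def where(source: Iterable[T], predicate: Callable[[T], bool]) -> Iterator[T]:
    """Keep only elements satisfying the predicate (analogue of LINQ Where)."""
    return (x for x in source if predicate(x))


def select(source: Iterable[T], selector: Callable[[T], R]) -> Iterator[R]:
    """Apply a projection to every element (analogue of LINQ Select)."""
    return (selector(x) for x in source)


def group_by(source: Iterable[T], key: Callable[[T], K]) -> Dict[K, List[T]]:
    """Bucket elements by key (analogue of LINQ GroupBy, materialized eagerly)."""
    groups: Dict[K, List[T]] = defaultdict(list)
    for x in source:
        groups[key(x)].append(x)
    return dict(groups)


def aggregate(source: Iterable[T], seed: A, func: Callable[[A, T], A]) -> A:
    """Fold elements into an accumulator (analogue of LINQ Aggregate with a seed)."""
    acc = seed
    for x in source:
        acc = func(acc, x)
    return acc


# Hypothetical input: (sensor id, reading) pairs.
readings: List[Tuple[str, int]] = [("a", 3), ("b", -1), ("a", 4), ("c", 2), ("b", 5)]

# Query: drop non-positive readings, group by sensor, then sum each group.
positive = where(readings, lambda r: r[1] > 0)
by_sensor = group_by(positive, lambda r: r[0])
totals = {
    sensor: aggregate(select(rows, lambda r: r[1]), 0, lambda acc, v: acc + v)
    for sensor, rows in by_sensor.items()
}
print(totals)
```

    Each operator's control structure is fixed and only the user functions vary. That separation is one plausible reading of what a "flexible hardware template" for LINQ can exploit; the paper's actual operator-to-hardware mapping is not reproduced here.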
    LINQits is prototyped on a 2W heterogeneous SoC called the ZYNQ processor, which combines dual ARM A9 processors with an FPGA on a single die in 28nm silicon technology. Our physical measurements show that LINQits improves energy efficiency by 8.9 to 30.6 times and performance by 10.7 to 38.1 times compared to optimized, multithreaded C programs running on conventional ARM A9 processors.
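    The two reported ratios are linked through average power. Assuming "energy efficiency" means the ratio of energy consumed for the same work (the abstract does not define it), and writing $S$ for the speedup, $G_E$ for the energy-efficiency gain, $P$ for average power, and $t$ for run time (symbols introduced here for illustration):

```latex
S = \frac{t_{\mathrm{ARM}}}{t_{\mathrm{LINQits}}}, \qquad
G_E = \frac{P_{\mathrm{ARM}}\, t_{\mathrm{ARM}}}{P_{\mathrm{LINQits}}\, t_{\mathrm{LINQits}}}
    = S \cdot \frac{P_{\mathrm{ARM}}}{P_{\mathrm{LINQits}}}
\quad\Longrightarrow\quad
\frac{P_{\mathrm{LINQits}}}{P_{\mathrm{ARM}}} = \frac{S}{G_E}
```

    For example, if the low ends of both ranges came from the same benchmark (the abstract does not say so), $S/G_E = 10.7/8.9 \approx 1.20$: the accelerated run would draw roughly 20% more average power while finishing 10.7 times sooner. Pairing the high ends gives $38.1/30.6 \approx 1.25$.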

    Information

    Published In

    ACM Other conferences
    ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
    June 2013
    686 pages
    ISBN: 9781450320795
    DOI: 10.1145/2485922
    • ACM SIGARCH Computer Architecture News, Volume 41, Issue 3
      ISCA '13
      June 2013
      666 pages
      ISSN: 0163-5964
      DOI: 10.1145/2508148
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Sponsors

    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. ASIC
    2. FPGA
    3. big data
    4. co-processor accelerator
    5. database
    6. mobile
    7. query language

    Conference

    ISCA'13
    Acceptance Rates

    ISCA '13 paper acceptance rate: 56 of 288 submissions (19%)
    Overall acceptance rate: 543 of 3,203 submissions (17%)

    Article Metrics

    • Downloads (last 12 months): 34
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 12 Aug 2024.

    Cited By

    • (2023) Fast, Light-weight, and Accurate Performance Evaluation using Representative Datacenter Behaviors. Proceedings of the 24th International Middleware Conference, pp. 220-233. DOI: 10.1145/3590140.3629117. Online publication date: 27-Nov-2023.
    • (2022) Applications and Techniques for Fast Machine Learning in Science. Frontiers in Big Data, vol. 5. DOI: 10.3389/fdata.2022.787421. Online publication date: 12-Apr-2022.
    • (2022) Auto-Partitioning Heterogeneous Task-Parallel Programs with StreamBlocks. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pp. 398-411. DOI: 10.1145/3559009.3569659. Online publication date: 8-Oct-2022.
    • (2022) Accelerating database analytic query workloads using an associative processor. Proceedings of the 49th Annual International Symposium on Computer Architecture, pp. 623-637. DOI: 10.1145/3470496.3527435. Online publication date: 18-Jun-2022.
    • (2021) When application-specific ISA meets FPGAs: a multi-layer virtualization framework for heterogeneous cloud FPGAs. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 123-134. DOI: 10.1145/3445814.3446699. Online publication date: 19-Apr-2021.
    • (2021) An Elastic Task Scheduling Scheme on Coarse-Grained Reconfigurable Architectures. IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 12, pp. 3066-3080. DOI: 10.1109/TPDS.2021.3084804. Online publication date: 1-Dec-2021.
    • (2020) Predictable accelerator design with time-sensitive affine types. Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 393-407. DOI: 10.1145/3385412.3385974. Online publication date: 11-Jun-2020.
    • (2020) Virtualizing FPGAs in the Cloud. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 845-858. DOI: 10.1145/3373376.3378491. Online publication date: 9-Mar-2020.
    • (2020) AQUOMAN: An Analytic-Query Offloading Machine. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 386-399. DOI: 10.1109/MICRO50266.2020.00041. Online publication date: Oct-2020.
    • (2020) Gorgon. Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture, pp. 309-321. DOI: 10.1109/ISCA45697.2020.00035. Online publication date: 30-May-2020.
