LINQits: big data on little clients

Authors:

Jaewon Lee

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture

Pages 261 - 272

https://doi.org/10.1145/2485922.2485945

Published: 23 June 2013

Abstract

We present LINQits, a flexible hardware template that can be mapped onto programmable logic or ASICs in a heterogeneous system-on-chip for a mobile device or server. Unlike fixed-function accelerators, LINQits accelerates a domain-specific query language called LINQ. LINQits does not provide coverage for all possible applications; however, existing applications (re-)written with LINQ in mind benefit extensively from hardware acceleration. Furthermore, the LINQits framework offers a graceful and transparent migration path from software to hardware.
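For readers unfamiliar with LINQ, the following is a minimal Python sketch of the declarative query style the abstract refers to. It is only an illustration: LINQ itself is a .NET feature, and the helper functions `where`, `select`, and `group_by`, the `run_query` function, and the sensor data are all hypothetical stand-ins named after LINQ's standard query operators. Nothing here reflects the LINQits hardware or its actual interface.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")
K = TypeVar("K")


def where(items: Iterable[T], predicate: Callable[[T], bool]) -> Iterator[T]:
    """Keep only the elements that satisfy the predicate (analogous to LINQ Where)."""
    return (x for x in items if predicate(x))


def select(items: Iterable[T], projector: Callable[[T], U]) -> Iterator[U]:
    """Map every element through a projection (analogous to LINQ Select)."""
    return (projector(x) for x in items)


def group_by(items: Iterable[T], key: Callable[[T], K]) -> Dict[K, List[T]]:
    """Partition elements into lists that share a key (analogous to LINQ GroupBy)."""
    groups: Dict[K, List[T]] = defaultdict(list)
    for x in items:
        groups[key(x)].append(x)
    return dict(groups)


def run_query(readings: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average the non-negative readings per sensor, written as a chain of operators."""
    valid = where(readings, lambda r: r[1] >= 0.0)
    groups = group_by(valid, lambda r: r[0])
    result: Dict[str, float] = {}
    for sensor, rows in groups.items():
        values = list(select(rows, lambda r: r[1]))
        result[sensor] = sum(values) / len(values)
    return result


# Hypothetical input: (sensor name, reading) pairs; negative readings are invalid.
sample = [("a", 1.0), ("b", 4.0), ("a", 3.0), ("b", -1.0)]
averages = run_query(sample)
print(averages)
```

The point of the style is that the query states what to compute as a composition of a few operators, which is what makes a fixed set of operators a plausible target for a hardware template.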

LINQits is prototyped on a 2W heterogeneous SoC called the ZYNQ processor, which combines dual ARM A9 processors with an FPGA on a single die in 28nm silicon technology. Our physical measurements show that LINQits improves energy efficiency by 8.9 to 30.6 times and performance by 10.7 to 38.1 times compared to optimized, multithreaded C programs running on conventional ARM A9 processors.
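The two ranges above can be related with a small back-of-the-envelope calculation. The sketch below assumes that "energy efficiency" means work per unit energy for a fixed workload and, as a further assumption the abstract does not state, that the low ends of both ranges belong to one workload and the high ends to another. The function name `implied_power_ratio` is hypothetical.

```python
def implied_power_ratio(speedup: float, energy_efficiency_gain: float) -> float:
    """Average power of the accelerated run relative to the baseline.

    For a fixed workload:
      speedup                = t_base / t_accel
      energy_efficiency_gain = E_base / E_accel
    so P_accel / P_base = (E_accel / t_accel) / (E_base / t_base)
                        = speedup / energy_efficiency_gain.
    """
    return speedup / energy_efficiency_gain


# Figures quoted in the abstract; pairing them by endpoint is an assumption.
low_end = implied_power_ratio(10.7, 8.9)
high_end = implied_power_ratio(38.1, 30.6)
print(f"low end:  {low_end:.2f}x baseline power")
print(f"high end: {high_end:.2f}x baseline power")
```

Under these assumptions the accelerated configuration would draw somewhat more average power than the baseline while finishing much sooner, which is how a speedup can exceed the corresponding energy-efficiency gain.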

Index Terms

LINQits: big data on little clients
1. Computer systems organization
  1. Architectures
    1. Other architectures

Published In

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture

June 2013

686 pages

ISBN:9781450320795

DOI:10.1145/2485922

General Chair:
Avi Mendelson
Technion

ACM SIGARCH Computer Architecture News Volume 41, Issue 3
ISCA '13
June 2013
666 pages
ISSN:0163-5964
DOI:10.1145/2508148

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IEEE CS

In-Cooperation

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ISCA'13

ISCA'13: The 40th Annual International Symposium on Computer Architecture

June 23 - 27, 2013

Tel-Aviv, Israel

Acceptance Rates

ISCA '13 Paper Acceptance Rate 56 of 288 submissions, 19%;

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Bibliometrics

Total Citations: 82
Total Downloads: 1,309
Downloads (Last 12 months): 34
Downloads (Last 6 weeks): 5

Reflects downloads up to 12 Aug 2024

Citations

Cited By

Lee J, Min D, Byun I, Jang H, Kim J (2023) Fast, Light-weight, and Accurate Performance Evaluation using Representative Datacenter Behaviors. Proceedings of the 24th International Middleware Conference, pp. 220-233. doi:10.1145/3590140.3629117. Online publication date: 27-Nov-2023
https://dl.acm.org/doi/10.1145/3590140.3629117

Deiana A, Tran N, Agar J, Blott M, Di Guglielmo G, Duarte J, Harris P, Hauck S, Liu M, Neubauer M, Ngadiuba J, Ogrenci-Memik S, Pierini M, Aarrestad T, Bähr S, Becker J, Berthold A, Bonventre R, Müller Bravo T, Diefenthaler M, Dong Z, Fritzsche N, Gholami A, Govorkova E, Guo D, Hazelwood K, Herwig C, Khan B, Kim S, Klijnsma T, Liu Y, Lo K, Nguyen T, Pezzullo G, Rasoulinezhad S, Rivera R, Scholberg K, Selig J, Sen S, Strukov D, Tang W, Thais S, Unger K, Vilalta R, von Krosigk B, Wang S, Warburton T (2022) Applications and Techniques for Fast Machine Learning in Science. Frontiers in Big Data, vol. 5. doi:10.3389/fdata.2022.787421. Online publication date: 12-Apr-2022
https://doi.org/10.3389/fdata.2022.787421

Emami M, Bezati E, Janneck J, Larus J, Kloeckner A, Moreira J (2022) Auto-Partitioning Heterogeneous Task-Parallel Programs with StreamBlocks. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pp. 398-411. doi:10.1145/3559009.3569659. Online publication date: 8-Oct-2022
https://dl.acm.org/doi/10.1145/3559009.3569659

Caminal H, Chronis Y, Wu T, Patel J, Martínez J, Salapura V, Zahran M, Chong F, Tang L (2022) Accelerating database analytic query workloads using an associative processor. Proceedings of the 49th Annual International Symposium on Computer Architecture, pp. 623-637. doi:10.1145/3470496.3527435. Online publication date: 18-Jun-2022
https://dl.acm.org/doi/10.1145/3470496.3527435

Zha Y, Li J, Sherwood T, Berger E, Kozyrakis C (2021) When application-specific ISA meets FPGAs: a multi-layer virtualization framework for heterogeneous cloud FPGAs. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 123-134. doi:10.1145/3445814.3446699. Online publication date: 19-Apr-2021
https://dl.acm.org/doi/10.1145/3445814.3446699

Chen L, Zhu J, Deng Y, Li Z, Chen J, Jiang X, Yin S, Wei S, Liu L (2021) An Elastic Task Scheduling Scheme on Coarse-Grained Reconfigurable Architectures. IEEE Transactions on Parallel and Distributed Systems 32:12, pp. 3066-3080. doi:10.1109/TPDS.2021.3084804. Online publication date: 1-Dec-2021
https://doi.org/10.1109/TPDS.2021.3084804

Nigam R, Atapattu S, Thomas S, Li Z, Bauer T, Ye Y, Koti A, Sampson A, Zhang Z, Donaldson A, Torlak E (2020) Predictable accelerator design with time-sensitive affine types. Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 393-407. doi:10.1145/3385412.3385974. Online publication date: 11-Jun-2020
https://dl.acm.org/doi/10.1145/3385412.3385974

Zha Y, Li J, Larus J, Ceze L, Strauss K (2020) Virtualizing FPGAs in the Cloud. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 845-858. doi:10.1145/3373376.3378491. Online publication date: 9-Mar-2020
https://dl.acm.org/doi/10.1145/3373376.3378491

Xu S, Bourgeat T, Huang T, Kim H, Lee S, Arvind A (2020) AQUOMAN: An Analytic-Query Offloading Machine. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 386-399. doi:10.1109/MICRO50266.2020.00041. Online publication date: Oct-2020
https://doi.org/10.1109/MICRO50266.2020.00041

Vilim M, Rucker A, Zhang Y, Liu S, Olukotun K, Martínez J, Duato J, Eeckhout L (2020) Gorgon. Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture, pp. 309-321. doi:10.1109/ISCA45697.2020.00035. Online publication date: 30-May-2020
https://dl.acm.org/doi/10.1109/ISCA45697.2020.00035
