Automated Real-Time Analysis of Streaming Big and Dense Data on Reconfigurable Platforms

Authors:

Bita Darvish Rouhani,

Azalia Mirhoseini,

Ebrahim M. Songhori,

Farinaz Koushanfar

ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 10, Issue 1

Article No.: 8, Pages 1 - 22

https://doi.org/10.1145/2974023

Published: 19 December 2016

Abstract

We propose SSketch, a novel automated framework for efficient analysis of dynamic big data with dense (non-sparse) correlation matrices on reconfigurable platforms. SSketch targets streaming applications where each data sample can be processed only once and storage is severely limited. Our framework adaptively learns from the stream of input data and updates a corresponding ensemble of lower-dimensional data structures, also known as a sketch matrix. We introduce a new sketching methodology that transforms big data with dense correlations into an ensemble of lower-dimensional subspaces in a form suited to acceleration on reconfigurable hardware. The new method is scalable, significantly reduces costly memory interactions, and enhances matrix computation performance by leveraging the coarse-grained parallelism in the dataset. SSketch provides an automated optimization methodology for creating the most accurate data sketch for a given set of user-defined constraints, including runtime and power as well as platform constraints such as memory. To facilitate automation, SSketch takes a Hardware/Software (HW/SW) co-design approach: it provides an Application Programming Interface (API) that can be customized for rapid prototyping of an arbitrary matrix-based data analysis algorithm. Proof-of-concept evaluations on a variety of visual datasets with more than 11 million non-zeros demonstrate up to a 200-fold speedup of our hardware-accelerated realization of SSketch over a software-based deployment on a general-purpose processor.
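
To make the streaming constraint concrete, the following is a minimal Python sketch of a one-pass, memory-bounded sketching loop. It is not the SSketch algorithm or its API, whose details the abstract does not give. It assumes a simple stand-in scheme: each incoming sample is projected onto a small orthonormal basis, and the basis grows (up to a fixed budget) only when the sample is poorly represented. The names `StreamingSketch`, `max_atoms`, and `tol` are hypothetical and chosen for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


class StreamingSketch:
    """Illustrative one-pass sketch: sample j ~ sum_i coeffs[j][i] * basis[i]."""

    def __init__(self, dim, max_atoms, tol):
        self.dim = dim
        self.max_atoms = max_atoms  # hypothetical memory budget: basis vectors kept
        self.tol = tol              # hypothetical relative-error target per sample
        self.basis = []             # orthonormal vectors learned from the stream
        self.coeffs = []            # one short coefficient list per sample

    def update(self, x):
        """Consume one sample exactly once; the raw sample is not retained."""
        norm = math.sqrt(dot(x, x))
        # Coefficients of x in the current basis (valid because it is orthonormal).
        c = [dot(q, x) for q in self.basis]
        residual = list(x)
        for ci, q in zip(c, self.basis):
            residual = [r - ci * qi for r, qi in zip(residual, q)]
        res_norm = math.sqrt(dot(residual, residual))
        poorly_fit = norm > 0 and res_norm / norm > self.tol
        if poorly_fit and len(self.basis) < self.max_atoms:
            # Grow the basis with the normalized residual so x is represented exactly.
            self.basis.append([r / res_norm for r in residual])
            c.append(res_norm)
        self.coeffs.append(c)

    def reconstruct(self, j):
        """Approximate sample j from its stored coefficients."""
        out = [0.0] * self.dim
        for cj, q in zip(self.coeffs[j], self.basis):
            out = [o + cj * qi for o, qi in zip(out, q)]
        return out


sketch = StreamingSketch(dim=3, max_atoms=2, tol=1e-9)
stream = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [1.0, 2.0, 0.0], [3.0, -4.0, 0.0]]
for sample in stream:
    sketch.update(sample)
```

In this toy stream all four samples lie in a two-dimensional subspace, so two basis vectors plus two coefficients per sample suffice. Once the budget is exhausted, later samples outside that subspace are only approximated, which mirrors the accuracy-versus-memory trade-off the abstract describes.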

Index Terms

Automated Real-Time Analysis of Streaming Big and Dense Data on Reconfigurable Platforms
1. Computing methodologies
  1. Machine learning
    1. Learning settings
      1. Online learning settings
    2. Machine learning approaches
      1. Factorization methods
2. Information systems
  1. Data management systems
    1. Database management system engines
      1. Stream management


Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 10, Issue 1

March 2017

206 pages

ISSN:1936-7406

EISSN:1936-7414

DOI:10.1145/3002131

Editor:
Steve Wilton
Department of Electrical and Computer Engineering / University of British Columbia / Kaiser 4112, 5500-2332 Main Mall / Vancouver, BC V6T 1Z4 Canada

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 December 2016

Accepted: 01 July 2016

Revised: 01 April 2016

Received: 01 July 2015

Qualifiers

Research-article
Research
Refereed

Funding Sources

Office of Naval Research (ONR)
National Science Foundation (NSF) TrustHub
