research-article

Automated Real-Time Analysis of Streaming Big and Dense Data on Reconfigurable Platforms

Published: 19 December 2016

Abstract

We propose SSketch, a novel automated framework for efficient analysis of dynamic big data with dense (non-sparse) correlation matrices on reconfigurable platforms. SSketch targets streaming applications in which each data sample can be processed only once and storage is severely limited. The framework adaptively learns from the input stream and updates a corresponding ensemble of lower-dimensional data structures, known as a sketch matrix. We introduce a new sketching methodology that transforms big data with dense correlations into an ensemble of lower-dimensional subspaces in a form suited to acceleration on reconfigurable hardware. The method is scalable and significantly reduces costly memory interactions; it also improves matrix computation performance by exploiting the coarse-grained parallelism present in the dataset. SSketch provides an automated optimization methodology that creates the most accurate data sketch for a given set of user-defined constraints, including runtime and power, as well as platform constraints such as memory. To facilitate automation, SSketch adopts a hardware/software (HW/SW) co-design approach: it provides an Application Programming Interface (API) that can be customized for rapid prototyping of arbitrary matrix-based data analysis algorithms. Proof-of-concept evaluations on a variety of visual datasets with more than 11 million non-zeros demonstrate up to a 200-fold speedup for our hardware-accelerated realization of SSketch compared to a software-based deployment on a general-purpose processor.
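The abstract describes a single-pass sketch that absorbs each sample once and keeps only a small set of lower-dimensional structures. The snippet below is a minimal sketch of that general idea, not the SSketch algorithm itself. It assumes the sketch is an orthonormal basis grown by modified Gram-Schmidt: a sample becomes a new basis direction only when its relative projection residual exceeds a user tolerance and the memory budget still allows another vector. The class `StreamingSubspaceSketch` and its parameters `capacity` and `tol` are hypothetical names chosen for illustration.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def norm(u: Sequence[float]) -> float:
    return math.sqrt(dot(u, u))


class StreamingSubspaceSketch:
    """Hypothetical single-pass sketch: an orthonormal basis of at most
    `capacity` vectors plus one short coefficient vector per sample.
    Raw samples are never stored."""

    def __init__(self, capacity: int, tol: float) -> None:
        if capacity < 0 or tol < 0:
            raise ValueError("capacity and tol must be non-negative")
        self.capacity = capacity          # assumed memory budget, in basis vectors
        self.tol = tol                    # assumed relative-error threshold
        self.dim: Optional[int] = None    # sample dimension m, fixed by the first sample
        self.basis: List[Vector] = []     # orthonormal columns q_1..q_l
        self.coeffs: List[Vector] = []    # coefficients of each sample in the basis

    def update(self, a: Sequence[float]) -> float:
        """Absorb one sample (seen exactly once); return its relative error."""
        if self.dim is None:
            self.dim = len(a)
        elif len(a) != self.dim:
            raise ValueError("sample dimension changed mid-stream")
        # Modified Gram-Schmidt: project out each basis direction in turn.
        r = list(a)
        c: Vector = []
        for q in self.basis:
            cj = dot(q, r)
            c.append(cj)
            r = [ri - cj * qi for ri, qi in zip(r, q)]
        a_norm, r_norm = norm(a), norm(r)
        rel_err = r_norm / a_norm if a_norm > 0 else 0.0
        # Grow the basis only if the sample is poorly explained and memory allows.
        # rel_err > tol >= 0 implies r_norm > 0, so the division below is safe.
        if rel_err > self.tol and len(self.basis) < self.capacity:
            self.basis.append([ri / r_norm for ri in r])
            c.append(r_norm)
            rel_err = 0.0  # the sample now lies in the span of the basis (up to rounding)
        self.coeffs.append(c)
        return rel_err

    def reconstruct(self, k: int) -> Vector:
        """Approximate the k-th sample as a linear combination of the basis."""
        out = [0.0] * (self.dim or 0)
        for cj, q in zip(self.coeffs[k], self.basis):
            for i, qi in enumerate(q):
                out[i] += cj * qi
        return out
```

For example, streaming `[3, 4, 0]`, `[6, 8, 0]`, `[0, 0, 5]`, `[1, 1, 1]` with `capacity=2` and `tol=0.1` keeps two basis vectors: the second sample is explained by the first direction, while the last sample is left with a relative error of about 0.115 because the budget is exhausted. The raw stream is never buffered; the sketch holds an m×l basis plus at most l coefficients per sample.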

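The automated optimization step described in the abstract can be read as a constrained selection: among candidate sketch configurations, keep those that satisfy the runtime, power, and memory limits and return the most accurate one. The code below illustrates only that selection logic under stated assumptions. It assumes each candidate has already been profiled, and its names (`SketchConfig`, `choose_config`, `sketch_memory_bytes`) and memory formula (an m×l basis plus an l×b coefficient block at 4 bytes per word) are hypothetical, not SSketch's actual cost model or API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SketchConfig:
    """One candidate setting; figures are assumed to come from prior profiling."""
    basis_size: int     # l, number of basis vectors kept on chip
    error: float        # measured approximation error (lower is better)
    runtime_s: float    # measured runtime in seconds
    power_w: float      # measured power in watts


def sketch_memory_bytes(basis_size: int, dim: int, block_cols: int,
                        word_bytes: int = 4) -> int:
    """Assumed layout: an m x l basis plus an l x b coefficient block."""
    return word_bytes * basis_size * (dim + block_cols)


def choose_config(candidates: Iterable[SketchConfig], dim: int, block_cols: int,
                  max_runtime_s: float, max_power_w: float,
                  max_memory_bytes: int) -> Optional[SketchConfig]:
    """Return the most accurate candidate meeting every constraint, or None."""
    feasible = [
        c for c in candidates
        if c.runtime_s <= max_runtime_s
        and c.power_w <= max_power_w
        and sketch_memory_bytes(c.basis_size, dim, block_cols) <= max_memory_bytes
    ]
    return min(feasible, key=lambda c: c.error, default=None)
```

Returning `None` when no candidate is feasible makes an over-constrained request explicit instead of silently relaxing one of the user's limits.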
Information & Contributors

Information

Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 10, Issue 1
March 2017
206 pages
ISSN:1936-7406
EISSN:1936-7414
DOI:10.1145/3002131
Editor: Steve Wilton
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 December 2016
Accepted: 01 July 2016
Revised: 01 April 2016
Received: 01 July 2015
Published in TRETS Volume 10, Issue 1

Author Tags

  1. FPGA
  2. HW/SW co-design
  3. Streaming model
  4. big data
  5. dense matrix
  6. lower dimensional embedding
  7. matrix sketching
  8. matrix-based analysis

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Office of Naval Research (ONR)
  • National Science Foundation (NSF) TrustHub

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 11
  • Downloads (last 6 weeks): 1
Reflects downloads up to 10 Nov 2024

Citations

Cited By

  • (2023) FPGA-Based Hardware-Accelerated Design of Linear Prediction Analysis for Real-Time Speech Signal. Arabian Journal for Science and Engineering, 48:11 (14927-14941). DOI: 10.1007/s13369-023-07926-2. Online publication date: 23-May-2023.
  • (2022) Floating Point Implementation of the Improved QRD and OMP for Compressive Sensing Signal Reconstruction. Sensing and Imaging, 23:1. DOI: 10.1007/s11220-022-00389-z. Online publication date: 26-Jun-2022.
  • (2020) Optimization of Tourism Information Analysis System Based on Big Data Algorithm. Complexity, 2020. DOI: 10.1155/2020/8841419. Online publication date: 1-Jan-2020.
  • (2019) Data Stream Statistics Over Sliding Windows: How to Summarize 150 Million Updates Per Second on a Single Node. 2019 29th International Conference on Field Programmable Logic and Applications (FPL) (278-285). DOI: 10.1109/FPL.2019.00052. Online publication date: Sep-2019.
  • (2019) Hardware/Software Co-design. FPGA-BASED Hardware Accelerators (213-241). DOI: 10.1007/978-3-030-20721-2_6. Online publication date: 31-May-2019.
  • (2019) Hardware Accelerators for Data Search. FPGA-BASED Hardware Accelerators (69-103). DOI: 10.1007/978-3-030-20721-2_3. Online publication date: 31-May-2019.
  • (2018) MAXelerator. Proceedings of the 55th Annual Design Automation Conference (1-6). DOI: 10.1145/3195970.3196074. Online publication date: 24-Jun-2018.
  • (2018) CausaLearn. Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (1-10). DOI: 10.1145/3174243.3174259. Online publication date: 15-Feb-2018.
  • (2018) MAXelerator: FPGA Accelerator for Privacy Preserving Multiply-Accumulate (MAC) on Cloud Servers. 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) (1-6). DOI: 10.1109/DAC.2018.8465770. Online publication date: Jun-2018.
  • (2017) ExtDict: Extensible Dictionaries for Data- and Platform-Aware Large-Scale Learning. 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (379-388). DOI: 10.1109/IPDPSW.2017.171. Online publication date: May-2017.