short-paper

Open access

FPGA-based Near Data Processing Platform Selection Using Fast Performance Modeling (WiP Paper)

Authors: Nazanin Farahpour, Glenn Reinman

LCTES '20: The 21st ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems

Pages 151 - 155

https://doi.org/10.1145/3372799.3394373

Published: 16 June 2020

Abstract

With the trend of adopting FPGAs in data centers, various FPGA acceleration platforms have been developed in recent years. Each server could incorporate one or many of these FPGAs at different levels of the compute hierarchy to match its workload intensity. FPGAs could either be used as IO-attached accelerators or be closely integrated with the CPU as on-chip co-processors. For a more data-centric approach, an FPGA could be moved closer to the data medium (RAM or disk) and serve as a near-memory or near-storage accelerator.

In this work, we present a quantitative model and an in-depth analysis of application characteristics to determine which level of the acceleration hierarchy is best suited to a given application. We analyze 18 benchmarks from six domains and derive preliminary guidelines for both application and hardware developers.
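The abstract does not give the model's equations. As a hedged illustration of the kind of placement reasoning it refers to, the following Python sketch uses a simple bandwidth-versus-throughput estimate (a roofline-style bound chosen here for illustration, not the authors' model). The class and function names, the two data media ("dram", "ssd"), and every numeric parameter are hypothetical placeholders, not measurements from the paper. For each placement, the estimated time is the slower of reading the input and computing on it (assuming the two overlap), plus the time to return results to the host.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Platform:
    """One FPGA placement in the compute hierarchy (all numbers are hypothetical)."""
    name: str
    input_bw_gbps: Dict[str, float]  # effective GB/s from each data medium ("dram", "ssd") to the FPGA
    compute_gops: float              # sustained kernel throughput, in giga-operations per second
    output_bw_gbps: float            # GB/s for returning results to the host CPU


@dataclass(frozen=True)
class Workload:
    name: str
    medium: str       # where the input data resides: "dram" or "ssd"
    input_gb: float   # data the kernel must read, in GB
    output_gb: float  # results returned to the host, in GB
    gops: float       # total work, in giga-operations


def estimated_time_s(p: Platform, w: Workload) -> float:
    # Assume input streaming overlaps with computation, so the slower of the two dominates.
    read_s = w.input_gb / p.input_bw_gbps[w.medium]
    compute_s = w.gops / p.compute_gops
    # Assume results are shipped back to the host after the kernel finishes (no overlap).
    return max(read_s, compute_s) + w.output_gb / p.output_bw_gbps


def select_platform(platforms: Iterable[Platform], w: Workload) -> Platform:
    # Pick the placement with the lowest estimated end-to-end time.
    return min(platforms, key=lambda p: estimated_time_s(p, w))


# Placeholder parameters for illustration only; they are not measurements from the paper.
PLATFORMS = [
    Platform("on_chip_coprocessor", {"dram": 20.0, "ssd": 3.0}, 150.0, 20.0),
    Platform("io_attached", {"dram": 12.0, "ssd": 3.0}, 400.0, 12.0),
    Platform("near_memory", {"dram": 40.0, "ssd": 3.0}, 200.0, 20.0),
    Platform("near_storage", {"dram": 3.0, "ssd": 8.0}, 100.0, 3.0),
]

if __name__ == "__main__":
    workloads = [
        Workload("scan_filter", "ssd", input_gb=100.0, output_gb=1.0, gops=50.0),
        Workload("compute_heavy", "dram", input_gb=1.0, output_gb=1.0, gops=10_000.0),
        Workload("memory_bound", "dram", input_gb=200.0, output_gb=1.0, gops=100.0),
    ]
    for w in workloads:
        print(w.name, "->", select_platform(PLATFORMS, w).name)
```

With these made-up parameters, the SSD-resident scan selects near_storage, the compute-heavy kernel selects io_attached, and the DRAM-resident memory-bound kernel selects near_memory. The point of the sketch is the shape of the trade-off (data movement versus compute versus result return), not the specific numbers.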

Supplementary Material

Presentation video (MP4 file 3372799.3394373.mp4, 16.76 MB)


Cited By

D. Fakhry, M. Abdelsalam, M. El-Kharashi, and M. Safar. 2023. A review on computational storage devices and near memory computing for high performance applications. Memories - Materials, Devices, Circuits and Systems 4, 100051. Online publication date: July 2023. https://doi.org/10.1016/j.memori.2023.100051

N. Farahpour, Y. Hao, Z. Fang, and G. Reinman. 2020. Reconfigurable Accelerator Compute Hierarchy: A Case Study using Content-Based Image Retrieval. In 2020 IEEE International Symposium on Workload Characterization (IISWC), 276-287. Online publication date: October 2020. https://doi.org/10.1109/IISWC50251.2020.00034



Published In

LCTES '20: The 21st ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, June 2020, 163 pages

ISBN: 9781450370943

DOI: 10.1145/3372799

General Chair: Jingling Xue, UNSW Sydney, Australia

Program Chair: Changhee Jung, Purdue University, USA

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States



Conference

LCTES '20: 21st ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, June 16, 2020, London, United Kingdom

Acceptance Rates

Overall Acceptance Rate 116 of 438 submissions, 26%


Bibliometrics

Total citations: 2

Total downloads: 523 (91 in the last 12 months; 17 in the last 6 weeks), reflecting downloads up to 25 Dec 2024
