research-article

Two Reconfigurable NDP Servers: Understanding the Impact of Near-Data Processing on Data Center Applications

Authors:

Stephen Fischer

ACM Transactions on Storage (TOS), Volume 17, Issue 4

Article No.: 31, Pages 1 - 27

https://doi.org/10.1145/3460201

Published: 15 October 2021

Abstract

Existing near-data processing (NDP)-powered architectures have demonstrated their strength for some data-intensive applications. Data center servers, however, have to serve not only data-intensive but also compute-intensive applications. An in-depth understanding of the impact of NDP on various data center applications is still needed. For example, can a compute-intensive application also benefit from NDP? In addition, current NDP techniques focus on maximizing the data processing rate by utilizing all computing resources at all times. Is this “always running in full gear” strategy consistently beneficial for an application? To answer these questions, we first propose two reconfigurable NDP-powered servers called RANS (Reconfigurable ARM-based NDP Server) and RFNS (Reconfigurable FPGA-based NDP Server). Next, we implement a single-engine prototype for each of them based on a conventional data center and then evaluate their effectiveness. Experimental results measured from the two prototypes are then extrapolated to estimate the properties of the two full-size reconfigurable NDP servers. Finally, several new findings are presented. For example, we find that while RANS can only benefit data-intensive applications, RFNS can offer benefits for both data-intensive and compute-intensive applications. Moreover, we find that for certain applications the reconfigurability of RANS/RFNS can deliver noticeable energy efficiency without any performance degradation.

Index Terms

Two Reconfigurable NDP Servers: Understanding the Impact of Near-Data Processing on Data Center Applications
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Heterogeneous (hybrid) systems
      2. Reconfigurable computing
  2. Dependable and fault-tolerant systems and networks
    1. Processors and memory architectures
    2. Secondary storage organization

Published In

ACM Transactions on Storage Volume 17, Issue 4

November 2021

201 pages

ISSN:1553-3077

EISSN:1553-3093

DOI:10.1145/3487989

Editor:
Sam H. Noh
Ulsan National Institute of Science and Technology, Ulsan, Republic of Korea

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 October 2021

Accepted: 01 April 2021

Revised: 01 March 2021

Received: 01 July 2020

Published in TOS Volume 17, Issue 4

Qualifiers

Research-article
Refereed

Funding Sources

National Science Foundation
