research-article

Two Reconfigurable NDP Servers: Understanding the Impact of Near-Data Processing on Data Center Applications

Published: 15 October 2021

Abstract

Existing near-data processing (NDP)-powered architectures have demonstrated their strength for some data-intensive applications. Data center servers, however, have to serve not only data-intensive but also compute-intensive applications, and an in-depth understanding of the impact of NDP on various data center applications is still lacking. For example, can a compute-intensive application also benefit from NDP? In addition, current NDP techniques focus on maximizing the data processing rate by utilizing all computing resources at all times. Is this “always running in full gear” strategy consistently beneficial for an application? To answer these questions, we first propose two reconfigurable NDP-powered servers called RANS (Reconfigurable ARM-based NDP Server) and RFNS (Reconfigurable FPGA-based NDP Server). Next, we implement a single-engine prototype of each on a conventional data center server and evaluate its effectiveness. We then extrapolate the experimental results measured on the two prototypes to estimate the properties of the two full-size reconfigurable NDP servers. Finally, we present several new findings. For example, we find that while RANS benefits only data-intensive applications, RFNS can benefit both data-intensive and compute-intensive applications. Moreover, we find that for certain applications, the reconfigurability of RANS/RFNS can deliver noticeable energy-efficiency gains without any performance degradation.
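The trade-off behind the “always running in full gear” question (activating every NDP engine versus activating only as many as a workload can use) can be illustrated with a toy model. The following Python sketch is an assumption for illustration only, not the extrapolation method or the reconfiguration policy of RANS/RFNS: it assumes throughput scales linearly with the number of active engines until a shared bandwidth cap is reached, and that server power is a fixed baseline plus a per-engine cost. All names (`EngineModel`, `throughput`, `pick_engine_count`, and so on) and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineModel:
    """Hypothetical per-engine parameters, e.g. measured on a single-engine prototype."""
    rate_mb_s: float  # data processing rate of ONE engine, in MB/s
    power_w: float    # active power drawn by ONE engine, in watts


def throughput(engine: EngineModel, n_active: int, cap_mb_s: float) -> float:
    """Toy extrapolation: linear scaling in the number of active engines,
    capped by an assumed shared bandwidth limit (e.g. storage or interconnect)."""
    return min(n_active * engine.rate_mb_s, cap_mb_s)


def power(engine: EngineModel, n_active: int, static_w: float) -> float:
    """Toy power model: fixed baseline plus per-engine active power."""
    return static_w + n_active * engine.power_w


def energy_per_mb(engine: EngineModel, n_active: int,
                  cap_mb_s: float, static_w: float) -> float:
    """Energy per MB processed, in joules (watts divided by MB/s)."""
    return power(engine, n_active, static_w) / throughput(engine, n_active, cap_mb_s)


def pick_engine_count(engine: EngineModel, n_max: int, cap_mb_s: float) -> int:
    """Smallest number of active engines that matches the throughput of running
    all n_max engines ("full gear"). Because power grows with n, this is also the
    lowest-energy setting among those with no performance loss."""
    full_gear = throughput(engine, n_max, cap_mb_s)
    for n in range(1, n_max + 1):
        if throughput(engine, n, cap_mb_s) >= full_gear:
            return n
    return n_max


if __name__ == "__main__":
    eng = EngineModel(rate_mb_s=400.0, power_w=5.0)  # made-up numbers
    n = pick_engine_count(eng, n_max=16, cap_mb_s=3200.0)
    print(n)  # 8
    print(energy_per_mb(eng, 16, 3200.0, 100.0))  # full gear: 180 W / 3200 MB/s
    print(energy_per_mb(eng, n, 3200.0, 100.0))   # reconfigured: 140 W / 3200 MB/s
```

In this toy model, a bandwidth-capped workload saturates the cap with 8 of 16 engines, so disabling the other 8 lowers energy per MB without reducing throughput, whereas a workload that never hits the cap needs all 16 engines. Real servers add nonlinear scaling, residual power in disabled engines, and reconfiguration overheads that this sketch ignores.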

Information

Published In

ACM Transactions on Storage, Volume 17, Issue 4
November 2021
201 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/3487989
Editor: Sam H. Noh

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 October 2021
Accepted: 01 April 2021
Revised: 01 March 2021
Received: 01 July 2020
Published in TOS Volume 17, Issue 4

Author Tags

  1. Near data processing
  2. FPGA
  3. ARM
  4. NDP server
  5. reconfigurability
  6. data center applications
  7. data-intensive
  8. compute-intensive

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Science Foundation

Article Metrics

  • Downloads (last 12 months): 54
  • Downloads (last 6 weeks): 4
Reflects downloads up to 10 Feb 2025.

Cited By

  • (2025) ProckStore: An NDP-empowered key-value store with asynchronous and multi-threaded compaction scheme for optimized performance. Journal of Systems Architecture 160, 103342. DOI: 10.1016/j.sysarc.2025.103342. Online publication date: Mar-2025.
  • (2024) UNIFICO: Thread Migration in Heterogeneous-ISA CPUs without State Transformation. Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, 86–99. DOI: 10.1145/3640537.3641565. Online publication date: 17-Feb-2024.
  • (2024) Nearest data processing in GPU. Sustainable Computing: Informatics and Systems 44, 101047. DOI: 10.1016/j.suscom.2024.101047. Online publication date: Dec-2024.
  • (2022) Horae: A Hybrid I/O Request Scheduling Technique for Near-Data Processing-Based SSD. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41, 11, 3803–3813. DOI: 10.1109/TCAD.2022.3197518. Online publication date: 1-Nov-2022.
  • (2022) Near Data Processing in Taurus Database. 2022 IEEE 38th International Conference on Data Engineering (ICDE), 1662–1674. DOI: 10.1109/ICDE53745.2022.00170. Online publication date: May-2022.
  • (2022) Accelerating kNN search in high dimensional datasets on FPGA by reducing external memory access. Future Generation Computer Systems 137, C, 189–200. DOI: 10.1016/j.future.2022.07.009. Online publication date: 1-Dec-2022.
