A Full-System Perspective on UPMEM Performance

Authors:

Marcel Lütke Dreimann,

Olaf Spinczyk

DIMES '23: Proceedings of the 1st Workshop on Disruptive Memory Systems

Pages 1 - 7

https://doi.org/10.1145/3609308.3625266

Published: 23 October 2023

Abstract

Recently, UPMEM introduced the first commercially available processing-in-memory (PIM) platform. Its key feature is DRAM chips with built-in RISC CPUs for in-memory data processing. Naturally, this has sparked interest in the research community, which was previously limited to PIM simulators and custom FPGA prototypes. One result is the PrIM benchmark suite, which combines an in-depth analysis of PIM performance with benchmarks that measure the speedup of PIM over processing on conventional CPUs and GPUs [10]. However, the current generation of UPMEM PIM faces limitations such as memory interleaving and therefore does not provide true in-memory computing. Applications must store data in DRAM and transfer it to and from UPMEM modules for processing; from this perspective, the modules behave just like computational offloading engines. This paper examines the ramifications of treating them as such in comparative performance benchmarks. By extending the PrIM suite to address the challenges that computational offloading benchmarks face, we show that such a full-system perspective can drastically alter offloading recommendations, with 9 of 11 previously UPMEM-friendly benchmarks now performing best on a conventional server CPU.
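The difference between a kernel-only view and a full-system view of offloading can be illustrated with a small timing model. The sketch below is not taken from the paper: the class name `OffloadTiming`, its fields, and the numbers in the usage note are hypothetical placeholders chosen only to show how transfer and setup costs enter the comparison.

```python
from dataclasses import dataclass


@dataclass
class OffloadTiming:
    """Hypothetical timing breakdown (in seconds) for one benchmark."""

    cpu_total: float    # end-to-end time on the conventional CPU
    pim_kernel: float   # compute time on the PIM modules only
    to_pim: float       # host DRAM -> PIM module transfer time
    from_pim: float     # PIM module -> host DRAM transfer time
    setup: float = 0.0  # assumed one-off overhead (allocation, program load)

    def pim_total(self) -> float:
        # Full-system cost: everything the application has to wait for.
        return self.setup + self.to_pim + self.pim_kernel + self.from_pim

    def kernel_only_speedup(self) -> float:
        # Speedup when data movement is ignored.
        return self.cpu_total / self.pim_kernel

    def full_system_speedup(self) -> float:
        # Speedup when data movement and setup are included.
        return self.cpu_total / self.pim_total()

    def recommendation(self) -> str:
        # Offload only if the end-to-end time actually improves.
        return "offload" if self.full_system_speedup() > 1.0 else "cpu"


# Illustrative placeholder values, not measurements.
example = OffloadTiming(cpu_total=1.0, pim_kernel=0.25, to_pim=0.5, from_pim=0.5)
print(example.kernel_only_speedup())   # 4.0
print(example.full_system_speedup())   # 0.8
print(example.recommendation())        # cpu
```

With these placeholder values the kernel alone looks four times faster on PIM, yet the end-to-end run is slower than the CPU once transfers are counted, which is the kind of reversal the abstract describes.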

References

[1]

D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter, L. Dagum, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, R. S. Schreiber, H. D. Simon, V. Venkatakrishnan, and S. K. Weeratunga. 1991. The NAS Parallel Benchmarks---summary and Preliminary Results. In Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Albuquerque, New Mexico, USA) (Supercomputing '91). Association for Computing Machinery, New York, NY, USA, 158--165.

[2]

Alexander Baumstark, Muhammad Attahir Jibril, and Kai-Uwe Sattler. 2023. Accelerating Large Table Scan using Processing-In-Memory Technology. In BTW 2023. Gesellschaft für Informatik e.V., Bonn, 797--814.

[3]

Stefano Corda, Madhurya Kumaraswamy, Ahsan Javed Awan, Roel Jordans, Akash Kumar, and Henk Corporaal. 2021. NMPO: Near-Memory Computing Profiling and Offloading. In 2021 24th Euromicro Conference on Digital System Design (DSD). 259--267.

[4]

Stefano Corda, Gagandeep Singh, Ahsan Javed Awan, Roel Jordans, and Henk Corporaal. 2019. Platform Independent Software Analysis for Near Memory Computing. In 2019 22nd Euromicro Conference on Digital System Design (DSD). 606--609.

[5]

Andrew Davison. 1995. Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers. Supercomputing Review (August 1995), 54--55.

[6]

Fabrice Devaux. 2019. The true Processing In Memory accelerator. In 2019 IEEE Hot Chips 31 Symposium (HCS). 1--24.

[7]

François Duhem, Fabrice Muller, and Philippe Lorenzini. 2011. FaRM: Fast Reconfiguration Manager for Reducing Reconfiguration Time Overhead on FPGA. In Reconfigurable Computing: Architectures, Tools and Applications, Andreas Koch, Ram Krishnamurthy, John McAllister, Roger Woods, and Tarek El-Ghazawi (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 253--260.

[8]

Scott Grauer-Gray, Lifan Xu, Robert Searles, Sudhee Ayalasomayajula, and John Cavazos. 2012. Auto-tuning a high-level language targeted to GPU codes. In 2012 Innovative Parallel Computing (InPar). 1--10.

[9]

Khronos OpenCL Working Group. 2023. The OpenCL specification version 3.0.14. (2023). https://registry.khronos.org/OpenCL/specs/3.0-unified/pdf/OpenCL_API.pdf

[10]

Juan Gómez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, and Onur Mutlu. 2022. Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System. IEEE Access 10 (2022), 52565--52608.

[11]

Torsten Hoefler and Roberto Belli. 2015. Scientific Benchmarking of Parallel Computing Systems: Twelve Ways to Tell the Masses When Reporting Performance Results. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (Austin, Texas) (SC '15). Association for Computing Machinery, New York, NY, USA, Article 73, 12 pages.

[12]

Cheol-Ho Hong, Ivor Spence, and Dimitrios S. Nikolopoulos. 2017. GPU Virtualization and Scheduling Methods: A Comprehensive Survey. ACM Comput. Surv. 50, 3, Article 35 (June 2017), 37 pages.

[13]

Nina Ihde, Paula Marten, Ahmed Eleliemy, Gabrielle Poerwawinata, Pedro Silva, Ilin Tolovski, Florina M. Ciorba, and Tilmann Rabl. 2022. A Survey of Big Data, High Performance Computing, and Machine Learning Benchmarks. In Performance Evaluation and Benchmarking, Raghunath Nambiar and Meikel Poess (Eds.). Springer International Publishing, Cham, 98--118.

[14]

Donghun Lee, Andrew Chang, Minseon Ahn, Jongmin Gim, Jungmin Kim, Jaemin Jung, Kang-Woo Choi, Vincent Pham, Oliver Rebholz, Krishna T. Malladi, and Yang-Seok Ki. 2020. Optimizing Data Movement with Near-Memory Acceleration of In-memory DBMS. In Proceedings of the 23rd International Conference on Extending Database Technology, EDBT 2020, Copenhagen, Denmark, March 30 - April 02, 2020, Angela Bonifati, Yongluan Zhou, Marcos Antonio Vaz Salles, Alexander Böhm, Dan Olteanu, George H. L. Fletcher, Arijit Khan, and Bin Yang (Eds.). OpenProceedings.org, 371--374.

[15]

Victor W. Lee, Changkyu Kim, Jatin Chhugani, Michael Deisher, Daehyun Kim, Anthony D. Nguyen, Nadathur Satish, Mikhail Smelyanskiy, Srinivas Chennupaty, Per Hammarlund, Ronak Singhal, and Pradeep Dubey. 2010. Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU. SIGARCH Comput. Archit. News 38, 3 (June 2010), 451--460.

[16]

Kyprianos Papadimitriou, Apostolos Dollas, and Scott Hauck. 2011. Performance of Partial Reconfiguration in FPGA Systems: A Survey and a Cost Model. ACM Trans. Reconfigurable Technol. Syst. 4, 4, Article 36 (December 2011), 24 pages.

[17]

Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, and Jeremy Kepner. 2019. Survey and Benchmarking of Machine Learning Accelerators. In 2019 IEEE High Performance Extreme Computing Conference (HPEC). 1--9.

[18]

Robert Schmid, Max Plauth, Lukas Wenzel, Felix Eberhardt, and Andreas Polze. 2020. Accessible Near-Storage Computing with FPGAs. In Proceedings of the Fifteenth European Conference on Computer Systems (Heraklion, Greece) (EuroSys '20). Association for Computing Machinery, New York, NY, USA, Article 28, 12 pages.

[19]

Janet Tseng, Ren Wang, James Tsai, Yipeng Wang, and Tsung-Yuan Charlie Tai. 2017. Accelerating Open VSwitch with Integrated GPU. In Proceedings of the Workshop on Kernel-Bypass Networks (Los Angeles, CA, USA) (KBNets '17). Association for Computing Machinery, New York, NY, USA, 7--12.

[20]

Yash Ukidave, Fanny Nina Paravecino, Leiming Yu, Charu Kalra, Amir Momeni, Zhongliang Chen, Nick Materise, Brett Daley, Perhaad Mistry, and David Kaeli. 2015. NUPAR: A Benchmark Suite for Modern GPU Architectures. In Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering (Austin, Texas, USA) (ICPE '15). Association for Computing Machinery, New York, NY, USA, 253--264.

[21]

UPMEM. 2023. UPMEM SDK. https://sdk.upmem.com/ version 2023.1.0.

Cited By

Grzelka F, Köhler S, Polze A. (2024). Novel Memory Technologies for Multi-Tenant Exploratory Programming. Proceedings of the 2nd Workshop on Disruptive Memory Systems, 60--63. DOI: 10.1145/3698783.3699379. Online publication date: 3-Nov-2024.
https://dl.acm.org/doi/10.1145/3698783.3699379

Friesel B, Dreimann M, Spinczyk O. (2024). Performance Models for Task-based Scheduling with Disruptive Memory Technologies. Proceedings of the 2nd Workshop on Disruptive Memory Systems, 1--8. DOI: 10.1145/3698783.3699376. Online publication date: 3-Nov-2024.
https://dl.acm.org/doi/10.1145/3698783.3699376

Ramezanikebrya H, Ripeanu M. (2024). (re)Assessing PiM Effectiveness for Sequence Alignment. Euro-Par 2024: Parallel Processing, 152--166. DOI: 10.1007/978-3-031-69766-1_11. Online publication date: 26-Aug-2024.
https://doi.org/10.1007/978-3-031-69766-1_11

Index Terms

A Full-System Perspective on UPMEM Performance
1. Computing methodologies
2. Hardware
  1. Emerging technologies
    1. Analysis and design of emerging devices and systems
      1. Emerging architectures
    2. Memory and dense storage

Published In

DIMES '23: Proceedings of the 1st Workshop on Disruptive Memory Systems

October 2023

64 pages

ISBN:9798400703003

DOI:10.1145/3609308

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGOPS: ACM Special Interest Group on Operating Systems

In-Cooperation

USENIX

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Deutsche Forschungsgemeinschaft

Conference

DIMES '23

Sponsor:

SIGOPS

DIMES '23: 1st Workshop on Disruptive Memory Systems

October 23, 2023

Koblenz, Germany

Acceptance Rates

DIMES '23 Paper Acceptance Rate 8 of 17 submissions, 47%;

Overall Acceptance Rate 8 of 17 submissions, 47%

Article Metrics

Total Citations: 3
Total Downloads: 382
Downloads (Last 12 months): 277
Downloads (Last 6 weeks): 24

Reflects downloads up to 12 Jan 2025
