research-article
Open access

Performance Evaluation of Intel Optane Memory for Managed Workloads

Published: 22 April 2021

    Abstract

    Intel Optane memory offers non-volatility, byte addressability, and high capacity. It suits managed workloads that prefer large main memory heaps. We investigate Optane as the main memory for managed (Java) workloads, focusing on performance scalability. As the workload (core count) increases, we measure Optane’s performance relative to DRAM. A few workloads incur only a slight slowdown on Optane memory, so running them on Optane helps conserve limited DRAM capacity. Unfortunately, other workloads scale poorly beyond a few cores.
    This article investigates scaling bottlenecks for Java workloads on Optane memory, analyzing the application, runtime, and microarchitectural interactions. Poorly scaling workloads allocate objects rapidly and access objects in Optane memory frequently. These characteristics slow down the mutator and substantially slow down garbage collection (GC). At the microarchitecture level, load, store, and instruction miss penalties rise. To regain performance, we partition heaps across DRAM and Optane memory, a hybrid that scales considerably better than Optane alone. We exploit state-of-the-art GC approaches to partition heaps. Unfortunately, existing GC approaches needlessly waste DRAM capacity because they ignore runtime behavior.
    This article also introduces performance impact-guided memory allocation (PIMA) for hybrid memories. PIMA maximizes Optane utilization, allocating in DRAM only if it improves performance. It estimates the performance impact of allocating heaps in either memory type by sampling. We target PIMA at graph analytics workloads, offering a novel performance estimation method and detailed evaluation. PIMA identifies workload phases that benefit from DRAM with high (94.33%) accuracy, incurring only a 2% sampling overhead. PIMA operates stand-alone or combines with prior approaches to offer new performance versus DRAM capacity trade-offs. This work opens up Optane memory to a real-life role as the main memory for Java workloads.
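
    As a rough illustration of the decision rule described above (allocate in DRAM only when sampling suggests a performance benefit), the following Python sketch compares sampled run times for individual workload phases. The abstract does not describe PIMA's estimation method, so the names `PhaseSample`, `estimated_slowdown`, and `choose_memory`, the timing inputs, and the 10% threshold are all hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class PhaseSample:
    """Sampled execution time (seconds) of one workload phase per memory type."""
    dram_time: float
    optane_time: float


def estimated_slowdown(sample: PhaseSample) -> float:
    """Return Optane time relative to DRAM time (1.0 means no difference)."""
    if sample.dram_time <= 0 or sample.optane_time <= 0:
        raise ValueError("sampled times must be positive")
    return sample.optane_time / sample.dram_time


def choose_memory(sample: PhaseSample, threshold: float = 1.10) -> str:
    """Prefer Optane; pick DRAM only if the estimated slowdown exceeds threshold.

    The 1.10 default is an illustrative assumption, not a value from the article.
    """
    return "DRAM" if estimated_slowdown(sample) > threshold else "Optane"


# Hypothetical samples for three phases of a workload.
phases = [PhaseSample(1.0, 1.03), PhaseSample(1.0, 2.5), PhaseSample(2.0, 2.1)]
plan = [choose_memory(p) for p in phases]
```

    Defaulting to Optane and requiring evidence of a benefit before spending DRAM mirrors the stated goal of maximizing Optane utilization; a real system would need a far more careful estimator than two raw timings.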

    Information & Contributors

    Information

    Published In

    ACM Transactions on Architecture and Code Optimization  Volume 18, Issue 3
    September 2021
    370 pages
    ISSN:1544-3566
    EISSN:1544-3973
    DOI:10.1145/3460978
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 April 2021
    Accepted: 01 February 2021
    Revised: 01 February 2021
    Received: 01 December 2020
    Published in TACO Volume 18, Issue 3

    Author Tags

    1. Intel Optane memory
    2. Java
    3. analytics
    4. estimation
    5. performance
    6. scalability

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Bibliometrics & Citations

    Article Metrics

    • Downloads (Last 12 months): 403
    • Downloads (Last 6 weeks): 37

    Cited By

    • (2023) Analyzing and Improving the Scalability of In-Memory Indices for Managed Search Engines. Proceedings of the 2023 ACM SIGPLAN International Symposium on Memory Management, 15-29. DOI: 10.1145/3591195.3595272. Online publication date: 6-Jun-2023
    • (2023) On the Implications of Heterogeneous Memory Tiering on Spark In-Memory Analytics. 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 945-952. DOI: 10.1109/IPDPSW59300.2023.00157. Online publication date: May-2023
    • (2022) Lock-Free High-performance Hashing for Persistent Memory via PM-aware Holistic Optimization. ACM Transactions on Architecture and Code Optimization 20:1, 1-26. DOI: 10.1145/3561651. Online publication date: 17-Nov-2022
    • (2022) Online Application Guidance for Heterogeneous Memory Systems. ACM Transactions on Architecture and Code Optimization 19:3, 1-27. DOI: 10.1145/3533855. Online publication date: 6-Jul-2022
    • (2022) Challenges and future directions for energy, latency, and lifetime improvements in NVMs. Distributed and Parallel Databases 41:3, 163-189. DOI: 10.1007/s10619-022-07421-x. Online publication date: 21-Sep-2022
