research-article

A Case for Granularity Aware Page Migration

Authors:

Arkaprava Basu

ICS '18: Proceedings of the 2018 International Conference on Supercomputing

Pages 352 - 362

https://doi.org/10.1145/3205289.3208064

Published: 12 June 2018

Abstract

Memory is becoming increasingly heterogeneous with the emergence of disparate memory technologies, ranging from non-volatile memories such as PCM, STT-RAM, and memristors to 3D-stacked memories such as HBM. In such systems, data is often migrated across memory regions backed by different technologies for better overall performance. An effective migration mechanism is therefore a prerequisite in such systems.

Prior work on OS-directed page migration has focused on what data to migrate and/or when to migrate it. In this work, we demonstrate the need to investigate another dimension: how much to migrate. Specifically, we show that the amount of data migrated in a single migration operation (called "migration granularity") is vital to overall performance. Through analysis on real hardware, we further show that different applications benefit from different migration granularities, owing to their distinct memory access characteristics. Since the preferred migration granularity may not be known a priori, we propose a novel scheme to infer it for any given application at runtime. When implemented in the Linux OS and run on current hardware, the scheme improved performance by up to 36% over a baseline with a fixed migration granularity.
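To make the term "migration granularity" concrete, the following is a minimal sketch and not the scheme proposed in the paper. It assumes a 4 KiB base page and assumes that a migration moves a granularity-aligned window of contiguous pages around one hot page; the abstract specifies neither. The names `PAGE_SIZE` and `migration_window` are hypothetical and exist only for this illustration.

```python
PAGE_SIZE = 4096  # bytes; assumed base page size, not stated in the abstract


def migration_window(hot_addr: int, granularity_pages: int) -> range:
    """Return the base addresses of the pages moved together with hot_addr.

    Assumption: the window is aligned to the granularity, so a granularity
    of 1 moves only the hot page, while larger values also move neighbours.
    """
    if granularity_pages < 1:
        raise ValueError("granularity must be at least one page")
    window_bytes = granularity_pages * PAGE_SIZE
    start = (hot_addr // window_bytes) * window_bytes
    return range(start, start + window_bytes, PAGE_SIZE)


if __name__ == "__main__":
    # One hot access inside page 5, under three candidate granularities.
    hot = 5 * PAGE_SIZE + 123
    for g in (1, 4, 16):
        pages = [addr // PAGE_SIZE for addr in migration_window(hot, g)]
        print(f"granularity={g:2d} pages: migrate pages {pages[0]}..{pages[-1]}")
```

Under these assumptions, a larger granularity moves more neighbouring pages per operation. That may help an application with dense, contiguous accesses and may waste bandwidth for one with sparse accesses, which is the trade-off the abstract points to.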

Published In

ICS '18: Proceedings of the 2018 International Conference on Supercomputing

June 2018

407 pages

ISBN:9781450357838

DOI:10.1145/3205289

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICS '18

Sponsor:

SIGARCH

ICS '18: 2018 International Conference on Supercomputing

June 12 - 15, 2018

Beijing, China

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%
