DOI: 10.1145/3205289.3208064
research-article

A Case for Granularity Aware Page Migration

Published: 12 June 2018

Abstract

Memory is becoming increasingly heterogeneous with the emergence of disparate memory technologies, ranging from non-volatile memories such as PCM, STT-RAM, and memristors to 3D-stacked memories such as HBM. In such systems, data is often migrated across memory regions backed by different technologies for better overall performance. An effective migration mechanism is therefore a prerequisite in such systems.
Prior work on OS-directed page migration has focused on what data to migrate and/or when to migrate it. In this work, we demonstrate the need to investigate another dimension: how much to migrate. Specifically, we show that the amount of data migrated in a single migration operation (called "migration granularity") is vital to overall performance. Through analysis on real hardware, we further show that different applications benefit from different migration granularities, owing to their distinct memory access characteristics. Since this preferred migration granularity may not be known a priori, we propose a novel scheme to infer it for any given application at runtime. When implemented in the Linux OS running on current hardware, performance improved by up to 36% over a baseline with a fixed migration granularity.
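The trade-off behind "how much to migrate" can be illustrated with a toy cost model. The sketch below is not the authors' runtime inference scheme, which the abstract does not describe; it runs offline over a recorded trace of page numbers and assumes a hypothetical fixed cost per migration operation (for example, TLB shootdown and page-table updates) plus a per-page copy cost, aligned power-of-two blocks of 4 KiB base pages, and a fast tier large enough that nothing is evicted. The names `migration_block`, `migration_cost`, and `pick_granularity` and the default cost values are illustrative assumptions.

```python
from typing import Iterable, Sequence


def migration_block(page: int, granularity: int) -> range:
    """Aligned run of `granularity` base-page numbers that contains `page`.

    `granularity` is counted in base pages and must be a power of two; with
    4 KiB base pages (an assumption), 1 -> 4 KiB, 16 -> 64 KiB, 512 -> 2 MiB.
    """
    if granularity <= 0 or granularity & (granularity - 1):
        raise ValueError("granularity must be a positive power of two")
    start = page & ~(granularity - 1)  # round down to the block boundary
    return range(start, start + granularity)


def migration_cost(trace: Iterable[int], granularity: int,
                   fixed_cost: float, per_page_cost: float) -> float:
    """Toy cost of serving `trace` when each miss migrates one aligned block.

    Hypothetical model: every migration pays `fixed_cost` once plus
    `per_page_cost` for each page copied, whether or not that page is touched
    later. The fast tier is assumed never to evict anything.
    """
    fast_tier = set()  # base pages already resident in the fast tier
    ops = 0
    for page in trace:
        if page not in fast_tier:
            fast_tier.update(migration_block(page, granularity))
            ops += 1
    # Aligned blocks of one size never overlap, so each op copies `granularity` pages.
    return ops * (fixed_cost + granularity * per_page_cost)


def pick_granularity(trace: Sequence[int], candidates: Iterable[int],
                     fixed_cost: float = 100.0, per_page_cost: float = 1.0) -> int:
    """Candidate granularity with the lowest toy cost (smallest wins ties)."""
    return min(candidates,
               key=lambda g: (migration_cost(trace, g, fixed_cost, per_page_cost), g))


if __name__ == "__main__":
    sequential = list(range(1024))           # streams through 1024 adjacent pages
    sparse = [i * 4096 for i in range(64)]   # touches pages that are far apart
    print(pick_granularity(sequential, [1, 16, 512]))  # 512
    print(pick_granularity(sparse, [1, 16, 512]))      # 1
```

Under this model a streaming access pattern favors large blocks because the fixed per-operation cost is amortized over many useful pages, while a sparse pattern favors single pages because every extra page copied is wasted. This mirrors the abstract's observation that different applications prefer different granularities, although a real system must also weigh fast-tier capacity pressure and access latency, which the sketch ignores.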

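Cited By

  • (2023) MEMTIS: Efficient Memory Tiering with Dynamic Page Classification and Page Size Determination. Proceedings of the 29th Symposium on Operating Systems Principles, 17-34. DOI: 10.1145/3600006.3613167. Online publication date: 23 Oct 2023.
  • (2023) Nonvolatile Memory Technologies: Characteristics, Deployment, and Research Challenges. Frontiers of Quality Electronic Design (QED), 137-173. DOI: 10.1007/978-3-031-16344-9_4. Online publication date: 12 Jan 2023.
  • (2022) PMShifter. Proceedings of the 13th ACM SIGOPS Asia-Pacific Workshop on Systems, 1-8. DOI: 10.1145/3546591.3547523. Online publication date: 23 Aug 2022.
  • (2022) Challenges in Design, Data Placement, Migration and Power-Performance Trade-offs in DRAM-NVM-based Hybrid Memory Systems. IETE Technical Review 40, 4, 498-520. DOI: 10.1080/02564602.2022.2127945. Online publication date: 13 Oct 2022.
  • (2021) Dynamically Adapting Page Migration Policies Based on Applications' Memory Access Behaviors. ACM Journal on Emerging Technologies in Computing Systems 17, 2, 1-24. DOI: 10.1145/3444750. Online publication date: 24 Mar 2021.
  • (2019) Nimble Page Management for Tiered Memory Systems. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 331-345. DOI: 10.1145/3297858.3304024. Online publication date: 4 Apr 2019.
  • (2019) LLC-Guided Data Migration in Hybrid Memory Systems. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 932-942. DOI: 10.1109/IPDPS.2019.00101. Online publication date: May 2019.
  • A Hybrid Memory Architecture Supporting Fine-Grained Data Migration. SSRN Electronic Journal. DOI: 10.2139/ssrn.4194313.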

Published In

ICS '18: Proceedings of the 2018 International Conference on Supercomputing
June 2018
407 pages
ISBN:9781450357838
DOI:10.1145/3205289
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICS '18

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 27
  • Downloads (last 6 weeks): 0
Reflects downloads up to 30 Aug 2024
