research-article

Page overlays: an enhanced virtual memory framework to enable fine-grained memory management

Authors: Vivek Seshadri, Gennady Pekhimenko, Olatunji Ruwase, Phillip B. Gibbons, Michael A. Kozuch, Trishul Chilimbi

ACM SIGARCH Computer Architecture News, Volume 43, Issue 3S

Pages 79 - 91

https://doi.org/10.1145/2872887.2750379

Published: 13 June 2015

Abstract

Many recent works propose mechanisms demonstrating the potential advantages of managing memory at a fine (e.g., cache-line) granularity, such as fine-grained deduplication and fine-grained memory protection. Unfortunately, existing virtual memory systems track memory at a larger granularity (e.g., 4 KB pages), inhibiting efficient implementation of such techniques. Simply reducing the page size results in an unacceptable increase in page table overhead and TLB pressure.
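To make the cost of simply reducing the page size concrete, the calculation below compares leaf page-table size and TLB reach for 4 KB pages versus cache-line-sized pages. It is a back-of-the-envelope sketch, not an analysis taken from the paper: the 64-byte line size, the 8-byte page-table entries (as on x86-64), the 1 GiB mapped region, and the 64-entry TLB are all assumptions chosen for illustration.

```python
# Back-of-the-envelope cost of shrinking the page size to one cache line.
# All sizes below are illustrative assumptions, not figures from the paper.
LINE_SIZE = 64          # bytes per cache line (assumed)
SMALL_PAGE = 4096       # conventional 4 KB page
PTE_SIZE = 8            # bytes per leaf page-table entry (x86-64 style, assumed)
TLB_ENTRIES = 64        # hypothetical TLB size
GIB = 1 << 30
MIB = 1 << 20


def leaf_pte_bytes(mapped_bytes: int, page_size: int) -> int:
    """Bytes of leaf page-table entries needed to map `mapped_bytes`."""
    num_pages = mapped_bytes // page_size
    return num_pages * PTE_SIZE


def tlb_reach(page_size: int) -> int:
    """Bytes of memory covered when every TLB entry holds one page."""
    return TLB_ENTRIES * page_size


small = leaf_pte_bytes(GIB, SMALL_PAGE)   # PTE bytes with 4 KB pages
tiny = leaf_pte_bytes(GIB, LINE_SIZE)     # PTE bytes with 64 B pages

print(f"4 KB pages : {small // MIB} MiB of leaf PTEs per GiB mapped")
print(f"64 B pages : {tiny // MIB} MiB of leaf PTEs per GiB mapped")
print(f"growth     : {tiny // small}x")
print(f"TLB reach  : {tlb_reach(SMALL_PAGE) // 1024} KiB vs {tlb_reach(LINE_SIZE) // 1024} KiB")
```

Under these assumptions the leaf page table grows from 2 MiB to 128 MiB per GiB mapped, and TLB reach shrinks from 256 KiB to 4 KiB: both scale by the page-size ratio (4096 / 64 = 64), which is the overhead the overlay framework aims to avoid.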

We propose a new virtual memory framework that enables efficient implementation of a variety of fine-grained memory management techniques. In our framework, each virtual page can be mapped to a structure called a page overlay, in addition to a regular physical page. An overlay contains a subset of cache lines from the virtual page. Cache lines that are present in the overlay are accessed from there and all other cache lines are accessed from the regular physical page. Our page-overlay framework enables cache-line-granularity memory management without significantly altering the existing virtual memory framework or introducing high overheads.
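The lookup rule described above (lines present in the overlay come from the overlay, all others from the regular physical page) can be modeled in a few lines of software. This is a hypothetical sketch for intuition only, not the paper's hardware design: the class and method names, the 64-byte line size, the 4 KB page size, and the use of a 64-bit mask to record which lines are overlaid are all assumptions.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64                            # bytes per cache line (assumed)
PAGE_SIZE = 4096                          # bytes per virtual page (assumed)
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE   # 64 lines, so one 64-bit mask suffices


@dataclass
class OverlaidPage:
    """Hypothetical software model of one virtual page that maps to a
    regular physical page plus an overlay holding a subset of its lines."""
    physical: list                        # LINES_PER_PAGE bytes objects
    overlay_mask: int = 0                 # bit i set -> line i is in the overlay
    overlay: dict = field(default_factory=dict)  # line index -> bytes

    def add_to_overlay(self, i: int, data: bytes) -> None:
        """Place a new version of line i in the overlay; the physical page is untouched."""
        if len(data) != LINE_SIZE:
            raise ValueError("overlay entries must be exactly one cache line")
        self.overlay[i] = data
        self.overlay_mask |= 1 << i

    def read_line(self, i: int) -> bytes:
        # Lines present in the overlay are accessed from there;
        # every other line is accessed from the regular physical page.
        if (self.overlay_mask >> i) & 1:
            return self.overlay[i]
        return self.physical[i]

    def read_byte(self, offset: int) -> int:
        # Split the page offset into (line index, offset within the line).
        line, within = divmod(offset, LINE_SIZE)
        return self.read_line(line)[within]


page = OverlaidPage(physical=[bytes(LINE_SIZE)] * LINES_PER_PAGE)
page.add_to_overlay(3, b"\xff" * LINE_SIZE)
print(page.read_byte(3 * LINE_SIZE), page.read_byte(0))   # 255 0
```

Because a 4 KB page holds exactly 64 lines under these assumptions, a single 64-bit mask per page is enough to decide, for any access, whether to consult the overlay or the physical page, so the page-granularity mapping itself stays unchanged.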

We show that our framework can enable simple and efficient implementations of seven memory management techniques, each of which has a wide variety of applications. We quantitatively evaluate the potential benefits of two of these techniques: overlay-on-write and sparse-data-structure computation. Our evaluations show that overlay-on-write, when applied to fork, can improve performance by 15% and reduce memory capacity requirements by 53% on average compared to traditional copy-on-write. For sparse data computation, our framework can outperform a state-of-the-art software-based sparse representation on a number of real-world sparse matrices. Our framework is general, powerful, and effective in enabling fine-grained memory management at low cost.
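One way to read overlay-on-write is as copy-on-write applied at cache-line rather than page granularity: after fork, a write need only place the modified line in an overlay instead of duplicating the whole page. The sketch below counts the bytes each policy would allocate for a hypothetical write pattern. It assumes 4 KB pages and 64-byte lines, and it ignores overlay metadata and the allocation granularity of overlay storage, so the overlay figure is an idealized lower bound rather than a reproduction of the paper's measured 53% reduction.

```python
# Idealized comparison of memory allocated after fork() by page-granularity
# copy-on-write versus line-granularity overlay-on-write.
# Page/line sizes and the write pattern are assumptions for illustration.
PAGE_SIZE = 4096
LINE_SIZE = 64


def extra_bytes(write_offsets, granularity):
    """Bytes allocated: one `granularity`-sized unit per distinct unit written."""
    touched = {off // granularity for off in write_offsets}
    return len(touched) * granularity


# Hypothetical byte offsets written by the child after fork().
writes = [0, 8, 4096 * 5 + 100, 4096 * 9 + 4000]

cow = extra_bytes(writes, PAGE_SIZE)   # pages 0, 5, 9 copied in full
oow = extra_bytes(writes, LINE_SIZE)   # only lines 0, 321, 638 overlaid
print(cow, oow)                        # 12288 192
```

The gap narrows as writes spread across more lines of each page; if every line of a page is written, both policies allocate the same 4 KB for it.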



Index Terms

1. Hardware
  1. Hardware validation
  2. Integrated circuits
    1. Semiconductor memory
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        1. Memory management
        2. Virtual memory



Published In

ACM SIGARCH Computer Architecture News, Volume 43, Issue 3S (ISCA'15), June 2015, 745 pages
ISSN: 0163-5964
DOI: 10.1145/2872887
Editor: Doug DeGroot

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture, June 2015, 768 pages
ISBN: 9781450334020
DOI: 10.1145/2749469
General Chair: Debbie Marr (Intel)
Program Chair: David Albonesi (Cornell)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Bibliometrics

Total citations: 41
Total downloads: 903
Downloads (last 12 months): 77
Downloads (last 6 weeks): 0
Download figures reflect downloads up to 11 Aug 2024.

