
Page overlays: an enhanced virtual memory framework to enable fine-grained memory management

Published: 13 June 2015

    Abstract

    Many recent works propose mechanisms that demonstrate the potential advantages of managing memory at a fine (e.g., cache line) granularity, such as fine-grained deduplication and fine-grained memory protection. Unfortunately, existing virtual memory systems track memory at a larger granularity (e.g., 4 KB pages), which inhibits efficient implementation of such techniques. Simply reducing the page size results in an unacceptable increase in page table overhead and TLB pressure.
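
    To make the page-table cost concrete, the following back-of-the-envelope sketch compares the bytes of leaf page table entries needed to map a region at 4 KB granularity with the bytes needed at cache-line granularity. The 64-byte cache line, 8-byte entry size, and 1 GiB region are illustrative assumptions rather than figures from the paper, and the flat single-level table is a deliberate simplification.

```python
# Illustrative assumptions (not from the paper): 64 B cache lines,
# 8 B page table entries, a flat single-level table, a 1 GiB region.
PAGE_SIZE = 4 * 1024
LINE_SIZE = 64
PTE_SIZE = 8
REGION = 1 << 30


def table_bytes(granule: int, region: int = REGION, pte: int = PTE_SIZE) -> int:
    """Bytes of leaf entries needed to map `region` at `granule` granularity."""
    entries = region // granule
    return entries * pte


page_table = table_bytes(PAGE_SIZE)   # mapping at 4 KB granularity
line_table = table_bytes(LINE_SIZE)   # mapping at cache-line granularity

print(f"4 KB granules: {page_table // 2**20} MiB of entries")
print(f"64 B granules: {line_table // 2**20} MiB of entries")
print(f"growth       : {line_table // page_table}x")
```

    Under these assumptions the entry count grows by the ratio of the two granule sizes, and each TLB entry covers correspondingly less memory.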
    We propose a new virtual memory framework that enables efficient implementation of a variety of fine-grained memory management techniques. In our framework, each virtual page can be mapped to a structure called a page overlay, in addition to a regular physical page. An overlay contains a subset of cache lines from the virtual page. Cache lines that are present in the overlay are accessed from there and all other cache lines are accessed from the regular physical page. Our page-overlay framework enables cache-line-granularity memory management without significantly altering the existing virtual memory framework or introducing high overheads.
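
    The access rule described above can be sketched as a small software model. This is a toy illustration, not the hardware design from the paper: the class name `OverlaidPage`, the dict-based overlay, and the 64 lines per page (4 KB pages with 64 B cache lines) are assumptions made for the example.

```python
LINES_PER_PAGE = 64  # assumption: 4 KB page / 64 B cache line


class OverlaidPage:
    """Toy model of one virtual page backed by a physical page plus an overlay."""

    def __init__(self, physical_page):
        if len(physical_page) != LINES_PER_PAGE:
            raise ValueError("physical page must hold exactly 64 cache lines")
        self.physical_page = physical_page  # regular physical page (may be shared)
        self.overlay = {}                   # cache line index -> cache line data

    @staticmethod
    def _check(index):
        if not 0 <= index < LINES_PER_PAGE:
            raise IndexError("cache line index out of range")

    def read_line(self, index):
        # Lines present in the overlay are served from there;
        # all other lines come from the regular physical page.
        self._check(index)
        if index in self.overlay:
            return self.overlay[index]
        return self.physical_page[index]

    def put_in_overlay(self, index, data):
        # Place one cache line in the overlay; the physical page is untouched.
        self._check(index)
        self.overlay[index] = data


physical = [f"phys-{i}" for i in range(LINES_PER_PAGE)]
page = OverlaidPage(physical)
page.put_in_overlay(3, "overlay-3")

print(page.read_line(3))  # served from the overlay
print(page.read_line(4))  # served from the physical page
```

    Because the overlay holds only the lines that differ, the regular physical page stays unmodified and could, in principle, remain shared with other mappings.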
    We show that our framework can enable simple and efficient implementations of seven memory management techniques, each of which has a wide variety of applications. We quantitatively evaluate the potential benefits of two of these techniques: overlay-on-write and sparse-data-structure computation. Our evaluations show that overlay-on-write, when applied to fork, can improve performance by 15% and reduce memory capacity requirements by 53% on average compared to traditional copy-on-write. For sparse data computation, our framework can outperform a state-of-the-art software-based sparse representation on a number of real-world sparse matrices. Our framework is general, powerful, and effective in enabling fine-grained memory management at low cost.
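
    As a rough illustration of why overlay-on-write can need less memory than copy-on-write, the sketch below counts the bytes each policy newly allocates when a forked process writes to a few cache lines of shared pages. It is a simplified accounting model under assumed sizes (4 KB pages, 64 B cache lines) and a hypothetical write pattern; it ignores overlay metadata and does not reproduce the paper's evaluation or its reported averages.

```python
PAGE_SIZE = 4096   # assumption: 4 KB pages
LINE_SIZE = 64     # assumption: 64 B cache lines


def copy_on_write_bytes(written_addrs):
    """Copy-on-write: the first write to a shared page copies the whole page."""
    touched_pages = {addr // PAGE_SIZE for addr in written_addrs}
    return len(touched_pages) * PAGE_SIZE


def overlay_on_write_bytes(written_addrs):
    """Overlay-on-write (simplified): only written cache lines go to an overlay."""
    touched_lines = {addr // LINE_SIZE for addr in written_addrs}
    return len(touched_lines) * LINE_SIZE


# Hypothetical write pattern after fork: two writes that fall in the same
# cache line of page 0, and one write in page 2.
writes = [0x0010, 0x0018, 0x2040]

cow = copy_on_write_bytes(writes)
oow = overlay_on_write_bytes(writes)
print(f"copy-on-write   : {cow} bytes")
print(f"overlay-on-write: {oow} bytes")
```

    In this model the gap narrows as writes cover more of each page; a process that dirties every cache line of a page allocates the same number of bytes under both policies.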


    Published In

    ACM SIGARCH Computer Architecture News, Volume 43, Issue 3S
    ISCA'15
    June 2015
    745 pages
    ISSN: 0163-5964
    DOI: 10.1145/2872887

    ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture
    June 2015
    768 pages
    ISBN: 9781450334020
    DOI: 10.1145/2749469

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Research-article
