DOI: 10.1145/3037697.3037704

Efficient Address Translation for Architectures with Multiple Page Sizes

Published: 04 April 2017

Abstract

Processors and operating systems (OSes) support multiple memory page sizes. Superpages increase Translation Lookaside Buffer (TLB) hits, while small pages provide fine-grained memory protection. Ideally, TLBs should perform well for any distribution of page sizes. In reality, set-associative TLBs, which are widely used because they are more energy-efficient than fully-associative TLBs, cannot (easily) support multiple page sizes concurrently. Instead, commercial systems typically implement separate set-associative TLBs for different page sizes. This means that when superpages are allocated aggressively, TLB misses may, counterintuitively, increase even if entries for small pages remain unused (and vice versa). We invent MIX TLBs, energy-frugal set-associative structures that concurrently support all page sizes by exploiting superpage allocation patterns. MIX TLBs boost the performance (often by 10-30%) of big-memory applications on native CPUs, virtualized CPUs, and GPUs. MIX TLBs are simple and require no OS or program changes.
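
One way to see why a single set-associative TLB struggles with multiple page sizes is to look at how the set index is chosen. The sketch below is illustrative only and is not the MIX TLB design: it assumes the common scheme in which the set index is taken from the low-order bits of the virtual page number, and it assumes the x86-64 page sizes (4KB, 2MB, 1GB). The names `PAGE_SHIFT`, `set_index`, and `candidate_sets` are hypothetical. Because the page size of an address is not known until the translation is found, the same virtual address maps to a different candidate set for each possible page size.

```python
# Illustrative sketch (assumed indexing scheme, not the paper's mechanism).
# Page-offset widths for the x86-64 page sizes assumed here.
PAGE_SHIFT = {"4KB": 12, "2MB": 21, "1GB": 30}


def set_index(vaddr: int, page_size: str, num_sets: int) -> int:
    """Return the TLB set selected by the low-order bits of the virtual page number."""
    vpn = vaddr >> PAGE_SHIFT[page_size]  # strip the page offset for this page size
    return vpn % num_sets                 # low-order VPN bits pick the set


def candidate_sets(vaddr: int, num_sets: int) -> dict:
    """Return the set each page size would index for the same virtual address."""
    return {size: set_index(vaddr, size, num_sets) for size in PAGE_SHIFT}


if __name__ == "__main__":
    # With 16 sets, one address lands in three different sets,
    # depending on which page size backs it.
    print(candidate_sets(0x80601000, num_sets=16))
```

Under these assumptions, a lookup would have to probe one set per supported page size, or the hardware would need a separate structure per page size, which is the split organization the abstract describes.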

      Published In

      ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
      April 2017
      856 pages
      ISBN: 9781450344654
      DOI: 10.1145/3037697

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. coalescing
      2. superpages
      3. tlb
      4. virtual memory

      Qualifiers

      • Research-article

      Conference

      ASPLOS '17

      Acceptance Rates

      ASPLOS '17 Paper Acceptance Rate 53 of 320 submissions, 17%;
      Overall Acceptance Rate 535 of 2,713 submissions, 20%

      Cited By

      • (2025) TLB Coalescing With Range Compressed Page Table for Embedded I/O Devices. IEEE Access 13 (12623-12633). DOI: 10.1109/ACCESS.2025.3528945. Online publication date: 2025
      • (2024) Direct Memory Translation for Virtualized Clouds. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (287-304). DOI: 10.1145/3620665.3640358. Online publication date: 27-Apr-2024
      • (2024) A Case for Speculative Address Translation with Rapid Validation for GPUs. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO) (278-292). DOI: 10.1109/MICRO61859.2024.00029. Online publication date: 2-Nov-2024
      • (2024) Elastic Translations: Fast Virtual Memory with Multiple Translation Sizes. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO) (17-35). DOI: 10.1109/MICRO61859.2024.00012. Online publication date: 2-Nov-2024
      • (2023) IDYLL: Enhancing Page Translation in Multi-GPUs via Light Weight PTE Invalidations. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (1163-1177). DOI: 10.1145/3613424.3614269. Online publication date: 28-Oct-2023
      • (2023) CPU-free Computing: A Vision with a Blueprint. Proceedings of the 19th Workshop on Hot Topics in Operating Systems (1-14). DOI: 10.1145/3593856.3595906. Online publication date: 22-Jun-2023
      • (2023) Contiguitas: The Pursuit of Physical Memory Contiguity in Datacenters. Proceedings of the 50th Annual International Symposium on Computer Architecture (1-15). DOI: 10.1145/3579371.3589079. Online publication date: 17-Jun-2023
      • (2023) CVA6 RISC-V Virtualization: Architecture, Microarchitecture, and Design Space Exploration. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 31:11 (1713-1726). DOI: 10.1109/TVLSI.2023.3302837. Online publication date: Nov-2023
      • (2023) SnakeByte: A TLB Design with Adaptive and Recursive Page Merging in GPUs. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (1195-1207). DOI: 10.1109/HPCA56546.2023.10071063. Online publication date: Feb-2023
      • (2023) Memory-Efficient Hashed Page Tables. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (1221-1235). DOI: 10.1109/HPCA56546.2023.10071061. Online publication date: Feb-2023
