Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3079856.3080217acmconferencesArticle/Chapter ViewAbstractPublication PagesiscaConference Proceedingsconference-collections
research-article

Hybrid TLB Coalescing: Improving TLB Translation Coverage under Diverse Fragmented Memory Allocations

Published: 24 June 2017 Publication History

Abstract

To mitigate excessive TLB misses in large memory applications, techniques such as large pages, variable length segments, and HW coalescing, increase the coverage of limited hardware translation entries by exploiting the contiguous memory allocation. However, recent studies show that in non-uniform memory systems, using large pages often leads to performance degradation, or allocating large chunks of memory becomes more difficult due to memory fragmentation. Although each of the prior techniques favors its own best chunk size, diverse contiguity of memory allocation in real systems cannot always provide the optimal chunk of each technique. Under such fragmented and diverse memory allocations, this paper proposes a novel HW-SW hybrid translation architecture, which can adapt to different memory mappings efficiently. In the proposed hybrid coalescing technique, the operating system encodes memory contiguity information in a subset of page table entries, called anchor entries. During address translation through TLBs, an anchor entry provides translation for contiguous pages following the anchor entry. As a smaller number of anchor entries can cover a large portion of virtual address space, the efficiency of TLB can be significantly improved. The most important benefit of hybrid coalescing is its ability to change the coverage of the anchor entry dynamically, reflecting the current allocation contiguity status. By using the contiguity information directly set by the operating system, the technique can provide scalable translation coverage improvements with minor hardware changes, while allowing the flexibility of memory allocation. Our experimental results show that across diverse allocation scenarios with different distributions of contiguous memory chunks, the proposed scheme can effectively reap the potential translation coverage improvement from the existing contiguity.

References

[1]
Neha Agarwal and Thomas F. Wenisch. 2017. Thermostat: Application-transparent Page Management for Two-tiered Main Memory. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '17). ACM, New York, NY, USA, 631--644.
[2]
Jeongseob Ahn, Seongwook Jin, and Jaehyuk Huh. 2012. Revisiting Hardware-assisted Page Walks for Virtualized Systems. In Proceedings of the 39th Annual International Symposium on Computer Architecture (ISCA '12). IEEE Computer Society, Washington, DC, USA, 476--487.
[3]
Thomas W. Barr, Alan L. Cox, and Scott Rixner. 2010. Translation Caching: Skip, Don'T Walk (the Page Table). In Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA '10). ACM, New York, NY, USA, 48--59.
[4]
Thomas W. Barr, Alan L. Cox, and Scott Rixner. 2011. SpecTLB: A Mechanism for Speculative Address Translation. In Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA '11). ACM, New York, NY, USA, 307--318.
[5]
Arkaprava Basu, Jayneel Gandhi, Jichuan Chang, Mark D. Hill, and Michael M. Swift. 2013. Efficient Virtual Memory for Big Memory Servers. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA '13). ACM, New York, NY, USA, 237--248.
[6]
Arkaprava Basu, Mark D. Hill, and Michael M. Swift. 2012. Reducing Memory Reference Energy with Opportunistic Virtual Caching. In Proceedings of the 39th Annual International Symposium on Computer Architecture (ISCA '12). IEEE Computer Society, Washington, DC, USA, 297--308.
[7]
Ravi Bhargava, Benjamin Serebrin, Francesco Spadini, and Srilatha Manne. 2008. Accelerating Two-dimensional Page Walks for Virtualized Systems. In Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XIII). ACM, New York, NY, USA, 26--35.
[8]
Abhishek Bhattacharjee. 2013. Large-reach Memory Management Unit Caches. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-46). ACM, New York, NY, USA, 383--394.
[9]
Abhishek Bhattacharjee, Daniel Lustig, and Margaret Martonosi. 2011. Shared Last-level TLBs for Chip Multiprocessors. In Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA '11). IEEE Computer Society, Washington, DC, USA, 62--63.
[10]
Guilherme Cox and Abhishek Bhattacharjee. 2017. Efficient Address Translation for Architectures with Multiple Page Sizes. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '17). ACM, New York, NY, USA, 435--448.
[11]
Linux Kernel Documentation. 2016. pagemap, from the userspace perspective. (28 May 2016). Retrieved April 26, 2017 from https://www.kernel.org/doc/Documentation/vm/pagemap.txt
[12]
Linux Kernel Documentation. 2017. Transparent Hugepage Support. (05 Mar 2017). Retrieved April 26, 2017 from https://www.kernel.org/doc/Documentation/vm/transhuge.txt
[13]
Yu Du, Miao Zhou, Bruce R. Childers, Daniel Mossé, and Rami Melhem. 2015. Supporting superpages in non-contiguous physical memory. In 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA '15). IEEE, 223--234.
[14]
Subramanya R. Dulloor, Amitabha Roy, Zheguang Zhao, Narayanan Sundaram, Nadathur Satish, Rajesh Sankaran, Jeff Jackson, and Karsten Schwan. 2016. Data Tiering in Heterogeneous Memory Systems. In Proceedings of the Eleventh European Conference on Computer Systems (EuroSys '16). ACM, New York, NY, USA, Article 15, 16 pages.
[15]
Jayneel Gandhi, Arkaprava Basu, Mark D. Hill, and Michael M. Swift. 2014. Efficient Memory Virtualization: Reducing Dimensionality of Nested Page Walks. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-47). IEEE Computer Society, Washington, DC, USA, 178--189.
[16]
Jayneel Gandhi, Mark D. Hill, and Michael M. Swift. 2016. Agile Paging: Exceeding the Best of Nested and Shadow Paging. In Proceedings of the 43rd International Symposium on Computer Architecture (ISCA '16). IEEE Press, Piscataway, NJ, USA, 707--718.
[17]
Fabien Gaud, Baptiste Lepers, Jeremie Decouchant, Justin Funston, Alexandra Fedorova, and Vivien Quéma. 2014. Large Pages May Be Harmful on NUMA Systems. In Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference (USENIX ATC'14). USENIX Association, Berkeley, CA, USA, 231--242.
[18]
Intel Corporation. 2016. Intel® 64 and IA-32 Architectures Optimization Reference Manual.
[19]
Gokul B. Kandiraju and Anand Sivasubramaniam. 2002. Going the Distance for TLB Prefetching: An Application-driven Study. In Proceedings of the 29th Annual International Symposium on Computer Architecture (ISCA '02). IEEE Computer Society, Washington, DC, USA, 195--206.
[20]
Sudarsun Kannan, Ada Gavrilovska, and Karsten Schwan. 2016. pVM: Persistent Virtual Memory for Efficient Capacity Scaling and Object Storage. In Proceedings of the Eleventh European Conference on Computer Systems (EuroSys '16). ACM, New York, NY, USA, Article 13, 16 pages.
[21]
Vasileios Karakostas, Jayneel Gandhi, Furkan Ayar, Adrián Cristal, Mark D. Hill, Kathryn S. McKinley, Mario Nemirovsky, Michael M. Swift, and Osman Ünsal. 2015. Redundant Memory Mappings for Fast Access to Large Memories. In Proceedings of the 42Nd Annual International Symposium on Computer Architecture (ISCA '15). ACM, New York, NY, USA, 66--78.
[22]
Vasileios Karakostas, Jayneel Gandhi, Adrián Cristal, Mark D. Hill, Kathryn S. McKinley, Mario Nemirovsky, Michael M. Swift, and Osman S. Ünsal. 2016. Energy-efficient address translation. In 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA '16). IEEE Press, 631--643.
[23]
Gwangsun Kim, John Kim, Jung H. Ahn, and Jaeha Kim. 2013. Memory-centric system interconnect design with Hybrid Memory Cubes. In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques. 145--155.
[24]
Youngjin Kwon, Hangchen Yu, Simon Peter, Christopher J. Rossbach, and Emmett Witchel. 2016. Coordinated and Efficient Huge Page Management with Ingens. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI'16). USENIX Association, Berkeley, CA, USA, 705--721.
[25]
Yang Li, Rami Melhem, and Alex K. Jones. 2012. Leveraging Sharing in Second Level Translation-Lookaside Buffers for Chip Multiprocessors. IEEE Computer Architecture Letters 11, 2 (2012), 49--52.
[26]
Gabriel H. Loh. 2008. 3D-Stacked Memory Architectures for Multi-core Processors. In 2008 International Symposium on Computer Architecture. 453--464.
[27]
Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, and Kim Hazelwood. 2005. Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '05). ACM, New York, NY, USA, 190--200.
[28]
Mitesh R. Meswani, Sergey Blagodurov, David Roberts, John Slice, Mike Ignatowski, and Gabriel H. Loh. 2015. Heterogeneous memory architectures: A HW/SW approach for mixing die-stacked and off-package memories. In 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA '15). 126--136.
[29]
Juan Navarro, Sitararn Iyer, Peter Druschel, and Alan Cox. 2002. Practical, Transparent Operating System Support for Superpages. SIGOPS Oper. Syst. Rev. 36, SI, 89--104.
[30]
Misel M. Papadopoulou, Xin Tong, André Seznec, and Andreas Moshovos. 2015. Prediction-based superpage-friendly TLB designs. In 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA '15). 210--222.
[31]
Chang Hyun Park, Taekyung Heo, and Jaehyuk Huh. 2016. Efficient Synonym Filtering and Scalable Delayed Translation for Hybrid Virtual Caching. In Proceedings of the 43rd International Symposium on Computer Architecture (ISCA '16). IEEE Press, Piscataway, NJ, USA, 217--229.
[32]
J. Thomas Pawlowski. 2011. Hybrid memory cube (HMC). In 2011 IEEE Hot Chips 23 Symposium (HCS). 1--24.
[33]
Binh Pham, Abhishek Bhattacharjee, Yasuko Eckert, and Gabriel H. Loh. 2014. Increasing TLB reach by exploiting clustering in page translations. In 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA '14). 558--567.
[34]
Binh Pham, Viswanathan Vaidyanathan, Aamer Jaleel, and Abhishek Bhattacharjee. 2012. CoLT: Coalesced Large-Reach TLBs. In Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-45). IEEE Computer Society, Washington, DC, USA, 258--269.
[35]
Moinuddin K. Qureshi, Vijayalakshmi Srinivasan, and Jude A. Rivers. 2009. Scalable High Performance Main Memory System Using Phase-change Memory Technology. In Proceedings of the 36th Annual International Symposium on Computer Architecture (ISCA '09). ACM, New York, NY, USA, 24--33.
[36]
Ashley Saulsbury, Fredrik Dahlgren, and Per Stenström. 2000. Recency-based TLB Preloading. In Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA '00). ACM, New York, NY, USA, 117--127.
[37]
Shekhar Srikantaiah and Mahmut Kandemir. 2010. Synergistic TLBs for High Performance Address Translation in Chip Multiprocessors. In 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '10). 313--324.
[38]
Madhusudhan Talluri and Mark D. Hill. 1994. Surpassing the TLB Performance of Superpages with Less Operating System Support. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI). ACM, New York, NY, USA, 171--182.
[39]
Hongil Yoon and Gurindar S. Sohi. 2016. Revisiting virtual L1 caches: A practical design using dynamic synonym remapping. In 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA '16). 212--224.
[40]
Lixin Zhang, Evan Speight, Ram Rajamony, and Jiang Lin. 2010. Enigma: Architectural and Operating System Support for Reducing the Impact of Address Translation. In Proceedings of the 24th ACM International Conference on Supercomputing (ICS '10). ACM, New York, NY, USA, 159--168.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture
June 2017
736 pages
ISBN:9781450348928
DOI:10.1145/3079856
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 June 2017

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. TLB coalescing
  2. Virtual memory
  3. address translation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ISCA '17
Sponsor:

Acceptance Rates

ISCA '17 Paper Acceptance Rate 54 of 322 submissions, 17%;
Overall Acceptance Rate 543 of 3,203 submissions, 17%

Upcoming Conference

ISCA '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)69
  • Downloads (Last 6 weeks)9
Reflects downloads up to 25 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2025)TLB Coalescing With Range Compressed Page Table for Embedded I/O DevicesIEEE Access10.1109/ACCESS.2025.352894513(12623-12633)Online publication date: 2025
  • (2024)Every mapping counts in large amountsProceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference10.5555/3691992.3692069(1273-1282)Online publication date: 10-Jul-2024
  • (2024)Contiguity Aware TLB Prefetching for Embedded I/O DevicesIEICE Electronics Express10.1587/elex.21.20240664Online publication date: 2024
  • (2024)Direct Memory Translation for Virtualized CloudsProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3620665.3640358(287-304)Online publication date: 27-Apr-2024
  • (2024)STAR: Sub-Entry Sharing-Aware TLB for Multi-Instance GPU2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)10.1109/MICRO61859.2024.00031(309-323)Online publication date: 2-Nov-2024
  • (2024)A Case for Speculative Address Translation with Rapid Validation for GPUs2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)10.1109/MICRO61859.2024.00029(278-292)Online publication date: 2-Nov-2024
  • (2024)Elastic Translations: Fast Virtual Memory with Multiple Translation Sizes2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)10.1109/MICRO61859.2024.00012(17-35)Online publication date: 2-Nov-2024
  • (2024)Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00065(834-847)Online publication date: 29-Jun-2024
  • (2023)An Empirical Evaluation of PTE CoalescingProceedings of the International Symposium on Memory Systems10.1145/3631882.3631902(1-16)Online publication date: 2-Oct-2023
  • (2023)Architectural Support for Optimizing Huge Page Selection Within the OSProceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3613424.3614296(1213-1226)Online publication date: 28-Oct-2023
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media