Paging and the Address-Translation Problem

Authors:

Michael A. Bender, Abhishek Bhattacharjee, Martín Farach-Colton, Sudarsun Kannan, William Kuszmaul, Nirjhar Mukherjee, Guido Tagliavini, Janet Vorobyeva, Evan West

SPAA '21: Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures

Pages 105 - 117

https://doi.org/10.1145/3409964.3461814

Published: 06 July 2021

Abstract

The classical paging problem, introduced by Sleator and Tarjan in 1985, formalizes the problem of caching pages in RAM in order to minimize IOs. Their online formulation ignores the cost of address translation: programs refer to data via virtual addresses, and these must be translated into physical locations in RAM. Although the cost of an individual address translation is much smaller than that of an IO, every memory access involves an address translation, whereas IOs can be infrequent. In practice, one can spend money to avoid paging by over-provisioning RAM; in contrast, address translation is effectively unavoidable. Thus address-translation costs can sometimes dominate paging costs, and systems must simultaneously optimize both.

To mitigate the cost of address translation, all modern CPUs have translation lookaside buffers (TLBs), which are hardware caches of common address translations. What makes TLBs interesting is that a single TLB entry can potentially encode the address translation for many addresses. This is typically achieved via the use of huge pages, which translate runs of contiguous virtual addresses to runs of contiguous physical addresses. Huge pages reduce TLB misses at the cost of increasing the IOs needed to maintain contiguity in RAM. This tradeoff between TLB misses and IOs suggests that the classical paging problem does not tell the full story.
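The effect of huge pages on TLB misses can be illustrated with a minimal sketch. It assumes a fully associative TLB with LRU replacement and a base page size of 4096 bytes; the function name `tlb_misses`, the TLB size, and the access pattern are hypothetical choices made for illustration and are not taken from the paper.

```python
from collections import OrderedDict

def tlb_misses(addresses, tlb_entries, page_size):
    """Count misses in a fully associative LRU TLB (an assumed model).

    One TLB entry covers `page_size` contiguous virtual addresses, so a
    larger page size lets a single entry translate more addresses.
    """
    tlb = OrderedDict()  # virtual page number -> present, oldest first
    misses = 0
    for addr in addresses:
        vpn = addr // page_size
        if vpn in tlb:
            tlb.move_to_end(vpn)         # refresh recency on a hit
        else:
            misses += 1
            if len(tlb) >= tlb_entries:
                tlb.popitem(last=False)  # evict the least recently used entry
            tlb[vpn] = True
    return misses

# Hypothetical workload: two sweeps over 64 consecutive base pages.
BASE = 4096
addresses = [i * BASE for i in range(64)] * 2

small = tlb_misses(addresses, tlb_entries=8, page_size=BASE)
huge = tlb_misses(addresses, tlb_entries=8, page_size=16 * BASE)
print(small, huge)
```

With base pages the 64-page sweep cycles through an 8-entry TLB and every access misses, whereas with pages 16 times larger the same range is covered by four entries. The sketch counts only TLB misses; it does not model the extra IOs needed to keep huge pages contiguous in RAM.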

This paper introduces the Address-Translation Problem, which formalizes the problem of maintaining a TLB, a page table, and RAM in order to minimize the total cost of both TLB misses and IOs. We present an algorithm that achieves the benefits of huge pages for TLB misses without the downsides of huge pages for IOs.
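The combined objective can be sketched as a toy cost model. Everything here is an assumption made for illustration: RAM and the TLB are both managed with LRU at the same page granularity (as with conventional huge pages), an IO moves one base page, and a TLB miss costs `eps` relative to an IO. The names `lru_misses`, `costs`, `h`, and `eps` are hypothetical, and this is not the algorithm presented in the paper.

```python
from collections import OrderedDict

def lru_misses(keys, capacity):
    """Count misses of an LRU cache holding `capacity` keys."""
    cache = OrderedDict()
    misses = 0
    for k in keys:
        if k in cache:
            cache.move_to_end(k)
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)
            cache[k] = True
    return misses

def costs(base_pages, h, ram_base_pages, tlb_entries, eps):
    """Return (total, ios, tlb) for pages made of `h` base pages each.

    Assumed model: a page fault transfers all h base pages of the page,
    and total cost = ios + eps * tlb.
    """
    assert ram_base_pages >= h, "RAM must hold at least one page"
    units = [p // h for p in base_pages]
    ios = h * lru_misses(units, ram_base_pages // h)
    tlb = lru_misses(units, tlb_entries)
    return ios + eps * tlb, ios, tlb

# Hypothetical workload: a dense run of 8 base pages plus 3 scattered ones.
workload = (list(range(8)) + [8, 16, 24]) * 2

_, ios_small, tlb_small = costs(workload, h=1, ram_base_pages=16, tlb_entries=4, eps=0.01)
_, ios_huge, tlb_huge = costs(workload, h=8, ram_base_pages=16, tlb_entries=4, eps=0.01)
print(ios_small, tlb_small, ios_huge, tlb_huge)
```

In this toy setting, larger pages cut TLB misses but multiply the IOs, because each fault on a scattered page drags in a whole huge page and RAM holds fewer distinct pages. Which setting wins depends on `eps` and on the workload.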

Index Terms

Paging and the Address-Translation Problem
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Virtual memory
2. Theory of computation
  1. Design and analysis of algorithms
    1. Online algorithms
      1. Caching and paging algorithms
    2. Streaming, sublinear and near linear time algorithms
      1. Bloom filters and hashing

Published In

SPAA '21: Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures

July 2021

463 pages

ISBN:9781450380706

DOI:10.1145/3409964

General Chair: Kunal Agrawal (Washington University in St. Louis, USA)
Program Chair: Yossi Azar (Tel Aviv University, Israel)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGARCH: ACM Special Interest Group on Computer Architecture
EATCS: European Association for Theoretical Computer Science

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Hertz Foundation
United States Air Force Research Laboratory
NSF (National Science Foundation)

Conference

SPAA '21

SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures

July 6 - 8, 2021

Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%

Cited By

Han J, Gosakan K, Kuszmaul W, Mubarek I, Mukherjee N, Sriram K, Tagliavini G, West E, Bender M, Bhattacharjee A, Conway A, Farach-Colton M, Gandhi J, Johnson R, Kannan S, Porter D (2024). Mosaic Pages: Big TLB Reach With Small Pages. IEEE Micro 44:4 (52-59). DOI: 10.1109/MM.2024.3409181. Online publication date: 6-Jun-2024.
https://dl.acm.org/doi/10.1109/MM.2024.3409181
Psomadakis S, Alverti C, Karakostas V, Katsakioris C, Siakavaras D, Nikas K, Goumas G, Koziris N (2024). Elastic Translations: Fast Virtual Memory with Multiple Translation Sizes. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO) (17-35). DOI: 10.1109/MICRO61859.2024.00012. Online publication date: 2-Nov-2024.
https://doi.org/10.1109/MICRO61859.2024.00012
Bender M, Conway A, Farach-Colton M, Kuszmaul W, Tagliavini G (2023). Iceberg Hashing: Optimizing Many Hash-Table Criteria at Once. Journal of the ACM 70:6 (1-51). DOI: 10.1145/3625817. Online publication date: 30-Nov-2023.
https://dl.acm.org/doi/10.1145/3625817
Lee T, Monga S, Min C, Eom Y, Druschel P, Kaufmann A, Mace J, Flinn J, Seltzer M (2023). MEMTIS: Efficient Memory Tiering with Dynamic Page Classification and Page Size Determination. Proceedings of the 29th Symposium on Operating Systems Principles (17-34). DOI: 10.1145/3600006.3613167. Online publication date: 23-Oct-2023.
https://dl.acm.org/doi/10.1145/3600006.3613167
