DOI: 10.1145/3502181.3531466
Research article
Open access

DAOS: Data Access-aware Operating System

Published: 27 June 2022

Abstract

In data-intensive workloads, data placement and memory management are inherently difficult: the programmer and the operating system must choose among combinations of DRAM and storage tiers, replacement policies, and page sizes. Efficient memory management depends on fine-grained data access patterns to drive placement decisions. Current solutions in this space cannot be applied to general workloads and production systems because they either rely on unrealistic assumptions or impose prohibitive monitoring overheads.
To overcome these issues, we introduce DAOS, an open-source system for general data access-aware memory management. DAOS provides a data access monitoring framework that makes practical, best-effort trade-offs between overhead and accuracy. The memory management engine of DAOS lets users implement their own access-aware management without writing code, using only simple configuration schemes. For system administrators, DAOS provides a runtime system that auto-tunes the schemes for user-defined objectives in finite time. We evaluated DAOS on commercial production service systems as well as state-of-the-art benchmarks. DAOS achieves up to 12% performance improvement and 91% memory savings. DAOS has been upstreamed and is available in the Linux kernel.
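To make the configuration-scheme idea concrete, the sketch below is a minimal, hypothetical illustration, not the actual DAOS or Linux kernel interface; every name in it (Region, Scheme, apply_schemes, pageout) is invented for this example. The idea it shows is the one stated in the abstract: a scheme pairs an access-pattern predicate, expressed over monitored memory regions and their sampled access counts, with an action such as paging the region out, and the engine applies the first matching scheme to each region.

```python
# A minimal, self-contained sketch of the "configuration scheme" idea from the
# abstract. This is NOT the actual DAOS/Linux interface; every name below is
# hypothetical and exists only to illustrate the concept: declare an access
# pattern plus an action, and let the engine apply the action to every
# monitored region whose observed access pattern matches.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    """One monitored memory region, as a best-effort monitor might report it."""
    start: int        # start address
    end: int          # end address (exclusive)
    nr_accesses: int  # sampled access count during the last aggregation interval
    age: int          # number of intervals the access pattern has stayed stable


@dataclass
class Scheme:
    """An access-aware management rule: pattern bounds plus an action."""
    min_size: int
    max_size: int
    min_accesses: int
    max_accesses: int
    min_age: int
    action: Callable[[Region], None]  # e.g., page out, promote, collapse to a huge page


def apply_schemes(regions: List[Region], schemes: List[Scheme]) -> None:
    """Engine loop: apply the first matching scheme to each monitored region."""
    for region in regions:
        size = region.end - region.start
        for scheme in schemes:
            if (scheme.min_size <= size <= scheme.max_size
                    and scheme.min_accesses <= region.nr_accesses <= scheme.max_accesses
                    and region.age >= scheme.min_age):
                scheme.action(region)
                break


def pageout(region: Region) -> None:
    """Stand-in action: a real system would reclaim the region's pages here."""
    print(f"pageout [{region.start:#x}, {region.end:#x})")


# "Reclaim cold data": regions of at least 4 KiB that have seen zero accesses
# for 60 consecutive aggregation intervals get paged out.
cold_reclaim = Scheme(min_size=4096, max_size=1 << 40,
                      min_accesses=0, max_accesses=0,
                      min_age=60, action=pageout)

apply_schemes([Region(0x7f0000000000, 0x7f0000100000, nr_accesses=0, age=75)],
              [cold_reclaim])
```

Expressing a policy as data like this, rather than as code, is what makes it amenable to the kind of runtime auto-tuning for user-defined objectives that the abstract describes.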


Cited By

  • IDT: Intelligent Data Placement for Multi-tiered Main Memory with Reinforcement Learning. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, pages 69-82, June 2024. DOI: 10.1145/3625549.3658659
  • Brug: An Adaptive Memory (Re-)Allocator. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pages 67-76, May 2024. DOI: 10.1109/CCGrid59990.2024.00017
  • MEMTIS: Efficient Memory Tiering with Dynamic Page Classification and Page Size Determination. Proceedings of the 29th Symposium on Operating Systems Principles, pages 17-34, October 2023. DOI: 10.1145/3600006.3613167


Published In

HPDC '22: Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing
June 2022, 314 pages
ISBN: 9781450391993
DOI: 10.1145/3502181
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. memory management
  2. operating systems


Conference

HPDC '22

Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%

