DOI: 10.1145/3502181.3531466
Research article
Open access

DAOS: Data Access-aware Operating System

Published: 27 June 2022

Abstract

In data-intensive workloads, data placement and memory management are inherently difficult: the programmer and the operating system must choose among combinations of DRAM and storage tiers, replacement policies, and page sizes. Efficient memory management depends on fine-grained data access patterns to drive placement decisions. Current solutions in this space cannot be applied to general workloads and production systems because they either rely on unrealistic assumptions or impose prohibitive monitoring overheads.
To overcome these issues, we introduce DAOS, an open-source system for general data access-aware memory management. DAOS provides a data access monitoring framework that makes practical, best-effort trade-offs between overhead and accuracy. The memory management engine of DAOS lets users implement their own access-aware management without writing code, using only simple configuration schemes. For system administrators, DAOS provides a runtime system that auto-tunes the schemes for user-defined objectives in finite time. We evaluated DAOS on commercial production service systems as well as state-of-the-art benchmarks. DAOS achieves up to 12% performance improvement and 91% memory savings. DAOS has been upstreamed and is available in the Linux kernel.
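To make the configuration-scheme idea concrete, the sketch below is a minimal, hypothetical illustration, not the actual DAOS or Linux kernel interface; every name in it (Region, Scheme, apply_schemes, pageout) is invented for this example. The idea it shows is the one stated in the abstract: a scheme pairs an access-pattern predicate, expressed over monitored memory regions and their sampled access counts, with an action such as paging the region out, and the engine applies the first matching scheme to each region.

```python
# A minimal, self-contained sketch of the "configuration scheme" idea from the
# abstract. This is NOT the actual DAOS/Linux interface; every name below is
# hypothetical and exists only to illustrate the concept: declare an access
# pattern plus an action, and let the engine apply the action to every
# monitored region whose observed access pattern matches.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    """One monitored memory region, as a best-effort monitor might report it."""
    start: int        # start address
    end: int          # end address (exclusive)
    nr_accesses: int  # sampled access count during the last aggregation interval
    age: int          # number of intervals the access pattern has stayed stable


@dataclass
class Scheme:
    """An access-aware management rule: pattern bounds plus an action."""
    min_size: int
    max_size: int
    min_accesses: int
    max_accesses: int
    min_age: int
    action: Callable[[Region], None]  # e.g., page out, promote, collapse to a huge page


def apply_schemes(regions: List[Region], schemes: List[Scheme]) -> None:
    """Engine loop: apply the first matching scheme to each monitored region."""
    for region in regions:
        size = region.end - region.start
        for scheme in schemes:
            if (scheme.min_size <= size <= scheme.max_size
                    and scheme.min_accesses <= region.nr_accesses <= scheme.max_accesses
                    and region.age >= scheme.min_age):
                scheme.action(region)
                break


def pageout(region: Region) -> None:
    """Stand-in action: a real system would reclaim the region's pages here."""
    print(f"pageout [{region.start:#x}, {region.end:#x})")


# "Reclaim cold data": regions of at least 4 KiB that have seen zero accesses
# for 60 consecutive aggregation intervals get paged out.
cold_reclaim = Scheme(min_size=4096, max_size=1 << 40,
                      min_accesses=0, max_accesses=0,
                      min_age=60, action=pageout)

apply_schemes([Region(0x7f0000000000, 0x7f0000100000, nr_accesses=0, age=75)],
              [cold_reclaim])
```

Expressing a policy as data like this, rather than as code, is what makes it amenable to the kind of runtime auto-tuning for user-defined objectives that the abstract describes.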


Cited By

  • IDT: Intelligent Data Placement for Multi-tiered Main Memory with Reinforcement Learning. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, pages 69-82, June 2024. DOI: 10.1145/3625549.3658659
  • Brug: An Adaptive Memory (Re-)Allocator. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pages 67-76, May 2024. DOI: 10.1109/CCGrid59990.2024.00017
  • MEMTIS: Efficient Memory Tiering with Dynamic Page Classification and Page Size Determination. Proceedings of the 29th Symposium on Operating Systems Principles, pages 17-34, October 2023. DOI: 10.1145/3600006.3613167


Published In

HPDC '22: Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing
June 2022, 314 pages
ISBN: 9781450391993
DOI: 10.1145/3502181
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. memory management
  2. operating systems


Conference

HPDC '22

Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%

