DOI: 10.1145/3447786.3456242
research-article

Memory-mapped I/O on steroids

Published: 21 April 2021

Editorial Notes

The editors have requested minor, non-substantive changes to the VoR and, in accordance with ACM policies, a Corrected VoR was published on April 28, 2021. For reference purposes the VoR may still be accessed via the Supplemental Material section on this page.

Abstract

With current technology trends for fast storage devices, the host-level I/O path is emerging as a main bottleneck for modern, data-intensive servers and applications. The need to improve I/O performance requires customizing various aspects of the I/O path, including the page cache and the method to access the storage devices.
In this paper, we present Aquila, a library OS that allows applications to reduce I/O overhead by customizing the memory-mapped I/O (mmio) path for files or storage devices. Compared to Linux mmap, Aquila (a) offers full mmio compatibility and protection to minimize application modifications, (b) allows applications to customize the DRAM I/O cache, its policies, and access to storage devices, and (c) significantly reduces I/O overhead. Aquila achieves its mmio compatibility, flexibility, and performance by placing the application in a privileged domain, non-root ring 0.
We show the benefits of Aquila in two cases: (a) using mmio in key-value stores to reduce I/O overhead, and (b) using mmio in graph processing applications to extend the memory heap over fast storage devices. In RocksDB, Aquila requires 2.58× fewer CPU cycles for cache management than user-space caching and read/write system calls, and improves request throughput by 40%. Finally, we use Ligra, a graph processing framework, to show the efficiency of Aquila in extending the memory heap over fast storage devices. In this case, Aquila achieves up to 4.14× lower execution time than Linux mmap.
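
To make the two access styles compared in the abstract concrete, the following is a minimal sketch that uses only the standard Linux mmap and read paths through Python's standard library. It does not use Aquila or its API, which this page does not describe. The record layout, file name, and all function names (make_file, sum_with_read_syscalls, sum_with_mmap, scale_in_place, demo) are assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile

# Assumed record layout for this sketch: one native 64-bit signed integer.
RECORD = struct.Struct("q")


def make_file(path, n_records):
    """Create a file holding the integers 0..n_records-1 as fixed-size records."""
    with open(path, "wb") as f:
        for i in range(n_records):
            f.write(RECORD.pack(i))


def sum_with_read_syscalls(path, n_records):
    """Explicit I/O: every record access is a pread() system call plus a copy."""
    fd = os.open(path, os.O_RDONLY)
    try:
        total = 0
        for i in range(n_records):
            data = os.pread(fd, RECORD.size, i * RECORD.size)
            total += RECORD.unpack(data)[0]
        return total
    finally:
        os.close(fd)


def sum_with_mmap(path, n_records):
    """Memory-mapped I/O: after mmap(), record accesses go through the
    mapping with no per-access system call; misses are handled as page faults."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return sum(RECORD.unpack_from(m, i * RECORD.size)[0]
                       for i in range(n_records))


def scale_in_place(path, n_records, factor):
    """Use the file as an extension of the heap: update records via the mapping."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
            for i in range(n_records):
                off = i * RECORD.size
                value = RECORD.unpack_from(m, off)[0]
                RECORD.pack_into(m, off, value * factor)
            m.flush()  # ask the kernel to write dirty pages back to the file


def demo(n_records=1000):
    """Run both access styles on a temporary file and return the three sums."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "records.bin")
        make_file(path, n_records)
        via_read = sum_with_read_syscalls(path, n_records)
        via_mmap = sum_with_mmap(path, n_records)
        scale_in_place(path, n_records, 2)
        after_update = sum_with_read_syscalls(path, n_records)
        return via_read, via_mmap, after_update
```

Calling demo() returns the same sum from both read paths, followed by the doubled sum after the in-place update. The sketch only contrasts the programming model; it says nothing about the overheads or speedups reported in the abstract.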

Supplementary Material

3456242-vor (3456242-vor.pdf)
Version of Record for "Memory-mapped I/O on steroids" by Papagiannis et al., Proceedings of the Sixteenth European Conference on Computer Systems (EuroSys '21).

Published In

EuroSys '21: Proceedings of the Sixteenth European Conference on Computer Systems
April 2021
631 pages
ISBN:9781450383349
DOI:10.1145/3447786
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. I/O caching
  2. Linux mmap
  3. fast storage devices
  4. key-value stores
  5. memory-mapped I/O

Qualifiers

  • Research-article

Conference

EuroSys '21
EuroSys '21: Sixteenth European Conference on Computer Systems
April 26 - 28, 2021
Online Event, United Kingdom

Acceptance Rates

EuroSys '21 Paper Acceptance Rate 38 of 181 submissions, 21%;
Overall Acceptance Rate 241 of 1,308 submissions, 18%
