DOI: 10.1145/2815400.2815421
research-article

Split-level I/O scheduling

Published: 04 October 2015

Abstract

We introduce split-level I/O scheduling, a new framework that splits I/O scheduling logic across handlers at three layers of the storage stack: block, system call, and page cache. We demonstrate that traditional block-level I/O schedulers are unable to meet throughput, latency, and isolation goals. By utilizing the split-level framework, we build a variety of novel schedulers to readily achieve these goals: our Actually Fair Queuing scheduler reduces priority-misallocation by 28x; our Split-Deadline scheduler reduces tail latencies by 4x; our Split-Token scheduler reduces sensitivity to interference by 6x. We show that the framework is general and operates correctly with disparate file systems (ext4 and XFS). Finally, we demonstrate that split-level scheduling serves as a useful foundation for databases (SQLite and PostgreSQL), hypervisors (QEMU), and distributed file systems (HDFS), delivering improved isolation and performance in these important application scenarios.
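As a rough illustration of what "handlers at three layers" could look like, the following Python sketch models one hook per layer named in the abstract (system call, page cache, block) and a toy token-bucket policy built on them. This is not the authors' kernel implementation: the class names, the hook names (`on_syscall`, `on_page_dirty`, `on_block_request`), the `tick` method, and the choice to charge tokens when data is dirtied in the page cache are all assumptions made for this example.

```python
from dataclasses import dataclass, field


class SplitScheduler:
    """Hypothetical base class with one hook per layer of the storage stack."""

    def on_syscall(self, pid: int, op: str, nbytes: int) -> bool:
        """System-call layer: return True to admit the call, False to delay it."""
        return True

    def on_page_dirty(self, pid: int, nbytes: int) -> None:
        """Page-cache layer: called when a process dirties cached data."""

    def on_block_request(self, pid: int, nbytes: int) -> None:
        """Block layer: called when a request is issued to the device."""


@dataclass
class TokenBucketScheduler(SplitScheduler):
    """Toy policy (an assumption, not the paper's Split-Token scheduler)."""

    rate: int       # tokens (bytes) granted to each process per tick
    capacity: int   # maximum tokens a process may hold
    tokens: dict = field(default_factory=dict)   # pid -> token balance
    issued: dict = field(default_factory=dict)   # pid -> bytes seen at block layer

    def _balance(self, pid: int) -> int:
        # A process seen for the first time starts with a full bucket.
        return self.tokens.setdefault(pid, self.capacity)

    def tick(self) -> None:
        """Refill every known bucket, capped at capacity."""
        for pid in self.tokens:
            self.tokens[pid] = min(self.capacity, self.tokens[pid] + self.rate)

    def on_syscall(self, pid: int, op: str, nbytes: int) -> bool:
        # Throttle at the entry point: writes wait while the balance is exhausted.
        if op != "write":
            return True
        return self._balance(pid) > 0

    def on_page_dirty(self, pid: int, nbytes: int) -> None:
        # Charge the process that dirtied the data, before any flush happens.
        self.tokens[pid] = self._balance(pid) - nbytes

    def on_block_request(self, pid: int, nbytes: int) -> None:
        # Only record what reaches the device; no throttling at this layer.
        self.issued[pid] = self.issued.get(pid, 0) + nbytes
```

In this sketch, a process that dirties 8192 bytes against an 8192-token bucket is refused further writes at `on_syscall` until `tick` refills the bucket. The point of the design is that the process is charged where the write originates, rather than only when the block layer later sees the flushed data.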

Supplementary Material

MP4 File (p474.mp4)

Published In

SOSP '15: Proceedings of the 25th Symposium on Operating Systems Principles
October 2015
499 pages
ISBN:9781450338349
DOI:10.1145/2815400
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Funding Sources

  • NSF

Conference

SOSP '15

Acceptance Rates

SOSP '15 Paper Acceptance Rate 30 of 181 submissions, 17%;
Overall Acceptance Rate 131 of 716 submissions, 18%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 87
  • Downloads (Last 6 weeks): 9
Reflects downloads up to 04 Oct 2024

Citations

Cited By

  • (2024) Timing-accurate scheduling and allocation for parallel I/O operations in real-time systems. Journal of Systems Architecture (103158). DOI: 10.1016/j.sysarc.2024.103158. Online publication date: May-2024
  • (2023) Enabling Multi-tenancy on SSDs with Accurate IO Interference Modeling. Proceedings of the 2023 ACM Symposium on Cloud Computing (216-232). DOI: 10.1145/3620678.3624657. Online publication date: 30-Oct-2023
  • (2023) Enabling High-Performance and Secure Userspace NVM File Systems with the Trio Architecture. Proceedings of the 29th Symposium on Operating Systems Principles (150-165). DOI: 10.1145/3600006.3613171. Online publication date: 23-Oct-2023
  • (2023) Do we still need IO schedulers for low-latency disks? Proceedings of the 15th ACM Workshop on Hot Topics in Storage and File Systems (44-50). DOI: 10.1145/3599691.3603400. Online publication date: 9-Jul-2023
  • (2023) Holistic and Opportunistic Scheduling of Background I/Os in Flash-Based SSDs. IEEE Transactions on Computers 72:11 (3127-3139). DOI: 10.1109/TC.2023.3288748. Online publication date: Nov-2023
  • (2023) i-NVMe: Isolated NVMe over TCP for a Containerized Environment. IEEE INFOCOM 2023 - IEEE Conference on Computer Communications (1-10). DOI: 10.1109/INFOCOM53939.2023.10228889. Online publication date: 17-May-2023
  • (2022) Towards low-latency I/O services for mixed workloads using ultra-low latency SSDs. Proceedings of the 36th ACM International Conference on Supercomputing (1-12). DOI: 10.1145/3524059.3532378. Online publication date: 28-Jun-2022
  • (2022) IOCost: block IO control for containers in datacenters. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (595-608). DOI: 10.1145/3503222.3507727. Online publication date: 28-Feb-2022
  • (2022) Memory/Disk Operation Aware Lightweight VM Live Migration. IEEE/ACM Transactions on Networking 30:4 (1895-1910). DOI: 10.1109/TNET.2022.3155935. Online publication date: Aug-2022
  • (2022) Be United in Actions: Taking Live Snapshots of Heterogeneous Edge–Cloud Collaborative Cluster With Low Overhead. IEEE Internet of Things Journal 9:10 (7311-7324). DOI: 10.1109/JIOT.2021.3111023. Online publication date: 15-May-2022
