DOI: 10.1145/2901318.2901326
research-article

The Linux scheduler: a decade of wasted cores

Published: 18 April 2016

Abstract

As a central part of resource management, the OS thread scheduler must maintain the following simple invariant: make sure that ready threads are scheduled on available cores. As simple as it may seem, we found that this invariant is often broken in Linux. Cores may stay idle for seconds while ready threads are waiting in runqueues. In our experiments, these performance bugs caused many-fold performance degradation for synchronization-heavy scientific applications, 13% higher latency for kernel make, and a 14-23% decrease in TPC-H throughput for a widely used commercial database. The main contribution of this work is the discovery and analysis of these bugs, together with fixes for them. Conventional testing techniques and debugging tools are ineffective at confirming or understanding this kind of bug, because the symptoms are often evasive. To drive our investigation, we built new tools that check for violations of the invariant online and visualize scheduling activity. They are simple, easily portable across kernel versions, and run with negligible overhead. We believe that making these tools part of the kernel developers' tool belt can help keep this type of bug at bay.
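The invariant stated in the abstract can be illustrated with a small check over per-core snapshots. The following is only a sketch under stated assumptions, not the tool described in the paper: `CoreSnapshot`, `find_violations`, and `persistent_violations` are hypothetical names, the snapshots are assumed to come from some external sampling mechanism, and thread-to-core affinity restrictions are ignored. A core counts as idle when it has no runnable thread, and as overloaded when it has at least one thread waiting behind the running one.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class CoreSnapshot:
    """Hypothetical per-core sample; not a kernel data structure."""
    core_id: int
    runnable: int  # runnable threads on this core, including the one executing


def find_violations(cores: List[CoreSnapshot]) -> List[Tuple[int, int]]:
    """Return (idle_core, overloaded_core) pairs seen in one snapshot.

    A pair means a thread is waiting on the overloaded core while the idle
    core has nothing to run, which is what the invariant forbids.
    """
    idle = [c.core_id for c in cores if c.runnable == 0]
    overloaded = [c.core_id for c in cores if c.runnable >= 2]
    return [(i, o) for i in idle for o in overloaded]


def persistent_violations(history: List[List[CoreSnapshot]]) -> Set[Tuple[int, int]]:
    """Keep only the pairs present in every snapshot of the history.

    Assumption: brief imbalances are normal, so only violations that
    survive across all consecutive samples are treated as suspicious.
    """
    if not history:
        return set()
    common = set(find_violations(history[0]))
    for snapshot in history[1:]:
        common &= set(find_violations(snapshot))
    return common
```

Filtering on persistence is a design choice of this sketch: a single snapshot can catch a core that is idle only for the instant before the load balancer runs, whereas the abstract is concerned with cores that stay idle for seconds.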

Information

Published In

EuroSys '16: Proceedings of the Eleventh European Conference on Computer Systems
April 2016
605 pages
ISBN:9781450342407
DOI:10.1145/2901318
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

EuroSys '16
EuroSys '16: Eleventh EuroSys Conference 2016
April 18 - 21, 2016
London, United Kingdom

Acceptance Rates

EuroSys '16 Paper Acceptance Rate 38 of 180 submissions, 21%;
Overall Acceptance Rate 241 of 1,308 submissions, 18%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 815
  • Downloads (Last 6 weeks): 53
Reflects downloads up to 13 Jan 2025

Cited By

  • (2025) Syntalos: a software for precise synchronization of simultaneous multi-modal data acquisition and closed-loop interventions. Nature Communications 16:1. DOI: 10.1038/s41467-025-56081-9. Online publication date: 15-Jan-2025.
  • (2024) Identifying on-/off-CPU bottlenecks together with blocked samples. Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation, pages 893-910. DOI: 10.5555/3691938.3691986. Online publication date: 10-Jul-2024.
  • (2024) In Serverless, OS Scheduler Choice Costs Money: A Hybrid Scheduling Approach for Cheaper FaaS. Proceedings of the 25th International Middleware Conference, pages 172-184. DOI: 10.1145/3652892.3700757. Online publication date: 2-Dec-2024.
  • (2024) Enoki: High Velocity Linux Kernel Scheduler Development. Proceedings of the Nineteenth European Conference on Computer Systems, pages 962-980. DOI: 10.1145/3627703.3629569. Online publication date: 22-Apr-2024.
  • (2024) Expanding Datacenter Capacity with DVFS Boosting: A safe and scalable deployment experience. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pages 150-165. DOI: 10.1145/3617232.3624853. Online publication date: 27-Apr-2024.
  • (2024) AppSteer: Framework for Improving Multicore Scalability of Network Functions via Application-aware Packet Steering. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pages 18-27. DOI: 10.1109/CCGrid59990.2024.00012. Online publication date: 6-May-2024.
  • (2023) Machine learning in run-time control of multicore processor systems. it - Information Technology 65:4-5, pages 164-176. DOI: 10.1515/itit-2023-0056. Online publication date: 2-Aug-2023.
  • (2023) Towards Latency-Aware Linux Scheduling for Serverless Workloads. Proceedings of the 1st Workshop on SErverless Systems, Applications and MEthodologies, pages 19-26. DOI: 10.1145/3592533.3592807. Online publication date: 8-May-2023.
  • (2023) Efficient Scheduler Live Update for Linux Kernel with Modularization. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, pages 194-207. DOI: 10.1145/3582016.3582054. Online publication date: 25-Mar-2023.
  • (2023) Capturing and Predicting User Frustration to Support a Smart Operating System. Companion Proceedings of the 28th International Conference on Intelligent User Interfaces, pages 29-32. DOI: 10.1145/3581754.3584124. Online publication date: 27-Mar-2023.
