Understanding the impact of Linux scheduler on your application
Copyright © 2017, Oracle and/or its affiliates. All rights reserved.
Understanding the impact of Scheduler
on your application
Atish Patra
Dhaval Giani
Linux Kernel Developer, Oracle
"You don't need to know how to operate an X-ray machine, but you do need
to know that if you swallow a penny, an X-ray is an option."
- Brendan Gregg's friend
Program Agenda
1. Introduction to Scheduler
2. Load Balancing
3. Application Analysis
4. Scheduler Tunables
5. Conclusion
Introduction
Linux Scheduler
• Who gets to run when (and where)
Completely Fair Scheduler
• Two tasks
• T1 runs
• Followed by T2
• Followed by T1
• …
How long?
• T1 has nice -1 and T2 has nice 0
• T1 gets to run roughly 1.25 times as long as T2 (each nice level scales the weight by about 1.25)
• nice
• weight
• vruntime
vruntime ∝ 1/weight
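The nice, weight and vruntime relationship can be sketched in a few lines of Python. This is only an illustration: `approx_weight`, `cpu_shares` and `vruntime_delta` are hypothetical helper names, and the 1.25-per-nice-level formula is an approximation of the kernel's weight table, not the table itself.

```python
NICE_0_WEIGHT = 1024  # weight of a nice-0 task


def approx_weight(nice: int) -> float:
    """Approximate CFS weight: each nice step scales the weight by about 1.25."""
    return NICE_0_WEIGHT / (1.25 ** nice)


def cpu_shares(nices):
    """Fraction of CPU time each runnable task gets on one CPU, by weight."""
    weights = [approx_weight(n) for n in nices]
    total = sum(weights)
    return [w / total for w in weights]


def vruntime_delta(runtime_ns: float, nice: int) -> float:
    """vruntime advances in inverse proportion to the task's weight."""
    return runtime_ns * NICE_0_WEIGHT / approx_weight(nice)


if __name__ == "__main__":
    t1, t2 = cpu_shares([-1, 0])
    print(f"T1 share: {t1:.3f}, T2 share: {t2:.3f}, ratio: {t1 / t2:.2f}")
```

Under this approximation the nice -1 task accumulates vruntime more slowly than the nice 0 task for the same wall-clock runtime, which is why it is picked to run more often.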
Group Scheduling
• Consider Apache spawning 1000 threads for every Oracle thread.
• Does that mean Apache gets to run 1000 times as often?
• Enter Group Scheduling
• Apache Group has all the Apache threads
• Oracle Group has all the Oracle threads
• Oracle and Apache groups have their own nice values
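A rough sketch of what group scheduling does to the Apache/Oracle example, assuming a single CPU, all threads runnable, and equal weights for threads inside a group. The function name and the group weights are hypothetical.

```python
def group_shares(groups):
    """groups maps name -> (group_weight, n_threads).

    Returns name -> (share of CPU for the group, share per thread).
    CPU time is first split between groups by weight, then evenly
    between the threads inside each group.
    """
    total_weight = sum(weight for weight, _ in groups.values())
    result = {}
    for name, (weight, n_threads) in groups.items():
        group_share = weight / total_weight
        result[name] = (group_share, group_share / n_threads)
    return result


if __name__ == "__main__":
    shares = group_shares({"apache": (1024, 1000), "oracle": (1024, 1)})
    for name, (group, thread) in shares.items():
        print(f"{name}: group {group:.2%}, per thread {thread:.4%}")
```

With equal group weights each group gets half of the CPU, no matter how many threads Apache spawns.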
Symmetric Multiprocessing
[Diagram: four CPUs (CPU 0 to CPU 3) and two Last Level Caches (LLC)]
Load Balancing
• Consider
• 4 CPUs (quad core laptops!)
• 4 tasks (let’s consider same nice value)
• Each CPU gets 1
• Now consider
• 4 CPUs
• 5 tasks (1 has nice value 1)
• Now what?
Load Balancing with group scheduling
• Consider Apache with 1000 threads
• Consider Oracle with 4 threads
• Apache and Oracle have same weight
• 4 CPUs. Apache gets 2, Oracle gets 2
• Now what if we had a third group with 3 threads?
Non Uniform Memory Access
[Diagram: eight CPUs (CPU 0 to CPU 7) and four Last Level Caches (LLC)]
Load Balancing
• Problems:
– Who runs where ?
– Who moves where ?
– How much movement ?
• Too much: tasks play ping pong!
• Too little: cores go idle!
Scheduling domains/groups
Why scheduling domains?
• Issues
– System wide task movement and search cost
– Hardware properties
• Solution
– A hierarchical search to reduce the cost
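The idea of a hierarchical search can be sketched as follows. This is not the kernel's algorithm: the contiguous spans of 2, 4 and 8 CPUs are a hypothetical topology, and "load" is just a number per CPU. The point is only that nearby CPUs are searched before distant ones.

```python
def domains_for(cpu, level_sizes=(2, 4, 8)):
    """Nested CPU spans containing `cpu`, smallest first (hypothetical layout)."""
    spans = []
    for size in level_sizes:
        start = (cpu // size) * size
        spans.append(range(start, start + size))
    return spans


def find_busiest(cpu, load, level_sizes=(2, 4, 8)):
    """Search outward from `cpu`.

    Returns (level, busiest_cpu) for the first span that holds a CPU
    with more load than `cpu`, or None if no span does.
    """
    for level, span in enumerate(domains_for(cpu, level_sizes)):
        busiest = max(span, key=lambda c: load[c])
        if load[busiest] > load[cpu]:
            return level, busiest
    return None


if __name__ == "__main__":
    load = [0, 0, 0, 0, 3, 1, 0, 0]
    print(find_busiest(5, load))  # neighbour found in the smallest span
    print(find_busiest(0, load))  # has to widen the search to the whole system
```

A CPU that finds work close by never pays for a system-wide search, and a task pulled from a nearby CPU is more likely to keep its cache contents.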
Scheduling Domains - Example
[Diagram: 16 CPUs (0 to 15) with four Last Level Caches]
Scheduling Domains - SMT Domain
[Diagram: 16 CPUs (0 to 15) with four Last Level Caches; SMT domain view]
Scheduling Domains - MC Domain
[Diagram: 16 CPUs (0 to 15) with four Last Level Caches; MC domain view]
Scheduling Domains - Die/CPU Domain
[Diagram: 16 CPUs (0 to 15) with four Last Level Caches; Die/CPU domain view]
Scheduling Domains - NUMA Domain
[Diagram: 16 CPUs (0 to 15) with four Last Level Caches; NUMA domain view]
Load Balancing (Pull)
• Periodic balancing
– Happens at every scheduling tick
– Trace domain to find imbalance
– One CPU performs the search
– Frequency reduces as we traverse up the hierarchy
[Four diagram slides: 16 CPUs (0 to 15) with four Last Level Caches (LLC)]
• New Idle balancing
– average idle time > migration_cost
– average idle time > domain search cost
– Needs SD_BALANCE_NEWIDLE enabled
[Diagram, two frames: tasks T1 to T4 placed on CPUs under two Last Level Caches (LLC)]
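The new idle conditions can be read as a small decision procedure. The sketch below is a simplification written for illustration: the function name, the domain tuples and the cost numbers are hypothetical, and the real kernel code tracks these costs itself.

```python
def newidle_domains_to_search(avg_idle_ns, migration_cost_ns, domains):
    """Decide which domains a newly idle CPU would search.

    domains: list of (name, newidle_enabled, search_cost_ns), smallest first.
    A domain is searched only while the CPU's average idle time exceeds
    both the migration cost and the accumulated search cost.
    """
    if avg_idle_ns <= migration_cost_ns:
        return []  # expected idle period too short to be worth pulling a task
    searched = []
    spent_ns = 0
    for name, newidle_enabled, search_cost_ns in domains:
        if avg_idle_ns <= spent_ns + search_cost_ns:
            break  # searching this domain would cost more than the idle time
        if newidle_enabled:  # stands in for SD_BALANCE_NEWIDLE on the domain
            searched.append(name)
            spent_ns += search_cost_ns
    return searched


if __name__ == "__main__":
    doms = [("MC", True, 100_000), ("NUMA", True, 2_000_000)]
    print(newidle_domains_to_search(600_000, 500_000, doms))
```

In this example the CPU is idle long enough to search its own cache domain but not long enough to justify a search across NUMA nodes.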
• Wakeup load balancing
– cache affinity
– waker-wakee relationship
– search in the same LLC domain
[Diagram, four frames: tasks T1 to T5 placed on CPUs under two Last Level Caches (LLC)]
• Fork/Exec load balancing
– Cache affinity might not matter
– Find the least loaded Group
– Find the least loaded CPU
– Needs SD_BALANCE_FORK/EXEC enabled
[Diagram, two frames: tasks T1 to T8 placed on CPUs under two Last Level Caches (LLC)]
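The two-step placement for a new task (least loaded group, then least loaded CPU inside it) might look like this in outline. The function name, the group layout and the load numbers are hypothetical, and groups are assumed to be the same size so that summed load is comparable.

```python
def pick_cpu_for_new_task(groups, load):
    """Pick a CPU for a freshly forked or exec'd task.

    groups: list of lists of CPU ids (assumed equal in size).
    load:   dict mapping CPU id -> current load.
    """
    # Step 1: find the group with the lowest total load.
    least_loaded_group = min(groups, key=lambda g: sum(load[c] for c in g))
    # Step 2: find the least loaded CPU inside that group.
    return min(least_loaded_group, key=lambda c: load[c])


if __name__ == "__main__":
    groups = [[0, 1], [2, 3]]
    load = {0: 2, 1: 1, 2: 1, 3: 0}
    print(pick_cpu_for_new_task(groups, load))
```

A new task has no cache footprint worth preserving yet, so spreading by load is a reasonable choice at this point.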
Application Analysis
Application Analysis - Tools (Our experience)
• Manual instrumentation
– printk/pr_* : NO!
– trace_printk(): Maybe!
• Issues
– Kernel modification
– Parsing difficulty
Application Analysis - perf
• Perf
– call graphs
# sudo perf record -a -g -e cycles perf bench sched messaging -g 1000 -l 10000
# sudo perf report --no-children -g -s symbol > perf_callg.out
• Difficult to process in this format
Application Analysis - perf
• Flamegraph is immensely useful
$ sudo perf script | ./stackcollapse-perf.pl > out.perf-folded
$ ./flamegraph.pl out.perf-folded > perf-kernel.svg
Application Analysis - perf
• Use perf stat for migrations and cache-misses
$ sudo perf stat -a -e migrations -e cache-misses -e cache-references perf bench sched messaging -g 100 -l 1000
Performance counter stats for 'system wide':
7,847,249 migrations
20,950,769,373 cache-misses # 32.308 % of all cache refs (100.00%)
64,847,047,418 cache-references
Application Analysis - Ftrace
• trace-cmd or manual
• Able to trace each task wakeup/migration event
# trace-cmd record -e sched:sched_wakeup_new -e sched:sched_migrate_task perf bench sched messaging -g 100 -l 100
• trace-cmd spawns a process on each processor
• This might make task migration/wakeup analysis tricky
Application Analysis - Impact of Scheduling domains/groups
• /proc/schedstat
– scheduling domains/groups follow hardware ?
– load balancing statistics
– Too many new idle balances especially across NUMA ?
• Try to reduce relax_domain_level
Application Analysis - mpstat
• mpstat -P ALL -u 5
Application Analysis - eBPF
• New kid on the block
• Find CPU idleness due to scheduler
• https://github.com/iovisor/bcc/blob/master/tools/cpuunclaimed.py
Application Analysis - eBPF
• Find run-queue length distribution
• https://github.com/iovisor/bcc/blob/master/tools/runqlen.py
Scheduler Tunables
Scheduler Tunables
• sysctl -A | grep "sched" | grep -v "domain"
kernel.sched_cfs_bandwidth_slice_us = 5000
kernel.sched_child_runs_first = 0
kernel.sched_latency_ns = 24000000 (24ms)
kernel.sched_migration_cost_ns = 500000 (0.5ms)
kernel.sched_min_granularity_ns = 10000000 (10ms)
kernel.sched_nr_migrate = 32
kernel.sched_schedstats = 0
kernel.sched_time_avg_ms = 1000
kernel.sched_wakeup_granularity_ns = 15000000 (15ms)
Scheduler Tunables
• sched_migration_cost_ns: Is task cache-hot ?
– Too low: task bouncing
– Too high: cores go idle
• sched_wakeup_granularity_ns: To preempt or not to ?
– Higher: less wakeup preemption
– Lower: reduces scheduling delay
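The two questions these tunables answer can be written as simple comparisons. The sketch below is a simplification with hypothetical function names; the defaults are the values from the sysctl listing earlier in the deck, and the weight scaling the kernel applies to the granularity is left out.

```python
def is_cache_hot(now_ns, last_ran_ns, migration_cost_ns=500_000):
    """A task that ran more recently than migration_cost is treated as cache-hot."""
    return (now_ns - last_ran_ns) < migration_cost_ns


def should_preempt_on_wakeup(curr_vruntime_ns, wakee_vruntime_ns,
                             wakeup_granularity_ns=15_000_000):
    """Preempt only if the woken task is behind the running task's vruntime
    by more than the wakeup granularity."""
    return (curr_vruntime_ns - wakee_vruntime_ns) > wakeup_granularity_ns


if __name__ == "__main__":
    print(is_cache_hot(now_ns=1_000_000, last_ran_ns=800_000))
    print(should_preempt_on_wakeup(20_000_000, 1_000_000))
```

Raising `migration_cost_ns` makes more tasks count as cache-hot, so fewer are migrated; raising `wakeup_granularity_ns` makes the preemption test harder to pass.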
Scheduler Tunables
• sched_latency_ns: Period in which each task is guaranteed to run
• sched_min_granularity_ns: Minimum guaranteed runtime for a task
• sched period = max(sched_latency, nr_tasks*sched_min_granularity)
• sched_min_granularity_ns trade-off
– Lower: reduces latency
– Higher: improves throughput
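The period formula is easy to try out with the default values from the sysctl listing (24 ms latency, 10 ms minimum granularity). The function names are hypothetical, and the per-task slice assumes all tasks have the same weight.

```python
def sched_period_ns(nr_tasks, latency_ns=24_000_000, min_granularity_ns=10_000_000):
    """sched period = max(sched_latency, nr_tasks * sched_min_granularity)."""
    return max(latency_ns, nr_tasks * min_granularity_ns)


def slice_ns(nr_tasks, latency_ns=24_000_000, min_granularity_ns=10_000_000):
    """Time slice per task within one period, assuming equal weights."""
    return sched_period_ns(nr_tasks, latency_ns, min_granularity_ns) / nr_tasks


if __name__ == "__main__":
    for n in (1, 2, 3, 8):
        print(n, sched_period_ns(n) / 1e6, "ms period,", slice_ns(n) / 1e6, "ms slice")
```

With these defaults the period stays at 24 ms for one or two runnable tasks and stretches once a third task arrives, so that no task's slice drops below the minimum granularity.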
Conclusions
• Understand your application
• How does it interact with the system?
• Use the tools at your disposal
• Is it a scheduler issue?
• Experiment with different tunables
• Tune one at a time
• Might be a kernel bug!
Thank You!
atish.patra@oracle.com
dhaval.giani@oracle.com
