Scalable thread scheduling and global power management for heterogeneous many-core architectures

JA Winter, DH Albonesi, CA Shoemaker - Proceedings of the 19th …, 2010 - dl.acm.org
JA Winter, DH Albonesi, CA Shoemaker
Proceedings of the 19th international conference on Parallel architectures …, 2010dl.acm.org
Future many-core microprocessors are likely to be heterogeneous, by design or due to
variability and defects. The latter type of heterogeneity is especially challenging due to its
unpredictability. To minimize the performance and power impact of these hardware
imperfections, the runtime thread scheduler and global power manager must be nimble
enough to handle such random heterogeneity. With hundreds of cores expected on a single
die in the future, these algorithms must provide high power-performance efficiency, yet …
Future many-core microprocessors are likely to be heterogeneous, by design or due to variability and defects. The latter type of heterogeneity is especially challenging due to its unpredictability. To minimize the performance and power impact of these hardware imperfections, the runtime thread scheduler and global power manager must be nimble enough to handle such random heterogeneity. With hundreds of cores expected on a single die in the future, these algorithms must provide high power-performance efficiency, yet remain scalable with low runtime overhead.
This paper presents a range of scheduling and power management algorithms and performs a detailed evaluation of their effectiveness and scalability on heterogeneous many-core architectures with up to 256 cores. We also conduct a limit study on the potential benefits of coordinating scheduling and power management and demonstrate that coordination yields little benefit. We highlight the scalability limitations of previously proposed thread scheduling algorithms that were designed for small-scale chip multiprocessors and propose a Hierarchical Hungarian Scheduling Algorithm that dramatically reduces the scheduling overhead without loss of accuracy. Finally, we show that the high computational requirements of prior global power management algorithms based on linear programming make them infeasible for many-core chips, and that an algorithm that we call Steepest Drop achieves orders of magnitude lower execution time without sacrificing power-performance efficiency.
ACM Digital Library