COLAB: a collaborative multi-factor scheduler for asymmetric multicore processors

Authors: Pavlos Petoumenos, Vladimir Janjic, John Thomson

CGO '20: Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization

Pages 268 - 279

https://doi.org/10.1145/3368826.3377915

Published: 22 February 2020

Abstract

Increasingly prevalent asymmetric multicore processors (AMPs) are necessary for delivering performance in an era of limited power budgets and dark silicon. However, software fails to use them efficiently. OS schedulers, in particular, handle asymmetry only in restricted scenarios. We have efficient symmetric schedulers, efficient asymmetric schedulers for single-threaded workloads, and efficient asymmetric schedulers for single-program workloads. What we lack is a scheduler that can handle all the runtime factors affecting AMPs running multi-threaded, multi-programmed workloads.

This paper introduces the first general-purpose asymmetry-aware scheduler for multi-threaded, multi-programmed workloads. It estimates the performance of each thread on each type of core and identifies communication patterns and bottleneck threads. The scheduler then makes coordinated core-assignment and thread-selection decisions while still giving each application its fair share of processor time.
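The abstract does not spell out how COLAB combines these factors, so the Python sketch below is only an illustration of the general idea, not the paper's algorithm. It assumes a hypothetical per-thread predicted big-core speedup (`speedup`) and a hypothetical bottleneck measure (`blocking`), uses an arbitrary scoring formula, and ignores communication patterns and little-core thread selection. It ranks runnable threads for the big cores and lowers the priority of any application that has already received more than an equal share of big-core time (equal shares are a simplifying assumption; Linux CFS uses weighted shares).

```python
from dataclasses import dataclass


@dataclass
class Thread:
    app: str          # owning application
    tid: int          # thread id within the application
    speedup: float    # hypothetical predicted big-core vs. little-core speedup
    blocking: float   # hypothetical fraction of time other threads wait on it


def pick_big_core_threads(threads, n_big, big_time_used):
    """Toy policy (not COLAB): choose which threads get the n_big big cores.

    Threads score higher if they are predicted to speed up more on a big
    core or if other threads wait on them (bottlenecks). An application
    that has already consumed more than an equal share of big-core time
    has its threads' scores reduced, so big cores rotate between apps.
    The weighting is arbitrary and only meant to show the interplay.
    """
    if not threads or n_big <= 0:
        return []
    apps = {t.app for t in threads}
    total = sum(big_time_used.get(a, 0.0) for a in apps) or 1.0
    fair_share = 1.0 / len(apps)

    def score(t):
        share = big_time_used.get(t.app, 0.0) / total
        excess = max(0.0, share - fair_share)  # over-served apps lose priority
        return t.speedup * (1.0 + t.blocking) - excess

    return sorted(threads, key=score, reverse=True)[:n_big]


# Example: two applications, two threads each, two big cores.
threads = [
    Thread("A", 0, speedup=2.0, blocking=0.0),
    Thread("A", 1, speedup=1.5, blocking=0.5),  # a bottleneck thread
    Thread("B", 0, speedup=1.8, blocking=0.0),
    Thread("B", 1, speedup=1.2, blocking=0.0),
]
first = pick_big_core_threads(threads, 2, {"A": 0.0, "B": 0.0})
later = pick_big_core_threads(threads, 2, {"A": 10.0, "B": 0.0})
print([(t.app, t.tid) for t in first])  # [('A', 1), ('A', 0)]
print([(t.app, t.tid) for t in later])  # [('B', 0), ('A', 1)]
```

In this toy run, application A first takes both big cores because its bottleneck thread and its high-speedup thread score highest; once A has consumed all the accumulated big-core time, its excess share lets B's best thread displace one of A's threads.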

We evaluate our approach using the gem5 simulator on four distinct big.LITTLE configurations and 26 mixed workloads composed of PARSEC and SPLASH-2 benchmarks. Compared with the state-of-the-art Linux CFS and AMP-aware schedulers, we demonstrate performance gains of up to 25%, and of 5% to 15% on average depending on the hardware setup.

Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multicore architectures
  2. Real-time systems
    1. Real-time operating systems
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments

Recommendations

A Survey of Techniques for Architecting and Managing Asymmetric Multicore Processors

To meet the needs of a diverse range of workloads, asymmetric multicore processors (AMPs) have been proposed, which feature cores of different microarchitecture or ISAs. However, given the diversity inherent in their design and application scenarios, ...

Portable performance on asymmetric multicore processors
CGO '16: Proceedings of the 2016 International Symposium on Code Generation and Optimization

Static and dynamic power constraints are steering chip manufacturers to build single-ISA Asymmetric Multicore Processors (AMPs) with big and small cores. To deliver on their energy efficiency potential, schedulers must consider core sensitivity, load ...

A comprehensive scheduler for asymmetric multicore systems
EuroSys '10: Proceedings of the 5th European conference on Computer systems

Symmetric-ISA (instruction set architecture) asymmetric-performance multicore processors were shown to deliver higher performance per watt and area for applications with diverse architectural requirements, and so it is likely that future multicore ...

Information

Published In

CGO '20: Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization

February 2020

329 pages

ISBN:9781450370479

DOI:10.1145/3368826

General Chairs: Jason Mars (University of Michigan, USA), Lingjia Tang (University of Michigan, USA)

Program Chairs: Jingling Xue (UNSW, Australia), Peng Wu (Futurewei Technologies, USA)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 February 2020

Qualifiers

Research-article

Funding Sources

Engineering and Physical Sciences Research Council

Conference

CGO '20

CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization

February 22 - 26, 2020

San Diego, CA, USA

Acceptance Rates

Overall Acceptance Rate 312 of 1,061 submissions, 29%

Bibliometrics

Total Citations: 7
Total Downloads: 324
Downloads (last 12 months): 31
Downloads (last 6 weeks): 1

Reflects downloads up to 22 Sep 2024

Cited By

Fang J, Xu Y, Kong H, Cai M (2023). A prefetch control strategy based on improved hill-climbing method in asymmetric multi-core architecture. The Journal of Supercomputing 79:10 (10570–10588). DOI: 10.1007/s11227-023-05078-6. Online publication date: 11-Feb-2023.
https://dl.acm.org/doi/10.1007/s11227-023-05078-6

Zhai J, Jin Y, Chen W, Zheng W (2023). Graph Analysis for Scalability Analysis. In Performance Analysis of Parallel Applications for HPC (101–128). DOI: 10.1007/978-981-99-4366-1_5. Online publication date: 19-Jun-2023.
https://doi.org/10.1007/978-981-99-4366-1_5

Mahmood B, Ahmad N, Khan M, Akhunzada A (2021). Dynamic Priority Real-Time Scheduling on Power Asymmetric Multicore Processors. Symmetry 13:8 (1488). DOI: 10.3390/sym13081488. Online publication date: 13-Aug-2021.
https://doi.org/10.3390/sym13081488

Yu T, Zhong R, Janjic V, Petoumenos P, Zhai J, Leather H, Thomson J (2021). Collaborative Heterogeneity-Aware OS Scheduler for Asymmetric Multicore Processors. IEEE Transactions on Parallel and Distributed Systems 32:5 (1224–1237). DOI: 10.1109/TPDS.2020.3045279. Online publication date: 1-May-2021.
https://doi.org/10.1109/TPDS.2020.3045279

Chen P, He S, Zhang X, Chen S, Hong P, Yin Y, Sun X, Chen G (2021). CSWAP: A Self-Tuning Compression Framework for Accelerating Tensor Swapping in GPUs. 2021 IEEE International Conference on Cluster Computing (CLUSTER) (271–282). DOI: 10.1109/Cluster48925.2021.00019. Online publication date: Sep-2021.
https://doi.org/10.1109/Cluster48925.2021.00019

Moolchandani D, Kumar A, Martinez J, Sarangi S (2020). VisSched: An Auction-Based Scheduler for Vision Workloads on Heterogeneous Processors. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39:11 (4252–4265). DOI: 10.1109/TCAD.2020.3013076. Online publication date: Nov-2020.
https://doi.org/10.1109/TCAD.2020.3013076

Jin Y, Wang H, Yu T, Tang X, Hoefler T, Liu X, Zhai J (2020). SCALANA: Automating Scaling Loss Detection with Graph Analysis. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis (1–14). DOI: 10.1109/SC41405.2020.00032. Online publication date: Nov-2020.
https://doi.org/10.1109/SC41405.2020.00032
