DOI: 10.1145/3368826.3377915
research-article

COLAB: a collaborative multi-factor scheduler for asymmetric multicore processors

Published: 22 February 2020

Abstract

Increasingly prevalent asymmetric multicore processors (AMPs) are necessary for delivering performance in the era of limited power budgets and dark silicon. However, software fails to use them efficiently. OS schedulers, in particular, handle asymmetry only under restricted scenarios. We have efficient symmetric schedulers, efficient asymmetric schedulers for single-threaded workloads, and efficient asymmetric schedulers for single-program workloads. What we do not have is a scheduler that can handle all runtime factors affecting AMPs for multi-threaded multi-programmed workloads.
This paper introduces the first general purpose asymmetry-aware scheduler for multi-threaded multi-programmed workloads. It estimates the performance of each thread on each type of core and identifies communication patterns and bottleneck threads. The scheduler then makes coordinated core assignment and thread selection decisions that still provide each application its fair share of the processor's time.
We evaluate our approach using the gem5 simulator on four distinct big.LITTLE configurations and 26 mixed workloads composed of PARSEC and SPLASH-2 benchmarks. Compared to the state-of-the-art Linux CFS and AMP-aware schedulers, we demonstrate performance gains of up to 25%, and of 5% to 15% on average, depending on the hardware setup.
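To make the idea of coordinated decisions concrete, the following is a minimal sketch and not the paper's algorithm. It assumes each runnable thread already carries three hypothetical inputs: a predicted big-core speedup (`big_speedup`), a count of threads blocked on it (`waiters`), and the CPU time it has received so far (`vruntime`). The names `Thread` and `schedule` are illustrative only. Thread selection favours bottleneck threads and then the least-served threads; core assignment gives big cores to the selected threads predicted to benefit most.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    app: str
    big_speedup: float  # assumed input: predicted speedup on a big core vs. a little core
    waiters: int        # assumed input: number of threads currently blocked on this thread
    vruntime: float     # assumed input: CPU time already received (lower means less served)


def schedule(threads, n_big, n_little):
    """Return {thread name: 'big' | 'little'} for the threads chosen to run next."""
    # Thread selection: bottleneck threads first, then the least-served threads.
    ranked = sorted(threads, key=lambda t: (-t.waiters, t.vruntime))
    selected = ranked[: n_big + n_little]
    # Core assignment: big cores go to the selected threads with the highest predicted speedup.
    by_speedup = sorted(selected, key=lambda t: t.big_speedup, reverse=True)
    return {t.name: ("big" if i < n_big else "little") for i, t in enumerate(by_speedup)}


# Two applications (A and B) with two threads each, on one big and two little cores.
threads = [
    Thread("a1", "A", big_speedup=2.0, waiters=0, vruntime=10.0),
    Thread("a2", "A", big_speedup=1.2, waiters=3, vruntime=12.0),
    Thread("b1", "B", big_speedup=1.8, waiters=0, vruntime=5.0),
    Thread("b2", "B", big_speedup=1.1, waiters=0, vruntime=20.0),
]
plan = schedule(threads, n_big=1, n_little=2)
print(plan)  # {'a1': 'big', 'b1': 'little', 'a2': 'little'}
```

In this toy run the bottleneck thread `a2` is guaranteed a core, while the single big core goes to `a1`, the thread predicted to gain the most from it; `b2`, which has already received the most CPU time, waits for the next round. A real scheduler would have to derive these inputs at runtime and balance them far more carefully.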

Published In

CGO '20: Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization
February 2020
329 pages
ISBN: 9781450370479
DOI: 10.1145/3368826
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Asymmetric Multicore Processor
  2. Multi-threaded Multi-programmed Workloads
  3. OS Scheduler

Qualifiers

  • Research-article

Conference

CGO '20

Acceptance Rates

Overall Acceptance Rate 312 of 1,061 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 31
  • Downloads (last 6 weeks): 1
Reflects downloads up to 22 Sep 2024.

Cited By

  • (2023) A prefetch control strategy based on improved hill-climbing method in asymmetric multi-core architecture. The Journal of Supercomputing 79:10 (10570-10588). DOI: 10.1007/s11227-023-05078-6. Online publication date: 11-Feb-2023.
  • (2023) Graph Analysis for Scalability Analysis. Performance Analysis of Parallel Applications for HPC (101-128). DOI: 10.1007/978-981-99-4366-1_5. Online publication date: 19-Jun-2023.
  • (2021) Dynamic Priority Real-Time Scheduling on Power Asymmetric Multicore Processors. Symmetry 13:8 (1488). DOI: 10.3390/sym13081488. Online publication date: 13-Aug-2021.
  • (2021) Collaborative Heterogeneity-Aware OS Scheduler for Asymmetric Multicore Processors. IEEE Transactions on Parallel and Distributed Systems 32:5 (1224-1237). DOI: 10.1109/TPDS.2020.3045279. Online publication date: 1-May-2021.
  • (2021) CSWAP: A Self-Tuning Compression Framework for Accelerating Tensor Swapping in GPUs. 2021 IEEE International Conference on Cluster Computing (CLUSTER) (271-282). DOI: 10.1109/Cluster48925.2021.00019. Online publication date: Sep-2021.
  • (2020) VisSched: An Auction-Based Scheduler for Vision Workloads on Heterogeneous Processors. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39:11 (4252-4265). DOI: 10.1109/TCAD.2020.3013076. Online publication date: Nov-2020.
  • (2020) SCALANA: Automating Scaling Loss Detection with Graph Analysis. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis (1-14). DOI: 10.1109/SC41405.2020.00032. Online publication date: Nov-2020.
