Scheduling Task-parallel Applications in Dynamically Asymmetric Environments

Authors:

Pirah Noor Soomro,

Mustafa Abduljabbar,

Madhavan Manivannan,

Miquel Pericas

ICPP Workshops '20: Workshop Proceedings of the 49th International Conference on Parallel Processing

Article No.: 18, Pages 1 - 10

https://doi.org/10.1145/3409390.3409408

Published: 17 August 2020

Abstract

Shared resource interference is observed by applications as dynamic performance asymmetry. Prior art has developed approaches to reduce the impact of performance asymmetry mainly at the operating system and architectural levels. In this work, we study how application-level scheduling techniques can leverage moldability (i.e., the flexibility to run as either a single-threaded or a multithreaded task) and explicit knowledge of task criticality to handle scenarios in which system performance is not only unknown but also changing over time. Our proposed task scheduler dynamically learns the performance characteristics of the underlying platform and uses this knowledge to devise better schedules that are aware of dynamic performance asymmetry, hence reducing the impact of interference. Our evaluation shows that both criticality-aware scheduling and parallelism tuning are effective schemes to address interference in both shared and distributed memory applications.
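
The idea of learning platform performance online and using it to place moldable tasks can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `PerfModel`, the exponential-moving-average update, the smoothing factor `alpha`, the `(leader_core, width)` placement encoding, and the two selection rules are all assumptions introduced here for illustration only.

```python
from collections import defaultdict


class PerfModel:
    """Online estimate of execution time per (leader_core, width) placement.

    Hypothetical helper: the names and the update rule are illustrative
    assumptions, not taken from the paper.
    """

    def __init__(self, placements, alpha=0.5):
        self.placements = list(placements)
        self.alpha = alpha  # weight given to the newest measurement
        # task_type -> placement -> smoothed execution time (seconds)
        self.table = defaultdict(dict)

    def record(self, task_type, placement, elapsed):
        """Fold a measured execution time into the running estimate."""
        old = self.table[task_type].get(placement)
        if old is None:
            self.table[task_type][placement] = elapsed
        else:
            self.table[task_type][placement] = (
                self.alpha * elapsed + (1 - self.alpha) * old
            )

    def choose(self, task_type, critical):
        """Pick a placement for the next task of this type."""
        known = self.table[task_type]
        # Try every placement once before trusting the estimates.
        for p in self.placements:
            if p not in known:
                return p
        if critical:
            # Assumed rule for critical tasks: lowest predicted latency.
            return min(self.placements, key=lambda p: known[p])
        # Assumed rule for other tasks: lowest predicted core-time
        # (execution time multiplied by the number of cores used).
        return min(self.placements, key=lambda p: known[p] * p[1])
```

Because each new measurement replaces part of the old estimate, a core that becomes slower under interference gradually loses its ranking, and later tasks are steered to other cores or other widths without the scheduler knowing the cause of the slowdown.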

Cited By

Chen J, Manivannan M, Abduljabbar M, Pericàs M (2022). ERASE: Energy Efficient Task Mapping and Resource Management for Work Stealing Runtimes. ACM Transactions on Architecture and Code Optimization 19:2 (1-29). DOI: 10.1145/3510422. Online publication date: 7-Mar-2022
https://dl.acm.org/doi/10.1145/3510422

Published In

ICPP Workshops '20: Workshop Proceedings of the 49th International Conference on Parallel Processing

August 2020

186 pages

ISBN:9781450388689

DOI:10.1145/3409390

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Horizon 2020

Conference

ICPP Workshops '20: Workshops

August 17 - 20, 2020

Edmonton, AB, Canada

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
