DOI: 10.1145/3409390.3409408
research-article

Scheduling Task-parallel Applications in Dynamically Asymmetric Environments

Published: 17 August 2020 Publication History

Abstract

Shared resource interference is observed by applications as dynamic performance asymmetry. Prior art has developed approaches to reduce the impact of performance asymmetry, mainly at the operating system and architectural levels. In this work, we study how application-level scheduling techniques can leverage moldability (i.e., the flexibility of a task to run either single-threaded or multithreaded) and explicit knowledge of task criticality to handle scenarios in which system performance is not only unknown but also changing over time. Our proposed task scheduler dynamically learns the performance characteristics of the underlying platform and uses this knowledge to devise better schedules that are aware of dynamic performance asymmetry, hence reducing the impact of interference. Our evaluation shows that both criticality-aware scheduling and parallelism tuning are effective schemes to address interference in both shared and distributed memory applications.
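
The idea of a scheduler that learns platform performance online and uses it to place critical, moldable tasks can be illustrated with a minimal sketch. This is not the authors' implementation: the class `PerfTable`, the function `choose_place`, the exponential moving average, the smoothing factor `alpha`, and the policy of keeping non-critical tasks single-threaded on the current core are all assumptions made for illustration.

```python
class PerfTable:
    """Hypothetical table of observed task times per execution place.

    A place is (leader_core, width): the task runs on `width` consecutive
    cores starting at `leader_core`. Width 1 means single-threaded.
    """

    def __init__(self, num_cores, widths=(1, 2), alpha=0.5):
        self.alpha = alpha  # assumed smoothing factor for the moving average
        # Enumerate places whose leader core is aligned to the width.
        self.places = [(core, w) for w in widths
                       for core in range(0, num_cores - w + 1, w)]
        # None marks a place that has not been measured yet.
        self.avg = {p: None for p in self.places}

    def update(self, place, elapsed):
        """Fold a newly measured execution time into the running average."""
        old = self.avg[place]
        if old is None:
            self.avg[place] = elapsed
        else:
            self.avg[place] = (1 - self.alpha) * old + self.alpha * elapsed

    def predicted(self, place):
        """Predicted time; unmeasured places return 0.0 so they get tried."""
        t = self.avg[place]
        return 0.0 if t is None else t

    def best_place(self):
        """Place with the lowest predicted execution time."""
        return min(self.places, key=self.predicted)


def choose_place(table, is_critical, current_core):
    """Assumed policy: critical tasks go to the fastest known place,
    non-critical tasks stay single-threaded on the core that dequeued them."""
    if is_critical:
        return table.best_place()
    return (current_core, 1)
```

In this sketch, when interference slows a place down, its moving average rises after a few measurements, and critical tasks are steered to a different core or a different width without any prior knowledge of the platform.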


Cited By

  • ERASE: Energy Efficient Task Mapping and Resource Management for Work Stealing Runtimes. ACM Transactions on Architecture and Code Optimization 19:2 (1-29), 2022. DOI: 10.1145/3510422. Online publication date: 7-Mar-2022

Published In

ICPP Workshops '20: Workshop Proceedings of the 49th International Conference on Parallel Processing
August 2020
186 pages
ISBN:9781450388689
DOI:10.1145/3409390

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Asymmetry
  2. Interference awareness
  3. Task scheduling

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP Workshops '20
ICPP Workshops '20: Workshops
August 17 - 20, 2020
Edmonton, AB, Canada

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
