research-article

Runtime power limiting of parallel applications on Intel Xeon Phi processors

Authors:

Vaibhav Sundriyal, Masha Sosonkina, Yuzhong Shen

E2SC '16: Proceedings of the 4th International Workshop on Energy Efficient Supercomputing

Pages 39-45

Published: 13 November 2016

Abstract

Energy-efficient computing is crucial to achieving exascale performance. Power capping and dynamic voltage/frequency scaling may be used to achieve energy savings. The Intel Xeon Phi implements a power capping strategy in which power thresholds are used to set voltage and frequency dynamically at runtime. By default, these power limits are much higher than most applications ever reach. Hence, this work aims to set the power limits according to workload characteristics and application performance. Certain models, originally developed for CPU performance and power, have been adapted here to determine power-limit thresholds on the Xeon Phi. Next, a procedure to select these thresholds dynamically is proposed, and its limitations are outlined. When this runtime procedure and static power-threshold assignment were compared with the default execution, energy savings ranging from 5% to 49% were observed, mostly for memory-intensive applications.
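The abstract does not state which interface was used to change the power thresholds. As a hedged illustration only, the sketch below shows how a static power limit might be assigned, and the energy of a capped run compared against a default run, on a self-booted Xeon Phi (Knights Landing) running Linux with RAPL exposed through the kernel powercap framework. The zone path `/sys/class/powercap/intel-rapl:0`, the use of constraint 0, and all helper names are assumptions for illustration, not the paper's procedure; writing a limit on a real system requires root.

```python
from pathlib import Path
from typing import Callable

# Assumption: first package-level RAPL zone exposed by the Linux powercap
# framework. Verify by reading its "name" file (expected "package-0").
DEFAULT_ZONE = Path("/sys/class/powercap/intel-rapl:0")

MICRO = 1_000_000  # powercap reports power in microwatts and energy in microjoules


def read_int(path: Path) -> int:
    """Read one integer from a sysfs attribute file."""
    return int(path.read_text().strip())


def get_power_limit_watts(zone: Path, constraint: int = 0) -> float:
    """Return the current limit of a constraint (0 is usually "long_term")."""
    return read_int(zone / f"constraint_{constraint}_power_limit_uw") / MICRO


def set_power_limit_watts(zone: Path, watts: float, constraint: int = 0) -> None:
    """Statically assign a power threshold; requires root on a real system."""
    limit_uw = int(round(watts * MICRO))
    (zone / f"constraint_{constraint}_power_limit_uw").write_text(f"{limit_uw}\n")


def energy_delta_joules(start_uj: int, end_uj: int, max_range_uj: int) -> float:
    """Energy between two energy_uj readings, tolerating a single counter wrap."""
    delta = end_uj - start_uj
    if delta < 0:  # counter wrapped around max_energy_range_uj
        delta += max_range_uj
    return delta / MICRO


def measure_energy(zone: Path, workload: Callable[[], None]) -> float:
    """Run workload() and return the zone's energy consumption in joules."""
    max_range = read_int(zone / "max_energy_range_uj")
    start = read_int(zone / "energy_uj")
    workload()
    end = read_int(zone / "energy_uj")
    return energy_delta_joules(start, end, max_range)


def energy_savings_percent(default_joules: float, capped_joules: float) -> float:
    """Relative energy saved by a capped run versus the default run."""
    return 100.0 * (default_joules - capped_joules) / default_joules
```

For example, as root one might call `set_power_limit_watts(DEFAULT_ZONE, 150)` before launching a memory-intensive run wrapped in `measure_energy`, repeat the run under the default limit, and pass both totals to `energy_savings_percent`. The 150 W value is arbitrary. The wrap handling assumes the counter wraps at most once per measurement, so long runs would need periodic sampling of `energy_uj`.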


Cited By

J. Wang, X. He, G. Lawson, M. Sosonkina, T. Ezer, and Y. Shen. Applying EMD/HHT analysis to power traces of applications executed on systems with Intel Xeon Phi. International Journal of High Performance Computing Applications, 34(2):187--198, 2020. doi:10.1177/1094342017731612. Online publication date: 17-Jun-2020.
https://dl.acm.org/doi/10.1177/1094342017731612

G. Lawson, M. Sosonkina, T. Ezer, and Y. Shen. Empirical Mode Decomposition for Modeling of Parallel Applications on Intel Xeon Phi Processors. In Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pages 1000--1008, 2017. doi:10.1109/CCGRID.2017.99. Online publication date: 14-May-2017.
https://dl.acm.org/doi/10.1109/CCGRID.2017.99


Published In

E2SC '16: Proceedings of the 4th International Workshop on Energy Efficient Supercomputing

November 2016

91 pages

ISBN: 9781509038565

General Chairs: Kirk Cameron (Virginia Tech), Adolfy Hoisie (PNNL), Darren Kerbyson (PNNL), David Lowenthal (ASU), Dimitrios S. Nikolopoulos (Queen's University of Belfast, UK), Sudha Yalamanchili (Georgia Institute of Technology)

Program Chairs: Kevin Barker (PNNL), Rong Ge (Clemson University)

Sponsors

SIGHPC: ACM Special Interest Group on High Performance Computing
IEEE-CS\DATC: IEEE Computer Society

In-Cooperation

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

IEEE Press


Conference

SC16: The International Conference for High Performance Computing, Networking, Storage and Analysis

November 13-18, 2016

Salt Lake City, Utah

Acceptance Rates

Overall Acceptance Rate 17 of 33 submissions, 52%
