DOI: 10.5555/3018076.3018082
research-article

Runtime power limiting of parallel applications on Intel Xeon Phi processors

Published: 13 November 2016

Abstract

Energy-efficient computing is crucial to achieving exascale performance. Power capping and dynamic voltage/frequency scaling (DVFS) may be used to achieve energy savings. The Intel Xeon Phi implements a power-capping strategy in which power thresholds are used to set voltage and frequency dynamically at runtime. By default, these power limits are much higher than the power most applications would draw. Hence, this work aims to set the power limits according to workload characteristics and application performance. Certain models, originally developed for CPU performance and power, are adapted here to determine power-limit thresholds on the Xeon Phi. Next, a procedure to select these thresholds dynamically is proposed and its limitations are outlined. When this runtime procedure and static power-threshold assignment were compared with the default execution, energy savings ranging from 5% to 49% were observed, mostly for memory-intensive applications.
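
The energy-savings figures in the abstract compare a power-limited run against the default execution. As a rough illustration of how such a percentage can be computed, the sketch below assumes energy is approximated as average power multiplied by runtime. The function names and all numbers are hypothetical; they are not the paper's models or measurements.

```python
def energy_joules(avg_power_watts: float, runtime_seconds: float) -> float:
    """Approximate energy as constant average power times runtime."""
    return avg_power_watts * runtime_seconds


def energy_savings_percent(baseline_joules: float, capped_joules: float) -> float:
    """Relative energy saved by the power-limited run versus the default run."""
    if baseline_joules <= 0:
        raise ValueError("baseline energy must be positive")
    return 100.0 * (baseline_joules - capped_joules) / baseline_joules


# Hypothetical values for illustration only (not taken from the paper):
# the capped run draws less power but takes slightly longer to finish.
default_run = energy_joules(avg_power_watts=200.0, runtime_seconds=100.0)
capped_run = energy_joules(avg_power_watts=150.0, runtime_seconds=110.0)
savings = energy_savings_percent(default_run, capped_run)
```

With these made-up inputs the capped run still uses less energy overall, because the drop in power outweighs the longer runtime. That trade-off is why a lower power limit can plausibly pay off for memory-intensive workloads, whose runtime depends less on core frequency.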

Cited By

  • (2020) Applying EMD/HHT analysis to power traces of applications executed on systems with Intel Xeon Phi. International Journal of High Performance Computing Applications, 34:2 (187-198). DOI: 10.1177/1094342017731612. Online publication date: 17-Jun-2020
  • (2017) Empirical Mode Decomposition for Modeling of Parallel Applications on Intel Xeon Phi Processors. Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (1000-1008). DOI: 10.1109/CCGRID.2017.99. Online publication date: 14-May-2017

Published In

E2SC '16: Proceedings of the 4th International Workshop on Energy Efficient Supercomputing
November 2016
91 pages
ISBN:9781509038565

Publisher

IEEE Press

Author Tags

  1. CoMD
  2. DVFS
  3. GAMESS
  4. Knights Landing
  5. NAS benchmarks
  6. energy savings
  7. intel xeon phi
  8. power limiting

Qualifiers

  • Research-article

Conference

SC16

Acceptance Rates

Overall Acceptance Rate 17 of 33 submissions, 52%
