DOI: 10.1145/1362622.1362686

A genetic algorithms approach to modeling the performance of memory-bound computations

Published: 10 November 2007

Abstract

Benchmarks that measure memory bandwidth, such as STREAM, Apex-MAPS, and MultiMAPS, are increasingly popular because the von Neumann bottleneck of modern processors causes many calculations to be memory-bound. We present a scheme for predicting the performance of HPC applications from the results of such benchmarks. A genetic algorithm is used to "learn" bandwidth as a function of cache hit rates for each machine, with MultiMAPS as the fitness test. The results comprise 56 individual performance predictions, covering 3 full-scale parallel applications run on 5 different modern HPC architectures with various CPU counts and inputs; the predictions fall within 10% average difference of independently verified runtimes.
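
The idea of learning bandwidth as a function of cache hit rates can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration, since the abstract does not give the paper's chromosome encoding, bandwidth function, or genetic operators: the three-parameter harmonic model in `predict`, the made-up level bandwidths in `TRUE_BW`, the synthetic `SAMPLES` that stand in for MultiMAPS measurements, and the tournament, blend-crossover, and mutation choices in `evolve`.

```python
import random

# Assumed per-level bandwidths (GB/s), used only to synthesize sample data.
TRUE_BW = (40.0, 15.0, 3.0)  # L1, L2, main memory: made-up values

def predict(params, h1, h2):
    """Effective bandwidth for L1 hit rate h1 and L2 hit rate h2 (of L1 misses)."""
    b1, b2, bm = params
    # Fraction of references served by each level of the hierarchy.
    f1, f2, fm = h1, (1 - h1) * h2, (1 - h1) * (1 - h2)
    # Time per byte adds across levels, so bandwidths combine harmonically.
    return 1.0 / (f1 / b1 + f2 / b2 + fm / bm)

# Synthetic stand-in for benchmark measurements: (h1, h2, bandwidth).
SAMPLES = [(h1, h2, predict(TRUE_BW, h1, h2))
           for h1 in (0.5, 0.7, 0.9, 0.99)
           for h2 in (0.2, 0.5, 0.8, 0.95)]

def error(params):
    """Mean relative error against the samples (lower means fitter)."""
    return sum(abs(predict(params, h1, h2) - bw) / bw
               for h1, h2, bw in SAMPLES) / len(SAMPLES)

def evolve(pop_size=60, generations=80, seed=0, lo=0.5, hi=100.0):
    rng = random.Random(seed)
    pop = [tuple(rng.uniform(lo, hi) for _ in range(3)) for _ in range(pop_size)]
    history = []  # best error seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=error)
        history.append(error(pop[0]))
        nxt = [pop[0]]  # elitism: the best individual survives unchanged
        while len(nxt) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(pop, 3), key=error)
            p2 = min(rng.sample(pop, 3), key=error)
            child = []
            for a, b in zip(p1, p2):
                w = rng.random()
                g = w * a + (1 - w) * b          # blend crossover
                if rng.random() < 0.2:
                    g *= 1 + rng.gauss(0, 0.1)   # multiplicative mutation
                child.append(min(hi, max(lo, g)))  # keep genes in [lo, hi]
            nxt.append(tuple(child))
        pop = nxt
    best = min(pop, key=error)
    history.append(error(best))
    return best, history

best, history = evolve()
print("best (b1, b2, bm):", tuple(round(b, 2) for b in best))
print("mean relative error:", round(history[-1], 4))
```

Because the best individual is carried over unchanged each generation, the recorded error never increases. With real benchmark data, `SAMPLES` would be replaced by measured hit rates and bandwidths for one machine, and the evolved parameters would then be specific to that machine.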

Published In

SC '07: Proceedings of the 2007 ACM/IEEE conference on Supercomputing
November 2007
723 pages
ISBN: 9781595937643
DOI: 10.1145/1362622
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cache bandwidth
  2. genetic algorithms
  3. machine learning
  4. memory bound applications
  5. performance modeling and prediction

Qualifiers

  • Research-article

Acceptance Rates

SC '07 Paper Acceptance Rate 54 of 268 submissions, 20%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
