research-article

Open access

Performance optimality or reproducibility: that is the question

Authors:

Jayaraman J. Thiagarajan,

Tanzima Z. Islam

SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

Article No.: 77, Pages 1 - 30

https://doi.org/10.1145/3295500.3356217

Published: 17 November 2019

Abstract

The era of extremely heterogeneous supercomputing brings with it the devil of increased performance variation and reduced reproducibility. The HPC community lacks an understanding of how the simultaneous consideration of network traffic, power limits, concurrency tuning, and interference from other jobs impacts application performance.

In this paper, we design a methodology that allows both HPC users and system administrators to understand the trade-off space between optimal and reproducible performance. We present a first-of-its-kind dataset that simultaneously varies multiple system- and user-level parameters on a production cluster, and introduce a new metric, called the desirability score, which enables comparison across different system configurations. We develop a novel, model-agnostic machine learning methodology based on graph signal theory for comparing the influence of parameters on application predictability and, using a new visualization technique, make practical suggestions on best practices for multi-objective HPC environments.
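
The abstract does not define the graph-signal methodology. Purely as an illustration of the general idea, and not the authors' algorithm, the sketch below treats each system configuration as a graph node, connects configurations that differ in exactly one parameter, and measures how strongly a per-configuration signal changes along the edges of each parameter. The function name `parameter_variation`, the two parameters, and the toy numbers are all hypothetical.

```python
from itertools import combinations


def parameter_variation(configs, signal):
    """Per-parameter variation of `signal` over a configuration graph.

    configs: list of equal-length tuples (one value per parameter).
    signal:  list of floats, one per configuration.
    Two configurations share an edge when they differ in exactly one
    parameter; the edge is attributed to that parameter.
    Returns {parameter index: mean squared signal difference on its edges}.
    """
    n_params = len(configs[0])
    sums = [0.0] * n_params
    counts = [0] * n_params
    for i, j in combinations(range(len(configs)), 2):
        differing = [k for k in range(n_params) if configs[i][k] != configs[j][k]]
        if len(differing) == 1:
            k = differing[0]
            sums[k] += (signal[i] - signal[j]) ** 2
            counts[k] += 1
    return {k: sums[k] / counts[k] for k in range(n_params) if counts[k]}


# Hypothetical toy data: (power cap in W, thread count) -> run-to-run spread.
configs = [(50, 8), (50, 16), (90, 8), (90, 16)]
spread = [0.30, 0.32, 0.10, 0.12]
scores = parameter_variation(configs, spread)
```

In this toy data the signal changes far more along power-cap edges (index 0) than along thread-count edges (index 1), so under this simple measure the power cap would be flagged as the more influential parameter.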

Index Terms

Performance optimality or reproducibility: that is the question
1. Computing methodologies
2. General and reference
  1. Cross-computing tools and techniques
    1. Performance

Published In

SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

November 2019

1921 pages

ISBN:9781450362290

DOI:10.1145/3295500

General Chair: Michela Taufer

Program Chairs: Pavan Balaji, Antonio J. Peña

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGHPC: ACM Special Interest Group on High Performance Computing

In-Cooperation

IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SC '19

Sponsor:

SIGHPC

SC '19: The International Conference for High Performance Computing, Networking, Storage, and Analysis

November 17 - 19, 2019

Denver, Colorado

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
