research-article

PANORAMA

Authors:

Christopher Carothers,

Anirban Mandal,

Jeffrey S Vetter,

Claris Castillo,

Jeremy Meredith,

Thomas Proffen,

Rafael Ferreira da Silva

International Journal of High Performance Computing Applications, Volume 31, Issue 1

Pages 4 - 18

https://doi.org/10.1177/1094342015594515

Published: 01 January 2017

Abstract

Computational science is well established as the third pillar of scientific discovery and is on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and process the ensuing extreme-scale amounts of data produced by both experiments and computations alike, the complexity of managing the compute and data analysis tasks has grown beyond the capabilities of domain scientists. Thus, workflow management systems are absolutely necessary to ensure current and future scientific discoveries. A key research question for these workflow management systems concerns the performance optimization of complex calculation and data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.


Published In

International Journal of High Performance Computing Applications Volume 31, Issue 1

1 2017

111 pages

ISSN:1094-3420

Copyright © The Authors 2015.

Publisher

Sage Publications, Inc.

United States
