research-article

PANORAMA

Published: 01 January 2017
  Abstract

    Computational science is well established as the third pillar of scientific discovery, on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and to process the extreme-scale volumes of data produced by both experiments and computations, the complexity of managing compute and data analysis tasks has grown beyond the capabilities of domain scientists. Workflow management systems are therefore essential to current and future scientific discovery. A key research question for these systems is how to optimize the performance of complex calculation and data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into Pegasus, a comprehensive workflow framework, for understanding and improving the overall performance of complex scientific workflows.
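    As a rough illustration of what analytical modeling of workflow run time can involve, the sketch below estimates the makespan of a small workflow DAG by propagating earliest finish times along its dependencies and charging each edge a transfer time equal to data size divided by bandwidth. This is not the PANORAMA model. The function name `estimate_makespan`, the uniform-bandwidth and unlimited-resources assumptions, and the example task names, runtimes, and data sizes are all hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def estimate_makespan(runtimes, edges, bandwidth):
    """Critical-path estimate of a workflow's makespan (hypothetical sketch).

    runtimes:  task name -> compute time in seconds
    edges:     (parent, child) -> bytes sent from parent to child
    bandwidth: bytes per second, assumed identical for every transfer
    Assumes unlimited resources: a task starts as soon as all inputs arrive.
    """
    # Build the predecessor map that TopologicalSorter expects.
    preds = {task: set() for task in runtimes}
    for parent, child in edges:
        preds[child].add(parent)

    finish = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(preds).static_order():
        # A task is ready when the slowest parent has finished and its data arrived.
        ready = max(
            (finish[p] + edges[(p, task)] / bandwidth for p in preds[task]),
            default=0.0,
        )
        finish[task] = ready + runtimes[task]
    return max(finish.values()), finish


# Hypothetical diamond-shaped workflow: stage in, two simulations, one analysis.
runtimes = {"stage_in": 10.0, "sim_a": 100.0, "sim_b": 60.0, "analyze": 20.0}
edges = {
    ("stage_in", "sim_a"): 1e9,
    ("stage_in", "sim_b"): 1e9,
    ("sim_a", "analyze"): 5e8,
    ("sim_b", "analyze"): 2e9,
}
makespan, finish = estimate_makespan(runtimes, edges, bandwidth=1e8)
print(makespan)  # 145.0
```

    With these inputs, the data for `analyze` arrives at 125 s via `sim_a` and at 100 s via `sim_b`, so `analyze` finishes at 145 s and the critical path runs through `stage_in`, `sim_a`, and `analyze`. Comparing such predictions with observed task timings is one simple way to flag where a run deviates from expectation.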


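    Cited By

    • (2023) Graph neural networks for detecting anomalies in scientific workflows. International Journal of High Performance Computing Applications 37(3-4): 394-411. DOI: 10.1177/10943420231172140. Online publication date: 1 July 2023.
    • (2021) Design and Evaluation of a Simple Data Interface for Efficient Data Transfer across Diverse Storage. ACM Transactions on Modeling and Performance Evaluation of Computing Systems 6(1): 1-25. DOI: 10.1145/3452007. Online publication date: 29 May 2021.
    • (2021) Satori. Proceedings of the 48th Annual International Symposium on Computer Architecture, pp. 292-305. DOI: 10.1109/ISCA52012.2021.00031. Online publication date: 14 June 2021.
    • (2019) The role of machine learning in scientific workflows. International Journal of High Performance Computing Applications 33(6): 1128-1139. DOI: 10.1177/1094342019852127. Online publication date: 1 November 2019.
    • (2018) A Manifesto for Future Generation Cloud Computing. ACM Computing Surveys 51(5): 1-38. DOI: 10.1145/3241737. Online publication date: 19 November 2018.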


    Published In

    International Journal of High Performance Computing Applications, Volume 31, Issue 1
    January 2017
    111 pages

    Publisher

    Sage Publications, Inc., United States


    Author Tags

    1. Performance modeling
    2. Extreme scale
    3. Scientific workflow

