DOI: 10.1145/2567634.2567637
research-article

Active workflow system for near real-time extreme-scale science

Published: 16 February 2014

Abstract

In recent years, streaming-based data processing has gained substantial traction for handling the overwhelming volumes of data generated by real-time applications, from both enterprise sources and scientific computing. In this work, however, we look at an emerging class of scientific data with Near Real-Time (NRT) requirements, in which data is typically generated in bursts and the near real-time constraints apply primarily between bursts rather than within a stream. A key challenge for these types of data sources is that the processing time per data element is not uniform and not always feasible to predict. Given the increasing unpredictability of compute load and system dynamics, this work adapts a streaming-based approach to this new class of large experiments and simulations, which have complex run-time control and analysis issues.
In particular, we employ a novel two-tier scheme for handling the increasing unpredictability of runtime behaviors: instead of relying solely on deciding what and where to run the scientific workflows beforehand, or partially dynamically, the decision is also refined adaptively online according to the system's runtime status. This is enabled by embedding the workflow within the data streams. Specifically, we break the data outputs generated by experiments or simulations into multiple self-describing "chunks", which we call active data objects. As a result, if a transient hotspot is observed, a data object with an unfinished workflow pipeline can break its previous schedule and search for the least loaded location to continue its execution. Our preliminary experimental results based on synthetic workloads show that the proposed active workflow system is a very promising solution: it outperforms state-of-the-art semi-dynamic workflow schedulers with improved workflow completion time, and it scales well.
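The active-data-object mechanism described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `ActiveDataObject`, the function `run_active_object`, the per-node load numbers, `hotspot_threshold`, and `load_per_stage` are all hypothetical. The precomputed `schedule` stands in for the first tier (the placement decided beforehand or semi-dynamically), and the per-stage hotspot check stands in for the second tier (the online adaptation). Execution is simulated locally; the sketch only records which node each stage would run on.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical: a stage transforms the chunk's payload.
Stage = Callable[[List[float]], List[float]]


@dataclass
class ActiveDataObject:
    """Hypothetical self-describing chunk that carries its own remaining workflow."""
    chunk_id: int
    payload: List[float]
    pipeline: List[str]   # names of the stages still to run, in order
    schedule: List[str]   # planned node for each remaining stage (tier 1)
    history: List[str] = field(default_factory=list)  # "stage@node" actually used


def run_active_object(obj: ActiveDataObject,
                      stages: Dict[str, Stage],
                      load: Dict[str, float],
                      hotspot_threshold: float = 0.8,
                      load_per_stage: float = 0.1) -> ActiveDataObject:
    """Run the object's remaining stages, re-routing around hotspots (tier 2).

    Before each stage, the object checks the load of its planned node. If that
    node is above hotspot_threshold, the object breaks its schedule and picks
    the currently least-loaded node instead.
    """
    if len(obj.pipeline) != len(obj.schedule):
        raise ValueError("pipeline and schedule must have the same length")
    while obj.pipeline:
        stage_name = obj.pipeline.pop(0)
        node = obj.schedule.pop(0)
        if load[node] > hotspot_threshold:
            node = min(load, key=load.get)  # least-loaded node right now
        obj.payload = stages[stage_name](obj.payload)  # simulated locally
        load[node] += load_per_stage  # crude model of the work just placed there
        obj.history.append(f"{stage_name}@{node}")
    return obj


if __name__ == "__main__":
    stages = {
        "filter": lambda xs: [x for x in xs if x >= 0],
        "scale": lambda xs: [2 * x for x in xs],
    }
    load = {"n0": 0.95, "n1": 0.30, "n2": 0.50}  # n0 is a transient hotspot
    obj = ActiveDataObject(0, [1.0, -2.0, 3.0], ["filter", "scale"], ["n0", "n2"])
    run_active_object(obj, stages, load)
    print(obj.history)  # ['filter@n1', 'scale@n2']
    print(obj.payload)  # [2.0, 6.0]
```

Because the object carries both its remaining pipeline and its planned placement, whichever node holds it can continue the workflow without consulting a central scheduler. A real system would also need a load-monitoring service and actual data movement between nodes, which this sketch omits.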


Cited By

  • (2018) A lightweight model for right-sizing master-worker applications. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pp. 1-13. DOI: 10.5555/3291656.3291708; 10.1109/SC.2018.00042. Online publication date: 11-Nov-2018
  • (2015) Co-sites. Proceedings of the 10th Workshop on Workflows in Support of Large-Scale Science, pp. 1-11. DOI: 10.1145/2822332.2822337. Online publication date: 15-Nov-2015


Published In

PPAA '14: Proceedings of the first workshop on Parallel programming for analytics applications
February 2014
72 pages
ISBN: 9781450326544
DOI: 10.1145/2567634
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. distributed workflow scheduler
  2. load balancing
  3. near real-time science
  4. scientific workflow system
  5. stream processing
  6. system dynamics

Conference

PPoPP '14
Acceptance Rates

PPAA '14 Paper Acceptance Rate: 6 of 7 submissions, 86%
Overall Acceptance Rate: 6 of 7 submissions, 86%

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 03 Feb 2025
