DOI: 10.1145/2188286.2188347
research-article

Hirundo: a mechanism for automated production of optimized data stream graphs

Published: 22 April 2012

Abstract

Stream programs must be crafted carefully to maximize the performance gains that stream processing environments offer. Manually fine-tuning a stream program is a difficult process that requires a considerable amount of programmer time and expertise. In this paper we present Hirundo, a mechanism for automatically generating optimized stream programs that are tailored to the environment in which they run. Hirundo analyzes a stream program, identifies its structure, and transforms it into many different sample programs with the same semantics using the notions of Tri-Operator Transformation, Transformer Blocks, and Operator Blocks Fusion. It then uses empirical optimization information to identify a small subset of the generated sample programs that could deliver high performance, and runs the selected sample programs in the run-time environment for a short period of time to obtain their performance information. Hirundo uses this information to output a ranked list of optimized stream programs tailored to a particular run-time environment. Hirundo has been developed in Python as a prototype application for optimizing SPADE programs, which run on the System S stream processing run-time. Using three real-world example stream processing applications, we demonstrate the effectiveness of our approach and discuss how well it generalizes for automatic stream program performance optimization.
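The select, run, and rank stages described in the abstract can be pictured with a minimal Python sketch. This is not Hirundo's implementation: the names `SampleProgram`, `select_candidates`, and `rank_by_trial`, the `predicted_score` field, and all numbers are hypothetical, and the transformation step that generates the sample programs is left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class SampleProgram:
    """Hypothetical stand-in for one generated, semantically equivalent variant."""
    name: str
    predicted_score: float  # assumed estimate derived from prior empirical data


def select_candidates(samples: List[SampleProgram], k: int) -> List[SampleProgram]:
    """Keep the k samples with the highest predicted score."""
    return sorted(samples, key=lambda s: s.predicted_score, reverse=True)[:k]


def rank_by_trial(
    candidates: List[SampleProgram],
    measure: Callable[[SampleProgram], float],
) -> List[Tuple[SampleProgram, float]]:
    """Measure each candidate via `measure` and rank by the measured value."""
    results = [(c, measure(c)) for c in candidates]
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    samples = [
        SampleProgram("original", 1.0),
        SampleProgram("variant_a", 1.8),
        SampleProgram("variant_b", 0.6),
        SampleProgram("variant_c", 1.5),
    ]
    # Made-up throughput values standing in for a short trial run
    # in the target run-time environment.
    fake_throughput = {"original": 100.0, "variant_a": 140.0, "variant_c": 165.0}

    shortlist = select_candidates(samples, k=3)
    ranking = rank_by_trial(shortlist, lambda s: fake_throughput[s.name])
    for program, throughput in ranking:
        print(program.name, throughput)
```

In this sketch the predicted score only decides which variants are worth a trial run; the final order comes from the measured values, which is why a variant with a lower prediction can still end up ranked first.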

Published In

ICPE '12: Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering
April 2012
362 pages
ISBN:9781450312028
DOI:10.1145/2188286
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data-intensive computing
  2. fault tolerance
  3. performance optimization
  4. scalability
  5. stream processing

Qualifiers

  • Research-article

Conference

ICPE'12
Acceptance Rates

Overall Acceptance Rate 252 of 851 submissions, 30%

Cited By

  • (2021) Resource aware scheduler for distributed stream processing in cloud native environments. Concurrency and Computation: Practice and Experience 33:20. DOI: 10.1002/cpe.6373. Online publication date: 20-May-2021
  • (2020) Multiple stream job performance optimization with source operator graph transformations. Concurrency and Computation: Practice and Experience 32:16. DOI: 10.1002/cpe.5658. Online publication date: 6-Jan-2020
  • (2019) Latency-Aware Secure Elastic Stream Processing with Homomorphic Encryption. Data Science and Engineering. DOI: 10.1007/s41019-019-00100-5. Online publication date: 12-Sep-2019
  • (2018) Recent Advancements in Event Processing. ACM Computing Surveys 51:2 (1-36). DOI: 10.1145/3170432. Online publication date: 13-Feb-2018
  • (2018) Automatic optimization of stream programs via source program operator graph transformations. Distributed and Parallel Databases 31:4 (543-599). DOI: 10.1007/s10619-013-7130-x. Online publication date: 27-Dec-2018
  • (2013) A Mechanism for Stream Program Performance Recovery in Resource Limited Compute Clusters. Database Systems for Advanced Applications (164-178). DOI: 10.1007/978-3-642-37450-0_12. Online publication date: 2013
