DOI: 10.1145/3524860.3539639
short-paper

Zero-shot cost models for distributed stream processing

Published: 15 July 2022

Abstract

This paper proposes a learned cost estimation model for Distributed Stream Processing Systems (DSPS) that aims to provide accurate cost predictions for executing queries. A major premise of this work is that the learned model generalizes to the dynamics of streaming workloads out of the box: once trained, it can accurately predict performance metrics such as latency and throughput even if the characteristics of the data and workload, or the deployment of operators to hardware, change at runtime. The model can therefore be used for tasks such as optimizing the placement of operators to minimize the end-to-end latency of a streaming query or to maximize its throughput, even under varying conditions. Our evaluation on a well-known DSPS, Apache Storm, shows that the model predicts accurately for unseen workloads and queries while generalizing across real-world benchmarks.
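
As an illustration of the operator placement task mentioned in the abstract, the following minimal sketch shows how any latency predictor could drive a placement search. It is not the paper's model or method: `toy_cost_model`, its per-host costs, the hop penalty, and the operator and host names are all hypothetical stand-ins, and the exhaustive search is only practical for very small queries.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Placement = Dict[str, str]  # operator name -> host name


def toy_cost_model(placement: Placement) -> float:
    """Hypothetical stand-in for a learned cost model; returns a latency estimate in ms."""
    # Assumed per-host processing cost; a trained model would predict this instead.
    host_cost = {"edge": 5.0, "cloud": 2.0}
    # Assumed network penalty when consecutive operators sit on different hosts.
    hop_penalty = 4.0
    ops = list(placement)  # insertion order is treated as the pipeline order
    latency = sum(host_cost[placement[op]] for op in ops)
    latency += sum(
        hop_penalty for a, b in zip(ops, ops[1:]) if placement[a] != placement[b]
    )
    return latency


def best_placement(
    operators: List[str],
    hosts: List[str],
    cost_model: Callable[[Placement], float],
) -> Tuple[Placement, float]:
    """Enumerate every operator-to-host assignment and keep the cheapest one."""
    candidates = (
        dict(zip(operators, combo))
        for combo in product(hosts, repeat=len(operators))
    )
    best = min(candidates, key=cost_model)
    return best, cost_model(best)


operators = ["source", "filter", "sink"]
hosts = ["edge", "cloud"]
placement, latency = best_placement(operators, hosts, toy_cost_model)
print(placement, latency)
```

In this sketch the search only ever calls `cost_model(placement)`, so the toy function could in principle be swapped for any predictor with the same signature; minimizing latency could likewise be replaced by maximizing a predicted throughput.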

Cited By

  • (2024) StreamBed: Capacity Planning for Stream Processing. Proceedings of the 18th ACM International Conference on Distributed and Event-based Systems, pp. 90-102. DOI: 10.1145/3629104.3666034. Online publication date: 24-Jun-2024
  • (2024) ZeroTune: Learned Zero-Shot Cost Models for Parallelism Tuning in Stream Processing. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 2040-2053. DOI: 10.1109/ICDE60146.2024.00163. Online publication date: 13-May-2024

    Published In

    DEBS '22: Proceedings of the 16th ACM International Conference on Distributed and Event-Based Systems
    June 2022
    210 pages
    ISBN:9781450393089
    DOI:10.1145/3524860

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cost models
    2. stream processing
    3. zero-shot learning

    Qualifiers

    • Short-paper

    Funding Sources

    • German Research Foundation (DFG) Collaborative Research Center (CRC) 1053 MAKI
    • hessian.AI
    • NHR4CES

    Conference

    DEBS '22

    Acceptance Rates

    DEBS '22 Paper Acceptance Rate 10 of 19 submissions, 53%;
    Overall Acceptance Rate 145 of 583 submissions, 25%
