Research article · Open access · DOI: 10.1145/3629104.3666034

StreamBed: Capacity Planning for Stream Processing

Published: 22 July 2024

Abstract

We present StreamBed, a capacity planning system for stream processing. StreamBed predicts, ahead of any production deployment, the resources that a query will require to process an incoming data rate sustainably, and the appropriate configuration of these resources. For this purpose, StreamBed builds a capacity planning model by piloting a series of runs of the target query in a small-scale, controlled testbed. We implement StreamBed for Apache Flink. Our evaluation with large-scale queries of the Nexmark benchmark demonstrates that StreamBed can accurately predict capacity requirements for jobs spanning more than 1,000 cores using a model built with a 48-core testbed.
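The abstract does not say what form StreamBed's capacity planning model takes. The sketch below only illustrates the general idea of extrapolating from small-scale pilot runs to a production rate. It assumes that each pilot run records a core count and the highest input rate the query sustained, and that sustainable throughput grows linearly with cores. The names `PilotRun`, `fit_linear`, and `cores_for_rate` are hypothetical and are not part of StreamBed or Apache Flink.

```python
import math
from dataclasses import dataclass


@dataclass
class PilotRun:
    """Hypothetical testbed measurement: cores allocated to the query and the
    highest input rate (events/s) it sustained without falling behind."""
    cores: int
    sustainable_rate: float


def fit_linear(runs):
    """Ordinary least-squares fit of sustainable_rate = slope * cores + intercept.

    Assumption: throughput scales linearly with cores. This is an illustrative
    model, not StreamBed's actual one."""
    n = len(runs)
    if n < 2:
        raise ValueError("need at least two pilot runs")
    mean_x = sum(r.cores for r in runs) / n
    mean_y = sum(r.sustainable_rate for r in runs) / n
    sxx = sum((r.cores - mean_x) ** 2 for r in runs)
    if sxx == 0:
        raise ValueError("pilot runs must use different core counts")
    sxy = sum((r.cores - mean_x) * (r.sustainable_rate - mean_y) for r in runs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def cores_for_rate(target_rate, slope, intercept):
    """Smallest integer core count whose predicted sustainable rate reaches target_rate."""
    if slope <= 0:
        raise ValueError("model does not scale with cores; cannot extrapolate")
    return max(1, math.ceil((target_rate - intercept) / slope))


if __name__ == "__main__":
    # Hypothetical pilot runs on a small testbed (up to 48 cores).
    runs = [PilotRun(8, 40_000.0), PilotRun(16, 80_000.0),
            PilotRun(32, 160_000.0), PilotRun(48, 240_000.0)]
    slope, intercept = fit_linear(runs)
    print(cores_for_rate(5_000_000, slope, intercept))
```

With these made-up runs (5,000 events/s per core), the sketch predicts 1000 cores for a target of 5,000,000 events/s. Real queries often scale sublinearly, for example because of network shuffles or state access, so a linear fit is a simplifying assumption here rather than a recommendation.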



Published In

DEBS '24: Proceedings of the 18th ACM International Conference on Distributed and Event-based Systems
June 2024
239 pages
ISBN: 9798400704437
DOI: 10.1145/3629104
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Flink
  2. capacity planning
  3. stream processing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • SWP Wallonia

Conference

DEBS '24

Acceptance Rates

DEBS '24 paper acceptance rate: 15 of 30 submissions (50%)
Overall acceptance rate: 145 of 583 submissions (25%)

Bibliometrics

  • Total citations: 0
  • Total downloads: 96
  • Downloads (last 12 months): 96
  • Downloads (last 6 weeks): 38
Reflects downloads up to 09 Nov 2024
