DOI: 10.1145/3447545.3451190
short-paper

How to Measure Scalability of Distributed Stream Processing Engines?

Published: 19 April 2021

Abstract

Scalability is promoted as a key quality feature of modern big data stream processing engines. However, although researchers have made considerable efforts to provide precise definitions and corresponding metrics for the term scalability, experimental scalability evaluations and benchmarks of stream processing engines apply different and inconsistent metrics. With this paper, we aim to establish general metrics for the scalability of stream processing engines. Derived from common definitions of scalability in cloud computing, we propose two metrics: a load capacity function and a resource demand function. Both metrics relate provisioned resources and load intensities, while requiring specific service level objectives to be fulfilled. We show how these metrics can be employed for scalability benchmarking and discuss their advantages over other metrics used for stream processing engines and other software systems.
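
As a rough illustration of the two metrics named in the abstract, the following Python sketch reads them as lookups over benchmark results. It assumes one plausible interpretation that this page does not spell out: the load capacity function maps a resource amount to the highest tested load at which all service level objectives (SLOs) are met, and the resource demand function maps a load to the smallest tested resource amount at which they are met. The data layout, the function names, and the example numbers are hypothetical and not taken from the paper.

```python
from typing import Dict, Optional, Tuple

# Hypothetical benchmark results (invented for illustration):
# (provisioned instances, load in messages/s) -> were all SLOs fulfilled?
Results = Dict[Tuple[int, int], bool]


def load_capacity(results: Results, resources: int) -> Optional[int]:
    """Highest tested load that `resources` instances handled within the SLOs."""
    loads = [load for (r, load), ok in results.items() if r == resources and ok]
    return max(loads) if loads else None


def resource_demand(results: Results, load: int) -> Optional[int]:
    """Smallest tested number of instances that handled `load` within the SLOs."""
    amounts = [r for (r, l), ok in results.items() if l == load and ok]
    return min(amounts) if amounts else None


# Made-up measurements for three resource amounts and three load intensities.
results: Results = {
    (1, 1000): True, (1, 2000): False, (1, 3000): False,
    (2, 1000): True, (2, 2000): True,  (2, 3000): False,
    (3, 1000): True, (3, 2000): True,  (3, 3000): True,
}

if __name__ == "__main__":
    for r in (1, 2, 3):
        print(f"capacity({r} instances) = {load_capacity(results, r)} msgs/s")
    for l in (1000, 2000, 3000):
        print(f"demand({l} msgs/s) = {resource_demand(results, l)} instances")
```

Both functions return `None` when no tested configuration fulfilled the SLOs, which keeps "not measured or not achievable" distinct from a capacity of zero.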

Published In

ICPE '21: Companion of the ACM/SPEC International Conference on Performance Engineering
April 2021
198 pages
ISBN: 9781450383318
DOI: 10.1145/3447545

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cloud computing
  2. metrics
  3. scalability
  4. stream processing

Qualifiers

  • Short-paper

Conference

ICPE '21

Acceptance Rates

Overall Acceptance Rate 252 of 851 submissions, 30%

Cited By

  • (2024) Benchmarking scalability of stream processing frameworks deployed as microservices in the cloud. Journal of Systems and Software 208:C. DOI: 10.1016/j.jss.2023.111879. Online publication date: 4-Mar-2024
  • (2023) Towards a Benchmark for Fog Data Processing. 2023 IEEE International Conference on Cloud Engineering (IC2E) (92-98). DOI: 10.1109/IC2E59103.2023.00018. Online publication date: 25-Sep-2023
  • (2023) Comparing the Scalability of Communication Networks and Systems. IEEE Access 11 (101474-101497). DOI: 10.1109/ACCESS.2023.3314201. Online publication date: 2023
  • (2023) Micro-batch and data frequency for stream processing on multi-cores. The Journal of Supercomputing 79:8 (9206-9244). DOI: 10.1007/s11227-022-05024-y. Online publication date: 9-Jan-2023
  • (2022) Evaluating Micro-batch and Data Frequency for Stream Processing Applications on Multi-cores. 2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP) (10-17). DOI: 10.1109/PDP55904.2022.00011. Online publication date: Mar-2022
  • (2022) Demo Paper: Benchmarking Scalability of Cloud-Native Applications with Theodolite. 2022 IEEE International Conference on Cloud Engineering (IC2E) (275-276). DOI: 10.1109/IC2E55432.2022.00037. Online publication date: Oct-2022
  • (2022) Scalable Collaborative Software Visualization as a Service: Short Industry and Experience Paper. 2022 IEEE International Conference on Cloud Engineering (IC2E) (182-187). DOI: 10.1109/IC2E55432.2022.00026. Online publication date: Oct-2022
  • (2022) Streaming vs. Functions: A Cost Perspective on Cloud Event Processing. 2022 IEEE International Conference on Cloud Engineering (IC2E) (67-78). DOI: 10.1109/IC2E55432.2022.00015. Online publication date: Oct-2022
  • (2022) Distributed Streaming Computing Mode Based on Fast Message Mechanism. 2022 2nd Asia-Pacific Conference on Communications Technology and Computer Science (ACCTCS) (383-387). DOI: 10.1109/ACCTCS53867.2022.00085. Online publication date: Mar-2022
  • (2022) Scalability and performance analysis of BDPS in clouds. Computing 104:6 (1425-1460). DOI: 10.1007/s00607-022-01056-7. Online publication date: 1-Jun-2022
