StreamGen: Model-driven Development of Distributed Streaming Applications

Published: 20 January 2021

Abstract

Distributed streaming applications, i.e., applications that process massive streams of data in a distributed fashion, are becoming increasingly popular as a way to tame the velocity and volume of Big Data. Nevertheless, the widespread adoption of data-intensive processing is still limited by two factors: the non-trivial design paradigms involved, which must deal with the unboundedness and volume of the data streams, and the many distributed streaming platforms, each with its own characteristics and APIs. In this article, we present StreamGen, a Model-Driven Engineering tool that simplifies the design of such streaming applications and automatically generates the corresponding code. StreamGen generates fully working, processing-ready code for different target platforms (e.g., Apache Spark, Apache Flink). Our evaluation shows that (i) StreamGen is general enough to model an application and generate its code, with performance comparable to that of a similar, preexisting, well-known application; (ii) the tool is fully compliant with the streaming concepts defined in the Google Dataflow Model; and (iii) users with little computer science background and limited experience with big data were able to work with StreamGen and create or refactor an application in a matter of minutes.

Published In

ACM Transactions on Software Engineering and Methodology  Volume 30, Issue 1
Continuous Special Section: AI and SE
January 2021
444 pages
ISSN:1049-331X
EISSN:1557-7392
DOI:10.1145/3446626
Editor: Mauro Pezzè

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 January 2021
Accepted: 01 June 2020
Revised: 01 June 2020
Received: 01 March 2019
Published in TOSEM Volume 30, Issue 1

Author Tags

  1. Model-driven engineering
  2. big data architectures
  3. streaming applications

Qualifiers

  • Research-article
  • Research
  • Refereed
