DOI: 10.1145/3624062.3624284
research-article

A data science pipeline synchronisation method for edge-fog-cloud continuum

Published: 12 November 2023

Abstract

This paper presents an adaptive data delivery method for data science pipelines. Although the method applies to processes communicating over any network, this work focuses on edge-fog-cloud infrastructures. In a diagnostic phase, a model based on the Bernoulli principle is used to represent the bottlenecks in a pipeline. In a supervision phase, a cooperative watchman/sentinel system monitors the throughput of the pipeline stages to build a bottleneck-stage scheme. In a rectification phase, this system creates replicas of the bottleneck stages, relieving workload congestion through implicit parallelism and load-balancing algorithms. The method is invoked automatically and transparently to produce a steady dataflow across the continuum. To evaluate the proposal, we conducted a case study on the processing of medical and satellite data. The evaluation showed that the method produces continuum dataflows without characterising workloads or requiring knowledge of infrastructure details, and that it achieves performance competitive with state-of-the-art solutions.
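The diagnostic, supervision and rectification phases can be illustrated with a small, hypothetical Python sketch. It is not the authors' implementation. In place of the paper's Bernoulli-based model it uses a plain minimum-throughput rule (a pipeline, like a pipe, carries no more than its narrowest section). It also assumes that per-replica throughput is measured in items per second and that replicas scale linearly. The names `Stage`, `find_bottlenecks`, `rectify`, `pipeline_rate` and `round_robin`, as well as the stage names and throughput figures in the usage lines, are illustrative only.

```python
import math
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Stage:
    """Hypothetical pipeline stage with a measured per-replica throughput."""
    name: str
    throughput: float  # assumed: items per second handled by one replica
    replicas: int = 1

    @property
    def capacity(self) -> float:
        # Assumption: replicas scale linearly (no contention between them).
        return self.throughput * self.replicas


def pipeline_rate(stages: list[Stage]) -> float:
    # Stand-in for the diagnostic model: the slowest stage bounds the pipeline.
    return min(s.capacity for s in stages)


def find_bottlenecks(stages: list[Stage], target: float) -> list[Stage]:
    # Supervision: flag every stage whose capacity falls below the target rate.
    return [s for s in stages if s.capacity < target]


def rectify(stages: list[Stage], target: float) -> None:
    # Rectification: give each bottleneck enough replicas to reach the target.
    for stage in find_bottlenecks(stages, target):
        stage.replicas = math.ceil(target / stage.throughput)


def round_robin(stage: Stage, items: list[str]) -> dict[int, list[str]]:
    # Load balancing: hand items to the stage's replicas in turn.
    assignment: dict[int, list[str]] = {r: [] for r in range(stage.replicas)}
    for replica, item in zip(cycle(range(stage.replicas)), items):
        assignment[replica].append(item)
    return assignment


# Hypothetical usage: a four-stage pipeline whose source emits 100 items/s.
stages = [
    Stage("ingest", 100.0),
    Stage("decompress", 40.0),
    Stage("segment", 25.0),
    Stage("store", 120.0),
]
target = stages[0].throughput
print(pipeline_rate(stages))  # 25.0
rectify(stages, target)
print([(s.name, s.replicas) for s in stages])
# [('ingest', 1), ('decompress', 3), ('segment', 4), ('store', 1)]
print(pipeline_rate(stages))  # 100.0
```

Round-robin dispatch is used here only because it is the simplest load-balancing policy; the abstract does not say which load-balancing algorithms the method employs.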

Supplemental Material

MP4 File - Conference presentation recording
Recording of the presentation "A data science pipeline synchronisation method for edge-fog-cloud continuum" at the 18th Workshop on Workflows in Support of Large-Scale Science (WORKS23).

Published In

ACM Other conferences
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
November 2023
2180 pages
ISBN:9798400707858
DOI:10.1145/3624062

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. big data
  2. computing continuum
  3. edge-fog-cloud
  4. elasticity
  5. parallel patterns
  6. pipelines

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • FORDECYT-PRONACES
  • Spanish Ministry of Science and Innovation

Conference

SC-W 2023
