tutorial

A unifying model for distributed data-intensive systems

Author:

Alessandro Margara

DEBS '22: Proceedings of the 16th ACM International Conference on Distributed and Event-Based Systems

Pages 176 - 179

https://doi.org/10.1145/3524860.3539782

Published: 15 July 2022

Abstract

Modern applications handle increasingly large volumes of data, generated at an unprecedented and constantly growing rate. They introduce challenges that are radically transforming the research fields around data management and processing, resulting in a proliferation of distributed data-intensive systems. Each such system comes with its own assumptions, data and processing model, design choices, implementation strategies, and guarantees. Yet the problems data-intensive systems face and the solutions they propose frequently overlap.

This tutorial presents a unifying model for data-intensive systems that dissects them into core building blocks, enabling a precise and unambiguous description and a detailed comparison. From the model, we derive a list of classification criteria and we use them to build a taxonomy of state-of-the-art systems. The tutorial offers a global view of the vast research field of data-intensive systems, highlighting interesting observations on the current state of things, and suggesting promising research directions.

Published In

DEBS '22: Proceedings of the 16th ACM International Conference on Distributed and Event-Based Systems

June 2022

210 pages

ISBN: 9781450393089

DOI: 10.1145/3524860

General Chair: Yongluan Zhou (University of Copenhagen, Denmark)

Program Chairs: Panos K. Chrysanthis (University of Pittsburgh); Vincenzo Gulisano (Chalmers University of Technology, Sweden)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Tutorial

Conference

DEBS '22

DEBS '22: The 16th ACM International Conference on Distributed and Event-based Systems

June 27 - 30, 2022

Copenhagen, Denmark

Acceptance Rates

DEBS '22 Paper Acceptance Rate: 10 of 19 submissions, 53%

Overall Acceptance Rate: 145 of 583 submissions, 25%
