research-article

Breaking the chains: on declarative data analysis and data independence in the big data era

Published: 01 August 2014

Abstract

Data management research, systems, and technologies have drastically improved the availability of data analysis capabilities, particularly for non-experts, due in part to low entry barriers and reduced ownership costs (e.g., for data management infrastructures and applications). Major reasons for the widespread success of database systems and today's multi-billion dollar data management market include data independence, which separates physical representation and storage from the actual information, and declarative languages, which separate the program specification from its intended execution environment. In contrast, today's big data solutions do not offer data independence and declarative specification. As a result, big data technologies are mostly employed in newly established companies with IT-savvy employees or in large, well-established companies with big IT departments. We argue that current big data solutions will continue to fall short of widespread adoption because of usability problems, despite the fact that in-situ data analytics technologies achieve a good degree of schema independence. In particular, we consider the lack of a declarative specification to be a major roadblock, contributing to the scarcity of available data scientists and limiting the application of big data to IT-savvy industries. Data scientists currently have to spend a lot of time tuning their data analysis programs for specific data characteristics and a specific execution environment. We believe that the research community needs to bring the powerful concepts of declarative specification to current data analysis systems in order to achieve broad adoption of big data technology and effectively deliver the promise that novel big data technologies offer.

Published In

Proceedings of the VLDB Endowment  Volume 7, Issue 13
August 2014
466 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
