Davos: a system for interactive data-driven decision making

Published: 01 July 2021

    Abstract

    Prescriptive analytics, a new horizon in data analytics, is becoming increasingly important for making data-driven decisions. Although data acquisition and access have been steadily democratized, making data-driven decisions remains a significant challenge for people without technical expertise. Existing data analytics tools, designed decades ago, still present a high bar for domain experts, and removing this bar requires a fundamental rethinking of both the interface and the backend.
    At Einblick, an MIT/Brown spin-off based on the Northstar project, we have spent the last few years building a next-generation analytics tool. To overcome the shortcomings of existing processing engines, we propose Davos, Einblick's novel backend. Davos combines aspects of progressive computation, approximate query processing, and sampling, with a specific focus on supporting user-defined operations. Moreover, Davos optimizes for multi-tenant scenarios to promote collaboration. Both an empirical evaluation and a user study verify that Davos can greatly empower data analytics for new needs.
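
    To make the idea of progressive, sampling-based computation concrete, the following is a minimal Python sketch of a progressively refined mean. It is not Davos code and does not reflect Einblick's API; the function name `progressive_mean`, its parameters, and the 95% normal-approximation interval are assumptions chosen purely for illustration.

```python
import math
import random
from typing import Iterator, List, Tuple


def progressive_mean(
    values: List[float], chunk_size: int, seed: int = 0
) -> Iterator[Tuple[int, float, float]]:
    """Yield (rows_seen, running_mean, ci_half_width) after each chunk.

    Hypothetical helper for illustration only; not part of Davos.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    # Shuffle a copy so that every prefix is a uniform sample
    # drawn without replacement from the full data set.
    order = list(values)
    random.Random(seed).shuffle(order)
    total = len(order)

    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for start in range(0, total, chunk_size):
        for x in order[start:start + chunk_size]:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)

        variance = m2 / (n - 1) if n > 1 else 0.0
        # Finite population correction: the interval shrinks to zero
        # once every row has been processed.
        fpc = 1.0 - n / total
        half_width = 1.96 * math.sqrt(variance / n * fpc)
        yield n, mean, half_width


if __name__ == "__main__":
    data = [float(v) for v in range(1, 101)]
    for rows, estimate, half_width in progressive_mean(data, chunk_size=25):
        print(f"{rows:3d} rows: mean ~ {estimate:.2f} +/- {half_width:.2f}")
```

    Each yielded tuple is an approximate answer that a front end could display immediately and refine as more chunks arrive; after the last chunk the estimate equals the exact mean and the interval width is zero.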

    Cited By

    • (2023) VeLP: Vehicle Loading Plan Learning from Human Behavior in Nationwide Logistics System. Proceedings of the VLDB Endowment 17:2 (241-249). DOI: 10.14778/3626292.3626305. Online publication date: 1-Oct-2023.
    • (2023) Building a Collaborative Data Analytics System: Opportunities and Challenges. Proceedings of the VLDB Endowment 16:12 (3898-3901). DOI: 10.14778/3611540.3611580. Online publication date: 1-Aug-2023.
    • (2022) Guided Text-based Item Exploration. Proceedings of the 31st ACM International Conference on Information & Knowledge Management (3410-3420). DOI: 10.1145/3511808.3557141. Online publication date: 17-Oct-2022.


          Published In

          Proceedings of the VLDB Endowment, Volume 14, Issue 12
          July 2021
          587 pages
          ISSN: 2150-8097

          Publisher

          VLDB Endowment


          Qualifiers

          • Research-article
