Building a Collaborative Data Analytics System: Opportunities and Challenges

Published: 01 August 2023

Abstract

Real-time collaboration has become increasingly important in various applications, from document creation to data analytics. Although collaboration features are prevalent in editing applications, they remain rare in data-analytics applications, where the need for collaboration is even more crucial. This tutorial aims to provide attendees with a comprehensive understanding of the challenges and design decisions associated with supporting real-time collaboration and user interactions in data analytics systems. We will discuss popular conflict resolution technologies, the unique challenges of facilitating collaborative experiences during the workflow construction and execution phases, and the complexities of supporting responsive user interactions during job execution.

Published In

Proceedings of the VLDB Endowment, Volume 16, Issue 12
August 2023
685 pages
ISSN: 2150-8097
Publisher

VLDB Endowment
