research-article
Open access

Optimizing Dataflow Systems for Scalable Interactive Visualization

Published: 26 March 2024
    Abstract

    Supporting the interactive exploration of large datasets is a popular and challenging use case for data management systems. Traditionally, the interface and the back-end system are built and optimized separately, and interface design and system optimization require different skill sets that are difficult for one person to master. To let analysts focus on visualization design, we contribute VegaPlus, a system that automatically optimizes interactive dashboards to support large datasets. To achieve this, VegaPlus leverages two core ideas. First, we introduce an optimizer that can reason about execution plans in Vega, in a back-end DBMS, or across a mix of both environments. Second, the optimizer considers how user interactions may alter execution plan performance, and can partially or fully rewrite the plans when needed. Benchmark experiments on seven different dashboard designs show that VegaPlus provides superior performance and versatility compared to standard dashboard optimization techniques.
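
    The optimizer described above weighs plans that run in Vega, in the DBMS, or split across both, and accounts for how interactions change which plan is cheapest. The toy sketch below illustrates that trade-off only; it is not VegaPlus's optimizer or cost model. It assumes a linear dataflow pipeline, enumerates every split point (a prefix of transforms pushed to the DBMS, the rest run in the browser), and charges each plan for an initial render plus repeated updates to one interactive transform. All names (Transform, best_split, the cost constants) and all numbers are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cost constants in arbitrary units (illustrative only, not
# taken from the VegaPlus paper).
SERVER_COST_PER_ROW = 1.0      # DBMS work per input row
CLIENT_COST_PER_ROW = 2.0      # browser (Vega) work per input row
TRANSFER_COST_PER_ROW = 2.0    # shipping one row from DBMS to browser
ROUND_TRIP_COST = 100_000.0    # fixed overhead of one DBMS request


@dataclass(frozen=True)
class Transform:
    name: str
    selectivity: float  # estimated output rows / input rows


def row_counts(pipeline, base_rows):
    """Estimated input rows of each transform, followed by the final output rows."""
    counts = [float(base_rows)]
    for t in pipeline:
        counts.append(counts[-1] * t.selectivity)
    return counts


def server_cost(counts, split):
    """Run transforms [0, split) in the DBMS, pay one round trip, ship the output."""
    compute = sum(SERVER_COST_PER_ROW * counts[i] for i in range(split))
    return compute + ROUND_TRIP_COST + TRANSFER_COST_PER_ROW * counts[split]


def client_cost(counts, start, end):
    """Run transforms [start, end) in the browser."""
    return sum(CLIENT_COST_PER_ROW * counts[i] for i in range(start, end))


def session_cost(pipeline, base_rows, split, interactive_index, interactions):
    """Initial render plus `interactions` updates to pipeline[interactive_index]."""
    n = len(pipeline)
    counts = row_counts(pipeline, base_rows)
    full_run = server_cost(counts, split) + client_cost(counts, split, n)
    if interactive_index >= split:
        # Interactive transform is client-side: re-run only it and its downstream
        # transforms, assuming upstream results stay cached in the browser.
        per_update = client_cost(counts, interactive_index, n)
    else:
        # Interactive transform is in the DBMS: re-issue the whole query
        # (no server-side caching is modeled).
        per_update = full_run
    return full_run + interactions * per_update


def best_split(pipeline, base_rows, interactive_index, interactions):
    """Enumerate every DBMS/browser split point and return the cheapest one."""
    return min(
        range(len(pipeline) + 1),
        key=lambda k: session_cost(pipeline, base_rows, k, interactive_index, interactions),
    )


if __name__ == "__main__":
    # A brushed filter (transform 0) feeding an aggregation (transform 1).
    pipeline = [Transform("filter", 0.5), Transform("aggregate", 0.01)]
    print(best_split(pipeline, 10_000, interactive_index=0, interactions=20))      # 0: all in Vega
    print(best_split(pipeline, 10_000, interactive_index=0, interactions=0))       # 2: all in DBMS
    print(best_split(pipeline, 10_000_000, interactive_index=0, interactions=20))  # 2: all in DBMS
```

    With these made-up constants, a small table with many brush interactions favors shipping raw rows to Vega once and filtering locally, while a large table, or a small one with no interactions, favors pushing both transforms into the DBMS. A practical optimizer would replace the constants with measured or estimated costs.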

    Cited By

    • (2024) Assessing the landscape of toolkits, frameworks, and authoring tools for urban visual analytics systems. Computers & Graphics, 104013. DOI: 10.1016/j.cag.2024.104013. Online publication date: Jul-2024.

    Information

    Published In

    Proceedings of the ACM on Management of Data, Volume 2, Issue 1
    SIGMOD
    February 2024
    1874 pages
    EISSN:2836-6573
    DOI:10.1145/3654807
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published in PACMMOD Volume 2, Issue 1

    Author Tags

    1. data analytics
    2. scalable visualization

    Qualifiers

    • Research-article

    Funding Sources

    • NSF

    Article Metrics

    • Downloads (last 12 months): 160
    • Downloads (last 6 weeks): 47
    Reflects downloads up to 27 Jul 2024
