DOI: 10.1145/3299869.3320246
research-article
Public Access

Data Debugging and Exploration with Vizier

Published: 25 June 2019

Abstract

We present Vizier, a multi-modal data exploration and debugging tool. The system supports a wide range of operations by seamlessly integrating Python, SQL, and automated data curation and debugging methods. Using Spark as an execution backend, Vizier handles large datasets in multiple formats. Ease of use is attained by integrating a notebook with a spreadsheet-style interface and with visualizations that guide and support the user in the loop. In addition, native support for provenance and versioning enables collaboration and uncertainty management. In this demonstration, we illustrate the diverse features of the system using several realistic data science tasks based on real data.

Published In

SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
June 2019
2106 pages
ISBN: 9781450356435
DOI: 10.1145/3299869
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. data cleaning
  2. data curation
  3. data debugging
  4. data integration
  5. data on-boarding
  6. notebooks
  7. provenance
  8. spreadsheets
  9. uncertainty
  10. workflows

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '19: International Conference on Management of Data
June 30 - July 5, 2019
Amsterdam, Netherlands

Acceptance Rates

SIGMOD '19 Paper Acceptance Rate: 88 of 430 submissions, 20%
Overall Acceptance Rate: 785 of 4,003 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 141
  • Downloads (Last 6 weeks): 20
Reflects downloads up to 22 Sep 2024

Cited By

  • (2024) "We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning. Proceedings of the ACM on Human-Computer Interaction 8:CSCW1 (1-34). DOI: 10.1145/3653697. Online publication date: 26-Apr-2024
  • (2022) DataPrism: Exposing Disconnect between Data and Systems. Proceedings of the 2022 International Conference on Management of Data (217-231). DOI: 10.1145/3514221.3517864. Online publication date: 10-Jun-2022
  • (2022) VizLinter: A Linter and Fixer Framework for Data Visualization. IEEE Transactions on Visualization and Computer Graphics 28:1 (206-216). DOI: 10.1109/TVCG.2021.3114804. Online publication date: 1-Jan-2022
  • (2022) Data distribution debugging in machine learning pipelines. The VLDB Journal 31:5 (1103-1126). DOI: 10.1007/s00778-021-00726-w. Online publication date: 31-Jan-2022
  • (2021) Efficient Uncertainty Tracking for Complex Queries with Attribute-level Bounds. Proceedings of the 2021 International Conference on Management of Data (528-540). DOI: 10.1145/3448016.3452791. Online publication date: 9-Jun-2021
  • (2020) Debugging large-scale data science pipelines using dagger. Proceedings of the VLDB Endowment 13:12 (2993-2996). DOI: 10.14778/3415478.3415527. Online publication date: 14-Sep-2020
  • (2020) Automatically Generating Data Exploration Sessions Using Deep Reinforcement Learning. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (1527-1537). DOI: 10.1145/3318464.3389779. Online publication date: 11-Jun-2020
  • (2020) Surfacing Visualization Mirages. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (1-16). DOI: 10.1145/3313831.3376420. Online publication date: 21-Apr-2020
