DOI: 10.1145/3596673.3596972
short-paper
Open access

Teaching Data Science by Visualizing Data Table Transformations: Pandas Tutor for Python, Tidy Data Tutor for R, and SQL Tutor

Published: 23 June 2023

Abstract

Data science instructors often find it hard to explain to students how a piece of code written in Python, R, or SQL executes in order to transform tabular data. They currently resort to hand-drawing diagrams or making presentation slides to illustrate the semantics of operations such as filtering, sorting, reshaping, pivoting, grouping, and joining. These diagrams are time-consuming to create and do not synchronize with the real code or data that students are learning about. In this paper, we show that a step-by-step visual representation of tabular data transforms can help instructors explain these operations. To do so, we created a table visualization library that illustrates the row-, column-, and cell-wise relationships between an operation's input and output tables. On top of this library we built a trio of free web-based visualization tools (Pandas Tutor for Python, Tidy Data Tutor for R tidyverse, and SQL Tutor) that run users' code and automatically produce diagrams of how Python/R/SQL transforms data tables step-by-step from input to output. Since launching in December 2021, over 61,000 people from over 160 countries have visited our website to try out these tools.
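
To make the idea of row-wise relationships between an operation's input and output tables concrete, here is a minimal sketch in plain Python. It is not the authors' visualization library or any of the three tools; the table `dogs`, its columns, and the helpers `filter_rows`, `sort_rows`, and `group_mean` are hypothetical names invented for illustration. Each helper returns the output table together with a lineage list that records, for every output row, which input row indices produced it.

```python
from collections import defaultdict

# Hypothetical toy table (illustrative values only).
dogs = [
    {"breed": "Beagle", "size": "small", "weight_kg": 10},
    {"breed": "Poodle", "size": "medium", "weight_kg": 20},
    {"breed": "Husky", "size": "large", "weight_kg": 25},
    {"breed": "Corgi", "size": "small", "weight_kg": 12},
    {"breed": "Boxer", "size": "large", "weight_kg": 30},
]


def filter_rows(table, keep):
    """Keep rows where keep(row) is true; each output row has one source row."""
    pairs = [(i, row) for i, row in enumerate(table) if keep(row)]
    return [row for _, row in pairs], [[i] for i, _ in pairs]


def sort_rows(table, key, reverse=False):
    """Reorder rows by a column; lineage shows where each row moved from."""
    order = sorted(range(len(table)), key=lambda i: table[i][key], reverse=reverse)
    return [table[i] for i in order], [[i] for i in order]


def group_mean(table, by, value):
    """Average a column per group; each output row has many source rows."""
    groups = defaultdict(list)
    for i, row in enumerate(table):
        groups[row[by]].append(i)
    out, lineage = [], []
    for group_key in sorted(groups):
        idxs = groups[group_key]
        mean = sum(table[i][value] for i in idxs) / len(idxs)
        out.append({by: group_key, value: mean})
        lineage.append(idxs)
    return out, lineage


# Run the pipeline one step at a time, keeping every intermediate table.
heavy, heavy_from = filter_rows(dogs, lambda r: r["weight_kg"] > 10)
ranked, ranked_from = sort_rows(heavy, "weight_kg", reverse=True)
means, means_from = group_mean(ranked, "size", "weight_kg")

steps = [("filter", heavy, heavy_from),
         ("sort", ranked, ranked_from),
         ("group + mean", means, means_from)]
for name, table, lineage in steps:
    print(f"--- after {name} ---")
    for row, sources in zip(table, lineage):
        print(row, "<- input rows", sources)
```

The lineage lists carry the same kind of information a diagram would draw as arrows between tables: one-to-one for filtering and sorting, many-to-one for grouping.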

Cited By

  • (2024) Programming Language Learning in K-12 Education. Empowering STEM Educators With Digital Tools, 227-260. DOI: 10.4018/979-8-3693-9806-7.ch010. Online publication date: 1-Nov-2024
  • (2024) WaitGPT: Monitoring and Steering Conversational LLM Agent in Data Analysis with On-the-Fly Code Visualization. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 1-14. DOI: 10.1145/3654777.3676374. Online publication date: 13-Oct-2024
  • (2024) Data Science Mastery Learning Using Parsons Problems-Inspired Table Transformations. Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 2, 1604-1605. DOI: 10.1145/3626253.3635556. Online publication date: 14-Mar-2024
  • (2023) Detangler: Helping Data Scientists Explore, Understand, and Debug Data Wrangling Pipelines. 2023 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 189-198. DOI: 10.1109/VL-HCC57772.2023.00031. Online publication date: 3-Oct-2023

    Published In

    DataEd '23: Proceedings of the 2nd International Workshop on Data Systems Education: Bridging education practice with education research
    June 2023
    63 pages
    ISBN:9798400702075
    DOI:10.1145/3596673
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data science education
    2. tabular data
    3. code visualization

    Qualifiers

    • Short-paper

    Conference

    DataEd '23