DOI: 10.1145/3508546.3508642
research-article

Auto-generating Textual Data Stories Using Data Science Pipelines

Published: 25 February 2022

    Abstract

    Understanding a dataset directly is challenging, but transforming the results of data analysis into data stories could help people build mental models and understand the dataset more easily. In this paper, we present a new framework for data-to-text natural language generation (NLG) that generates data stories for specific personas. To assess the feasibility of this method, and whether human-generated stories are consistent with those generated by the data science pipelines, we present two experiments. The first is a data story study with 3 financial experts, 4 Ph.D. students, and 20 Amazon Mechanical Turk workers, which yielded several human-written data stories. The second is a validation study in which 39 Amazon Mechanical Turk workers assessed the usability and understandability of 9 high-quality data stories, some written by humans and some by machine. We also conduct a qualitative analysis of the human-written data stories to determine what people consider when writing them. The experimental results show that readers comprehend machine-written data stories as well as they comprehend human-written ones.
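    As a rough illustration of what data-to-text generation for a persona can look like, the sketch below runs a small analysis step and then fills a text template. It is not the framework presented in the paper: the function names, the trend rule, the "expert" persona label, and the sample prices are all assumptions made for this example.

```python
from statistics import mean


def describe_trend(values):
    """Label the overall direction of a series (hypothetical rule: compare endpoints)."""
    change = values[-1] - values[0]
    if change > 0:
        return "rose"
    if change < 0:
        return "fell"
    return "stayed flat"


def generate_story(name, unit, values, persona="general"):
    """Analysis step followed by template-based text realization."""
    # Analysis: derive a few summary facts from the raw numbers.
    facts = {
        "first": values[0],
        "last": values[-1],
        "avg": round(mean(values), 2),
        "peak": max(values),
        "trend": describe_trend(values),
    }
    # Realization: every persona gets the headline sentence.
    story = f"{name} {facts['trend']} from {facts['first']} to {facts['last']} {unit}."
    # Hypothetical persona rule: experts also get the summary statistics.
    if persona == "expert":
        story += f" The mean was {facts['avg']} {unit}, with a peak of {facts['peak']} {unit}."
    return story


# Made-up sample data, used only to exercise the sketch.
prices = [10.0, 12.0, 11.0, 15.0]
print(generate_story("The closing price", "USD", prices))
print(generate_story("The closing price", "USD", prices, persona="expert"))
```

    Separating the analysis step from the wording keeps the computed facts identical across personas, so only the amount of detail in the text changes.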


    Cited By

    • (2022) Technical Features and Trends of Data Science in Financial Engineering. Frontiers in Business, Economics and Management 4:3 (34-37). DOI: 10.54097/fbem.v4i3.1068. Online publication date: 31-Jul-2022

    Published In

    ACAI '21: Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence
    December 2021
    699 pages
    ISBN: 9781450385053
    DOI: 10.1145/3508546

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Data Storytelling
    2. Data science
    3. Data sensemaking
    4. NLP

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ACAI'21

    Acceptance Rates

    Overall Acceptance Rate 173 of 395 submissions, 44%
