A Framework for Generating Summaries from Temporal Personal Health Data

Published: 15 July 2021

    Abstract

    Although it has become easier for individuals to track their personal health data (e.g., heart rate, step count, and nutrient intake), there is still a wide gap between collecting data and generating meaningful summaries that help users understand what their data means to them. With a better understanding of their data, users will be able to act on the new information and work toward their health goals. We aim to bridge the gap between data collection and summary generation by mining the data for interesting behavioral findings that may provide hints about a user’s tendencies. Our focus is on improving the explainability of temporal personal health data via a set of informative summary templates, or “protoforms.” These protoforms span both evaluation-based summaries, which help users evaluate progress toward their health goals, and pattern-based summaries, which explain their implicit behaviors. In addition to individual-level summaries, the protoforms we use are also designed for population-level summaries. We apply our approach to generate summaries (both univariate and multivariate) from real user health data and show that the summaries our system generates are both interesting and useful.
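
    To make the idea of an evaluation-based protoform concrete, the following is a minimal sketch rather than the framework described in the article. It assumes a classic quantified template of the form "On Q the days in <period>, you reached your step goal," where Q is a fuzzy quantifier. The quantifier names, their trapezoidal breakpoints, the `summarize` helper, and the sample step counts are all hypothetical choices made for illustration.

```python
from typing import Sequence, Tuple


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# Hypothetical fuzzy quantifiers over a proportion in [0, 1].
# The breakpoints are illustrative assumptions, not values from the article.
QUANTIFIERS = {
    "almost none of": (0.0, 0.0, 0.1, 0.3),
    "some of": (0.1, 0.3, 0.4, 0.6),
    "most of": (0.4, 0.6, 0.8, 0.95),
    "almost all of": (0.8, 0.95, 1.0, 1.0),
}


def summarize(values: Sequence[float], goal: float,
              period: str = "the past week") -> Tuple[str, float]:
    """Fill the template with the best-fitting quantifier and return its truth degree."""
    if not values:
        raise ValueError("values must not be empty")
    # Proportion of days on which the (crisp) goal was met.
    proportion = sum(v >= goal for v in values) / len(values)
    # Pick the quantifier whose membership at that proportion is highest.
    name, truth = max(
        ((q, trapezoid(proportion, *params)) for q, params in QUANTIFIERS.items()),
        key=lambda pair: pair[1],
    )
    return f"On {name} the days in {period}, you reached your step goal.", truth


if __name__ == "__main__":
    steps = [11000, 9500, 12000, 10400, 8000, 10050, 13000]  # made-up sample data
    print(summarize(steps, goal=10000))
```

    In this sketch the goal test is crisp (met or not met) and only the quantifier is fuzzy; a fuller treatment could also make the summarizer itself a fuzzy set.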

    Published In

    ACM Transactions on Computing for Healthcare  Volume 2, Issue 3
    Survey Paper
    July 2021
    226 pages
    EISSN:2637-8051
    DOI:10.1145/3476113
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 July 2021
    Accepted: 01 January 2021
    Revised: 01 November 2020
    Received: 01 March 2020
    Published in HEALTH Volume 2, Issue 3

    Author Tags

    1. Linguistic data summarization
    2. natural language summaries
    3. personal health data
    4. protoforms
    5. sequence mining
    6. time-series analysis

    Qualifiers

    • Research-article
    • Research
    • Refereed
