
Data management for continuous learning in EHR systems

Online AM: 07 May 2024

Abstract

To gain a comprehensive understanding of a patient’s health, advanced analytics must be applied to the data collected by electronic health record (EHR) systems. However, managing and curating this data requires carefully designed workflows. While digitalization and standardization enable continuous health monitoring, missing data values and technical issues can compromise the consistency and timeliness of the data. In this paper, we propose a workflow for developing prognostic models that leverages the SMART BEAR infrastructure and the capabilities of the Big Data Analytics (BDA) engine to homogenize and harmonize data points. Our workflow improves data quality by evaluating different imputation algorithms and selecting one whose imputed data preserves the feature distributions and correlations of the raw data. We applied this workflow to a subset of the data stored in the SMART BEAR repository and examined its impact on the prediction of emerging health states such as cardiovascular disease and mild depression. We also discuss the possibility of model validation by clinicians in the SMART BEAR project, the transmission of subsequent actions in the decision support system, and the estimation of the required number of data points.
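The imputation-selection step described in the abstract can be illustrated with a minimal sketch. It is not the SMART BEAR or BDA engine implementation: the three candidate imputers, the scoring formula, the feature names (`heart_rate`, `systolic`), and the synthetic data are all assumptions chosen for illustration. The idea is only to show one way of ranking imputers by how closely the imputed data keeps each feature's mean and spread, and the pairwise correlations, of the observed raw values.

```python
import math
import random
from statistics import mean, median, pstdev

# Candidate imputers (illustrative choices, not taken from the paper).
# Each takes a column with None for missing values and assumes at least
# one observed value is present.
def impute_mean(col):
    fill = mean(v for v in col if v is not None)
    return [fill if v is None else v for v in col]

def impute_median(col):
    fill = median(v for v in col if v is not None)
    return [fill if v is None else v for v in col]

def impute_ffill(col):
    # Last observation carried forward; leading gaps take the first observed value.
    last = next(v for v in col if v is not None)
    out = []
    for v in col:
        if v is not None:
            last = v
        out.append(last)
    return out

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def score(raw, imputed):
    """Hypothetical score: lower means closer to the raw data."""
    names = list(raw)
    total = 0.0
    # Distribution shift: change in mean and standard deviation per feature,
    # scaled by the standard deviation of the observed values.
    for n in names:
        obs = [v for v in raw[n] if v is not None]
        sd = pstdev(obs) or 1.0
        total += abs(mean(imputed[n]) - mean(obs)) / sd
        total += abs(pstdev(imputed[n]) - pstdev(obs)) / sd
    # Correlation shift: raw correlation is computed on complete pairs only.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            pairs = [(x, y) for x, y in zip(raw[a], raw[b])
                     if x is not None and y is not None]
            r_raw = pearson([p[0] for p in pairs], [p[1] for p in pairs])
            total += abs(pearson(imputed[a], imputed[b]) - r_raw)
    return total

def select_imputer(raw, imputers):
    scores = {name: score(raw, {k: fn(col) for k, col in raw.items()})
              for name, fn in imputers.items()}
    return min(scores, key=scores.get), scores

IMPUTERS = {"mean": impute_mean, "median": impute_median, "ffill": impute_ffill}

# Synthetic example data (assumption): two correlated vitals with ~20% missing.
rng = random.Random(0)
heart_rate = [70 + rng.gauss(0, 5) for _ in range(50)]
systolic = [0.8 * h + 60 + rng.gauss(0, 3) for h in heart_rate]
raw = {
    "heart_rate": [None if rng.random() < 0.2 else v for v in heart_rate],
    "systolic": [None if rng.random() < 0.2 else v for v in systolic],
}

best, scores = select_imputer(raw, IMPUTERS)
print("selected imputer:", best)
print("scores:", scores)
```

The score weights distribution shift and correlation shift equally, which is an arbitrary choice; a real workflow might use a distribution distance such as a two-sample test statistic and more capable imputers.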


Published In

ACM Transactions on Internet Technology Just Accepted
EISSN:1557-6051
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Online AM: 07 May 2024
Accepted: 23 March 2024
Revised: 19 March 2024
Received: 08 April 2023

Author Tags

  1. Internet of Things
  2. Electronic Health Records
  3. Data Management
  4. Continuous Learning

Qualifiers

  • Research-article
