Abstract
A large fraction of electronic health records (EHRs) consists of clinical measurements collected over time, such as lab tests and vital signs, which provide important information about a patient’s health status. These sequences of clinical measurements are naturally represented as time series, characterized by multiple variables and large amounts of missing data, which complicate the analysis. In this work, we propose a novel kernel that exploits both the information in the observed values and the information hidden in the missing patterns of multivariate time series (MTS) originating, e.g., from EHRs. The kernel, called TCK\(_{IM}\), is designed using an ensemble learning strategy in which the base models are novel mixed-mode Bayesian mixture models that can effectively exploit informative missingness without having to resort to imputation methods. Moreover, the ensemble approach ensures robustness to hyperparameters, and TCK\(_{IM}\) is therefore particularly well suited when labels are scarce, a known challenge in medical applications. Experiments on three real-world clinical datasets demonstrate the effectiveness of the proposed kernel.
Acknowledgements
The authors would like to thank K. Hindberg for assistance on extraction of EHR data and the physicians A. Revhaug, R.-O. Lindsetmo and K. M. Augestad for helpful guidance throughout the study.
Appendix—Synthetic benchmark datasets
To test how well TCK\(_{IM}\) performs for a varying degree of informative missingness, we generated in total 16 synthetic datasets by randomly injecting missing data into 4 MTS benchmark datasets. The characteristics of the datasets are described in Table 4. We transformed all MTS in each dataset to the same length, T, where T is given by \( T = \left\lceil T_{max} / \left\lceil T_{max}/25 \right\rceil \right\rceil . \) Here, \( \lceil \, \rceil \) is the ceiling operator and \(T_{max}\) is the length of the longest MTS in the original dataset.
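As an illustration of the length formula above, the following minimal sketch computes \(T\) from \(T_{max}\) using integer arithmetic. The function names and the example values of \(T_{max}\) are hypothetical and are not taken from Table 4; how the MTS are actually resampled to length \(T\) is not specified here and is therefore not shown.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def target_length(t_max: int, base: int = 25) -> int:
    """T = ceil(T_max / ceil(T_max / base)), with base = 25 as in the text."""
    return ceil_div(t_max, ceil_div(t_max, base))


# Hypothetical maximum lengths, chosen only to illustrate the formula.
for t_max in (20, 29, 182, 198):
    print(t_max, "->", target_length(t_max))
```

By construction, the resulting \(T\) never exceeds 25, and series that are already at most 25 steps long keep their original maximum length.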
Datasets. The following procedure was used to create 8 synthetic datasets with missing data from the Wafer and Japanese vowels datasets. We randomly sampled a number \(c_v \in \{-1, 1\} \) for each attribute \(v \in \{1,\dots , V\} \), where \(c_v =1\) indicates that the attribute and the labels are positively correlated and \(c_v =-1\) that they are negatively correlated. Thereafter, we sampled a missing rate \(\gamma _{nv}\) from \(\mathcal {U}[ 0.3 + E \cdot c_v \cdot (y^{(n)}-1), 0.7 + E \cdot c_v \cdot (y^{(n)}-1)]\) for each MTS \(X^{(n)}\) and attribute. The parameter E was tuned such that the Pearson correlation (absolute value) between the missing rates for the attributes \(\gamma _v\) and the labels \(y^{(n)}\) took the values \(\{0.2, \, 0.4, \, 0.6, \, 0.8\}\), respectively. In this way, we could control the amount of informative missingness and, because of the way we sampled \(\gamma _{nv}\), the missing rate in each dataset was around 50% independently of the Pearson correlation.
Further, the following procedure was used to create 8 synthetic datasets from the uWave and Character trajectories datasets, which both consist of only 3 attributes. We randomly sampled a number \(c_v \in \{-1, 1\} \) for each attribute \(v \in \{1,\dots , V\} \). Attribute(s) with \(c_v = -1\) became negatively correlated with the labels by sampling \(\gamma _{nv}\) from \(\mathcal {U}[ 0.7 - E \cdot (y^{(n)}-1), 1 - E \cdot (y^{(n)}-1)]\), whereas the attribute(s) with \(c_v = 1\) became positively correlated with the labels by sampling \(\gamma _{nv}\) from \(\mathcal {U}[ 0.3 + E \cdot (y^{(n)}-1), 0.6 + E \cdot (y^{(n)}-1)]\). The parameter E was computed in the same way as above. Then, we computed the mean of each attribute \(\mu _v\) over the complete dataset and let each element with \( x^{(n)}_v(t) > \mu _v\) be missing with probability \(\gamma _{nv}\). This means that the probability of being missing depends on the value of the missing element, i.e. the missingness mechanism is MNAR within each class. Hence, this type of informative missingness is not the same as the one we created for the Wafer and Japanese vowels datasets.
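The two sampling procedures described above can be sketched as follows. This is a minimal NumPy illustration under several assumptions that are not stated in the text: the data are stored as an array of shape (N, T, V), labels are integers starting at 1, sampled rates are clipped to [0, 1], the indicator R uses 1 for observed values, and, for the first procedure, each element is removed independently with probability \(\gamma _{nv}\). All function names are hypothetical, and the tuning of E is not shown; only the correlation that E is tuned against is computed.

```python
import numpy as np


def sample_rates_shifted(y, n_attributes, E, rng):
    """First procedure: gamma_nv ~ U[0.3 + E*c_v*(y_n-1), 0.7 + E*c_v*(y_n-1)]."""
    c = rng.choice([-1, 1], size=n_attributes)       # c_v per attribute
    shift = E * np.outer(y - 1, c)                   # shape (N, V)
    gamma = rng.uniform(0.3 + shift, 0.7 + shift)
    return np.clip(gamma, 0.0, 1.0), c               # clipping is an assumption


def sample_rates_two_ranges(y, n_attributes, E, rng):
    """Second procedure: the sampling interval depends on the sign c_v."""
    c = rng.choice([-1, 1], size=n_attributes)
    s = E * (y - 1)[:, None]                         # shape (N, 1)
    low = np.where(c == 1, 0.3 + s, 0.7 - s)         # broadcasts to (N, V)
    high = np.where(c == 1, 0.6 + s, 1.0 - s)
    gamma = rng.uniform(low, high)
    return np.clip(gamma, 0.0, 1.0), c


def inject_missing(X, gamma, rng, only_above_mean=False):
    """Remove elements of X (N, T, V) with probability gamma_nv.

    With only_above_mean=True, only elements larger than the attribute
    mean mu_v are candidates for removal (second procedure).
    Returns the data with NaNs and an indicator R (1 = observed, assumed).
    """
    drop = rng.random(X.shape) < gamma[:, None, :]
    if only_above_mean:
        mu = X.mean(axis=(0, 1))                     # mu_v over the complete dataset
        drop &= (X > mu)
    X_miss = X.copy()
    X_miss[drop] = np.nan
    return X_miss, (~drop).astype(int)


def abs_label_correlation(gamma, y):
    """Absolute Pearson correlation between gamma_v and the labels, per attribute."""
    return np.array([abs(np.corrcoef(gamma[:, v], y)[0, 1])
                     for v in range(gamma.shape[1])])


rng = np.random.default_rng(0)
y = np.array([1, 2] * 50)                            # toy binary labels
X = rng.normal(size=(100, 20, 3))                    # toy complete data
gamma, c = sample_rates_shifted(y, 3, E=0.1, rng=rng)
X_miss, R = inject_missing(X, gamma, rng)
print(abs_label_correlation(gamma, y))
```

In such a sketch, E would be adjusted (for instance by a simple search) until the values returned by `abs_label_correlation` reach the desired level.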
Baselines. Three baseline models were created. The first baseline, namely ordinary TCK, ignores the missingness mechanism. In the second one, referred to as TCK\(_B\), we modeled the missing patterns naively by concatenating the binary missing indicator MTS R to the MTS X and creating a new MTS U with 2V attributes. Then, ordinary TCK was trained on the datasets consisting of \(\{U^{(n)}\}\). In the third baseline, TCK\(_0\), we investigated how well informative missingness can be captured by imputing zeros for the missing values and then training the TCK on the imputed data.
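The input construction for the TCK\(_B\) and TCK\(_0\) baselines can be illustrated with a short NumPy sketch. It assumes that missing values are stored as NaN in an array of shape (N, T, V) and that the indicator R takes the value 1 for observed and 0 for missing elements; the text does not specify either convention. The function names are hypothetical, and the TCK training step itself is not shown.

```python
import numpy as np


def missing_indicator(X):
    """Binary indicator R with 1 = observed, 0 = missing (assumed convention)."""
    return (~np.isnan(X)).astype(float)


def tck_b_input(X):
    """Concatenate R to X along the attribute axis, giving U with 2V attributes."""
    return np.concatenate([X, missing_indicator(X)], axis=-1)


def tck_0_input(X):
    """Impute zeros for the missing values."""
    return np.where(np.isnan(X), 0.0, X)


# Toy MTS dataset: N = 2 series, T = 3 time steps, V = 2 attributes.
X = np.array([[[1.0, np.nan], [2.0, 0.5], [np.nan, 0.1]],
              [[0.3, 0.2], [np.nan, np.nan], [0.7, 0.9]]])
U = tck_b_input(X)
X0 = tck_0_input(X)
print(U.shape, X0.shape)
```

In U, the first V attributes still contain the original missing values, so the downstream model must itself handle missing data, as ordinary TCK does.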
Results. Table 5 shows the performance of the proposed TCK\(_{IM}\) and the three baselines for all of the 16 synthetic datasets. We see that the proposed TCK\(_{IM}\) achieves the best accuracy for 14 out of 16 datasets and is the only method which consistently has the expected behaviour, namely that the accuracy increases as the correlation between missing values and class labels increases. It can also be seen that the performance of TCK\(_{IM}\) is similar to TCK when the amount of information in the missing patterns is low, whereas TCK is clearly outperformed when the informative missingness is high. This demonstrates that TCK\(_{IM}\) can effectively exploit informative missingness.
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
Mikalsen, K.Ø., Soguero-Ruiz, C., Jenssen, R. (2021). A Kernel to Exploit Informative Missingness in Multivariate Time Series from EHRs. In: Shaban-Nejad, A., Michalowski, M., Buckeridge, D.L. (eds) Explainable AI in Healthcare and Medicine. Studies in Computational Intelligence, vol 914. Springer, Cham. https://doi.org/10.1007/978-3-030-53352-6_3
DOI: https://doi.org/10.1007/978-3-030-53352-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-53351-9
Online ISBN: 978-3-030-53352-6
eBook Packages: Intelligent Technologies and Robotics (R0)