Automated tracking of level of consciousness and delirium in critical illness using deep learning

Sun, Haoqi; Kimchi, Eyal; Akeju, Oluwaseun; Nagaraj, Sunil B.; McClain, Lauren M.; Zhou, David W.; Boyle, Emily; Zheng, Wei-Long; Ge, Wendong; Westover, M. Brandon

doi:10.1038/s41746-019-0167-0

Published: 09 September 2019

npj Digital Medicine volume 2, Article number: 89 (2019)

Abstract

Over- and under-sedation are common in the ICU, and contribute to poor ICU outcomes including delirium. Behavioral assessments, such as the Richmond Agitation-Sedation Scale (RASS) for monitoring levels of sedation and the Confusion Assessment Method for the ICU (CAM-ICU) for detecting signs of delirium, are often used. As an alternative, brain monitoring with electroencephalography (EEG) has been proposed in the operating room, but is challenging to implement in the ICU due to the differences between critical illness and elective surgery, as well as the duration of sedation. Here we present a deep learning model based on a combination of convolutional and recurrent neural networks that automatically tracks both the level of consciousness and delirium using frontal EEG signals in the ICU. For level of consciousness, the system achieves a median accuracy of 70% when allowing prediction to be within one RASS level difference across all patients, which is comparable to or higher than the median technician–nurse agreement at 59%. For delirium, the system achieves an AUC of 0.80 with 69% sensitivity and 83% specificity at the optimal operating point. The results show it is feasible to continuously track level of consciousness and delirium in the ICU.

Introduction

The "ICU triad" of pain, agitation, and delirium¹ makes the intensive care unit (ICU) an intensely stressful experience for many critically ill patients. Sedatives and analgesics are widely used to minimize pain and agitation. Unfortunately, over- and under-sedation and analgesia are common, affecting about 70% of ICU patients.² Over-sedation is associated with hypotension, prolonged ventilation, and ICU length of stay; under-sedation is likewise associated with pain, agitation, cardiac arrhythmias, immune dysfunction, and ventilator desynchrony. Both over- and under-sedation are associated with delirium, leading to poorer cognition³ and clinical outcomes.⁴

Many clinical assessment tools have been designed to monitor the level of consciousness in the ICU, including the Ramsay Scale,⁵ Sedation-Agitation Scale (SAS),⁶ and Richmond Agitation-Sedation Scale (RASS).⁷ Similarly, delirium is also assessed using behavioral scales such as the Confusion Assessment Method for the ICU (CAM-ICU)⁸ and Intensive Care Delirium Screening Checklist (ICDSC).⁹ The inter-rater agreement with these scales is relatively high.⁷,⁸,¹⁰ However, these assessments have inherent limitations: (1) they do not continuously track patient status; (2) they are not directly based on physiology; and (3) clinical assessments interrupt sleep. Some prior studies suggest that continuous tracking of the level of consciousness and delirium can contribute positively to ICU outcomes, and may improve ICU post-discharge outcomes and reduce costs.¹¹,¹²

Various reviews¹³,¹⁴,¹⁵ have discussed the potential importance of continuous EEG (cEEG) monitoring in the ICU, such as recognizing non-convulsive status epilepticus, recognizing hypoactive delirium, and managing sedation levels. Although cEEG is increasingly implemented for monitoring patients after cardiac arrest and providing routine care in several hospitals, there are not enough trained clinical neurophysiologists available for reading EEG. Thus, tracking the level of consciousness and delirium in a continuous manner in the ICU has remained a challenge. Closely related to the topic, anesthesia depth monitors using EEG signals are used in the operating room to continuously track anesthesia depth, such as the Bispectral Index (BIS) (Aspect Medical Systems, Norwood, MA, USA) and Narcotrend Index (Monitor Technik, Bad Bramstedt, Germany). Their algorithms are mostly based on extracting various EEG spectral and entropy features and performing regression analysis.¹⁶ However, these operating room monitors are not optimized for the ICU, since they have been developed to monitor consciousness in relatively normal brains, while the brains of ICU patients are very different.

Recent developments in deep learning have found promising applications in healthcare domains.¹⁷ Deep learning algorithms can learn task-relevant features from raw signals, reducing the need to handcraft features or biomarkers for a specific task. This ability is promising for EEG-based tracking of level of consciousness and delirium, where human experts may not be able to identify all features in EEG waveforms relevant to the brain states of interest.

Here we develop a deep learning model that automatically tracks both the level of consciousness and delirium. The input to the system is the preprocessed EEG waveform without extracting any features. We evaluate different aspects of its performance including tracking accuracy and delay. We interpret the model by showing the important regions of the EEG signal that lead to the final prediction. The results provide evidence for the feasibility of continuously tracking level of consciousness and delirium in the ICU.

Results

Tracking level of consciousness

As shown in Fig. 1a, CNN + LSTM achieved an MAE similar to that of CNN alone (p = 1.0) and outperformed the spectrogram and band power models (p < 0.05), suggesting that the CNN learned a better set of features than those available in the spectral domain alone. The MAE was comparable to technician–nurse agreement. In Fig. 1b we compared the AUCs for RASS assessments only at −5, −4 vs. −1, 0. CNN + LSTM achieved an AUC of 0.83 (95% CI 0.81–0.85), which was significantly better than that of the non-deep learning method (bp + BSR: band power and burst suppression ratio (BSR) as the features and ordinal regression as the model). As shown in Fig. 1c, the accuracy was 24% for the CNN + LSTM model, which was higher than technician–nurse agreement (p < 0.05); when allowing for one level of difference, the accuracy was 70%, which was comparable to technician–nurse agreement. Overall, CNN + LSTM and CNN only achieved the best performance. The LSTM mainly learned to smooth the predictions without reducing performance. The distributions of the individual performance metrics for CNN + LSTM are shown in Supplementary Fig. 4.

Tracking delirium

In Fig. 1d we show the receiver operating characteristic (ROC) curve between true CAM-ICU and the predicted probability of having positive CAM-ICU. The model achieved an AUC of 0.80 (95% CI 0.73–0.86). In Table 1 we show the sensitivity, specificity, and thresholds at various operating points. The optimal operating point¹⁸ is associated with sensitivity of 0.69 and specificity of 0.83 at a threshold of 0.59.
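How an operating point can be chosen from predicted probabilities is sketched below. The text does not say which cut-point criterion was applied, so this example assumes Youden's J (sensitivity + specificity − 1) as one common choice; the function names and toy data are hypothetical and are not study data.

```python
def roc_points(y_true, y_prob):
    """Return (threshold, sensitivity, specificity) for each candidate threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for thr in sorted(set(y_prob)):
        # Predict "delirium" when the probability is at or above the threshold.
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= thr)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < thr)
        points.append((thr, tp / positives, tn / negatives))
    return points


def youden_operating_point(y_true, y_prob):
    """Pick the threshold maximizing J = sensitivity + specificity - 1 (assumed criterion)."""
    return max(roc_points(y_true, y_prob), key=lambda t: t[1] + t[2] - 1)


# Toy labels (1 = CAM-ICU positive) and predicted probabilities, for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.7, 0.4, 0.6, 0.8, 0.9]
thr, sens, spec = youden_operating_point(y_true, y_prob)
print(thr, sens, spec)
```

Other criteria, such as a fixed minimum sensitivity, would simply replace the `key` function.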

Table 1 Sensitivity, specificity, and threshold at different operating points for tracking delirium

A probability calibration curve for delirium detection is shown in Supplementary Fig. 5, including the curve before and after re-calibration. The calibration error (mean absolute error to the diagonal line) after re-calibration is 0.040 (95% CI 0.032–0.094).
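A minimal sketch of a calibration error defined as the mean absolute distance to the diagonal follows. The binning scheme (ten equal-width bins, with empty bins skipped) is an assumption made for illustration; the text does not specify how the curve was binned.

```python
def calibration_error(y_true, y_prob, n_bins=10):
    """Mean absolute gap between observed frequency and mean predicted probability.

    Bins are equal-width over [0, 1]; empty bins are skipped (an assumption).
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    gaps = []
    for members in bins:
        if not members:
            continue
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        gaps.append(abs(observed - predicted))
    return sum(gaps) / len(gaps)


# A perfectly calibrated toy bin: 1 of 4 cases positive at predicted probability 0.25.
print(calibration_error([0, 0, 0, 1], [0.25, 0.25, 0.25, 0.25]))
```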

Tracking delay for level of consciousness

In Fig. 2 we show boxplots for several different cases. The delay was longer for larger increases or decreases in the level of consciousness, since it took longer for the "z-score" (the output from the final ordinal regression layer before applying the thresholds to convert into discrete RASS levels, see Methods) to climb or drop over a larger distance.

Example of continuous tracking

The model was trained only on EEG data 1 h around each assessment to maintain proximity to RASS or CAM-ICU scores. Here we illustrate the application of the model on longer, continuous EEG signals, as shown in Fig. 3. For level of consciousness, we show the predicted z-score, which is a continuous value but can be discretized into RASS levels. The predicted z-score matched well with the periods around RASS assessments (solid lines in panel b). Panel d shows the continuously tracked probability of delirium.

Model interpretation

To interpret what EEG patterns this model has learned, we computed the gradient of the z-score at the end of a period with respect to the input EEG signal. The signal parts with larger gradient had higher impact on the final z-score, and were thus more important to the model predictions. Figure 4 shows a 1.9-min signal with both true and predicted RASS at 0, i.e. awake and calm state. The important parts (red) showed blinking artifacts, a sign of wakefulness. In Fig. 5 we demonstrate another example with both true and predicted RASS at −5, i.e. coma. Here the important parts (red) showed slow waves and low amplitude, which are characteristic of depressed levels of consciousness (e.g. sleep, encephalopathy, and coma).

Discussion

We have demonstrated a system that can automatically track both the level of consciousness (LOC) and delirium in the ICU using frontal EEG signals and deep learning. The results show the feasibility of providing continuous measures of level of consciousness and delirium for ICU patients. Unlike current behavioral assessments, our model is based on physiological signals, being more direct than behavior. The values also have natural interpretations: for level of consciousness, the discretized levels map onto RASS scores; for delirium, the predicted value between 0 and 1 is the probability of being delirious. This system has the potential to improve the management of both sedation and delirium in ICU.

Multiple studies have examined the classification of level of consciousness in both operating room and ICU patients. Engemann et al.¹⁹ used the extra-trees algorithm to classify 327 patients with unresponsive/minimally conscious state vs. 66 healthy controls. They applied the classifier across different cohorts, EEG protocols and centers, and obtained AUC ranging from 0.73 to 0.78. The modest performance confirms the heterogeneity between cohorts, protocols and centers. Nagaraj et al.²⁰ used atomic decomposition and a support vector machine classifier to classify RASS −5, −4 vs. −1, 0 based on 44 patients, a subset of our dataset. They achieved an AUC of 0.91. This is comparable to our result when training a binary classifier specifically for this task instead of ordinal regression, which achieved AUC 0.89 (95% CI 0.88–0.91) (Supplementary Fig. 6). As discussed in their Dataset Section, they excluded difficult cases and used two RASS assessments from each patient. By including more patients while achieving similar performance, our results are likely to be more generalizable.

Multiple studies have compared simple features of EEG signals in delirium vs. non-delirium patients, such as spectral power²¹ and functional connectivity,²² anticipating the possibility of continuously tracking delirium from EEG in the ICU.²³ Our model provides the ability to track delirium continuously with an AUC similar to that of clinical risk prediction models.²⁴,²⁵ van der Kooi et al.²⁶ describe delirium detection based on EEG collected from 28 delirious and 28 age- and gender-matched non-delirious post-cardiothoracic surgery patients. They found that the relative delta power at the F8-Pz channel during the eyes-closed condition is increased in delirium patients, and achieves the best discrimination with AUC 0.99 (95% CI 0.97–1.00). Of note, our study was carried out in mechanically ventilated and sedated patients in the ICU, who represent a more severely ill population than post-operative patients.

Our model exhibits delay in response to the change in level of consciousness. The change in the predicted value lagged behind the change in EEG by 0.5–6 min. Although not directly comparable, Zanner et al.²⁷ measured the time delay in BIS, Narcotrend Index, and cerebral state index when measuring anesthesia depth in the operating room. They found that 24 to 122 s were needed before the new state was identified. Similarly, their time delays were not constant and varied depending on the starting and ending anesthesia depth. For tracking level of consciousness and delirium in ICU patients, these delays are acceptable in practice.

The tracking performance for level of consciousness is not perfect. As seen in Fig. 1c, the exact agreement between the true and predicted RASS is low, at 24% for CNN + LSTM. When allowing one RASS level difference, the agreement increases to 70%. The discrepancy probably comes from multiple sources. (1) The behavioral difference between consecutive RASS levels in terms of EEG is small. In fact, RASS −1 is defined as eye contact more than 10 s, while RASS −2 is defined as less than 10 s. This distinction may not be possible to accurately sort out in terms of brain activity. (2) Heterogeneity exists between patients, such as different ICU admission diagnoses and different sedatives and analgesics,²⁹ each of which may have a different effect on EEG. Multiple different EEG patterns may therefore correspond to the same behavioral state and thus to the same RASS level. For example, non-convulsive seizures and burst suppression can both present clinically with coma, but have very different EEGs. The results highlight the difficulty of inferring a behavioral state (RASS) from the EEG, which is more difficult than in the operating room (e.g. BIS). A future direction is to calibrate the prediction algorithm based on the existing RASS + EEG observations for a given patient. (3) We have not considered the differences between effects of various sedative and analgesic drugs in the ICU. (4) Human error and variability are inherent in each clinical assessment. We have measured the technician–nurse agreement in our dataset (Fig. 1, Supplementary Fig. 8). The median of the mean absolute difference between technician and nurse assessments across all patients is between 1.00 and 1.97 RASS levels. Even though formal studies show good between-rater agreement,⁷ our data show that in practice the agreement is not as high.

There are some limitations in our approach. (1) Our sample size for RASS measurement was 174 patients, which may not be large enough to capture the full range of variation in EEG patterns and corresponding behavioral states that occur in the ICU. (2) Due to heterogeneity between patients, the variance of tracking performance among patients is large. More detailed stratification of patients into different phenotypes and training different models for each phenotype is a possible future approach. (3) The current model does not consider the different effects of different sedatives on the EEG. Our model likely reflects mainly EEG patterns under propofol, given that patients in our cohort were mainly sedated using propofol. (4) Our data include no positive RASS scores during times when both CAM-ICU and EEG are available, as shown in Supplementary Fig. 1b. We speculate that there are two likely reasons for this. First, hypoactive delirium is more common than hyperactive delirium, particularly in ICU patients. Second, nurses actively adjust sedation levels to prevent patients becoming agitated or combative while patients remain on mechanical ventilation, and in our study the EEG leads were removed when the patient was weaned from mechanical ventilation. Thus the potential utility of EEG for hyperactive delirium has to be studied in other cohorts. (5) CAM-ICU assessments (where incidence is high) were performed only once per day. More frequent assessment of delirium status would better reflect the dynamic course of delirium and provide more training data.

Methods

Dataset

The study was a single-center, prospective observational study approved by the Partners Institutional Review Board (IRB). The IRB waived the requirement for written signed consent in this study. The EEG signals were collected from 195 distinct ICU patients. The inclusion criteria were: (1) age ≥18 years; (2) on mechanical ventilation; and (3) have at least one RASS or CAM-ICU assessment during EEG recording. The exclusion criteria were: (1) any known focal neurologic deficits or dementia; and (2) poor EEG signal quality by visual inspection (ten patients excluded). The final dataset contains 174 patients. The average ICU stay was 12–13 days. The most commonly used sedative was propofol. Patient characteristics are summarized in Table 2.

Table 2 Patient characteristics

RASS and CAM-ICU

To measure the level of consciousness, we used the Richmond Agitation-Sedation Scale (RASS)⁷ as the target to train the model. RASS was assessed by ICU nurses and clinical research technicians approximately every 2 h. RASS has ten levels from −5 to +4 as shown in Supplementary Table 1. The range from −5 to 0 (inclusive) describes different levels of sedation, where −5 and −4 indicate coma (unarousable, no response to verbal or noxious stimulation) and 0 indicates an alert and calm state. The range from +1 to +4 (inclusive) describes different levels of agitation which are associated with hyperactive delirium. In this study we limited RASS assessments to those of normal or decreased levels of arousal only, i.e. −5 to 0, since (1) there was no positive RASS during CAM-ICU assessments with EEG signal available in the dataset (Supplementary Fig. 1) and (2) being combative and agitated can be reliably detected by ICU staff.

To measure delirium, we used the CAM-ICU as the target to train the model. The CAM-ICU is a screening protocol that is performed about every 24 h⁸ (Supplementary Table 2). While unresponsive patients (RASS = −4 or −5) are typically not further assessed in formal use of the CAM-ICU, we treated these patients as CAM-ICU positive for model training purposes, given the clearly abnormal mental status.

EEG preprocessing

The EEG signals were recorded using Sedline brain function monitors (Masimo Corporation, Irvine, CA, USA), with 250 Hz sampling rate and 4 frontal electrodes. We re-referenced the signals to 2 bipolar channels: Fp1-F7 and Fp2-F8. The signals were first notch filtered at 60 Hz, then bandpass filtered between 0.5 Hz and 20 Hz, and finally downsampled to 62.5 Hz.

We took 1 h EEG segments spanning 30 min before to 30 min after each RASS or CAM-ICU assessment. This is because the assessment times recorded by the ICU nurses may be imprecise, since they are recorded after performing assessments. We therefore used the longer EEG segment to ensure that it included the actual assessment time.

EEG artifacts were defined based on the presence of any of the following in any EEG channel: (1) maximum amplitude higher than 1000 µV; (2) standard deviation less than 0.2 µV; (3) overly fast changes of more than 900 µV within 0.1 s; or (4) a spuriously staircase-like spectrum, when the maximum value obtained by convolution with a predefined staircase-like kernel exceeds an empirical threshold of 10, indicating the presence of nonphysiologic single-frequency artifacts from ICU machines (e.g. cooling blankets or pumps).
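The first three rules can be sketched for a single channel as below. This is an illustration only: it assumes the rules are applied to 4 s segments at 62.5 Hz, interprets rule (3) as any pair of samples at most 0.1 s apart differing by more than 900 µV, and omits rule (4) because the staircase kernel is not given here. The function name `is_artifact` is hypothetical.

```python
import math
import statistics

FS = 62.5  # Hz, sampling rate after downsampling (assumed to be where rules apply)


def is_artifact(segment_uv, fs=FS, max_amp=1000.0, min_std=0.2,
                max_jump=900.0, jump_window_s=0.1):
    """Apply amplitude rules (1)-(3) to one channel of a segment, in microvolts."""
    # Rule 1: implausibly large amplitude.
    if max(abs(v) for v in segment_uv) > max_amp:
        return True
    # Rule 2: nearly flat signal (e.g. a disconnected electrode).
    if statistics.pstdev(segment_uv) < min_std:
        return True
    # Rule 3: a change larger than max_jump between samples at most 0.1 s apart.
    max_lag = max(1, int(round(jump_window_s * fs)))
    for lag in range(1, max_lag + 1):
        for earlier, later in zip(segment_uv, segment_uv[lag:]):
            if abs(later - earlier) > max_jump:
                return True
    return False


# Synthetic 4 s examples (250 samples each), for illustration only.
clean = [50.0 * math.sin(2 * math.pi * 10 * n / FS) for n in range(250)]
flat = [0.0] * 250
step = [-480.0] * 125 + [480.0] * 125
print(is_artifact(clean), is_artifact(flat), is_artifact(step))
```

A segment would be flagged if any channel triggers any rule, matching the "any EEG channel" wording above.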

Deep learning model

The overall deep learning model consisted of a convolutional neural network (CNN) followed by a long short-term memory (LSTM) network, as shown in Supplementary Fig. 9. The CNN extracts useful information from each 4 s of the EEG waveform and the LSTM provides the temporal context. The CNN followed the architecture in Hannun et al.²⁸ It contains 8 blocks, each mainly consisting of two convolutional layers (conv) and a skip connection with max pooling. The output from the CNN is then fed to a two-layer LSTM, followed by an output layer, which is ordinal regression for RASS and binary classification for CAM-ICU. The ordinal regression learns a continuous "z-score" and the thresholds. If needed, we can apply the learned thresholds to discretize the z-score into RASS levels. The binary classification outputs the probability of being CAM-ICU positive (delirium). The detailed description of the model architecture and coding details can be found in Supplementary Methods.
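The discretization of a z-score into RASS levels can be sketched as a lookup against sorted thresholds. The threshold values below are hypothetical placeholders; in the model they are learned by the ordinal regression layer, and the handling of a z-score exactly on a threshold is an assumption.

```python
from bisect import bisect_right

RASS_LEVELS = [-5, -4, -3, -2, -1, 0]  # levels used in this study


def z_to_rass(z, thresholds):
    """Map a continuous z-score to a RASS level using five sorted cut points."""
    if len(thresholds) != len(RASS_LEVELS) - 1:
        raise ValueError("need one threshold between each pair of adjacent levels")
    # The number of thresholds at or below z selects the level.
    return RASS_LEVELS[bisect_right(thresholds, z)]


# Hypothetical thresholds for illustration; real values come from training.
thresholds = [-2.0, -1.0, 0.0, 1.0, 2.0]
print([z_to_rass(z, thresholds) for z in (-3.0, 0.5, 2.5)])
```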

Model training

To avoid overfitting the model to the dataset, we randomly split patients into ten groups (folds). We took each fold in turn as the testing set, and the other nine folds as the training set. Within the training set we further randomly selected 10% of assessments as the validation set, and used the remaining 90% of assessments for training. The model with the minimum loss on the validation set was used, and then results were calculated for the held-out testing set. The above procedure was repeated for each fold.
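The patient-wise split described above might look like the sketch below. Splitting by patient keeps all assessments of one patient in the same fold. The helper names, the seed, and the round-robin assignment after shuffling are assumptions for illustration.

```python
import random


def patient_folds(patient_ids, n_folds=10, seed=0):
    """Randomly assign distinct patients to n_folds disjoint groups."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    # Round-robin assignment after shuffling gives folds of near-equal size.
    return [ids[i::n_folds] for i in range(n_folds)]


def split_assessments(assessments, val_fraction=0.1, seed=0):
    """Hold out a random fraction of training assessments for validation."""
    shuffled = list(assessments)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


folds = patient_folds(range(174))          # 174 patients, as in the final dataset
train, val = split_assessments(range(100))  # 100 toy assessment indices
print([len(f) for f in folds], len(train), len(val))
```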

To prepare data for the CNN, the 1 h EEG signal around each assessment was segmented into 4 s windows with 2 s overlap (Supplementary Fig. 2a). We removed 4 s segments identified as artifact; 10% of segments were removed for this reason. The input to the CNN has size N × 2 × 250, where N is the number of 4 s segments, 2 is the number of channels and 250 is the number of time points in 4 s (62.5 Hz). The choice of a 4 s window is inspired by domain knowledge: in clinical neurology practice, windows of 10 s are used, but 4 s is enough to discern features usually used to describe the EEG, e.g. the presence of delta or theta slowing, epileptiform abnormalities, and EEG suppression.

Data preparation for the LSTM is different. There are 900 4 s segments in each 1 h EEG signal. Training an LSTM model on such a long sequence is difficult. Therefore we trained the two layers of the LSTM separately while fixing the parameters in the already trained CNN. The first LSTM layer was trained using 9.5 min sequences with step size 1 min (Supplementary Fig. 2b). The input had size N × 142 × 2 × 250, where 142 is the number of 4 s segments in a 9.5 min sequence. To train the second LSTM layer, we fixed the first LSTM layer. 1 h sequences were used with size N × 900 × 2 × 250, where 900 is the number of 4 s segments in a 1 h sequence (Supplementary Fig. 2c). Sequences with more than 50% of 4 s segments being artifact were removed; otherwise the artifacts in 4 s segments were kept to ensure continuity of the sequence. 9% of the sequences were removed.
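The window arithmetic can be checked with a short sketch. It assumes windows are taken at whole-sample offsets at 62.5 Hz; under that assumption the counts 900 and 142 correspond to non-overlapping 4 s windows, while the 2 s overlap used for CNN training gives roughly twice as many. The function name is hypothetical.

```python
FS = 62.5  # Hz


def window_starts(n_samples, fs=FS, window_s=4.0, step_s=2.0):
    """Start indices of sliding windows that fit entirely inside the signal."""
    window = int(window_s * fs)  # 250 samples per 4 s window
    step = int(step_s * fs)      # 125 samples for a 2 s step
    return list(range(0, n_samples - window + 1, step))


one_hour = int(3600 * FS)        # 225000 samples
nine_and_half_min = int(570 * FS)  # 35625 samples

print(len(window_starts(one_hour)))                        # overlapping, 2 s step
print(len(window_starts(one_hour, step_s=4.0)))            # non-overlapping
print(len(window_starts(nine_and_half_min, step_s=4.0)))   # non-overlapping
```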

For the CAM-ICU, since the number of samples was less than that of RASS, we copied the first M layers of the RASS CNN model to the CAM-ICU CNN model and fixed them to avoid overfitting; only the layers after the first M layers were trained. The performance for different values of M is shown in Supplementary Fig. 3. Here we took M = 5 since it achieved the best validation performance.

In both tasks, to address the imbalance of RASS levels or CAM-ICU scores in the dataset, we computed sample weights for each level inversely proportional to the number of examples in this level from the training set. The models were trained with a minibatch size of 32 and the RMSprop optimizer with learning rate 0.001.
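A minimal sketch of class weights inversely proportional to class frequency is shown below. The normalization (weights scaled so that the average sample weight is 1) is an assumption; only the inverse proportionality is stated in the text.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each level by N / (K * n_level), where K is the number of levels."""
    counts = Counter(labels)
    n_total, n_levels = len(labels), len(counts)
    return {level: n_total / (n_levels * n) for level, n in counts.items()}


# Toy training labels: six assessments at RASS 0 and two at RASS -5.
weights = inverse_frequency_weights([0] * 6 + [-5] * 2)
print(weights)
```

With this scaling the rarer level receives three times the weight of the common one, mirroring the 6:2 imbalance in the toy labels.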

Model evaluation

The final performance was reported using the testing patients pooled from all folds. For tracking RASS, the predicted z-score was averaged across all 4 s segments in each 1 h sequence, and then the thresholds learned by the ordinal regression layer were used to discretize the averaged z-score to produce the predicted RASS level. We evaluated the RASS tracking performance using three metrics: (1) balanced mean absolute error (MAE), i.e. the average absolute difference between true and predicted RASS levels, weighted by class weights inversely proportional to the number of samples in that class; (2) balanced accuracy when allowing up to one level difference, weighted by class weights; and (3) binary classification performance, measured by area under the receiver operating characteristic curve (AUC), for discriminating RASS levels −5 or −4 ("coma") from −1 or 0 ("awake"), while discarding other levels. For tracking CAM-ICU, the predicted probability was averaged across all 4 s segments in each 1 h sequence to get the probability of being delirious.
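Metrics (1) and (2) can be sketched as follows. Averaging the per-level means is equivalent to weighting each sample inversely to its level's frequency; treating the weights this way, and the function name, are assumptions for illustration.

```python
from collections import defaultdict


def balanced_metrics(y_true, y_pred, tolerance=1):
    """Return (balanced MAE, balanced accuracy within `tolerance` levels)."""
    errors = defaultdict(list)
    for truth, pred in zip(y_true, y_pred):
        errors[truth].append(abs(truth - pred))
    # Compute each metric per true level, then average over levels so that
    # rare levels count as much as common ones.
    per_level_mae = [sum(e) / len(e) for e in errors.values()]
    per_level_acc = [sum(1 for d in e if d <= tolerance) / len(e)
                     for e in errors.values()]
    n_levels = len(errors)
    return sum(per_level_mae) / n_levels, sum(per_level_acc) / n_levels


# Toy example: four assessments at RASS 0 and two at RASS -5.
y_true = [0, 0, 0, 0, -5, -5]
y_pred = [0, 0, -1, -1, -3, -5]
mae, acc = balanced_metrics(y_true, y_pred)
print(mae, acc)
```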

The accuracy per 4 s without averaging (CNN only) is shown in Supplementary Fig. 7. These accuracies are worse than the averaged versions. The 4 s window is best thought of as a step for local evaluation of the signal; these local evaluations are aggregated to compute the probability of RASS/delirium at the present time, based on the prior EEG. Our model still reports an updated prediction every 4 s (the step size), although the prediction for the present time is based on the past 1 h. By contrast, in the ICUs in our institution, RASS is manually assessed every 2 h and delirium is formally assessed only once per day, so the proposed method provides far more frequent updates.

Technician–nurse agreement

Since RASS assessments were available from both ICU nurses and clinical research technicians, we were able to measure the technician–nurse agreement, as follows. For each assessment done by each research staff member, we found the closest nurse assessment for the same patient. We excluded assessment pairs more than 4 h apart.
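The pairing procedure can be sketched for one patient as below. Assessments are represented as hypothetical `(time_in_hours, rass)` tuples, and the agreement measure shown is the mean absolute difference; both the data layout and the toy values are assumptions.

```python
def pair_assessments(technician, nurse, max_gap_h=4.0):
    """Pair each technician assessment with the closest-in-time nurse assessment.

    Both inputs are lists of (time_in_hours, rass) tuples for the same patient.
    Pairs further apart than max_gap_h are dropped.
    """
    pairs = []
    if not nurse:
        return pairs
    for tech_time, tech_rass in technician:
        nurse_time, nurse_rass = min(nurse, key=lambda a: abs(a[0] - tech_time))
        if abs(nurse_time - tech_time) <= max_gap_h:
            pairs.append((tech_rass, nurse_rass))
    return pairs


def mean_abs_difference(pairs):
    """Mean absolute RASS difference over the retained pairs."""
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


# Toy assessments for one patient, for illustration only.
technician = [(1.0, -2), (10.0, -4), (30.0, 0)]
nurse = [(0.5, -1), (12.0, -4), (20.0, -3)]
pairs = pair_assessments(technician, nurse)
print(pairs, mean_abs_difference(pairs))
```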

Baseline methods to be compared

To compare with other deep learning candidates, we built three other models: (1) using EEG waveforms as input and CNN only; (2) using EEG spectrograms as input and LSTM only; and (3) using EEG band powers as input and LSTM only. The CNN and LSTM had the same structure as in Supplementary Fig. 9. The EEG band powers included delta (0–4 Hz), theta (4–8 Hz), and alpha (8–12 Hz), as well as the relative band power normalized by total power (0–12 Hz).

To compare with non-deep learning methods, we extracted the above band powers from each 4 s segment and then averaged them over the 1 h period. We also extracted the BSR, i.e., the proportion of time within 1 h in which the signal envelope was less than 5 µV. After generating these features, we trained ordinal regression for RASS; and logistic regression, support vector machine, and random forest for CAM-ICU.

Statistical tests

To compare the performance among multiple algorithms, we used Kruskal–Wallis one-way analysis of variance (KW-ANOVA), which is a nonparametric version of ANOVA. The null hypothesis is that the medians of all groups are equal. We used Dunn's test (two-sided) as the post hoc test together with Bonferroni multiple comparisons correction to decide which pairs had significantly different medians. All reported confidence intervals are 95% confidence intervals obtained by bootstrapping 1000 times.
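A bootstrap confidence interval of the kind described can be sketched as below. The percentile method, the resampling unit (individual values), and the seed are assumptions; the text states only that 1000 bootstrap repetitions were used.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.median, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic of `values`."""
    rng = random.Random(seed)
    # Resample with replacement, recompute the statistic, and sort the results.
    replicates = sorted(
        stat(rng.choices(values, k=len(values))) for _ in range(n_boot)
    )
    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy per-patient metric values, for illustration only.
values = list(range(1, 102))
lower, upper = bootstrap_ci(values)
print(statistics.median(values), lower, upper)
```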

Delays in tracking level of consciousness

For each patient we artificially concatenated two segments of 9.5 min EEG signals with different RASS levels, denoted as RASS1 and RASS2, where the absolute difference between RASS1 and RASS2 was more than one level. The delay is defined as the time from the concatenation point to the first time the prediction reaches RASS2 ± 1.
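The delay definition translates into a short search over the predictions that follow the concatenation point. The sketch assumes one prediction every 4 s, with index 0 located at the concatenation point; the function name and the toy predictions are hypothetical.

```python
def tracking_delay(pred_rass, target_rass, step_s=4.0, tolerance=1):
    """Seconds until the prediction first comes within `tolerance` of the target.

    pred_rass holds predictions after the concatenation point, one per step,
    with index 0 at the concatenation point. Returns None if never reached.
    """
    for index, prediction in enumerate(pred_rass):
        if abs(prediction - target_rass) <= tolerance:
            return index * step_s
    return None


# Toy transition from RASS1 = 0 to RASS2 = -4.
predictions = [0, 0, -1, -2, -3, -4, -4]
print(tracking_delay(predictions, target_rass=-4))
```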

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Code Availability

Code for the algorithm development, evaluation, and statistical analysis is open source with no restrictions and is available from https://github.com/mghcdac/rass_delirium_eeg_prediction.

References

Reade, M. C. & Finfer, S. Sedation and delirium in the intensive care unit. N. Engl. J. Med. 370, 444â454 (2014).
ArticleÂ CASÂ Google ScholarÂ
Kaplan, L. & Bailey, H. Bispectral index (BIS) monitoring of ICU patients on continuous infusion of sedatives and paralytics reduces sedative drug utilization and cost. Crit. Care 4, P190 (BioMed Central 2000).
Wolters, A. E. et al. Long-term outcome of delirium during intensive care unit stay in survivors of critical illness: a prospective cohort study. Crit. Care 18, R125 (2014).
ArticleÂ Google ScholarÂ
Hughes, C. G., McGrane, S. & Pandharipande, P. P. Sedation in the intensive care setting. Clin. Pharmacol. Adv. Appl. 4, 53 (2012).
Google ScholarÂ
Ramsay, M., Savege, T., Simpson, B. & Goodwin, R. Controlled sedation with alphaxalone-alphadolone. Br. Med. J. 2, 656 (1974).
ArticleÂ CASÂ Google ScholarÂ
Riker, R. R., Picard, J. T. & Fraser, G. L. Prospective evaluation of the Sedation-Agitation Scale for adult critically ill patients. Crit. Care Med. 27, 1325â1329 (1999).
ArticleÂ CASÂ Google ScholarÂ
Sessler, C. N. et al. The Richmond AgitationâSedation Scale: validity and reliability in adult intensive care unit patients. Am. J. Respir. Crit. Care Med. 166, 1338â1344 (2002).
ArticleÂ Google ScholarÂ
Ely, E. W. et al. Evaluation of delirium in critically ill patients: validation of the Confusion Assessment Method for the Intensive Care Unit (CAM-ICU). Crit. Care Med. 29, 1370â1379 (2001).
ArticleÂ CASÂ Google ScholarÂ
Bergeron, N., Dubois, M.-J., Dumont, M., Dial, S. & Skrobik, Y. Intensive Care Delirium Screening Checklist: evaluation of a new screening tool. Intensive Care Med. 27, 859â864 (2001).
ArticleÂ CASÂ Google ScholarÂ
Ely, E. W. et al. Monitoring sedation status over time in ICU patients: reliability and validity of the Richmond Agitation-Sedation Scale (RASS). JAMA 289, 2983â2991 (2003).
ArticleÂ Google ScholarÂ
Dale, C. R. et al. Improved analgesia, sedation, and delirium protocol associated with decreased duration of delirium and mechanical ventilation. Ann. Am. Thorac. Soc. 11, 367â374 (2014).
ArticleÂ Google ScholarÂ
Sackey, P. V. Frontal EEG for intensive care unit sedation: treating numbers or patients? Crit. Care 12, 186 (2008).
ArticleÂ Google ScholarÂ
Friedman, D., Claassen, J. & Hirsch, L. J. Continuous electroencephalogram monitoring in the intensive care unit. Anesth. Analg. 109, 506â523 (2009).
ArticleÂ Google ScholarÂ
Kubota, Y., Nakamoto, H., Egawa, S. & Kawamata, T. Continuous EEG monitoring in ICU. J. Intensive Care 6, 39 (2018).
ArticleÂ Google ScholarÂ
Herman, S. T. et al. Consensus statement on continuous EEG in critically ill adults and children, Part II: personnel, technical specifications and clinical practice. J. Clin. Neurophysiol. Publ. Am. Electroencephalogr. Soc. 32, 96 (2015).
Google ScholarÂ
Bilgili, B. et al. Utilizing bi-spectral index (BIS) for the monitoring of sedated adult ICU patients: a systematic review. Minerva Anestesiol. 83, 288â301 (2017).
PubMedÂ Google ScholarÂ
Yu, K.-H., Beam, A. L. & Kohane, I. S. Artificial intelligence in healthcare. Nat. Biomed. Eng. 2, 719 (2018).
ArticleÂ Google ScholarÂ
Liu, X. Classification accuracy and cut point selection. Stat. Med. 31, 2676â2686 (2012).
ArticleÂ Google ScholarÂ
Engemann, D. A. et al. Robust EEG-based cross-site and cross-protocol classification of states of consciousness. Brain 141, 3179â3192 (2018).
ArticleÂ Google ScholarÂ
Nagaraj, S. B. et al. Electroencephalogram based detection of deep sedation in ICU patients using atomic decomposition. IEEE Trans. Biomed. Eng. 65, 2684â2691 (2018).
ArticleÂ Google ScholarÂ
van der Kooi, A. W., Slooter, A., van Het, K. M. & Leijten, F. EEG in delirium: increased spectral variability and decreased complexity. Clin. Neurophysiol. 125, 2137 (2014).
ArticleÂ Google ScholarÂ
van Dellen, E. et al. Decreased functional connectivity and disturbed directionality of information flow in the electroencephalography of intensive care unit patients with delirium after cardiac surgery. Anesthesiol. J. Am. Soc. Anesthesiol. 121, 328â335 (2014).
Google ScholarÂ
van der Kooi, A. W., Leijten, F. S., van der Wekken, R. J. & Slooter, A. J. What are the opportunities for EEG-based monitoring of delirium in the ICU? J. Neuropsychiatry Clin. Neurosci. 24, 472–477 (2012).
Van den Boogaard, M. et al. Development and validation of PRE-DELIRIC (PREdiction of DELIRium in ICu patients) delirium prediction model for intensive care patients: observational multicentre study. BMJ 344, e420 (2012).
Wassenaar, A. et al. Multinational development and validation of an early prediction model for delirium in ICU patients. Intensive Care Med. 41, 1048–1056 (2015).
Van Der Kooi, A. W. et al. Delirium detection using EEG. Chest 147, 94–101 (2015).
Zanner, R., Pilge, S., Kochs, E., Kreuzer, M. & Schneider, G. Time delay of electroencephalogram index calculation: analysis of cerebral state, bispectral, and narcotrend indices using perioperatively recorded electroencephalographic signals. Br. J. Anaesth. 103, 394–399 (2009).
Hannun, A. Y. et al. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nat. Med. 25, 65–69 (2019).
Purdon, P. L., Sampson, A., Pavone, K. J. & Brown, E. N. Clinical electroencephalography for anesthesiologists: part I: background and basic signatures. Anesthesiol. J. Amer. Soc. Anesthesiol. 123, 937–960 (2015).

Acknowledgements

M.B.W. received support from NIH-NINDS (1K23NS090900, 1R01NS102190, 1R01NS102574, 1R01NS107291). E.K. received funding from NIH-NIMH (1K08MH11613501). We acknowledge the support from the Critical Care EEG Monitoring Research Consortium (CCEMRC).

Author information

Authors and Affiliations

Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
Haoqi Sun, Eyal Kimchi, Lauren M. McClain, Emily Boyle, Wei-Long Zheng, Wendong Ge & M. Brandon Westover
Department of Anesthesia, Critical Care, and Pain Medicine, Massachusetts General Hospital, Boston, MA, USA
Oluwaseun Akeju & David W. Zhou
Department of Clinical Pharmacy and Pharmacology, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
Sunil B. Nagaraj

Authors

Haoqi Sun
Eyal Kimchi
Oluwaseun Akeju
Sunil B. Nagaraj
Lauren M. McClain
David W. Zhou
Emily Boyle
Wei-Long Zheng
Wendong Ge
M. Brandon Westover

Contributions

H.S.: analyzed the data, drafted and critically revised the paper. E.K., O.A., W.L.Z., W.G.: revised the paper critically. S.B.N., L.M.M., D.W.Z. and E.B.: collected the data. M.B.W.: conceptualized and designed the experiment, and revised the paper critically. All authors approved the completed final version and are accountable for all aspects of the work.

Corresponding author

Correspondence to M. Brandon Westover.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Material

reporting summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Sun, H., Kimchi, E., Akeju, O. et al. Automated tracking of level of consciousness and delirium in critical illness using deep learning. npj Digit. Med. 2, 89 (2019). https://doi.org/10.1038/s41746-019-0167-0

Received: 22 April 2019
Accepted: 20 August 2019
Published: 09 September 2019
DOI: https://doi.org/10.1038/s41746-019-0167-0
