Time–frequency time–space LSTM for robust classification of physiological signals

Pham, Tuan D.

doi:10.1038/s41598-021-86432-7

Published: 25 March 2021

Scientific Reports volume 11, Article number: 6936 (2021)

Abstract

Automated analysis of physiological time series is utilized for many clinical applications in medicine and life sciences. Long short-term memory (LSTM) is a deep recurrent neural network architecture used for classification of time-series data. Here time–frequency and time–space properties of time series are introduced as a robust tool for LSTM processing of long sequential data in physiology. Based on classification results obtained from two databases of sensor-induced physiological signals, the proposed approach has the potential for (1) achieving very high classification accuracy, (2) saving tremendous time for data learning, and (3) being cost-effective and user-comfortable for clinical trials by reducing multiple wearable sensors for data recording.

Introduction

Analysis and classification of clinical time-series data in physiology and disease processes are considered as a catalyst for biomedical research and education. Innovative computerized tools for physiological data classification are increasingly needed to facilitate investigations on new unsolved challenging problems in clinical and life sciences with respect to both basic and translational perspectives. Conventional methods for classification of physiological time series to detect abnormal conditions include fractals, chaos, nonlinear dynamics, signal coding, pattern matching, and machine learning. The current surge of modern artificial intelligence (AI) opens a new approach for sequential data classification with long short-term memory (LSTM) networks¹, which are an architecture of deep learning. LSTM networks are a type of recurrent neural networks that learn order dependence in sequential data.

Many methods have been developed for classification of time series in different fields of application. Time-series classification algorithms based on discriminatory features can be categorized into six main groups²: (1) whole series, (2) intervals, (3) shapelets, (4) dictionary, (5) combinations, and (6) model. For the whole-series approach, classification is performed by comparing the similarity between two time series using a distance measure. The methods of intervals choose one or multiple intervals of the series and use summary measures as features for classification. The methods of shapelets define a class with phase-independent patterns called shapelets; a class is then identified by the existence of one or more shapelets in the whole time series. The dictionary-based methods classify time series based on the frequency of their recurring subseries. The methods of combinations combine two or more of the whole-series, interval, shapelet, and dictionary methods for classification. The model-based methods fit a time series to mathematical models constructed for the classes and then assign the time series to the class that has the largest similarity score given by the class model. Most recently, deep-learning methods or deep neural networks have been reported to outperform many baseline time-series classification approaches and appear to be the most promising techniques for classifying temporal data³.

Because LSTM networks can capture long-term temporal dependencies, they have been applied to provide solutions for many difficult problems in bioinformatics and computational biology⁴. As a state-of-the-art method for learning physiological models for disease prediction, many applications of LSTM and other deep-learning networks have recently been reported in the literature, such as classifying electroencephalogram (EEG) signals in emotion, motor imagery, mental workload, seizure, sleep stage, and event-related potentials⁵, non-EEG signals in Parkinson's disease (PD)⁶, learning and synthesis of respiration, electromyogram, and electrocardiogram (ECG) signals⁷, decoding of gait phases using EEG⁸, and early prediction of stress, health, and mood using wearable sensor data⁹.

The present work introduces a time–frequency time–space LSTM tool for robust and efficient classification of physiological time series, whereas solutions obtained from conventional LSTM networks would result in lower accuracy and longer training time. Furthermore, for the case of clinical gait analysis with the use of measurement sensors to assess biomechanical patterns and therapeutic plans for rehabilitation in patients disabled from conditions such as PD and stroke, long walk trials are recommended to obtain at least 370 strides¹⁰. Such long-distance walks result in long records of physiological measurements, cause discomfort to the patients, and may be impractical to perform in many clinical settings¹¹.

Differentiating patients with PD from healthy controls using gait data was studied in¹², which trained fuzzy neural networks with wavelet features extracted from the gait data. Another study extracted gait features with the short-time Fourier transform and used support vector machines (SVMs) for the classification task¹³. To capture the local changes in the dynamics of gait signals, the feature-extraction method of shifted 1-D local binary patterns and a multilayer perceptron, which is a class of feed-forward artificial neural networks, were used for the classification of PD and healthy controls¹⁴. The extraction of time-domain and frequency-domain features of gait data for training with random decision forests, which are an ensemble machine-learning method for classification, was reported in a more recent study for detecting patients with PD¹⁵. All these studies employed shallow neural networks or SVMs. However, deep neural networks are known to be the most advanced models of the neural-network approach and have been shown to outperform other types of statistical classifiers¹⁶.

The novel idea for classification of physiological data with LSTM presented herein is the creation of complementary time–frequency and time–space features of time series. In signal processing, instead of viewing a time series as a one-dimensional signal, time–frequency analysis studies a signal in both time and frequency domains simultaneously by some function whose domain is the two-dimensional real plane to extract transient features from the signal by a time–frequency transform. Time–frequency signal processing for feature extraction was reviewed as a useful approach for pattern recognition¹⁷ that provided successful applications, including EEG seizure detection and classification¹⁷, classification of ultra-high-frequency signals¹⁸, classification of vibration events¹⁹, and classification of EEG signals and episodic memory²⁰.

In nonlinear dynamics, time–space analysis transforms a one-dimensional signal into a two-dimensional space to enable the visualization of the recurrences of states of a dynamical system at certain times and the extraction of distinctive features representing behaviors of different dynamical mechanisms underlying nonlinear time series. The extraction of novel features from time series not only facilitates signal compression for deep learning but also enhances the capability of LSTM networks for robust signal classification. In chaos theory, the method of recurrence plots (RPs) was developed for nonlinear time-series analysis²¹. RPs and extended methods were further addressed for the analysis of complex systems^22,23 and dynamical features of nonlinear time series²⁴. While an RP is a binary visualization of recurrences of states of a dynamical system at certain pairs of time, a fuzzy recurrence plot (FRP)²⁵ displays the visualization as a grayscale image. Because FRPs are much richer in texture than RPs, the technique of FRPs of time series is a preferred approach for texture analysis and has been successfully applied to extract texture features for pattern recognition, including classification of PD and control subjects using deep learning^6,26, tensor decomposition²⁷, and SVMs²⁸; and other neuro-degenerative diseases²⁹.

In general, time–frequency analysis is known as a preferred approach for the representation and essential feature extraction of non-stationary signals because it is effective for estimating the underlying characteristics composing the signals¹⁷, whereas time–space analysis provides another kind of visual information about the signals by detecting hidden dynamical features inherent in the data. The combination of complementary features generated by both time–frequency and time–space analysis methods is therefore promising for enhancing the classification power of sequential deep learning.

Data

ECG data

ECG signals capture the electrical activity of a human heart over a period of time. ECG signals are used by physicians for examining the condition of a patient's heartbeat to detect if the condition is normal or irregular. Atrial fibrillation (AF) is a type of irregular heartbeat that occurs when the upper chambers of the heart (atria) beat out of coordination with the lower chambers (ventricles). The ECG data³⁰ used in this study are publicly available from the PhysioNet: The Research Resource for Complex Physiologic Signals. The data consist of ECG signals sampled at 300 Hz and classified by a group of experts into normal sinus rhythm, AF, alternative rhythm, and noise. The purpose of the creation of this challenging database was to call for the development of new methods for classifying these types of cardiac arrhythmias. Information about the number of participants in the recordings of normal rhythm, AF rhythm, and other rhythms is not available from the data source³¹.

Gait in Parkinson's disease data

The Gait in Parkinson's Disease database³² consists of time series of vertical ground reaction force in Newtons of gait dynamics from 93 patients with idiopathic PD and 73 healthy controls. This database is also publicly available from the PhysioNet: The Research Resource for Complex Physiologic Signals. The data consist of the vertical ground reaction force (in Newtons) signals of the subjects as they walked at their usual, self-selected pace for approximately 2 minutes on level ground. The force was measured as a function of time with 8 sensors placed underneath each foot. The force signals of each of the 16 sensors placed under the two feet of each subject were digitized and recorded at 100 samples per second.

Time–frequency and time–space analysis

Instantaneous frequency

The instantaneous frequency (IF) of a non-stationary signal is a time-varying parameter that relates to the average of the frequencies f present in the signal as it evolves over time instants t^33,34. The IF function estimates the IF of a signal at a sampling rate by computing the spectrogram power spectrum P(t, f) and estimating the IF as

$$\begin{aligned} IF(t) = \frac{\int _{-\infty }^{\infty } f P(t,f) df}{\int _{-\infty }^{\infty } P(t,f) df}. \end{aligned}$$

(1)
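
On a sampled spectrogram with a uniform frequency grid, the integrals in Eq. (1) reduce to sums over frequency bins, and the bin width cancels between numerator and denominator. The following is a minimal sketch under that assumption; the function name `instantaneous_frequency` and the array layout (frequencies along rows, time frames along columns) are illustrative choices, not taken from the study, and the spectrogram itself is assumed to be computed elsewhere.

```python
import numpy as np

def instantaneous_frequency(P, freqs):
    """Discrete form of Eq. (1): power-weighted mean frequency per time frame.

    P     : array of shape (n_freqs, n_times), non-negative spectrogram power
    freqs : array of shape (n_freqs,), frequency of each row of P
    """
    P = np.asarray(P, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    total = P.sum(axis=0)                         # denominator: sum of P over f
    weighted = (freqs[:, None] * P).sum(axis=0)   # numerator: sum of f * P over f
    # A frame with zero total power has no defined IF; NaN is returned there.
    return np.divide(weighted, total,
                     out=np.full_like(total, np.nan), where=total > 0)
```

A frame whose power sits in a single bin returns that bin's frequency, while a frame with equal power in two bins returns their midpoint.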

The power spectrum is a mathematical expression of the amount of the signal at a frequency f. For a periodic signal, peaks at the fundamental frequency and its harmonics are observed in the spectrum; for a quasiperiodic signal, peaks at linear combinations of related frequencies are observed; and a chaotic signal contributes broadband components to the spectrum. In practice, the exact solution for the power spectrum cannot be determined because a signal x(t) is not infinitely long but measured over a finite interval $0 \le t \le T$. Therefore, the power spectrum needs to be numerically estimated. A method for estimating the power spectrum of a time series $x_k$, $k = 0, \dots , N-1$ is described as follows.

The spectral density of a time series of length N can be approximated as³⁵

$$\begin{aligned} P_N(f) = \frac{\Delta t}{N} \left|\sum _{k=0}^{N-1} x_k e^{-i 2 \pi fk \Delta t} \right|^2, \end{aligned}$$

(2)

where $\Delta t$ is the sampling interval.

If the spectral value is calculated at $f = j \Delta f$, where $\Delta f = 1/(N \Delta t)$, and $\Delta t = 1$, then

$$\begin{aligned} P_j = \frac{1}{N} \left|\sum _{k=0}^{N-1} x_k e^{-i 2 \pi \frac{jk}{N}} \right|^2 = \frac{1}{N} \left|X_j \right|^2, \end{aligned}$$

(3)

which indicates the discrete Fourier transform (DFT), $X_j$, as

$$\begin{aligned} X_j = \sum _{k=0}^{N-1} x_k e^{-i 2 \pi \frac{jk}{N}}, j = 0, \dots , N-1. \end{aligned}$$

(4)

However, it was proved that the power spectrum estimate expressed in Eq. (3) is not properly scaled³⁵. Therefore, the estimate is modified as

$$\begin{aligned} P_j= & {} \frac{1}{WN} \left|\sum _{k=0}^{N-1} w_k x_k e^{-i 2 \pi \frac{jk}{N}} \right|^2, j = 0, \dots , N-1; \end{aligned}$$

(5)

in which

$$\begin{aligned} W = \frac{1}{N} \sum _{j=0}^{N-1} w_j^2, \end{aligned}$$

(6)

where $w_j$, $j= 0, \dots , N-1$, are the weights or coefficients of a window function (the Kaiser window³⁶ is applied in this study).

The estimate of $P_j$ expressed in Eq. (5) using the fast Fourier transform (FFT) can be sequentially carried out as follows³⁵.

1. Truncate the time series or pad with zeros so that $N=2^n$, where n is a positive integer.
2. Weight the time series with a window function.
3. Calculate the DFT of the weighted time series $(w_k x_k)$ using the FFT.
4. Calculate $P_j$ using Eq. (5).
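
The four steps above can be sketched with NumPy as follows. This is an illustration rather than the implementation used in the study: the function name is hypothetical, zero-padding (not truncation) is chosen for the first step, and the Kaiser shape parameter `beta=0.5` is an assumed value because the text does not state one.

```python
import numpy as np

def windowed_power_spectrum(x, beta=0.5):
    """Estimate P_j of Eq. (5) with a Kaiser window (beta is an assumption)."""
    x = np.asarray(x, dtype=float)
    # Step 1: zero-pad to the next power of two, N = 2^n.
    n_fft = 1 << (len(x) - 1).bit_length()
    x = np.pad(x, (0, n_fft - len(x)))
    # Step 2: weight the series with the window coefficients w_k.
    w = np.kaiser(n_fft, beta)
    # Step 3: DFT of the weighted series via the FFT.
    X = np.fft.fft(w * x)
    # Step 4: scale by W * N, with W the mean squared window weight (Eq. 6).
    W = np.mean(w ** 2)
    return np.abs(X) ** 2 / (W * n_fft)
```

For a sinusoid completing 8 cycles over 64 samples, the largest value in the first half of the returned spectrum falls at index j = 8.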

Spectral entropy

The spectral entropy (SE) of a signal is a measure of its spectral power distribution^33,34. The SE treats the normalized power distribution of the signal in the frequency domain as a probability distribution and calculates its Shannon entropy. The Shannon entropy in this context is known as the spectral entropy of the signal. Given a time–frequency power spectrogram P(t, f), the probability distribution at time t, $0 \le t \le T$, and frequency point m, $m = 1, \dots , N$, denoted as p(t, m), is

$$\begin{aligned} p(t,m) = \frac{P(t,m)}{\sum _f P(t,f)}, \end{aligned}$$

(7)

where $f \in [0, fs/2]$ is specified in this study, and fs is the sampling frequency.

The spectral entropy at time t, denoted as H(t), is given as

$$\begin{aligned} H(t) = - \sum _{m=1}^N p(t,m) \log _2 p(t,m). \end{aligned}$$

(8)
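
Equations (7) and (8) can be illustrated with a short NumPy sketch. It assumes a spectrogram array with frequencies along rows and time frames along columns, that every frame has non-zero total power, and the usual convention that $0 \log_2 0 = 0$; the function name is illustrative.

```python
import numpy as np

def spectral_entropy(P):
    """Spectral entropy H(t) in bits for each time frame (Eqs. 7 and 8).

    P : array of shape (n_freqs, n_times) with non-negative power values;
        each column is assumed to have positive total power.
    """
    P = np.asarray(P, dtype=float)
    p = P / P.sum(axis=0, keepdims=True)      # Eq. (7): normalize over frequency
    logp = np.zeros_like(p)
    np.log2(p, out=logp, where=p > 0)         # leave log term at 0 where p == 0
    return -(p * logp).sum(axis=0)            # Eq. (8)
```

A flat spectrum over 8 bins gives the maximum value of 3 bits, whereas a spectrum concentrated in one bin gives 0.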

Fuzzy recurrence plot

In the study of dynamical systems, a sequence of values in time can be transformed into an object in space. This transformation allows the sequence to be analyzed in space. Such space is called the phase space. The object in the phase space is called the phase space set. The transformation of a sequence of values in time into an object in the phase space can be done using the time-delay embedding³⁷. The embedding dimension describes the space (such as a line, an area, or a volume) that contains the object³⁸. Time delay, which is also called lag, expresses the amount of offset in a time series. Mathematically, the phase-space reconstruction using time-delay embedding for a time series ($z_1, z_2, \dots , z_I$) can be performed as ${{\mathbf {y}}}_i = (z_i, z_{i+\phi }, \dots , z_{i+(d-1)\phi })$, $i = 1, \dots , I-(d-1)\phi$, where $\phi$ and d are time delay and embedding dimension, respectively.
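
The time-delay embedding described above can be sketched as an index lookup. The function name is illustrative, and the code uses zero-based indexing, so row i of the result corresponds to the vector with index i + 1 in the text.

```python
import numpy as np

def time_delay_embedding(z, d, phi):
    """Return the phase-space vectors (z_i, z_{i+phi}, ..., z_{i+(d-1)phi}).

    z   : one-dimensional time series of length I
    d   : embedding dimension
    phi : time delay (lag) in samples
    Output shape is (I - (d - 1) * phi, d).
    """
    z = np.asarray(z, dtype=float)
    n_vectors = len(z) - (d - 1) * phi
    if n_vectors < 1:
        raise ValueError("time series too short for this d and phi")
    # Row i picks the samples at offsets 0, phi, 2*phi, ... from position i.
    idx = np.arange(n_vectors)[:, None] + phi * np.arange(d)[None, :]
    return z[idx]
```

For the series (1, 2, 3, 4, 5, 6) with d = 3 and a delay of 2, the embedding contains the two vectors (1, 3, 5) and (2, 4, 6).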

In fuzzy logic³⁹, a fuzzy set is defined as a collection of distinct objects whose membership grades in the set are expressed with real numbers. In mathematical terms, let U be a universe of discourse and F a subset of U. The fuzzy set F is characterized by a fuzzy membership function $\mu _F(x)$ that maps each element $x \in U$ to the interval [0, 1]: $\mu _F(x): U \rightarrow [0, 1]$. The real value of $\mu _F(x)$ is called the fuzzy membership grade of x in F. The notion of a fuzzy set can be expressed in the following three cases: 1) $\mu _F(x) = 0$ if x is not in F at all, 2) $\mu _F(x) = 1$ if x is totally in F, and 3) $0< \mu _F(x) < 1$ if x is partially in F. Thus, the greater the fuzzy membership grade is, the more certain it is that x is a member of F.

In cluster analysis, data points can be assigned to different groups or clusters. Points that are most similar to each other belong to the same cluster. Based on the concept of fuzzy sets, fuzzy clustering assigns the data points to all clusters with different degrees of fuzzy membership. In other words, the fuzzy membership value of a data point for a certain cluster indicates how strongly the data point belongs to that cluster.

Now let ${{\mathbf {X}}} = ({\mathbf {x}}_1, \dots , {\mathbf {x}}_N) \in {\mathbb {R}}^{Nm}$ with ${\mathbf {x}}_i \in {\mathbb {R}}^m$ be a phase-space collection of a signal transformed by the time-delay embedding method, c a pre-defined number of clusters, ${{\mathbf {V}}}=\{{\mathbf {v}}_1, \dots , {\mathbf {v}}_c\}$ a set of clusters, and $\mu ({\mathbf {x}}_i,{\mathbf {v}}_q)$, $i=1, \dots , N$, $q=1, \dots , c$, fuzzy membership grades expressing the degrees of phase-space vectors ${\mathbf {x}}_i$ belonging to cluster centers ${\mathbf {v}}_q \in {{\mathbf {V}}}$. These fuzzy membership grades can be determined using the fuzzy c-means algorithm⁴⁰. An FRP, denoted by $\tilde{{\mathbf {R}}}$, is defined as²⁵

$$\begin{aligned} \tilde{{\mathbf {R}}}(i,j) = \mu ({\mathbf {x}}_i,{\mathbf {x}}_j), \, i, j = 1, \dots , N, \end{aligned}$$

(9)

where $\mu ({\mathbf {x}}_i,{\mathbf {x}}_j) \in [0, 1]$ is the fuzzy membership of similarity between ${\mathbf {x}}_i$ and ${\mathbf {x}}_j$.

The elements of an FRP, $\tilde{{\mathbf {R}}}(i,j)$, $i = 1, \dots , N$, $j = 1, \dots , N$, can be inferred using three properties of fuzzy relations as follows.

1. Reflexivity:
$$\begin{aligned} \mu ({\mathbf {x}}_i,{\mathbf {x}}_i) = 1, \, i=1, \dots , N. \end{aligned}$$
(10)
2. Symmetry:
$$\begin{aligned} \mu ({\mathbf {x}}_i,{\mathbf {v}}_q) = \mu ({\mathbf {v}}_q,{\mathbf {x}}_i), \, i = 1, \dots , N, q = 1, \dots , c. \end{aligned}$$
(11)
3. Transitivity:
$$\begin{aligned} \mu ({\mathbf {x}}_i,{\mathbf {x}}_j) = \max [\min \{\mu ({\mathbf {x}}_i,{\mathbf {v}}_q), \mu ({\mathbf {x}}_j,{\mathbf {v}}_q)\}], q = 1, \dots , c. \end{aligned}$$
(12)
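
Given a matrix of fuzzy membership grades, the three properties above determine every element of the FRP. The sketch below assumes the memberships $\mu ({\mathbf {x}}_i,{\mathbf {v}}_q)$ have already been obtained (for example from a fuzzy c-means routine, which is not shown) and are passed in as an N-by-c array named `U`; both names are illustrative.

```python
import numpy as np

def fuzzy_recurrence_plot(U):
    """Build an N x N FRP from memberships U[i, q] = mu(x_i, v_q).

    Off-diagonal entries follow the max-min rule of Eq. (12);
    the diagonal is set to 1 by reflexivity, Eq. (10).
    Memory use grows as N * N * c because all pairs are formed at once.
    """
    U = np.asarray(U, dtype=float)
    # min over the pair (i, j) for every cluster q -> shape (N, N, c)
    pairwise_min = np.minimum(U[:, None, :], U[None, :, :])
    R = pairwise_min.max(axis=2)       # max over clusters q
    np.fill_diagonal(R, 1.0)
    return R
```

The result is symmetric by construction, since swapping i and j leaves the minimum in Eq. (12) unchanged.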

As an example, to illustrate the difference in the visual display of an RP and an FRP, Fig. 1 shows a time series of 2000 points of the X-component of the Lorenz (chaotic) system⁴¹, and its RP and FRP. The RP was constructed using embedding dimension $= 3$, time delay $= 1$, and a conventional value for the similarity threshold $= 5\%$ of the standard deviation of the signal. The FRP was constructed using embedding dimension $= 3$, time delay $= 1$, and number of clusters $= 3$. The grayscale image of the FRP is much richer in texture than the binary image of the RP.

Fuzzy recurrence image entropy

Entropy of a grayscale image is a statistical measure of randomness to characterize the texture of the image. As an FRP is a grayscale image, the entropy of an FRP image is defined as

$$\begin{aligned} E_{FRI} = - \sum _{k=1}^{K} p_k \log _2 p_k, \end{aligned}$$

(13)

where $K = 256$, which is the number of gray levels of the FRP (obtained by converting real values of pixels in [0, 1] to integers in [0, 255]), and $p_k$ is the probability associated with the intensity level k, $k = 1, \dots , K$, obtained from the normalized histogram for the k-th bin.
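
Equation (13) can be illustrated as a histogram entropy. In this sketch the mapping from [0, 1] to integer gray levels uses rounding to the nearest level, which is an assumption since the text does not specify the conversion rule; the function name is also illustrative.

```python
import numpy as np

def frp_image_entropy(R, levels=256):
    """Entropy (bits) of an FRP viewed as a grayscale image, Eq. (13)."""
    R = np.clip(np.asarray(R, dtype=float), 0.0, 1.0)
    # Map [0, 1] to integer gray levels 0 .. levels-1 (rounding is assumed).
    gray = np.rint(R * (levels - 1)).astype(int)
    counts = np.bincount(gray.ravel(), minlength=levels)
    p = counts / counts.sum()          # normalized histogram
    p = p[p > 0]                       # empty bins contribute nothing
    return float(-(p * np.log2(p)).sum())
```

An image with a single gray level has zero entropy, and an image split evenly between two levels has an entropy of 1 bit.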

Fuzzy recurrence entropy

Based on the definition of the non-probabilistic entropy of a fuzzy set⁴², the entropy of an $N \times N$ FRP or fuzzy recurrence entropy that is a measure of the degree of uncertainty of recurrences of the reconstructed phase space of a signal is defined as⁴³

$$\begin{aligned} E_{FR} = \sum _{i=1}^N \sum _{j=1}^N - \mu ({\mathbf {x}}_i,{\mathbf {x}}_j) \, \log _2 \mu ({\mathbf {x}}_i,{\mathbf {x}}_j) - [1-\mu ({\mathbf {x}}_i,{\mathbf {x}}_j)] \, \log _2[1-\mu ({\mathbf {x}}_i,{\mathbf {x}}_j)], \end{aligned}$$

(14)

where $\mu ({\mathbf {x}}_i,{\mathbf {x}}_j)$ corresponds to $\tilde{{\mathbf {R}}}(i,j)$ defined in Eq. (9).
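
A direct NumPy transcription of Eq. (14) is sketched below, using the convention $0 \log_2 0 = 0$ so that memberships of exactly 0 or 1 contribute no uncertainty. The function and helper names are illustrative.

```python
import numpy as np

def _neg_p_log2_p(p):
    """Elementwise -p * log2(p), with the value 0 where p == 0."""
    out = np.zeros_like(p)
    mask = p > 0
    out[mask] = -p[mask] * np.log2(p[mask])
    return out

def fuzzy_recurrence_entropy(R):
    """Fuzzy recurrence entropy of an N x N FRP, Eq. (14)."""
    mu = np.clip(np.asarray(R, dtype=float), 0.0, 1.0)
    return float((_neg_p_log2_p(mu) + _neg_p_log2_p(1.0 - mu)).sum())
```

Each element with membership 0.5 contributes exactly 1, the largest possible per-element value, so a 3-by-3 plot filled with 0.5 has an entropy of 9.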

Time–frequency time–space long short-term memory networks

Based on LSTM networks^1,4,44, in which the proposed input time–frequency (TF) and time–space (TS) features are included, the architecture for a TF–TS LSTM block is graphically described in Fig. 2. This figure illustrates the flow of an input time series ${{\mathbf {u}}} = ({{\mathbf {u}}_1}, \dots , {{\mathbf {u}}_M}) \in {\mathbb {R}}^{MQ}$ through an LSTM layer, where M is the number of segments split from the original time series of length L, and Q the number of features. In this study, $M = \lceil L/N \rceil$, where $N=128$, $\lceil \rceil$ denotes the ceiling function, and $Q=4$. The input at a time point is the concatenation of the four features extracted for the segment at the same time point, i.e., ${{\mathbf {u}}}_\tau = (F_{\tau 1}, F_{\tau 2}, F_{\tau 3}, F_{\tau 4})^T$, $\tau = 1, \dots , M$, where $F_{\tau 1}$, $F_{\tau 2}$, $F_{\tau 3}$, and $F_{\tau 4}$ are the instantaneous frequency, spectral entropy, fuzzy recurrence image entropy, and fuzzy recurrence entropy extracted from the $\tau$-th segment, respectively.
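
The segmentation into $M = \lceil L/N \rceil$ pieces and the assembly of an M-by-Q feature sequence can be sketched as below. How the final, possibly shorter segment is treated is not stated in the text, so keeping it as-is is an assumption here. The feature functions in the usage note are simple stand-ins, not the four TF and TS features of the study.

```python
import math
import numpy as np

def split_into_segments(x, seg_len=128):
    """Split a series of length L into M = ceil(L / seg_len) segments.

    The last segment may be shorter than seg_len (assumed behavior).
    """
    x = np.asarray(x)
    n_segments = math.ceil(len(x) / seg_len)
    return [x[k * seg_len:(k + 1) * seg_len] for k in range(n_segments)]

def feature_sequence(x, feature_fns, seg_len=128):
    """Return an (M, Q) array: one row of Q scalar features per segment."""
    segments = split_into_segments(x, seg_len)
    return np.array([[fn(seg) for fn in feature_fns] for seg in segments],
                    dtype=float)
```

For a series of 300 samples, `split_into_segments` yields three segments of lengths 128, 128, and 44, and `feature_sequence(x, [np.mean, np.std, np.min, np.max])` returns an array of shape (3, 4).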

The learnable weights of an LSTM layer are the input weights, denoted as ${{\mathbf {a}}}$, recurrent weights, denoted as ${{\mathbf {r}}}$, and bias, denoted as b. The matrices ${{\mathbf {A}}}$, ${{\mathbf {R}}}$, and vector ${{\mathbf {b}}}$ are the concatenations of the input weights, recurrent weights, and bias of each component, respectively. The concatenations are expressed as

$$\begin{aligned} {{\mathbf {A}}} \, = \, [{{\mathbf {a}}}_{i}, {{\mathbf {a}}}_{f}, {{\mathbf {a}}}_{g}, {{\mathbf {a}}}_{o}]^T, \end{aligned}$$

(15)

$$\begin{aligned} {{\mathbf {R}}} \,= \, [{{\mathbf {r}}}_{i}, {{\mathbf {r}}}_{f}, {{\mathbf {r}}}_{g}, {{\mathbf {r}}}_{o}]^T, \end{aligned}$$

(16)

$$\begin{aligned} {{\mathbf {b}}} \,= \, [{b}_{i}, {b}_{f}, {b}_{g}, {b}_{o}]^T, \end{aligned}$$

(17)

where i, f, g, and o denote the input gate, forget gate, cell candidate, and output gate, respectively.

The cell state at time step $\tau$ is defined as

$$\begin{aligned} {{\mathbf {c}}}_\tau = f_\tau \circ {{\mathbf {c}}}_{\tau -1} + i_\tau \circ g_\tau , \end{aligned}$$

(18)

where $\circ$ is the Hadamard product.

The hidden state at time step $\tau$ is given by

$$\begin{aligned} {{\mathbf {h}}}_\tau = o_\tau \circ \sigma _c({{\mathbf {c}}}_{\tau }), \end{aligned}$$

(19)

where $\sigma _c$ is the state activation function that is usually computed as the hyperbolic tangent function (tanh).

At time step $\tau$, the input gate ($i_\tau$), forget gate ($f_\tau$), cell candidate ($g_\tau$), and output gate ($o_\tau$) are defined as

$$\begin{aligned} i_\tau \,=\, \sigma _g ({{\mathbf {a}}}_i {{\mathbf {u}}}_\tau + {{\mathbf {r}}}_i {{\mathbf {h}}}_{\tau -1} + {b}_i), \end{aligned}$$

(20)

$$\begin{aligned} f_\tau \,=\, \sigma _g ({{\mathbf {a}}}_f {{\mathbf {u}}}_\tau + {{\mathbf {r}}}_f {{\mathbf {h}}}_{\tau -1} + {b}_f), \end{aligned}$$

(21)

$$\begin{aligned} g_\tau \,=\, \sigma _c ({{\mathbf {a}}}_g {{\mathbf {u}}}_\tau + {{\mathbf {r}}}_g {{\mathbf {h}}}_{\tau -1} + {b}_g), \end{aligned}$$

(22)

$$\begin{aligned} o_\tau\,=\, \sigma _g ({{\mathbf {a}}}_o {{\mathbf {u}}}_\tau + {{\mathbf {r}}}_o {{\mathbf {h}}}_{\tau -1} + {b}_o), \end{aligned}$$

(23)

where $\sigma _g$ denotes the gate activation function that usually adopts the sigmoid function.
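
Equations (18) to (23) describe one time step of the layer, which can be written out in NumPy as a sketch. It assumes the stacked layout of Eqs. (15) to (17), with the rows of ${{\mathbf {A}}}$, ${{\mathbf {R}}}$, and ${{\mathbf {b}}}$ ordered as input gate, forget gate, cell candidate, and output gate, and uses the sigmoid and tanh activations mentioned in the text. The function names and the hidden size are illustrative.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(u, h_prev, c_prev, A, R, b):
    """One LSTM time step following Eqs. (18)-(23).

    u      : input feature vector, shape (Q,)
    h_prev : previous hidden state, shape (H,)
    c_prev : previous cell state, shape (H,)
    A      : stacked input weights, shape (4H, Q), order i, f, g, o
    R      : stacked recurrent weights, shape (4H, H), same order
    b      : stacked biases, shape (4H,), same order
    """
    H = h_prev.shape[0]
    z = A @ u + R @ h_prev + b
    i = sigmoid(z[0:H])            # input gate, Eq. (20)
    f = sigmoid(z[H:2 * H])        # forget gate, Eq. (21)
    g = np.tanh(z[2 * H:3 * H])    # cell candidate, Eq. (22)
    o = sigmoid(z[3 * H:4 * H])    # output gate, Eq. (23)
    c = f * c_prev + i * g         # cell state, Eq. (18)
    h = o * np.tanh(c)             # hidden state, Eq. (19)
    return h, c
```

With all weights and biases set to zero, every gate equals 0.5 and the cell candidate is 0, so the cell state is simply halved at each step.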

A bidirectional LSTM (bi-LSTM)⁴⁵ is an extension of the traditional LSTM that can improve performance on sequence classification problems. Instead of training one LSTM on the input time series, a bi-LSTM architecture is trained in both time directions simultaneously with hidden forward and backward layers: the first operates on the input time series as it is and the second on a reversed copy of the time series. This architecture learns bidirectional long-term dependencies between time steps of time series and can therefore provide additional context to the network and result in fuller learning on the data.

The procedures for obtaining data balance for training and testing sets, and the transformation of raw time series into TF and TS features for LSTM learning and classification are outlined in Fig. 3.

To obtain signals of the same length contained in both training and testing datasets, the histogram of the distribution of the lengths of the signals is observed to detect the majority length. Signals of lengths that are less than the majority are discarded, and those that are longer than the majority are split into segments of the majority length and the remaining samples of the signal are ignored if there are any. Creating signals of equal length is particularly useful for the training of the networks that breaks the data into mini-batches. In the same mini-batch, the training pads or truncates the signals to have the same length. However, it is known that the process of padding or truncating can reduce the performance of the networks because of the added or missed information caused by the padding or truncating, respectively. To obtain the data balance in each class for both training and testing, copies of the signals of the minority class are repeated to achieve the same size of the signals of the majority class. This step is described in Fig. 3a. The next step is to extract the TF features of the signals using the instantaneous frequency and spectral entropy and the TS features of the signals using the fuzzy recurrence image entropy and fuzzy recurrence entropy for training the networks (Fig. 3b). The same TF and TS features are extracted from the testing signals as the input for the trained TF–TS LSTM networks to carry out the classification task (Fig. 3c).
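
The length-equalization and class-balancing steps can be sketched with the standard library. The function names are hypothetical, and the way minority signals are repeated (cycling through them in order until the target count is reached) is an assumption, since the text only says that copies are repeated.

```python
from collections import Counter
import itertools

def majority_length(signals):
    """Most frequent signal length (the mode of the length histogram)."""
    return Counter(len(s) for s in signals).most_common(1)[0][0]

def equalize_lengths(signals, target_len):
    """Discard shorter signals; split longer ones into full segments.

    Leftover samples that do not fill a whole segment are ignored.
    """
    out = []
    for s in signals:
        for k in range(len(s) // target_len):
            out.append(s[k * target_len:(k + 1) * target_len])
    return out

def oversample_by_repetition(minority, n_target):
    """Repeat minority-class signals in order until n_target are collected."""
    return list(itertools.islice(itertools.cycle(minority), n_target))
```

With signal lengths of 5, 10, 10, and 23, the majority length is 10; the signal of length 5 is dropped, and the signal of length 23 contributes two segments while its last three samples are ignored.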

Performance measures

Let condition positive P be the total number of disease signals, condition negative N the total number of healthy control signals, true positive TP the number of disease signals correctly identified as disease, false positive FP the number of healthy control signals incorrectly identified as disease, true negative TN the number of healthy control signals correctly identified as healthy control, and false negative FN the number of the disease signals incorrectly identified as healthy control.

Accuracy (ACC) is defined as

$$\begin{aligned} ACC = \frac{TP+TN}{P+N}. \end{aligned}$$

(24)

Sensitivity (SEN) is defined in this study as the portion of the disease signals that are correctly identified as having the condition:

$$\begin{aligned} SEN = \frac{TP}{P}. \end{aligned}$$

(25)

Specificity (SPE) is the portion of the healthy control signals that are correctly identified as not having the disease:

$$\begin{aligned} SPE = \frac{TN}{N}. \end{aligned}$$

(26)

Precision (PRE) is calculated as

$$\begin{aligned} PRE = \frac{TP}{TP+FP}. \end{aligned}$$

(27)

$F_1$ score is the harmonic mean of precision and sensitivity and calculated as

$$\begin{aligned} F_1 = \frac{2TP}{2TP+FP+FN}. \end{aligned}$$

(28)
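
The five measures in Eqs. (24) to (28) follow directly from the four confusion counts, using P = TP + FN and N = TN + FP. The sketch below uses an illustrative function name and assumes non-zero denominators.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, precision, and F1 (Eqs. 24-28)."""
    p = tp + fn    # condition positive: all disease signals
    n = tn + fp    # condition negative: all healthy control signals
    return {
        "ACC": (tp + tn) / (p + n),
        "SEN": tp / p,
        "SPE": tn / n,
        "PRE": tp / (tp + fp),
        "F1": 2 * tp / (2 * tp + fp + fn),
    }
```

For example, with TP = 8, FP = 1, TN = 9, and FN = 2, the accuracy is 0.85, the sensitivity 0.8, the specificity 0.9, the precision 8/9, and the $F_1$ score 16/19.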

Results

Tables 1 and 2 list the tenfold cross-validation results for the two physiological databases, ECG and Gait in Parkinson's Disease, respectively. For the ECG database, this experiment used normal sinus rhythm (5050 signals) and AF (738 signals) for binary classification. For the Gait in Parkinson's Disease data, this study used the time series recorded from only one sensor under the left foot, labeled as L5 in the database. The purpose of selecting the sensor data recorded at the L5 location was to compare with the work reported in⁴⁷, which used four sensors at L5, L7, R7, and R8 for the classification of gait patterns. The LSTM used in this study was the bi-LSTM (LSTM will be used as bi-LSTM subsequently). To extract the TF features, the sampling frequency was set as 300 Hz. To extract the TS features, the FRPs were computed with embedding dimension $= 1$, time delay $= 1$, and number of clusters $= 3$. The specifications of the FRP parameters were based on previous studies^25,43, which provided satisfactory results and found the construction of FRPs to be less sensitive to these parameters than the construction of RPs²⁵.

Table 1 Ten-fold cross-validation metrics for ECG classification of atrial fibrillation and normal sinus rhythm.

Table 2 Ten-fold cross-validation metrics for classification of gait of patients with Parkinson's disease and healthy controls.

All TF and TS features were standardized to improve the network training and testing⁴⁶. For the LSTM specifications, the network consisted of a bi-LSTM layer with an output size $= 100$ and a fully connected layer of size $= 2$ (two classes), followed by a softmax layer and a classification layer. Training options of the bi-LSTM were set as optimizer $=$ 'Adam' (adaptive moment estimation), including $L_2$ regularization factor, maximum number of epochs $= 80$, mini-batch size $= 150$, initial learning rate $= 0.01$, and gradient threshold $= 1$.
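
The standardization of features can be illustrated with a z-score sketch. The text does not state how the statistics were computed, so this example assumes a per-feature mean and standard deviation estimated from the training sequences only and then reused for the testing sequences. The function names are hypothetical.

```python
import numpy as np

def fit_standardizer(train_features):
    """Per-feature mean and std from a list of (M_i, Q) training arrays."""
    stacked = np.concatenate(train_features, axis=0)
    mean = stacked.mean(axis=0)
    std = stacked.std(axis=0)
    std[std == 0] = 1.0            # avoid division by zero for constant features
    return mean, std

def standardize(features, mean, std):
    """Apply z-score scaling with previously fitted statistics."""
    return (np.asarray(features, dtype=float) - mean) / std
```

After fitting on the training set, the pooled standardized training features have zero mean and unit standard deviation in each feature column.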

For the ECG data, the TF–TS LSTM significantly outperformed conventional LSTM in terms of classification accuracy (58% and 94% for conventional LSTM and TF–TS LSTM, respectively), other statistical measures (sensitivity, specificity, precision, and $F_1$ score), and training time (3506 minutes and 1 minute for LSTM and TF–TS LSTM, respectively, where the time for computing the four features was excluded in the TF–TS LSTM training). The specificity (34%) is much lower than the sensitivity (83%) obtained from the conventional LSTM, while these two measures are much more balanced using the TF–TS LSTM (sensitivity $= 91\%$ and specificity $= 96\%$).

For the gait data, using the signals recorded from only one sensor, the TF–TS LSTM provided perfect classification metrics (accuracy $= 100\%$, sensitivity $= 100\%$, specificity $= 100\%$, precision $= 100\%$, and $F_1$ score $= 1$) with a training time of $< 1$ minute (the time for computing the four features was excluded). The use of conventional LSTM yielded an accuracy of $79\%$ with 111 minutes for data training. Five previous methods^{12,13,14,15,47} that studied the same database used between 4 and 16 sensors and obtained accuracy rates between 77%¹² and 98%¹⁵ (standard deviations of classification results obtained from these five methods were not given in the literature⁴⁷).

Discussion

Computer experiments have shown that the TF–TS LSTM achieved very high performance in the classification task and saved tremendous training time in comparison with the conventional LSTM. As an example, Fig. 4 shows the contrast of the training processes of conventional LSTM and TF–TS LSTM with respect to the convergence of accuracy and the number of iterations. Not only did the TF–TS LSTM outperform conventional LSTM, but the classification results of gait in Parkinson's disease in terms of accuracy, sensitivity, specificity, precision, and $F_1$ score obtained from the TF–TS LSTM are also higher than those previously reported in the literature. In particular, the TF–TS LSTM used the data recorded from only one sensor. The significant reduction in biomedical sensors to measure human physiological parameters in real time for disease detection has implications for improving the user's comfort and contributing to the low cost, simplicity, and portability of wearable sensor technology.

In this study, only the gait data recorded by one sensor located at L5 were used to compare with the other work⁴⁷ that included the data recorded by four sensors located at L5, L7, R7, and R8. The gait classification with the use of a single sensor located at L5 obtained from the proposed TF–TS LSTM outperformed the use of the four sensors for the classification obtained from the methods of phase space reconstruction, empirical mode decomposition, and neural networks⁴⁷. Tests of the TF–TS LSTM for the gait classification using data recorded from other single sensors were not carried out. However, the current comparison has shown the better performance of the TF–TS LSTM. As the five methods^{12,13,14,15,47}, which were compared with the TF–TS LSTM using the gait data, were proposed and implemented by other authors, it would be difficult to fairly implement these methods for the classification of the ECG data without the provision of the source code. However, it is shown that the test results obtained from the TF–TS LSTM are significantly higher than those of the conventional LSTM on the two datasets, and the classification accuracy obtained from the LSTM using the gait data from only one sensor (79%) is higher than the result reported in¹² using the gait data from 8 sensors (77%).

Here the signals were made to have the same length as that of the majority. If no majority exists or the histogram of signal lengths has a uniform distribution, the signal lengths can be made equal to the length of the shortest signal. In general, signals that are shorter than the majority length can still be included in the classification. However, as mentioned earlier, creating signals of equal length can make network training and testing more effective. In practice, recording physiological signals that meet such a standard length for testing is feasible because the standard is based on the majority.
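The length-selection rule described above can be sketched as follows. This is a hedged Python illustration, not the author's MATLAB implementation. It assumes that a "majority" length exists only when one length is strictly more frequent than every other, and it adopts one possible policy for the remaining signals (dropping those shorter than the target and truncating longer ones). The function names `target_length` and `equalize` are hypothetical.

```python
from collections import Counter


def target_length(lengths):
    """Pick the majority (most frequent) length, else the shortest length."""
    if not lengths:
        raise ValueError("at least one signal is required")
    counts = Counter(lengths).most_common()
    top_len, top_count = counts[0]
    # Assumption: a majority exists only if the top length is strictly
    # more frequent than the runner-up (so a uniform histogram has none).
    has_majority = len(counts) == 1 or top_count > counts[1][1]
    return top_len if has_majority else min(lengths)


def equalize(signals):
    """Return signals cut to a common length.

    Assumed policy: signals shorter than the target are dropped and
    longer ones are truncated; other policies (e.g. padding) are possible.
    """
    n = target_length([len(s) for s in signals])
    return [s[:n] for s in signals if len(s) >= n]


# Toy example: three signals of length 5, one of length 3, one of length 7.
signals = [[0] * 5, [1] * 5, [2] * 5, [3] * 3, [4] * 7]
equal = equalize(signals)
print(len(equal), {len(s) for s in equal})
```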

As shown in Fig. 4, the TF–TS LSTM reached high training accuracy, whereas the accuracy of the LSTM trained with raw time series improved little. Furthermore, the TF–TS LSTM requires much less training time than training with long raw time series. This is because it is trained with sequential features of the time series instead of the time series themselves: the feature sequences are much shorter than the original data, and the effectiveness of the standardized features is an important factor in improving network performance during training.
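As an illustration of feature standardization, the sketch below applies a z-score to each feature dimension. It assumes that the mean and standard deviation are estimated from the training sequences only and then reused for the test sequences, which is a common convention rather than a detail confirmed by this section. The function names are hypothetical, and the code is plain Python rather than the author's MATLAB software.

```python
import math


def fit_standardizer(train_sequences):
    """Per-feature mean and standard deviation pooled over all time steps."""
    rows = [row for seq in train_sequences for row in seq]
    dim = len(rows[0])
    means = [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
    stds = []
    for d in range(dim):
        var = sum((r[d] - means[d]) ** 2 for r in rows) / len(rows)
        # Fall back to 1.0 so that constant features do not divide by zero.
        stds.append(math.sqrt(var) or 1.0)
    return means, stds


def standardize(sequences, means, stds):
    """Apply (x - mean) / std to every feature of every time step."""
    return [
        [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in seq]
        for seq in sequences
    ]


# Toy data: one training sequence with two time steps and two features.
train = [[[1.0, 10.0], [3.0, 10.0]]]
means, stds = fit_standardizer(train)
print(standardize(train, means, stds))
```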

Feature extraction can be related to dimensionality reduction, by which multivariate data are reduced to a lower-dimensional space for more manageable data processing. The physiological time series used in this study, however, are one-dimensional. Rather than being reduced in dimension, these time series were split into equal segments from which the four features were extracted for learning and classification by the TF–TS LSTM. In other words, the one-dimensional time series were transformed into much shorter sequences of 4 feature dimensions, as shown in Fig. 2. The extracted features provide essential information about the data in the time–frequency and time–space domains and are intended to be complementary, informative, and non-redundant. Thus, the transformed data can facilitate the subsequent learning and leverage the discriminative power of sequential deep learning, leading to better class predictions. The results obtained in this study have shown that the TF–TS LSTM outperformed other statistical classifiers, including SVMs and a multilayer perceptron.
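The transformation from a long one-dimensional series to a short sequence of four-dimensional feature vectors can be sketched as below. This Python illustration only shows the change of shape. The four features used here (mean, standard deviation, peak-to-peak range, and a normalized spectral entropy computed with a naive DFT) are stand-ins chosen for simplicity and are not the time–frequency and time–space features of the TF–TS LSTM. The function names and the segment length are hypothetical.

```python
import cmath
import math
import statistics


def spectral_entropy(segment):
    """Normalized Shannon entropy of the power spectrum (naive DFT)."""
    n = len(segment)
    mean = sum(segment) / n
    centred = [x - mean for x in segment]
    power = []
    for k in range(1, n // 2 + 1):  # positive-frequency bins only
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centred)
        )
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0 or len(power) < 2:
        return 0.0
    probs = [p / total for p in power if p > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(power))


def segment_features(signal, segment_length):
    """Split a 1-D signal into equal segments and extract 4 stand-in features."""
    n_segments = len(signal) // segment_length  # trailing remainder is dropped
    features = []
    for i in range(n_segments):
        seg = signal[i * segment_length:(i + 1) * segment_length]
        features.append([
            statistics.fmean(seg),
            statistics.pstdev(seg),
            max(seg) - min(seg),
            spectral_entropy(seg),
        ])
    return features


# A 1000-sample sine wave becomes a sequence of 10 steps with 4 features each.
signal = [math.sin(2 * math.pi * t / 20) for t in range(1000)]
sequence = segment_features(signal, 100)
print(len(signal), "->", len(sequence), "x", len(sequence[0]))
```

In this toy case the sequence seen by a recurrent network shrinks from 1000 time steps to 10, which is the kind of shortening the paragraph above credits for the reduced training time.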

In summary, the finding is that training the LSTM network with raw time series produces poor classification results, whereas training the network with TF and TS features extracted from the signals can both significantly enhance the classification performance and reduce the training time.

The MATLAB-based TF–TS LSTM software for classification of physiological signals is designed so that biomedical and life science users without technical knowledge of AI, signal processing, or general physics can use it easily by following the provided step-by-step instructions (Supplementary Note). Data imbalance is a common problem in biomedical data and can significantly prevent classifiers from achieving good results. The software suggests how to design a balance of class samples for training and testing datasets when minority classes exist.
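One generic way to balance class samples is random oversampling of the minority classes, sketched below in Python. This is an assumption made for illustration: the balancing procedure suggested by the software is described in the Supplementary Note and is not reproduced here. The function name `oversample`, the class labels, and the fixed seed are hypothetical.

```python
import random
from collections import defaultdict


def oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy example: 6 samples of one class and 2 of another.
samples = [f"s{i}" for i in range(8)]
labels = ["PD"] * 6 + ["control"] * 2
balanced_samples, balanced_labels = oversample(samples, labels)
print(balanced_labels.count("PD"), balanced_labels.count("control"))
```

If such resampling is used, it is usually applied to the training set only, so that duplicated samples do not appear in both training and testing data.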

Conclusions

An AI-based approach for improving the performance of disease detection using physiological signals has been presented and discussed. The proposed method takes advantage of information extracted from both the frequency and space domains of the temporal data for effective deep learning, improving classification performance and lowering computational complexity. Although the method was developed for classifying time series in physiology, it can be readily applied to the classification of other biological and clinical signals, such as time series in gene expression⁴⁸, neurology⁴⁹, and epidemiology⁵⁰.

The AI-based method presented in this work was tested using the records obtained from a single-sensor measurement of gait in PD. The results suggest that the method has the potential to reduce the need for multiple sensors when recording physiological data, resulting in both cost savings and greater comfort for participants. Further tests of the method with other multiple-sensor data would be necessary to confirm the finding. Wearable sensors are useful devices for evaluating patient outcomes in clinical trials. However, the devices need to be physically comfortable so that participants are prepared to wear them. Otherwise, the deployment of such tools will not be practically feasible, particularly in the older adult ($> 50$ years) population⁵¹.

Software availability

MATLAB software, ECG data for AF and normal sinus rhythm, and the Supplementary Note for running the ECG data used in this paper are publicly available at the author's personal homepage: https://sites.google.com/view/tuan-d-pham/codes under the title "TF–TS LSTM".

References

1. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
2. Bagnall, A., Lines, J., Bostrom, A., Large, J. & Keogh, E. The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min. Knowl. Disc. 31, 606–660 (2017).
3. Fawaz, H. I., Forestier, G., Weber, J., Idoumghar, L. & Muller, P. A. Deep learning for time series classification: a review. Data Min. Knowl. Disc. 33, 917–963 (2019).
4. Greff, K., Srivastava, R. K., Koutnik, J., Steunebrink, B. R. & Schmidhuber, J. LSTM: a search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 28, 2222–2232 (2017).
5. Craik, A., He, Y. & Contreras-Vidal, J. L. Deep learning for electroencephalogram (EEG) classification tasks: a review. J. Neural Eng. 16, 031001 (2019).
6. Pham, T. D., Wardell, K., Eklund, A. & Salerud, G. Classification of short time series in early Parkinson's disease with deep learning of fuzzy recurrence plots. IEEE/CAA J. Autom. Sin. 6, 1306–1317 (2019).
7. Belo, D., Rodrigues, J., Vaz, J. R., Pezarat-Correia, P. & Gamboa, H. Biosignals learning and synthesis using deep neural networks. BioMed. Eng. OnLine 16, 115 (2017).
8. Tortora, S., Ghidoni, S. S., Chisari, C., Micera, S. & Artoni, F. Deep learning-based BCI for gait decoding from EEG with LSTM recurrent neural network. J. Neural Eng. 17, 046011 (2020).
9. Umematsu, T., Sano, A. & Picard, R. W. Daytime data and LSTM can forecast tomorrow's stress, health, and happiness. In Proc. 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2186–2190 (2019).
10. Hollman, J. H. et al. Number of strides required for reliable measurements of pace, rhythm and variability parameters of gait during normal and dual task walking in older individuals. Gait Posture 32, 23–28 (2010).
11. Kribus-Shmiel, L., Zeilig, G., Sokolovski, B. & Plotnik, M. How many strides are required for a reliable estimation of temporal gait parameters? Implementation of a new algorithm on the phase coordination index. PLoS ONE 13, e0192049 (2018).
12. Lee, S. H. & Lim, J. S. Parkinson's disease classification using gait characteristics and wavelet-based feature extraction. Expert Syst. Appl. 39, 7338–7344 (2012).
13. Daliri, M. R. Chi-square distance kernel of the gaits for the diagnosis of Parkinson's disease. Biomed. Signal Process. Control 8, 66–70 (2013).
14. Ertugrul, O. F., Kaya, Y., Tekin, R. & Almali, M. N. Detection of Parkinson's disease by shifted one dimensional local binary patterns from gait. Expert Syst. Appl. 56, 156–163 (2016).
15. Acici, K., Erdas, C. B., Asuroglu, T., Toprak, M. K., Erdem, H. & Ogul, H. A random forest method to detect Parkinson's disease via gait analysis. In Proc. Int. Conf. Engineering Applications of Neural Networks 609–619 (2017).
16. Dargan, S. et al. A survey of deep learning and its applications: a new paradigm to machine learning. Arch. Comput. Methods Eng. 27, 1071–1092 (2020).
17. Boashash, B., Khan, N. A. & Ben-Jabeur, T. Time–frequency features for pattern recognition using high-resolution TFDs: a tutorial review. Digit. Signal Proc. 40, 1–30 (2015).
18. Wang, K., Li, J., Zhang, S., Qiu, Y. & Liao, R. Time-frequency features extraction and classification of partial discharge UHF signals. In Proc. 2014 International Conference on Information Science, Electronics and Electrical Engineering 1231–1235 (2014).
19. Xu, C., Guan, J., Bao, M., Lu, J. & Ye, W. Pattern recognition based on time-frequency analysis and convolutional neural networks for vibrational events in $\phi$-OTDR. Opt. Eng. 57, 016103 (2018).
20. Anderson, R. & Sandsten, M. Time-frequency feature extraction for classification of episodic memory. EURASIP J. Adv. Signal Process. 2020, 19 (2020).
21. Eckmann, J. P., Kamphorst, S. O. & Ruelle, D. Recurrence plots of dynamical systems. Europhys. Lett. 5, 973–977 (1987).
22. Marwan, N. et al. Recurrence plots for the analysis of complex systems. Phys. Rep. 438, 237–329 (2007).
23. Zou, Y., Donner, R. V., Marwan, N., Donges, J. F. & Kurths, J. Complex network approaches to nonlinear time series analysis. Phys. Rep. 787, 1–97 (2019).
24. Goswami, B. A brief introduction to nonlinear time series analysis and recurrence plots. Vibration 2, 332–368 (2019).
25. Pham, T. D. Fuzzy recurrence plots. EPL 116, 50008 (2016).
26. Canturk, I. Fuzzy recurrence plot-based analysis of dynamic and static spiral tests of Parkinson's disease patients. Neural Comput. Appl. 33, 349–360 (2021).
27. Pham, T. D. & Yan, H. Tensor decomposition of gait dynamics in Parkinson's disease. IEEE Trans. Biomed. Eng. 65, 1820–1827 (2018).
28. Pham, T. D. Pattern analysis of computer keystroke time series in healthy control and early-stage Parkinson's disease subjects using fuzzy recurrence and scalable network features. J. Neurosci. Methods 307, 194–202 (2018).
29. Pham, T. D. Texture classification and visualization of time series of gait dynamics in patients with neuro-degenerative diseases. IEEE Trans. Neural Syst. Rehabilit. Eng. 26, 188–196 (2018).
30. AF classification from a short single lead ECG recording: The PhysioNet Computing in Cardiology Challenge 2017. PhysioNet. https://physionet.org/content/challenge-2017/1.0.0/.
31. Clifford, G. D. et al. AF classification from a short single lead ECG recording: the PhysioNet/Computing in Cardiology Challenge 2017. Comput. Cardiol. 44, 11. https://doi.org/10.22489/CinC.2017.065-469 (2017).
32. Gait in Parkinson's disease. PhysioNet. https://physionet.org/content/gaitpdb/1.0.0/.
33. Boashash, B. Estimating and interpreting the instantaneous frequency of a signal-Part 1: fundamentals. Proc. IEEE 80, 520–538 (1992).
34. Boashash, B. Estimating and interpreting the instantaneous frequency of a signal-Part 2: algorithms and applications. Proc. IEEE 80, 540–568 (1992).
35. Buttkus, B. Spectral Analysis and Filter Theory in Applied Geophysics (Springer, 2000).
36. Kaiser, J. F. & Schafer, R. W. On the use of the $I_0$-sinh window for spectrum analysis. IEEE Trans. Acoust. Speech Signal Process. 28, 105–107 (1980).
37. Takens, F. Detecting strange attractors in turbulence. Lect. Notes Math. 898, 366–381 (1981).
38. Liebovitch, L. S. Fractals and Chaos Simplified for the Life Sciences (Oxford University Press, 1998).
39. Zadeh, L. A. Fuzzy sets. Inf. Control 8, 338–353 (1965).
40. Bezdek, J. C. Pattern Recognition with Fuzzy Objective Function Algorithms (Plenum Press, 1981).
41. Lorenz, E. N. Deterministic nonperiodic flow. J. Atmos. Sci. 20, 130–141 (1963).
42. de Luca, A. & Termini, S. A definition of a nonprobabilistic entropy in the setting of fuzzy sets theory. Inf. Control 20, 301–312 (1972).
43. Pham, T. D. Fuzzy recurrence entropy. EPL 130, 40004 (2020).
44. Yu, Y., Si, X., Hu, C. & Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 31, 1235–1270 (2019).
45. Schuster, M. & Paliwal, K. K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45, 2673–2681 (1997).
46. Brownlee, J. How to scale data for long short-term memory networks in Python. Machine Learning Mastery, 07 July 2017. https://machinelearningmastery.com/how-to-scale-data-for-long-short-term-memory-networks-in-python/.
47. Zeng, W., Yuan, C., Wang, Q., Liu, F. & Wang, Y. Classification of gait patterns between patients with Parkinson's disease and healthy controls using phase space reconstruction (PSR), empirical mode decomposition (EMD) and neural networks. Neural Netw. 111, 64–76 (2019).
48. Qian, L., Zheng, H., Zhou, H., Qin, R. & Li, J. Classification of time series gene expression in clinical studies via integration of biological network. PLoS ONE 8, e58383 (2013).
49. Costa, I. G., Schonhuth, A., Hafemeister, C. & Schliep, A. Constrained mixture estimation for analysis and robust classification of clinical time series. Bioinformatics 25, i6–i14 (2009).
50. Perkins, T. A. et al. Heterogeneous local dynamics revealed by classification analysis of spatially disaggregated time series data. Epidemics 29, 100357 (2019).
51. Keogh, A., Dorn, J. F., Walsh, L., Calvo, F. & Caulfield, B. Comparing the usability and acceptability of wearable sensors among older Irish adults in a real-world context: observational study. JMIR mHealth uHealth 8, e15704 (2020).

Author information

Authors and Affiliations

Center for Artificial Intelligence, Prince Mohammad Bin Fahd University, Khobar, 31952, Saudi Arabia
Tuan D. Pham

Contributions

T.D.P. conceptualized, designed the study, implemented the methods, and carried out the computer experiments. T.D.P. wrote the manuscript.

Corresponding author

Correspondence to Tuan D. Pham.

Ethics declarations

Competing interests

The author declares no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Pham, T.D. Time–frequency time–space LSTM for robust classification of physiological signals. Sci Rep 11, 6936 (2021). https://doi.org/10.1038/s41598-021-86432-7

Received: 14 November 2020
Accepted: 16 March 2021
Published: 25 March 2021
DOI: https://doi.org/10.1038/s41598-021-86432-7
