
Computers in Biology and Medicine: Shibin Wu, Jianlin Ou, Lin Shu, Guohua Hu, Zhen Song, Xiangmin Xu, Zhuoming Chen


Computers in Biology and Medicine 144 (2022) 105355


MhNet: Multi-scale spatio-temporal hierarchical network for real-time wearable fall risk assessment of the elderly

Shibin Wu a,b, Jianlin Ou d,1, Lin Shu a,b,*, Guohua Hu b, Zhen Song b, Xiangmin Xu a,b,c, Zhuoming Chen d

a School of Future Technology, South China University of Technology, Guangzhou, 511442, China
b School of Electronic and Information Engineering, South China University of Technology, Guangzhou, 510641, China
c Institute of Modern Industrial Technology of SCUT in Zhongshan, Zhongshan, 528400, China
d The First Affiliated Hospital of Jinan University, Guangzhou, 510630, China

ARTICLE INFO

Keywords: Cross-subject; Hierarchical network; Plantar pressure; Real-time fall risk assessment; Wearable

ABSTRACT

Continuous fall risk assessment and real-time high falling risk warning are extremely necessary for the elderly, to protect their lives and ensure their quality of life. Wearable in-shoe pressure sensors have the potential to achieve these targets, due to their adequate wearing comfort. However, it is a great challenge to remove the individual differences of foot pressure data and identify the accurate fall risk from fewer gait cycles to realize real-time warning. We explored a hierarchical deep learning network named MhNet for real-time fall risk assessment, which utilized the advantages of two-layer network, to reach hierarchical tasks to reduce probability of misidentification of high fall risk subjects, by establishing a borderline category using the rehabilitation labels, and extracting multi-scale spatio-temporal features. It was trained by using a wearable plantar pressure dataset collected from 48 elderly subjects. This method could achieve a real time fall risk identification accuracy of 73.27% by using only 9 gaits, which was superior to traditional methods. Moreover, the sensitivity reached 76.72%, proving its strength in identifying high risk samples. MhNet might be a promising way in real-time fall risk assessment for the elderly in their daily activities.

1. Introduction

Globally, falls are a major public health problem. It is estimated that 646,000 fatal falls occur each year, making falls the second leading cause of unintentional injury death [1]. As one of the most common causes of accidental injury in the elderly, falls are characterized by high frequency of occurrence, high treatment cost and long recovery time, which seriously affects the health and daily life of the elderly [2,3]. Compared with fall detection, an “afterwards approach” that cannot prevent damage that has already happened, fall risk assessment provides an opportunity for the elderly to identify fall risk and avoid falling through walking habit adjustment and hospital treatment according to assessment results [4]. Using scientific fall risk assessment tools to screen people at high risk of falling can effectively reduce the incidence of falling in the elderly and improve their quality of life [5,6].

The main causes of falls in the elderly include gait imbalance, lower limb muscle strength deficiency, cognitive impairment and visual impairment [5,7–9]. Among these, external factors such as strength and disease are self-perceived symptoms, while long-term abnormal gait, a recessive factor, is often hard to self-detect, making it difficult to take fall prevention measures in advance. Therefore, it is meaningful to evaluate the risk of falls by monitoring the gait of the elderly. At present, traditional methods for assessing the risk of falls in the elderly include observation, scale questionnaires and motor function tests. Nevertheless, it is difficult to evaluate the fall risk of the elderly over the long term using these methods, while long-term plantar pressure monitoring is a possible way to assess fall risk.

In recent years, sensors and wearable technologies have developed rapidly. Wearable devices are miniaturized and portable, and are suitable for daily life, which is important in remote monitoring [40], especially in the field of health monitoring [38,43,44], making the elderly feel safe and comfortable in long-term wearing [11–13,16,35]. Sensors commonly used in wearable

* Corresponding author. School of Future Technology, South China University of Technology, Guangzhou, 511442, China.
E-mail address: shul@scut.edu.cn (L. Shu).
1
S. Wu and J. Ou are co-first authors of the article.

https://doi.org/10.1016/j.compbiomed.2022.105355
Received 6 January 2022; Received in revised form 24 February 2022; Accepted 24 February 2022
Available online 8 March 2022
0010-4825/© 2022 Elsevier Ltd. All rights reserved.
S. Wu et al. Computers in Biology and Medicine 144 (2022) 105355

devices include pressure and inertial (IMU) sensors, which can objectively evaluate the risk of falling by recording the elderly’s movement and physiological data. Inertial sensors are often integrated with the smartphone [20,22], or separately attached to the waist or other positions on the body [21,23,37,40]. Compared with inertial sensors, devices integrated with pressure sensors are more comfortable, do not require lifestyle changes, and are suitable for the elderly [14–16].

Based on long-term continuous plantar pressure monitoring, studies have shown that one’s gaits represent different fall risk in different periods. For example, not all the gaits of the elderly with high falling risk are abnormal [10], so periodical detection and short-term conventional assessment suffer from high subjectivity and high contingency. Hong et al. [39,42] confirmed the effectiveness of machine learning methods in biological signal processing. Meanwhile, real-time output of results from fewer gaits is more helpful for fall risk assessment. Thus, deep learning methods have potential for real-time monitoring using limited gait cycles, since they avoid feature engineering, which is insufficient in real time and can only explain information within the cognitive range [17–19,36]. Using fewer gaits, the elderly could get feedback on fall risk quickly, which helps to detect abnormal gait in time and to make timely warnings and adjustment suggestions.

Moreover, subject variability always exists, which makes traditional methods unable to guarantee consistent accuracy in the face of unknown samples. The plantar pressure signal of each elderly person contains its own individual features, such as differences in plantar force, in gait pattern during walking, and in the contact area between the feet and insole, which can cause the model to fail to identify fall risk accurately across subjects, seriously affecting its generalization ability. Misidentifying high risk subjects as low risk might cause serious consequences. To the best of our knowledge, there are few studies on cross-subject deep learning for fall risk identification from continuous plantar pressure data. In addition, information leakage occurs in some work where cross-subject evaluation was attempted.

In view of the above shortcomings of existing work, we propose a self-collected continuous plantar pressure-fall risk dataset and a multi-scale spatio-temporal hierarchical network for real-time fall risk assessment. The remainder of this paper is organized as follows. In Section 2, related work on fall risk classification is reviewed, with a focus on research using deep learning methods. In Section 3, we introduce the methodology, including the experimental paradigm, dataset, data processing, architecture of the hierarchical fall risk assessment network, voting mechanism and validation method. In Section 4, all results are listed, and we compare our method with other deep learning methods. Finally, a summary of this work is given in Section 5.

2. Related work

Most existing deep learning methods for identifying fall risk use data collected by inertial sensors such as accelerometers and gyroscopes as input [20–26,37]. The inertial sensor can exist in two forms. The first is a separate external device worn on some part of the subject’s body. However, this kind of external equipment is expensive and has very low wearability, and the exposed equipment easily attracts attention, which causes psychological pressure on the elderly and is not suitable for their daily use [27,28]. The second is embedded in smart phones, bracelets, chest bands and other portable devices. Such electronic devices avoid the problem of poor wearability, but bring the challenge of high price and inconvenience for the elderly. In addition, in order to achieve high accuracy of fall risk assessment, inertial approaches need to deploy multiple sensors at multiple positions on the human body to ensure the acquisition of enough information. For example, Meyer et al. [21] deployed sensors on the chest and thighs, and Martinez et al. [22] used two smart phones for data collection. Giansanti et al. [37] constructed an ANN based on 3 accelerometers and 3 rate gyroscopes on the back. These studies could achieve high accuracy in the training stage, but lacked operability in practical applications.

Considering subject variation during training greatly increases the generalization performance of the model, making it more likely to perform well on unknown samples in practical application. At present, many studies applying deep learning to fall risk identification in the elderly did not consider the cross-subject situation, so their generalization performance is hard to verify [24,29–31]. Some studies considered subject variability, but used the hold-out method or nested cross validation [41] to construct the test set. Since the hold-out test samples were randomly selected, the results were also random. Thus, the degree of generalization could not be ensured in this cross-subject situation [22,23,26].

Compared with traditional methods, deep learning avoids complicated feature engineering and can output results in real time [20,30,32]. Some of the existing deep methods for fall risk assessment did not take advantage of real-time output and need to traverse most or all of the data of a subject to get an output. Meyer [21] used a median score decision to calculate accuracy, which required all samples of a subject.

Plantar pressure can fully reflect the information of the feet, and the sensors can be integrated into insoles or shoes to ensure wearing comfort [35]. The elderly do not need to pay much attention to how to use them, and at the same time, the absence of obvious devices gives the elderly more confidence [13]. At present, there is little work on fall risk identification based on plantar pressure [30]. Although that work took advantage of the convenience and low cost of plantar pressure sensing, the author neither considered the cross-subject situation nor made full use of the advantages of deep learning, because the ConvLSTM method needed to traverse all the data before outputting results, so the fall risk could not be assessed in real time. What is worse, the author shuffled the data of the training set and the test set for training without considering subject variability, which is of little reference value in practical application. The relevant works and their existing problems are summarized in Table 1, which lists the training and validation methods of the related work, as well as the datasets, strengths and weaknesses.

3. Methodology

In this section, we present the details of the experimental paradigm, self-built dataset, data collection equipment, data pre-processing, 2-layer hierarchical fall risk assessment methods and validation method.

3.1. Subjects, experimental paradigm and equipment

Continuous plantar pressure data from 50 older adults were collected as a self-built dataset. The inclusion criteria were patients over 65 years of age who were able to walk independently without assistance and who had no cardiovascular disease, orthopedic or neurological diseases that affected movement, history of psychiatric disorders such as mania or delirium, or history of foot wounds or deformities. In the end, 48 subjects (24 females and 24 males, age 74.5 ± 6.7 years, height 1.61 ± 0.08 m, weight 61.4 ± 8.3 kg) met these criteria. Two subjects were excluded from the dataset because the amount of plantar pressure data collected from them was far less than that of the other subjects (around 150 gaits) and not suitable for subsequent data processing and modeling. The doctors and researchers decided together to exclude the data of these two subjects from the data analysis.


Table 1
Summary of related work on fall risk assessment in deep learning.

| Ref. | Methodology | Validation | Subjects | Sensors | Accuracy and advantages | Weakness |
|---|---|---|---|---|---|---|
| [23] | LSTM | Hold-out method | 37 high fall risk, 39 low fall risk | 2 IMUs on both feet | Classification accuracy in hold-out dataset (92.1%) | Hold-out method leads to random accuracy. It requires multiple devices, and is expensive and not portable. |
| [21] | Bi-LSTM | Leave One Subject Out (LOSO) cross-validation | 18 fallers, 19 non-fallers (PwMS patients) | 7 accelerometers on various body locations | Classification accuracy in LOSO (73%); high voting accuracy in 1 min median decision score (86%) | Sensors need to be placed in multiple locations. Multi-gait voting cannot detect falls in real time. |
| [22] | Transfer learning, CNN | Hold-out method | 235 high risk subjects, 422 low risk subjects | 2 accelerometers from 2 smartphones on different body locations | Classification accuracy in hold-out dataset (86.4%); large number of subjects | Hold-out method leads to random accuracy. It requires multiple devices. It is expensive and not suitable for the elderly. |
| [25] | LSTM, CNN | Leave One Subject Out (LOSO) cross-validation | 18 subjects walking normally or with impairment glasses | 2 IMUs on both wrists | Classification accuracy in LOSO (87.8%) | Abnormal gait is simulated by normal people and does not meet the real needs. It requires multiple devices, and is expensive and not portable. |
| [20] | CNN | 5-fold cross validation | 30 subjects asked to perform 9 normal and 8 fall risk activities | An accelerometer from a smartphone in the left or right pocket | Prospective study | No cross-subject exploration; information leakage; meaningless accuracy (91.5%) |
| [26] | Multi-task learning, ConvLSTM | Hold-out method | 101 fallers, 195 non-fallers | An accelerometer on the lower back | Large number of subjects; good classification accuracy in hold-out dataset (70%) | Hold-out method leads to random accuracy |
| [30] | ConvLSTM | Randomly divided into training set and test set | 46 high fall risk, 39 low fall risk | Footscan plantar force measurement | Comfortable and portable in practical application | No cross-subject exploration; information leakage; inflated accuracy (91.5%) |
| [37] | MLP-NNs | Hold-out method | 90 subjects for training (three risk levels) and 100 subjects for testing | 3 accelerometers and 3 rate gyroscopes on the back | Sensitivity in hold-out dataset (87%) | It requires multiple devices on the back. |
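
Table 1 contrasts hold-out splits with Leave One Subject Out (LOSO) validation. As a minimal sketch of the distinction, the Python snippet below builds LOSO folds so that all samples of one subject are held out together and no subject contributes to both the training and the test side. The function name `loso_folds`, the subject identifiers and the toy windows are illustrative assumptions and are not taken from any of the cited works.

```python
from collections import defaultdict


def loso_folds(samples):
    """Yield (held_out_subject, train, test) for leave-one-subject-out validation.

    `samples` is a list of (subject_id, sample) pairs. Each fold holds out
    every sample of exactly one subject, so train and test never share a subject.
    """
    by_subject = defaultdict(list)
    for subject_id, sample in samples:
        by_subject[subject_id].append(sample)

    for held_out in sorted(by_subject):
        test = [(held_out, s) for s in by_subject[held_out]]
        train = [
            (sid, s)
            for sid in sorted(by_subject)
            if sid != held_out
            for s in by_subject[sid]
        ]
        yield held_out, train, test


# Hypothetical toy data: three subjects with a few windows each.
toy = [("s01", "w0"), ("s01", "w1"), ("s02", "w0"), ("s03", "w0"), ("s03", "w1")]

for held_out, train, test in loso_folds(toy):
    print(held_out, len(train), len(test))
```

Shuffling individual windows before splitting, by contrast, can place overlapping windows of the same subject on both sides of the split, which is one way the information leakage listed in the table can arise.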

Subjects were divided into two groups based on their Berg Balance Scale (BBS) score: those with a BBS of 40 or higher were labeled as low risk (Number = 25), and those with a BBS of less than 40 were labeled as high risk (Number = 23). The Berg Balance Scale is a method that can quickly evaluate the functional walking balance ability of subjects by calculating the scores of 14 movements including “from sitting to standing”, “standing with eyes closed without support” and “standing on one leg”. It can be used as an important evaluation index for the risk of falling in the elderly.

To avoid mistakes and contingency when labelling by BBS, the TUGT (Time Up and Go Test) and the subjects’ fall history (within six months and within one year) were also recorded as references for the final label. If there was a significant difference between the result of the Berg score and those of the TUGT and the fall history, the subject’s label was redefined in the subsequent work, which led to the concept of “Borderline Sample” in the layer 1 classification network. In order to label the pressure data of walking gaits accurately, the BBS and TUGT were regarded as the primary methods for labelling, because we collected data from subjects immediately after the BBS and TUGT, ensuring a strong correlation between data and label. In addition, fall history revealed the overall fall risk of subjects, but it would be reckless to assume that former fall history reflects the fall risk of present gaits, given the wide time gap between them. According to the classification criteria of these three evaluation methods, we labeled all subjects together with the doctor. The experimental paradigm and label analysis procedure are shown in Fig. 1 (a) and (b).

During the data collection phase, each subject was asked to wear a pair of smart shoes of appropriate size, which are no different from ordinary shoes, and walk independently for at least 2 min according to their walking habits. Eight flexible pressure sensors were deployed in the sensing insole of each smart shoe to collect data at a frequency of 20 Hz during walking, and a wireless transmission module was installed inside the shoe to transmit data to a mobile application in real time [16], as can be seen in Fig. 1 (c). Moreover, because every single sensor had an independent AD, parallel data acquisition kept the time gap between channels extremely narrow, which addressed the synchronization issue.

The procedure was approved by the Ethics Committee of Medical Aircraft of the First Affiliated Hospital of Jinan University (KY-2020-099). All the subjects were informed before the data collection and a written consent form was obtained prior to the experiment.

3.2. Data pre-processing and borderline sample definition

Each subject’s 16-channel plantar pressure data were collected and arranged vertically in sequence to form a two-dimensional tensor of 16 × L, where L represents the length of the plantar pressure signal in time. Denoising and normalization were carried out to obtain two-dimensional tensors with uniform length. For data augmentation, a sliding window with a length of 3 gaits and an overlap of 2 gaits was used to split the sequential plantar pressure signals of each subject. With 3-gait windows, the data of 48 subjects were split into 7462 samples in total. Each sample contained the plantar pressure data of three continuous gait cycles. All sample labels were consistent with the label of the subject. The advantage of training with a small number of gaits per sample was that the risk of falling could be identified from only a few gaits during the test, which met the real-time requirement of fall risk assessment and could greatly reduce the risk of falling in the elderly.

The data collected according to the experimental paradigm were labeled mainly by Berg score, with the threshold set at 40 points. Some of the subjects had a Berg score around 40: a few subjects with a high risk of falling had a Berg score slightly above 40, while some low risk subjects had a Berg score slightly below 40. However, this is not a problem of assessment methods such as the Berg Balance Scale, TUGT and fall history, but arises because the fall risk of the elderly has the following characteristics: 1) The data samples of high risk subjects would not all show high risk features, while a certain proportion of them did. Therefore, although the patients were diagnosed as high risk of falling by clinical observation, they might show ordinary gait data contrary to the original label when they were tested on the BBS or TUGT. And the closer the Berg score was to 40, the higher the proportion of samples with labels that were contrary to the original labels and the greater the proportion of high risk samples would be.

Based on this systematic problem, the concept of “Borderline


Sample” was proposed. As shown in Fig. 1(b), the range of Berg score for the borderline sample was determined by both the TUGT results and the fall history. Label correction is a common method in pre-processing [30]. On the basis of the Berg score, subjects whose TUGT and fall history records were inconsistent with the original label were included in the borderline samples. The TUGT threshold was set as 18 s, and subjects were considered at high fall risk when a fall history was found. According to the statistical results from the labeling process, the borderline samples corresponded to subjects whose Berg score ranged from 38 to 47, including both high risk samples and low risk samples. A score below 38 was defined as high risk, while a score above 47 was defined as low risk.

Fig. 1. Experimental setup: (a) Experimental Paradigm, (b) Label analysis procedure and (c) Smart shoe system with wearable in-shoe pressure sensors.

3.3. MhNet: Multi-scale spatio-temporal hierarchical network

The overall multi-scale spatio-temporal hierarchical network (MhNet) pipeline is shown in Fig. 2. MhNet included 2 layers of classification modules, where layer 1 was built on a CNN, and layer 2 used a modified Domain Adaptive Neural Network (DG-DANN) [33], which involved multi-scale spatio-temporal feature extraction. Before training, the CNN in layer 1 was used to extract features of high risk samples, borderline samples and low risk samples from a part of the training samples. Multidimensional Scaling (MDS) was used to cluster the features with reduced dimensions, as shown in Fig. 3. In Fig. 3, green dots represent low risk samples, purple dots represent high risk samples and orange dots represent borderline samples. It can be seen that high risk samples are relatively easy to distinguish in the periphery of both the 2-dimensional and 3-dimensional feature spaces, so the task of layer 1 is to screen part of the high risk samples first, and to ensure that low risk samples are not misidentified as high risk samples, otherwise the performance of the overall model would be affected. Fig. 3 (c) and (d) show that borderline samples are mixed with low risk samples, while some high risk samples are still well distinguished. Therefore, in order to reduce the classification difficulty of layer 1, the low risk samples were grouped with the borderline samples and then dichotomized against the high risk samples.

In practical application, misidentifying a high risk sample as a low risk sample is extremely harmful for the elderly. Therefore, the overall network should be inclined to identify more high risk samples under the premise of bounding the misidentification of low risk samples. The 2-layer setting of MhNet made high risk samples more likely to be identified, because the hierarchical assessment framework provided two opportunities to identify high risk samples. First, layer 1 screened out the high risk samples that were easy to distinguish. Second, the remaining high risk samples that were not identified in layer 1 were passed to layer 2 for further classification. It is worth noting that low risk and borderline samples misidentified in layer 1 would also affect the assessment performance.

In data pre-processing, samples containing three gaits had been obtained. In order to explore which number of gaits was more suitable for real-time identification of fall risk, we set different sizes of voting segments: a segment contained 3, 5, 7, 9 or 11 samples, equivalent to 5, 7, 9, 11 or 13 gaits, respectively. Each m samples from a subject were arranged in time order into a segment, where m could be 3, 5, 7, 9 or 11. The outputs of the m samples were voted on according to different threshold configurations to obtain the segment accuracy in real time.

3.4. Layer 1: CNN classification with bias loss function

As shown in Fig. 2, the performance of layer 1 determined the accuracy of the whole MhNet. Layer 1 used the CNN classification network to assess the fall risk and directly output the identified high risk samples to the voting mechanism without passing them through layer 2. If a high risk sample was misclassified as a low risk and borderline sample, it was passed to layer 2 for a second identification.

Comparison of different classifiers: Four network models, including CNN, MLP, DG-DANN and LSTM, were compared in layer 1. The number of subjects with incorrect fall risk identification referred to the number of subjects whose borderline samples and low risk samples were all identified as high risk samples, while the number of subjects with correct fall risk identification referred to the number of subjects whose high risk samples were all correctly identified.

As shown in Table 2, the assessment performance of CNN was significantly better than that of the other classifiers, because the number of subjects with incorrect risk identification for CNN was much lower than those of the other models, which met the premise that layer 1 should avoid misidentifying low risk samples as far as possible. Therefore, the classification model of layer 1 was chosen as a CNN composed of two layers of convolution and pooling, as well as two fully-connected layers, which reduced the calculation cost of layer 1 with a small number of parameters and sped up the recognition of the whole MhNet. The numbers of nodes in the two fully-connected layers of the CNN were set to 300 and 100.

Optimal Bias Factor t Selection: Another important target of layer 1 was to reduce the probability of misidentifying low risk and borderline samples. The bias weight t was added to the binary cross-entropy function L1(p, q) in layer 1 to achieve this goal. The optimized loss function can be expressed as:


Fig. 2. MhNet: Multi-scale spatio-temporal Hierarchical Network. The framework is composed of 2 layers. The first layer uses a CNN to screen out the high risk samples that are easy to distinguish; the second layer is the DG-DANN based on the multi-scale spatio-temporal features, which classifies the unscreened samples into high risk and low risk ones. The testing samples output from layer 1 and layer 2 are combined into a segment arranged in time order, and the segment assessment result is obtained by a voting mechanism.
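
The flow summarized in the Fig. 2 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `layer1` and `layer2` stand for any two trained classifiers returning `True` for high risk, the toy classifiers simply threshold a made-up score, and the majority-style `vote_threshold` of 0.5 is an assumed value (the paper states only that several threshold configurations were tried). The helper `gaits_covered` reproduces the arithmetic that m overlapping 3-gait windows with a stride of one gait span m + 2 gaits.

```python
def gaits_covered(num_samples, window=3, stride=1):
    """Gait cycles spanned by `num_samples` consecutive overlapping windows."""
    return window + (num_samples - 1) * stride


def assess_segment(samples, layer1, layer2, vote_threshold=0.5):
    """Two-layer assessment of one segment of time-ordered samples.

    layer1(sample) -> True when the sample is screened out as high risk.
    layer2(sample) -> True when a remaining sample is classified as high risk.
    The segment is labeled "high" when the share of high-risk votes reaches
    `vote_threshold` (an assumed voting rule).
    """
    votes = []
    for sample in samples:
        if layer1(sample):
            votes.append(True)            # output directly from layer 1
        else:
            votes.append(layer2(sample))  # second opportunity in layer 2
    high_share = sum(votes) / len(votes)
    label = "high" if high_share >= vote_threshold else "low"
    return label, votes


# Hypothetical stand-ins for the two trained networks: each sample is a
# made-up risk score, and the layers use different cut-offs.
def toy_layer1(score):
    return score > 0.9


def toy_layer2(score):
    return score > 0.5


toy_scores = [0.95, 0.2, 0.6, 0.1, 0.3]  # one segment of m = 5 samples
label, votes = assess_segment(toy_scores, toy_layer1, toy_layer2)
print(gaits_covered(len(toy_scores)), label, votes)
```

Lowering `vote_threshold` makes the segment decision more sensitive to high risk samples at the cost of more false alarms, which is the trade-off the threshold configurations explore.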

Fig. 3. Clustering diagram of plantar pressure features from randomly selected training subjects: (a) 2D and (b) 3D clustering of high-low risk labels; (c) 2D and (d) 3D clustering of high-borderline-low risk labels.

$$L_1(p, q) = -p \log q - (1 - p)\log(1 - q) \times t \qquad (1)$$

where $t \in (1, \infty)$. The weight of the second term in the loss function was increased, so that the training process of the first layer tended to minimize the second term of $L_1(p, q)$. This biased the classification toward low risk and borderline samples, thus reducing the probability of samples being classified as high risk in layer 1 of MhNet. Even if high risk samples were misidentified, they could be corrected through layer 2. By constantly adjusting the value of t, the optimal loss equation was determined.

Based on the CNN classifier, the numbers of correct and incorrect identifications for different values of t in layer 1 classification were compared. However, the best t could not be determined from the correct and incorrect numbers alone, so the classification performance was quantified by a weighted sum of the correct and incorrect numbers, represented by $S_w$. The weight of the wrong number was set to −1.5, and the weight of the correct number


was set to +1, which were empirical values.

$$S_w = -1.5 \times N_{ic} + 1 \times N_c \qquad (2)$$

where $N_{ic}$ referred to the number of subjects with incorrect fall risk identification and $N_c$ referred to the number of subjects with correct fall risk identification.

There were two reasons for such a setting. First, the harm of misidentification to the model deserved more attention than the positive influence of correct identification. Second, low risk samples misidentified as high risk samples would be directly output from MhNet without any chance of correction, which would seriously affect the accuracy of the overall model.

Table 2
Comparison of different classifiers in layer 1.

| Method | Number of subjects with correct risk identification | Number of subjects with incorrect risk identification |
|---|---|---|
| MLP | 12 (60%) | 8 (40%) |
| LSTM | 4 (40%) | 6 (60%) |
| CNN | 4 (33%) | 8 (67%) |
| DG-DANN | 10 (45%) | 12 (55%) |

*Configurations in layer 1 were kept fixed for all models during training, e.g. batch size, learning rate, optimizer type, iterations, L2 regularization parameter.

3.5. Layer 2: domain generalization on domain adversarial neural network with multi-scale spatio-temporal convolution (MS-DG-DANN)

In the processing of physiological signals, the feature distribution of each subject is different. Therefore, how to eliminate individual differences and train a classification model with outstanding generalization performance across subjects has become a challenging issue in this field. Ma et al. [33] proposed the network structure of DG-DANN, in which a gradient reversal layer was used to achieve the adversarial effect between the domain classifier and the label classifier, eliminating the individual differences of each subject in the dataset.

Multi-scale Spatial and Temporal Features (MS): Bio-signals, including plantar pressure, record a biological event in both space and time. Through multi-scale convolution, many studies have achieved excellent performance in both classification and segmentation [34]. We made full use of the physiological significance of continuous plantar pressure data, as shown in Fig. 4. In addition, DG-DANN was improved to MS-DG-DANN by extracting the multi-scale spatial and temporal features of the plantar pressure signal at its input. In convolution, 8 × 4, 1 × 20 and 16 × 2 convolution kernels were used. The step size of the 8 × 4 rectangular convolution kernels was set as 8 to extract the respective spatial features of the left and right feet, namely the correlation features between the 8 channels of the left and of the right foot. The step size of the 1 × 20 one-dimensional convolution kernels was set as 1 to extract the timing features of each sensor channel. The 16 × 2 rectangular convolution kernel could extract the correlation in the space of the two feet. Feature extraction at these three kernel sizes was relatively comprehensive, ensuring the capture of effective information.

MS-DG-DANN: The network structure of MS-DG-DANN is shown in Fig. 5. The spatio-temporal features obtained from the plantar pressure data by the parallel three-layer feature extractor were input into two subnetworks, the Label Classifier and the Domain Classifier. The Label Classifier predicted the fall risk, and the Domain Classifier identified which subject the plantar pressure data came from. The two classifiers gave two loss functions, denoted as $L_y$ and $L_d$, respectively.

$$L_y(p, q) = -p(l)\log q(l) - p(h)\log q(h) \qquad (3)$$

where p(l) and p(h) referred to the low and high risk labels, and q(l) and q(h) referred to the predicted values of the model.

A gradient reversal layer was added before the domain classifier, so that the gradient was reversed during backpropagation and the model could not distinguish which subject the plantar pressure data came from. The loss function of layer 2 was expressed as follows:

$$L_2 = L_y(p, q) + L_d(p, q) \qquad (4)$$

Finally, training the whole model made the loss decline, which brought two trends. One was to increase $L_d$, that is, to confuse the domain distributions between samples; the other was to reduce $L_y$ and improve the accuracy of risk classification. Through this adversarial training, the model could achieve a cross-subject effect to a certain extent by reducing the possibility that differences in subjects’ domain distributions could be recognized.

In the last step of the spatio-temporal feature extraction process (MS), the flattened vector lengths at the three different scales were 480, 528 and 1088, respectively. For confusing the domain distribution, the last layer of each MS scale was first fully connected to a layer with 200 nodes, and the results were then concatenated to obtain a 600-node layer. For label classification, the last layer from MS was first fully connected with two different layers. In detail, the three vectors with lengths of 480, 528 and 1088 were connected to fully-connected layers with 200, 300 and 500 nodes, respectively. Then, they were connected to three 100-dimension vectors. Finally, the three 100-dimension vectors were concatenated to obtain a 300-node layer, followed by a 100-node layer. Therefore, in addition to three kinds of features that increased physiological interpretability, MhNet also obtained 100 deep features in the last layer before classification.

Fig. 4. Schematic diagram of spatial and temporal features of two feet.
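As a numerical illustration of Eqs. (3) and (4), the following minimal sketch evaluates the label loss and the combined layer 2 loss for a single sample. It is not the authors' implementation: the form of $L_d$ is not given in this section, so a cross-entropy over subject identities is assumed here, and the gradient reversal step is not modelled. All names and numbers are hypothetical.

```python
import math


def label_loss(p, q, eps=1e-12):
    """Eq. (3): cross-entropy over the low (l) and high (h) risk classes.

    p holds the true label probabilities, q the predicted probabilities.
    """
    return -sum(p[k] * math.log(max(q[k], eps)) for k in ("l", "h"))


def domain_loss(p_dom, q_dom, eps=1e-12):
    """Assumed form of L_d: cross-entropy over subject (domain) identities."""
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p_dom, q_dom))


def layer2_loss(p, q, p_dom, q_dom):
    """Eq. (4): L_2 = L_y + L_d."""
    return label_loss(p, q) + domain_loss(p_dom, q_dom)


# Hypothetical sample: truly high risk, predicted high risk with probability 0.8
p = {"l": 0.0, "h": 1.0}
q = {"l": 0.2, "h": 0.8}

# Three hypothetical training subjects; the sample comes from subject 0 and the
# domain classifier is maximally confused (uniform output).
p_dom = [1.0, 0.0, 0.0]
q_dom = [1 / 3, 1 / 3, 1 / 3]

total = layer2_loss(p, q, p_dom, q_dom)
print(round(label_loss(p, q), 4), round(domain_loss(p_dom, q_dom), 4), round(total, 4))
```

With a uniform domain prediction over three subjects, the assumed $L_d$ equals $\log 3$, which is the value a fully confused domain classifier would produce in this toy setting.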


Fig. 5. MS-DG-DANN network architecture. It includes a multi-scale feature extraction module, a label classifier and a domain classifier, aiming to extract features with physiological significance and to reduce the individual differences between the source domain and the target domain. In the training phase, both the label classifier and the domain classifier would be updated through backpropagation. In the testing phase, only the updated label classifier would be used to assess samples.
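The source/target split mentioned in the caption follows the Leave-One-Subject-Out scheme of Section 3.7: each subject in turn is the target domain, and the remaining subjects are pooled, shuffled and divided 80%/20% into training and validation sets. A minimal sketch of that split is given below; the function name, the data layout and the toy data are assumptions made for illustration only.

```python
import random


def loso_splits(samples_by_subject, val_fraction=0.2, seed=0):
    """Yield (test_subject, train, val, test) with one subject held out per fold."""
    rng = random.Random(seed)
    for test_subject in sorted(samples_by_subject):
        # Target domain: the held-out subject, kept in its original (time) order
        test = list(samples_by_subject[test_subject])
        # Source domain: samples of all other subjects, pooled and shuffled
        pooled = [
            sample
            for subject, items in samples_by_subject.items()
            if subject != test_subject
            for sample in items
        ]
        rng.shuffle(pooled)
        n_val = int(round(len(pooled) * val_fraction))
        val, train = pooled[:n_val], pooled[n_val:]
        yield test_subject, train, val, test


# Hypothetical toy data: 4 subjects with 5 samples each, stored as (subject, index)
data = {f"S{i}": [(f"S{i}", k) for k in range(5)] for i in range(4)}
folds = list(loso_splits(data))
for test_subject, train, val, test in folds:
    print(test_subject, len(train), len(val), len(test))
```

Because the held-out subject never contributes to the training or validation pool, no sample of the test subject can leak into training, which is the property the paper relies on when comparing against hold-out validation.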

3.6. Voting mechanism

Samples in different segments were voted on during the test with different thresholds. Since not all gaits of the elderly at risk of falling were abnormal, voting in segments was performed under thresholds of 10–70% in order to explore the optimal threshold for identifying high risk samples. The threshold of 10% indicated that all samples in the segment would be identified as high risk when at least 10% of the samples in the segment were classified as high risk during the test, and likewise for 20–70%. As shown in Fig. 2, it should be noted that the voting samples were all arranged and tested in time order, which was the same as in real daily monitoring.

In addition, in order to prevent deviation of results, there would be no overlap between segments. The samples of each segment went through layer 1 and the output was one of two fall risk assessment results, high risk or low risk/borderline. At the end of layer 1, samples from segments that were identified as low risk and borderline would enter layer 2, while the rest of the samples, which were identified as high risk, would be directly output. At this point, some samples in a segment already had identification results (samples identified as high risk in layer 1), and the remaining samples entering layer 2 would also get identification results. At that stage, all the samples arranged in time order in a segment had obtained their respective fall risk assessment results, and the segment could then be voted on according to the threshold of 10%, 20%, 30%, 40%, 50%, 60% or 70%, and the segment accuracy of that subject could be obtained.

3.7. Leave-One-Subject-Out cross-validation

For the purpose of ensuring a certain accuracy of the classification model when new samples were input, Leave-One-Subject-Out cross-validation (LOSO-CV) was used for the learning of the two layers. Each subject represented an independent domain and served as the data in the target domain (each subject took a turn as the test set), while all other data were combined and used as the source domain (training set). In this way, these data could be utilized to the maximum extent, and the randomness of data sets and results caused by the hold-out method could be avoided, as well as data leakage. Compared with the hold-out method and other random-selection validation methods, LOSO-CV is more appropriate in biomedical signal processing, especially where commercial production and cross-subject generalization ability are needed. Also, the data of the remaining subjects were randomly shuffled and divided into a training set and a validation set at a proportion of 80%–20% during the training phase.

In layer 1, all samples in the training set were used to train a biased CNN. All samples of the remaining one subject were arranged according to the order of gaits and divided into segments of different sizes of 3, 5, 7, 9 and 11 samples. In layer 2, borderline samples used their original labels, namely high or low risk based on the Berg score, which was the main labelling method. The real-time output of MhNet was obtained by averaging the accuracy obtained by LOSO-CV over all 48 subjects. The accuracy of the overall model was calculated jointly from the outputs of layer 1 and layer 2 in time order. Actually, the accuracy corresponding to each segment size represented the real-time accuracy under the


number of gaits, which was consistent with segment size.

4. Result and discussion

In this section, the exploration of the optimal bias factor t in the loss equation in layer 1 was described first, because the study found that the overall result of MhNet was related to t. Then, the differences between DG-DANN and MS-DG-DANN were compared. Finally, the real-time accuracy and other performance were tested by different methods to verify the effect of real-time fall risk assessment. The above experimental results confirmed the generality of this method and its generalization performance across subjects.

4.1. Optimal t result in layer 1

Fig. 6 shows the number of subjects with correct and incorrect fall risk identification of the CNN corresponding to different values of t within 1–20. One subject could be regarded as a correct one or an incorrect one in layer 1 only if all samples of the subject were correctly or incorrectly identified. Fig. 7 shows the comprehensive performance score after the weighted sum of correct numbers $N_c$ and wrong numbers $N_{ic}$. $S_w$ was the selection index of the optimal t. The CNN had the best performance score of 10.5 at layer 1 when t = 19.

According to the mechanism of the bias factor t in the loss function, the ideal result should be that when t > 1, the weighted sum score should be higher than that when t = 1. In fact, we could see from Fig. 7 that, although not all t would increase the weighted sum score in layer 1, most values of t in Fig. 6 would reduce the number of subjects with incorrect risk identification in layer 1. This indicated that introducing the bias factor t into the loss function helped to reduce the probability of misidentification in layer 1, which would affect the performance of the whole MhNet.

Fig. 6. CNN performance with different t in layer 1: t was the bias factor in the loss of layer 1.

4.2. Performance of multi-scale spatio-temporal features

Multi-scale Spatio-Temporal Features were used to improve DG-DANN, so the role of MS in fall risk identification was analyzed by comparing the performance of MS-DG-DANN and DG-DANN by LOSO-CV.

Based on all 48 subjects, the results were averaged at the subject level. As shown in Fig. 8, MS-DG-DANN performed better in accuracy and F1 score, which indicated that MS was helpful for MhNet to improve accuracy and model quality. However, both methods had poor sensitivity, showing the necessity and urgency of identifying high risk samples.

The box plot from DG-DANN and MS-DG-DANN was also shown in Fig. 8. The median of DG-DANN was slightly higher than that of MS-DG-DANN, while the upper limit and lower limit of DG-DANN were significantly lower than those of MS-DG-DANN, indicating that the overall performance of MS-DG-DANN was better than that of DG-DANN. Therefore, compared with traditional feature extraction methods, multi-scale spatio-temporal features (MS) could more easily extract comprehensive information from plantar pressure that benefitted the fall risk assessment.

4.3. Real-time performance of MhNet

To compare all these models fairly, we tested several parameters and selected those which obtained the highest accuracy on the test set. The parameters of all methods used in the task of assessing fall risk are shown in Table 3. In addition, we applied dropout (rate = 0.5) before the output layer, batch normalization after the convolutional layer and suitable L2 regularization parameters in order to mitigate over-fitting. All the experiments in this paper were implemented with the PyTorch framework, version 1.4.0, on GTX 1080Ti/RTX 2080Ti GPUs.

Accuracy and F1 score: The real-time segment accuracy under each segment size is shown in Table 4. Compared with deep learning methods such as LSTM, VggNet-16, AlexNet and DG-DANN, MhNet had a great improvement in accuracy and F1 score. When the segment size was 3 samples, MhNet led the other methods in accuracy by up to 11.94 percentage points at the threshold of 50%. In the same way, MhNet led the other methods in accuracy by up to 13.63, 13.24, 12.81 and 13.28 percentage points when the segment size was 5, 7, 9 and 11 samples, respectively.

The highest accuracy was 73.27% when the voting segment size was 7 samples, and the highest accuracy was also achieved by MhNet in all segment sizes under all voting thresholds. That was to say, only 9 gaits were needed to achieve the accuracy of 73.27% by using MhNet, which was a significant improvement in the real-time performance of plantar pressure monitoring. However, RNN networks such as LSTM, which extracted temporal sequence context information, had low accuracy in all segment sizes and all thresholds when crossing subjects and therefore could not be applied to actual scenes. The traditional convolutional networks such as AlexNet and VggNet-16 performed better than the RNN network, indicating that convolutional information was more efficient than temporal sequence context information with such a small number of samples.

By comparing the performance of MhNet under different segment sizes, it could be seen that when the segment size was 5, 7, 9 and 11 samples, the best identification accuracies were all higher than 72.9%. It was important to note that with the increase of segment sizes, the accuracy of each classifier did not increase in positive correlation, but decreased slightly when segment sizes were 9 and 11 samples. Because the samples in the segment were arranged according to time order, more samples increased the probability of introducing more noise. We found that the segment sizes of 3, 5 and 7 samples were capable of providing enough information and suited the voting mechanism in time order.

By comparing the performance of different methods under different voting thresholds, it was shown that 10% was not a good choice for fall risk assessment, for the accuracy of all methods under the threshold of 10% was the lowest. On the contrary, the threshold of 50% was the best in all classifiers except AlexNet. AlexNet performed the best under the threshold of 40% in segment sizes of 3, 5, 7 and 9 samples. In the segment of 11 samples, AlexNet performed better under the threshold of 50%. Other methods performed the best under the threshold of 50% in all segment sizes. Based on the analysis of the above results, 50% was the most appropriate threshold for segment voting.

The F1 score of MhNet was superior to other methods, indicating that the proposed method had a better quality. With the increase of thresholds, the F1 score decreased slightly in all methods, indicating that the threshold needed to be set in an appropriate range to keep the model balanced. The closest method was VggNet-16, which was ahead of other methods except MhNet in both accuracy and F1 score, revealing that the depth of the network and the method of feature extraction had a certain


Fig. 7. Weighted sum score with different t: t was the biased factor in loss of layer 1.
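The weighted sum score plotted in Fig. 7 is the $S_w$ of Eq. (2), and the optimal t is the value that maximizes it. A minimal sketch of that selection step is shown below; the subject counts are invented purely for illustration and are not read from Fig. 6.

```python
def weighted_score(n_correct, n_incorrect, w_correct=1.0, w_incorrect=-1.5):
    """Eq. (2): S_w = -1.5 * N_ic + 1 * N_c."""
    return w_incorrect * n_incorrect + w_correct * n_correct


def best_t(counts_by_t):
    """Return the bias factor t with the highest weighted score.

    counts_by_t maps t -> (N_c, N_ic).
    """
    return max(counts_by_t, key=lambda t: weighted_score(*counts_by_t[t]))


# Hypothetical (N_c, N_ic) pairs for three values of t
counts = {1: (10, 4), 10: (12, 3), 19: (14, 2)}
scores = {t: weighted_score(n_c, n_ic) for t, (n_c, n_ic) in counts.items()}
print(scores, best_t(counts))
```

Because an incorrect subject costs 1.5 while a correct subject gains only 1, a value of t that removes one misidentified subject is preferred over one that merely adds one correctly identified subject.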

Fig. 8. Performance for LOSO accuracies in MS-DG-DANN and DG-DANN.

effect on the identification accuracy and overall performance.

Table 3
Parameters of different methods in training.
Method       Learning Rate           L2 Regularization Parameter
AlexNet      $1 \times 10^{-2}$      0.02
LSTM         $1 \times 10^{-2}$      0.02
VggNet-16    $5 \times 10^{-3}$      0.01
DG-DANN      $1 \times 10^{-3}$      0.01
MhNet        $1 \times 10^{-3}$      0.01
*Other configurations were kept fixed in all methods when training, e.g. batch size = 32, optimizer type = Adam optimizer, epochs = 50, early stop was set.

Sensitivity and Specificity: Sensitivity showed the ability to identify high risk subjects, which was the key index in the fall risk assessment domain. Sensitivity and specificity could be seen in Fig. 9 and Table 5. Fig. 9 shows the confusion matrix of each classifier with a segment size of 7 samples under the threshold of 50%, for MhNet achieved the best performance in this setting. Table 5 shows the trend and general situation in sensitivity. MhNet performed well in sensitivity, which was nearly 77%, indicating that MhNet was adequate in identifying high risk samples, which was significant in daily monitoring for elderly people. In Table 5, all classifiers performed better in segment sizes of 7 samples and 9 samples. Moreover, traditional methods had poor performance in sensitivity in all segment sizes except VggNet-16, which showed that their ability to identify high risk samples was insufficient.

In Fig. 9, when 7 samples were used as the voting segment size, AlexNet, LSTM and DG-DANN tended to judge high risk samples as low risk samples, because their sensitivities were lower than 60% and their specificities were high. VggNet-16 had a more balanced recognition ability and MhNet had the best identification performance, which could be seen from the proportion of TP and TN in the confusion matrix. LSTM and VggNet-16 showed low accuracy in identifying low risk samples, as seen in Fig. 9 (b) and (c). On the contrary, MhNet, AlexNet and DG-DANN had better performance in specificity. In summary, it could be seen that MhNet was more sensitive to high risk samples and was more suitable for the use scenarios of elderly people falling. In addition, MhNet had an absolute leading advantage in sensitivity and specificity, demonstrating


Table 4
Real-time performance of different methods.
Method (t = 19) AlexNet LSTM VggNet-16 DG-DANN MhNet (Ours)

Segment size/Threshold Accuracy (%) F1 Score (%) Accuracy (%) F1 Score (%) Accuracy (%) F1 Score (%) Accuracy (%) F1 Score (%) Accuracy (%) F1 Score (%)

Segment size = 3
10% 61.60 61.95 57.19 60.98 62.83 66.26 61.70 62.36 69.03 71.02
20% 61.60 61.95 57.19 60.98 62.83 66.26 61.70 62.36 69.03 71.02
30% 61.60 61.95 57.19 60.98 62.83 66.26 61.70 62.36 69.03 71.02
40% 60.91 56.69 60.18 57.84 64.83 63.93 62.41 57.76 72.12 72.02
50% 60.91 56.69 60.18 57.84 64.83 63.93 62.41 57.76 72.12 72.02
60% 60.91 56.69 60.18 57.84 64.83 63.93 62.41 57.76 72.12 72.02
70% 60.19 51.30 53.93 47.78 64.01 56.87 59.60 49.22 69.25 67.23
Segment size = 5
10% 61.33 63.21 56.08 61.92 61.40 66.53 61.07 63.50 66.93 69.82
20% 61.33 63.21 56.08 61.92 61.40 66.53 61.07 63.50 66.93 69.82
30% 61.57 59.96 59.12 60.39 64.17 66.08 62.31 61.28 71.65 72.60
40% 61.57 59.96 59.12 60.39 64.17 66.08 62.31 61.28 71.65 72.60
50% 61.16 56.53 59.33 56.99 65.28 64.31 62.75 58.03 72.96 72.94
60% 61.16 56.53 59.33 56.99 65.28 64.31 62.75 58.03 72.96 72.94
70% 60.77 53.86 56.20 52.09 64.96 61.16 60.16 52.16 71.93 70.60
Segment size = 7
10% 61.01 64.10 55.82 62.80 59.94 66.26 59.82 63.68 65.54 69.13
20% 61.80 61.38 57.87 61.04 63.26 66.72 62.72 63.09 70.45 72.00
30% 61.74 59.13 59.17 59.25 64.56 65.95 63.10 60.65 72.62 73.19
40% 61.74 59.13 59.17 59.25 64.56 65.95 63.10 60.65 72.62 73.19
50% 61.36 56.87 60.03 58.17 65.96 65.00 63.21 58.15 73.27 73.17
60% 60.67 54.30 59.98 54.89 65.70 62.07 59.77 52.29 71.88 71.18
70% 60.67 54.30 59.98 54.89 65.70 62.07 59.77 52.29 71.88 71.18
Segment size = 9
10% 61.20 64.96 54.62 62.56 58.94 66.40 58.14 63.18 64.16 68.28
20% 61.18 61.94 57.87 62.22 63.15 67.59 62.28 63.54 69.45 71.49
30% 61.97 60.46 58.21 60.18 63.88 66.57 63.04 62.17 71.94 72.99
40% 61.97 60.46 58.21 60.18 63.88 66.57 63.04 62.17 71.94 72.99
50% 61.32 56.36 60.23 57.74 65.59 64.44 63.33 58.44 73.04 73.09
60% 61.06 55.00 59.84 56.52 65.35 62.65 60.81 54.15 72.60 72.13
70% 60.02 52.77 59.56 53.30 64.95 58.95 58.56 50.20 71.11 69.80
Segment size = 11
10% 61.64 62.94 57.22 62.38 62.21 67.15 62.43 64.13 68.20 70.79
20% 62.43 62.20 57.85 60.66 63.15 66.53 62.36 60.60 70.67 72.17
30% 63.19 61.25 58.72 59.66 62.98 65.37 62.32 59.13 72.48 73.10
40% 61.40 57.97 59.23 58.47 64.04 64.57 62.50 58.21 72.94 73.27
50% 61.41 56.42 59.68 57.73 65.86 65.30 62.84 57.66 72.96 72.95
60% 60.79 54.71 59.37 56.49 65.47 62.44 60.65 54.00 72.89 72.52
70% 60.42 53.36 59.17 54.39 65.38 60.87 58.64 50.11 72.52 71.78
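Each row of Table 4 comes from the segment voting rule of Section 3.6 applied at one threshold. The rule can be sketched as follows, assuming per-sample predictions are already available as 0 (low risk) or 1 (high risk) in time order. The function names, the handling of a trailing incomplete segment and the example predictions are assumptions of this sketch, not details taken from the paper.

```python
def vote_segment(predictions, threshold):
    """Return 1 (high risk) if the share of high risk samples reaches the threshold."""
    share = sum(predictions) / len(predictions)
    return 1 if share >= threshold else 0


def segment_votes(sample_predictions, segment_size, threshold):
    """Vote on consecutive, non-overlapping segments of time-ordered predictions.

    A trailing incomplete segment is dropped (an assumption of this sketch).
    """
    votes = []
    last_start = len(sample_predictions) - segment_size
    for start in range(0, last_start + 1, segment_size):
        segment = sample_predictions[start:start + segment_size]
        votes.append(vote_segment(segment, threshold))
    return votes


# Hypothetical per-sample outputs for one subject (1 = high risk)
preds = [0, 1, 0, 0, 0, 0, 1, 1, 0]
votes_10 = segment_votes(preds, 3, 0.10)
votes_50 = segment_votes(preds, 3, 0.50)
votes_70 = segment_votes(preds, 3, 0.70)
print(votes_10, votes_50, votes_70)
```

Under this rule, a 3-sample segment needs at least one high risk sample at thresholds of 10–30%, at least two at 40–60% and all three at 70%, which is consistent with the identical rows within those threshold groups for segment size 3 in Table 4.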

Fig. 9. Confusion matrices of 5 classifiers in segment size of 7 samples, (a) AlexNet, (b) LSTM, (c) VggNet-16, (d) DG-DANN, (e) MhNet. TP: Number of high risk
samples that were correctly predicted, TN: Number of low risk samples that were correctly predicted, FN: Number of high risk samples that were wrongly predicted,
FP: Number of low risk samples that were wrongly predicted.
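Using the TP, TN, FN and FP definitions in the caption, the metrics reported in this section follow from the standard formulas. The sketch below computes them for one confusion matrix; the counts are hypothetical and are not taken from Fig. 9, and the function assumes every denominator is non-zero.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Compute standard binary metrics, with high risk as the positive class."""
    sensitivity = tp / (tp + fn)   # share of high risk samples found
    specificity = tn / (tn + fp)   # share of low risk samples found
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for illustration only
m = confusion_metrics(tp=76, tn=70, fp=30, fn=24)
print({name: round(value, 4) for name, value in m.items()})
```

A classifier that labels most samples as low risk can reach a high specificity while its sensitivity stays low, which is the pattern the text describes for AlexNet, LSTM and DG-DANN.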

that MhNet was a superior and more balanced method.

Table 5
Performance of sensitivity between different methods in threshold of 50% (%).
Method               AlexNet   LSTM    VggNet-16   DG-DANN   MhNet (ours)
Segment size = 3     53.61     57.26   65.32       53.86     75.20
Segment size = 5     52.93     56.47   65.54       53.97     76.38
Segment size = 7     53.38     58.24   66.25       53.58     76.41
Segment size = 9     52.34     56.95   65.35       54.03     76.72
Segment size = 11    52.34     57.71   67.31       53.02     76.41

Accuracy distribution of various subjects: To further understand the classification accuracy and distribution of different models of each method, a box diagram was drawn and the results are shown in Fig. 10. It could be seen from Table 4 that MhNet had the best performance in segment size of 7 samples, so the comparison between different classifiers was only done in segment size of 7 samples and threshold of 50%. As shown in Fig. 10, the median and lower limit of MhNet were far greater than those of other methods. Moreover, AlexNet, LSTM and DG-DANN had poor performance in lower limit, indicating that the identification accuracy of these classifiers was widely spread across subjects, so that they were more prone to random behaviour in daily usage than other classifiers. Similarly, the values at the top of the box represented the p-value of each two methods and also showed the differences between different methods. In summary, it could be seen that MhNet had more concentrated accuracy in identifying and judging the risk of falling when facing unknown subjects, and was also more reliable in daily application.

ROC Curve and AUC Score: The ROC curve and AUC of each


classifier in segment size of 7 samples are shown in Fig. 11, and the balance of each method is further analyzed. When the segment size was set as 7 samples, the ROC of MhNet was significantly higher than that of other methods, and its AUC also performed well, reaching 77.30%, which was more than 5.39% higher than that of the other methods, with VggNet-16 coming second. This indicated that MhNet was excellent and stable enough. The results showed that MhNet was obviously superior to other methods and was suitable for real-time detection and early warning of falls in the elderly.

Fig. 10. Box plots for LOSO accuracies in different classifiers in segment size of 7 samples and threshold of 50%. The values at the top refer to p-value.

Fig. 11. ROC curve in 5 methods (segment size = 7 samples, threshold = 50%). ROC curves reveal the difference between each model. AUC is the area under the ROC curve, reflecting the performance of each model numerically.

Strengths of MhNet were summarized as follows.

1) MhNet achieved an adequate accuracy of fall risk assessment under Leave-One-Subject-Out validation for real-time applications, verifying its adaptability to cross-subject situations in dealing with unknown samples. More importantly, MhNet improved the real-time risk assessment, requiring only 9 gaits to reach the accuracy of 73.27%. Under hold-out validation, multi-task learning [26] was used to achieve an accuracy of 70%. Under LOSO-CV, Bi-LSTM [21] was constructed to achieve an overall voting accuracy of 86%, but it needed all the IMU data from persons with multiple sclerosis (PwMS) (at least 50 gaits), which was slightly inadequate in real-time applications.
2) Sensitivity of MhNet was over 76%, indicating the ability of MhNet to identify high risk samples, which were significant in health monitoring. Also, MhNet’s sensitivity was at least 10% higher than traditional methods.
3) MhNet used a physiologically interpretable feature extraction method to extract multi-scale deep features with spatio-temporal information from plantar pressure, which provided physiological meaning for deep features. However, other related works only used the deep features of the model, or constructed feature vectors in advance for training through feature engineering, lacking interpretability exploration of deep features.
4) MhNet reduced the individual difference between source and target domain, improving the ability to assess fall risk across subjects. By comparison, a CNN [20] was built to carry out 5-fold validation based on acceleration data. In Ref. [30], the authors used the same plantar pressure data as in this paper to classify fall risk in a non-cross-subject situation. Both of them were better than MhNet in accuracy but neither of them used a cross-subject validation method, so problems caused by information leakage might exist.

Weaknesses of MhNet were summarized as follows. 1) The accuracy of MhNet needs to be further improved. 2) The self-built data set is small and needs to be expanded.

4.4. Performance of MhNet in different data length

In order to explore the assessment performance of each model under different lengths of gait data, we added an experiment and compared the results under sample lengths of 3 gaits, 5 gaits and 7 gaits when segment size = 7 (voting under 7 samples) and threshold = 50%. The optimal t value was used as the bias factor of layer 1 for data of the three different lengths; the optimal t was 19, 18 and 2 when the gait number was 3, 5 and 7, respectively. In addition, samples with a data length of 3 gaits needed to be overlapped in 2 gaits during preprocessing. In the same way, 5-gait samples and 7-gait samples needed to be overlapped in 3 gaits and 5 gaits during preprocessing to achieve data augmentation.

The results were shown in Table 6, where the accuracy of MhNet was the highest in all gait numbers, and the results of 5-gait length and 7-gait length were inferior to those of 3-gait length. The accuracy of MhNet with inputs of 3, 5 and 7 gaits was 73.27%, 70.8% and 69.0%, respectively. For every method except DG-DANN, the accuracy was also highest with 3-gait samples.

The reason for this might be that the increase of the gait numbers of a single sample led to the decrease of the total data amount. For the deep network, a smaller data amount meant an increased possibility of over-fitting, leading to a slight decline in the accuracy under LOSO. Thus, in the experiment comparing different input lengths of a single sample, as shown in Table 6, it could be seen that data preprocessing with three gaits as a sample length was the best, which was more suitable for subsequent feature extraction of the fall risk assessment task.

Table 6
Real-time performance of different data length.
Method (Threshold = 50%)   Metric          Gait Number/Sample
                                           3 (t = 19)   5 (t = 18)   7 (t = 2)
AlexNet                    Accuracy (%)    61.4         60.6         61.0
                           F1 Score (%)    56.9         60.5         61.0
LSTM                       Accuracy (%)    60.0         54.7         56.9
                           F1 Score (%)    58.2         54.6         56.9
VggNet-16                  Accuracy (%)    66.0         64.0         61.5
                           F1 Score (%)    65.0         63.5         61.1
DG-DANN                    Accuracy (%)    63.2         65.9         64.6
                           F1 Score (%)    58.2         65.8         64.5
MhNet (Ours)               Accuracy (%)    73.3         70.8         69.0
                           F1 Score (%)    73.2         70.8         69.0

4.5. Limitation

There are undoubtedly some limitations in this work. We had confirmed the evaluability of fall risk in flat walking, but we did not consider the generalization ability of the model across all the kinds of gait variation in real daily life. Therefore, we will try to expand the dataset and add multiple types of gaits. In addition, we need to fully use borderline


as the third class that should be assessed in daily life, instead of only regarding it as the transitional samples in layer 1. In order to figure out which sensor contributes most to the final output, we will construct a model that can automatically choose the most valuable channel of plantar pressure in the future.

5. Conclusion

In this paper, we studied fall risk assessment based on the plantar pressure data of the elderly in a cross-subject situation, and proposed MhNet, a hierarchical cross-subject fall risk assessment network that could deal with unknown samples in practical application. Compared with traditional methods, MhNet showed a great improvement in accuracy and sensitivity (higher than 72% in accuracy and higher than 75.20% in sensitivity), revealing an acceptable reliability in real-time fall risk assessment using only 9 gaits. MhNet has potential and application prospects in identifying high fall risk for the elderly. In future work, these four directions are planned to be explored: 1) Innovation of the domain adaptation module in response to complex individual differences; 2) Construction of a multi-channel feature extraction method based on prior knowledge of physiological data; 3) Improvement of the generalization performance of models trained from a small dataset; 4) Study of the physiological or biomechanical interpretability of deep models in the hope of obtaining high performance models that are more suitable for daily life.

Data availability statement

Research data are not shared.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was supported in part by the Technology Program of Guangzhou under Grant 202002030354 and Grant 202002030262, in part by the Science and Technology Project of Zhongshan under Grant 2019AG024 and Grant 2020B2053, in part by the Natural Science Foundation of Guangdong Province under Grant 2018A030310407, in part by the Guangzhou Key Laboratory of Body Data Science under Grant 201605030011, in part by the Major Science and Technology Projects in Guangdong Province under Grant 2016B010108008, and in part by National Key Research and Development Project (2020YFC2005700).

References

[8] T. Hortobágyi, P. Rider, A.H. Gruber, P. DeVita, Age and muscle strength mediate the age-related biomechanical plasticity of gait, Eur. J. Appl. Physiol. 116 (4) (2016) 805–814, https://doi.org/10.1007/s00421-015-3312-8.
[9] M.D. Eriksen, N. Greenhalgh-Stanley, G.V. Engelhardt, Home safety, accessibility, and elderly health: evidence from falls, J. Urban Econ. 87 (2013) 14–24, https://doi.org/10.2139/ssrn.2344916.
[10] R. Alkhatib, M.O. Diab, C. Corbier, M. El Badaoui, Machine learning algorithm for gait analysis and classification on early detection of Parkinson, IEEE Sensors Lett. 4 (6) (2020) 1–4, https://doi.org/10.1109/LSENS.2020.2994938.
[11] R. Ramesh, L. Irene, J. Tzyy-Ping, Fall prediction and prevention systems: recent trends, challenges, and future research directions, Sensors 17 (11) (2017) 2509, https://doi.org/10.3390/s17112509.
[12] S.J.M. Bamberg, A.Y. Benbasat, D.M. Scarborough, D.E. Krebs, J.A. Paradiso, Gait analysis using a shoe-integrated wireless sensor system, IEEE Trans. Inf. Technol. Biomed. 12 (4) (2008) 413–423, https://doi.org/10.1109/titb.2007.899493.
[13] S.J. Preece, J.Y. Goulermas, L.P. Kenney, D. Howard, A comparison of feature extraction methods for the classification of dynamic activities from accelerometer data, IEEE (Inst. Electr. Electron. Eng.) Trans. Biomed. Eng. 56 (3) (2008) 871–879, https://doi.org/10.1109/TBME.2008.2006190.
[14] Q. Zhang, Y.L. Wang, Y. Xia, X. Wu, T.V. Kirk, X.D. Chen, A low-cost and highly integrated sensing insole for plantar pressure measurement, Sens. Bio-Sens. Res. 26 (2019) 100298, https://doi.org/10.1016/j.sbsr.2019.100298.
[15] Z. Li, B. Zhang, K. Li, T. Zhang, X. Yang, A wide linearity range and high sensitivity flexible pressure sensor with hierarchical microstructures via laser marking, J. Mater. Chem. C 8 (9) (2020) 3088–3096, https://doi.org/10.1039/C9TC06352H.
[16] D. Wang, J. Ouyang, P. Zhou, J. Yan, L. Shu, X. Xu, A novel low-cost wireless footwear system for monitoring diabetic foot patients, IEEE Trans. Biomed. Circ. Syst. 15 (1) (2020) 43–54, https://doi.org/10.1109/TBCAS.2020.3043538.
[17] J. Howcroft, E.D. Lemaire, J. Kofman, Prospective elderly fall prediction by older-adult fall-risk modeling with feature selection, Biomed. Signal Process Control 43 (2018) 320–328, https://doi.org/10.1016/j.bspc.2018.03.005.
[18] J.R. Silva, I. Sousa, J.S. Cardoso, Fusion of clinical, self-reported, and multisensor data for predicting falls, IEEE J. Biomed. Health Inf. 24 (1) (2019) 50–56, https://doi.org/10.1109/JBHI.2019.2951230.
[19] J. Silva, J. Madureira, C. Tonelo, D. Baltazar, C. Silva, A. Martins, I. Sousa, Comparing machine learning approaches for fall risk assessment, in: International Conference on Bio-inspired Systems and Signal Processing 5, 2017, pp. 223–230, https://doi.org/10.5220/0006227802230230.
[20] Q. Zhang, S. Zhu, Real-time activity and fall risk detection for aging population using deep learning, in: 2018 9th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference, 2018, pp. 1055–1059, https://doi.org/10.1109/UEMCON.2018.8796672.
[21] B.M. Meyer, L.J. Tulipani, R.D. Gurchiek, D.A. Allen, L. Adamowicz, D. Larie, R.S. McGinnis, Wearables and deep learning classify fall risk from gait in multiple sclerosis, IEEE J. Biomed. Health Inf. 25 (5) (2020) 1824–1831, https://doi.org/10.1109/JBHI.2020.3025049.
[22] M.T. Martinez, P.D. Leon, Falls risk classification of older adults using deep neural networks and transfer learning, IEEE J. Biomed. Health Inf. 24 (1) (2019) 144–150, https://doi.org/10.1109/JBHI.2019.2906499.
[23] C. Tunca, G. Salur, C. Ersoy, Deep learning for fall risk assessment with inertial sensors: utilizing domain knowledge in spatio-temporal gait parameters, IEEE J. Biomed. Health Inf. 24 (7) (2019) 1994–2005, https://doi.org/10.1109/JBHI.2019.2958879.
[24] D. Chahyati, R. Hawari, Fall Detection on Multimodal Dataset Using Convolutional Neural Network and Long Short Term Memory, 2020 International Conference on Advanced Computer Science and Information Systems, 2020, pp. 371–376, https://doi.org/10.1109/ICACSIS51025.2020.9263201.
[25] I. Kiprijanovska, H. Gjoreski, M. Gams, Detection of gait abnormalities for fall risk assessment using wrist-worn inertial sensors and deep learning, Sensors 20 (18) (2020) 5373, https://doi.org/10.3390/s20185373.
[26] A. Nait Aicha, G. Englebienne, K.S. Van Schooten, M. Pijnappels, B. Kröse, Deep learning to predict falls in older adults based on daily-life trunk accelerometry, Sensors 18 (5) (2018) 1654, https://doi.org/10.3390/s18051654.
[27] S. Pardoel, G. Shalin, J. Nantel, E.D. Lemaire, J. Kofman, Early detection of freezing of gait during walking using inertial measurement unit and plantar pressure distribution data, Sensors 21 (6) (2021) 2246, https://doi.org/10.3390/s21062246.
[28] K.J. Merry, E. Macdonald, M. MacPherson, O. Aziz, E. Park, M. Ryan, C.J. Sparrey,
[1] World Health Organization, World Health Statistics, 2018. https://www.who.
Classifying sitting, standing, and walking using plantar force data, Med. Biol. Eng.
int/news-room/fact-sheets/detail/falls/. (Accessed 23 February 2022), 2018.
Comput. 59 (1) (2021) 257–270, https://doi.org/10.1007/s11517-020-02297-4.
[2] A. Ozcan, H. Donat, N. Gelecek, M. Ozdirenc, D. Karadibak, The relationship
[29] M. Suzuki, R. Yamamoto, Y. Ishiguro, H. Sasaki, H. Kotaki, Deep learning
between risk factors for falling and the quality of life in older adults, BMC Publ.
prediction of falls among nursing home residents with Alzheimer’s disease, Geriatr.
Health 5 (1) (2005) 1–6, https://doi.org/10.1186/1471-2458-5-90.
Gerontol. Int. 20 (6) (2020) 589–594, https://doi.org/10.1111/ggi.13920.
[3] Centers for Disease Control and Prevention, Injury Prevention and Control, 2018.
[30] S. Liang, Y. Liu, G. Li, G. Zhao, Elderly fall risk prediction with plantar center of
https://www.cdc.gov/injury/wisqars/. (Accessed 23 February 2022).
force using ConvLSTM algorithm, in: IEEE International Conference on Cyborg and
[4] G. Zhao, L. Chen, H. Ning, Sensor-based fall risk assessment: A survey, Healthcare 9
Bionic Systems, 2019, pp. 36–41, https://doi.org/10.1109/
(11) (2021) 1448, https://doi.org/10.3390/healthcare9111448.
CBS46900.2019.9114487, 2019.
[5] A.F. Ambrose, G. Paul, J.M. Hausdorff, Risk factors for falls among older adults: a
[31] Q. Zhang, Deep Learning of Biomechanical Dynamics in Mobile Daily Activity and
review of the literature, Maturitas 75 (1) (2013) 51–61, https://doi.org/10.1016/j.
Fall Risk Monitoring, IEEE Healthcare Innovations and Point of Care Technologies,
maturitas.2013.02.009.
2019, pp. 21–24, https://doi.org/10.1109/HI-POCT45284.2019.8962763, 2019.
[6] L.Z. Rubenstein, Falls in older people: epidemiology, risk factors and strategies for
[32] Y. Wu, R. Huang, H. Ge, Research on gait detection algorithm based on plantar
prevention, Age Ageing 35 (suppl_2) (2006) 37–41, https://doi.org/10.1093/
pressure, J. Phys. Conf. 1549 (2020), 022068, https://doi.org/10.1088/1742-
ageing/afl084.
6596/1549/2/022068.
[7] M.E. Tinetti, M. Speechley, S.F. Ginter, Risk factors for falls among elderly persons
living in the community, N. Engl. J. Med. 319 (26) (1988) 1701–1707, https://doi.
org/10.1056/NEJM198812293192604.

[33] B.Q. Ma, H. Li, Y. Luo, B.L. Lu, Depersonalized Cross-Subject Vigilance Estimation with Adversarial Domain Generalization, 2019 International Joint Conference on Neural Networks, 2019, pp. 1–8, https://doi.org/10.1109/IJCNN.2019.8852347.
[34] B. Ji, J. Ren, X. Zheng, C. Tan, R. Ji, Y. Zhao, K. Liu, A multi-scale recurrent fully convolution neural network for laryngeal leukoplakia segmentation, Biomed. Signal Process Control 59 (2020) 101913, https://doi.org/10.1016/j.bspc.2020.101913.
[35] L. Shu, K.Y. Mai, X.M. Tao, Y. Li, W.C. Wong, K.F. Lee, C.P. Yuen, Monitoring diabetic patients by novel intelligent footwear system, in: 2012 International Conference on Computerized Healthcare, 2012, pp. 91–94, https://doi.org/10.1109/ICCH.2012.6724478.
[36] P. Melillo, R. Castaldo, G. Sannino, A. Orrico, G. De Pietro, L. Pecchia, Wearable technology and ECG processing for fall risk assessment, prevention and detection, in: 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2015, pp. 7740–7743, https://doi.org/10.1109/EMBC.2015.7320186.
[37] D. Giansanti, G. Maccioni, S. Cesinaro, F. Benvenuti, V. Macellari, Assessment of fall-risk by means of a neural network based on parameters assessed by a wearable device during posturography, Med. Eng. Phys. 30 (3) (2008) 367–372, https://doi.org/10.1016/j.medengphy.2007.04.006.
[38] M. Yağanoğlu, C. Köse, Wearable vibration based computer interaction and communication system for deaf, Appl. Sci. 7 (12) (2017) 1296, https://doi.org/10.3390/app7121296.
[39] J. Hong, Y. Luo, M. Mou, J. Fu, Y. Zhang, W. Xue, F. Zhu, Convolutional neural network-based annotation of bacterial type IV secretion system effectors with enhanced accuracy and reduced false discovery, Briefings Bioinf. 21 (5) (2020) 1825–1836, https://doi.org/10.1093/bib/bbz120.
[40] A. Pinto, G.A. De Assis, L.C. Torres, T. Beltrame, D.M. Domingues, Wearables and detection of falls: a comparison of machine learning methods and sensors positioning, Neural Process. Lett. (2022) 1–15, https://doi.org/10.1007/s11063-021-10724-2.
[41] K. McManus, B.R. Greene, L.G.M. Ader, B. Caulfield, Development of data-driven metrics for balance impairment and fall risk assessment in older adults, IEEE Trans. Biomed. Eng. (2022) 1, https://doi.org/10.1109/TBME.2022.3142617.
[42] J. Hong, Y. Luo, Y. Zhang, J. Ying, W. Xue, T. Xie, T.F. Zhu, Protein functional annotation of simultaneously improved stability, accuracy and false discovery rate achieved by a sequence-based deep learning, Briefings Bioinf. 21 (4) (2020) 1437–1447, https://doi.org/10.1093/bib/bbz081.
[43] M. Yağanoğlu, Real time wearable speech recognition system for deaf persons, Comput. Electr. Eng. 91 (2021) 107026, https://doi.org/10.1016/j.compeleceng.2021.107026.
[44] B.R. Greene, I. Premoli, K. McManus, D. McGrath, B. Caulfield, Predicting fall counts using wearable sensors: a novel digital biomarker for Parkinson’s disease, Sensors 22 (1) (2022) 54, https://doi.org/10.3390/s22010054.
