Tmech 2018

Deep Full-Body Motion Network (DFM-Net) for a Soft Wearable Motion Sensing
Suit
Article in IEEE/ASME Transactions on Mechatronics · October 2018

DOI: 10.1109/TMECH.2018.2874647

IEEE/ASME TRANSACTIONS ON MECHATRONICS 1
Deep Full-Body Motion Network (DFM-Net) for a Soft Wearable Motion Sensing Suit
Dooyoung Kim¹, Junghan Kwon², Seunghyun Han¹, Yong-Lae Park², Sungho Jo¹
Abstract—Soft sensors are becoming more popular in wearables as a means of tracking human body motions due to their high stretchability and easy wearability. However, previous research was not only limited to certain body parts but also showed problems in both calibration and processing of the sensor signals, which are caused by the high nonlinearity and hysteresis of the soft materials and also by misplacement and displacement of the sensors during motion. Although this problem can be alleviated through redundancy by employing an increased number of sensors, this imposes an additional burden of heavy processing and power consumption. Moreover, complete full-body motion tracking has not been achieved yet. Therefore, we propose the use of deep learning for full-body motion sensing, which significantly increases efficiency in calibration of the soft sensors and estimation of the body motions. The sensing suit is made of stretchable fabric and contains 20 soft strain sensors distributed on both the upper and the lower extremities. Three athletic motions were tested with a human subject, and the proposed learning-based calibration and mapping method showed a higher accuracy than traditional methods that are mainly based on mathematical estimation, such as linear regression.

Index Terms—soft sensors, body motion tracking, soft wearables, deep learning

This work was supported in part by the National Research Foundation of Korea (NRF) Grant funded by the Korean Government (MSIT) under Grant NRF-2016R1A5A1938472 and in part by Institute for Information & communications Technology Promotion (IITP) Grant funded by the MSIT (No. 2017-0-01778). Korea Advanced Institute of Science and Technology (Dooyoung Kim and Junghan Kwon contributed equally to this work.) (Corresponding authors: Yong-Lae Park and Sungho Jo)
¹ Dooyoung Kim, Seunghyun Han, and Sungho Jo are with School of Computing, KAIST, Daejeon 305-701, Republic of Korea. {dykim07, shann, shjo}@kaist.ac.kr
² Junghan Kwon and Yong-Lae Park are with Department of Mechanical Engineering, Seoul National University, Seoul, 08826, Republic of Korea. {jhkwon, ylpark}@snu.ac.kr

I. INTRODUCTION

Human motion tracking has been widely used in various research and commercial applications, such as biomechanics study, rehabilitation, three-dimensional (3-D) animation for video games and movies, and virtual/augmented reality (VR/AR) [1–4].

One of the main requirements for motion tracking is to accurately estimate the position and orientation of the joints or the segments of the human body based on the measurements from the sensors. To meet this requirement, soft wearable sensors are becoming popular due to their high stretchability and easy wearability. Soft sensors can be either directly attached to the skin of the wearer or sewn on a garment to be worn, and track the motions of specific body parts, such as ankles, knees, hips, shoulders, and hands, without restricting the natural degrees of freedom (DOF) of the wearer [5–9].

Fig. 1. (a) Prototype of the soft wearable sensing suit for full-body motion tracking. (b) Corresponding 3-D skeleton reconstructed using DFM-Net.

Although previous studies have shown the feasibility of motion sensing of the human body using soft sensors, they have been mostly limited to certain areas, and full-body motion tracking using soft sensors has not been reported yet, to the best of our knowledge [5–7, 10]. In order to implement this, the following challenges should be addressed.

First, soft sensors typically show high nonlinearity and hysteresis in response [11, 12]. Moreover, it is more difficult to estimate body motions from the sensor signals if multiple mechanical stimuli, such as strain and pressure, are applied. Although the pressure effect can be minimized by carefully selecting the locations of the sensors [5], this complicates the design process. It has recently been proposed to solve the problem of nonlinearity and hysteresis of soft sensors using deep learning [13], but that work was limited to a single sensor.

Another challenge is the calibration of a large number of soft sensors integrated into the suit, which makes the calibration process time-consuming and complex. The human body has many joints with a single DOF or multiple DOFs, and each joint requires at least as many sensors as it has DOFs. Although it has been proposed to calibrate multiple sensors using linear regression [6], that approach was limited to a single joint.

Therefore, we propose a deep full-body motion network (DFM-Net) for tracking full-body motions using a soft wearable sensing suit (Fig. 1) in this paper. The suit contains 20 soft microfluidic strain sensors distributed on both the upper and the lower bodies, and they are simultaneously calibrated and processed to estimate the 3-D body motions. A deep neural network for temporal sequence modeling was implemented to take care of nonlinearity and hysteretic responses of the soft

sensors. Moreover, it was constructed to adequately represent the relationship between the motions from multi-DOF body joints and the sensing data. In order to verify our learning-based calibration and mapping approach, three different athletic motions were tested with a human subject, and the test result showed a higher accuracy than traditional methods that were mainly based on mathematical estimation, such as linear regression.

The remainder of this paper is organized as follows. A brief survey of existing motion tracking and soft sensors is presented in Section II. Section III shows our soft wearable sensing suit and its signal characteristics. Section IV discusses the structure of our learning-based calibration process, followed by experimental results that evaluate the performance of the proposed method in Section V. Finally, we conclude our research in Section VI.

II. RELATED WORK

Body motion tracking has been a long-standing question in biomechanics and rehabilitation, and various approaches have been proposed using different types of devices, such as electrogoniometers [14, 15], camera-based optical systems [16, 17], and inertial measurement units (IMUs) [18–20]. In spite of their success to some degree, they still have limitations. For example, electrogoniometers have difficulty in detecting multi-DOF joint motions. Although optical systems track full-body motions with high accuracy, they are limited to indoor use due to the multiple cameras fixed around the subject. IMU systems are able to overcome this space limitation, but they show errors with high-speed motions and position drifts in long-term measurements. Furthermore, rigid electronics need to be attached to different locations of the body, which may cause discomfort to the wearers.

To address the above issues, a lower-limb soft wearable sensing suit for gait measurement has been developed using microfluidic soft strain sensors [5]. Although it showed the feasibility of using soft wearable sensors, the tracked motions were limited to the sagittal plane, and the calibration was based on simple linear fitting that does not fully represent the complex body motions and the nonlinear characteristic of the soft sensors. Another approach has been made to detect 3-D motions of the ankle joint using multiple capacitive soft sensors [6]. In this case, linear regression was used for calibration of joint angles from multiple sensor signals. Although the device was easy to wear and able to detect the 3-D ankle motions, it showed installation issues on alignment, anchoring, and slippage of the soft sensors caused by deformations of the human body [10]. Soft sensors have also been used for upper body tracking [7]. Two piezo-resistive soft sensors were attached to the shoulder for detecting two-DOF shoulder motions. Although it showed the capability of detecting multi-DOF joint motions, the method of direct attachment to the skin may significantly reduce the practicality of the system as the number of sensors increases.

Therefore, to be practical for applications, such as rehabilitation, gaming, VR/AR, etc., the soft sensing suits need i) to cover the full range of body motion, ii) to be easy to wear and use, and iii) to provide a higher accuracy than that of commercial home-entertainment products, such as Kinect [4], which has a root-mean-square error (RMSE) of 0.12 m [21].

III. SOFT WEARABLE SENSING SUIT

A. Design and Fabrication of the Soft Strain Sensor

The design of the soft sensors was based on our previous work [5, 22], and a simplified fabrication process was developed, as shown in Fig. 2.

Fig. 2. Fabrication process of the soft strain sensor.

A silicone (Ecoflex 50, Smooth-On Inc.) layer was cast using a 3-D printed mold and bonded to a spin-coated bottom silicone layer. Then, a liquid metal compound (eutectic gallium-indium; eGaIn) was injected into the microchannel, and signal wires were plugged into the microchannel directly. Next, both ends of the sensor were reinforced with mesh fabric using silicone adhesive. Finally, hook and loop fasteners were directly sewn to the mesh fabric for attachment of the sensor to the full-body suit. The sewing process holds the signal wires tight and helps to prevent them from being pulled out of the sensor.

Fig. 3. Prototype of the soft strain sensor in (a) original length, and (b) fully stretched.
TABLE I
LOCATIONS AND TARGET JOINTS OF THE ATTACHED SENSORS

Sensor ID   Location                        Target joint
01, 02      Elbows                          Forearms
03, 04      Top of trapezius                Upper arms and shoulders
05, 06      Pectoralis major                Upper arms and shoulders
07, 08      Back of deltoideus              Shoulders and spine
09, 10      Upper side of latissimus dorsi  Spine
11, 12      Lower side of latissimus dorsi  Spine
13, 14      Flanks                          Spine
15, 16      Hip                             Thighs
17, 18      Side of hip                     Thighs
19, 20      Fore side of knee               Shins

The sensor is made of only soft material, which makes it easily wearable and lightweight. It can operate at strains of over 130% due to its high stretchability (Fig. 3). When stretched, the embedded microchannel increases in electrical resistance because its length increases and its cross-section decreases. We assume that the effect of temperature change of the human body is negligible based on the result of our previous work [23].

The resistance changes of the soft sensors are measured by a simple voltage-divider circuit [13] and a 16-bit analog-to-digital converter (ADC) data acquisition module (NI USB-6259, National Instruments) [24]. The measured data is transferred to the processing computer through a universal serial bus (USB) interface.
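The two relations described above can be sketched numerically. The following is a minimal illustration under stated assumptions, not the circuit or sensor model used in this work: it assumes the divider output is taken across the sensor in series with a hypothetical fixed reference resistor, and it assumes an idealized incompressible channel (constant volume), for which the resistance ratio becomes (1 + strain)². The function names and all numeric values are illustrative.

```python
def sensor_resistance(v_out, v_in, r_ref):
    """Solve the divider equation v_out = v_in * r_s / (r_s + r_ref) for r_s.

    Assumption: v_out is measured across the sensor, which is in series
    with a fixed reference resistor r_ref (hypothetical topology).
    """
    if not 0.0 <= v_out < v_in:
        raise ValueError("v_out must lie in [0, v_in)")
    return r_ref * v_out / (v_in - v_out)


def relative_resistance(strain):
    """R/R0 of an idealized incompressible channel.

    With R = rho * L / A and constant volume, L scales by (1 + strain)
    and A by 1 / (1 + strain), so R/R0 = (1 + strain) ** 2.
    """
    return (1.0 + strain) ** 2


# Illustrative values only: a 5 V supply and a 10 ohm reference resistor.
r_sensor = sensor_resistance(v_out=2.5, v_in=5.0, r_ref=10.0)
# Idealized resistance ratio at 130% strain (strain = 1.3).
ratio_at_130_percent = relative_resistance(1.3)
```

Under this idealization the resistance grows quadratically with stretch, which is one reason a purely linear mapping from signal to motion can be a poor fit.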
B. Sensor Placement
The locations of the soft sensors were carefully selected in consideration of the position of the body joints as well as the position of the joint muscles, as shown in Table I and Fig. 4(a). A total of 20 soft strain sensors were attached to the sensing suit: six on the lower body and 14 on the upper body. The soft sensors were attached directly to the elbow (ID: 01, 02) and the knee (ID: 19, 20) joints, which have a single DOF [5]. To measure both bending and twisting of the hip joint simultaneously, four sensors were attached to the back (ID: 15, 16) and the side (ID: 17, 18) of the hip.

In contrast to single-DOF joints, the upper body joints, especially the spine and the shoulder, make complex motions using multiple joints and muscles. Therefore, we attached multiple sensors to the upper body and predicted the motion of the spine, shoulder, and upper arms by weighted composition of the sensor outputs. Since the movements of the spine are generated by the muscles around it, we attached a total of eight sensors on these muscles. Four sensors (ID: 11, 12, 13, 14) were attached to the lower part of the upper body along the muscles at the spine, and the other four sensors (ID: 07, 08, 09, 10) were attached to the upper part of the back. In addition, the motions of the shoulder and the upper arm are more complex than those of the other parts because the shoulder joints have five DOFs: roll, pitch, yaw, and two-DOF translations in the sagittal plane. We considered the directions of the muscle fibers of the three muscles that move the shoulder: trapezius (ID: 03, 04), pectoralis major (ID: 05, 06), and back of deltoideus (ID: 07, 08).

Fig. 4(b) shows the complete prototype of the soft wearable sensing suit. As a garment base, we used a flexible spandex suit with a Velcro-friendly surface for easy attachment and detachment of the soft sensors as well as the optical tracking markers for calibration.

Fig. 4. (a) Locations and IDs of the sensors on the sensing suit. (b) The front and back of the prototype.

C. Limitations of the Sensing Suit

In order to understand the nonlinear characteristic of the soft sensor in motion, we attached a soft sensor to the knee joint (Fig. 5(a)) and measured the sensor signals. Reference joint angle values were also collected from a motion capture system simultaneously.

Fig. 5(b) shows the measured data when the knee joint was repeatedly in flexion and extension, and Fig. 5(c) shows the relationship between the knee joint angles and the measured signals from the soft sensor. The result shows nonlinearity in response and hysteretic loops during flexion and extension. It was already observed that the nonlinearity was caused by the unwanted pressure when the sensor was directly placed on a bony surface [5], as well as by the hysteresis of the sensor itself [22].

Fig. 6 shows the output signals from the spine and the shoulder sensors during a windmill motion. Although the two sensors in each joint were symmetrically positioned, the signals differed in magnitude, noise, and pattern. This is due to the aforementioned issues of soft sensors when combined with a suit, such as alignment, anchoring, and slippage of the sensors and deformation of the human body [10], which consequently led us to implement a deep neural network to deal with these issues more effectively.
Fig. 5. Measured data from a soft sensor during eight repetitions of the knee joint's flexion and extension motions. (a) An illustration of the sensor attachment on a knee joint. (b) Measured sensing data over time. (c) Relation between the knee joint angle and the measured sensor signal.

Fig. 6. Sensor outputs during a windmill motion: (a) Back of deltoideus (ID: 07, 08). (b) Lower side of latissimus dorsi (ID: 11, 12). Axes: time [sec] versus voltage [V].

IV. CALIBRATION USING DFM-NET

The aim of the sensor calibration is to find a calibration model $F$ in

$$\hat{y} = F(x \mid \Omega) \tag{1}$$

where $\Omega$ denotes the calibration parameters. The model $F$ predicts the state $\hat{y}$ from the sensor output $x$. In this paper, we define a new calibration model, DFM-Net, and learn $\Omega$ through machine learning. In the training step, our model learns $\Omega$ using a training data set consisting of pairs of sensor output $x$ and optical motion capture data $y$. After training, the model should be able to predict the current motion of the wearer $\hat{y}$ from $x$.

A. Measurement Setup for Calibration

The environment for acquiring calibration data sets is shown in Fig. 7. To acquire training data sets, the user wore the sensing suit, stood in a 4 m × 4 m calibration space, and then made calibration motions. The resistance changes of the 20 soft sensors were then measured by the acquisition circuit and DAQ. At the same time, the reference motions were measured by an optical motion capture system (Prime 13, OptiTrack), which included five optical cameras and dedicated software [16]. We selected 13 tracking points (an atlas, both scapulars, shoulders, elbows, wrists, knees, and ankles) from the reference motions to reconstruct the predicted pose skeleton (Fig. 7(b)). The hip center was defined as the local coordinate center of the proposed model. The data acquisition rate was set at 120 Hz for both the soft sensors and the optical motion capture system.

Fig. 7. Schematic representation of data acquisition setup (triangle: local coordinate center, circle: tracking points).

B. Data Preprocessing

At each time step $t$, data sets from the soft sensors and the motion capture system were collected as a signal vector $x^{(t)} \in \mathbb{R}^{S}$ and a position vector $y^{(t)} \in \mathbb{R}^{3M}$, respectively:

$$x^{(t)} = \{x_1^{(t)}, x_2^{(t)}, \ldots, x_S^{(t)}\} \tag{2}$$

$$y^{(t)} = \{y_1^{(t)}, y_2^{(t)}, \ldots, y_M^{(t)} \mid y_m \in \mathbb{R}^{3}\} \tag{3}$$

where $S$ denotes the number of sensors, $M$ denotes the number of tracking points, and $m$ denotes the index of tracking points. In addition, a sequence of the sensor outputs $x^{(t-n:t)} \in \mathbb{R}^{n \times S}$ is used so that the sequential phenomenon of the human motion is included:

$$x^{(t-n:t)} = \{x^{(t-n)}, x^{(t-n+1)}, \ldots, x^{(t)}\} \tag{4}$$

where $n$ denotes the time window. No preprocessing procedures, such as a low-pass or band-pass filter, were used in this process.

C. DFM-Net: Deep Full-body Motion Network

The architecture of DFM-Net for the soft wearable sensing suit is shown in Fig. 8. The model comprises two components: a sequence encoder network (SEN) and a kinematic decoder network (KDN). The network flow in our model is as follows. First, a feature vector $r^{(t)}$ which represents the
sequential phenomenon of the sensor outputs is extracted from the input data $x^{(t-n:t)}$ using the SEN. Then, the KDN receives $r^{(t)}$ and predicts the current position of tracking points $\hat{y}^{(t)}$. In the training step, the pairs of collected data sets, $x^{(t-n:t)}$ and $y^{(t)}$, are used to train the DFM-Net model. After training, the model only observes $x^{(t-n:t)}$ to predict $\hat{y}^{(t)}$. Finally, the motion skeleton is created from the predicted positions of the tracking points $\hat{y}^{(t)}$.

Fig. 8. Architecture of DFM-Net: $x^{(t-n:t)}$ is a sequence of the sensor output, $r^{(t)}$ is a temporal feature vector, and $\hat{y}^{(t)}$ is a predicted position of the tracking points.

1) Sequence Encoder Network: The SEN is based on long short-term memory networks (LSTMs), which are commonly used as deep learning techniques for temporal sequence analysis [25]. The LSTMs can memorize previous inputs and use the memory to predict sequential outputs by recurrent connections in hidden units, which store temporal information.

Fig. 9(a) shows an operation flow of each LSTM cell. We denote the sigmoid activation function by $\sigma(\cdot)$, the hyperbolic tangent function by $\tanh(\cdot)$, and the element-wise multiplication operation by $\odot$. At the beginning, each LSTM cell takes information as follows: the input of the LSTM cell $x^{(t)}$, the previous hidden state $h^{(t-1)} \in \mathbb{R}^{k}$, and the previous cell state $c^{(t-1)} \in \mathbb{R}^{k}$, where $k$ is the dimension of the hidden and cell states. The cell first chooses what information is accepted from the previous cell state $c^{(t-1)}$ by the forget gate unit:

$$f^{(t)} = \sigma\left(U_f x^{(t)} + W_f h^{(t-1)} + b_f\right) \tag{5}$$

where $U \in \mathbb{R}^{k \times S}$, $W \in \mathbb{R}^{k \times k}$, and $b \in \mathbb{R}^{k}$ are the input weights, recurrent weights, and biases, respectively. The next step decides what information is taken from the input vector $x^{(t)}$ by the input gate unit $i^{(t)}$ and activation unit $a^{(t)}$:

$$i^{(t)} = \sigma\left(U_i x^{(t)} + W_i h^{(t-1)} + b_i\right) \tag{6}$$

$$a^{(t)} = \tanh\left(U_a x^{(t)} + W_a h^{(t-1)} + b_a\right) \tag{7}$$

Then, the cell state $c^{(t)}$ at the current time step $t$ is derived from the above equations as follows:

$$c^{(t)} = f^{(t)} \odot c^{(t-1)} + i^{(t)} \odot a^{(t)} \tag{8}$$

After that, the new hidden state $h^{(t)}$ is obtained from the current cell state $c^{(t)}$ and output gate unit as follows:

$$o^{(t)} = \sigma\left(U_o x^{(t)} + W_o h^{(t-1)} + b_o\right) \tag{9}$$

$$h^{(t)} = o^{(t)} \odot \tanh\left(c^{(t)}\right) \tag{10}$$

Fig. 9. (a) LSTM cell in the network model, where $\sigma(\cdot)$ is the sigmoid activation function, $\tanh(\cdot)$ is the hyperbolic tangent function, and $\odot$ is the element-wise multiplication operation. (b) Unfolded n sequential structure of the sequence encoder network.

Fig. 9(b) illustrates the unfolded n sequential structure of the SEN. In each time step $t$, the sequential sensor outputs $x^{(t-n:t)}$ pass through the two LSTM layers to extract the temporal information $h^{(t)}$ from $x^{(t-n:t)}$. Finally, the last sensor output $x^{(t)}$ and the hidden state from the SEN $h^{(t)}$ are concatenated into a feature vector $r^{(t)} \in \mathbb{R}^{S+k}$ to represent both current and temporal features at once.

2) Kinematic Decoder Network: Now we present a model-free approach based on a deep feed-forward network to predict the position of the tracking points $y^{(t)}$ from the feature vector $r^{(t)}$ (Fig. 10). Our network had six fully-connected neural network (FCNN) layers. Each layer is modeled as in the following equation:

$$f_{(i)}(r_i; w_i, b_i) = r_i^{T} w_i + b_i \tag{11}$$

where $r_i$ is the input vector of the $i$-th layer, and $w_i$ and $b_i$ are its weights and biases. As an activation function, a rectified linear unit (ReLU) is used at each layer except the last one [26]:

$$\mathrm{ReLU}(r) = \max\{0, r\} \tag{12}$$
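The data flow through Eqs. (5)-(12) can be traced with a small self-contained sketch. This is an illustrative toy in plain Python, not the authors' implementation: it uses a single LSTM layer and a single output layer instead of the two LSTM layers and six FCNN layers described above, random untrained weights, and toy dimensions. All helper names (`lstm_step`, `dense`, `random_matrix`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def affine(u, x, w, h, b):
    """Compute U x + W h + b, the pre-activation shared by all gates."""
    return [p + q + r for p, q, r in zip(matvec(u, x), matvec(w, h), b)]


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM cell update following Eqs. (5)-(10)."""
    f = [sigmoid(z) for z in affine(p["Uf"], x, p["Wf"], h_prev, p["bf"])]    # Eq. (5)
    i = [sigmoid(z) for z in affine(p["Ui"], x, p["Wi"], h_prev, p["bi"])]    # Eq. (6)
    a = [math.tanh(z) for z in affine(p["Ua"], x, p["Wa"], h_prev, p["ba"])]  # Eq. (7)
    c = [fj * cj + ij * aj for fj, cj, ij, aj in zip(f, c_prev, i, a)]        # Eq. (8)
    o = [sigmoid(z) for z in affine(p["Uo"], x, p["Wo"], h_prev, p["bo"])]    # Eq. (9)
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]                          # Eq. (10)
    return h, c


def dense(r, w, b, relu=True):
    """Fully connected layer r^T w + b (Eq. 11), optionally with ReLU (Eq. 12)."""
    out = [sum(r[j] * w[j][col] for j in range(len(r))) + b[col]
           for col in range(len(b))]
    return [max(0.0, v) for v in out] if relu else out


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (assumptions for illustration): S sensors, k hidden units,
# a window of N_STEPS samples, and OUT output coordinates.
S, K, N_STEPS, OUT = 3, 4, 5, 6
rng = random.Random(0)
params = {}
for gate in "fiao":
    params["U" + gate] = random_matrix(K, S, rng)  # input weights, k x S
    params["W" + gate] = random_matrix(K, K, rng)  # recurrent weights, k x k
    params["b" + gate] = [0.0] * K

window = [[rng.uniform(0.0, 1.0) for _ in range(S)] for _ in range(N_STEPS)]
h, c = [0.0] * K, [0.0] * K
for x in window:  # unfold the cell over the input window
    h, c = lstm_step(x, h, c, params)

r = window[-1] + h  # feature vector: last sensor sample concatenated with h
y_hat = dense(r, random_matrix(S + K, OUT, rng), [0.0] * OUT, relu=False)
```

The concatenated feature has dimension S + k, so the shapes of the input weights (k × S) and recurrent weights (k × k) follow directly from the products U x and W h in Eq. (5).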
From the above two equations, we derive the $i$-th layer output for the next layer as follows:

$$r_{i+1} = \max\{0, r_i^{T} w_i + b_i\} \tag{13}$$

The output vector $\hat{y} \in \mathbb{R}^{3M}$, from the last layer $f_{(out)}$, represents the 3-D coordinates of $M$ tracking points.

Fig. 10. Kinematic decoder network: the model predicts the location of each tracking point $\hat{y}_m^{(t)} \in \mathbb{R}^{3}$ from the input feature vector $r^{(t)}$.

3) Implementation Details: We built our model using the PyTorch deep learning framework [27]. The number of sensors $S$ was 20, and the number of tracking points $M$ was 13. At each time step $t$, we observed the past 1 second, thus the input window size $n$ was 120 frames. The dimension of the hidden state in the SEN, $k$, was 128. Dropout [28] was only used in the first LSTM layer in the SEN to prevent over-fitting, and the dropout rate was 0.5. In the KDN, the first layer's input dimension was 148, the last layer's output dimension was 3 × 13, and the others were 128 (see Table II). The Glorot initialization algorithm [29] was used to initialize the parameters in the DFM-Net model. Adam [30] was used as our optimization algorithm, and the learning rate was 0.001. As the cost function, we used a mean square error (MSE) loss defined as follows:

$$\mathrm{MSE}(\hat{y}, y) = \frac{1}{3NM} \sum_{i=1}^{N} \sum_{m=1}^{M} \sum_{d=1}^{3} (\hat{y}_{i,m,d} - y_{i,m,d})^{2}$$

where $N$ is the number of samples in the test data set, and $d$ is the dimension index of the tracking points. In training, the number of epochs was 30, and the mini-batch size was 500.

TABLE II
NETWORK MODELS AND PARAMETERS OF DFM-NET

Component   Layer index   Model   Parameters
SEN         1             LSTM    Hidden: 128, Dropout: 0.5
SEN         2             LSTM    Hidden: 128
KDN         1             FCNN    In: 148, Out: 128, ReLU
KDN         2             FCNN    In: 128, Out: 128, ReLU
KDN         6             FCNN    In: 128, Out: 39

V. EXPERIMENTAL RESULTS

For evaluation, we captured three types of activity data sets: squat (SQ), bend and reach (BR), and windmill (WM). In each data set, the training set size was 7,560 frames (63 seconds), and the test set size was 1,560 frames (13 seconds). In training, the model learned a data set in which these three training sets are combined to represent various motions using one trained model.

A healthy male, 1.79 m in height, was recruited as a subject. The action space was a virtual box of 1.31 m × 1.61 m × 2.09 m (height).

We compared our method with linear regressions (LRs) [31], which have been used for tracking 3-D ankle motions [6]. The objective of LRs is to find the coefficients $W \in \mathbb{R}^{(S\lambda+1) \times 3M}$ that minimize the residual sum of squares between the coordinates of the tracking points $Y \in \mathbb{R}^{N \times 3M}$ and the predictions $XW$ from the sensor outputs $X \in \mathbb{R}^{N \times (S\lambda+1)}$ as follows:

$$\min_{W} \lVert XW - Y \rVert_2^2$$

where $\lambda$ is the polynomial order of the LR model. In the experimental results, we only show first-order (LR1) and second-order (LR2) polynomial linear regressions, since the calibration accuracies for the third and higher orders were lower than for the second order using the given training data sets [31]. We used the scikit-learn machine learning library to implement these comparison methods [32]. To measure the performance of the proposed method, we used an RMSE as follows:

$$\mathrm{RMSE}(\hat{y}, y) = \sqrt{\frac{1}{3NM} \sum_{i=1}^{N} \sum_{m=1}^{M} \sum_{d=1}^{3} (\hat{y}_{i,m,d} - y_{i,m,d})^{2}}$$

In our experiment, our model required a relatively short time to train and to predict the position of body parts. The training process took no more than 75 seconds using a graphics processing unit (GPU) (GeForce GTX 1080, NVIDIA). Furthermore, the prediction took 0.21 milliseconds on average, which is shorter than the sampling period of the 120 Hz sensor data acquisition (about 8.3 ms), thus enabling our method to predict the full-body motion in real time.

A. Result of Full-body Motion Tracking

Table III demonstrates the performance of our method and the comparison with the other calibration models. The length of each training motion was 63 seconds, and the method was tested using four test sets: SQ, WM, BR, and a merger of all test data sets. As can be seen in the table, the overall error is only 29.5 mm, while the worst-case (WM) is just 38.7 mm. From this result, we confirm that our sensing suit and DFM-Net calibration model are accurate enough to track full-body motions, considering the action space of the human
body. In comparison with the other calibration methods, DFM-Net showed the best performance in all tests except for the SQ motion. However, the difference between DFM-Net and LR2 (the second best) is negligible (0.5 mm) in the SQ. It can be inferred from this performance gap that the linear combination models have a limitation in representing the complexity of human motions.

TABLE III
EXPERIMENTAL RESULTS OF FULL-BODY MOTION TRACKING AND COMPARISON WITH THE OTHER CALIBRATION METHODS

RMSE (mm)        Overall   SQ     BR     WM
DFM-Net (ours)   29.5      21.9   25.2   38.7
LR1              54.6      40.9   45.0   72.4
LR2              40.0      21.4   26.8   60.3

Notably, the DFM-Net shows outstanding performance in WM, where its RMSE is 21.6 mm lower than that of the next best method (LR2). The major difference between the WM and the other motions is the complexity of the spine movement. The WM has both twisting and bending of the spine, but the other motions do not have twisting. This result supports that the DFM-Net better represents the rotation of the upper body.

To evaluate the tracking quality of sequential body motions, the full-body skeletons were reconstructed from the predicted tracking points $\hat{y}^{(t)}$ every 60th frame (0.5 seconds). From Fig. 11, we can observe that the reconstructed skeleton closely follows the ground truth from the optical motion tracking system.

Fig. 11. Motion flow of reconstructed skeleton using the soft wearable sensing suit and the proposed calibration method, DFM-Net, for squat (SQ), bend and reach (BR), and windmill (WM), showing ground truth and prediction over time [s]. The motions were captured every 60th frame (0.5 seconds).

In practice, the size of the calibration data set is an important factor for wearable sensors, because collecting the calibration data set is hard and time-consuming. To test how much data is needed to calibrate our sensing suit, we cropped each training motion data set to a certain size and merged the cropped sets into one set. Fig. 12 presents the RMSE of the test result according to the size of each training motion. As seen in Fig. 12, the proposed method requires less calibration data for the same calibration quality compared with the other methods. Moreover, the overall RMSE of the proposed method rapidly decreased and became as small as almost 50 mm only within the first (Fig. 12(a)). Especially from 10 to 30 seconds, LR2, which showed the next best performance in overall motions, shows the worst performance due to overfitting of the noise in the model with the calibration data set [33].

The drift effect is another issue in sensor calibrations. Fig. 13 shows the changes in RMSE of the prediction results over time and their linear fitting lines. If the calibration result has a drift, the gradients of the linear fitting lines must be positive. However, they are almost zero, as we can see in Fig. 13. Therefore, we can say that the drifts in the sensor signals from the suit are negligible at least for the three test motions.

B. Detailed Observation of the Tracking Results

To present a detailed analysis of full-body motion tracking results, we categorized the tracking points into target segment sets: atlas, shoulder, elbow, wrist, knee, and ankle. Fig. 14 illustrates the test results and Table IV summarizes the results. In Fig. 14, the cumulative distribution (y-axis) represents how many tracking points are accurately predicted within the given RMSEs (x-axis).

TABLE IV
EXPERIMENTAL RESULT OF THE SELECTED SEGMENT'S POSITIONS

RMSE (mm)   Atlas   Shoulder   Elbow   Wrist   Knee   Ankle
SQ          11.6    30.9       13.5    27.3    28.2   13.1
BR          23.2    31.5       24.3    29.0    30.2   10.8
WM          33.8    47.1       34.7    43.5    53.4   17.1

Figs. 14(a) to 14(d) illustrate the calibration quality of the complex spine motions, such as bending and twisting. The accuracy of the atlas represents the calibration quality of the spine's bending motion. In the squat motion, the atlas only moves up and down. In the BR, it moves in both the upper and forward directions because the spine is bent
forwards. In the WM, the side direction is added. As we can see in Fig. 14(a), only a quarter of the predictions exceed 40 mm.

Fig. 12. Comparison of the calibration results with varied length of calibration data sets between (DFM-Net) the proposed calibration method, (LR1) first order polynomial linear-regression, and (LR2) second order polynomial linear-regression. (a) overall, (b) SQ, (c) BR, and (d) WM. Axes: calibration data size [sec] versus RMSE [mm].

Fig. 13. The changes in RMSEs of the prediction results over time (dashed) and their linear fitting lines (solid). SQ: 0.13t + 20, BR: 0.30t + 22, WM: −0.29t + 38.

The prediction quality of the twisting motions of the spine can be inferred from the RMSE of the shoulders, which is shown in Fig. 14(b). The WM, unlike the SQ and the BR, includes the spine's twisting motion. The twisting is more complicated to measure than the bending because the sensor stretches along multiple axes. The maximum gap between the WM and the others is less than 15.2 mm at a cumulative distribution of 0.75, as shown in Fig. 14(b). This result supports that our method can predict the spine's twisting with almost the same accuracy as that of bending.

Figs. 14(c) and 14(d) present the prediction result of the arm positions. In the SQ, the upper arms move from the bottom to a forward-up position. In the BR, the upper arms move up and down and the elbows are bending. In the WM, the shoulders and elbow joints are fixed. As can be seen in Fig. 14(c), which represents the prediction accuracy of the shoulder joint motion, only a quarter of the elbow predictions gave RMSEs over 40.4 mm in the worst-case motion (windmill). In addition, the RMSE of the wrist is no more than 29.0 mm except for the WM motion (Table IV).

Figs. 14(e) and 14(f) show the motions of the thighs (position of knee) and shins (position of ankle). In Fig. 14(e), the calibration quality of the twisting motion (windmill) was slightly lower than the others, but only a quarter of the predictions gave RMSEs over 62.6 mm.

C. Model Analysis

In order to understand the internal process of the DFM-Net, we compared the input feature vector $x$, which is the output of the suit, and the internal feature vector $r$, which is the output of the SEN. We projected these two feature vectors from high-dimensional space to a 2-D plane by t-distributed stochastic neighbor embedding (t-SNE) [34] to visualize and compare the characteristics of the two feature vectors. The distance between two points in the 2-D plane represents the similarity of the features at these points (a smaller distance means higher similarity). Fig. 15 illustrates the temporal flow of $x$ (a) and $r$ (b) for each motion: SQ (red), BR (blue), and WM (green). The
[Fig. 14, panels (a) to (f): cumulative distribution (0.0 to 1.0) versus RMSE [mm] (0 to 100) for SQ, BR, and WM.]
Fig. 14. Cumulative distribution of the position tracking errors: (a) Atlas, (b) Shoulder, (c) Elbow, (d) Wrist, (e) Knee, (f) Ankle.
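Statements of the form "only a quarter of the predictions gave RMSEs over 40.4 mm" amount to reading a curve of Fig. 14 at a cumulative distribution of 0.75. The sketch below shows one way such a reading could be computed from per-sample position errors. The error values are invented for illustration, and the helper names `empirical_cdf` and `error_at_fraction` are hypothetical and not taken from the paper or its code.

```python
import math

def empirical_cdf(errors, threshold):
    """Fraction of samples whose error is at or below the threshold."""
    return sum(e <= threshold for e in errors) / len(errors)

def error_at_fraction(errors, fraction):
    """Smallest observed error e such that at least `fraction` of samples are <= e."""
    ordered = sorted(errors)
    idx = max(0, math.ceil(fraction * len(ordered)) - 1)
    return ordered[idx]

# Made-up per-sample position errors in mm (not data from the paper).
errors_mm = [12.0, 18.5, 22.0, 25.0, 31.0, 36.5, 40.0, 58.0]

q75 = error_at_fraction(errors_mm, 0.75)         # error level reached at CDF = 0.75
share_above = 1.0 - empirical_cdf(errors_mm, q75)  # share of samples above that level
print(f"75% of samples have errors <= {q75} mm; {share_above:.0%} exceed it")
```

Plotting `empirical_cdf` over a grid of thresholds for each motion would give curves of the kind shown in the figure.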
Fig. 15. Temporal flows of the features and their reference motions from the optical motion tracking system. (a) Output features x of the soft wearable sensing suit. (b) Features r extracted from x by the SEN.
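For readers unfamiliar with the projection behind Fig. 15, the standard t-SNE formulation from [34] is summarized below. The notation is introduced here and is not the paper's: z_i denotes one high-dimensional sample (a frame of x or of r), y_i its 2-D image, N the number of samples, and sigma_i a per-sample bandwidth that t-SNE sets from a user-chosen perplexity.

```latex
% Pairwise similarity in the high-dimensional space (Gaussian kernel)
p_{j|i} = \frac{\exp\left(-\lVert z_i - z_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert z_i - z_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}

% Pairwise similarity in the 2-D plane (Student-t kernel with one degree of freedom)
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% The 2-D points y_i are chosen to minimize the Kullback-Leibler divergence
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

Because the cost penalizes placing similar samples far apart, nearby points in the 2-D plane can be read as similar features, which is the interpretation used in the model analysis.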
The feature flow of all motions in both x and r follows a cyclic pattern, which reflects the repetition of the activity.
The feature flows in the feature vector x are highly disordered, which is likely due to the nonlinear properties of the multiple soft sensors. For example, feature flows of similar SQ motions in the shaded area 1 are highly different although they were expected to be consistently alike. On the other hand, most feature points from the motions in the shaded area 2 are located very close to one another although they are from different sequential poses. The other two motions also show similar results.
Fig. 15(b) demonstrates that the SEN is able to extract the feature that best represents the temporal sequence and the current motion from the output signals of the soft wearable motion sensing suit. Although all three features are similar, the feature flows in feature vector r are more distinguishable than the other two. This implies that our model is more robust even in the presence of anomalies, such as noise.
In conclusion, Fig. 15 indicates that our deep learning model successfully learned how to generalize the characteristics of the sensor data because its representational capacity was sufficient to capture the nonlinear and the hysteretic characteristics
of the soft sensing suit for full-body motion tracking. Although the linear regressions were able to calibrate the multiple sensors in our experiment, their hypothesis space was unsuitable for modeling the intractable properties because of their limited modeling capability. Thus, they were weak against untrained anomalies, which explains why our deep learning model performed better than the others.

VI. CONCLUSION

In this study, we developed a calibration method, DFM-Net, for a soft wearable sensing suit for full-body motion tracking. While previous research has mostly focused on the motion of a single joint, we measured full-body motions using multiple soft sensors. Furthermore, we defined a calibration method, DFM-Net, using deep neural networks to overcome the obstacles of soft sensor calibration and the challenges in human kinematic modeling.
To evaluate our method, three types of athletic motions were tested, and the RMSE of overall motions was 29.5 mm, while that of the worst-case motion (WM) was only 38.7 mm. The experimental results showed that the proposed method provided higher accuracy than the other methods, even with a small calibration data set.
In addition, the proposed model was able to extract features that represent the body poses in a motion well from the nonlinear raw outputs of the multiple sensors, as described in the model analysis.
Our method showed high accuracy in pose prediction, but a remaining challenge is the need for calibration every time a user wears the sensing suit. One area of future work will be the development of an improved calibration model that can reuse pre-trained model parameters to simplify the calibration procedure.
To the best of our knowledge, this is the first study to track full-body motion using soft strain sensors and deep neural networks. We believe that this research will provide new research directions in soft robotics.
Details of our implementation and source code are available at https://github.com/KAIST-NMAIL/DFMNET.

REFERENCES

[1] H. Zhou and H. Hu, “Human motion tracking for rehabilitation—a survey,” Biomedical Signal Processing and Control, vol. 3, no. 1, pp. 1–18, 2008.
[2] T. B. Moeslund, A. Hilton, and V. Krüger, “A survey of advances in vision-based human motion capture and analysis,” Computer Vision and Image Understanding, vol. 104, no. 2, pp. 90–126, 2006.
[3] J. C. Chan, H. Leung, J. K. Tang, and T. Komura, “A virtual reality dance training system using motion capture technology,” IEEE Transactions on Learning Technologies, vol. 4, no. 2, pp. 187–195, 2011.
[4] “Kinect for Xbox One.” [Online]. Available: https://www.xbox.com/en-US/xbox-one/accessories/kinect
[5] Y. Mengüç, Y.-L. Park, H. Pei, D. Vogt, P. M. Aubin, E. Winchell, L. Fluke, L. Stirling, R. J. Wood, and C. J. Walsh, “Wearable soft sensing suit for human gait measurement,” International Journal of Robotics Research, vol. 33, no. 14, pp. 1748–1764, 2014.
[6] M. Totaro, T. Poliero, A. Mondini, C. Lucarotti, G. Cairoli, J. Ortiz, and L. Beccai, “Soft smart garments for lower limb joint position analysis,” Sensors, vol. 17, no. 10, 2017.
[7] H. Lee, J. Cho, and J. Kim, “Printable skin adhesive stretch sensor for measuring multi-axis human joint angles,” in 2016 IEEE International Conference on Robotics and Automation (ICRA), May 2016, pp. 4975–4980.
[8] J. B. Chossat, Y. Tao, V. Duchaine, and Y.-L. Park, “Wearable soft artificial skin for hand motion detection with embedded microfluidic strain sensing,” in 2015 IEEE International Conference on Robotics and Automation (ICRA), May 2015, pp. 2568–2573.
[9] J. T. Muth, D. M. Vogt, R. L. Truby, Y. Mengüç, D. B. Kolesky, R. J. Wood, and J. A. Lewis, “Embedded 3D printing of strain sensors within highly stretchable elastomers,” Advanced Materials, vol. 26, no. 36, pp. 6307–6312, 2014.
[10] C. R. Walker and I. A. Anderson, “Monitoring diver kinematics with dielectric elastomer sensors,” in Electroactive Polymer Actuators and Devices (EAPAD) 2017, vol. 10163. International Society for Optics and Photonics, 2017, p. 1016307.
[11] Y.-L. Park, D. Tepayotl-Ramirez, R. J. Wood, and C. Majidi, “Influence of cross-sectional geometry on the sensitivity and hysteresis of liquid-phase electronic pressure sensors,” Applied Physics Letters, vol. 101, no. 19, p. 191904, 2012.
[12] Y.-L. Park, C. Majidi, R. Kramer, P. Bérard, and R. J. Wood, “Hyperelastic pressure sensing with a liquid-embedded elastomer,” Journal of Micromechanics and Microengineering, vol. 20, no. 12, p. 125029, 2010.
[13] S. Han, T. Kim, D. Kim, Y.-L. Park, and S. Jo, “Use of deep learning for characterization of microfluidic soft sensors,” IEEE Robotics and Automation Letters, vol. PP, no. 99, pp. 1–1, 2018.
[14] P. Rowe, C. Myles, S. Hillmann, and M. Hazlewood, “Validation of flexible electrogoniometry as a measure of joint kinematics,” Physiotherapy, vol. 87, no. 9, pp. 479–488, 2001.
[15] “Biometrics ltd.” [Online]. Available: http://www.biometricsltd.com
[16] “Optitrack.” [Online]. Available: http://optitrack.com
[17] “Vicon.” [Online]. Available: http://vicon.com
[18] D. T.-P. Fong and Y.-Y. Chan, “The use of wearable inertial motion sensors in human lower limb biomechanics studies: a systematic review,” Sensors, vol. 10, no. 12, pp. 11 556–11 565, 2010.
[19] H. Fourati, N. Manamanni, L. Afilal, and Y. Handrich, “Complementary observer for body segments motion capturing by inertial and magnetic sensors,” IEEE/ASME Transactions on Mechatronics, vol. 19, no. 1, pp. 149–157, 2014.
[20] D. Roetenberg, H. Luinge, and P. Slycke, “Xsens MVN: full 6DOF human motion tracking using miniature inertial sensors,” Xsens Motion Technologies BV, Tech. Rep., 2009.
[21] S. Choppin and J. Wheat, “The potential of the Microsoft Kinect in sports analysis and biomechanics,” Sports Technology, vol. 6, no. 2, pp. 78–85, 2013.
[22] Y.-L. Park, B. R. Chen, and R. J. Wood, “Design and fabrication of soft artificial skin using embedded microchannels and liquid conductors,” IEEE Sensors Journal, vol. 12, no. 8, pp. 2711–2718, Aug 2012.
[23] D. M. Vogt, Y.-L. Park, and R. J. Wood, “Design and characterization of a soft multi-axis force sensor using embedded microfluidic channels,” IEEE Sensors Journal, vol. 13, no. 10, pp. 4056–4064, 2013.
[24] “National Instruments USB-6259 DAQ.” [Online]. Available: http://sine.ni.com/nips/cds/view/p/lang/en/nid/202803
[25] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[26] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 807–814.
[27] A. Paszke, S. Gross, S. Chintala, and G. Chanan, “PyTorch,” 2017. [Online]. Available: https://github.com/pytorch/pytorch
[28] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[29] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, pp. 249–256.
[30] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[31] D. C. Montgomery, E. A. Peck, and G. G. Vining, Introduction to Linear Regression Analysis. John Wiley & Sons, 2012, vol. 821.
[32] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[33] K. P. Murphy, Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
[34] L. v. d. Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. Nov, pp. 2579–2605, 2008.

Dooyoung Kim received the B.S. degree in Computer Science from Republic of Korea Naval Academy, Changwon, South Korea, in 2007 and the M.S. degree in Computer Engineering from Seoul National University, Seoul, South Korea. Since 2015, he has been working towards his Ph.D. degree in Computer Science at KAIST, Daejeon, South Korea. His current research interests include machine learning, soft robots, and intelligent combat systems.

Junghan Kwon received the B.S. and M.S. degrees in Naval Architecture and Ocean Engineering from Seoul National University, Seoul, Korea, in 2008 and 2010, respectively. Since 2017, he has been working toward the Ph.D. degree in Mechanical Engineering at Seoul National University. His research interests include soft sensors, soft actuators, soft wearable robots, and rehabilitation devices.

Seunghyun Han received the B.S. degree in Computer Engineering from Sogang University, Seoul, South Korea, in 2016. Since 2016, he has been working toward the M.S. degree in Computer Science at KAIST, Daejeon, South Korea. His current research interests include machine learning, deep learning, and soft robots.

Yong-Lae Park received the M.S. and Ph.D. degrees in Mechanical Engineering from Stanford University, Stanford, CA, USA, in 2005 and 2010, respectively. He is currently an Assistant Professor in the Department of Mechanical Engineering, Seoul National University, Seoul, Korea. Prior to joining SNU, he was an Assistant Professor in the Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA (2013-2017). His current research interests include soft robots, artificial skin sensors and muscle actuators, and soft wearable robots and devices.

Sungho Jo (M’09) received the B.S. degree from the School of Mechanical and Aerospace Engineering, Seoul National University, Seoul, South Korea, in 1999, and the M.S. degree in mechanical engineering and the Ph.D. degree in electrical engineering and computer science from the Massachusetts Institute of Technology (MIT), Cambridge, MA, USA, in 2001 and 2006, respectively. From 2006 to 2007, he was a Post-Doctoral Researcher with the MIT Media Lab. Since 2007, he has been with the School of Computing, KAIST, Daejeon, South Korea, where he is currently an Associate Professor. His current research interests include robotic intelligence, brain–machine interface, and wearable computing.