1. Introduction
Fall accidents in the construction industry have been studied for several decades and have been identified as a common hazard and the leading cause of fatalities. In Taiwan, the Council of Labor Affairs reported that the construction industry accounted for 43–53% of all occupational accidental deaths, and fall accidents alone accounted for 23–33% of all accidental deaths [1]. The U.S. Bureau of Labor Statistics [2] reported that in 2021, nearly 1 in 5 workplace deaths occurred in the construction industry. Just over one-third of all construction deaths were caused by falls, slips, and trips. Of these, almost all were caused by falls to a lower level. The construction industry accounted for 46.2% of all fatal falls, slips, and trips in 2021.
With the rapid advance of sensor hardware and artificial intelligence, sensors attachable to the human body have been used to capture human motion. Motion-capture technologies recognize human actions by analyzing the sensed data of a target, because neither computer systems nor humans can interpret raw sensor data without further analysis. The technologies used to recognize the actions of construction workers can be classified into vision- and non-vision-based technologies. Vision-based technologies convert raw video images of an observed target person into computerized data that can be understood by the designed system. They include marker-based (e.g., optical system) and marker-less (e.g., RGB-depth cameras) technologies. In general, compared to marker-less systems, marker-based motion-capture systems are more expensive, require complex setups, and interfere more with workers’ activities. However, they offer higher accuracy than marker-less systems and avoid occlusion problems in some cases.
Non-vision-based technologies identify human actions without visual perception of a target person, and they usually use wearable sensors, such as accelerometers, gyroscopes, and pressure sensors, to sense the actions of the target person. Accelerometers, occasionally coupled with gyroscopes, are widely used to track human actions in the construction industry because they are mobile, wearable, and suitable for the complex working environments of construction sites. In addition, they eliminate the problem of visual occlusion.
Meanwhile, inertial measurement units (IMUs) have been used in different application domains, such as physical activity monitoring for individual fitness [3] or sports performance [4]. Jones et al. [5] used the accelerometer data of 85,670 participants from the UK Biobank and performed a genome-wide association study of eight derived sleep traits representing sleep quality, quantity, and timing. Similarly, accelerometers were also used to analyze human gait patterns for detecting walking abnormalities to help assess musculoskeletal conditions [6] and improve rehabilitation progress [7]. Arias et al. [8] characterized the number of minutes of moderate and vigorous physical activity at work and outside of work during 7 consecutive days by studying the data of 55 commercial construction workers.
In addition, IMUs have been used to recognize various activities at construction sites. For example, Sanhudo et al. [9] used wearable accelerometers and supervised machine learning algorithms to classify 10 different activities (e.g., gearing up, hammering, masonry, painting, sawing, screwing, and sitting) in a simulated laboratory environment. A few researchers have focused on detecting awkward postures that contribute to work-related musculoskeletal disorders. For example, Nath et al. [10] used built-in smartphone sensors to unobtrusively monitor workers’ bodily postures and autonomously identify potential work-related ergonomic risks. Arias et al. [8] used accelerometers to monitor construction workers’ activities, but they focused on classifying moderate and vigorous physical activity at work and outside of work.
Finally, IMUs have been used to detect falls, trip accidents, or other portents that possibly contribute to accidents at construction sites. For example, Dzeng et al. [11] were the first to study the feasibility of using multiple accelerometers and gyroscopes to detect falls and fall portents associated with tiling activities without unnecessary movements in limited scaffold spaces. Fang and Dzeng [12] continued development work on a smartphone-based personal safety monitoring system. This system received external signals wirelessly from motion sensors attached to a vest at the chest position, waist, and arm, as well as a set of brain wave sensors inside a helmet, and transmitted these signals to a monitoring server for further analysis. They proposed an algorithm and conducted experiments to detect falls, trips, and portents (e.g., heavy footsteps, sudden knee movements, sudden swaying, abrupt body reflexes) by considering four different physiological statuses of the subjects (i.e., sleepiness, fatigue, normal, and inebriation). Achour et al. [13] developed an accelerometer-data-based algorithm to detect worker falls with a focus on reducing power consumption by using sensor timers.
In summary, several IMU-based systems for detecting fall-related accidents have been developed and have yielded satisfactory accuracy. However, in almost all of the related studies, experiments were conducted in well-controlled simulated laboratory environments, and these experiments involved limited worker activities. Unlike production factories, the working environments of construction workers are far more complex, with unpaved, uneven grounds and many temporary or unfinished structures (e.g., scaffolding, rough floor). The signal noise introduced by such conditions may substantially degrade the detection performance of IMU-based systems. Like other researchers, the corresponding author’s team developed an IMU-based system to detect falls and related portents for construction workers and achieved satisfactory performance in a simulated laboratory environment. However, when the system was applied to real construction projects, it produced many false alarms, which made it impractical for use in such projects.
In the present work, two versions of previously developed algorithms for detecting falls and stumbles of construction workers are first reviewed. Then, the findings obtained by applying one of the existing systems to one building project and three riverbank conservation projects are presented. Finally, the paper describes how the authors address the false alarms that occur when the system is applied to real projects by redesigning the experiment and redeveloping the system based on the gated recurrent unit (GRU) deep learning model.
3. Methodologies
Thanks to the advancement of graphics processing units in recent years, it has become possible to use deep neural networks for solving complex classification problems in diverse domains. Among them, convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are two of the most commonly used models. While CNNs have been mainly used for image recognition, RNNs are suitable for processing time- or space-dependent data series, such as text in natural language processing. In this study, an RNN is used to classify the time series of IMU data for identifying alarm events such as stumble, fall, and coma.
Despite the widespread use and promising performance of RNNs, the conventional RNN cannot capture long-term memory, and it is affected by the vanishing gradient problem during the process of propagation through various network layers. Hochreiter and Schmidhuber [14] proposed long short-term memory (LSTM) to better retain important long-term memories during the propagation process. In this study, intensive sampling of IMU readings led to the accumulation of a large amount of data. For instance, an event with a 2 s window generates a data series of 20 time units, and retaining the data from the initial time units could be essential for target classification. Nevertheless, while LSTM can retain long memory, the initial trial in this study was time consuming owing to the large volume of data generated by high-frequency sampling. Chung et al. [15] described a gated RNN that uses the GRU to improve propagation efficiency and reduce memory loading while retaining long-term memory. Therefore, in the present study, the GRU, a lighter gated alternative to LSTM, is employed.
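To make the gating idea concrete, the following is a minimal pure-Python sketch of one GRU step and its application to a 20-step window. It uses the update convention h_t = (1 − z_t)·h_{t−1} + z_t·h̃_t and omits biases for brevity. The helper names (`gru_step`, `random_params`, `run_gru`), the layer sizes, and the random weights are assumptions for illustration only; they are not the trained model of this study.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def gru_step(x, h_prev, p):
    """One GRU step. p maps names to weight matrices (biases omitted)."""
    # Update gate z: how much of the new candidate replaces the old state.
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h_prev))]
    # Reset gate r: how much of the old state feeds the candidate.
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h_prev))]
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    h_cand = [math.tanh(a + b) for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], gated))]
    # Blend previous state and candidate state.
    return [(1.0 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def random_params(n_in, n_hidden, seed=0):
    """Hypothetical random weights, only to make the sketch runnable."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "Wz": mat(n_hidden, n_in), "Uz": mat(n_hidden, n_hidden),
        "Wr": mat(n_hidden, n_in), "Ur": mat(n_hidden, n_hidden),
        "Wh": mat(n_hidden, n_in), "Uh": mat(n_hidden, n_hidden),
    }


def run_gru(window, params, n_hidden):
    """Feed a window of feature vectors through the GRU; return the final state."""
    h = [0.0] * n_hidden
    for x in window:
        h = gru_step(x, h, params)
    return h
```

Because the final state is built from all 20 steps through the gates, information from the first time units can survive to the end of the window, which is the property the paragraph above relies on.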
3.1. Problem Statement
Given the accelerometer, gyroscope, and magnetometer data generated by the built-in IMUs of the smartphones worn by the participating workers, the system generates alarms when the workers stumble, fall, or are in a coma; alternatively, it shows the safe status when an accident does not occur. The objective of the present research is to reduce false alarms while maintaining the same detection accuracy as that of HER-S.
3.2. Research Design
Figure 2 depicts the research process. First, an experiment simulating the construction site environment outside the laboratory is designed, and two sets of IMU data for training and testing purposes are collected experimentally. Then, features are extracted from the training data. These extracted features are labeled with correct answers and fed to the GRU model. After training is complete and the validation criteria are fulfilled, the trained model is considered ready for classifying workers’ movement types. The test data are used to test and evaluate the performance of the model.
3.3. Experiment Design
The objective of this study is to improve the detection accuracy of the system when it is used in real projects. Because it is unsafe and impractical to collect data and observe workers’ unsafe events at actual construction sites without interrupting their work, the site selected for the experiment must at least simulate the environment in which the reported types of false alarms (Table 1) are triggered easily. The authors identified several types of environments on their university campus, including uneven roads, inclined and abrupt slopes, stairs, and straight and curved motorcycle lanes with speed bumps. The participants were asked to perform the following tasks under surveillance.
Walking on even and uneven roads, climbing four flights of stairs, and walking on inclined slopes in and around the department building.
Walking on an uneven gravel trail and on inclined and abrupt slopes for 1.5 min in the forest located on the campus.
Riding a motorcycle in the motorcycle lanes on the campus, which include straight and curved lanes with speed bumps.
Simulating falls by raising a plush doll to which a 1 kg weight and a smartphone are attached, allowing the doll to sway a few times, and then dropping it from the half-story height of a bookshelf, as well as from the one-story height of a stairway.
Simulating a stumble by deliberately falling on an upholstered floor surface in the forward, left-side, right-side, and rear directions.
Simulating progressive coma by resting, sitting down, and going into a simulated coma (i.e., remaining still) for at least 30 s. Simulating abrupt coma by going into a simulated coma directly from a standing position.
Resting freely in a sitting position for 2 min.
3.4. Data Preprocessing
Thirty students participated in the experiment. The data series collected from each participant included seven pieces of information: date, time, accelerometer data along the X, Y, and Z axes, pitch angle, and roll angle. These data were additionally labeled with the type of status (i.e., 0, 1, 2, and 3 for safe, fall, stumble, and coma, respectively) based on post-observation of recorded surveillance videos. Useful annotations such as “impact with the ground after fall” that may be helpful for post analysis were attached to the corresponding time frames. These annotations were used only for human analysis, not for machine learning.
The initial amount of data collected was large and required reduction. The sampling interval of the sensors was set to 0.1 s (i.e., 10 samples per second). Consequently, each participant generated approximately 5500 data series in the entire experiment. With a total of 30 participants, this resulted in 165,000 data series. Furthermore, each data series included seven pieces of information (e.g., date, time, accelerometer data). This led to a total dataset size of 1,155,000 data points.
Considering the duration of the target events (e.g., fall, stumble), the authors arbitrarily set 2 s (i.e., 20 data series at a 0.1 s sampling interval) as the sliding data window, with a stride of 1, for determining the type of an event, and used the label of the 10th data series in the window as the event’s status. For example, the first event is composed of the 1st to the 20th data series, and the status label of the 10th data series is the status of this event. The second event is composed of the 2nd to the 21st data series, and the label of the 11th is its status.
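The windowing rule described above can be sketched in a few lines of Python. The function name `make_windows` and its parameters are hypothetical; the window length of 20, the stride of 1, and the use of the 10th sample's label follow the text.

```python
WINDOW = 20        # 2 s of data at one sample every 0.1 s
LABEL_OFFSET = 9   # zero-based position of the 10th sample in a window


def make_windows(samples, labels, window=WINDOW, label_offset=LABEL_OFFSET):
    """Slice a recording into overlapping windows with a stride of 1.

    Each window takes the status label of its 10th sample
    (0 = safe, 1 = fall, 2 = stumble, 3 = coma).
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    windows, window_labels = [], []
    for start in range(len(samples) - window + 1):
        windows.append(samples[start:start + window])
        window_labels.append(labels[start + label_offset])
    return windows, window_labels
```

With this rule, a recording of n samples yields n − 19 windows, so the first window covers samples 1 to 20 and the second covers samples 2 to 21, as in the example in the text.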
3.5. Machine Learning Model
The GRU machine learning model was implemented on the Kaggle platform [16] by using the Python programming language. The model mainly used the Sequential class of the keras.models package and the keras.layers module. The GRU built herein was composed of five hidden layers (i.e., dense layers), two dropout layers, and one output layer.
The activation function of the model’s hidden layers was the rectified linear unit (ReLU), which is commonly used as an activation function for hidden layers, along with others such as sigmoid, tanh, and Leaky ReLU. The sigmoid function is susceptible to the vanishing gradient problem and produces non-zero-centered outputs, which causes the weight updates of a neuron to be either all positive or all negative when all inputs are positive. Tanh resolves the non-zero-centering issue of sigmoid, but it does not fully address the vanishing gradient problem. Glorot et al. [17] proposed ReLU, which mitigates the vanishing gradient problem. However, ReLU is susceptible to neuron death when the input values are negative. Therefore, Leaky ReLU was developed to address this problem. ReLU was selected in this study because all the inputs were positive values. In what follows, the GRU-based system developed herein is called GRU-S.
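The trade-offs among the four activation functions can be checked numerically. The sketch below defines each function and its derivative using only the standard library; the Leaky ReLU slope `alpha=0.01` is an assumed, commonly used default and is not taken from this study.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    return math.tanh(x)


def relu(x):
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x


def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # at most 0.25, near 0 for large |x|


def d_tanh(x):
    return 1.0 - math.tanh(x) ** 2  # at most 1, near 0 for large |x|


def d_relu(x):
    return 1.0 if x > 0 else 0.0    # zero gradient for negative inputs


def d_leaky_relu(x, alpha=0.01):
    return 1.0 if x > 0 else alpha  # small but non-zero for negative inputs
```

For example, `d_sigmoid(10)` is below 1e-4 whereas `d_relu(10)` is 1.0, which illustrates why saturating functions suffer from vanishing gradients. Likewise, `d_relu(-3)` is 0.0 whereas `d_leaky_relu(-3)` is 0.01, which illustrates the neuron death issue and its remedy.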
5. Discussion
In general, GRU-S outperformed HER-S, except in detecting comas. A post-review of the recorded videos was performed to identify the situations in which GRU-S tended to make mistakes. Stumbles with obvious kneeling first (i.e., a two-step process instead of continuous stumbling and falling) tended to confuse GRU-S more than other types of stumbles. Among the 124 and 85 false alarms of HER-S related to stumble and fall, the situations that triggered the highest numbers of alarms were quick or sudden safe movements with abrupt accelerometer readings, such as walking on stairs or uneven surfaces and riding a motorcycle over potholes. Among the 35 and 3 false alarms of GRU-S related to stumble and fall, the situations that triggered the highest numbers of alarms were unsteady walking or abrupt heavy stepping or stomping when walking on uneven or sloping surfaces.
Although GRU-S underperformed HER-S in detecting comas, it triggered fewer false alarms (5 vs. 13). GRU-S produced false positives (FPs) of coma only when the subjects were resting, while HER-S tended to trigger FPs when the subjects were resting or had stopped while riding a motorcycle.
In what follows, the data of the three main types of false alarms are analyzed further, and possible remedial strategies for future improvement of GRU-S are proposed.
5.1. False Identification between Fall and Stumble
Figure 6 shows examples of SVMa data (i.e., SVM of accelerometer data) for the fall and stumble events. Although the two polylines have different magnitudes and time spans, they resemble each other in terms of patterns, which may lead to misidentification. Both fall from height and stumble are initiated with a fall, resulting in SVMa values closer to zero first. Then, when the subjects hit the ground either because of a fall from height or a stumble, SVMa increases abruptly and reaches its peak owing to the counterforce produced by the ground and, finally, follows the patterns corresponding to unsteady movements.
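The drop-then-peak pattern described above can be illustrated with a short sketch. It assumes that SVM denotes the signal vector magnitude, i.e., the Euclidean norm of the three accelerometer axes, expressed in units of g. The function `find_drop_then_impact` and its thresholds (`low`, `high`, `max_gap`) are hypothetical values chosen for illustration, not parameters of GRU-S or HER-S.

```python
import math


def svm(ax, ay, az):
    """Signal vector magnitude of one accelerometer sample (assumed definition)."""
    return math.sqrt(ax * ax + ay * ay + az * az)


def find_drop_then_impact(svm_series, low=0.4, high=2.0, max_gap=10):
    """Return (drop_index, impact_index) for the first sample below `low`
    that is followed within `max_gap` samples by a sample above `high`.
    Return None if no such pair exists. All thresholds are illustrative.
    """
    for i, value in enumerate(svm_series):
        if value < low:
            stop = min(i + 1 + max_gap, len(svm_series))
            for j in range(i + 1, stop):
                if svm_series[j] > high:
                    return i, j
    return None
```

Because both a fall from height and a stumble produce this same pattern, a rule of this kind can flag either event but cannot separate the two, which is consistent with the misidentifications discussed here.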
Such misidentifications accounted for less than 5% of the detection results and did not seem to cause problems in the experiment. However, to improve the accuracy of distinguishing a fall from a stumble, more realistic data must be collected, if possible, and the SVM magnitudes of the two situations must be studied. One could also study the SVMa patterns of the actions that follow the events. The fall of interest in this study is a fall from height, which usually causes severe harm that immobilizes a person at least temporarily. Thus, standing and walking are unlikely to be detected after such a fall. Conversely, larger movements, standing, or walking patterns are likely to be detected following a stumble. Therefore, observing the difference between the actions after a fall and after a stumble will help to distinguish between them.
5.2. False Identification between Safe and Stumble
GRU-S mis-detected a few safe situations as stumbles, but this did not significantly affect its overall detection accuracy. These mis-detections usually occurred when the subjects walked on uneven sloped surfaces. The authors believe that accurately distinguishing such safe walking situations from stumble situations based on IMUs alone is difficult because the corresponding inertial data patterns are similar, and the differences between their magnitudes are ambiguous. Studying and detecting the differences between the actions that follow the two situations might be more practical than detecting the two situations directly. For example, walking on uneven surfaces usually produces repeated false stumble detections over a longer period, while the stumble pattern of a true stumble lasts only 2–3 s.
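One way to sketch this duration-based idea is to measure how long each run of consecutive stumble labels lasts. The function `classify_stumble_runs` and the parameter `max_stumble_labels` are hypothetical. If one label were produced every 0.1 s (an assumption), the 3 s upper bound mentioned above would correspond to 30 labels.

```python
from itertools import groupby

STUMBLE = 2  # status code for stumble, as used in the data labeling


def classify_stumble_runs(labels, max_stumble_labels):
    """Group consecutive identical labels and judge each stumble run by length.

    Returns a list of (start_index, run_length, verdict) tuples. Runs no
    longer than `max_stumble_labels` are kept as possible stumbles; longer
    runs are treated as likely walking on an uneven surface.
    """
    results = []
    index = 0
    for value, group in groupby(labels):
        length = len(list(group))
        if value == STUMBLE:
            if length <= max_stumble_labels:
                verdict = "possible stumble"
            else:
                verdict = "likely uneven walking"
            results.append((index, length, verdict))
        index += length
    return results
```

The cut-off would need to be tuned on field data, since a run that is interrupted by a single safe label is counted here as two separate runs.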
5.3. False Identification between Coma and Rest
This type of misidentification was the main problem of GRU-S. Since the coma data were collected from subjects simulating coma conditions, the authors argue that the system’s underperformance cannot be concluded decisively, and the system could perform better when it encounters a true coma. Because the subjects were not really in a coma, they might have made slight movements that were invisible to the naked eye but detectable by IMUs. Therefore, the three misidentifications of coma by GRU-S (Table 4) could be true negatives (TNs) because the subjects were unable to maintain complete stillness.
Suppose the subjects did maintain complete stillness during the experiments (i.e., all 10 comas were true comas), and GRU-S did underperform HER-S. In that case, the situations that caused GRU-S to fail in identifying comas were those in which the subjects pretended to faint by gradually sitting down first before entering the coma.
Figure 7 shows comparisons of SVMa and SVMo (SVM of rotation angle) between the fainted-while-sitting and sitting-and-resting poses of a subject over a 4 s timeframe. The resemblance between the two situations makes it difficult to distinguish between them based on inertial data alone. However, the two situations did not resemble each other over longer periods (e.g., 30 s) because the subjects tended to move slightly when resting unless they were asleep. Thus, one way to distinguish the fainted-while-sitting situation from the sitting-and-resting situation is by providing the model with a higher-level decision-making mechanism based on the observation of a longer timeframe. For example, the current system generates a detection label every 2 s. Monitoring a human who has really fainted while sitting for a minute would more likely result in a steady series of 30 coma labels. Meanwhile, monitoring a sitting and resting human would more likely result in a few safe labels among coma labels because of slight, unavoidable movements. The difference between the two patterns over a longer timeframe could increase the feasibility of distinguishing between the corresponding situations.
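A minimal sketch of such a higher-level mechanism is given below. It confirms a coma only when the most recent 30 labels (one minute at one label every 2 s) are all coma labels. The function name `confirm_coma` and the strict all-or-nothing rule are assumptions for illustration; a tolerance for occasional mislabels could be studied instead.

```python
from collections import deque

COMA = 3  # status code for coma, as used in the data labeling


def confirm_coma(labels, window=30):
    """Return one boolean per incoming label.

    The value is True only when the last `window` labels are all COMA,
    i.e., after an uninterrupted minute of coma labels at 2 s per label.
    """
    recent = deque(maxlen=window)
    decisions = []
    for label in labels:
        recent.append(label)
        full = len(recent) == window
        decisions.append(full and all(item == COMA for item in recent))
    return decisions
```

Under this rule, a steady series of 30 coma labels is confirmed at the 30th label, whereas a series containing even one safe label within the minute is not confirmed.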
Another approach is to integrate GRU-S and HER-S, make GRU-S responsible for detecting only stumble and fall, and make HER-S responsible for detecting only coma. However, by replacing GRU-S with HER-S for detecting coma, a greater number of false alarms will be generated than when using GRU-S alone. As a solution, either this tradeoff must be accommodated or the threshold of HER-S must be studied further and adjusted.
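The integration described above can be sketched as a simple arbitration rule over the two systems' outputs. The function `combine_labels` is hypothetical, and giving priority to a fall or stumble reported by GRU-S over a coma reported by HER-S is an assumption that the text does not specify.

```python
SAFE, FALL, STUMBLE, COMA = 0, 1, 2, 3


def combine_labels(gru_label, her_label):
    """Merge the outputs of GRU-S and HER-S for one time window.

    GRU-S is trusted only for fall and stumble; HER-S is trusted only
    for coma. Any other output from either system is ignored.
    """
    if gru_label in (FALL, STUMBLE):
        return gru_label
    if her_label == COMA:
        return COMA
    return SAFE
```

For instance, `combine_labels(SAFE, COMA)` returns `COMA`, while `combine_labels(COMA, SAFE)` returns `SAFE` because coma detection is delegated entirely to HER-S in this sketch.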
6. Conclusions
This paper reviewed existing applications of IMUs to detect common accidents of workers at construction sites (i.e., stumble, fall, and coma). The various algorithms reviewed herein performed well in simulated working environments inside laboratories. However, they generated too many false alarms in practical environments because of the complexity of these environments and workers’ behaviors (e.g., walking on uneven sloped surfaces or stairways, or riding motorcycles on site). In this research, an existing algorithm (i.e., hierarchical threshold algorithm) was applied to real construction sites, and the situations that tended to trigger false alarms were identified. Based on the feedback obtained from these environments, the authors redesigned and conducted the aforementioned simulated experiment outside the laboratory, targeting situations prone to false alarms. A new GRU-based system, GRU-S, was developed to reduce the frequencies of various types of false alarms in outdoor environments.
GRU-S outperformed the existing benchmark model, HER-S, with higher sensitivities and fewer false alarms in detecting stumble (100% sensitivity vs. 40%, 3 false alarms vs. 85) and fall (95% sensitivity vs. 65%, 13 false alarms vs. 124). In detecting coma, it fared worse than HER-S in terms of sensitivity but triggered fewer false alarms (70% sensitivity vs. 100%, 5 false alarms vs. 13). Assuming that the subjects made no slight movements invisible to the naked eye when they enacted the coma status, and that GRU-S did underperform HER-S, the discussion section outlines two possible approaches for future research to solve this problem, namely deployment of higher-level decision-making mechanisms with longer-timeframe observations and integration of GRU-S with HER-S.