1. Introduction
Traditionally, families have been the sole caregiving resource. In Taiwan, the enactment of the “Senior Citizen Welfare Act” on 26 January 1980 marked the beginning of formal elderly welfare legislation and brought attention to the issue of caring for the elderly. In the past, elderly care was primarily carried out through manual labor. However, with the economic rise of Taiwan in the 1990s, there was a substantial demand for domestic labor, leading to a shortage of manpower for elderly care. Consequently, foreign labor was introduced to compensate for this shortage [1].
In recent years, surveillance devices have become ubiquitous and visible on streets, campuses, public facilities, and more. The application of surveillance devices in home care for the elderly is also a noteworthy topic [2,3]. However, the use of surveillance devices comes with the drawback of blind spots in the field of vision [4,5]. Therefore, this study explores how unmanned aerial vehicles (drones) can be employed to replace surveillance devices in caring for the elderly. The aim is to realize a vision of combining artificial intelligence with machinery to compensate for the shortage of manpower.
As countries around the world enter an “aging society”, falls have become one of the most common issues among the elderly. According to the World Health Organization (WHO) fact sheet, falls are the second leading cause of unintentional injury deaths globally, with the highest number of fatalities occurring among adults aged 60 and above [6]. The injuries caused by falls can range from minor soft tissue damage to life-threatening injuries [7], and different fall directions may result in varying levels of injury severity [8,9]. This increases the likelihood of severe injuries, placing a significant burden on society and families [10].
Elderly individuals may lose consciousness after falling and roll unconsciously, making it difficult for medical personnel to accurately identify the injured areas upon arrival. This can lead to a time-consuming and challenging process, ultimately increasing the risk of delayed treatment for the elderly. Additionally, older adults may struggle to recall the details of their fall, and sometimes the injured areas may not show obvious external injuries, making it difficult for doctors to determine which areas need to be examined by X-rays. Without an accurate fall record, doctors may be unable to pinpoint the optimal areas for examination, leading to delays in diagnosis and treatment, as well as unnecessary guesswork regarding the fall, which further increases the patient’s health risks.
Moreover, the demand for long-term care is rapidly increasing due to the aging society, while the declining birth rate and rising female labor force participation have shifted long-term care services from being primarily provided by families to being outsourced. This includes placing individuals in care institutions or nursing homes, or hiring caregivers to provide in-home care. However, whether through institutions or caregivers, this shift imposes a considerable financial burden on families.
Past related research on fall detection in the elderly primarily employed image recognition coupled with traditional surveillance cameras in experiments [2]. The advantages of this approach include the ability to recognize specific body postures and objects, as well as obtaining images in a timely manner. This significantly addresses the shortcomings of manual monitoring. However, the drawback of using traditional surveillance cameras is the tendency to have blind spots, making it challenging to recognize the posture of the care recipient at all times.
In recent years, some studies have recognized the limitations of traditional surveillance cameras in experimental setups. Consequently, they have opted for wearable sensors. Unlike traditional surveillance cameras, which may have blind spots and require confining the experimental subject to monitored spaces for detection, wearable sensors can be worn by the care recipient [11]. This allows continuous detection of the care recipient’s posture, reducing the limitations associated with blind spots [12]. The success rate of recognition with wearable sensors is also relatively higher than that of traditional monitoring devices [13,14].
However, wearable sensors have their drawbacks, as they require cooperation from the care recipient. Therefore, they cannot be used for individuals with cognitive impairments or those who cannot cooperate. Additionally, because wearable sensors rely on sensed parameter values rather than image recognition, misleading readings can cause incorrect posture judgments and reduce recognition accuracy.
In order to address the three major issues of the elderly not receiving timely treatment if they fall, the inability of traditional surveillance cameras to follow the care recipient, and the requirement for user cooperation with wearable sensors, this study combines the advantages of smart unmanned aerial vehicles (UAVs) that can autonomously track the care recipient and monitor without blind spots. It integrates facial recognition technology from artificial intelligence image recognition and analysis techniques. Additionally, we have enhanced the posture recognition technology using the OpenPose algorithm for identifying human body postures.
We utilize a human body posture coordinate algorithm and posture classification, resulting in a more refined human body posture recognition algorithm. This not only enables the identification of subtle body postures but also significantly improves the success rate of detailed human body posture recognition. With this approach, the care recipient can be tracked and monitored, allowing for the detection of any abnormal safety conditions. Through real-time information recognition technology and instant fall detection records, the latest monitoring images, fall timestamps, and the severity of falls are communicated to the family and medical personnel of the care recipient. This ensures that the care recipient does not miss the opportunity for timely medical intervention.
The smart UAV, coupled with artificial intelligence for real-time tracking and monitoring of the care recipient’s fall posture, not only addresses the shortcomings of traditional surveillance cameras but also eliminates the need for user cooperation with wearables. It can transmit the care recipient’s information back to the monitoring host without requiring the care recipient to wear any devices, providing optimal real-time information.
In summary, this study proposes a system for intelligent dynamic tracking and fall detection using deep learning methods. The system begins with a drone remote control module utilizing OpenCV to capture images taken by the drone. Subsequently, a face detection module employing Dlib HOG algorithm detects the face of the care recipient, enabling the drone to follow the recipient in real time. Further, a pose recognition module utilizes OpenPose to identify and mark joints of the human body, along with the defined Part Affinity Field (PAF), to find corresponding pose vectors. This allows the images captured by the drone to provide relevant information about the care recipient’s body.
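The following rule is one plausible way a face-following step could work. It is a minimal sketch in Python and not the control logic of this system, which is not spelled out here. The function name `follow_command`, the `dead_zone` parameter, and the command strings are all hypothetical. The idea is to compare the center of the detected face box with the center of the frame and steer toward it.

```python
def follow_command(face_box, frame_size, dead_zone=0.1):
    """Return (horizontal, vertical) steering hints that re-center a face.

    face_box   -- (left, top, right, bottom) in pixels, e.g. from a face detector
    frame_size -- (width, height) of the frame in pixels
    dead_zone  -- hypothetical tolerance, as a fraction of the frame size
    """
    left, top, right, bottom = face_box
    width, height = frame_size

    # Offset of the face center from the frame center, normalized to [-0.5, 0.5].
    off_x = ((left + right) / 2 - width / 2) / width
    off_y = ((top + bottom) / 2 - height / 2) / height

    # Image y grows downward, so a negative vertical offset means "move up".
    if abs(off_x) <= dead_zone:
        horizontal = "hold"
    else:
        horizontal = "right" if off_x > 0 else "left"
    if abs(off_y) <= dead_zone:
        vertical = "hold"
    else:
        vertical = "down" if off_y > 0 else "up"
    return horizontal, vertical
```

The dead zone keeps the drone from oscillating when the face is already close to the center of the frame.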
However, although OpenPose’s PAF can effectively identify various joints of the human body, when certain body postures result in joint occlusion, there may be misinterpretation of the body posture. To address this issue, this study proposes the “Intelligent Fall Posture Analysis Algorithm”, which further analyzes different types of postures by using the coordinates of the joints and their relative heights and positions. This aims to effectively establish a posture classification model, thereby achieving real-time and highly accurate posture recognition.
Therefore, the intelligent fall posture analysis module developed in this study determines whether the care recipient is in a falling posture and provides a more detailed analysis of the type and direction of the fall, enabling the drone to detect falls effectively. The fall care module, using predefined conditions in Python libraries, assesses the severity of the fall and records the time of occurrence. The monitoring system transmits the recorded fall data—including the direction, severity, timestamp, and real-time footage of the fall—to the care recipient’s family and medical personnel.
This fall detection information is valuable for doctors when performing X-ray examinations. By providing real-time detection and recording of the fall posture, doctors can more accurately identify which areas of the care recipient require X-ray or even Computed Tomography examination. Additionally, based on the system’s real-time image analysis, doctors can assess potential causes of the fall, such as leg weakness, balance disorders, or accidental collisions. This information enables doctors to make more precise and timely diagnoses and treatments, helping the care recipient receive immediate and effective medical attention.
Furthermore, having this system accompany the care recipient alleviates concerns and stress for their family, knowing that the system can monitor and follow the recipient in real time. By utilizing drones and artificial intelligence to monitor body posture, the system developed in this research ensures both real-time tracking and fall detection of the care recipient. This framework forms the basis of the study presented in this paper.
The structure of this paper is as follows:
Section 1 is the introduction.
Section 2 introduces the relevant literature and technologies, including smart real-time home care, deep learning, fall detection applications, OpenPose, and Teachable Machine.
Section 3 describes the system architecture and modules, explaining the five modules of the smart drone following and fall detection systems and their operational logic.
Section 4 presents the experimental results, discussing experimental design and data analysis, as well as the accuracy and error sources in facial recognition and detection of falls in four directions.
Section 5 discusses the proposed intelligent following and fall detection system, focusing on the analysis of experimental results, improvements to OpenPose, and a comparison between this system and Teachable Machine.
Finally, Section 6 provides the conclusion and directions for future work, summarizing the advantages and limitations of this system in recognizing human postures in four different fall directions, practical constraints in its application, and potential future development directions.
2. Materials and Methods
2.1. Smart Real-Time Home Care
In a global context, falling is a significant public health issue. According to the World Health Organization’s (WHO) manual on falls, an estimated 684,000 fatal falls occur each year worldwide, with over 80% taking place in low- and middle-income countries. Among these, individuals aged 60 and above experience the highest mortality due to falls. While not all falls result in fatalities, approximately 37.3 million people require medical care each year due to severe falls. Furthermore, falls contribute to over 38 million Disability-Adjusted Life Years (DALYs) lost annually [15], surpassing the combined impact of traffic injuries, drowning, burns, and poisoning in terms of years lived with disability [6].
When an elderly member of a household experiences a fall and sustains injuries, there arises a long-term need for care. Families address this need in various ways, with some opting for the presence of family members or caregivers to provide support directly, while others choose to entrust the elderly to care facilities designed for seniors. These care facilities concentrate on providing care for each elderly individual entrusted to them, utilizing a combination of human resources and surveillance devices [16]. However, the drawback of traditional surveillance devices lies in their blind spots, limiting the comprehensive care available to the elderly.
In recent years, technological advancements have led many care centers to complement traditional surveillance systems with wearable devices [17], creating a synergistic approach that results in intelligent real-time care [18]. This significantly improves upon the limitations of conventional surveillance devices. However, some families may find the cost of care centers prohibitive, or the increasing aging population might strain the capacity of such facilities. Consequently, home-based care has emerged as an alternative [19]. Home-based care is viewed as an extension or substitution for traditional hospitalization, shortening the duration of patient stays while ensuring continuity in medical care, thereby providing comprehensive support.
Previous research has explored the integration of various devices (such as surveillance systems [20], wearable devices [21], intelligent monitoring robots [22], etc.) with artificial intelligence to achieve the goal of smart home-based care [23]. Therefore, this study developed a drone with real-time recognition and tracking capabilities, integrating an intelligent real-time fall detection and analysis algorithm to implement a smart home care service for monitoring and notifications.
2.2. Deep Learning
Deep learning is a branch of machine learning that differs in its approach from traditional machine learning. While machine learning involves computers using algorithms to infer patterns and features from a vast amount of historical data provided by humans, deep learning draws inspiration from the functioning of the human brain’s neural networks. It employs multi-layered artificial neural networks to learn from extensive datasets.
Among deep learning techniques, Convolutional Neural Networks (CNNs) demonstrate outstanding performance in image recognition. The primary structure of a CNN includes convolutional layers, pooling layers, and a final dense (fully connected) layer. Before reaching the dense layer, the input passes through multiple sets of convolutional and pooling layers, which are primarily responsible for feature extraction [24].
The convolutional layer traverses the image using convolutional kernels to determine which features to enhance or weaken. The pooling layer selects key features from the output of the convolutional layer, dividing the feature map data and extracting the maximum values. After several sets of convolutional and pooling layers, the extracted features from the pooling layer are fed into the dense layer to calculate the predicted classification result.
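The two operations described above can be illustrated with a small sketch in plain Python. The 5 × 5 image, the 2 × 2 edge kernel, and the function names `conv2d` and `max_pool` are chosen for illustration only. A real CNN learns its kernel values during training and uses an optimized library.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Split the feature map into size x size blocks and keep each block's maximum."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Toy image: a bright vertical stripe (1) on a dark background (0).
image = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
]
edge_kernel = [[-1, 1], [-1, 1]]  # strengthens dark-to-bright vertical edges

features = conv2d(image, edge_kernel)  # each row is [0, 2, 0, -2]
pooled = max_pool(features)            # [[2, 0], [2, 0]]
```

The kernel enhances the left edge of the stripe (value 2) and weakens the right edge (value -2). Pooling then keeps only the strongest response in each block, which is the key-feature selection described above.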
2.3. The Application of Deep Learning in Fall Detection
In the previous literature, methods employed for fall detection primarily fall into the categories of wearable devices, environmental sensing devices, and computer vision-based fall detection [25]. The following will provide a literature review and discussion on each of these fall detection methods.
Fall detection with wearable devices primarily uses heart rate sensors [26], gyroscope sensors [27], or accelerometers to determine values and features indicative of a fall [28]. By connecting the detector to a system and employing machine learning, wearable devices for fall detection achieve higher accuracy and real-time transmission of sensor results. However, falls cannot be detected when the user is not wearing the device. Additionally, malfunctioning sensors can pose obstacles to fall detection. The need to wear the device also causes discomfort for users. Lastly, since wearable devices do not utilize visual methods for fall detection, the sensed parameters alone can mislead the system into a wrong judgment.
When used for fall detection, environmental sensing devices primarily employ methods such as detecting floor vibrations and sounds or utilizing ultrasonic and radar devices to sense features and numerical values indicative of a fall [29,30,31,32]. Like wearable devices, these sensing devices are connected to a system and often coupled with machine learning or neural networks for detection. Although environmental sensing devices generally exhibit slightly lower accuracy in fall detection compared to wearable devices, they offer real-time transmission of sensing results.
However, environmental sensing devices are susceptible to interference from environmental factors, and their setup costs are higher. Additionally, due to their limited sensing range, deploying a sufficient quantity to cover the entire daily living environment for elderly care may be necessary. Even after deployment, there is a possibility of blind spots in the sensor coverage.
In summary, this study employs unmanned aerial vehicles (UAVs) combined with facial detection and OpenPose body pose estimation for computer vision-based fall detection. Past research on computer vision-based fall detection relied on surveillance devices, which are susceptible to environmental factors such as obstructions or overlapping individuals and may therefore fail to detect falls. In addition, traditional surveillance cameras are fixed in position, making it challenging to cover the entire daily living environment. This limitation can lead to blind spots in practical applications.
Therefore, this study utilizes UAVs for real-time monitoring, low construction costs, and the ability to navigate obstacles during flight. Facial detection is employed to serve as a reference point for UAV system operation, addressing the drawbacks of traditional surveillance devices with potential blind spots. Furthermore, to mitigate the impact of environmental factors, this research utilizes OpenPose, allowing accurate prediction of obscured body joint points even in cases of overlapping individuals or obstruction by objects. This approach aims to enhance fall detection accuracy in challenging environmental conditions.
2.4. OpenPose
OpenPose, developed by researchers at Carnegie Mellon University (CMU), is an open-source model that is considered one of the best real-time human pose estimation methods currently available. In a study conducted by S. Xiong et al., the researchers utilized the RULA/REBA human posture assessment methods for measurement. The article mentioned that OpenPose outperformed Kinect in measuring joint angles and conducting semi-automatic human posture assessments. OpenPose demonstrated its capabilities even in scenarios with occlusions and non-frontal camera views [33].
OpenPose operates as a supervised convolutional neural network developed using the Caffe framework. Its strengths lie in the accuracy of facial, hand, and body pose detection, with resilience against interference from background environmental factors. A study by A. Viswakumar et al. highlighted OpenPose’s effectiveness in extreme lighting conditions, such as extremely bright and dimly lit environments, showcasing its robust key point estimation capabilities [34]. Moreover, OpenPose is versatile and applicable to single-person and multi-person pose recognition, exhibiting excellent recognition performance and high-speed processing.
When the system receives an image, OpenPose first passes it through the first ten layers of the VGG-19 model to obtain a feature map. The image features are then input into two CNNs. The first CNN predicts the confidence maps of human joints in the image, generating a separate confidence map for each joint. The second CNN combines the confidence map of detected human joints with the Part Affinity Field (PAF) to predict vectors representing the limbs of the human body [35].
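To make the role of a per-joint confidence map concrete, the sketch below reads a joint position off a single map by taking its strongest response. This is a simplified single-person illustration in plain Python, not OpenPose's actual procedure, which also has to separate peaks from several people and link them with the PAF. The function name `locate_joint` and the `min_confidence` cutoff are hypothetical.

```python
def locate_joint(confidence_map, min_confidence=0.1):
    """Return (x, y, score) of the strongest response, or None if all are too weak.

    confidence_map -- 2-D list of scores, one map per joint type
    min_confidence -- hypothetical cutoff below which the joint counts as lost
    """
    best = None
    best_score = min_confidence
    for y, row in enumerate(confidence_map):
        for x, score in enumerate(row):
            if score > best_score:
                best, best_score = (x, y), score
    if best is None:
        return None  # the joint is occluded or too dark to be detected
    return best[0], best[1], best_score


nose_map = [
    [0.0, 0.2, 0.0],
    [0.1, 0.9, 0.3],
    [0.0, 0.4, 0.0],
]
nose = locate_joint(nose_map)  # (1, 1, 0.9)
```

A `None` result corresponds to the lost key points discussed later for occluded or poorly lit joints.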
In a study conducted by Q. Xu et al., OpenPose was utilized to extract skeletal diagrams of the human body during falls from internet videos and from a fall dataset released by the University of Montreal. The trained model exhibited excellent recognition rates in conditions without obstructions and ample lighting. Conversely, recognition rates were lower in scenarios with excessive obstructions or low lighting [36].
Therefore, in this research, the intelligent fall posture analysis module relies on OpenPose’s posture detection, utilizing PAF technology for accurate and rapid detection of specific body postures in care recipients. This forms the basis for developing a UAV system that, upon detecting a care recipient through facial detection, can follow the care recipient, identify fall postures, categorize falls into different directions (front, back, left, right), record the temporal aspects of falls, assess fall severity, and notify medical personnel and family members of relevant fall information.
2.5. Teachable Machine
Teachable Machine is a no-code machine learning platform launched by Google. Even users with limited knowledge of programming concepts can easily train their machine-learning models on the platform’s website without writing any code. The process involves three training steps. First, users need to add classification content. Second, the Teachable Machine allows users to adjust the training parameters of the model to meet their specific needs. Even with limited expertise, users can train the model using its default training parameters. Finally, after training the model on the platform, users can test the model to confirm the training results. Currently, Teachable Machine offers training models for images, sounds, and human poses [37].
In this study, the Teachable Machine’s human pose training model was used to train models for four fall directions. These models were then compared with the intelligent tracking and fall detection algorithms developed in this study, serving as a benchmark for the recognition accuracy of falls in four different directions.
5. Discussion
This experiment mainly consists of three parts: the system’s recognition of the four fall directions of the care recipient in terms of detection accuracy at different distances, different lighting conditions, and different backgrounds; the system’s improvement of forward fall detection in dim lighting environments; and the system’s method of improving forward fall detection by modifying judgment criteria. The experiment demonstrates that the accuracy of the intelligent UAV dynamic tracking and fall detection system in both facial recognition and improving the detection of the four fall directions approaches 95%.
5.1. In-Depth Discussion on the Algorithm and Experimental Results for Forward Fall Detection
In the intelligent fall posture analysis module developed in this study, fall detection across four directions—forward, leftward, rightward, and backward—was tested under the same conditions, with the drone positioned 150 cm away. It was observed that forward fall detection had a slightly lower accuracy of 91%, compared to the other directions. The forward fall detection using I_OPose focuses on the height difference between the neck and nose key points. During experiments, we found that one of the main reasons for the lower accuracy was that the care recipient’s face, when in a forward fall posture, was often obscured by light and shadows, causing the facial key points to be lost and resulting in OpenPose failing to recognize the posture.
To address this issue, we changed the detection method to focus on the back of the body. The E_HN_OPose method detects the height difference between the waist and neck key points, effectively reducing the problem of facial key point loss caused by head shadow occlusion in I_OPose. This adjustment increased the drone’s detection accuracy from 91% to 93%. However, E_HN_OPose still encountered issues when detecting the process of the care recipient falling or during forward falls, as the height difference sometimes did not meet the detection threshold. Additionally, joint occlusion during the fall caused OpenPose to misidentify or lose key points, resulting in detection failures.
To resolve these challenges, we proposed the E_OPose method, which detects either the waist-to-neck or neck-to-nose height difference. By combining the advantages of both I_OPose and E_HN_OPose, we achieved a complementary effect. This approach mitigated the problem of I_OPose failing due to head shadow occlusion, and also resolved the issues of E_HN_OPose struggling to detect the falling process and joint occlusion causing key point loss or incorrect predictions. The experimental results of this study confirmed that the detection accuracy improved from 93% to 96% when using this combined approach with the drone.
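The complementary behavior of the combined rule can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the exact criteria used in the experiments. The pixel thresholds and the direction of the comparison (a small vertical gap is taken to indicate a near-horizontal body) are hypothetical, as are the function names. Key points that OpenPose loses are represented as `None`.

```python
def height_gap(keypoints, a, b):
    """Vertical distance in pixels between two key points, or None if either is lost."""
    pa, pb = keypoints.get(a), keypoints.get(b)
    if pa is None or pb is None:
        return None
    return abs(pa[1] - pb[1])


def rule_fires(gap, threshold):
    """A rule can only fire when both of its key points were detected."""
    return gap is not None and gap <= threshold


def is_forward_fall(keypoints, nose_neck_max=20, neck_waist_max=40):
    """Combined check: either the neck-to-nose or the waist-to-neck rule may fire.

    The two thresholds are hypothetical illustration values.
    """
    i_opose = rule_fires(height_gap(keypoints, "nose", "neck"), nose_neck_max)
    e_hn_opose = rule_fires(height_gap(keypoints, "neck", "waist"), neck_waist_max)
    return i_opose or e_hn_opose


# Face lost in shadow: the neck-to-nose rule cannot fire, the waist-to-neck rule can.
shadowed = {"nose": None, "neck": (100, 200), "waist": (180, 210)}
standing = {"nose": (100, 50), "neck": (100, 90), "waist": (100, 200)}
```

Because the two rules depend on different key points, losing the nose or the waist alone does not disable detection, which is the complementary effect described above.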
5.2. Discussion on Improving OpenPose in Intelligent Dynamic Tracking and Fall Detection System
In dimly lit environments, the intelligent dynamic tracking and fall detection system addresses the issue of certain areas being too dark during drone detection of care recipients by automatically adjusting image brightness. This enhancement aims to mitigate the potential loss of key points by OpenPose due to poorly illuminated regions. As indicated by the experimental results in Table 2, we observed a significant improvement in the detection accuracy of forward falls from 60% to 93% after implementing automatic adjustment of image brightness. Additionally, our research system provides a timeline for recording fall events and their severity. Upon detecting a fall posture in the care recipient, the system begins recording the direction of the fall and the duration of maintaining the fallen posture. Leveraging the recorded fall timeline, we propose a method to estimate the severity of falls by calculating the duration of maintaining the same fall posture. These functionalities effectively address the limitations of OpenPose in recording fall posture duration and assessing fall severity, enabling care recipients and their families to be informed about the severity of detected falls for prompt medical attention.
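Both ideas in this subsection can be sketched in a few lines of plain Python. The sketch is illustrative only. Scaling toward a target mean is just one simple form of automatic brightness adjustment and is an assumption here, as are the target value, the duration thresholds, the severity labels, and the function names.

```python
def auto_brighten(gray, target_mean=128.0):
    """Scale a grayscale image (2-D list, values 0-255) up toward a target mean.

    Images that are already bright enough are returned unchanged (as a copy).
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    if mean == 0 or mean >= target_mean:
        return [row[:] for row in gray]
    gain = target_mean / mean
    return [[min(255, round(p * gain)) for p in row] for row in gray]


def severity_from_duration(seconds, thresholds=((60, "severe"), (10, "moderate"))):
    """Grade a fall by how long the same fall posture has been held.

    thresholds -- hypothetical (minimum seconds, label) pairs, longest first
    """
    for limit, label in thresholds:
        if seconds >= limit:
            return label
    return "mild"


dim_frame = [[10, 20], [30, 40]]
brightened = auto_brighten(dim_frame, target_mean=100)  # [[40, 80], [120, 160]]
```

In a timeline such as the one described above, `severity_from_duration` would be called with the time elapsed since the fall posture was first detected, so the reported severity rises the longer the care recipient stays down.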
5.3. Intelligent Tracking and Fall Detection Algorithm vs. Google Teachable Machine
The intelligent tracking and fall detection algorithm of this study was compared with Google Teachable Machine’s Pose Project training model, as shown in Figure 12 and Figure 13. In both similar and dissimilar background conditions, the accuracy of the intelligent tracking and fall detection algorithm for detecting and recognizing the four directions of fall posture was higher than that of the Google Teachable Machine. Particularly, the accuracy of identifying the backward fall posture saw the greatest improvement, with recognition accuracy increasing from 70.35% to 95% in similar background conditions and from 65.9% to 95% in dissimilar background conditions.
The reason behind this lies in the fact that our system algorithm utilizes methods based on OpenPose and PAF, combined with the intelligent tracking and fall detection algorithm proposed in this study. As a result, both algorithms can extract joint points from various body parts during posture recognition. However, in experiments, it was found that the Pose Project training model provided by the Google Teachable Machine platform tends to lose joint points, leading to lower accuracy rates in both experiments. Except for the accuracy rate of identifying the rightward fall posture, which reaches 92%, the accuracy rates for the other three directions of fall posture range between 65.9% and 83.25%. In contrast, the fall detection algorithm proposed in this study achieves accuracy rates between 90% and 95% for all four directions of falls.
Furthermore, unlike the Google Teachable Machine system, our algorithm calculates the height difference between two joint points, or the angle between the vertical axis and the line segment connecting two key points. Because these quantities can be evaluated for different joint pairs and compared against posture-specific values, the judgment criteria are flexible, which allows specific postures to be detected with higher accuracy.
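The two geometric quantities named above can be computed directly from key point coordinates. The sketch below assumes image coordinates in pixels with y increasing downward. The function names are illustrative and not taken from the system's code.

```python
import math


def height_difference(p, q):
    """Vertical distance in pixels between two key points given as (x, y)."""
    return abs(p[1] - q[1])


def angle_to_vertical(p, q):
    """Angle in degrees between segment p-q and the vertical axis.

    0 means the segment is upright; 90 means it is horizontal.
    """
    dx = abs(q[0] - p[0])
    dy = abs(q[1] - p[1])
    return math.degrees(math.atan2(dx, dy))


neck, waist = (100, 90), (100, 200)          # upright torso
lying_neck, lying_waist = (60, 300), (170, 300)  # torso parallel to the floor
```

For the upright torso the angle is 0 degrees and the height difference is 110 pixels. For the lying torso the angle is 90 degrees and the height difference is 0, so either quantity can separate the two postures.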
6. Conclusions
The unmanned aerial vehicle (UAV) intelligent tracking and fall detection system proposed in this paper combines Dlib face detection with OpenPose deep learning algorithms. Developed using Python libraries, the system continuously records the timing of falls and estimates the severity of falls based on their duration. This enables the system to autonomously track and provide immediate fall care, addressing the limitations of fixed-point monitors unable to track care recipients. In aging societies, where elderly individuals may experience falls without timely assistance, the inability to accurately detect the direction and severity of falls complicates the transmission of fall information to medical professionals, potentially delaying necessary treatment.
The system promptly detects and analyzes faces and postures, determining whether the UAV has a direct view of the care recipient. Then, using the fall care module, it logs the timestamp based on whether the recipient exhibits a falling posture and the fall’s direction. When a fall is confirmed, the system estimates the fall’s severity based on its duration. The experimental results confirm that the system developed in this study can accurately and promptly identify falls, providing effective continuous detection and tracking with mobility capabilities. This system significantly improves the monitoring range and efficiency compared to traditional stationary cameras. Moreover, during drone operation, the system can determine the fall direction based on the care recipient’s posture and simultaneously record details such as the fall direction, timestamp, severity, and real-time images.
This not only overcomes the limitations of blind spots in traditional surveillance systems but also provides detailed fall detection information that helps medical professionals accurately identify appropriate X-ray examination areas. This detailed fall data supports subsequent diagnosis and treatment, ensuring timely and targeted medical interventions. The system effectively addresses the growing caregiver shortage in an aging society and enhances the utilization of limited healthcare resources, allowing them to be used more efficiently.
In the research, it was observed that OpenPose encounters issues with joint occlusion when detecting fall postures in all four directions. Specifically, during the detection of forward falls, the lower body below the waist often suffers from missing joint points. The primary reason for this is that when the care recipient remains in a forward fall posture, their legs tend to compress, causing joints to overlap, making it difficult for OpenPose to correctly identify joints. Consequently, it may result in misinterpretations or failure to detect the posture.
To address these issues, which negatively impact the system’s fall detection accuracy, this study proposes an “intelligent fall posture analysis algorithm”. By utilizing joint coordinates and calculating both the height differences between joints and the angles between body segments and the vertical axis, the system accurately analyzes the parameters for fall postures in all four directions. These parameters are crucial features for fall classification and help establish a robust fall classification model. As a result, the system achieves real-time detection with an accuracy of at least 95% across all four directions in fall posture detection tests. Specifically, the detection accuracies for forward, backward, leftward, and rightward falls improved to 96%, 95%, 96%, and 96%, respectively.
Moreover, during the automated brightness adjustment experiments, we found that, under certain conditions, image noise could blur the image. This, in turn, destabilizes OpenPose’s ability to reliably detect body parts, highlighting the need for enhanced detection algorithms and for better pre-processing of images with insufficient brightness. Therefore, this study suggests investigating image sharpening and noise reduction, which may help reduce image noise and thereby enhance the stability of OpenPose in detecting human body postures under automatic brightness adjustment conditions.
In future research, we will consider further improvements that can be made by modifying the OpenPose training model to address issues where specific poses cause significant body part overlap, leading to compressed images that result in OpenPose losing key points in the overlapped regions. In addition, detecting a sequence of movements associated with falls could help identify hazardous poses and assess the severity of falls among the elderly based on the duration of these postures. This would allow the system to provide healthcare professionals with timely and accurate fall-related information for further medical evaluation. Strengthening this system to deliver precise fall data to medical personnel could support subsequent treatment and necessary care for patients, offering valuable assistance.