Human Activity Recognition Using Machine Learning
Abstract:- Human activity recognition and classification is one of the most important problems in computer science. Recognizing and identifying actions or tasks performed by humans is the main goal of intelligent video systems. Human activity recognition is used in many applications, from human-machine interaction to surveillance, security and healthcare. Despite continuous efforts, activity recognition in unconstrained settings is still a difficult task and faces many challenges. In this article, we focus on some of the current research articles on various recognition approaches. This project covers three popular approaches to activity recognition: vision-based (using pose estimation), wearable devices, and smartphones. We also discuss some advantages and disadvantages of the above methods and give a brief comparison of their accuracy. The results also show how the vision-based method has become a popular method for HAR research today.

Keywords:- Artificial Intelligence (AI), Human Activity Recognition (HAR), Computer Vision, Machine Learning.

I. INTRODUCTION

Skeleton features are widely used in human action recognition and human-computer interaction [1]. This technology involves detecting and tracking key points of a human skeleton from images or videos, typically utilizing depth cameras, sensors, and other equipment to capture movement trajectories and analyse them through computer vision and machine learning techniques. The applications of skeleton behaviour recognition span various domains such as games, virtual reality, healthcare, and security [1].

However, recognizing actions from skeleton data poses several challenges due to factors such as multiple characters in videos, varying perspectives, and interactions between characters [1]. Early methods relied on hand-designed feature extraction and spatiotemporal modelling techniques, while recent advancements have been made in deep learning-based approaches, particularly focusing on skeleton key points or spatio-temporal feature analysis [1].

Deep learning methods have shown promise in improving action recognition accuracy, but traditional approaches still face challenges in adapting to different scenarios and effectively handling pose changes and high motion complexity [1]. For instance, the MS-G3D network addresses some of these challenges by automatically learning features from skeleton sequences and incorporating 3D convolution and attention mechanisms [1]. To further enhance the performance of action recognition systems, new methods like the 3D graph convolutional feature selection and dense pre-estimation (3D-GSD) method have been proposed [1]. This method leverages spatial and temporal attention mechanisms along with human prediction models to better capture local and global information of actions and analyse human poses comprehensively [1]. Meanwhile, in a separate line of research, the use of Convolutional Neural Networks (CNNs) for human behaviour recognition from different viewpoints has gained attention [2].

CNNs are utilized for tasks such as object detection, segmentation, and recognition in videos, aiming to identify various classes of body movement [2]. In parallel, the development of Human Activity Recognition (HAR) systems has been propelled by the demand for automation in various fields including healthcare, security, entertainment, and abnormal activity monitoring [3]. HAR systems rely on computer vision-based technologies, particularly deep learning and machine learning, to recognize human actions or abnormal behaviours [3]. These systems monitor human activities through visual monitoring or sensing technologies, categorizing actions into gestures, actions, interactions, and group activities based on the involved body parts and movements [3]. The applications of HAR encompass diverse domains such as video processing, surveillance, education, and healthcare [3].

Similarly, action identification in videos remains a crucial task in computer vision and artificial intelligence, with applications in intelligent environments, security systems, and human-machine interfaces [4]. Deep learning methods, including convolutional and recurrent neural networks, have been employed to address challenges such as focal point recognition, lighting conditions, motion variations, and imbalanced datasets [4]. Proposed models integrate object identification, skeleton articulation, and 3D convolutional network approaches to enhance action recognition accuracy [4]. These models leverage deep learning architectures, including LSTM recurrent networks and 3D CNNs, along with innovative techniques such as feature extraction and object detection to classify activities in video sequences effectively [4].
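To make the 3D-convolution idea mentioned above concrete, the following is a minimal NumPy sketch of a single "valid" 3D convolution over a video volume (time × height × width). It is an illustrative toy, not the MS-G3D or 3D-GSD implementation, and the array sizes and kernel are arbitrary assumptions.

```python
import numpy as np

def conv3d_valid(video, kernel):
    """Naive single-channel 3D convolution ('valid' padding, stride 1).

    video:  (T, H, W) array, e.g. a stack of grayscale frames.
    kernel: (kt, kh, kw) array sliding over time, height and width,
            so each output value mixes spatial AND temporal context.
    """
    T, H, W = video.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(
                    video[t:t + kt, i:i + kh, j:j + kw] * kernel)
    return out

# A 4-frame, 4x4-pixel "video" of ones and a 2x2x2 kernel of ones:
clip = np.ones((4, 4, 4))
k = np.ones((2, 2, 2))
feat = conv3d_valid(clip, k)
print(feat.shape)     # (3, 3, 3): one value per spatio-temporal window
print(feat[0, 0, 0])  # 8.0: each window sums 2*2*2 = 8 ones
```

In a real action-recognition network, many such kernels are learned and stacked; this loop version only shows why a 3D kernel captures motion across neighbouring frames as well as spatial structure.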
The relationship between the growth of HAR and AI is symbiotic. As AI techniques evolve, they enable more sophisticated and accurate methods for extracting meaningful information from raw sensor data, giving rise to improved HAR systems. In the past, HAR models relied on single images or short image sequences; more recent models, such as CNNs combined with LSTMs, have made HAR designs more robust and adaptable.

Pose Based:
Once pose estimates are obtained, machine learning or deep learning models can be trained to classify different activities based on the detected poses. Features derived from pose data, such as joint angles or body part velocities, can be used as input features for these models. Pose-based approaches are particularly useful when sensor data is not available or when activities can be reliably inferred from visual cues, such as in surveillance or human-computer interaction applications.
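The pose features described above (joint angles, body part velocities) are straightforward to compute from 2D keypoints. The sketch below assumes hypothetical (x, y) coordinates and is not tied to any particular pose estimator.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by points a-b-c,
    e.g. shoulder-elbow-wrist for the elbow angle."""
    ba = np.asarray(a, float) - np.asarray(b, float)
    bc = np.asarray(c, float) - np.asarray(b, float)
    cos = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def joint_velocity(positions, fps=30.0):
    """Per-frame speed of one keypoint from an (n_frames, 2) track."""
    p = np.asarray(positions, float)
    return np.linalg.norm(np.diff(p, axis=0), axis=1) * fps

# A fully extended arm gives 180 degrees, a right-angle bend gives 90:
print(joint_angle((0, 0), (1, 0), (2, 0)))  # 180.0
print(joint_angle((0, 0), (1, 0), (1, 1)))  # 90.0
```

Stacking such angles and velocities per frame yields a feature vector that a classifier (e.g. an SVM or a small neural network) can map to activity labels.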
Sensor Based:
Sensor-based approaches involve using various types of sensors, such as accelerometers, gyroscopes, magnetometers, or inertial measurement units (IMUs), to capture motion or physiological signals associated with human activities. These sensors can be embedded in wearable devices like smartwatches, fitness trackers, or smartphones, or they can be placed in the environment (e.g., on walls or floors) to capture activity data. Data collected from sensors typically include time-series measurements of acceleration, rotation, or other physical quantities. Machine learning algorithms, such as decision trees, support vector machines (SVMs), or deep neural networks, can then be trained on this data to recognize different activities. Sensor-based approaches are widely used in applications like activity tracking, fall detection, sports performance analysis, and healthcare monitoring due to their versatility and non-intrusiveness.

Human activities are diverse and can encompass a wide range of motions and positions. HAR has diverse applications, including healthcare, surveillance, monitoring, and assistance for the elderly and people with disabilities. It is imperative to note that while the field is making great strides, there are challenges that need to be addressed. These challenges include the selection of appropriate sensors, data collection, recognition performance, energy efficiency, processing capacity, and system flexibility.

Our goal in this project is to create an efficient and effective human activity recognition system that can process video and image data to detect tasks performed. The system is versatile and can find applications in many areas, such as caring for and helping people with special needs.

III. METHODOLOGY
Working
The project provides a Graphical User Interface (GUI) built with Tkinter, facilitating human tracking while leveraging Mediapipe's pose estimation model for activity supervision. Through Tkinter, the GUI offers users a variety of activity tracking options, including fighting, smoking, walking, reading, and playing. It provides the flexibility of choosing between real-time camera recognition or video input, with distinct buttons to toggle between the two modes.
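As an illustration of how landmark coordinates from a pose estimator such as Mediapipe could be mapped to the activity labels listed above, here is a deliberately simplified, rule-based sketch. The landmark names, thresholds, and rules are hypothetical assumptions for demonstration only; the project itself pairs pose estimation with a learned model rather than hand-written rules.

```python
def classify_activity(lm):
    """Toy rule-based activity guesser over normalized pose landmarks.

    `lm` maps landmark names to (x, y) with y growing downward,
    mirroring the normalized coordinates a pose estimator such as
    Mediapipe returns. Names and thresholds here are illustrative.
    """
    wrist_near_face = abs(lm["right_wrist"][1] - lm["nose"][1]) < 0.1
    stride = abs(lm["left_ankle"][0] - lm["right_ankle"][0])
    if wrist_near_face:
        return "smoking"   # hand raised to the face
    if stride > 0.25:
        return "walking"   # ankles far apart mid-stride
    return "standing"      # fallback label

# Hand raised to the face -> "smoking"; ankles spread -> "walking".
pose_a = {"nose": (0.5, 0.2), "right_wrist": (0.55, 0.25),
          "left_ankle": (0.48, 0.9), "right_ankle": (0.52, 0.9)}
pose_b = {"nose": (0.5, 0.2), "right_wrist": (0.6, 0.6),
          "left_ankle": (0.3, 0.9), "right_ankle": (0.7, 0.9)}
print(classify_activity(pose_a))  # smoking
print(classify_activity(pose_b))  # walking
```

A GUI button callback could run such a classifier on each frame's landmarks and display the predicted label, regardless of whether the frames come from the live camera or a video file.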
IV. RESULT

In an era dominated by advancements in computer vision, the development and implementation of Human Activity Recognition (HAR) systems have emerged as a compelling and indispensable technology. These systems have proven remarkably effective in addressing a multitude of real-world applications, from surveillance and monitoring to providing invaluable assistance to elderly and visually impaired individuals. The project's CNN-LSTM-based HAR system represents a significant stride in this direction, offering a versatile and accurate approach to recognizing and categorizing a wide array of human activities. This system's adaptability to video streams and image data showcases its potential in addressing the diverse needs of various industries and domains.

The future holds a wealth of opportunities for further enhancing and extending Human Activity Recognition systems. Here are some promising areas of development and exploration:

Multimodal Integration:
Incorporating data from diverse sensor types, such as audio, accelerometers, and more, can enable richer context and improved recognition accuracy.

Privacy and Ethics:
Researching and implementing robust privacy measures and ethical guidelines is essential for deploying HAR systems in sensitive or public environments.