Human Activity Recognition Using CNN
Submitted By:
Kaniz Fatema
A final year project report submitted to the City University in partial fulfillment of
the requirements of the Degree of Bachelor of Computer Science and Engineering
Supervised By:
Sadia Islam
City University
Dec 2020
This is to certify that the work presented in this project entitled “Human Activity
Recognition” is the outcome of the work done by Kaniz Fatema under the
supervision of Sadia Islam, Lecturer, Department of Computer Science and
Engineering, City University, Dhaka, Bangladesh during 15 July to 24 Dec 2020. It
is also declared that neither this project/report nor any part of it has been submitted
or is being currently submitted anywhere else for the award of any degree or
Approved By:
--------------------------------- --------
Supervisor :
Sadia Islam
City University
This is to certify that the work presented in this project entitled “Human Activity
Recognition using CNN” is the outcome of the work done by Kaniz Fatema under
the supervision of Sadia Islam, Lecturer, Department of Computer Science and
Engineering, City University, Dhaka, Bangladesh during 15 July to 24 Dec 2020. It
is also declared that neither this project/report nor any part of it has been submitted
or is being currently submitted anywhere else for the award of any degree or
Kaniz Fatema
Batch: 40th
Dhaka, Bangladesh
Project development is not an easy task. It requires the co-operation and help of
various people. Words often run out when I am truly thankful
and sincerely want to express my gratitude towards those who
helped in the completion of the project. I am deeply indebted to my supervisor,
Sadia Islam, Lecturer, Department of Computer Science & Engineering, City
University, Dhaka, Bangladesh. Without my supervisor's help, guidance, sympathetic
co-operation, stimulating suggestions, and encouragement, the planning and
development of this project would have been very difficult. My special thanks go to
the Head of the Department of CSE, Md. Safaet Hossain, who gave me
permission and encouraged me to go ahead. I am indebted to the Honorable Dean
of the Science Faculty, Prof. Dr. Md. Shawkut Ali Khan, for his endless
support. I am very grateful to all my faculty members, who gave me valuable
guidance to complete my graduation. I am also very grateful to all those people who
have helped me to complete my project.
Kaniz Fatema
Batch: 40th
Dhaka, Bangladesh
We construct a Convolutional Neural Network (CNN) to identify human activities
using data collected from Google Images. The daily human activities
chosen for recognition are walking, jogging, sitting, standing, going upstairs,
and going downstairs. The collected images are used directly as the
input for training the CNN without any complex pretreatment. The data were collected
from image sites such as Getty Images, iStock, and Dreamstime.
The data are preprocessed, labeled, and split into training and test sets. Classical
algorithms (Random Forest, SVM, Decision Tree, Naïve Bayes, Logistic
Regression, and KNN) are applied first, and the same process is then applied to the CNN
to produce the Human Activity Recognition output.
Table of Contents
Certification
Declaration
Acknowledgement
Abstract
Table of Contents
3.3 Logistic Regression Segmentation
3.3.3 Binarization
Chapter 5 Proposed Scheme
5.1 Introduction
5.2 Detection of HAR
5.3 Benefits
5.4 Limitations
Chapter 6 Implementation
6.1 Implementation
6.2 Implementation and Result
Chapter 7 Conclusion
7.1 Summary
7.2 Suggestions for future work
References
Chapter 1
1.1 Human Activity Recognition
Human activity recognition plays a significant role in human-to-human interaction
and interpersonal relations. Because it provides information about the identity of a
person, their personality, and psychological state, it is difficult to extract. The
human ability to recognize another person’s activities is one of the main subjects of
study of the scientific areas of computer vision and machine learning. As a result
of this research, many applications, including video surveillance systems, human-
computer interaction, and robotics for human behavior characterization, require a
multiple activity recognition system.
When attempting to recognize human activities, one must determine the
kinetic states of a person, so that the computer can efficiently recognize this
activity. Human activities, such as “walking” and “running,” arise very
naturally in daily life and are relatively easy to recognize. On the other hand,
more complex activities, such as “peeling an apple,” are more difficult to
identify. Complex activities may be decomposed into other simpler activities,
which are generally easier to recognize. Usually, the detection of objects in a
scene may help to better understand human activities as it may provide useful
information about the ongoing event (Gupta and Davis, 2007).
[Figure panels: (c) Sitting, (d) Standing]
That is, actions within the same class may be expressed by different people with
different body movements, and actions between different classes may be difficult
to distinguish as they may be represented by similar information. The way that
humans perform an activity depends on their habits, and this makes
identifying the underlying activity quite difficult. Also, constructing
a visual model for learning and analyzing human movements in
real time, with inadequate benchmark datasets for evaluation, is a challenging task.
1.2 Applications of HAR
During the last decade, there has been significant growth in the number of
publications in the field of HAR; in particular, many researchers have proposed
application domains to identify specific activity types or behaviors to reach
specific goals in these domains. This section focuses on state-of-the-art
applications that use HAR methodologies to assist humans. In
particular, it discusses the application of current activity recognition (AR)
approaches to ambient assisted living (AAL) for
smart homes, healthcare monitoring solutions, security and surveillance
applications, and tele-immersion (TI) applications; these approaches are further classified by the
observation methodology used for recognizing human behavior, namely into
approaches based on visual, non-visual, and multimodal sensor technology.
Moreover, the smart home system proposed by the Center for Advanced Studies in
Adaptive Systems (CASAS) [23] used machine learning and data mining
techniques to analyze the daily activity patterns of a resident and to generate
automation policies based on the identified patterns to support the resident.
Automation policies were used to assist elderly individuals with their urgent needs.
1.3 Idea and Concept
Human activity recognition can support human-computer interaction.
It can also be used in video surveillance systems.
Investigators can use it to detect criminal offenses.
It can be used for monitoring in healthcare systems.
1.5 Scope & Objectives
Build a model of user activities that helps detect criminal offenses
based on those activities.
Apply the model commercially for targeted investigators.
Support healthcare systems for targeted patients to reduce the risk
of many diseases.
Determine the kinetic state of a person.
Support human-to-human interaction and interpersonal relations.
Identify a person, their personality, and their psychological state.
Support video surveillance systems.
Support human-computer interaction by identifying the behavior of a person who
interacts with the system.
Support robotics for human behavior characterization.
Stop criminal offenses based on physical activity.
Enhance the healthcare system for targeted patients.
Prepare a large amount of data manually for training and testing.
Chapter 2
Literature review
2.1 Detection
Human activity recognition, or HAR for short, is a broad field of study concerned
with identifying the specific movement or action of a person based on sensor data.
Movements are often typical activities performed indoors, such as walking, talking,
standing, and sitting. They may also be more focused activities such as those types
of activities performed in a kitchen or on a factory floor.
The sensor data may be remotely recorded, such as video, radar, or other wireless
methods. Alternately, data may be recorded directly on the subject such as by
carrying custom hardware or smart phones that have accelerometers and
FNN-based Human Activity Recognition: the algorithms the author uses include Sobel
edge detection, highlighting the licence plate region using image morphology,
horizontal and vertical extraction, and the Otsu method. Detection accuracy: 89.7% [4].
2.2 Classification
Given a training set of labeled observation sequences (features extracted from the
acceleration readings in the x, y, and z axes from three accelerometers placed on
different parts of the body), corresponding to each of the activities that we want to
classify, we first want to estimate the model parameters $\lambda = (A, B, \pi)$, where
$A = \{a_{ij}\}$ is the state transition probability distribution, $B = \{b_j(k)\}$ is the
observation symbol probability distribution in state $j$, and $\pi = \{\pi_i\}$ is the initial
state distribution for each activity. Given a new set of observations, we would like
to classify each sequence according to the model that gives the maximum
likelihood for that particular sequence.
We modeled each observation sequence as a 5-state left-to-right HMM with
continuous Gaussian observation vectors and two
hidden states. Each observation vector was formed by combining the mean and
variance in the x, y, and z axes from each accelerometer. These features were
previously used in [2]. An HMM was trained for each class $(\lambda_1, \lambda_2, \ldots, \lambda_C)$, where
$\lambda_c$ indicates the learned HMM model for class $c$, and $C = 8$ is the total number of
classes, using the labeled data from eight datasets as the training dataset $T_k$. The ninth
dataset was used as the validation dataset $V_k$. The HMM toolbox for Matlab
developed by [7] was used to train and test the different models. The log likelihood
of each model was calculated for each observation sequence in the ninth dataset.
Each observation sequence $O^l = \{O^l_1 O^l_2 \ldots O^l_T\}$ (with $T = 5$ time slices) in the
validation dataset $V_k = \{O^l\}_{l=1}^{L}$ was classified according to the model that gave
the maximum likelihood.
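The maximum-likelihood decision rule described above can be sketched in a few lines of Python. This is only a minimal illustration and not the Matlab HMM toolbox used in the cited work: it assumes discrete observation symbols instead of continuous Gaussian vectors, and the two toy models, their parameter values, and the function names `log_likelihood` and `classify` are hypothetical.

```python
import math

def log_likelihood(obs, A, B, pi):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    A[i][j] is the transition probability from state i to state j,
    B[j][k] is the probability of emitting symbol k in state j,
    pi[i] is the initial probability of state i.
    """
    n = len(pi)
    # Initialisation with the first observation.
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction step: propagate through A, then weight by the emission.
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob

def classify(obs, models):
    """Return the class whose HMM gives the maximum log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))

# Hypothetical toy models: 2 states, left-to-right, 2 observation symbols.
A = [[0.9, 0.1], [0.0, 1.0]]
pi = [1.0, 0.0]
models = {
    "sitting": (A, [[0.9, 0.1], [0.8, 0.2]], pi),  # mostly emits symbol 0
    "walking": (A, [[0.1, 0.9], [0.2, 0.8]], pi),  # mostly emits symbol 1
}

print(classify([0, 0, 0, 0, 0], models))
print(classify([1, 1, 0, 1, 1], models))
```

Working in log space with rescaling keeps the computation stable for longer sequences, which is why the sketch accumulates `math.log(scale)` instead of multiplying raw probabilities.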
Chapter 3
Algorithm & Techniques
3.1 Introduction
We try a number of existing techniques and algorithms in our project. For
activity recognition, we try the following algorithms: KNN, Naïve Bayes,
Decision Tree, SVM, Random Forest, and Logistic Regression. We also
use several techniques in the classification process, including multiple stages of
segmentation. For classification we try a neural network (NN) architecture as well as the traditional
machine learning algorithm KNN.
Now, we find a line that splits the data into the two differently
classified groups. This is the line for which the distance to the
closest point in each of the two groups is as large as possible.
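The margin criterion in this paragraph can be illustrated with a short Python sketch. It does not train an SVM; it only computes the margin of two hand-picked candidate lines and selects the one with the larger margin. The points, labels, and candidate lines are hypothetical values chosen for illustration.

```python
import math

def margin(w, b, points, labels):
    """Smallest signed distance from any point to the line w.x + b = 0.

    A negative result means the line misclassifies at least one point.
    """
    norm = math.hypot(w[0], w[1])
    return min(
        y * (w[0] * x[0] + w[1] * x[1] + b) / norm
        for x, y in zip(points, labels)
    )

# Two groups of 2-D points with labels -1 and +1 (illustrative data).
points = [(1, 1), (2, 1), (4, 3), (5, 4)]
labels = [-1, -1, 1, 1]

# Candidate separating lines, each given as (w, b).
candidates = {
    "vertical line x = 3": ((1, 0), -3),
    "diagonal line x + y = 5": ((1, 1), -5),
}
margins = {name: margin(w, b, points, labels) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins, best)
```

Both lines separate the two groups, but the diagonal line keeps the closest points farther away, so it is the better choice under this criterion.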
Example: Let's work through an example to understand this better. Here we have
a training data set of weather conditions (sunny, overcast, and rainy) and a corresponding
binary variable 'Play'. We need to classify whether players will play or not
based on the weather condition. Let's follow the steps below.
Step 3: Now, use the Naive Bayes equation to calculate the posterior probability
for each class. The class with the highest posterior probability is the outcome of
Table 3.1: Frequency table of weather data
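The posterior calculation in Step 3 can be sketched as follows. The frequency table itself is not reproduced in the text, so the counts below are assumed values used only for illustration, and the function name `posterior` is hypothetical.

```python
from collections import Counter

# Illustrative (weather, play) records; the original table is not reproduced
# in the text, so these counts are an assumption.
records = (
    [("Sunny", "Yes")] * 3 + [("Sunny", "No")] * 2
    + [("Overcast", "Yes")] * 4
    + [("Rainy", "Yes")] * 2 + [("Rainy", "No")] * 3
)

def posterior(weather, records):
    """P(class | weather) for each class via Bayes' rule on frequency counts."""
    total = len(records)
    class_counts = Counter(play for _, play in records)
    joint_counts = Counter(records)
    evidence = sum(1 for w, _ in records if w == weather) / total  # P(weather)
    result = {}
    for play, count in class_counts.items():
        likelihood = joint_counts[(weather, play)] / count  # P(weather | class)
        prior = count / total                               # P(class)
        result[play] = likelihood * prior / evidence
    return result

probs = posterior("Sunny", records)
prediction = max(probs, key=probs.get)
print(probs, prediction)
```

With these assumed counts, the class with the highest posterior for sunny weather is "Yes".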
Random Forest
Random forests or random decision forests are an ensemble learning method for
classification, regression and other tasks that operate by constructing a multitude of
decision trees at training time and outputting the class that is the mode of the
classes (classification) or mean prediction (regression) of the individual trees.
Random decision forests correct for decision trees' habit of overfitting to their
training set. The first algorithm for random decision forests was created by Tin
Kam Ho using the random subspace method, which, in Ho's formulation, is a way
to implement the "stochastic discrimination" approach to classification proposed
by Eugene Kleinberg. An extension of the algorithm was developed by Leo
Breiman and Adele Cutler, who registered "Random Forests" as a trademark (as of
2019, owned by Minitab, Inc.). The extension combines Breiman's "bagging" idea
and random selection of features, introduced first by Ho and later independently by
Amit and Geman in order to construct a collection of decision trees with controlled
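The aggregation step of a random forest, as described at the start of this section, can be sketched as below. The sketch covers only how the outputs of individual trees are combined (mode for classification, mean for regression); it does not build the trees. The per-tree outputs and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean

def forest_classify(tree_votes):
    """Classification: return the mode (most common class) of the tree outputs."""
    return Counter(tree_votes).most_common(1)[0][0]

def forest_regress(tree_outputs):
    """Regression: return the mean of the tree outputs."""
    return mean(tree_outputs)

# Hypothetical outputs of five trees for one input sample.
votes = ["walking", "jogging", "walking", "walking", "sitting"]
print(forest_classify(votes))
print(forest_regress([2.0, 3.0, 4.0]))
```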
Decision Tree
Decision trees are a non-parametric supervised learning method used for
classification and regression. The goal is to create a model that predicts the value
of a target variable by learning simple decision rules inferred from the data
features. A tree can be seen as a piecewise constant approximation.
For instance, decision trees can learn from data to approximate a
sine curve with a set of if-then-else decision rules. The deeper the tree, the more
complex the decision rules and the more closely the model fits the data.
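The sine-curve case can be reproduced with a small regression tree written from scratch. This is a minimal sketch under simplifying assumptions (one input feature, greedy splits chosen by squared error, no pruning); the function names and the sample size of 80 points are arbitrary choices for illustration and are not taken from the project code.

```python
import math

def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(pairs, depth):
    """Fit a 1-D regression tree on (x, y) pairs sorted by x."""
    ys = [y for _, y in pairs]
    if depth == 0 or len(pairs) < 2:
        return {"value": sum(ys) / len(ys)}  # leaf: constant prediction
    # Choose the split index that minimises the total squared error.
    best_i = min(range(1, len(pairs)), key=lambda i: sse(ys[:i]) + sse(ys[i:]))
    threshold = (pairs[best_i - 1][0] + pairs[best_i][0]) / 2
    return {
        "threshold": threshold,
        "left": fit_tree(pairs[:best_i], depth - 1),
        "right": fit_tree(pairs[best_i:], depth - 1),
    }

def predict(tree, x):
    """Follow the if-then-else rules down to a leaf value."""
    while "value" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["value"]

def mse(tree, pairs):
    """Mean squared error of the tree on the given (x, y) pairs."""
    return sum((predict(tree, x) - y) ** 2 for x, y in pairs) / len(pairs)

# 80 samples of one period of a sine curve, already sorted by x.
data = [(x, math.sin(x)) for x in (i * 2 * math.pi / 80 for i in range(80))]
shallow = fit_tree(data, depth=2)
deep = fit_tree(data, depth=5)
print(mse(shallow, data), mse(deep, data))
```

The deeper tree has more leaves, so its piecewise constant curve follows the training samples more closely. A lower training error alone does not show better generalization, which is the overfitting risk that random forests are meant to reduce.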
3.3.2 Proposed CNN model
Advantages of using a CNN for image classification problems:
The use of CNNs is motivated by the fact that they are able
to learn relevant features from an image or video at different levels, similar to a
human brain. This is feature learning. Conventional neural networks
cannot do this.
CNNs have high statistical efficiency (they need few labels to learn reliably).
Convolutional Neural Networks are the basis of most modern computer vision
models. Fully connected networks scale poorly beyond toy problems
because they use far too many parameters. CNNs are a much less flexible
model compared to a fully connected network, and are biased toward
performing well on images, because in images we would like to
extract location-invariant features.
CNNs are also useful for 1D problems like time series, and for 3D image
classification, because these have the same structure, where we would like
location-invariant features. Convolutions are technically location equivariant
because they preserve the location of extracted features, but the important
thing is that the same features are extracted over the entire image.
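The difference between location equivariance and location invariance can be checked with a small 1D example. This is a sketch in plain Python with a hypothetical edge-like kernel and a hand-made signal; it is not the project's CNN.

```python
def conv1d(signal, kernel):
    """Valid 1-D cross-correlation: slide the same kernel over every position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

kernel = [1, 0, -1]                       # hypothetical edge-like filter
pattern = [0, 0, 1, 2, 1, 0, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 2, 1, 0, 0]     # same pattern moved 2 steps right

out_a = conv1d(pattern, kernel)
out_b = conv1d(shifted, kernel)

print(out_a)
print(out_b)
print(max(out_a), max(out_b))
```

The filter response moves by the same two positions as the input pattern (equivariance), while a global maximum over the response is unchanged by the shift (invariance).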
Model Summary:
CNN Process
In deep learning, a convolutional neural network (ConvNet) is a class of deep
neural networks. CNNs are regularized versions of multilayer perceptrons, which usually means
fully connected networks in which each neuron in one layer is connected to all neurons in the next layer.
A max pooling layer reduces the dimension of the data by combining the outputs of one layer
before passing them to the next layer. A dropout layer randomly drops units of the neural network
during training. The resulting feature maps are then used to produce the output of this project.
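The pooling and dropout steps can be sketched in plain Python. This is a minimal illustration and not the project's model: it assumes a 2x2 pooling window, a dropout rate of 0.5, and the common "inverted dropout" scaling, none of which is stated in the text.

```python
import random

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling; assumes even height and width."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]), 2)
        ]
        for r in range(0, len(feature_map), 2)
    ]

def dropout(values, rate, rng):
    """Inverted dropout for training: zero each unit with probability `rate`
    and rescale the survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool_2x2(feature_map)          # 4x4 -> 2x2
flat = [float(v) for row in pooled for v in row]
dropped = dropout(flat, rate=0.5, rng=random.Random(0))
print(pooled, dropped)
```

Pooling halves each spatial dimension here, and dropout is only applied during training; at prediction time all units are kept.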
3.4 Segmentation
Segmentation plays a vital role in classification. The characters and digits
in the activity recognition input are segmented. Both binary and grayscale image
processing techniques are used to segment the characters. We carry out
segmentation in two steps:
Trim and segmentation
3.4.1 Binarization
In the binarization stage we convert the plate into a grayscale image and then apply the
binarization operation. We apply a threshold value: if a pixel in
the grayscale image has a value higher than the threshold, we
convert that pixel to white; otherwise we convert it to black.
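The thresholding rule above can be sketched as follows. The grayscale conversion weights and the threshold of 127 are assumptions for illustration; the text does not state which values were used.

```python
def to_gray(rgb_image):
    """Convert rows of (R, G, B) pixels to gray values (assumed luminance weights)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in rgb_image
    ]

def binarize(gray_image, threshold):
    """Pixels above the threshold become white (255), all others black (0)."""
    return [
        [255 if pixel > threshold else 0 for pixel in row]
        for row in gray_image
    ]

# Hypothetical 2x2 RGB image.
rgb = [
    [(255, 255, 255), (0, 0, 0)],
    [(200, 200, 200), (50, 50, 50)],
]
gray = to_gray(rgb)
binary = binarize(gray, threshold=127)
print(gray, binary)
```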
After getting the final plate we segment the activity into two rows: the first row contains
all characters and the second row contains all types of activity.
Chapter 4
Train & Test Reviews
4.1 Introduction
In this chapter we show the training and testing results of all algorithms.
Our training platform is Google Colab. Training a deep learning
model requires a high-performance computer, and our local PC
cannot handle such heavy computation, which is why we
chose Google Colab for training. We divide our data in an
85% to 15% ratio: 85% of the data is used for training and 15% for testing.
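An 85% to 15% split of this kind can be sketched with the Python standard library. The file names, the fixed seed, and the function name `train_test_split` are hypothetical; the project's actual splitting code is not shown in the text.

```python
import random

def train_test_split(items, train_fraction=0.85, seed=0):
    """Shuffle a copy of `items` and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

samples = [f"image_{i:03d}.jpg" for i in range(100)]   # hypothetical file names
train, test = train_test_split(samples)
print(len(train), len(test))
```

Shuffling before splitting avoids putting all images of one activity into the same subset when the files are stored class by class.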
Logistic Regression
Support Vector Machine
Naïve Bayes
KNN(K-Nearest Neighbor)
Random Forest
Decision Tree
Training_Data: 26117
Accuracy: 91.13%
4.4 KNN(K-Nearest Neighbor)
For the KNN algorithm, Training_Data is 23389 and the accuracy is 90.64%.
Accuracy: 90.64%
Training_Data: 23389
4.4.2 Decision Tree
For the Decision Tree algorithm, Training_Data is 33134 and the accuracy is 90.80%.
Accuracy: 90.80%
Training_Data: 33134
Chapter 5
Proposed Scheme
5.1 Introduction
In this chapter we compare the algorithms and give a final
scheme, which we implement in our project. In the finalization
section of this chapter we show how we combine the CNN-based approach with our own
CNN classification algorithm to get the final output.
5.2.1 Comparison
Having discussed the different classical algorithms, we now compare
them with each other.
Fig 3.21: Comparison of logistic regression, random forest, and CNN
5.3.1 Benefits
It can generate good business profit.
It can help detect criminal offenses.
It has a great impact on investigators and police.
It has a great impact on the healthcare system.
It enhances the data security of the system.
5.4 Limitations
The software must be restarted when it is disconnected, and the full pipeline must be
run again each time.
The system is constrained by the number, location, and nature of the sensors used.
Deployment, maintenance, cost, and performing daily activities unimpeded are further concerns.
No image dataset for human activity recognition was available.
The image data had to be preprocessed manually.
YOLO object detection could not be applied because the data had
already been used for training and testing.
Object detection algorithms such as Faster R-CNN and SSD could not be applied.
The software configuration must include GPU and TPU support.
Sufficient RAM and disk space must be available; otherwise the session does not stay connected.
Existing activity recognition systems are constrained by practical
limitations such as the number, location, and nature of used sensors. Other
issues include ease of deployment, maintenance, costs, and the ability to
perform daily activities unimpeded.
Sensor signals might vary for the same activity across different subjects and even
for the same individual.
Errors can also be caused by variability in sensor signals due to differences in
sensor orientation and placement.
Different reported methods that address human activity recognition as a
classification problem were reviewed.
No image dataset trained and tested in real time was found.
Sometimes sensor problems arise and the activity cannot be predicted.
Chapter 6
6.1 Implementation
Fig Prediction Jogging
Fig Prediction Sitting
Fig Prediction Standing
Fig Prediction Upstairs
Fig Prediction Walking
6.2 Result
In this project, I try to implement an ML model in real time. I apply several
classical classification algorithms: logistic regression,
support vector machine, random forest, decision tree, naïve Bayes, and KNN (k-nearest
neighbor). Four of these algorithms (SVM, KNN, decision tree, and random forest)
reached just above 90% accuracy, so they fit this project well.
Logistic regression and naïve Bayes did not give a satisfactory
accuracy score. The project is also able to predict a person's kinetic state:
downstairs, jogging, sitting, standing, upstairs, or walking. The accuracies of
support vector machine, random forest, decision tree, and k-nearest neighbor are
91.13%, 90.40%, 90.80%, and 90.64%, respectively.
Chapter 7
7.1 Summary
I presented a CNN-based human activity recognition method. The method
outperformed the baseline random forest, decision tree, SVM, logistic regression,
KNN, and naïve Bayes algorithms in ternary human activity classification and
exhibited the best classification accuracy when longer accelerometer data
was used for training the neural network. We found that the dimension of the input
vector affects the activity recognition performance and that figuring out a way to
disambiguate the ‘walk’ signal in particular will likely lead to an improvement in
the activity recognition performance.
Also, enriching the data labels gave results of up to 97% accuracy on the training
and test datasets.
7.2 Suggestions for future work
In our project, activity recognition achieves high accuracy (up to 97%). The
main problem we faced was in the classification stage: we were able to
classify the activities with very high accuracy, but when we tried to classify characters we got
many misclassifications. We assume that this is due to a lack of data
variation and an insufficient amount of data. In future we will collect more data and
evaluate our work further.
We also plan to implement object detection algorithms such as Faster R-CNN, SSD, and YOLOv3
to predict human activity.
8.1 References
[1] Song-Mi Lee, Sang Min Yoon, and Heeryon Cho, "Human Activity Recognition From
Accelerometer Data Using Convolutional Neural Network," 2017.