Transfer Learning Enhanced Vision-Based Human Activity Recognition

Review

Keywords: Deep learning; Machine learning; Transfer learning; Human Activity Recognition

Abstract

The discovery of several machine learning and deep learning techniques has paved the way to extend the reach of humans in various real-world applications. Classical machine learning algorithms assume that training, validation, and testing data come from the same domain, with similar input feature spaces and data distribution characteristics. In some real-world exercises, where data collection has become difficult, the above assumption does not hold true. Even if possible, the scarcity of rightful data prevents the model from being successfully trained. Compensating for outdated data, reducing the need and hardship of recollecting the training data, avoiding many expensive data labeling efforts, and improving the foreseen accuracy on testing data are some significant contributions of transfer learning in real-world applications. The most cited transfer learning applications include classification, regression, and clustering problems in activity recognition, image and video classification, Wi-Fi localization, detection and tracking, sentiment analysis and classification, and web-document classification. Human activity recognition plays a cardinal role in human-to-human and human-to-object interaction and interpersonal relations. Paired with robust deep learning algorithms and improved hardware technologies, automatic recognition of human activity has opened the door toward constructing a smart society. To the best of our knowledge, our survey is the first to link machine learning, transfer learning, and vision sensor-based activity recognition under one roof. This survey exploits the above connection by reviewing around 350 related research articles from 2011 to 2021. Findings indicate an approximate 15% increment in research publications connected to our topic every year. Among these reviewed articles, we have selected around 150 significant ones that give insights into various activity levels, classification techniques, performance measures, challenges, and future directions related to transfer learning enhanced vision sensor-based HAR.
1. Introduction

Humans have evolved into an essential resource capable of handling cognitive tasks, even in many malicious applications. Human intervention is still inevitable in many industrial practices, even in this machinery-driven world of the twenty-first century. Recognition of human action Gupta (2021); Imran and Raman (2020) has become essential for individual performance appraisal. Manual bookkeeping of such activities can be an untidy and error-prone task. As a result, automatic recognition tools have become popular and an area of interest among the research fraternity. Automatic detection of any suspicious or unexpected human behavior will trigger an alarm for either self-correction or manual intervention. Auto-recognition of human activities is nowadays essential for smooth and error-free industrial and institutional operation.

Human Activity Recognition (HAR) datasets are manufactured by taking the knowledge of three fundamental domain-specific aspects: (i) data related to the sensor device, (ii) data related to the subject/actor, and (iii) data related to the sensing background. However, the mutable nature of the above three defies the conventional machine learning assumption that source and target data must belong to the same domain. Knowledge transfer came to the rescue by eliminating this conventional machine learning hypothesis. Apart from this, older training data are sometimes unsuitable for real-time recognition due to the mutable nature of the sensor and environment. Through the help of transfer learning, we can easily exploit the older samples and utilize the valuable information to enhance classification, regression, and recognition tasks.
It is even more difficult and expensive to collect an adequate number of training data samples and label them. Transfer learning significantly contributes to many real-world applications by compensating for old data, lowering the requirement and difficulty of recollecting training data, avoiding many expensive data labeling efforts, and boosting the accuracy on testing data.

Until now, most review articles from the activity recognition background have summarized the context related to either transfer learning or classifier-based machine/deep learning. The activities addressed in those surveys are vision-based or sensor-based. However, our survey enlists only those activity recognition articles where the machine/deep learning classifiers take advantage of transfer learning techniques to enhance recognition performance. In this work, we perform a data-centric and classifier-specific extensive survey on vision-based activity recognition. To the best of our knowledge, our survey is the first to review vision sensor-based human activity recognition using transfer learning enhanced machine/deep learning algorithms. Our paper gives insights into various activity levels, classification techniques, performance measures, challenges, and future directions related to transfer learning enhanced vision sensor-based HAR, and into various action recognition datasets with specifications and the levels of activity associated with them after inferring the context of the data. Our survey guides fresh researchers to become familiar with the information and management of existing datasets and learning methods, which helps to analyze the gaps and opportunities for future research work.

Many studies have reviewed transfer learning and activity recognition separately. However, few have reviewed activity recognition on transfer learning platforms, and the number becomes scarce when talking about sensor-based HAR on transfer learning enhanced platforms. To the best of our knowledge, Cook, Feuz and Krishnan (2013) is the last published review article that addresses HAR in the transfer learning domain. This paper enlists similar research work from 2011-2021 but is more oriented to HAR datasets and classification techniques. Deng, Zheng and Wang (2014) investigate sensor-based and vision-based HAR by extensively classifying different HAR methodologies based on their pros and cons. Our paper carries similar content but on the transfer learning platform. Unlike previous surveys, this survey paper does not abide by constraints like sensor-based modeling, architecture-based modeling, classifier-based modeling, or dataset-based modeling. This article combines all these models to forge a comprehensive package that will boost creativity for beginner and intermediate-level researchers. The scope of our paper can be further extended to wearable sensor-based and ambient sensor-based HAR Gupta (2021) in the transfer learning domain. This paper precisely explains the transfer learning technique, various steps, and datasets used in vision sensor-based HAR. This survey also introduces a novel hierarchical classification model related to this research domain.

We can find many studies depicting transfer learning and activity recognition separately. To the best of our knowledge, few analyzed HAR based on the transfer learning technique, and the number becomes scarce in our research domain. The contributions of our paper are summarized below.

1. To the best of our knowledge, we are the first to divide classification techniques for vision-based HAR in the form of three modular representations. We categorically discuss these classes in detail for future reference.
2. The frequently used visual datasets (source and target datasets) used in HAR are organized based on their year of evolution, mode of representation, frames per second, resolution, classes, subjects, and the number of videos compared.
3. We chronologically summarize the related research articles by comparing their underlying architecture, source/target datasets, the number of detected classes, and their respective accuracy.
4. We have tried to identify potential research gaps and future directions concerning vision-based HAR. We believe it will pilot new researchers in the right direction while saving their investigation time.

The rest of the paper is organized as follows. The research methodology is discussed in Section II. The overview of transfer learning, including its definition, significance, and architecture related to HAR, is demonstrated in Section III. Section IV introduces various HAR datasets, their classification, and a hierarchical tabular representation with specifications. Section V outlines the classification techniques used in vision-based HAR with a three-modular representation format. Performances of various significant articles are summarized in Section VI. The challenges and various aspects of future directions are briefly expressed in Section VII. In Section VIII, the contributions and practical implications are briefly discussed. Finally, Section IX concludes the paper along with the improvements that can be considered further.

2. Research methodology

We followed the Preferred Reporting Items for Systematic review and Meta-Analysis Protocols (PRISMA-P) Tricco et al. (2018) to single out relevant and significant articles related to our research domain. We accomplished this review by adopting three protocols: a searching protocol, an inclusion and exclusion protocol, and a scoping review protocol.

2.1. Searching protocol

First, we set the search platforms, i.e., search sites, libraries, or digital databases. Most articles included in this review were taken from the Web of Science, IEEE Xplore, and Google Scholar digital libraries. We reached out to the relevant articles by putting in the exact or relevant keywords or a combination of them. Some of the keywords are "human activity recognition," "video action classification," "transfer learning," "deep learning," "machine learning," "CNN," or the names of different activity recognition databases. Some of the searched phrases are combinations of more than one keyword with effective meanings. We downloaded around 350 articles during initial consideration for further processing.

2.2. Inclusion and exclusion protocol

We only included those vision-based activity recognition articles that adopt machine learning and transfer learning techniques for model designing. Non-English papers were excluded. We considered the date and type of publication (journal or conference), publishing house, and cite score during preliminary screening. Furthermore, we extended this screening procedure to the abstract composition level, where we validated searched articles' themes against our survey theme. Publications with appropriate matches were included. Finally, we filtered out the 150 most significant articles for further review.

2.3. Scoping review protocol

In this last step of the methodology, we systematically reviewed the selected papers after thoroughly apprehending many contextual factors in detail. First, we structured the summary observing the background, objective, source of evidence, eligibility criterion, databases, model algorithms, results, and conclusion from the abstract section. Afterward, we stepped into the detailed sketch of each paper considering the aforementioned factors along with some finer details, for example, computational complexity, real-time deployment possibility, limitations, research gaps, and opportunities.
3. Overview
Table 1
Popular HAR datasets with specifications.
Dataset (Reference)    FPS / Resolution    Classes / Subjects / Videos    Activity level
MP-II Cooking Rohrbach, Amin, Andriluka and Schiele (2012) 29.4/1624 × 1224 65/12/44 H-O Level
UCF-101 Soomro, Zamir and Shah (2012) 25/320 × 240 101/-/13,320 H-O/Group Level
DML Smart Action Mohsen Amiri et al. (2013) 30/2HD+1VGA 12/16/932 Atomic/H–O Level
Hollywood 3D Hadfield and Bowden (2013) 24/1920 × 1080 14/-/650 H-O/H–H Level
YouTube Sports 1 M Karpathy et al. (2014a) -/227 × 227 487/-/11,33,158 H-O/Group Level
Thumos’ 14 Thumos14 -/- 101/-/18,000 Atomic/H-O/Group Level
Northwestern-UCLA Wang, Nie, Xia, Wu and Zhu (2014) 30/640 × 480 10/10/1475 Atomic/H-O Level
UTD_MHAD Chen, Jafari and Kehtarnavaz (2015) 30/640 × 480, 320 × 240 27/8/861 Atomic/H-O Level
ActivityNet Caba Heilbron, Escorcia, Ghanem and Niebles (2015) 30/1280 × 720 203/-/27,801 H-O Level
THUMOS’15 Gorban et al. (2015) -/- 102/-/23,500 Atomic/H-O/Group Level
NTU RGB+D 60 Shahroudy, Liu, Ng and Wang (2016) 30/1920 × 1080, 512 × 424 60/40/56,880 Atomic/H-O/H-H Level
YouTube 8 M Abu-El-Haija et al. (2016) 1/- 480/-/82,64,650 H-O/Group Level
Kinetics400 Kay et al. (2017a) -/658 × 1022 400/-/3,06,245 H-H/H-O Level
PKU-MMD Liu, Hu, Li, Song and Liu (2017) 30/1920 × 1080, 512 × 424 51/66/20,000 H-O/H-H Level
Something-SomethingV2 Goyal et al. (2017) 12/96 × 96 174/1133/2,20,847 H-O Level
AVA Gu et al. (2018a) 1/451 × 808 80/-/230K Atomic/H-O Level
MLB-YouTube Piergiovanni and Ryoo (2018) 60/- 20/-/4290 H-O/Group Level
Kinetics600 Carreira, Noland, Banki-Horvath, Hillier and Zisserman (2018) -/658 × 1022 600/-/4,95,547 H-H/H-O Level
SoccerNet Zhou, Xu and Corso (2018) 25/1280 × 720 3/-/6637 H-O/Group Level
YouCook2 YouCook2 -/- 89/-/2000 H-O Level
NTU RGB+D 120 Liu et al. (2019) 30/1920 × 1080, 512 × 424 120/106/1,14,480 Atomic/H-O/H-H Level
Kinetics-700 Carreira, Noland, Hillier and Zisserman (2019) -/658 × 1022 700/-/650K H-H/H-O Level
MOD20 Perera, Law, Ogunwa and Chahl (2020) 29.97/720 × 720 20/-/2324 H-O/Group Level
HAA-500 Chung, Wuu, Yang, Tai and Tang (2021) -/1080 × 720 500/-/10,000 Atomic/H-O/Group Level
EduNet Sharma, Gupta, Kumar and Mishra (2021) 30/1280 × 720 20/-/7851 H-O Level
TAD-08 Gang et al. (2021) -/720 × 576 8/-/2048 H-O Level
Win-Fail Parmar and Morris (2022) -/1080 × 720 4/-/1634 Atomic/H-O Level
Based on the context of the data, human activities are broadly categorized into five levels of activity: gesture level activity, atomic level activity, Human-Object (H–O) interaction level activity, Human-Human (H–H) interaction level activity, and group level activity.
learning to exercise on the PALMAR and Benedek dataset for recognizing human-object interaction activity.
5.1.2. Gaussian mixture model

K-means clustering is considered a hard clustering method or distance-based clustering method. Hence, it cannot express its significance in undistinguishable or multi-label data environments. So, we shifted to a soft clustering model called the Gaussian Mixture Model (GMM), where a distribution-based clustering technique is adopted instead of a distance-based one. In GMM, a dataset of D features can have a mixture of k Gaussian distributions. Each distribution represents a cluster head defined by a D-length mean and a D × D co-variance matrix. The expectation-maximization technique determines these variables (means and co-variances) and sets the model parameters accordingly. Xing et al. (2019) use the GMM algorithm to segment the raw RGB images of small and unseen target datasets and send the segmented data to a CNN (AlexNet, GoogleNet, and ResNet-50) model for activity recognition. Xing et al. (2018) use GMM-based segmentation and only a pre-trained AlexNet model to implement the same inductive transfer learning through fine-tuning. Ntalampiras and Potamitis (2018) use temporal, spectral, and wavelet features to identify statistically closely located classes using GMM and the KL divergence algorithm. Class-specific HMM and universal HMM use these distance-based class features for class prediction. An ESN-based transfer learning technique is adopted to categorize seven human-object interaction level activities. Variational Bayesian Inference (VI) is the generalization of the expectation-maximization approach, which maximizes the likelihood iteratively Jänicke, Tomforde and Sick (2016a). VI is used to determine the latent features of GMM, is responsible for reducing the model complexity, and nullifies the need for a specific number of components a priori. Transductive transfer learning is used for self-improvisation, i.e., new node insertion.
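As a concrete illustration of the clustering just described, the sketch below fits a GMM with EM and reads back the soft assignments; scikit-learn is assumed to be available, and the feature matrix and component count are illustrative stand-ins rather than any cited setup.

```python
# Minimal GMM clustering sketch; X stands in for D-dimensional visual features.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))          # 500 samples, D = 16 features

# k Gaussian components; EM estimates each component's D-length mean and
# D x D covariance matrix, as described above.
gmm = GaussianMixture(n_components=5, covariance_type="full", random_state=0)
gmm.fit(X)

labels = gmm.predict(X)                 # hard cluster assignment
responsibilities = gmm.predict_proba(X) # soft, per-component memberships
print(gmm.means_.shape, gmm.covariances_.shape)  # (5, 16) and (5, 16, 16)
```

The soft responsibilities are what distinguish the GMM from k-means: every sample keeps a graded membership in every component instead of a single hard label.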
5.1.3. Restricted Boltzmann machine

Restricted Boltzmann Machine (RBM) is an unsupervised generative network with fully connected nodes across layers (a bi-partite node configuration, hence the term 'restricted') that is capable of learning a probability distribution from seen data to make inferences about unseen data. It has a visible or input layer (v) associated with the seen data and one or multiple hidden layers (h) pointing out the unseen inference data, with no output layer. RBM is an energy-based model used in classification, regression, dimensionality reduction, feature learning, collaborative filtering, and topic modeling. The Boltzmann distribution (Gibbs distribution), derived from statistical mechanics in thermodynamics, is implemented to explain the impact of entropy on different states in RBM. It is associated with two biases: (i) a hidden bias that helps to produce activation on the forward pass, and (ii) an input bias that helps to produce activation on the backward pass. The gradient-based contrastive divergence algorithm is implemented to carry out learning during training. Multiple RBMs are stacked together to form a Deep Belief Network (DBN) Kolekar (2011) to perform layer-wise training. Roder et al. (2021) first introduce the spectral DBN on the HMDB-51 and UCF-101 HAR datasets using the domain adaptation technique. Gradient-DBN and Aggregative-DBN are proposed to employ image gradient and frame fusion in video-based HAR. A binary-binary RBM and a Gaussian-binary RBM are stacked together to optimize the weights and learn the informative features of triaxial accelerometer HAR data Alsheikh et al. (2016). To train and fit the model parameters, the underlying model should go through the pre-training stage (unsupervised and generative) and fine-tuning stage (supervised and discriminative).
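The contrastive divergence step mentioned above can be sketched in a few lines; this is a toy CD-1 update for a binary-binary RBM written from the standard formulation, with all sizes and the learning rate chosen for illustration only.

```python
# Toy CD-1 update for a binary-binary RBM (NumPy only).
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 64, 32, 0.01
W = 0.01 * rng.normal(size=(n_visible, n_hidden))
b_v = np.zeros(n_visible)               # input (visible) bias: backward pass
b_h = np.zeros(n_hidden)                # hidden bias: forward pass

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0):
    ph0 = sigmoid(v0 @ W + b_h)                        # positive phase
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sample hidden states
    pv1 = sigmoid(h0 @ W.T + b_v)                      # reconstruct visibles
    ph1 = sigmoid(pv1 @ W + b_h)                       # negative phase
    # Gradient estimate: data statistics minus reconstruction statistics.
    return v0.T @ ph0 - pv1.T @ ph1, (v0 - pv1).mean(0), (ph0 - ph1).mean(0)

v0 = (rng.random((10, n_visible)) < 0.5).astype(float) # toy binary batch
dW, db_v, db_h = cd1_step(v0)
W += lr * dW / len(v0)
b_v += lr * db_v
b_h += lr * db_h
```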
5.1.4. Autoencoder

Autoencoder (AE) is an unsupervised generative ANN model that is embodied with an encoder layer, a code layer, and a decoder layer (a mirror of the encoder layer). The encoder layer only considers the informative data representative of the input to generate a low-dimensional code and stores it in the code layer, which is the latent space representation of the input data. The decoder layer later collects these codes and reconstructs them back to generate output containing only valuable features. These generated outputs are identical and equidimensional to the input. Regularized (sparse, denoising, and contractive), concrete, and variational AE are the most common types used in many machine learning tasks like facial recognition, activity recognition, dimensionality reduction, anomaly detection, machine translation, drug discovery, and popularity prediction. Khan and Roy (2018) use a pre-trained transfer learning framework called UnTran that transfers the first two layers of the source-trained Deep Sparse Autoencoder (DSAE) to incorporate with an SVM classifier for recognizing human activity on the Opportunity, WISDM, and Daily and Sports datasets. This multi-layered classification model helps generalize the model to overcome user-related, sensor-related, and environment-related diversities. A combined model performs domain adaptation for re-annotation in the cross-dataset platform Sanabria and Ye (2020). The combined model fuses two learning techniques for human activity recognition: (i) a knowledge- and data-driven learning technique, and (ii) an Unsupervised Domain Adaptation technique. The Variational Auto-encoder (VAE) in UDAR has achieved encouraging outcomes while learning latent space representations that minimize the distance across the Aruba and Twor datasets. The discussed framework is effective and robust in adapting to divergence in training data count and sensor noise settings. The semi-supervised Inverse Autoregressive Flow (IAF) based VAE is associated with a Bi-Directional GAN (Bi-GAN) classifier to implement Zero-Shot Learning (ZSL) for HAR using synthesized features on the UCF101, HMDB51, and Olympic datasets Mishra, Pandey and Murthy (2020). The above model adopts a decoder with skip connections to stabilize the training and prevent overfitting. Khan and Roy (2018) employ the inductive transfer learning method, whereas Mishra et al. (2020); Sanabria and Ye (2020) use the transductive setting for transferring knowledge across datasets. Autoencoders are very suitable in unsupervised applications like anomaly activity recognition, where we define the data under either normal or abnormal categories.
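A minimal encoder-code-decoder sketch of this layout is given below, assuming PyTorch; the layer widths and input size are illustrative, not drawn from any model reviewed here.

```python
# Minimal autoencoder: encoder -> code layer -> mirrored decoder (PyTorch).
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=784, code_dim=32):
        super().__init__()
        # Encoder compresses the input into the low-dimensional code layer.
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        # Decoder mirrors the encoder and rebuilds an equidimensional output.
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim))

    def forward(self, x):
        code = self.encoder(x)           # latent space representation
        return self.decoder(code)

model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                  # illustrative input batch
loss = nn.functional.mse_loss(model(x), x)   # reconstruction objective
opt.zero_grad(); loss.backward(); opt.step()
```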
5.1.5. Generative adversarial network

GAN Aggarwal, Mittal and Battineni (2021) is a synchronous generative model that comprises two sub-models (generator and discriminator). A generator generates a random sample of target dimensions by taking a fixed-length vector as input and sending it over to a discriminator for binary classification (real or fake) along with an actual target domain sample. The generator tries to mislead the discriminator by generating random output close to the real input. Meanwhile, the discriminator tries to protect itself from being fooled by updating its weights. This process of "making fools" and "being fooled" is performed iteratively to accomplish recognition and generation tasks that come under unsupervised, semi-supervised, fully supervised, and reinforcement settings. Vondrick, Pirsiavash and Torralba (2016) use a Spatio-temporal convolutional GAN for unsupervised HAR in videos from Flickr. The Spatio-temporal convolutional architecture helps untangle a scene's foreground from its background. GAN is employed to generate and classify video samples by utilizing scene dynamics. The proposed conditional GAN framework is fed with a class prototype vector to implement Generalized FSL (GFSL) on the UCF-101, HMDB-51, and Olympic-Sports datasets. The GFSL sub-module addresses the inadequate-data and seen-data-biasing problems. Class Prototype Transfer Network (CPTN) generated class prototype vectors with random noise are fed to the generator module to produce synthetic features. The generator goes through an iterative update based on the discriminatory loss to make the random synthetic features close to the real features. A classifier is trained with both real and GAN-generated synthetic features to efficiently address novel data classification problems. A common latent semantic representation can be an excellent asset for generalizing a model in the zero-shot learning setting.
Zhang, Li and Ogunbona (2017a) take connotative and extensional relations for solving poor generalization problems on the UCF-101 and HMDB-51 datasets. A GAN-based model synthesizes action features and word vectors of unseen classes by exploiting this representation from seen examples. A knowledge-based graph is prepared by relating the word vectors to their corresponding objects. Finally, an attention-based Graph Convolutional Network (GCN) is employed to classify the novel samples with better accuracy and enhanced generalizability. Standard and generalized settings of transductive ZSL are realized in Ji et al. (2020) through a Bi-directional adversarial GAN and an Inverse Auto-regressive flow-based VAE on the UCF-101, HMDB-51, and Olympic-Sports datasets. The skip connection of the decoder in the VAE not only results in more stable training but also prevents overfitting.
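The adversarial loop described in this subsection reduces to two alternating updates; the sketch below shows that minimax recipe over a synthetic 16-dimensional feature space (PyTorch assumed; the MLPs, sizes, and iteration count are illustrative, not any cited GAN).

```python
# Alternating generator/discriminator updates (the "making fools" /
# "being fooled" loop) over a toy feature space.
import torch
import torch.nn as nn

z_dim, x_dim = 8, 16
G = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))
D = nn.Sequential(nn.Linear(x_dim, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(32, x_dim)            # stand-in for real target-domain features
for _ in range(100):
    # Discriminator step: push real samples toward 1, generated ones toward 0.
    fake = G(torch.randn(32, z_dim)).detach()
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step: try to make the discriminator label fakes as real.
    g_loss = bce(D(G(torch.randn(32, z_dim))), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```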
5.1.6. Long-Short Term Memory

Long-Short Term Memory (LSTM) is an RNN variant where multiple layers stack together to perform time-series signal processing and preserve long-term dependencies between information sequences. The presence of more complex interactive layers in LSTM helps to realize the significance of previous sequential knowledge in manipulating future ones. The whole internal processing is carried out by passing the earlier information through three gates: (i) a forget gate that decides whether to completely forget or completely keep the past information, (ii) an input gate that allows only the relevant input information by discarding the others, and (iii) an output gate that replaces the old cell state with the new one after concatenating the concerning forget gate and input gate signals.
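In standard notation, the three gates just listed implement the following update, where σ is the logistic sigmoid, ⊙ the elementwise product, and [h_{t-1}, x_t] the concatenation of the previous hidden state with the current input; this is the textbook LSTM cell rather than any specific cited variant.

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\!\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)}\\
\tilde{c}_t &= \tanh\!\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)}\\
o_t &= \sigma\!\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)}\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```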
LSTM is used as a controller in Ma, Zhang, Wang, Qi and Chen (2020) that controls the gateway (read and write heads) between the received input signal and the external memory module. Memory encoding and retrieval are the primary goals of the read and write heads. Kay et al. (2017b) incorporate an LSTM layer and a batch normalization layer that receives spatial features from the CNN module to perform state encoding, temporal order capturing, and long-dependency exploring. A read-attention-based bi-directional LSTM is used in Shi, Zhang, Xu and Cheng (2020). The discriminative features from a CNN are fed to a bi-LSTM that contains a forward LSTM module and a backward LSTM module. Similarly, two stacked bi-directional LSTMs look forward and backward in time to garner fine-grained sequence information in Fu, Damer, Kirchbuchner and Kuijper (2021). A time-dependent video-level representation is generated by feeding aggregated fixed-length spatial features from a combined model of ResNet and AlexNet Careaga, Hutchinson, Hodas and Phillips (2019). A transformer architecture comprising an LSTM and a class-wise attention module helps re-weight the cross-domain data by assigning higher weights to more informative data. All of these references adopt inductive learning platforms for transferring knowledge. LSTM is an effective classification tool commonly used alongside CNN for Spatio-temporal exploration while interpreting video data in HAR.

Cheng et al. (2021) and Haresamudram et al. (2020) employ predictive pre-training methods that come under transfer learning to learn efficient and distinctive representations. Like Cheng et al. (2021); Haresamudram et al. (2020), Zaher Md Faridee et al. (2022) also use the transfer learning technique on the proposed STranGAN (spatial transformer-based GAN) model for inertial sensor-based HAR applications. This paper uses domain adaptation via feature alignment to transfer the knowledge between source and target without any labeled training data requirement. The transformer is a highly effective and popular video data interpretation technique despite its highly complex and data-hungry nature.

5.2. Discriminative-based approach

Discriminative models, also called conditional models, are a class of logistical models used for classification or regression. They distinguish decision boundaries through observed data, such as pass/fail, win/lose, alive/dead, or healthy/sick. Logistic regression, conditional random fields, and DT can be categorized under discriminative classifiers. The naive Bayes model, Gaussian mixture model, variational autoencoder, and GAN can be categorized under generative classifiers.

5.2.1. Decision tree

A DT is a flowchart-like tree structure comprising nodes and branches for classification and regression tasks. We can visualize these nodes and branches in three segments: internal connecting nodes, interconnecting branches, and leaf nodes. Each connecting node evaluates an attribute of a given classification or regression task. The branch corresponding to that particular node epitomizes the evaluation result of that attribute, and the terminal node (leaf node) holds a class label for that task. The input fed to the DT may be a discrete set of values or a continuous variable. Based on this, we can specify a DT as a classification tree or a regression tree, respectively. The superior clustering technique in DT promotes it as a good regressor or a well-performing classifier in a restricted data environment. Integration of new unseen sensors leads to an extension of the input space. The DT needs to be reformed to adopt this change by replacing specific leaf nodes of the original tree with a subtree Jänicke, Tomforde and Sick (2016b). An iterative semi-supervised training approach called En-Co-Training is endorsed in Bhattacharya, Nurmi, Hammerla and Plötz (2014). A pool of randomly sampled data generated from the unlabeled Opportunity challenge dataset is produced using this algorithm, which is later trained with a DT. A DT classifier is deployed on the features extracted from the last-layer-excluded ResNet-50 network Loey, Manogaran, Taha and Khalifa (2021), as sketched below. The DT classification model computes the output label based on information gain and an entropy function. Jänicke et al. (2016b) adopt transductive transfer learning, whereas Bhattacharya et al. (2014) and Loey et al. (2021) follow an inductive learning platform.
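A hedged sketch of the pipeline just described (a frozen pre-trained CNN feeding an entropy-driven decision tree) is given below; torchvision and scikit-learn are assumed, and the image batch, labels, and class count are placeholders rather than the cited experimental setup.

```python
# Deep features from a last-layer-excluded ResNet-50, classified by a DT
# that splits on information gain (criterion="entropy").
import torch
from torchvision import models
from sklearn.tree import DecisionTreeClassifier

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()        # drop the final FC layer
backbone.eval()

images = torch.rand(16, 3, 224, 224)     # stand-in for activity frames
labels = torch.randint(0, 4, (16,))      # stand-in activity classes
with torch.no_grad():
    feats = backbone(images).numpy()     # (16, 2048) deep features

clf = DecisionTreeClassifier(criterion="entropy").fit(feats, labels.numpy())
print(clf.predict(feats[:2]))
```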
is meant for transferring knowledge between domains. The same architecture is used in Chen, Wang, Huang and Yu (2019) to evaluate the transfer learning performance between various positions. One-shot inductive transfer learning is used in Cabrera and Wachs (2017), whereas Wang et al. (2018a) and Chen et al. (2019) follow the transductive platform on the OPPORTUNITY, PAMAP2, and UCI DSADS activity datasets.
5.2.3. Support vector machine

Before diving into the deep learning era, the use and popularity of the supervised SVM had become sky-high among classification and regression models. This learning paradigm projects each data sample into a point in n-dimensional space. Then, it sets a best-suited hyperplane or decision boundary by maximizing the distance from each category to that boundary. The position of a new sample relative to that hyperplane in n-dimensional space decides its class. A binary classification problem adopts a linear SVM, whereas a multi-class problem uses a kernel-based non-linear SVM architecture. The extracted Spatio-temporal features of gesture data from a 3D Inception-ResNet model with separable convolution are forwarded to an SVM classifier Li et al. (2021). The above model outperforms many state-of-the-art architectures in performance, computational cost, and efficiency. An SVM classifier with a radial basis function kernel is used to recognize gesture-level activity in Cabrera and Wachs (2017). Inductive transfer learning, or more specifically OSL, is used in Li et al. (2021) and Cabrera and Wachs (2017). Tran et al. (2015) use SVM as a classifier in the inductive transfer learning domain. A multi-class Hierarchical SVM (HSVM) is adopted to train on ambient sensor features of the synthetic, TU Darmstadt, and RCC datasets, which helps detect instant semantic attributes of test samples Alam and Roy (2017). The confidence score of the HSVM classifier is measured by Contextual Informativeness (CI). Local dense trajectory video features from the UCF101, FCVID, Sports1M, and ActivityNet datasets are aggregated into video-level feature vectors to train a linear SVM classifier Gan, Lin, Yang, De Melo and Hauptmann (2016). The above two models endorse transductive ZSL. Chen et al. (2019) also employ transductive transfer learning techniques on SVM-based classifiers. Rahmani and Mian (2015) train the SVM classifier on the IXMAS and N-UCLA datasets to perform transfer learning in cross-view and cross-dataset scenarios, respectively. In the era of machine learning, SVM became a very popular and effective classification tool in the HAR domain. However, its use has been restricted in the large-scale data domain.
5.2.4. K-Nearest neighbor

KNN is another class of supervised learning models proposed for classification and regression practices using the distance matrix. K denotes the number of nearest labeled data points or trained samples considered for evaluating the distance matrix. The respective Euclidean distances of the K neighbors from the test data point are aggregated to compose this matrix. So, a distance matrix illustrates the feature similarity index between the new unlabeled data and its K nearest available labeled data. The more congruent the features, the smaller the Euclidean distance, and the test data become more biased toward that class label. The KNN learning algorithm is sometimes termed non-parametric learning as no mapping function is involved, lazy learning as the whole dataset is stored for inference, and instance-based learning as weights are not learned. Depending on the context of use, the output may be a class membership value or an object property value, i.e., KNN classification or KNN regression.

The motion and texture features of the ChaLearn gesture challenge and NTU RGB+D datasets are extracted using a co-variance descriptor after building a Bag of Manifold Words (BoMW) representation Zhang et al. (2017b). These local features of the distinct category are passed through the KNN classifier to perform one-shot-learning gesture recognition. Key points around motion patterns of the ChaLearn gesture database are detected and tracked by the Shi-Tomasi corner detector and sparse optical flow Karn and Jiang (2016). The Gradient Location and Orientation Histogram feature descriptor is then activated to describe the concerning features of these key interest points. These visual features are clustered and subsequently classified by the k-means algorithm and KNN classifier, respectively, to implement OSL. Apart from Karn and Jiang (2016), Bhattacharya et al. (2014); Zhang et al. (2017b) adopt the same KNN-based architecture for inductive transfer learning implementation. Lang et al. (2018) propose a KNN-based domain adaptation model for micro-doppler data classification after fusing three domain-invariant features, i.e., low-level deep features from CNN, empirical features, and statistical features. Afterward, a KNN classifier is adapted to classify seven human activities. An Adaptive Spatial-Temporal Transfer Learning (ASTTL) approach is introduced in Qin, Chen, Wang and Yu (2019) to deal with negative transfer and domain-intensive transfer in cross-domain HAR. The spatial features are exploited by weighting the relative importance between the marginal and conditional probability distributions, and the temporal features by incremental manifold learning. KNN is used as a baseline classifier over the UCI DSADS, UCI-HAR, USC-HAD, and PAMAP2 datasets. Along with Lang et al. (2018); Qin et al. (2019), Xu, Hospedales and Gong (2016) also follow the same KNN-based transductive transfer learning mechanism. The use of KNN has been restricted in present days due to its low performance, but it is still prevalent in restricted data environments and unsupervised learning conditions.
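The distance-matrix mechanics described above amount to a few calls in practice; below is a sketch assuming scikit-learn, with illustrative feature vectors in place of real gesture descriptors.

```python
# KNN: store the labeled set, then let the K nearest Euclidean neighbors vote.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(150, 64))     # labeled feature vectors
y_train = rng.integers(0, 3, size=150)

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)                # "lazy": the whole dataset is stored

query = rng.normal(size=(1, 64))
dist, idx = knn.kneighbors(query)        # one row of the distance matrix
print(knn.predict(query), dist.round(2))
```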
5.2.5. Convolutional neural network

CNN is a deep learning architecture where spatial information of image and video (in vision-based) data is explored through repeated convolution operations for different vision-based applications. Convolutional layers have passed through various levels of transformation to fetch more significant features effectively. The activation, pooling, batch normalization, and dropout layers are other supporting layers that improve the features' quality and computational efficiency by suppressing noise and parameter count. Some transfer learning-based CNN architectures are either pre-trained with a definite large dataset or customized by the user according to their dataset and application. AlexNet, GoogleNet, Inception, VGG, ResNet, DenseNet, and EfficientNet are examples of pre-trained transfer learning architectures primarily trained on the ImageNet dataset.

Karpathy et al. (2014b) use the multiresolution CNN model to implement transfer learning on the large (487 classes) Sports-1M dataset. A two-stream CNN model is trained on the resolution images of this dataset by various fusion techniques. Five combinations of convolution and max-pooling layers followed by two fully connected (FC) layers are trained on the ILSVRC-12 dataset to get a final output of 1000 class distributions Liu, Mei, Zhang, Che and Luo (2015). Eight convolution layers, five pooling layers, and two FC layers are forged together to form a 3D ConvNets model called C3D and trained on the large Sports-1M dataset for weight initialization Zhu and Newsam (2017). This C3D architecture is later applied to the ActivityNet dataset for classification performance. The last pooling layer of ResNet-50 is fed to an LSTM network with batch normalization to get globally pooled Spatio-temporal features of the Kinetics dataset Kay et al. (2017a). The trained weight of this model is later applied to various small datasets to validate its performance. The output of a modified pre-trained ResNet-18 comprising 17 convolutions and a pooling layer is fed to another three-layered head model to compute the classification score Du, He and Jin (2018). An average pooling, an FC, and a softmax layer are stacked together to form the head model. The base model is fine-tuned with the micro-doppler dataset and validated on simulated micro-doppler data. Perrett, Masullo, Burghardt, Mirmehdi and Damen (2021) follow a CNN-based architecture in the inductive transfer domain for different vision-based HAR datasets. Akbari and Jafari (2019) follow a similar kind of CNN-based architecture in the transductive transfer learning platform for different vision-based HAR datasets. However, CNN has become a very effective and widespread model nowadays due to its high performance and low complexity in the image and video domain.
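The fine-tuning recipe these works share, reusing an ImageNet-trained backbone and retraining only a new head, looks roughly as follows; torchvision is assumed, and the class count, batch, and learning rate are illustrative.

```python
# Inductive transfer: freeze a pre-trained ResNet-18 backbone, train a new head.
import torch
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():             # keep ImageNet-learned features fixed
    p.requires_grad = False
model.fc = torch.nn.Linear(model.fc.in_features, 10)   # new 10-class head

opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
x = torch.rand(8, 3, 224, 224)           # stand-in for target-domain frames
y = torch.randint(0, 10, (8,))
loss = torch.nn.functional.cross_entropy(model(x), y)
opt.zero_grad(); loss.backward(); opt.step()
```

Unfreezing the deeper layers with a lower learning rate is the usual next step when the target dataset is large enough.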
5.3. Graph-based approach

Multiple modalities are involved in recognizing a human activity, such as pose, appearance, optical flow, depth, and skeleton. Although the dynamic human skeleton modality is a powerful descriptor for identifying human action irrespective of illumination change and background dynamics, it has received relatively less attention. Yan et al. (2022) proposed a sensor-based graph model called HAR-ResGCNN that combines a graph neural residual structure with transfer learning in a cross-dataset setting to validate the performance on the PAMAP-2, mHealth, and TNDA action datasets. The use of deep transfer learning on the data derived from the accelerometer, gyroscope, and magnetometer makes the convergence speed faster and the learning curve better. A Spatio-Temporal GCN trained on the NTU-RGB+D 60 dataset combined with a zero-shot learning module can predict unseen activities on which it was never trained Jasani and Mazagonwalla (2019). Although the graph-based HAR algorithm is very effective against view-point variation and background changes, it has received comparatively less attention than other classification techniques.
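For intuition, a single graph-convolution step of the kind these skeleton models stack is sketched below; the five-joint adjacency, features, and weights are toy assumptions, not taken from HAR-ResGCNN or the cited ST-GCN.

```python
# One normalized graph-convolution step over a toy joint graph (NumPy only).
import numpy as np

A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 1, 0],
              [0, 1, 0, 0, 1],
              [0, 1, 0, 0, 0],
              [0, 0, 1, 0, 0]], float)   # 5 joints connected by bone edges
X = np.random.default_rng(0).normal(size=(5, 3))  # per-joint (x, y, z) features
W = np.random.default_rng(1).normal(size=(3, 8))  # learnable projection

A_hat = A + np.eye(5)                    # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(1)))
H = D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W   # aggregate neighbors, project
print(H.shape)                           # (5, 8): new per-joint features
```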
7.1. Transfer learning

Sufficient data availability and outdated data management are two significant issues when dealing with deep learning architectures. Transfer learning becomes very effective for problems like outdated data compensation, training data recollection, expensive data labeling, and accuracy enhancement. However, the most compelling feature selection criteria and techniques for a successful knowledge transfer are still yet to be explored even in these modern days. Again, powerful transfer learning techniques like ZSL and unsupervised transfer learning need more attention to make transfer learning more effective in the HAR and classification domain. The hidden negotiable relationships between HAR datasets can potentially enhance the performance of the transfer learning-based HAR model. How can a platform be built to promote devalued learning concepts such as relational knowledge transfer?

7.2. Explorable video model
Table 2
Chronological performance comparison for human activity recognition.
Year / Reference    Model    Source / Target datasets    Accuracy (%)
2011/Duan, Xu, Tsang and Luo (2011) SVM web video/consumer video 57.9
2011/Liu, Shah, Kuipers and Savarese (2011) SVM IXMAS (cross-view) 75.3
2011/Wei and Pal (2011) RBM UIUC(cross-action) 82.9
2012/Li, Camps and Sznaier (2012) SVM IXMAS (cross-view) 90.57(High)
2013/Rohrbach, Ebert and Schiele (2013) kNN Script data/MP-II composite 36.2
2014/Zhu and Shao (2014) SVM HMDB51+YouTube/UCF YouTube 91.11
HMDB51+YouTube/Caltech101 79.02
HMDB51+YouTube/Caltech256 42.8
HMDB51+YouTube/Kodak consum 62.6
2014/Yamada, Sigal and Raptis (2013) kNN Poser/HUMANEVA-1 –
2014/Bhattacharya et al. (2014) CNN Sports1M/UCF-101 65.4
2015/Yue-Hei Ng et al. (2015) CNN - LSTM + optical flow Sports 1 M/ImageNet 90.4
(AlexNet and GoogleNet) UCF101/ImageNet 88.6
2016/Zhang, Chao, Sha and Grauman (2016) CNN + key frame + Subshot Kodak consumer/seed image data 82.3
2016/Wang, Farhadi and Gupta (2016a) CNN(two stream) UCF 101/ImageNet 92.4
HMDB 51/ImageNet 63.4
ACT/ImageNet 80.6
2016/Wang et al. (2016b) ConvNet + TSN HMDB 51/ImageNet 69.4
UCF101/ImageNet 94.2
2017/Wang, Chen, Hu, Peng and Philip (2018b) Deep CNN + Stratified TL OPPORTUNITY/Intra-class 83.96
PAMAP2/Intra-class 43.47
UCI DSADS/Intra-class 81.6
2017/Bux Sargano et al. (2017) AlexNet + SVM/KNN KTH/ImageNet 98.15
UCF Sports/ImageNet 91.47
2017/Qiu, Yao and Mei (2017) Pseudo 3D ResNet Sports 1 M/ImageNet 87.4(Top5)
UCF101/ImageNet 93.7(Top3)
ActivityNet/ImageNet 87.71(Top3)
ASLAN/ImageNet 80.8
2018/Alwassel et al. (2018) PCA + LSTM AVA/ImageNet 91
THUMOS14/ImageNet 91
2018/Carreira and Zisserman (2017) I3D(two-stream) HMDB 51/Kinetics 80.9
UCF101/Kinetics 98
2018/Wang, Zheng, Chen and Huang (2018c) USSAR and TNNAR OPPORTUNITY/Intra-class 87.43
UCI DSADS/Intra-class 86.76
2018/Ntalampiras and Potamitis (2018) TL-CHMM Imbalance audio data/Intra class 94.6
2018/Tran et al. (2018) kinetics /Seed videos 95.0
UCF 101/Seed videos 97.3
2019/Ghadiyaram, Tran and Mahajan (2019) R(2 + 1)D-d kinetics /(ImageNet1K+IG-Kinetics) 95.3
something-something 79
EPIC-Kitchen/(ImageNet1K+IG-Kinetics) 42.7
2019/Korbar, Tran and Torresani (2019) ir-CSN-152 Sports1M/(ImageNet + AudioNet) 84
2020/An, Bhat, Gumussoy and Ogras (2020) CNN + TL UCI HAR/HAPT, UniMiB, WISDM up to 43%
2021/Coskun et al. (2021) AMAML EPIC/EGTEA 60.7(10shots)
2021/Zhu et al. (2021) PAL(CNN) ImageNet/Kinetics-100 74.1
SSV2–100/ImageNet 62.6
HMDB-51/ImageNet 75.8
UCF-101/ImageNet 85.3
2021/Sabater et al. (2021) TCN NTU RGB+D-120/therapy dataset 46.5(1shot)
2021/Ben-Ari, Shpigel Nacson, Azulai, Barzelay and Rotman (2021) C3D+I3D Sports1m, ActivityNet V1.2/Kinetics-400 83.12
2021/Perrett et al. (2021) ResNet-50+Transformer ImageNet/Kinetics-100 85.9
SSV2/ImageNet 64.6
HMDB-51/ImageNet 75.6
UCF-101/ImageNet 96.1
The depth feature carries significant information about activity-related physical factors like distance, movement type, and gait pattern. So, it needs more attention in exploring H–H and H–O interaction level activity. Both dataset- and architectural-level development are required to successfully comprehend the physical aspect of activities. Apart from these, many other physical aspects, such as acceleration, direction, and movement style, go unnoticed while analyzing complex and interaction-level activity.
7.6. Labeling strategy in HAR

Some benchmark datasets comprise large numbers of classes with millions of video sequences, such as Kinetics, HAA, and YouTube 8M. Class labeling is not required for recorded, generated, and crowdsourced datasets as they get labeled at the time of origination. Accurate and precise labeling followed by thorough verification is imperative for supervised learning schemes. These manual labeling strategies are expensive and sometimes faulty due to human error. Effective crowdsourcing platforms come to the rescue. Amazon Mechanical Turk (AMT) is a widespread annotating strategy for automatic and quick labeling during dataset creation. However, due to a lack of generalization, it has its own limitations. So, extra attention should be given to creating more AMT-like annotation strategies. Zero-shot learning is a knowledge transfer approach where activities are classified without any prior training. Besides these, the pseudo-labeling strategy sometimes plays a pivotal role in annotating large-scale data in one-shot learning, few-shot learning, and semi-supervised learning. Here, we approximate the labels in unannotated data based on the previously annotated data, as sketched below. Pseudo-labeling reduces overfitting and improves the speed of the model. But this strategy fails to make an impact when there is not enough labeled data present, when labeled data for a particular class are absent, or when adding more data does not help the model performance.
Robustness and universality are still debatable research topics for this approach to improve classification accuracy.
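One round of that pseudo-labeling loop can be written compactly; the sketch below assumes scikit-learn, and the arrays and the 0.9 confidence threshold are illustrative choices.

```python
# Pseudo-labeling: train on labeled data, adopt confident predictions on the
# unlabeled pool as labels, and retrain on the union.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(100, 32)), rng.integers(0, 4, 100)
X_unlab = rng.normal(size=(500, 32))     # large unannotated pool

clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
probs = clf.predict_proba(X_unlab)
keep = probs.max(axis=1) >= 0.9          # keep only confident predictions
pseudo_y = probs.argmax(axis=1)[keep]

X_aug = np.vstack([X_lab, X_unlab[keep]])
y_aug = np.concatenate([y_lab, pseudo_y])
clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)  # iterate as needed
```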
7.7. Feature engineering in HAR

To improve model performance on unseen data, we need to extract valuable features from the raw data that better represent the underlying problem to the predictive model. The flexibility, complexity, and performance of a model are profoundly dependent on these extracted and transformed features. Data decomposition and aggregation are two important operations encountered while transforming the raw data. "How do we decompose or aggregate the raw data for a better description of the underlying problem?" is a challenging question during this extraction process. Apart from this, researchers are still trying to find effective solutions to some smaller queries like "How do we automate this transformation process?", "How do we identify and select the problem-dependent useful features?", and "What are the manual feature selection criteria?". DT, random forest, regularization variants, principal component analysis, and structural risk minimization are some widely used feature engineering approaches applied in HAR tasks.
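As one concrete instance of the approaches named above, the sketch below decomposes raw feature windows with PCA; scikit-learn is assumed and the data shape is illustrative.

```python
# PCA as a feature-engineering step: decorrelate and compress raw features.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
raw = rng.normal(size=(300, 512))        # raw per-window feature vectors

pca = PCA(n_components=0.95)             # keep 95% of the explained variance
feats = pca.fit_transform(raw)
print(raw.shape, "->", feats.shape)      # reduced, decorrelated representation
```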
7.8. Limited hardware computation

Well-performing HAR models are very hard to implement in real time due to constrained computing power (a hardware constraint). As a result, we are forced to compromise either on input data or on computationally expensive techniques. For this, analysts adopt many data reduction techniques like cropping, compression, key-frame extraction, sub-shots, and thresholding. Another method is to embrace those sensing devices that can provide relatively more uncomplicated data forms. Most wearable sensors can be an example of that kind, where we collect mainly 1-D data. Both compromising techniques lead the model to decline in performance. We reach a similar result while adjusting the classification technique. We need a model that comes up with an acceptable trade-off between computational burden and performance in a constrained computational environment. To make this viable, our research focus should rest upon informative sensing techniques, efficient descriptor extraction methods, and high-performing model architectures.
7.9. Contextual information gathering

Our model may not be able to recognize high-level behavior or activity properly. For example, our model may fail to recognize a "group discussion" activity. Instead, it may be wrongly interpreted as "sitting and talking." Similarly, "running on the road" or "running on the track" can get mixed and misclassified under a more superficial activity, i.e., "running." This shallow activity classification is apparently due to the lack of background knowledge. Relating the semantic features with the logical description between action and behavior through Natural Language Processing (NLP) may be a possible solution for recognizing these complex activities. This contextual information may provide additional knowledge that helps classify complex activities correctly.
7.10. Negative transfer

In transfer learning, the source domain data representation leverages target domain data for enhancing target domain performance accuracy. However, sometimes, leveraging source domain-specific knowledge reduces the transfer learning performance on the target data. So, we need to keep knowledge about the origin of negative transfer, factors influencing negative transfer, and tranquilizing algorithms to prevent negative transfer before applying transfer learning to any task. Rosenstein, Marx, Kaelbling and Dietterich (2005) introduce negative transfer after finding that incongruence between the source and target data beyond a certain bound undesirably hampers the performance of the underlying model instead of boosting it. A similar definition is also illustrated in Pan and Yang (2009), which compares the relatedness between the source domain data and the target domain data to introduce negative transfer while applying transfer learning. Governing factors like domain divergence, transfer learning algorithms, and source and target data quality need to be reviewed to rule out the existence of negative transfer in the learning mechanism. Even after negative transfer gets infused into the transfer model, it can be overcome through many preventive approaches like secure transfer mechanisms, domain similarity estimation algorithms, and distant transfer assessment Zhang, Deng, Zhang and Wu (2020b). So, negative transfer is a longstanding and formidable concern that needs to be thoroughly reviewed for the vision-based transfer learning model for HAR.

8. Contribution to literature and implication for practice

We have shown an extensive sketch of vision sensor-based HAR using transfer learning. To understand transfer learning, we briefly explain the difference between source and target, domain and task. We also represent five steps followed in HAR: activity types, sensors, transduction, different approaches, and performance measures. We label the vision-based HAR datasets from 2011 to 2021 with detailed specifications. We classify and discuss the learning algorithms used for this task in three different categories: the generative model, the discriminative model, and the graph-based model. To the best of our knowledge, we are the first to divide classification techniques for vision-based HAR into three modular representations. We conclude our review by exchanging views on various challenges and future directions. To the best of our knowledge, we are the first to conduct a decade-long review on transfer learning enhanced vision-based HAR, where we discuss related datasets with specifications and three classification formats relevant to our topic. This paper transfers in-depth information about different datasets from 2011 to 2021, intended to be managed under various application scenarios. The detailed depiction of classification algorithms under transfer learning scenarios enhances the ideation of researchers for future implementation in this domain.

9. Conclusion

In this extensive survey, we emphasize the idea of using state-of-the-art transfer learning methods that reduce the difficulty and effort behind data collection, data extinction, data labeling, and accuracy enhancement in the action recognition domain. This paper focuses on vision-based HAR in context-aware applications and emphasizes its diversity with transfer learning functionality. This paper's whole-length depiction, investigation, and highlights help the researcher achieve in-depth knowledge in vision-based activity recognition using transfer learning techniques.

Apart from transfer learning and its all-pervasive applications in vision-based activity recognition, various other orientations remain to be investigated and discovered in subsequent research, such as detection, tracking, design, and classification. This all-inclusive survey is supposed to strengthen further research in the activity recognition grassland.

Declaration of Competing Interest

No conflict of interest.

References

Abu-El-Haija, Sami, Kothari, Nisarg, Lee, Joonseok, Natsev, Paul, Toderici, George, Varadarajan, Balakrishnan, et al. (2016). YouTube-8M: A large-scale video classification benchmark. arXiv preprint arXiv:1609.08675.
Aggarwal, Alankrita, Mittal, Mamta, & Battineni, Gopi (2021). Generative adversarial network: An overview of theory and applications. International Journal of Information Management Data Insights, 1(1), Article 100004.
Akbari, Ali, & Jafari, Roozbeh (2019). Transferring activity recognition models for new wearable sensors with deep generative domain adaptation. In Proceedings of the 18th International Conference on Information Processing in Sensor Networks (pp. 85–96).
Alam, Mohammad Arif Ul, & Roy, Nirmalya (2017). Unseen activity recognitions: A hierarchical active transfer learning approach. In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS) (pp. 436–446). IEEE.
Alsheikh, Mohammad Abu, Selim, Ahmed, Niyato, Dusit, Doyle, Linda, Lin, Shaowei, & Tan, Hwee-Pink (2016). Deep activity recognition models with triaxial accelerometers. In Workshops at the Thirtieth AAAI Conference on Artificial Intelligence.
Al-Sulaiman, Talal (2022). Predicting reactions to anomalies in stock movements using a feed-forward deep learning network. International Journal of Information Management Data Insights, 2(1), Article 100071.
Alwassel, Humam, Heilbron, Fabian Caba, & Ghanem, Bernard (2018). Action search: Spotting actions in videos and its application to temporal action localization. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 251–266).
An, Sizhe, Bhat, Ganapati, Gumussoy, Suat, & Ogras, Umit (2020). Transfer learning for human activity recognition using representational analysis of neural networks. arXiv preprint arXiv:2012.04479.
Anand, Kartik, Urolagin, Siddhaling, & Mishra, Ram Krishn (2021). How does hand gestures in videos impact social media engagement - insights based on deep learning? International Journal of Information Management Data Insights, 1(2), Article 100036.
Arif Ul Alam, Mohammad, Mahmudur Rahman, Md, & Widberg, Jared Q. (2021). Palmar: Towards adaptive multi-inhabitant activity recognition in point-cloud technology. In IEEE INFOCOM 2021 - IEEE Conference on Computer Communications (pp. 1–10). IEEE.
Aslam, Nazia, & Kolekar, Maheshkumar H. (2022). Unsupervised anomalous event detection in videos using spatio-temporal inter-fused autoencoder. Multimedia Tools and Applications, 1–26.
Aslam, Nazia, Rai, Prateek Kumar, & Kolekar, Maheshkumar H. (2022). A3N: Attention-based adversarial autoencoder network for detecting anomalies in video sequence. Journal of Visual Communication and Image Representation, 87, Article 103598.
Ben-Ari, Rami, Shpigel Nacson, Mor, Azulai, Ophir, Barzelay, Udi, & Rotman, Daniel (2021). TAEN: Temporal aware embedding network for few-shot action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2786–2794).
Bhattacharya, Sourav, Nurmi, Petteri, Hammerla, Nils, & Plötz, Thomas (2014). Using unlabeled data in a sparse-coding framework for human activity recognition. Pervasive and Mobile Computing, 15, 242–262.
Bux Sargano, Allah, Wang, Xiaofeng, Angelov, Plamen, & Habib, Zulfiqar (2017). Human action recognition using transfer learning with deep representations. In 2017 International Joint Conference on Neural Networks (IJCNN) (pp. 463–469). IEEE.
Cabrera, Maria E., Sanchez-Tamayo, Natalia, Voyles, Richard, & Wachs, Juan P. (2017). One-shot gesture recognition: One step towards adaptive learning. In 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017) (pp. 784–789). IEEE.
Cabrera, Maria Eugenia, & Wachs, Juan Pablo (2017). A human-centered approach to one-shot gesture learning. Frontiers in Robotics and AI, 4(8).
Careaga, Chris, Hutchinson, Brian, Hodas, Nathan, & Phillips, Lawrence (2019). Metric-based few-shot learning for video action recognition. arXiv preprint arXiv:1909.09602.
Carreira, Joao, Noland, Eric, Banki-Horvath, Andras, Hillier, Chloe, & Zisserman, Andrew (2018). A short note about Kinetics-600. arXiv preprint arXiv:1808.01340.
Carreira, Joao, Noland, Eric, Hillier, Chloe, & Zisserman, Andrew (2019). A short note on the Kinetics-700 human action dataset. arXiv preprint arXiv:1907.06987.
Carreira, Joao, & Zisserman, Andrew (2017). Quo vadis, action recognition? A new model and the Kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 6299–6308).
Chatterjee, Subhamoy, Bhandari, Piyush, & Kolekar, Maheshkumar H. (2016). A novel Krawtchouk moment zonal feature descriptor for user-independent static hand gesture recognition. In 2016 IEEE Region 10 Conference (TENCON) (pp. 387–392). IEEE.
Chen, Chen, Jafari, Roozbeh, & Kehtarnavaz, Nasser (2015). UTD-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor. In 2015 IEEE International Conference on Image Processing (ICIP) (pp. 168–172). IEEE.
Chen, Yiqiang, Wang, Jindong, Huang, Meiyu, & Yu, Han (2019). Cross-position activity recognition with stratified transfer learning. Pervasive and Mobile Computing, 57, 1–13.
Cheng, Yi-Bin, Chen, Xipeng, Chen, Junhong, Wei, Pengxu, Zhang, Dongyu, & Lin, Liang (2021). Hierarchical transformer: Unsupervised representation learning for skeleton-based human action recognition. In 2021 IEEE International Conference on Multimedia and Expo (ICME) (pp. 1–6). IEEE.
Chung, Jihoon, Wuu, Cheng-hsin, Yang, Hsuan-ru, Tai, Yu-Wing, & Tang, Chi-Keung (2021). HAA500: Human-centric atomic action dataset with curated videos. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 13465–13474).
Cook, Diane, Feuz, Kyle D., & Krishnan, Narayanan C. (2013). Transfer learning for activity recognition: A survey. Knowledge and Information Systems, 36(3), 537–556.
Coskun, Huseyin, Zia, M. Zeeshan, Tekin, Bugra, Bogo, Federica, Navab, Nassir, Tombari, Federico, et al. (2021). Domain-specific priors and meta learning for few-shot first-person action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Deng, Wan-Yu, Zheng, Qing-Hua, & Wang, Zhong-Min (2014). Cross-person activity recognition using reduced kernel extreme learning machine. Neural Networks, 53, 1–7.
Du, Hao, He, Yuan, & Jin, Tian (2018). Transfer learning for human activities classification using micro-doppler spectrograms. In 2018 IEEE International Conference on Computational Electromagnetics (pp. 1–3). IEEE.
Duan, Lixin, Xu, Dong, Tsang, Ivor Wai-Hung, & Luo, Jiebo (2011). Visual event recognition in videos by learning from web data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9), 1667–1680.
Fu, Biying, Damer, Naser, Kirchbuchner, Florian, & Kuijper, Arjan (2021). Generalization of fitness exercise recognition from doppler measurements by domain-adaption and few-shot learning. In International Conference on Pattern Recognition (pp. 203–218). Springer.
Gan, Chuang, Lin, Ming, Yang, Yi, De Melo, Gerard, & Hauptmann, Alexander G. (2016). Concepts not alone: Exploring pairwise relationships for zero-shot video activity recognition. In Thirtieth AAAI Conference on Artificial Intelligence.
Gang, Zhao, Wenjuan, Zhu, Biling, Hu, Jie, Chu, Hui, He, & Qing, Xia (2021). A simple teacher behavior recognition method for massive teaching videos based on teacher set. Applied Intelligence, 51(12), 8828–8849.
Ghadiyaram, Deepti, Tran, Du, & Mahajan, Dhruv (2019). Large-scale weakly-supervised pre-training for video action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12046–12055).
Ghosal, Deepanway, & Kolekar, Maheshkumar H. (2018). Music genre recognition using deep neural networks and transfer learning. In Interspeech (pp. 2087–2091).
Gonegandla, Pranesh, & Kolekar, Maheshkumar H. (2022). Automatic song indexing by predicting listener's emotion using EEG correlates and multi-neural networks. Multimedia Tools and Applications, 81, 1–11.
Gorban, A., Idrees, H., Jiang, Y.-G., Roshan Zamir, A., Laptev, I., Shah, M., et al. (2015). THUMOS challenge: Action recognition with a large number of classes. http://www.thumos.info/.
Goyal, Raghav, Kahou, Samira Ebrahimi, Michalski, Vincent, Materzynska, Joanna, Westphal, Susanne, Kim, Heuna, et al. (2017). The "something something" video database for learning and evaluating visual common sense. In Proceedings of the IEEE International Conference on Computer Vision (pp. 5842–5850).
Gu, Chunhui, Sun, Chen, Ross, David A., Vondrick, Carl, Pantofaru, Caroline, Li, Yeqing, et al. (2018a). AVA: A video dataset of spatio-temporally localized atomic visual actions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 6047–6056).
Gupta, Saurabh (2021). Deep learning based human activity recognition (HAR) using wearable sensor data. International Journal of Information Management Data Insights, 1(2), Article 100046.
Hadfield, Simon, & Bowden, Richard (2013). Hollywood 3D: Recognizing actions in 3D natural scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3398–3405).
Haresamudram, Harish, Beedu, Apoorva, Agrawal, Varun, Grady, Patrick L., Essa, Irfan, Hoffman, Judy, et al. (2020). Masked reconstruction based self-supervision for human activity recognition. In Proceedings of the 2020 International Symposium on Wearable Computers (pp. 45–49).
Heilbron, Fabian Caba, Escorcia, Victor, Ghanem, Bernard, & Niebles, Juan Carlos (2015). ActivityNet: A large-scale video benchmark for human activity understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 961–970).
Imran, Javed, & Raman, Balasubramanian (2020). Evaluating fusion of RGB-D and inertial sensors for multimodal human action recognition. Journal of Ambient Intelligence and Humanized Computing, 11(1), 189–208.
Jänicke, Martin, Tomforde, Sven, & Sick, Bernhard (2016). Towards self-improving activity recognition systems based on probabilistic, generative models. In 2016 IEEE International Conference on Autonomic Computing (ICAC) (pp. 285–291). IEEE.
Jasani, Bhavan, & Mazagonwalla, Afshaan (2019). Skeleton based zero shot action recognition in joint pose-language semantic space. arXiv preprint arXiv:1911.11344.
Ji, Zhong, Liu, Xiyao, Pang, Yanwei, & Li, Xuelong (2020). SGAP-Net: Semantic-guided attentive prototypes network for few-shot human-object interaction recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 34, 11085–11092.
Karn, Nabin Kumar, & Jiang, Feng (2016). Improved GLOH approach for one-shot learning human gesture recognition. In Chinese Conference on Biometric Recognition (pp. 441–452). Springer.
Karpathy, Andrej, Toderici, George, Shetty, Sanketh, Leung, Thomas, Sukthankar, Rahul, & Fei-Fei, Li (2014). Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1725–1732).
Kay, Will, Carreira, Joao, Simonyan, Karen, Zhang, Brian, Hillier, Chloe, Vijayanarasimhan, Sudheendra, et al. (2017). The Kinetics human action video dataset. arXiv preprint arXiv:1705.06950.
Khan, Md Abdullah Al Hafiz, & Roy, Nirmalya (2018). UnTran: Recognizing unseen activities with unlabeled data using transfer learning. In 2018 IEEE/ACM Third International Conference on Internet-of-Things Design and Implementation (IoTDI) (pp. 37–47). IEEE.
Kolekar, Maheshkumar H. (2011). Bayesian belief network based broadcast sports video indexing. Multimedia Tools and Applications, 54(1), 27–54.
Kolekar, Maheshkumar H., & Sengupta, Somnath (2015). Bayesian network-based customized highlight generation for broadcast soccer videos. IEEE Transactions on Broadcasting, 61(2), 195–209.
Korbar, Bruno, Tran, Du, & Torresani, Lorenzo (2019). SCSampler: Sampling salient clips from video for efficient action recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 6232–6242).
Lang, Yue, Wang, Qing, Yang, Yang, Hou, Chunping, Huang, Danyang, & Xiang, Wei (2018). Unsupervised domain adaptation for micro-doppler human motion classification via feature fusion. IEEE Geoscience and Remote Sensing Letters, 16(3), 392–396.
Li, Binlong, Camps, Octavia I., & Sznaier, Mario (2012). Cross-view activity recognition using hankelets. In 2012 IEEE Conference on Computer Vision and Pattern Recognition (pp. 1362–1369). IEEE.
Li, Lianwei, Qin, Shiyin, Lu, Zhi, Zhang, Dinghao, Xu, Kuanhong, & Hu, Zhongying (2021). Real-time one-shot learning gesture recognition based on lightweight 3D Inception-ResNet with separable convolutions. Pattern Analysis and Applications, 24, 1–20.
Liu, Chunhui, Hu, Yueyu, Li, Yanghao, Song, Sijie, & Liu, Jiaying (2017). PKU-MMD: A large scale benchmark for continuous multi-modal human action understanding. arXiv preprint arXiv:1703.07475.
Liu, Jingen, Shah, Mubarak, Kuipers, Benjamin, & Savarese, Silvio (2011). Cross-view action recognition via view knowledge transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3209–3216). IEEE.
Liu, Jun, Shahroudy, Amir, Perez, Mauricio, Wang, Gang, Duan, Ling-Yu, & Kot, Alex C. (2019). NTU RGB+D 120: A large-scale benchmark for 3D human activity understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(10), 2684–2701.
Liu, Wu, Mei, Tao, Zhang, Yongdong, Che, Cherry, & Luo, Jiebo (2015). Multi-task deep visual-semantic embedding for video thumbnail selection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3707–3715).
Loey, Mohamed, Manogaran, Gunasekaran, Taha, Mohamed Hamed N., & Khalifa, Nour Eldeen M. (2021). A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the COVID-19 pandemic. Measurement, 167, Article 108288.
Luo, Manman, & Mu, Xiangming (2022). Entity sentiment analysis in the news: A case study based on negative sentiment smoothing model (NSSM). International Journal of Information Management Data Insights, 2(1), Article 100060.
Ma, Chunyong, Zhang, Shengsheng, Wang, Anni, Qi, Yongyang, & Chen, Ge (2020). Skeleton-based dynamic hand gesture recognition using an enhanced network with one-shot learning. Applied Sciences, 10(11), 3680.
Mishra, Ashish, Pandey, Anubha, & Murthy, Hema A. (2020). Zero-shot learning for action recognition using synthesized features. Neurocomputing, 390, 117–130.
Mohsen Amiri, S., Pourazad, Mahsa T., Nasiopoulos, Panos, & Leung, Victor C. M. (2013). Non-intrusive human activity monitoring in a smart home environment. In 2013 IEEE 15th International Conference on e-Health Networking, Applications and Services (Healthcom 2013) (pp. 606–610). IEEE.
Mutegeki, Ronald, & Han, Dong Seog (2019). Feature-representation transfer learning for human activity recognition. In 2019 International Conference on Information and Communication Technology Convergence (ICTC) (pp. 18–20). IEEE.
Ng, Joe Yue-Hei, Hausknecht, Matthew, Vijayanarasimhan, Sudheendra, Vinyals, Oriol, Monga, Rajat, & Toderici, George (2015). Beyond short snippets: Deep networks for video classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4694–4702).
Ntalampiras, Stavros, & Potamitis, Ilyas (2018). Transfer learning for improved audio-based human activity recognition. Biosensors, 8(3), 60.
Pan, Sinno Jialin, & Yang, Qiang (2009). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359.
Parmar, Paritosh, & Morris, Brendan (2022). Win-Fail action recognition. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 161–171).
Perera, Asanka G., Law, Yee Wei, Ogunwa, Titilayo T., & Chahl, Javaan (2020). A multiviewpoint outdoor dataset for human action recognition. IEEE Transactions on Human-Machine Systems, 50(5), 405–413.
Perrett, Toby, Masullo, Alessandro, Burghardt, Tilo, Mirmehdi, Majid, & Damen, Dima (2021). Temporal-relational crosstransformers for few-shot action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 475–484).
Piergiovanni, A. J., & Ryoo, Michael S. (2018). Fine-grained activity recognition in baseball videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 1740–1748).
Qin, Xin, Chen, Yiqiang, Wang, Jindong, & Yu, Chaohui (2019). Cross-dataset activity recognition via adaptive spatial-temporal transfer learning. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 3(4), 1–25.
Qiu, Zhaofan, Yao, Ting, & Mei, Tao (2017). Learning spatio-temporal representation with pseudo-3D residual networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 5533–5541).
Rahmani, Hossein, & Mian, Ajmal (2015). Learning a non-linear knowledge transfer model for cross-view action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2458–2466).
Roder, Mateus, Almeida, Jurandy, Rosa, Gustavo H. de, Passos, Leandro A., Rossi, André L. D., & Papa, João P. (2021). From actions to events: A transfer learning approach using improved deep belief networks. In 2021 IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 1–8). IEEE.
Rodriguez, Mario, Orrite, Carlos, Medrano, Carlos, & Makris, Dimitrios (2017). Fast simplex-HMM for one-shot learning activity recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 41–48).
Rohrbach, Marcus, Amin, Sikandar, Andriluka, Mykhaylo, & Schiele, Bernt (2012). A database for fine grained activity detection of cooking activities. In 2012 IEEE Conference on Computer Vision and Pattern Recognition (pp. 1194–1201). IEEE.
Rohrbach, Marcus, Ebert, Sandra, & Schiele, Bernt (2013). Transfer learning in a transductive setting. Advances in Neural Information Processing Systems, 26.
Rosenstein, Michael T., Marx, Zvika, Kaelbling, Leslie Pack, & Dietterich, Thomas G. (2005). To transfer or not to transfer. In NIPS'05 Workshop, Inductive Transfer: 10 Years Later.
Sabater, Alberto, Santos, Laura, Santos-Victor, Jose, Bernardino, Alexandre, Montesano, Luis, & Murillo, Ana C. (2021). One-shot action recognition in challenging therapy scenarios. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2777–2785).
Sanabria, Andrea Rosales, & Ye, Juan (2020). Unsupervised domain adaptation for activity recognition across heterogeneous datasets. Pervasive and Mobile Computing, 64, Article 101147.
Shahroudy, Amir, Liu, Jun, Ng, Tian-Tsong, & Wang, Gang (2016). NTU RGB+D: A large scale dataset for 3D human activity analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1010–1019).
Sharma, Vijeta, Gupta, Manjari, Kumar, Ajai, & Mishra, Deepti (2021). EduNet: A new video dataset for understanding human activity in the classroom environment. Sensors, 21(17), 5699.
Shi, Zhenguo, Zhang, Jian Andrew, Xu, Yi Da Richard, & Cheng, Qingqing (2020). Environment-robust device-free human activity recognition with channel-state-information enhancement and one-shot learning. IEEE Transactions on Mobile Computing.
Soomro, Khurram, Zamir, Amir Roshan, & Shah, Mubarak (2012). UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402.
Tran, Du, Bourdev, Lubomir, Fergus, Rob, Torresani, Lorenzo, & Paluri, Manohar (2015). Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 4489–4497).
Tran, Du, Wang, Heng, Torresani, Lorenzo, Ray, Jamie, LeCun, Yann, & Paluri, Manohar (2018). A closer look at spatiotemporal convolutions for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 6450–6459).
Tricco, Andrea C., Lillie, Erin, Zarin, Wasifa, O'Brien, Kelly K., Colquhoun, Heather, Levac, Danielle, et al. (2018). PRISMA extension for scoping reviews (PRISMA-ScR): Checklist and explanation. Annals of Internal Medicine, 169(7), 467–473.
Vondrick, Carl, Pirsiavash, Hamed, & Torralba, Antonio (2016). Generating videos with scene dynamics. Advances in Neural Information Processing Systems, 29, 613–621.
Wang, Jiang, Nie, Xiaohan, Xia, Yin, Wu, Ying, & Zhu, Song-Chun (2014). Cross-view action modeling, learning and recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2649–2656).
Wang, Jindong, Chen, Yiqiang, Hu, Lisha, Peng, Xiaohui, & Yu, Philip S. (2018a). Stratified transfer learning for cross-domain activity recognition. In 2018 IEEE International Conference on Pervasive Computing and Communications (PerCom) (pp. 1–10). IEEE.
Wang, Jindong, Zheng, Vincent W., Chen, Yiqiang, & Huang, Meiyu (2018c). Deep transfer learning for cross-domain activity recognition. In Proceedings of the 3rd International Conference on Crowd Science and Engineering (pp. 1–8).
Wang, Limin, Xiong, Yuanjun, Wang, Zhe, Qiao, Yu, Lin, Dahua, Tang, Xiaoou, et al. (2016b). Temporal segment networks: Towards good practices for deep action recognition. In European Conference on Computer Vision (pp. 20–36). Springer.
Wang, Xiaolong, Farhadi, Ali, & Gupta, Abhinav (2016a). Actions ~ transformations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2658–2667).
Wei, Bin, & Pal, Christopher (2011). Heterogeneous transfer learning with RBMs. In Proceedings of the AAAI Conference on Artificial Intelligence, 25 (pp. 531–536).
Wen, Jiahui, & Zhong, Mingyang (2015). Activity discovering and modelling with labelled and unlabelled data in smart environments. Expert Systems with Applications, 42(14), 5800–5810.
Xing, Yang, Lv, Chen, Wang, Huaji, Cao, Dongpu, Velenis, Efstathios, & Wang, Fei-Yue (2019). Driver activity recognition for intelligent vehicles: A deep learning approach. IEEE Transactions on Vehicular Technology, 68(6), 5379–5390.
Xing, Yang, Tang, Jianlin, Liu, Hong, Lv, Chen, Cao, Dongpu, Velenis, Efstathios, et al. (2018). End-to-end driving activities and secondary tasks recognition using deep convolutional neural network and transfer learning. In 2018 IEEE Intelligent Vehicles Symposium (IV) (pp. 1626–1631). IEEE.
Xu, Xun, Hospedales, Timothy M., & Gong, Shaogang (2016). Multi-task zero-shot action recognition with prioritised data augmentation. In European Conference on Computer Vision (pp. 343–359). Springer.
Yamada, Makoto, Sigal, Leonid, & Raptis, Michalis (2013). Covariate shift adaptation for discriminative 3D pose estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(2), 235–247.
Yan, Yan, Liao, Tianzheng, Zhao, Jinjin, Wang, Jiahong, Ma, Liang, Lv, Wei, et al. (2022). Deep transfer learning with graph neural network for sensor-based human activity recognition. arXiv preprint arXiv:2203.07910.
Zaher Md Faridee, Abu, Chakma, Avijoy, Misra, Archan, & Roy, Nirmalya (2022). STranGAN: Adversarially-learnt spatial transformer for scalable human activity recognition. Smart Health, 23.
Zhang, Hongguang, Zhang, Li, Qi, Xiaojuan, Li, Hongdong, Torr, Philip H. S., & Koniusz, Piotr (2020a). Few-shot action recognition with permutation-invariant attention. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V (pp. 525–542). Springer.
Zhang, Jing, Li, Wanqing, & Ogunbona, Philip (2017a). Joint geometrical and statistical alignment for visual domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1859–1867).
Zhang, Ke, Chao, Wei-Lun, Sha, Fei, & Grauman, Kristen (2016). Summary transfer: Exemplar-based subset selection for video summarization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1059–1067).
Zhang, Lei, Zhang, Shengping, Jiang, Feng, Qi, Yuankai, Zhang, Jun, Guo, Yuliang, et al. (2017b). BoMW: Bag of manifold words for one-shot learning gesture recognition from Kinect. IEEE Transactions on Circuits and Systems for Video Technology, 28(10), 2562–2573.
Zhang, Wen, Deng, Lingfei, Zhang, Lei, & Wu, Dongrui (2020b). A survey on negative transfer. arXiv preprint arXiv:2009.00909.
Zhou, Luowei, Xu, Chenliang, & Corso, Jason J. (2018). Towards automatic learning of procedures from web instructional videos. In Thirty-Second AAAI Conference on Artificial Intelligence.
Zhu, Fan, & Shao, Ling (2014). Weakly-supervised cross-domain dictionary learning for visual recognition. International Journal of Computer Vision, 109(1–2), 42–59.
Zhu, Xiatian, Toisoul, Antoine, Perez-Rua, Juan-Manuel, Zhang, Li, Martinez, Brais, & Xiang, Tao (2021). Few-shot action recognition with prototype-centered attentive learning. arXiv preprint arXiv:2101.08085.
Zhu, Yi, & Newsam, Shawn (2017). Efficient action detection in untrimmed videos via multi-task learning. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 197–206). IEEE.