
Multimedia Tools and Applications

https://doi.org/10.1007/s11042-023-16382-x

Adventures in data analysis: a systematic review of Deep Learning techniques for pattern recognition in cyber-physical-social systems

Zahra Amiri1 · Arash Heidari1,2 · Nima Jafari Navimipour3,4 · Mehmet Unal5 · Ali Mousavi6

Received: 5 October 2022 / Revised: 7 June 2023 / Accepted: 17 July 2023


© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023

Abstract
Machine Learning (ML) and Deep Learning (DL) have achieved great success in many textual, auditory, medical-imaging, and visual recognition tasks. Given the importance of ML/DL in pattern recognition owing to its high accuracy, many researchers have proposed solutions for improving pattern recognition performance with ML/DL methods. Because machines require intelligent pattern recognition for image processing, and because big data plays an outstanding role in generating both modern and classical state-of-the-art approaches to pattern recognition, we conducted a thorough Systematic Literature Review (SLR) of DL approaches to big data pattern recognition. We discuss different research issues and possible paths in which the abovementioned techniques might help materialize the pattern recognition notion. We also classified 60 of the most cutting-edge articles addressing pattern recognition issues into ten categories based on the DL/ML method used: Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Generative Adversarial Network (GAN), Autoencoder (AE), Ensemble Learning (EL), Reinforcement Learning (RL), Random Forest (RF), Multilayer Perceptron (MLP), Long Short-Term Memory (LSTM), and hybrid methods. The SLR method was used to investigate each category in terms of influential properties such as the main idea, advantages, disadvantages, strategies, simulation environment, datasets, and security issues. The results indicate that most of the articles were published in 2021. Moreover, important parameters such as accuracy, adaptability, fault tolerance, security, scalability, and flexibility were considered in these investigations.

Keywords  Deep Learning · Machine Learning · Pattern Recognition · Big Data · Autonomous System

Extended author information available on the last page of the article


1 Introduction

Presently, researchers are captivated by big data, which poses a formidable challenge due
to the amalgamation of four primary parameters (velocity, diversity, volume, and qual-
ity) that delineate the data flow for pattern detection [58, 113, 128]. Numerous sources of
data, both homogeneous and heterogeneous, strive to embody these criteria [22, 23, 56].
Additionally, big data encompasses a repertoire of techniques and tools employed to scru-
tinize vast amounts of unstructured data, including videos and images [2, 54, 61]. Process-
ing unstructured data presents a formidable task as it lacks the comprehensive structure
characteristic of regular data formats, owing to its frequent alterations [24, 102, 105]. One
prominent tool that addresses potential challenges and effectively handles large data sets
is Hadoop [17, 45, 126]. The progressive advancement in pattern recognition approaches
for both structured and unstructured data processing continually expands [44, 114, 117].
This capacity necessitates greater attention to data analysis methodologies that effectively
manage these immense and diverse volumes of information [5, 57]. Several analytical
techniques have been developed to fulfill the need for high-quality data analysis functions.
These encompass visualization, pattern recognition, statistical analysis, Machine Learning
(ML), and Deep Learning (DL), all of which contribute to extracting meaningful patterns
from extensive data sets [59, 60].
Pattern recognition and other diverse computational methods have proven to be valuable
assets in leveraging the potential of big data [25, 132]. Big data fusion with DL and ML
has further enhanced computational pattern recognition, leading to insightful predictive
findings from acquired data [82, 137]. However, it is important to acknowledge the inher-
ent challenge of dealing with all attributes within vast and similar datasets found in big
data [139, 140]. Therefore, new approaches for data certification and conformity must be
explored. Advancements in computing technology have opened up possibilities for uncov-
ering hidden values in massive datasets by utilizing various pattern recognition algorithms,
which were previously cost-prohibitive [43, 65]. The emergence of pattern recognition has
prompted the development of technologies that facilitate real-time accessibility, storage,
and analysis of enormous data volumes [40, 87]. Notably, big data methods for visual pat-
tern recognition differ in two key aspects [123, 125]. Firstly, big data refers to data sets that
are too large to be stored on a single device [143, 144]. Secondly, big data lacks the structure of traditional data, which necessitates specific tools and approaches [16, 121]. Innovations like Hadoop, Bigtable, and MapReduce
have revolutionized visual pattern recognition, addressing significant challenges associated
with efficiently handling vast data volumes [103, 129]. Various applications, such as sim-
ple Database (DB), NoSQL, Data Stream Management System (DSMS), and Memcached,
can be employed for big data, with Hadoop standing out as the most popular and suitable
choice [86, 116].
This paper contributes a comprehensive Systematic Literature Review (SLR) that evaluates the use of DL/ML methods in pattern recognition, addressing gaps in the previous literature. It focuses on practical approaches and categorizes them into ten distinct groups, providing a detailed analysis of each group’s advantages, disadvantages, and applications. Previous SLRs have not comprehensively evaluated all aspects of DL/ML approaches in this domain, prompting our research


to fill this gap. Consequently, our paper primarily focuses on practical DL/ML approaches
within the context of pattern recognition. The significance of our research lies in its explo-
ration of diverse and efficient DL/ML methodologies employed to tackle pattern recogni-
tion challenges. We thoroughly analyzed, consolidated, and reported findings from similar
publications through the SLR. Additionally, we categorized DL/ML approaches for pattern
recognition into ten distinct groups, encompassing Convolutional Neural Network (CNN),
Recurrent Neural Network (RNN), Generative Adversarial Network (GAN), Autoencoder
(AE), Ensemble Learning (EL), Reinforcement Learning (RL), Random Forest (RF), Multilayer Perceptron (MLP), Long Short-Term Memory (LSTM), and hybrid models. Each
group was meticulously examined, considering various factors such as advantages, disad-
vantages, security implications, simulation environment, dataset, and the DL/ML approach
employed in pattern recognition. The paper emphasizes the techniques and applications of
DL/ML methods in pattern recognition, presenting a wide range of techniques that contrib-
ute to advancements in this field. Furthermore, we delved deeply into future work that must
be implemented in future studies. Overall, this paper’s contributions are:

• Reviewing the present issues pertinent to DL/ML methods for pattern recognition;
• Presenting a systematic overview of previous works on pattern recognition;
• Evaluating each approach that emphasized DL/ML methods with diverse aspects;
• Planning the key aspects that will allow the researchers to develop future works;
• Explaining the definitions of pattern recognition methods used in various studies.

The remainder of this article is organized as follows. Section 2 elucidates the principal viewpoints and relevant terminology of DL/ML approaches employed in pattern recognition. Section 3 scrutinizes the relevant review papers. Section 4 describes the research methodology and the tools employed for paper selection. Section 5 covers the chosen papers subjected to study and evaluation. Section 6 presents a comprehensive comparison and discussion of the outcomes. Section 7 deliberates on future endeavors, while Section 8 concludes the paper. Furthermore, Table 1 provides a catalog of the abbreviations employed in the research.

2 Basic concepts and corresponding terminologies

In this part, we have provided a quick definition of important terms such as DL, ML, big
data, and pattern recognition.

2.1 ML and DL

ML is a subset of Artificial Intelligence (AI) that enables computer programs to learn and
adapt without human intervention [80, 142]. ML algorithms analyze vast amounts of data to
detect patterns and make predictions in various fields such as advertising, finance, fraud detec-
tion, and more [62, 133]. It can process diverse data types like words, images, and clicks,
making it applicable to digitally stored data. DL, a branch of ML, uses Artificial Neural Networks (ANN) to simulate the human brain’s functioning [31, 141]. DL extracts features from data by employing multiple hidden layers that progressively abstract information. As more data is analyzed, DL can identify hidden patterns [67, 90]. It learns from processed data,


Table 1  Abbreviation table

Abbreviation  Definition
AE      Autoencoder
AI      Artificial Intelligence
AIFV    Armored Infantry Fighting Vehicle
AIDAE   Anti-Intrusion Detection AutoEncoder
AMD     Advanced Micro Devices
ANN     Artificial Neural Network
BPTT    Back Propagation Through Time
CAE     Computer-Aided Engineering
CN-GAN  Cognitive Network-Generative Adversarial Network
CNN     Convolutional Neural Network
CNV     Copy Number Variation
CTC     Connectionist Temporal Classification
DB      Database
DCH     Direct Connect Hub
DC-VAE  Dual Contradistinctive Generative AutoEncoder
DM      Direct Message
DR      Data Request
DSMS    Data Stream Management System
DTL     Deep Transfer Learning
FA-GAN  Face Augmentation-Generative Adversarial Network
GAN     Generative Adversarial Network
GCE     Gated Context Extractor
GCN     Graph Convolutional Network
GIS     Geographic Information System
GPU     Graphics Processing Unit
GRE     Generic Encapsulation
GRU     Gated Recurrent Unit
IDS     Intrusion Detection System
IoT     Internet of Things
IT      Information Technology
LBP     Local Binary Pattern
LSTM    Long Short-Term Memory
MAE     Metropolitan Area Exchange
MCC     Matthews Correlation Coefficient
ML      Machine Learning
MLP     Multilayer Perceptron
MOS     Metal Oxide Semiconductor
NLP     Natural Language Processing
NR      New Radio
RF      Random Forest
RL      Reinforcement Learning
RL-RBN  Race-balanced Network
RNN     Recurrent Neural Network
ROM     Read Only Memory
SAR     System Activity Report
SER     Service Edge Router
SLR     Systematic Literature Review
STS     Synchronous Transport Signal
SVM     Support Vector Machine
SVR     Support Vector Regression
TTS     Text-To-Speech

autonomously extracting features without human involvement [68, 119]. DL techniques have revolutionized language modeling, exemplified by Google Translate’s contextual translations
facilitated by DL-based Natural Language Processing (NLP). DL’s ability to handle complex
data and perform advanced tasks positions it at the forefront of AI technologies [3, 75].
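As an illustration of the layered abstraction described above, the following NumPy sketch passes data through two stacked hidden layers. The layer sizes, random weights, and ReLU activation are illustrative assumptions rather than a configuration from any reviewed study:

```python
import numpy as np

def relu(x):
    # Non-linearity that lets stacked layers build progressively
    # more abstract representations of their input.
    return np.maximum(0.0, x)

def forward(x, weights):
    """Pass input x through successive hidden layers.

    Each (W, b) pair transforms the previous layer's features,
    mirroring how DL abstracts information layer by layer.
    """
    h = x
    for W, b in weights:
        h = relu(h @ W + b)
    return h

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                      # 4 samples, 8 raw features
layers = [
    (rng.normal(size=(8, 16)), np.zeros(16)),    # hidden layer 1
    (rng.normal(size=(16, 3)), np.zeros(3)),     # hidden layer 2 (abstract features)
]
features = forward(x, layers)
print(features.shape)                            # (4, 3)
```

In a real DL system the weights would be learned from data rather than drawn at random; the point here is only the repeated transform-and-abstract structure.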

2.2 What are big data and its usage?

Big data refers to a vast amount of ever-increasing data sets in a variety of formats, including structured, semi-structured, and unstructured information [74, 92]. Because of its complicated nature, which necessitates powerful algorithms and robust technology, big data is defined by the three primary criteria listed below.

I. Volume: A huge amount of digital data is produced continuously by millions of applications and devices. Several exabytes of data are produced each year, and the amount keeps increasing.
II. Diversity: Big data is generated in a variety of formats by several distant sources.
Big data series include structured and unstructured data, whether local, private, complete, or incomplete.
III. Distribution: Big data is being used as a successful solution in many fields, including
smart grid, E-earth, the Internet of Things (IoT), public utilities, transportation and
logistics, political services and government surveillance, and so on. DL/ML, on the
other hand, objectively contributes to acquiring knowledge and making judgments
for a variety of vital purposes, such as pattern recognition, recommendation engines,
informatics, data mining, and autonomous control systems.

2.3 What is pattern recognition?

The detection of the features or data deployment that offer information about a specific system
or data set is referred to as pattern recognition [50, 63]. In the professional context, a pattern
may be a continuously repeating sequence of data over time that can be used to predict trends,
specific configurations of image characteristics that recognize objects, frequent combinations of
words and phrases for NLP, or particular groups of behavior on a network that can demonstrate
an attack through virtually infinite other likelihoods [81, 89]. Pattern recognition, in essence,
crosses several areas of IT, including biometric identity, security and AI, and big data analytics
[76, 93]. Pattern recognition draws heavily on ML, in which unsupervised and supervised learning methodologies are widely used to train the pattern recognizer [34, 94]. In supervised
ML, a human contributor supplies a representative set of labeled data to characterize the patterns [85, 98]. Unsupervised ML minimizes the use of a human element and pre-existing knowledge [35, 97]. In this approach, the algorithms are trained to discover new patterns without using existing labels, simply by being exposed to a large data set. On the other hand, DL can also be used to train pattern recognizers, typically by means of deep neural networks [10].
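The contrast between supervised and unsupervised training described above can be sketched on toy data. The two-cluster data, the nearest-centroid classifier, and the k-means initialization below are all illustrative assumptions, not methods from the reviewed studies:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two toy clusters of 2-D points around (0, 0) and (5, 5).
a = rng.normal(loc=0.0, size=(20, 2))
b = rng.normal(loc=5.0, size=(20, 2))
X = np.vstack([a, b])
y = np.array([0] * 20 + [1] * 20)      # labels supplied by a human contributor

# Supervised: labeled examples define one centroid per class.
centroids = np.stack([X[y == k].mean(axis=0) for k in (0, 1)])

def classify(p):
    # Assign a point to the class with the nearest labeled centroid.
    return int(np.argmin(np.linalg.norm(centroids - p, axis=1)))

# Unsupervised: k-means discovers the same structure without labels.
centers = X[[0, -1]].astype(float)     # crude initialization from two samples
for _ in range(10):
    assign = np.argmin(
        np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), axis=1)
    centers = np.stack([X[assign == k].mean(axis=0) for k in (0, 1)])

print(classify(np.array([4.8, 5.2])))  # → 1
```

The supervised path needs the label vector `y`; the unsupervised path recovers comparable cluster centers from the raw points alone, which is exactly the distinction drawn in the text.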

3 Relevant reviews

In this paper, we have presented a detailed assessment of independent ML/DL algorithms for large data pattern identification in cyber-physical systems and a discussion of the research contributions of these various approaches. Several related survey studies and


journal articles based on ML/DL approaches in big data were studied in this regard. Even though we attempt to categorize the articles, some of them may not correspond to a single category. By the same token, Bai, et al. [11] reviewed several accepted articles on explainable ML/DL with the aim of making pattern recognition robust and efficient. Their broad review of representative studies and current improvements in explainable ML/DL presents the latest developments in the interpretability of DL strategies, well-organized and compact network architectures for pattern recognition, new adversarial attacks, and stability training. Moreover, Paolanti and Frontoni [83] put forward new trends and methods of pattern recognition
used in various fields, and different pattern recognition techniques have been reviewed. With special regard to ML, DL, and statistics, the authors investigated possible solutions
for systems development. They mentioned elements like intelligent systems, devices, and
end-to-end analytics. Then they examined multiple various fields of pattern recognition
applications with particular attention to biology and biomedical, surveillance, social media
intelligence, Direct Connect Hub (DCH), and retail.
Also, Zerdoumi, et al. [130] talked about large data, visual pattern recognition, and
categorization. They discussed the potential advantages of ML algorithms for pattern
recognition in huge data. They emphasized unresolved research difficulties related to the
use of pattern recognition in big data. They performed a thorough literature review to
demonstrate the applicability of multi-criteria decision approaches and DL algorithms
to big data concerns. Moreover, Bhamare and Suryawanshi [13] offered an overview and analysis of several well-known tactics used at various levels of the pattern recognition system, along with recognition applications at the forefront of this intriguing and difficult field. They presented pattern recognition frameworks based
on several ML algorithms. On this basis, they examined 33 similar experiments from
2014 to 2017.
Smart city development is only one of several domains that common technology has significantly impacted, as Atitallah, et al. [7] demonstrated by reviewing several current studies. The primary goal of their research is to look into the use of IoT big data
and DL analytics in the enhancement and development of smart cities. Following that,
they identified IoT technology and demonstrated the computing foundation and ML/DL
applications used by IoT data analytics, which includes fog, cloud, and edge computing.
As a result, they investigated well-known DL architectures along with their applications, disadvantages, and benefits. Furthermore, as ML and big data analytics have advanced by leaps and bounds in information systems, Zhang, et al. [135] conducted bibliometric research to examine the primary authors’ contributions, countries, and organizations/universities in terms of citations, yield, and bibliographic coupling. As a result,
they provided valuable information for potential participants and audiences regarding new
research topics.
Similarly, the epidemic of illnesses such as CoronaVirus Disease (COVID-19) strained healthcare facilities for populations all over the world [96, 108, 110]. With the advancement of the IoT, wearable devices became able to collect context-specific data pertinent to behavioral, physical, and psychological health. Taking this issue into account, Li,
et al. [53] gave an in-depth evaluation of big data analytics in IoT healthcare by evaluating
chosen relevant surveys to identify a research gap, and they surveyed cutting-edge smart health solutions. A detailed analysis of the related reviews’ weaknesses and strengths is shown in Table 2.

Table 2  Summary of related works

Bai, et al. [11]
  Scope: Pattern recognition using DL
  Main idea: Providing three groups of the most recent developments from each researched article
  Advantages: High-quality classification; broad investigation area
  Disadvantages: Lack of comparison between the papers’ ideas

Paolanti and Frontoni [83]
  Scope: ML, DL
  Main idea: Investigating several fields of pattern recognition applications
  Advantages: A comprehensive and in-depth study of related articles
  Disadvantages: The method of paper selection is not mentioned

Zerdoumi, et al. [130]
  Scope: Big data, ML, and pattern recognition combined
  Main idea: Creating a taxonomy for image pattern recognition and large data; using visual pattern recognition for big data
  Advantages: Regard for applications of image pattern recognition for big data; broad investigation of the topic
  Disadvantages: Lack of comparison between studied papers

Yang, et al. [127]
  Scope: Big data and DL
  Main idea: Using DL applications and multi-criteria decision-making methodologies to address big data concerns
  Advantages: High-quality schematic comparison; giving an organized insight into the topic
  Disadvantages: Some important parameters in the comparison, such as the dataset, are overlooked

Bhamare and Suryawanshi [13]
  Scope: Pattern recognition and ML
  Main idea: Summarizing many approaches used at various levels of pattern recognition
  Advantages: High-quality comparison; well-organized topics
  Disadvantages: Poor schematic results; poor figurative conclusions

Atitallah, et al. [7]
  Scope: DL and pattern recognition
  Main idea: Proposing and evaluating open-source systems to assist DL research
  Advantages: High-quality analysis
  Disadvantages: Lack of well-organized classification

Zhang, et al. [135]
  Scope: ML and big data
  Main idea: Conducting bibliographic research and assessing the publications that have been studied
  Advantages: Comprehensive investigation; well-organized classification
  Disadvantages: Poor comparison between methods

Li, et al. [53]
  Scope: Big data and ML
  Main idea: Examining machine learning strategies for large data analytics in smart health
  Advantages: Comprehensive investigation; accurate details
  Disadvantages: Overlooked several critical parameters for comparison

Our study
  Scope: Pattern recognition, big data, and ML
  Main idea: Researching autonomous ML in cyber-physical-social systems for large data pattern recognition
  Advantages: The method of paper selection is mentioned; the benefits and drawbacks of the methods under consideration are listed; an in-depth examination of the results
  Disadvantages: Lack of using lecture notes


4 Methodology of research

The SLR approach was used in this section to better understand autonomous ML/DL strategies for big data pattern recognition. An SLR is a critical examination of all research within a specific scope. This section provides an in-depth discussion of ML approaches to pattern identification. Following that, we verify the research selection technique.
Subsequent subsections outline the search technique and include Research Questions (RQ)
and selection criteria.

4.1 Question formulation

The primary purpose of this study is to categorize, recognize, survey, and assess certain
specific existing articles in ML/DL techniques for pattern recognition applications. To
achieve the discussed purpose, the aspects and characteristics of the techniques can be
thoroughly researched using an SLR. Understanding the main concerns and challenges
encountered thus far is the next goal of SLR in this phase. We proposed several RQs that
had been pre-specified:

• RQ 1: How can we identify the paper and select the ML/DL techniques in pattern rec-
ognition?
 This is covered in Section 4.
• RQ 2: What are the most important potential solutions and unanswered questions in this
field?
 Section 7 will present the outstanding issues.
• RQ 3: How can the ML/DL methods in pattern recognition be categorized in big data?
What are some of their instances?
 The answer to this question can be found in Section 5.
• RQ 4: What methods do the researchers use to conduct their investigation?
  This question is addressed in Sections 5.1 through 5.7.

4.2 The paper selection procedure

The paper selection and search procedure for this research consists of the following four stages.
This procedure is depicted in Fig. 1. Table 3 displays the terms and keywords for searching
the articles at the first level. The articles in this set are the outcome of a typical electronic
database query. The electronic databases used include Springer Link, ACM, Scopus, Elsevier, IEEE Xplore, Emerald Insight, Taylor and Francis, PeerJ, DBLP, ProQuest, and DOAJ.

Fig. 1  The stages of the paper searching and selection procedure


Table 3  Search terms and keywords

S1  “DL” or “ML” and “Big Pattern Recognition”
S2  “Pattern Recognition” and “Big Data”
S3  “AI” and “Pattern Recognition”
S4  “DL” or “ML” and “Big Data Cyber-Physical-Social-Systems”
S5  “Pattern Recognition” and “Neural Network”
S6  “Big Data” and “AI”
S7  “Big Data” and “ML” or “DL”
S8  “Big Pattern Recognition” and “AI”

Books, chapters, journals, technical studies, conference papers, and special issues are also included. Stage 1 yielded 612 items. Figure 2 displays the distribution of articles by publication.
Stage 2 consists of two processes for determining the total number of articles to
be researched. Figure  3 depicts the publisher’s distribution of articles at this point.
The papers are initially judged based on the criteria shown in Fig. 4; 305 articles remain. In stage 2, the survey papers are extracted: out of the 305 papers that remained from the previous stage, 35 (11.47%) were survey papers. There are presently 188 papers available. In step 3, the titles and abstracts of the articles were examined.
Finally, 95 publications that met the stringent conditions were chosen to analyze and
investigate the other papers. The distribution of the selected papers by their publishers
is shown in Fig. 5. There were 60 manuscripts left for the final round, and Fig. 6 dis-
plays the journals that published the studies at that point.

Fig. 2  The distribution of selected papers by publishers in the first stage


Fig. 3  The distribution of selected papers by publishers in the second stage

5 Techniques for autonomous ML for big data pattern recognition

This section investigates autonomous ML/DL algorithms for large data pattern detection in a variety of applications. We touch on the distinct articles in the following paragraphs. The studied articles are organized into ten categories of ML/DL techniques: CNNs, RNNs, GANs, AEs, EL, RL, RFs, MLPs, LSTMs, and hybrid methods. Figure 7 depicts the proposed assortment of ML/DL techniques used in pattern recognition.

Fig. 4  Standards for the paper selection process


Fig. 5  The distribution of selected papers by publishers in the third stage

5.1 CNN mechanisms for pattern recognition

CNN is one of the most important ML/DL techniques because it can take an input image, assign importance (learnable weights and biases) to distinct objects/facets in the image, and differentiate between them. Moreover, a CNN requires less pre-processing than other techniques: whereas filters in primitive methods are hand-engineered, a CNN can learn these features/filters during training. CNN’s architecture is inspired by the connection between the pattern

Fig. 6  The distribution of selected papers by publishers in the fourth stage


Fig. 7  The introduced taxonomy of DL/ML methods for pattern recognition

of the human brain’s neurons and the structure of the visual cortex. Individual neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. Such fields overlap sufficiently to cover the entire visual region. In this regard, Awan,
et al. [8] used a Deep Transfer Learning (DTL) method on top of Apache Spark, a big data framework, with CNN architectures. Three architectures (ResNet50, VGG19, and Inception V3) were applied to COVID-19 chest X-ray images to quickly identify and isolate positive COVID-19 patients [111, 112]. The authors investigated weighted recall, weighted precision, and accuracy as DTL operation metrics. The results of ResNet50, VGG19, and InceptionV3 were excellent: for binary classification, all three models provided 100% detection accuracy, while for the three-class COVID-19/pneumonia/normal task, VGG19, ResNet50, and Inception V3 achieved 98.55%, 98.55%, and 97% accuracy, respectively.
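The receptive-field mechanism underlying CNNs, described at the start of this subsection, can be sketched as a plain 2-D convolution. The 3×3 vertical-edge filter below is a hand-picked illustration of what a CNN would learn automatically, not one of the trained architectures discussed above:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: each output value is a weighted sum
    over one local receptive field of the input."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # the receptive field
            out[i, j] = np.sum(patch * kernel)
    return out

# A tiny image with a vertical edge, and a hand-engineered vertical
# edge filter; CNNs learn such filters instead of fixing them by hand.
image = np.zeros((5, 5))
image[:, 3:] = 1.0
kernel = np.array([[1.0, 0.0, -1.0]] * 3)
edges = conv2d(image, kernel)
print(edges.shape)   # (3, 3)
```

The filter responds only where its small patch crosses the edge, which is how overlapping receptive fields together cover and describe the whole visual region.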
Furthermore, one of the most important financial markets is the stock market, which
generates a lot of money, but the most difficult challenge that has not been solved is decid-
ing which stocks to buy and when to buy or sell shares. With this issue in mind, Sohangir,
et al. [101] provided the idea of using DL systems to construct the sentiment analysis fea-
ture for StockTwits. CNN, doc2vec, and LSTM were among the models used to analyze
stock market opinions submitted on StockTwits. The authors used uni-grams, bi-grams, higher-order n-grams, and the CNN method to extract document sentiment efficiently. Then, they used logistic regression based on a set of terms. They concluded that the CNN method effectively extracts stock sentiment from users’ utterances.
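The uni-gram and bi-gram features mentioned above can be extracted with a few lines of plain Python. The whitespace tokenization and the sample post below are illustrative assumptions rather than the actual StockTwits pipeline:

```python
def ngrams(text, n):
    """Return the list of word n-grams of a whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

post = "Bullish on AAPL earnings"
unigrams = ngrams(post, 1)
bigrams = ngrams(post, 2)
print(unigrams)  # ['bullish', 'on', 'aapl', 'earnings']
print(bigrams)   # ['bullish on', 'on aapl', 'aapl earnings']
```

Such n-gram lists would then be counted into a term-frequency vector, which is the kind of term-set representation a logistic regression baseline consumes.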
Also, Hossain and Muhammad [39] proposed an emotion recognition system based on big emotional data and a DL method; the big data comprises video and voice. A speech signal is first processed to obtain a Mel-spectrogram in the frequency domain, which can be treated as an image, so a CNN was fed the Mel-spectrogram. The authors employed 2D and 3D CNNs for the voice and video signals, respectively, and their results showed that the Extreme Learning Machine (ELM)-based fusion performed better than a composition of classifiers because ELMs add a significant degree of non-linearity to the feature fusion.
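Converting a speech signal into a spectrogram "image" suitable for a CNN, as described above, amounts to a windowed FFT. The NumPy sketch below computes a plain magnitude spectrogram on a synthetic tone; the Mel filterbank stage used in the reviewed work is omitted, and the frame length, hop size, and test signal are illustrative assumptions:

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: slice the signal into overlapping
    frames, window each one, and take the FFT magnitude."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))   # shape: (time, freq)

# A synthetic 440 Hz tone stands in for a real speech recording.
sr = 8000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)   # (61, 129)
```

The resulting time-frequency array is exactly the 2-D "image" a 2D CNN can consume; a Mel filterbank would simply compress the frequency axis onto a perceptual scale first.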
Also, Ni [79] evaluated a CNN to generate many visual attributes and then lowered the number of calculations through a dimension reduction performed by the pooling layer. For the network structure, based on LeNet-5, the use of ReLU and its effect on model performance were investigated to make the model more helpful for face image processing. The author used CelebA as the training set for the model and LFW as the testing set for performance evaluation. As a result, the produced LeNet-5 model with A-softmax loss had a shorter training time than LeNet-5 with softmax loss, implying a faster convergence speed. Following that, A-softmax loss was evaluated on the LFW testing set, and the recognition accuracy of the produced LeNet-5 was significantly greater than that of the original LeNet-5. With increased size, the recognition rate of both models increased, and the difference between the two models widened.
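The pooling-based dimension reduction mentioned above can be sketched as 2×2 max pooling. The toy 4×4 feature map below is an illustrative assumption, not the exact LeNet-5 configuration:

```python
import numpy as np

def max_pool2x2(fmap):
    """2x2 max pooling: keep the strongest response in each
    non-overlapping 2x2 block, quartering the feature map size."""
    h, w = fmap.shape
    trimmed = fmap[:h - h % 2, :w - w % 2]       # drop odd edge rows/cols
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.arange(16.0).reshape(4, 4)
pooled = max_pool2x2(fmap)
print(pooled)   # rows: [5. 7.] and [13. 15.]
```

Each output value keeps only the strongest activation in its block, which is why pooling reduces the number of downstream calculations while preserving the most salient responses.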
By the same token, Xu, et  al. [122] created an emotion-sensitive learning framework
that analyzes the cognitive state and approximates the learners’ focus and mood based on
head posture and facial expression in a non-invasive manner. As a result, the learners’ emo-
tions are assessed based on their facial expressions. They concluded that their suggested
method can approximate learner attention and sentiment with 88.6% and 79.5% accuracy
rates, demonstrating the system’s strength for evaluating sentiment-sensitive learning cog-
nitive circumstances.
Additionally, Li, et al. [55] presented a deep CNN model to capture the hierarchical properties of huge data by extending the CNN from vector space to tensor space using the tensor representation paradigm. To avoid overfitting and improve training efficiency, a tensor convolutional procedure is provided to fully use the local properties and topologies present in the huge data. Furthermore, they applied a high-order back propagation algorithm to train the deep convolutional computational model’s parameters in the high-order space. Finally, tests on three datasets, SNAE2, CUAVE, and STL-10, demonstrated their model’s capacity to learn features from both big data and traditional data.
Finally, Sevik, et al. [95] created a deep network capable of recognizing both letters and fonts in Turkish. To accomplish this goal, a pre-trained network was trained using around 13 thousand images. The letter and font identification training accuracies are 100% and 73.44%, respectively. Because the typefaces are similar, they applied a probability calculation to the network output to improve the font recognition percentage. Although the accuracy on the first test image’s font was 14.26%, because the probability was greater than 0.5 they recognized it as Arial, and performance was slightly improved as a result. After that, 12 images containing letters were fed to the network for testing. As a result, letter identification accuracy with this network was roughly 100%, but font recognition accuracy was low. Table 4 discusses the CNN methods used in pattern recognition and their properties.

5.2 RNN mechanisms for pattern recognition

An RNN is a type of ANN in which the connections between nodes form a directed graph along a temporal sequence, which lets the network exhibit temporal dynamic behavior. RNNs, which are derived from feedforward neural networks, can process variable-length sequences of inputs by utilizing their interior state (memory). As a result, they can be employed for tasks such as unsegmented, connected speech recognition or handwriting recognition. "RNN" refers to the class of networks with an infinite impulse response, whereas "CNN" refers to the class with a finite impulse response. Both classes of networks exhibit temporal dynamic behavior. In this regard, Jun, et al. [47] presented a
mechanism for character extraction based on RNN AEs. The RNN AEs range the initial
skeleton information more discriminatively and decrease unrelated data, which is espe-
cially significant with the LSTM AE, which performed better than the Generic Encap-
sulation (GRE AE). As a result, the characteristics shape the recognition operation of

Table 4  The methods, properties, and features of CNN mechanisms for pattern recognition

Awan, et al. [8]
  Main idea: proposing a DTL method named Apache Spark system to accelerate the detection of positive COVID-19 patients.
  Advantages: high accuracy; low delay. Challenges: poor flexibility; high complexity.
  Method/algorithm involved: CNN-based models were used to categorize the data; testing the model on two datasets.
  Security: No. Simulation environment: Python. Dataset: The Databricks.

Sohangir, et al. [101]
  Main idea: using CNN, n-grams, and logistic regression to develop optimal financial decisions and analysis in the stock market on StockTwits.
  Advantages: high accuracy; flexible; adaptable; practical; effective. Challenges: high complexity; poor scalability.
  Method/algorithm involved: using CNN to simplify the use of n-grams and logistic regression based on a bag-of-words; submitting doc2vec.
  Security: No. Simulation environment: Python. Dataset: StockTwits.

Hossain and Muhammad [39]
  Main idea: proposing an audio-visual emotion recognition system based on emotional big data to gain Mel-spectrograms.
  Advantages: high scalability; high accuracy. Challenges: high complexity.
  Method/algorithm involved: using 2D and 3D CNN for audio and video signals.
  Security: No. Simulation environment: -. Dataset: Enterface'05 audiovisual emotion database.

Ni [79]
  Main idea: applying LFW and CelebA as training sets; proposing a LeNet-5 model with A-softmax loss for shortening training time.
  Advantages: high accuracy; high scalability. Challenges: high complexity.
  Method/algorithm involved: evaluating CNN to provide visual attributes for decreasing the number of calculations.
  Security: No. Simulation environment: C++ & Python. Dataset: Caffe framework.

Xu, et al. [122]
  Main idea: proposing a two-module emotion-sensitive learning framework for developing analysis of posture and facial expression in a non-invasive manner.
  Advantages: high accuracy; high scalability; high flexibility. Challenges: high latency; high complexity.
  Method/algorithm involved: CNN.
  Security: No. Simulation environment: CCNU. Dataset: classroom dataset.

Li, et al. [55]
  Main idea: presenting a CNN model to organize the hierarchical properties of massive data.
  Advantages: high accuracy; practical. Challenges: high latency.
  Method/algorithm involved: CNN.
  Security: No. Simulation environment: Tensor. Dataset: CUAVE, SNAE2, STL-10.

Sevik, et al. [95]
  Main idea: creating a CNN for recognizing both Turkish letters and fonts.
  Advantages: high accuracy. Challenges: poor scalability.
  Method/algorithm involved: creating a deep CNN, a pre-trained network, trained with 13 thousand images.
  Security: No. Simulation environment: AlexNet. Dataset: 228 letter images.

RNN DMs and other DMs. Through the DMs, the GRE DM outperforms the GRE AE, and
the GRE DM outperforms the LSTM DM in terms of accuracy.
The RNN AE-DM hybrid structures that are nourished with the characteristics perform
better than the separate RNN SMs nourished with the initial skeleton information. They
do so with less training time and fewer learning elements. Furthermore, the RNN AE-
two-pace DM’s training is more efficient than the End-to-End model’s single training
with a similar input stream.
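The fixed-size "memory" that lets an RNN summarize a variable-length skeleton sequence can be illustrated with a minimal Elman-style forward pass; all dimensions and weights below are illustrative and unrelated to the architecture of [47]:

```python
import numpy as np

rng = np.random.default_rng(0)

class SimpleRNN:
    """Minimal Elman RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    def __init__(self, n_in, n_hidden):
        self.W_x = rng.normal(0, 0.1, (n_hidden, n_in))
        self.W_h = rng.normal(0, 0.1, (n_hidden, n_hidden))
        self.b = np.zeros(n_hidden)

    def forward(self, xs):
        h = np.zeros(self.W_h.shape[0])
        for x in xs:                     # one update per time frame
            h = np.tanh(self.W_x @ x + self.W_h @ h + self.b)
        return h                         # fixed-size summary of the sequence

rnn = SimpleRNN(n_in=4, n_hidden=8)
short_seq = rng.normal(size=(5, 4))      # 5 time steps of 4 features
long_seq = rng.normal(size=(50, 4))      # 50 time steps of 4 features
print(rnn.forward(short_seq).shape, rnn.forward(long_seq).shape)  # (8,) (8,)
```

Both sequences, regardless of length, are compressed into the same 8-dimensional state, which is what downstream recognizers consume.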
Chancán and Milford [15] suggested an RNN + CNN model that can learn meaningful temporal connections from a single image sequence in a large driving dataset, while surpassing standard sequence-based techniques in terms of runtime, computing requirements, and accuracy. The authors used a minor two-layer CNN to examine DeepSeqSLAM's end-to-end training method, but their basic results showed that the CNN element does not generalize well to dramatic visual differences, which was expected given that these models require a large amount of data for efficient generalization and training. They tested their method on two large benchmark datasets, Oxford RobotCar and Nordland, which logged over 10 km and 728 km tracks, respectively, over a year with varying seasons, lighting conditions, and weather. On Nordland, they compared their model with two sequence-based mechanisms along the entire road under seasonal fluctuations, using a sequence length of 2, and showed that their model could attain over 72% AUC, compared with 2% AUC for SeqSLAM and 27% AUC for Delta Descriptors, while the processing time is reduced from roughly 1 h to 1 min.
As well, Gao, et al. [30] proposed an effective RNN transducer-based Chinese Sign Lan-
guage Recognition (CSLR) method. They used RNN-Transducer in CSLR for the first time.
To begin, they created a multi-level visual hierarchy transcription network using phrase-
level BiLSTM, gloss-level BiLSTM, and frame-level BiLSTM to examine multi-scale vis-
ual semantic properties. Following that, a lexical anticipating network was used to model
the contextual data from sentence labels. Finally, a collaborative network seeks to learn
language representations as well as video properties. It was then fed into an RNN-Trans-
ducer to optimize adjustment learning between sentence-level labels and sign language
video. Extensive examinations of the CSL dataset confirmed that the provided H2SNet can
achieve higher accuracy and faster speed.
Besides, Hasan and Mustafa [36] suggested an effective mechanism for robust gait rec-
ognition using an RNN that is related to Gated Recurrent Units (GRU) architecture and
is exceptionally powerful in capturing the transient dynamics of the human body gesture
sequence and executing recognition. They created a low-dimensional gait characteristic
descriptor derived from 2D that mixes human gesture data, is unaffected by diverse covari-
ate factors, and is efficient in describing the dynamics of various gait paradigms. Accord-
ing to their findings, the experiment using the CASIA A and CASIA B gait datasets dem-
onstrated that the given methodology surpasses the current approaches.
As offline Persian handwriting recognition is a challenging task due to the Persian script's cursive nature and the similarity among Persian alphabet letters, Safarzadeh and Jafarzadeh [91] proposed a Persian handwritten word identifier based on a continuous labeling
mechanism with RNN. A Connectionist Temporal Classification (CTC) loss operation is
also exploited to remove the segmentation step required in convolutional systems. Following that, the layers are used to extract the sequence of features from a word picture. Overall, the RNN layer with CTC was used for labeling the input sequence.
As a result, they demonstrated that this composition is an appropriate robust recognizer for
the Persian language. Consequently, they tested the approach on IFN/ENIT, Arabic, and
Persian datasets.
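The CTC decoding rule that removes the segmentation step (collapse repeated per-frame labels, then drop blanks) can be sketched independently of any network; the label values below are arbitrary:

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse repeated labels, then drop blanks, per the CTC rule."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

# Per-frame argmax labels for a word image: 0 is the blank symbol.
frames = [0, 3, 3, 0, 3, 5, 5, 0, 0, 7]
print(ctc_greedy_decode(frames))  # [3, 3, 5, 7]
```

Note how the blank between the two runs of 3 preserves a genuine double letter, which is why CTC needs no explicit character segmentation.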

13
Multimedia Tools and Applications

Furthermore, Zhao and Jin [138] enhanced a "doubly deep" approach in the temporal and spatial layers of recurrent and convolutional networks for action recognition. To begin, they presented a developed p-non-local operation as a common efficient element for capturing long-distance relationships. Second, they proposed Fusion KeyLess Attention at the class-prediction level, merged with backward and forward bidirectional LSTM, to learn the sequential essence of the information more effectively and elegantly, employing a multi-epoch model fusion based on the confusion matrix. The authors tested the proposed model on two heterogeneous datasets, Hollywoods and HMDB51; the model outperformed standard models while using only RGB features for action recognition based on RNN. Table 5 discusses the RNN methods used in pattern recognition and their properties.

5.3 GAN mechanisms for pattern recognition

A GAN is a type of ML/DL framework in which, given a training set, the model learns to produce new data with the same statistics as the training set. A GAN trained on images, for example, can produce new images that appear to human observers to be at least superficially genuine, with multiple realistic qualities. Despite being primarily proposed as a type of generative model for unsupervised learning, GANs have also been shown to aid reinforcement learning, semi-supervised learning, and fully supervised learning. The main principle behind a GAN is "indirect" training through the discriminator, a further neural network that judges how realistic an input is and is itself updated continuously. This means that the generator is not trained to minimize the distance to a particular image, but rather to deceive the discriminator, which allows the model to learn in an unsupervised manner. In this regard, Luo, et al. [66] presented
a Face Augmentation GAN (FA-GAN) to reduce the impact of uneven property distributions.
The authors used a hierarchical disentanglement module to decouple these attributes from the
identity representation. Graph Convolutional Networks (GCNs) are also employed for geo-
metric data recovery by exploring the interrelationships between local zones to provide iden-
tity protection in face information augmentation. Broad examinations of face reconstruction,
identification, and manipulation revealed the efficacy of their proposed approach.
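The indirect, adversarial training principle described at the start of this subsection can be made concrete on a 1-D toy problem; everything here (a two-parameter generator, a logistic discriminator, the learning rate) is our own minimal sketch, not the FA-GAN of [66]:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

# Generator g(z) = mu + sigma_g * z tries to mimic real data ~ N(3, 0.5).
# Discriminator D(x) = sigmoid(a * x + b) judges real vs. fake.
mu, sigma_g, a, b = 0.0, 1.0, 0.1, 0.0
lr, batch = 0.05, 64

for step in range(2000):
    real = rng.normal(3.0, 0.5, batch)
    fake = mu + sigma_g * rng.normal(size=batch)

    # Discriminator ascent on log D(real) + log(1 - D(fake)).
    d_r, d_f = sigmoid(a * real + b), sigmoid(a * fake + b)
    a += lr * np.mean((1 - d_r) * real - d_f * fake)
    b += lr * np.mean((1 - d_r) - d_f)

    # Generator descent on -log D(fake): it is trained to fool the
    # discriminator, not to match any particular sample directly.
    z = rng.normal(size=batch)
    fake = mu + sigma_g * z
    d_f = sigmoid(a * fake + b)
    g_x = -(1 - d_f) * a            # dLoss/dx for each fake sample
    mu -= lr * np.mean(g_x)
    sigma_g -= lr * np.mean(g_x * z)

print(round(mu, 2))  # mu drifts from 0 toward the real mean of 3
```

The generator parameter `mu` is never shown a target value; it moves only because moving makes the discriminator easier to fool, which is the "indirect" training the text describes.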
Additionally, Gammulle, et al. [28] addressed the problem of fine-grained action frag-
mentation in sequences in which various performances are proposed in an unsegmented
video stream. The authors introduced a semi-supervised frequent GAN model for fine-grained human activity segmentation. A Gated Context Extractor (GCE) module, a combination of gated attention units, captures temporal context data and passes it to the generator model for improved action segmentation. The GAN is designed so that, by learning features in a semi-supervised manner, the model can satisfy the action taxonomy alongside the usual unsupervised GAN learning process. Finally, the results showed that it could outperform the current state-of-the-art on three major datasets: MERL Shopping, the Tech egocentric activities dataset, and 50 Salads.
Also, Fang, et al. [27] presented a face-aging approach called Triple-GAN for generating age-processed faces. Triple-GAN adopts an increased adversarial loss to emphasize the synthesized faces' realism and to learn efficient mappings across age margins. Rather than treating ages as independent clusters, a triple translation loss has been coupled into the model to capture the intricate interrelation of multiple age ranges and simulate more realistic age progression, further enhancing the generator's predominance. Multiple qualitative and quantitative examinations performed on CACD, MORPH, and CALFW showed the efficiency of their proposed mechanism.

Table 5  The methods, properties, and features of RNN mechanisms for pattern recognition

Jun, et al. [47]
  Main idea: introducing a method for improving character extraction based on RNN AEs, decreasing unrelated data and training time.
  Advantages: high accuracy. Challenges: poor adaptability; poor flexibility.
  Method/algorithm involved: nourishing an RNN AE-DM hybrid structure with the features of the initial skeleton.
  Security: No. Simulation environment: NVIDIA. Dataset: gait dataset.

Chancán and Milford [15]
  Main idea: proposing DeepSeqSLAM, a trainable RNN architecture for improving training data.
  Advantages: low latency; high accuracy. Challenges: poor integrity.
  Method/algorithm involved: RNN.
  Security: No. Simulation environment: MATLAB. Dataset: Nordland; the Oxford RobotCar.

Gao, et al. [30]
  Main idea: creating a multi-level visual hierarchy transcription network using phrase-level to improve multi-scale visual semantic properties.
  Advantages: high accuracy. Challenges: poor flexibility.
  Method/algorithm involved: creating a transcription network with a visual hierarchy; RNN.
  Security: No. Simulation environment: Python. Dataset: CSL.

Hasan and Mustafa [36]
  Main idea: proposing an efficient method for developing robust gait recognition using RNN.
  Advantages: high accuracy; high robustness. Challenges: poor stability.
  Method/algorithm involved: RNN.
  Security: No. Simulation environment: Python. Dataset: CASIA gait.

Safarzadeh and Jafarzadeh [91]
  Main idea: proposing a Persian handwritten word identification based on a sequence labeling technique with RNN.
  Advantages: highly robust recognizer; high accuracy. Challenges: poor adaptability.
  Method/algorithm involved: using the RNN layer with CTC performance for labeling the input succession.
  Security: No. Simulation environment: TensorFlow. Dataset: IFN/ENIT datasets.

Zhao and Jin [138]
  Main idea: introducing a novel "doubly deep" approach in temporal and spatial layers to improve the performance of recognition based on RNN architecture.
  Advantages: high efficiency; practical. Challenges: poor scalability.
  Method/algorithm involved: introducing a developed p-non-local performance; RNN.
  Security: No. Simulation environment: -. Dataset: YouTube.

Chen, et al. [21] presented the NM-GAN anomaly distinction model, which incorporates a discrimination network D in a GAN-like architecture together with a reconstruction network R. Their work's significant contribution is regulating the generalization and detection capabilities of networks R and D simultaneously by embedding a noise map into an end-to-end adversarial learning technique. The authors designed the model to improve the discriminator's detection capability and the generative model's generalization capability in an integrated architecture. According to the results of the studies, their model outperforms most competing models in terms of stability and accuracy, demonstrating that the offered noise-modulated adversarial learning is efficient and trustworthy.
Finally, Men, et al. [72] developed an Attribute-Decomposed GAN, a new generative model for controllable person image synthesis capable of producing realistic person images with desired human attributes derived from various source inputs. The authors embedded human attributes as distinct codes in the hidden space and subsequently obtained flexible and sequential control of attributes through combination and interpolation operations in explicit style representations. They specifically presented a design that incorporates two encoding routes connected by style block connections in order to decompose the original hard mapping into multiple accessible subtasks. They then used an off-the-shelf human parser to extract component layouts and feed them into a shared global texture encoder for decomposed hidden codes. As a result, they concluded that their proposed approach is more effective than the existing ones. Table 6 summarizes the GAN approaches and their properties in pattern recognition.

5.4 AE mechanisms for pattern recognition

An AE is a type of ANN that is used to learn effective codings of unlabeled data. The encoding is validated and refined by attempting to recreate the input from the encoding. The AE learns a representation for a data set, typically for dimensionality reduction, by training the network to ignore irrelevant information. Variants exist that force the learned representations to assume useful properties. AEs are used for a variety of tasks, including feature detection, anomaly detection, facial recognition, and determining the meaning of words. Furthermore, some AEs are generative models: they are capable of randomly producing new information that resembles the input information. By this token, Simpson,
et al. [100] presented a reduced sequenced modeling mechanism based on the availability
of output and input data for developing a representation that can mimic the reaction of non-
linear infrastructure systems under "unseen" compelling time histories. They demonstrated
the modeling approach and its efficacy on various nonlinear systems of variable size and
complexity.
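The dimensionality-reduction behavior described above can be demonstrated with a minimal linear AE trained by gradient descent; the data, sizes, and learning rate are synthetic and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 10-D that really live on a 2-D subspace.
latent = rng.normal(size=(200, 2))
mix = rng.normal(size=(2, 10))
X = latent @ mix + 0.01 * rng.normal(size=(200, 10))

# Linear AE: encoder W_e (10 -> 2), decoder W_d (2 -> 10), MSE loss.
W_e = rng.normal(0, 0.1, (10, 2))
W_d = rng.normal(0, 0.1, (2, 10))
lr = 0.05
for _ in range(500):
    Z = X @ W_e                 # encode into the 2-D bottleneck
    X_hat = Z @ W_d             # decode (reconstruct)
    err = X_hat - X
    # Gradients of the mean squared reconstruction error.
    gW_d = Z.T @ err / len(X)
    gW_e = X.T @ (err @ W_d.T) / len(X)
    W_d -= lr * gW_d
    W_e -= lr * gW_e

mse = np.mean((X @ W_e @ W_d - X) ** 2)
print(mse < np.mean(X ** 2))  # reconstruction beats predicting zeros
```

The bottleneck forces the network to keep only the 2-D structure that explains the data and discard the rest, which is the "ignore irrelevant information" property of AEs.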
Also, Kim, et al. [49] introduced a parallel end-to-end Text-To-Speech (TTS) system that generates more natural-sounding audio than the previous two-step approaches. Their technique adopted variational inference augmented with normalizing flows and an adversarial training procedure, which developed generative modeling's stunning strength. The authors also proposed a stochastic duration predictor to synthesize speech with different rhythms from input text. With the uncertainty modeling over hidden variables and the stochastic duration predictor, their model captured the natural one-to-many relationship in which text input can be spoken in numerous ways with diverse rhythms and pitches. A subjective human evaluation on LJ Speech, a single-speaker dataset, showed that their method surpassed the best publicly available TTS systems and achieved a Mean Opinion Score (MOS) comparable to ground truth.

Table 6  The methods, properties, and features of GAN mechanisms for pattern recognition

Luo, et al. [66]
  Main idea: proposing a graph-based two-pace FA-GAN to reduce the impact of uneven property distributions.
  Advantages: high accuracy; well-adapted to the real environment. Challenges: poor flexibility; high complexity.
  Method/algorithm involved: using a hierarchical disentanglement module; applying a GCN for geometric data recovery.
  Security: No. Simulation environment: PyTorch. Dataset: Multi-PIE dataset.

Gammulle, et al. [28]
  Main idea: proposing a semi-supervised frequent GAN model for fine-grained human activity segmentation.
  Advantages: high flexibility; high adaptability. Challenges: poor scalability.
  Method/algorithm involved: GAN.
  Security: No. Simulation environment: Keras, Theano. Dataset: 50 Salads; MERL Shopping.

Fang, et al. [27]
  Main idea: proposing DR-RGAN, a model for developing age-processed faces and age-invariant face authentication.
  Advantages: low delay. Challenges: poor flexibility.
  Method/algorithm involved: extracting steady identity features by DR-RGAN.
  Security: No. Simulation environment: -. Dataset: CACD, MORPH.

Chen, et al. [21]
  Main idea: presenting the NM-GAN anomaly distinction model to develop the ability of pattern detection.
  Advantages: high accuracy; high stability. Challenges: poor adaptability.
  Method/algorithm involved: embedding the noise map into an end-to-end adversarial training method.
  Security: No. Simulation environment: TensorFlow. Dataset: UCSD dataset.

Men, et al. [72]
  Main idea: developing a manageable attribute-decomposed GAN for person images that can provide realistic images of a person.
  Advantages: high flexibility. Challenges: high complexity.
  Method/algorithm involved: GAN.
  Security: No. Simulation environment: PyTorch. Dataset: NVIDIA Tesla-V100 GPUs.

Furthermore, Utkin, et al. [107] developed a mechanism for building an explanatory DL model that combines a variational AE and a conventional AE. The variational AE produces a series of vectors based on the previously described image embedding at the testing or explanation stage. Following that, the conventional AE's decoder section rebuilds a succession of images that configure a heatmap explaining the original image. Finally, they tested their model on two well-known datasets, CIFAR-10 and MNIST.
Additionally, Parmar, et al. [84] developed a generative AE model with dual contradistinctive losses that simultaneously acts on both reconstruction and sampling. The suggested model, known as the dual contradistinctive generative AE (DC-VAE), combined an instance-level discriminative loss with a set-level adversarial loss, both of which are contradistinctive. They recorded extensive experimental results for DC-VAE over various resolutions, consisting of 32 × 32, 64 × 64, 128 × 128, and 512 × 512. The two contradistinctive losses working in concert in DC-VAE resulted in clear quantitative and qualitative gains over the baseline VAEs with no architectural differences.
Moreover, Bhamare and Suryawanshi [13] proposed an end-to-end algorithm, VGAELDA, that integrates variational inference and graph AEs for forecasting lncRNA-disease associations. VGAELDA included two types of graph AEs. The cooperation of the variational graph AE for representation learning and the graph AE for alternate training intensified the ability of VGAELDA to grasp effective low-dimensional representations from high-dimensional characteristics, and therefore improved the accuracy and robustness of forecasting lncRNA-disease associations. Their analyses highlighted that the designed co-training framework of VGAELDA solves a geometric matrix problem for grasping effective low-dimensional representations via a DL method.
Besides, Atitallah, et al. [7] developed a 5-layer AE-based model for detecting unusual network traffic. The primary architecture and parts of the suggested model were developed as a result of a thorough investigation into the influence of an AE model's major design indicators on recognition accuracy. According to their results, their model achieved the maximum accuracy using the proposed two-sigma outlier detection method and the Mean Absolute Error (MAE) as the reconstruction-loss criterion. The authors used MAE-based reconstruction loss to achieve the maximum accuracy for the AE model used in network anomaly recognition. In comparison to alternative model architectures, the suggested model with the optimal number of neurons at each latent-space layer delivers the best function. Finally, they tested the model using the widely used NSL-KDD dataset. Compared to similar models, the performance attained 90.61% accuracy, 98.43% recall, 86.83% precision, and a 92.26% F1 score.
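The thresholding idea behind such AE-based detectors (flag records whose reconstruction error exceeds the mean plus two standard deviations of the errors on normal traffic) can be sketched with the AE abstracted away; the stand-in "reconstruction" below is a placeholder of our own, not the model of [7]:

```python
import numpy as np

def mae(x, x_hat):
    """Mean absolute error per record, the reconstruction-loss criterion."""
    return np.mean(np.abs(x - x_hat), axis=1)

def two_sigma_threshold(errors):
    """Anomaly cut-off: mean + 2 * std of errors on normal traffic."""
    return errors.mean() + 2.0 * errors.std()

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, (500, 20))   # normal traffic features
attack = rng.normal(4, 1, (50, 20))    # anomalous traffic features

# Stand-in for a trained AE: reproduces only the normal value range,
# so out-of-range (attack) values reconstruct poorly.
reconstruct = lambda x: np.clip(x, -2, 2)

tau = two_sigma_threshold(mae(normal, reconstruct(normal)))
flags = mae(attack, reconstruct(attack)) > tau
print(flags.mean())  # fraction of attack records flagged
```

Because the reconstruction only works well on data resembling the training distribution, the error itself becomes the anomaly score.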
On the other hand, Zhang, et al. [135] introduced an attack architecture, Anti-Intrusion Detection AE (AIDAE), to create features that disable the IDS. An encoder in the framework maps features into a latent space, while several decoders reconstruct the continuous and discrete features accordingly. The authors tested the framework using the UNSW-NB15, NSL-KDD, and CICIDS2017 datasets, showing that the produced features degrade the detection function of existing Intrusion Detection Systems (IDSs). Table 7 discusses the AE methods used in pattern recognition and their properties.

5.5 EL mechanisms for pattern recognition

EL mechanisms in ML/DL and statistics use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. In contrast to a statistical ensemble in statistical mechanics, which is usually infinite, an ML ensemble consists of only a concrete finite set of alternative models, but typically allows for a much more
Table 7  The methods, properties, and features of AE mechanisms for pattern recognition

Simpson, et al. [100]
  Main idea: proposing a simplified model based on the use of an AE.
  Advantages: high feasibility; low energy consumption. Challenges: poor scalability.
  Method/algorithm involved: reduced sequenced AE modeling based on available input and output data.
  Security: No. Simulation environment: -. Dataset: 20 DOF single nonlinearity system.

Kim, et al. [49]
  Main idea: proposing VITS, a parallel TTS system that produces more natural-sounding audio than previous ones.
  Advantages: high accuracy; high adaptability. Challenges: high complexity.
  Method/algorithm involved: AE.
  Security: No. Simulation environment: NVIDIA. Dataset: LJ Speech, VCTK.

Utkin, et al. [107]
  Main idea: developing a method for modeling and forecasting the DL model using a combination with a variational AE.
  Advantages: low delay; high accuracy. Challenges: high complexity.
  Method/algorithm involved: AE; providing a series of vectors; rebuilding a successive image that configures a heatmap.
  Security: No. Simulation environment: Keras. Dataset: MNIST, CIFAR-10.

Parmar, et al. [84]
  Main idea: proposing a generative AE model, DC-VAE, which acts on both reconstruction and sampling at the same time.
  Advantages: high-quality synthesis; high accuracy. Challenges: poor scalability.
  Method/algorithm involved: AE.
  Security: No. Simulation environment: -. Dataset: CIFAR-10; STL-10; CelebA; CelebA-HQ.

Bhamare and Suryawanshi [13]
  Main idea: proposing VGAELDA, an end-to-end model.
  Advantages: high robustness; high preciseness. Challenges: poor scalability.
  Method/algorithm involved: two AEs trained by variational inference.
  Security: No. Simulation environment: -. Dataset: lncRNA-diseases.

Atitallah, et al. [7]
  Main idea: proposing a new 5-layer AE-based model for developing the detection of unusual network traffic.
  Advantages: high accuracy; low delay. Challenges: poor adaptability; poor feasibility.
  Method/algorithm involved: AE.
  Security: No. Simulation environment: -. Dataset: NSL-KDD.

Zhang, et al. [135]
  Main idea: introducing an attack architecture, AIDAE, to create features to disable the IDS.
  Advantages: high accuracy. Challenges: low security; poor scalability.
  Method/algorithm involved: AE.
  Security: No. Simulation environment: Python, PyTorch. Dataset: UNSW-NB15, NSL-KDD, CICIDS2017.

flexible structure to exist among those alternatives. With this in mind, Abbasi, et al. [1] presented an EL method called ElStream and evaluated it on seven various artificial and real datasets. Various ensemble and ML algorithms based on majority voting are used. The ElStream technique employed elegant ML algorithms that are evaluated using F-score and accuracy criteria. The baseline approach achieved a highest accuracy of 92.35%, while the ElStream mechanism achieved a highest accuracy of 99.99%, displaying a gain of 7.64%. According to their findings, the proposed ElStream method can identify concept drifts and categorize data more accurately than earlier research.
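The majority-voting combination rule underlying such ensembles can be sketched in plain Python; the three stand-in base learners are trivial threshold rules of our own, not the algorithms evaluated in [1]:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier labels for one instance by majority voting."""
    return Counter(predictions).most_common(1)[0][0]

# Three stand-in base learners classifying a number as 'low'/'high'.
learners = [
    lambda x: 'high' if x > 10 else 'low',
    lambda x: 'high' if x > 12 else 'low',
    lambda x: 'high' if x > 25 else 'low',   # a weaker, biased learner
]

def ensemble_predict(x):
    return majority_vote([clf(x) for clf in learners])

print(ensemble_predict(15), ensemble_predict(5))  # high low
```

For the input 15 the biased third learner is outvoted by the other two, which is exactly how an ensemble can exceed the accuracy of any single constituent.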
By this token, Zhang, et al. [134] suggested an EL model that directly forecasts Vickers hardness, including anomalous load-dependent hardness, with quantitative accuracy. Their approach was confirmed by developing a unique hold-out test set of hard materials and analyzing eight metal disilicides, both of which provided excellent assurance of predicting hardness at all loads. The model was used to predict the hardness of 66,440 compounds in Pearson's crystal dataset, identifying probable hard characteristics in just 68 previously unexplored materials. The proposed screening approach is set to advance the search for innovative hard materials by leveraging ML's effectiveness, transferability, and scalability.
Additionally, Lee, et al. [52] proposed a unified ensemble technique called SUNRISE, which is compatible with different off-policy RL algorithms. Two important components are integrated: (a) an inference technique that selects actions for effective exploration by using the highest upper-confidence bounds, and (b) weighted Bellman backups that rely on uncertainty estimates from a Q-ensemble to re-weight target Q-values. The authors implemented the method among agents using Bootstrap with random initialization to show that these various ideas are largely orthogonal and can be beneficially integrated, and to subsequently improve the performance of existing off-policy RL algorithms, such as Rainbow DQN and Soft Actor-Critic, for both discrete and continuous control tasks in both high-scale and low-scale environments.
Onward, Mohammed, et al. [73] contributed to the critical development of an entirely digital COVID-19 test [109] using ML mechanisms to analyze cough recordings. They developed a way of creating crowdsourced cough sound examples by breaking/isolating the cough sound into non-overlapping coughs and extracting six different representations from each cough sound, assuming negligible data loss or frequency deformation. They did not attain more than 90% accuracy due to a large degree of overlap among the class characteristics. However, this unbiased selection criterion ensures that the predictive model is as independent of the kind of pattern and categorizer as possible.
On the other hand, Khairy, et al. [48] presented a voting and boosting ensemble model for banknote recognition. A mixture of ten algorithms and nine different pairings was sampled, yielding exact accuracy rates. Experiments on the Swiss franc banknote and banknote authentication datasets showed that ensemble algorithmic models can achieve more accurate identification than exclusive methods. With the banknote authentication dataset, voting and AdaBoost reached maximums of 100% and 99.90%, respectively, while the Swiss franc dataset reached a maximum of 99.50%. As a result, testing and analyzing the offered models confirmed their adequacy and applicability for detecting counterfeit banknotes.
Zhang, et al. [136] advocated employing ML technologies to solve the PPH predictive detection problem. Their two principal contributions were (1) well-organized EL approaches and (2) the amassing of a big clinical dataset. Their DIC and PPH datasets have 212 and 3842 records, respectively. The trained prediction detection model produced accurate findings. As a result, the accuracy of real PPH detection reaches 96.7%, and the overall accuracy of anticipating Disseminated Intravascular Coagulation (DIC) can surpass 90%. Table 8 displays the EL methods used in pattern recognition and their properties.

Table 8  The methods, properties, and features of EL mechanisms for pattern recognition

Abbasi, et al. [1]
  Main idea: proposing ElStream to provide a higher predictive function than sole ones.
  Advantages: high accuracy; practical. Challenges: poor scalability; high complexity.
  Method/algorithm involved: ElStream model; applying elegant ML algorithms evaluated by using F-score and accuracy criteria.
  Security: No. Simulation environment: Python. Dataset: PokerHand; SEA; Hyperplane; LED.

Zhang, et al. [134]
  Main idea: proposing an EL approach that predicts Vickers hardness directly.
  Advantages: high accuracy; high scalability. Challenges: high complexity.
  Method/algorithm involved: EL.
  Security: No. Simulation environment: Python. Dataset: 66,440 compositions in Pearson's crystal dataset.

Lee, et al. [52]
  Main idea: proposing an EL method that is compatible with different off-policy RL algorithms.
  Advantages: proper accuracy. Challenges: poor adaptability.
  Method/algorithm involved: EL.
  Security: No. Simulation environment: Python. Dataset: Rainbow DQN.

Mohammed, et al. [73]
  Main idea: creating a dependable categorizer for a COVID-19 pre-screening process using the EL method.
  Advantages: high accuracy. Challenges: excessive false positives.
  Method/algorithm involved: EL.
  Security: No. Simulation environment: Python. Dataset: crowdsourced cough.

Khairy, et al. [48]
  Main idea: proposing a voting mechanism and a boosting ensemble model for banknote detection.
  Advantages: high accuracy; high flexibility. Challenges: poor scalability.
  Method/algorithm involved: using AdaBoost's boosting ensemble to improve the function; EL.
  Security: No. Simulation environment: Python. Dataset: WEKA.

Zhang, et al. [136]
  Main idea: solving the issue of PPH predictive detection by using the EL method.
  Advantages: high accuracy. Challenges: poor scalability.
  Method/algorithm involved: EL.
  Security: No. Simulation environment: Python. Dataset: PPH dataset; DIC dataset.

5.6 RL mechanisms for pattern recognition

RL is a branch of ML concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. RL is one of the three basic ML paradigms, alongside supervised and unsupervised learning. RL differs from supervised learning in that it does not require labeled input/output pairs to be presented, and it does not require sub-optimal actions to be explicitly corrected. Because dynamic programming approaches are utilized in the context of RL algorithms, the environment is typically stated in the form of a Markov
decision process. Automatic surgical gesture recognition, for purposes such as competence appraisal and conducting sophisticated surgical inspection tasks, is a fundamental advancement in robot-assisted surgery. In this regard, Gao, et al. [29] suggested a framework for joint surgical gesture classification and segmentation based on RL and tree search. An agent was trained whose direct actions were appropriately reviewed via tree search to classify and segment the surgical video in a human-like manner. The proposed tree search algorithm unified the outputs from two designed networks, a policy network and a value network. Overall, the proposed method consistently outperformed existing strategies on the suturing task of the JIGSAWS dataset in terms of edit score, accuracy, and F1 score. Finally, they discussed the usage of tree search in the RL framework for robotic surgical applications.
Benhamou and Saltiel [85] also handled the difficult task of adjusting and determining portfolio commitment in a crisis environment. The authors exploited contextual data with the help of a second deep RL sub-network. The model considered the standard deviation of portfolio strategies over multiple rolling periods together with contextual data such as the Citigroup economic surprise index, risk aversion, and the bond-equity correlation. The additional contextual data made the dynamic asset manager agent's learning more resilient to crises. Furthermore, the standard deviation of portfolio strategies generated a significant indicator for future crises. Their model outperformed typical financial models in terms of functionality.
Besides, Wang and Deng [114] provided an adaptive margin to learn balanced operation for different races based on large-margin losses. The proposed RL-based Race Balanced Network (RL-RBN) formulated the process of discovering the optimal margins for non-Caucasians as a Markov decision process and used deep Q-learning to learn rules for an agent that chooses the proper margin by estimating the Q-value function. The agent reduced the skewness of feature distributions between races. They also created two ethnicity-aware training databases: the BUPT-Balancedface and BUPT-Globalface datasets were used to analyze racial bias from both the algorithm and data facets. Several large-scale analyses of the RFW database showed that RL-RBN successfully lowers racial bias and learns a fairer operation.
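The deep Q-learning used in RL-RBN rests on the same Bellman update as tabular Q-learning. The following minimal sketch runs that update on a hypothetical five-state chain environment invented for illustration (it is not tied to any surveyed paper): action 1 moves right, action 0 moves left, and only the rightmost state pays a reward.

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=1000,
               alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy chain MDP (illustrative only)."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states - 1)          # random non-terminal start
        for _ in range(20):                      # cap episode length
            if rng.random() < epsilon:           # epsilon-greedy exploration
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Bellman update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
            if r > 0:                            # terminal reward reached
                break
    return Q

Q = q_learning()
# Greedy policy: in every non-terminal state the agent learns to move right.
policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
```

A deep variant such as the one in RL-RBN replaces the table `Q` with a neural network that estimates Q-values; the update rule is otherwise the same in spirit.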
In addition, Wang, et al. [115] modeled the online key-frame decision in dynamic video segmentation as a deep RL problem and learned an effective scheduling policy from expert information about the decision history and from the process of maximizing the global return. They also looked into dynamic video segmentation on face videos, which had never been done previously. By analyzing the 300VW dataset, they demonstrated that their reinforcement key scheduler surpasses alternative baselines in terms of running speed and effective key selections. According to their findings, the provided method generalizes to various modes, and they introduced an online key-frame decision in dynamic video segmentation for the first time.
Further, Ma, et al. [69] proposed a DL solution for robust action identification with WiFi that exploits an RL agent to discover a suitable neural architecture for the identification algorithm. They evaluated the provided design using real-world traces of 5 activities carried out by seven people. The introduced concept achieved 97% average identification accuracy for unidentified receiver directions/places and unseen people. Compared with a manually designed neural architecture, the RL agent exhibited a 15% improvement in accuracy. In collaboration with the RL agent, a state machine contributed an additional 20% accuracy by learning transient dependencies from previous classification outcomes. Two public datasets were used to assess the presented design, reaching 80% and 83% accuracy, respectively.
Moreover, Gowda, et al. [32] proposed a centroid-based model that clustered semantic and visual features, considered all training instances at once, and generalized to samples from previously undiscovered classes. They optimized the clustering using RL, which is crucial for their model to work. Calling the presented method CLUSTER, they discovered that it consistently outperformed existing methods on the most standard datasets, including HMDB51, Olympic Sports, and UCF101, in both generalized zero-shot learning and zero-shot assessment. Their model also performed strongly in the image-board competition. Table 9 lists the RL methods and their attributes utilized in this topic.

5.7 RF mechanisms for pattern recognition

RF is an ensemble learning technique for classification, regression, and other tasks that constructs many decision trees during training. For classification tasks, the RF output is the class picked by the majority of trees; for regression tasks, the mean prediction of the individual trees is returned. RFs correct for decision trees' habit of overfitting to the training set. RFs outperform single decision trees on average, but their accuracy is lower than that of gradient-boosted trees. In this regard, Awan, et al. [9] proposed a solution to a security problem that resulted in a secure platform for social media users. The solution used facial recognition and Spark MLlib to train an ML model on 70% of the profile data and then used the remaining 30% to investigate prediction and accuracy. Their prediction pipeline covered steps such as reading datasets from CSV, feature-engineering the training data using RF, displaying learning curves, plotting the confusion matrix, and plotting the ROC curve. They achieved 94% accuracy. The limitation of this plan consisted of multiple false positive outcomes that can alter the result by up to 6%.
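The majority-vote principle described above can be illustrated with a deliberately small forest of depth-1 trees (decision stumps) fit on bootstrap resamples. This is a simplified sketch on synthetic one-dimensional data with a 70/30 train/test split; the data, function names, and forest size are all invented for illustration and this is not the Spark MLlib pipeline of Awan, et al.

```python
import random

def train_stump(data):
    """Fit a depth-1 tree: predict class 1 when x > t, choosing the
    threshold t that minimizes training error (monotone labels assumed)."""
    xs = sorted({x for x, _ in data})
    candidates = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: sum(int(x > t) != y for x, y in data))

def random_forest(data, n_trees=25, seed=1):
    """Bagging: each stump is fit on a bootstrap resample of the data."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]

def predict(forest, x):
    votes = sum(int(x > t) for t in forest)   # majority vote over the trees
    return int(2 * votes > len(forest))

rng = random.Random(0)
data = [(float(i), int(i >= 10)) for i in range(20)]   # toy threshold problem
rng.shuffle(data)
train, test = data[:14], data[14:]                     # 70/30 split
forest = random_forest(train)
accuracy = sum(predict(forest, x) == y for x, y in test) / len(test)
```

Real RFs additionally sample a random feature subset at each split and grow much deeper trees; on clean toy data like this, the majority vote typically recovers the true decision boundary.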
Also, Moussa, et al. [77] applied the fractional coefficients method to the facial recognition scope. In addition, they applied RF and SVM to face recognition and compared them against the Euclidean distance. They then examined how RF and SVM categorize the feature vectors; the classification results obtained from various features, followed by selecting the Discrete Cosine Transform (DCT) coefficients, constituted the model's outstanding benefit. The authors demonstrated efficient results of applying the RF in terms of accuracy when compared to SVM and Euclidean distance in the investigated face recognition algorithm. Unlike SVM, the individual decision trees in the RF were automatically trained on different portions of the training data, and their separate predictions were blended to generate an accurate RF.
Besides, Marins, et al. [71] established an approach for identifying and categorizing problematic events across the operational performance of O&G production lines and wells. They considered seven types of faults alongside the normal performance status. The enhanced system used a classifier based on the RF algorithm and a Bayesian non-convex optimization technique to optimize the system hyperparameters. Three tests were included to evaluate the system's capability and robustness in diverse fault recognition/taxonomy settings: tests

Table 9  The methods, properties, and features of RL mechanisms

• Gao, et al. [29]. Main idea: putting forward a framework for simultaneous surgical gesture classification and segmentation based on RL and tree search; Advantages: high accuracy; Challenges: poor adaptability; Method: training an agent to categorize and segment the surgical video (RL); Security: no; Environment: -; Dataset: JIGSAWS.
• Benhamou, et al. [12]. Main idea: proposing a DRL framework for modifying portfolio commitment to the ecology crisis; Advantages: high robustness, highly accurate detection; Challenges: poor scalability; Method: using contextual data via a second DRL sub-network; Security: no; Environment: Python; Dataset: the standard deviation of the portfolio.
• Wang and Deng [114]. Main idea: providing an adaptive margin to train balanced operations for many races; Advantages: high accuracy, high integrity; Challenges: high complexity; Method: Q-learning; Security: no; Environment: TensorFlow; Dataset: BUPT-Globalface, BUPT-Balancedface, RFW.
• Wang, et al. [115]. Main idea: modeling an online key decision approach for dynamic video segmentation; Advantages: high accuracy, high scalability; Challenges: poor adaptability; Method: RL; Security: no; Environment: TensorFlow; Dataset: 300VW, Cityscapes dataset.
• Ma, et al. [69]. Main idea: proposing a DL solution for robust action detection with WiFi that uses a 2D CNN and an RL agent; Advantages: high accuracy; Challenges: poor scalability; Method: RL; Security: no; Environment: Python; Dataset: fallDefi.
• Gowda, et al. [32]. Main idea: developing a model that uses RL to discover the best clustering-based model; Advantages: high accuracy, high scalability; Challenges: high complexity; Method: optimizing the clustering using RL; Security: no; Environment: Kinetics; Dataset: UCF101, HMDB51, Olympic Sports.
• Du, et al. [26]. Main idea: using a finite-sample approach to ensure the quality of the learned investigation algorithm; Advantages: high security, high robustness; Challenges: poor adaptability; Method: RL; Security: no; Environment: -; Dataset: linear functions, non-linear functions.

1 and 2 regarded the binary normal × faulty situations, in which the faults were considered altogether and individually, respectively; test 3 addressed the multiclass scenario, in which the system performed simultaneous fault recognition and classification, and which is the best for practical utilization. Besides the high accuracy, the system also reached a short recognition latency, detecting the fault before 88% of its temporal period had finished, so it gave the operator more time to decrease the associated damage.
Moreover, Jiao, et al. [46] focused on a computational TTCA recognizer called iTTCA-RF, utilizing the hybrid characteristics of Global Positioning System Data (GPSD), PAAC, and GAAPC. Using the MRMD feature selection approach and IFS theory, the top 263 relevant characteristics were chosen to construct the best-performing predictor. The class imbalance problem was addressed by utilizing the SMOTE-Tomek resampling process. iTTCA-RF reached the best cross-validation BACC value of 83.71%, which is 4.9% higher than the corresponding value of the previously reported best predictor. The independent-test BACC was 73.14%, an improvement of 2.4%, and the joint Sp and Matthews Correlation Coefficient (MCC) values were enhanced by 4.0% and 4.6%, respectively.
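The SMOTE half of SMOTE-Tomek can be sketched in a few lines: each synthetic minority sample is an interpolation between an existing minority point and one of its nearest minority-class neighbours. The points below are made-up toy data, and the Tomek-link cleaning step is omitted.

```python
import random

def smote_oversample(minority, n_new, k=3, seed=0):
    """Minimal SMOTE-style oversampling: each synthetic sample lies on
    the segment between a minority point and one of its k nearest
    minority neighbours (Tomek-link under-sampling not included)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority-class neighbours of x (excluding x itself)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))[:k]
        n = rng.choice(neighbours)
        u = rng.random()                       # interpolation factor in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, n)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2)]  # toy minority class
new_pts = smote_oversample(minority, n_new=4)
```

Because each synthetic point is a convex combination of two real minority points, it always falls inside the minority class's bounding region, which is what keeps the oversampled class plausible.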
Additionally, Hafeez, et al. [33] developed a model for the Human Action Recognition (HAR) system that identifies each action from characteristics derived from directional angles, the time domain, and depth motion maps. They used multiple RF algorithms as a classifier with the benchmark UTD-MHAD dataset and achieved an accuracy of 90%. As a result, they demonstrated that the identification handled by their method is much improved in terms of precision and efficiency.
Besides, Langroodi, et al. [51] provided a fractional RF (FRF) algorithm to develop an accurate activity detection model. They tested the generalizability of the suggested technique by applying it to three case studies in which several scenarios were constructed. Consequently, they reached these results: (1) the FRF can give performance equivalent to contemporary DL-based activity detection systems with only a fraction of the training dataset used in earlier techniques, with an accuracy of up to 94% for articulated equipment and 99% for rigid-body equipment; (2) compared to other baseline shallow learners, FRF performs better in accuracy, recall, and precision; (3) with an accuracy of 86.2%, the FRF approach can forecast activities of an actual piece of equipment in varied shapes/sizes. In a repeated scenario of testing the technique on scaled RC equipment, FRF achieved an accuracy of 72.9%, which is equivalent to the results reported in existing machine-learning-based techniques.
Moreover, Akinyelu and Adewumi [4] developed a content-based phishing recognition system that bridged a recent gap discovered in their research. The authors employed and documented the use of the RF ML algorithm in categorizing phishing attacks. The primary goal was to upgrade existing phishing email classifiers with greater forecasting accuracy and fewer features. Afterward, they examined the proposed method on a dataset including 2000 ham and phishing emails; a series of prominent phishing email features was extracted and exploited by the ML algorithm, with a consequent classification accuracy of 99.7% and a trivial false positive rate of about 0.06%. Table 10 deliberates the RF approaches used in pattern recognition and their properties.

5.8 MLP learning mechanisms for pattern recognition

The multilayer perceptron is a class of feedforward ANN; the term is used ambiguously, sometimes loosely meaning any feedforward ANN and sometimes strictly pointing to networks composed of


Table 10  The methods, properties, and features of RF mechanisms in pattern recognition

• Awan, et al. [9]. Main idea: using a facial recognition library and training 70% of the data on an ML algorithm; Advantages: high accuracy, developed performance; Challenges: poor recognition of errors; Method: RF; Security: yes; Environment: Python; Dataset: over 4,000 Facebook profiles.
• Moussa, et al. [77]. Main idea: applying the fractional coefficients method for developing the facial recognition scope using the RF method; Advantages: high accuracy, high flexibility, high robustness; Challenges: poor scalability; Method: applying RF in face recognition, comparing and testing the performance of RF; Security: yes; Environment: MATLAB; Dataset: Yale benchmark face databases.
• Marins, et al. [71]. Main idea: proposing a method for sorting and detecting defective events during the useful function of O&G manufacturing; Advantages: high accuracy, low latency, high performance; Challenges: poor adaptability; Method: relies on the RF algorithm and Bayesian non-convex optimization techniques; Security: yes; Environment: Python; Dataset: 3W dataset.
• Jiao, et al. [46]. Main idea: introducing a computational recognition method that makes use of the properties of GPSD, PAAC, and GAAPC; Advantages: high accuracy, high robustness, high reliability; Challenges: poor scalability, poor adaptability; Method: RF; Security: yes; Environment: Python; Dataset: jackknife test, k-fold CV, independent test.
• Hafeez, et al. [33]. Main idea: developing an RF model for action recognition for the HAR system; Advantages: high accuracy; Challenges: poor scalability; Method: RF; Security: no; Environment: Python; Dataset: UTD-MHAD.
• Langroodi, et al. [51]. Main idea: providing a fractional RF method to develop an accurate activity detection model; Advantages: high accuracy, high efficiency; Challenges: poor scalability, poor integration; Method: RF, utilizing the information from one excavator; Security: no; Environment: Python; Dataset: three case studies.
• Akinyelu and Adewumi [4]. Main idea: proposing a content-based phishing identification method that bridges the recent gap in phishing attacks; Advantages: high accuracy, high robustness; Challenges: poor adaptability; Method: making use of the RF method on a variety of phishing attacks; Security: no; Environment: 32-bit desktop; Dataset: 2,000 phishing and ham emails.

several layers of perceptrons. The multilayer perceptron is commonly referred to as a "vanilla" neural network, especially when it comprises a single hidden layer. An MLP includes at least three layers of nodes: an input layer, a hidden layer, and an output layer. Except for the input nodes, each node is a neuron with a nonlinear activation function. For training, MLP employs a supervised learning method known as backpropagation. MLP is distinguished from a linear perceptron by its several layers and non-linear activation; it is capable of distinguishing information that is not linearly separable. By the same token, de Arruda, et al. [6] focused on improving a systematic method of recognition that draws on concepts from feature selection, pattern recognition, and network science to identify the features that are especially particular to prose and poetry. The authors drew on the Gutenberg database for poetry and prose. They summarized the texts in terms of the recognition of phones and rhymes. Each text's contour was characterized in terms of supplied criteria, including the mean and the coefficient of variation of time intervals, which were then used to choose amongst data property selectors. They expressed the connection of patterns as a complex network of instances.
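The claim that an MLP can separate data that are not linearly divisible, while a single perceptron cannot, is easy to demonstrate on XOR. The NumPy sketch below is a generic illustration (not a model from any surveyed paper): a 2-8-1 sigmoid MLP trained with plain backpropagation on the four XOR points, with all sizes and hyper-parameters chosen arbitrarily.

```python
import numpy as np

# XOR: the classic problem no single linear perceptron can solve.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([[0], [1], [1], [0]], float)

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)   # hidden layer weights
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)   # output layer weights
sig = lambda z: 1 / (1 + np.exp(-z))

def forward(X):
    h = sig(X @ W1 + b1)          # hidden activations
    return h, sig(h @ W2 + b2)    # network output

lr = 1.0
_, out0 = forward(X)
loss0 = float(np.mean((out0 - y) ** 2))
for _ in range(5000):
    h, out = forward(X)
    # Backpropagation of the mean-squared-error loss through both layers
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)

_, out = forward(X)
loss1 = float(np.mean((out - y) ** 2))
preds = (out > 0.5).astype(int).ravel()
```

With this seed the loss drops sharply and the network typically reaches the correct XOR outputs [0, 1, 1, 0]; a single-layer perceptron has no weight setting that achieves this.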
Also, Chen, et al. [18] introduced an LPR-MLP hybrid pattern that utilizes ReliefF, PCA, and the Local Binary Pattern (LBP) to process image information and meteorological mechanics information, and then exploited an MLP to forecast the health stage, thereby solving the issue of forecasting the health state of transmission lines under multimode, high-dimensional, heterogeneous, nonlinear information. According to their findings, the LPR-MLP pattern outperformed the other classic patterns in terms of forecasting accuracy and function. Their model generated a fresh notion and effective transmission-line health forecasting methodologies, but the rough character of the features identified from the image data is a disadvantage.
Also, Zhang, et al. [131] presented a new MorphMLP architecture that focused on collecting local information at low-stage layers while gradually shifting to long-term modeling at high-stage layers. They specifically designed a fully-connected-like layer, termed MorphFC, of two morphable filters that enlarge their receptive fields progressively along the width and height dimensions. They also offered to adapt the MorphFC layer freely to the video spectrum and thereby created an MLP-like backbone for learning video outlines for the first time. Finally, they carried out large-scale tests on image classification, semantic segmentation, and video classification.
Similarly, Chen, et al. [20] presented a typical MLP-like architecture, CycleMLP, that was an adaptable backbone for dense forecasting and visual recognition. Compared to recent MLP architectures such as gMLP, ResMLP, and MLP-Mixer, whose architectures are tied to the image size and are hence unusable in object detection and segmentation, CycleMLP offers two advantages: (1) it achieved computing complexity linear in image size by employing local windows; (2) it could handle a variety of image sizes. In contrast, prior MLPs had O(N^2) computations owing to fully spatial connections. The authors constructed a family of patterns that exceed present MLPs and even state-of-the-art Transformer-based patterns. CycleMLP-Tiny outperformed Swin-Tiny by 1.3% mIoU on the ADE20K dataset with lower FLOPs. Additionally, CycleMLP displayed great zero-shot robustness on the ImageNet-C dataset as well.
Moreover, Hou, et al. [41] proposed an MLP-like network architecture for visual detection called Vision Permutator. They showed that encoding the width and height data individually can greatly improve model performance compared to current MLP-like patterns that consider the two spatial dimensions as one. Despite the significant advancement over concurrently famous MLP-like patterns, a significant downside of the given Permutator is the scaling issue in spatial dimensions, which is prevalent in other MLP-like patterns. Because the feature shapes in the fully-connected layer are fixed, processing input photos with arbitrary shapes is impossible, making MLP-like patterns difficult to exploit in downstream tasks with different-sized input images. Table 11 discusses the MLP methods used in pattern recognition and their properties.

5.9 LSTM mechanisms for pattern recognition

LSTM is a type of artificial RNN architecture used in DL. Unlike traditional feedforward neural networks, LSTM contains feedback connections and can process not only individual data points but entire data sequences. A standard LSTM module consists of a cell, an input gate, a forget gate, and an output gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of data into and out of the cell. LSTM networks are designed to categorize, analyze, and forecast data based on time-series information, which is a challenge when training typical RNNs. Relative insensitivity to gap length is a benefit of LSTM over RNNs, hidden Markov models, and other sequence learning strategies in several applications. In this regard, Xia, et al. [120] proposed a DNN that combined convolutional layers with LSTM for human activity detection. The CNN's weight parameters were concentrated mostly in the fully-connected layer. In response to this characteristic, a GAP layer was used to replace the fully-connected layer beneath the convolutional layers, significantly reducing model parameters while maintaining a high recognition rate. In addition, a BN layer was added after the GAP layer to enhance the model's convergence and apparent effect. Ultimately, the F1 score achieved 92.63%, 95.78%, and 95.85% on the OPPORTUNITY, UCI-HAR, and WISDM datasets, respectively. They also investigated the effect of some hyper-parameters on model behavior, such as the number of filters, the kind of optimizer, and the batch size.
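The gate arithmetic described above fits in a few lines. This NumPy sketch of a single LSTM step uses hypothetical toy dimensions and random weights, purely to show how the input, forget, and output gates regulate the cell state; it is not code from any surveyed paper.

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [x, h_prev] to the stacked
    pre-activations of the input, forget, and output gates and the
    candidate cell value; b is the matching bias."""
    z = np.concatenate([x, h_prev]) @ W + b
    H = h_prev.size
    sig = lambda v: 1 / (1 + np.exp(-v))
    i, f, o = sig(z[:H]), sig(z[H:2 * H]), sig(z[2 * H:3 * H])
    g = np.tanh(z[3 * H:])          # candidate cell update
    c = f * c_prev + i * g          # forget gate scales old cell, input gate admits new
    h = o * np.tanh(c)              # output gate exposes the cell state
    return h, c

rng = np.random.default_rng(1)
x_dim, h_dim = 3, 4                 # made-up toy sizes
W = rng.normal(0, 0.5, (x_dim + h_dim, 4 * h_dim))
b = np.zeros(4 * h_dim)
h = np.zeros(h_dim); c = np.zeros(h_dim)
for t in range(5):                  # run the cell over a short random sequence
    h, c = lstm_step(rng.normal(size=x_dim), h, c, W, b)
```

Because the forget gate multiplies the previous cell state instead of repeatedly squashing it through an activation, gradients can survive over long gaps, which is the source of the LSTM's insensitivity to gap length noted above.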
Also, Ullah, et al. [106] suggested an effective framework for real-world anomaly recognition in surveillance ecosystems with high accuracy on present anomaly recognition datasets. Their framework's generic pipeline extracted features from sequences of frames, which were followed by a unique multi-layer BD-LSTM for normal and anomalous class classification. The examined results showed an enhanced accuracy of 3.14% for the UCF-Crime dataset and 8.09% for the UCFCrime2Local dataset. At present, the accuracy of their framework is still insufficient and requires development, especially as the UCF-Crime dataset consists of very challenging classes.
Besides, Rao, et al. [88] presented a generic unsupervised technique called AS-CAL to learn effective action representations from unlabeled skeleton information for action recognition. They presented a method for learning essential action patterns by comparing the similarity of augmented skeleton sequences transformed by various novel augmentation strategies, allowing their technique to learn fixed patterns and discriminative action features from unlabeled skeleton sequences. They also proposed using a queue to build a more stable, memory-efficient dictionary with variable management of preceding encoded keys to simplify contrastive learning. The Contrastive Action Embedding (CAE) was established as the ultimate representation for performing action detection. Their technique beats existing hand-crafted and unsupervised learning mechanisms, and its function is comparable to or even better than some supervised learning mechanisms.
Moreover, Huang, et al. [42] presented an LSTM technique for the recognition of 3D objects in a sequence of LiDAR point cloud observations. Their method retains hidden state variables linked to 3D points from previous object recognitions and relies on a memory that is transformed according to vehicle ego-motion at each time step. A sparse 3D convolution

Table 11  The methods, properties, and features of MLP mechanisms for pattern recognition

• de Arruda, et al. [6]. Main idea: proposing an automated MLP method for distinguishing between poetry and prose; Advantages: high accuracy, low delay; Challenges: poor scalability, limited classifications; Method: MLP; Security: no; Environment: Python; Dataset: a class of similar numbers of a phone.
• Chen, et al. [18]. Main idea: proposing a hybrid pattern that uses ReliefF, PCA, and LBP to process image data and predict the health state; Advantages: high accuracy, high scalability; Challenges: rough features extracted from image data; Method: using the MLP method to forecast the health status; Security: no; Environment: -; Dataset: five sections of the online monitoring system of China.
• Zhang, et al. [131]. Main idea: proposing a self-attention-free MLP-like backbone for learning video outlines; Advantages: high accuracy, high performance; Challenges: high complexity, poor scalability; Method: designing a fully-connected MLP-like layer, interpreted as a MorphFC layer, in the video spectrum; Security: no; Environment: DeiT; Dataset: ImageNet-1K dataset, ADE20K.
• Chen, et al. [20]. Main idea: proposing an MLP-like architecture that is an adaptable backbone for dense prediction; Advantages: high accuracy, practical; Challenges: poor scalability; Method: constructing a family of patterns that exceed MLPs; Security: no; Environment: Python; Dataset: ImageNet-1K, COCO dataset, ADE20K.
• Hou, et al. [41]. Main idea: proposing an MLP-like network architecture for visual detection; Advantages: high accuracy, high scalability; Challenges: poor adaptability; Method: implementing an MLP-like network architecture; Security: no; Environment: PyTorch; Dataset: ImageNet-1K.

network that co-voxelizes the input point cloud with the hidden state and memory at each frame is the foundation of their LSTM. Tests on the Waymo Open Dataset displayed that their algorithm reached the best results, outperforming a single-frame baseline by 7.5%, a multi-frame object baseline by 6.8%, and a multi-frame object recognition baseline by 1.2% in accuracy.
In addition, Liu, et al. [64] proposed a new spatiotemporal saliency-based multi-stream ResNet, and a variant with attention-aware LSTM, for action recognition; these two techniques included three complementary streams: a spatial stream fed by RGB frames, a temporal stream fed by optical flow frames, and a spatiotemporal saliency stream fed by spatiotemporal saliency maps. Compared to convolutional two-stream-based and LSTM-based patterns, the presented spatiotemporal saliency (STS) techniques can capture spatiotemporal object-background information while reducing foreground intrusion, confirming their efficiency for human action recognition, and the STS-ALSTM multi-stream pattern achieved the highest accuracy when compared to inputs with individual modalities. Table 12 shows the LSTM methods used in pattern recognition and their properties.

5.10 Hybrid methods for pattern recognition

Contrary to detection issues that are simple enough to be solved by a single approach, dynamic environments require synthesizing several approaches to tackle the sophistication of pattern recognition. Such situations necessitate the use of hybrid techniques that combine two or more DL techniques. So, Mao, et al. [70] proposed a Synthetic Aperture Radar (SAR) image provision mechanism relying on the Cognitive Network-Generative Adversarial Network (CN-GAN), which mixes LSGAN and Pix2Pix. A regression constraint was added to the generator's loss function to reduce the mean square error between the generated and the actual instances. Following Pix2Pix, the random noise input to LSGAN was exchanged for noise images. Based on the CNN technique, a light network architecture was designed to avoid the issues of high model sophistication and overfitting that come with adding a deep network structure, allowing the detection operation to be developed. The MSTAR data regulation was used in the generative pattern training and target detection tests. These results demonstrated that CN-GAN can suitably resolve SAR image difficulties with small instance counts and powerful speckle noise.
Also, Wang, et al. [118] presented a new mechanism relying on the utilization of a GAN and a CNN for Partial Discharge (PD) pattern recognition categorization in Gas-Insulated Switchgear (GIS) on unbalanced instances. The unbalanced instances were equalized using this mechanism. A WD2CGAN was designed to offer fault instances for an unbalanced instance set caused by faulty signals. Furthermore, a deconstructed hierarchical search space automatically constructs an ideal CNN (ASCNN) for PD in the GIS. Finally, the PD pattern identification in GIS under imbalanced cases is recognized using the trained ASCNN and WD2CGAN. When compared to a traditional GAN, the WD2CGAN instance equalization processing improved by about 1%, showing clear advantages. Simultaneously, in comparison with a traditional CNN, the recognition accuracy of ASCNN is enhanced by a minimum of 0.4%, and its parameter count and storage space are particularly decreased. Consequently, the results validated the superiority of the presented WD2CGAN and ASCNN models.

13
Table 12  The methods, properties, and features of LSTM mechanisms for pattern recognition

• Xia, et al. [120]. Main idea: proposing a DNN combining convolutional layers with LSTM for human activity detection; Advantages: high accuracy, practical; Challenges: poor scalability; Method: LSTM; Security: yes; Environment: TensorFlow, Python; Dataset: UCI-HAR, WISDM, OPPORTUNITY.
• Ullah, et al. [106]. Main idea: proposing an LSTM framework for real-world anomaly recognition in surveillance ecosystems; Advantages: high accuracy; Challenges: poor visual features; Method: LSTM; Security: no; Environment: Python, TensorFlow; Dataset: UCFCrime2Local, UCF-Crime.
• Rao, et al. [88]. Main idea: proposing a generic unsupervised method to effectively learn action representations from unlabeled skeleton data; Advantages: high accuracy; Challenges: poor adaptability, low security; Method: LSTM; Security: no; Environment: Python; Dataset: SBU Kinect Interaction, UWA3D Multiview Activity.
• Huang, et al. [42]. Main idea: proposing a sparse LSTM-based multi-frame 3D object recognition algorithm for the recognition of 3D objects; Advantages: high accuracy, memory-efficient, computation-efficient; Challenges: poor adaptability; Method: LSTM; Security: no; Environment: Python; Dataset: Waymo Open Dataset.
• Liu, et al. [64]. Main idea: proposing a spatiotemporal saliency-based multi-stream ResNet and a spatiotemporal LSTM for function recognition; Advantages: high accuracy; Challenges: poor scalability, high energy consumption; Method: LSTM; Security: no; Environment: Python; Dataset: two different streambeds.

Besides, Nandhini Abirami, et al. [78] presented an effective classification framework for retinal fundus image recognition to overcome these obstacles. They began by preprocessing the input image from the publicly accessible STARE database in three stages: (a) specular reflection elimination and smoothing, (b) contrast enhancement, and (c) retinal region extension. Features were then recovered from the preprocessed image using the Multi-Scale Discriminative Robust Local Binary Pattern (MS-DRLBP), based on RGB element selection, the LBP descriptor, and a gradient operation. Finally, the images were classified using a hybrid CNN and RBF model that divided the retinal fundus images into four categories: choroidal neovascularization (CNV), diabetic retinopathy (DR), normal retina (NR), and age-related macular degeneration (AMD). Examined results of the presented mechanism gave an accuracy of 97.22% in comparison with the other present methodologies.
In addition, Butt, et al. [14] introduced a DL method based on an attention-equipped RNN that gained successful results on Arabic text datasets like Alif and Activ. RNNs' performance in sequence learning has been significant in previous works, particularly in text transcription and speech recognition. The attention layer allowed the model to obtain a concentrated scope of the input sequence, resulting in faster and easier learning. The authors lowered the inline error rate in preprocessing by creating a new dataset of one word per image from Alif and Activ, and interpreted it with an accuracy of 85% to 87%. This model reached better results than those based on a typical CNN, RNN, and hybrid CNN-RNN.
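A hybrid pipeline of this kind can be sketched generically: a convolutional stage extracts local features and a recurrent stage summarizes them over time. The NumPy sketch below uses made-up dimensions and random weights; it illustrates the general CNN-RNN pattern only and is not Butt, et al.'s model.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(seq, kernels):
    """Valid 1-D convolution with ReLU; seq: (T, d_in), kernels: (k, d_in, d_out)."""
    k = kernels.shape[0]
    T = seq.shape[0] - k + 1
    out = np.stack([np.tensordot(seq[t:t + k], kernels, axes=([0, 1], [0, 1]))
                    for t in range(T)])
    return np.maximum(out, 0.0)          # ReLU keeps features non-negative

def rnn_last_state(feats, Wx, Wh, b):
    """Vanilla RNN over the conv features; returns the final hidden state."""
    h = np.zeros(Wh.shape[0])
    for x in feats:
        h = np.tanh(x @ Wx + h @ Wh + b)  # simple recurrent update
    return h

seq = rng.normal(size=(12, 3))            # toy input sequence (12 steps, 3 channels)
kernels = rng.normal(0, 0.3, (3, 3, 6))   # width-3 kernels producing 6 feature maps
Wx = rng.normal(0, 0.3, (6, 5)); Wh = rng.normal(0, 0.3, (5, 5)); b = np.zeros(5)
feats = conv1d(seq, kernels)              # (10, 6) conv feature sequence
h = rnn_last_state(feats, Wx, Wh, b)      # (5,) summary vector for a classifier head
```

In a real system the final hidden state `h` would feed a softmax classifier, and an attention layer (as in Butt, et al.) would weight the per-step features instead of keeping only the last state.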
Furthermore, Subhashini, et al. [104] used a DNN-Radial Basis Function (DNN-RBF) model for speaker recognition. To remove noise from the input signal, the available speech samples are preprocessed using a Wiener filter, and the Mel Frequency Cepstral Coefficient (MFCC) features of this preprocessed signal are retrieved. The Gaussian Mixture Model (GMM) supervector was used to estimate an i-vector with reduced dimensionality. The Texas Instruments/Massachusetts Institute of Technology (TIMIT) dataset is used to evaluate the function of this speaker detection algorithm. The efficiency of the provided algorithm is then evaluated using multiple metrics such as recall, precision, and accuracy. Through the AHHO-based DNN-RBF, accuracy, precision, and recall values of 94.92%, 89.87%, and 94.67%, respectively, are achieved. The performance of DNN-RBF improved in the presence of an adaptive optimization method in speaker recognition. Table 13 discusses the hybrid techniques used in pattern recognition and their properties.

6 Results and comparisons

The previous section investigated several papers that used DL/ML approaches in pattern
recognition issues. DL techniques are being used to train computers for various tasks, such as face recognition, image classification, object identification, and computer vision. In this approach, computer vision seeks to mimic human perception and behavior by providing computers with the necessary data. This section comprises five subsections that evaluate various aspects of DL/ML methods: applications of DL methods, DL methods for pattern recognition, datasets for DL methods, criteria of DL/ML methods, and results and analysis. Pattern recognition practices utilize various methods to extract
meaningful information from data. One commonly employed method is the use of machine
learning algorithms, such as SVMs, Random Forests, and MLP neural networks. SVMs
are effective in binary classification tasks, finding an optimal hyperplane to separate data

Table 13  The methods, properties, and features of Hybrid mechanisms for pattern recognition

| Author | Main idea | Advantage | Challenge | Method/algorithm involved | Security | Simulation environment | Dataset |
|---|---|---|---|---|---|---|---|
| Mao, et al. [70] | Introducing an image provision method based on CNN-GAN to reduce the mean square error | High accuracy | Poor efficiency | CNN + GAN | No | Python | MSTAR |
| Wang, et al. [118] | Proposing a mechanism based on using a GAN and CNN in GIS for PD paradigm recognition | High accuracy; high fault tolerance | A limited number of instances | GAN + CNN | No | - | AlexNet, BPNN, ShuffleNet, LeNet |
| Nandhini Abirami, et al. [78] | Presenting an effective assortment framework for retinal fundus image recognition | High accuracy; high performance | Poor robustness | Classifying the images using a hybrid CNN-RBF | No | MATLAB | 400 raw images |
| Butt, et al. [14] | Introducing a hybrid CNN-RNN for recognition of Arabic text and speech transcription | High accuracy | High complexity | CNN + RNN | No | - | AcTiV dataset |
| Subhashini, et al. [104] | Presenting a DNN-RBF-based AHHO method for speech recognition | High accuracy; high performance | High complexity | DNN + RBF | No | MATLAB | TIMIT reading speech |

points. Random Forests combine multiple decision trees to improve accuracy and handle
complex datasets. MLP neural networks consist of interconnected layers of artificial neu-
rons and are effective in learning complex patterns. Another popular method is deep learn-
ing, which involves the use of deep neural networks, such as CNNs and LSTM networks.
CNNs excel in image and video analysis, capturing spatial hierarchies, while LSTMs are
suitable for sequential data analysis, preserving temporal dependencies. Ensemble learning
methods, including AdaBoost and Bagging, combine multiple models to enhance predic-
tion accuracy. Reinforcement learning techniques, such as Q-learning and Policy Gradi-
ent, enable machines to learn optimal decisions through interactions with an environment.
These methods provide a diverse toolbox for practitioners in pattern recognition, allowing
them to tackle various tasks and achieve accurate results.

6.1 DL applications for big data pattern recognition

In this section, we discuss a variety of applications of DL techniques in pattern recognition: (a) Virtual assistants such as Google Assistant, Amazon Echo, Siri, and Alexa all use DL to provide a personalized user experience. They are trained to recognize the user's voice and accent and to deliver a near-human experience by utilizing deep neural networks that replicate speech and human tone. (b) DL is used in the iPhone's facial recognition to detect data points on the user's face to unlock the phone. DL uses a large number of data points to create a precise map of the user's face, which the built-in algorithm then uses for detection. (c) NLP: some well-known applications gaining traction include document summarization, language modeling, sentiment analysis, question answering, and text classification. (d) Healthcare: early illness and condition recognition, quantitative imaging, and the availability of decision support tools for experts are all having a significant impact on life science, medicine, and healthcare. (e) Autonomous driving: data from geo-mapping, GPS, and sensors are merged in DL to develop models that recognize directions, street signs, and dynamic components like congestion, traffic, and pedestrians. (f) Text generation: DL models must master spelling, style, punctuation, grammar, and tone to replicate human writing. (g) CNNs enable digital image processing, which can later be separated into handwriting recognition, object recognition, facial recognition, etc. Figure 8 shows
the frequency of parameters used in evaluations of papers, and based on the evaluation,
accuracy (29.6%), delay (15.8%), and availability (11.3%), respectively, are the most
frequent parameters studied in the investigated papers. Also, Fig. 9 demonstrates the frequency of different DL methods for pattern recognition. As shown in this figure, visual recognition (26.7%), image recognition (20.0%), and speech recognition (5.0%) are the most frequent pattern recognition applications that use DL methods.

6.2 DL methods for big data pattern recognition

DL mechanisms are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level.


Fig. 8  Frequency of parameters used in evaluations of papers

6.2.1 CNN methods

CNNs take a distinctive approach to organizing data. They benefit from the hierarchical structure in data and assemble patterns of increasing sophistication from the simpler and smaller patterns highlighted in their filters. Some applications of CNNs in pattern recognition are image classification, facial recognition software, speech recognition programs, evaluating documents, environmental and historical collections, predicting climate, grey zones,
advertising, etc.

Fig. 9  Distribution of pattern recognition approaches using the DL method in studied papers

The benefits of using CNNs over other standard neural networks in computer vision environments are listed below: (a) the primary reason for using CNN is the weight-sharing feature, which decreases the number of learnable network components and
aids the network in increasing generalization and preventing overfitting. (b) Learning the
assortment and feature extraction layers concurrently leads to the model’s highly reliable
and well-organized output on the extracted features. (c) Implementing a large-scale network with CNN is significantly easier than with other neural networks. There are also some disadvantages: (a) CNNs are comparatively slow owing to operations such as max pooling. (b) if a CNN has many layers, the training process takes a great deal of time when the computer lacks a suitable GPU. (c) a CNN needs a big dataset to train the network effectively. Overall, all papers investigated the CNN method and highlighted
the diverse applications of deep learning in various domains, including healthcare, finance,
emotion recognition, education, IoT, and image recognition. They demonstrated the poten-
tial of deep learning approaches and the utilization of big data for improved analysis and
decision-making in different fields. The papers mentioned likely used the CNN method
due to its exceptional capability in processing and analyzing visual data. CNNs have been
widely recognized for their effectiveness in tasks such as image classification, object detec-
tion, and segmentation, making them a natural choice for visual pattern recognition appli-
cations. The CNN architecture is specifically designed to capture spatial and hierarchical
features from images, allowing it to learn and detect intricate visual patterns automatically.
Moreover, CNNs are well-suited for big data analysis as they can efficiently handle large
volumes of image data by leveraging parameter sharing and local receptive fields. With
a proven track record of success in deep learning tasks and their adaptability to specific
applications, CNNs offer a powerful framework for extracting meaningful information
from visual data, making them an ideal choice for these papers.
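The weight-sharing property in point (a) can be made concrete with a minimal sketch: a single 3×3 kernel holds just nine learnable weights regardless of image size, and those same weights are reused at every spatial position. This is an illustrative NumPy example, not code from any surveyed paper.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most DL frameworks).
    The same kernel weights are applied at every position: weight sharing."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 5x5 image with a vertical edge, and a 3x3 vertical-edge kernel.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
edge_kernel = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]], dtype=float)
response = conv2d(image, edge_kernel)  # strongest response at the edge
```

The kernel has only nine parameters whether the input is 5×5 or 500×500, which is the generalization and overfitting benefit discussed above.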

6.2.2 RNN methods

The logic of employing RNN is based on input sequencing. RNNs can use their internal
state to process variable lengths of inputs that make them applicable to pattern recogni-
tion tasks such as handwriting recognition, speech recognition, and so on. To forecast the
next word in a sequence, we must recall which word appeared at the previous time step. Because this step is performed for each input, these neural networks are referred to as recurrent. Here we list some of its advantages: (a) the RNN is a dynamic neural
network that is computationally strong and can be utilized in multiple transient processing
applications and models. (b) using RNNs, we can approximate arbitrary nonlinear dynamical systems with arbitrary accuracy by learning complicated mappings from input to output sequences. Its disadvantages include (a) exploding and vanishing gradients, and (b) the complexity of training an RNN. All papers that studied RNN methods
covered a diverse range of topics in big data analysis. They included an overview of the
applications of rough sets, an analysis of research and technology trends in smart live-
stock technology, a survey on data-efficient algorithms, research on vessel behavior pat-
tern recognition, a systematic review of automatic segmentation of brain MRI images, and
the authentication of commercial kinds of honey using pattern recognition analysis. Each
paper contributed valuable insights and advancements in their respective areas of study
within the field of big data analysis. Briefly, all papers investigated the RNN method often
use the Recurrent Neural Network (RNN) method due to its ability to handle sequential
and temporal data effectively. RNNs are particularly suitable for tasks where the order and
context of data are crucial for accurate predictions or classifications. In pattern recognition,


RNNs can capture the data’s sequential dependencies and temporal relationships, making
them well-suited for tasks such as speech recognition, natural language processing, and
time series analysis. The recurrent nature of RNNs allows them to retain information from
previous inputs and utilize it in the current prediction, enabling them to model complex
patterns and dependencies in the data. Therefore, researchers in pattern recognition often
leverage RNNs to achieve better performance and accuracy in analyzing and recognizing
patterns in sequential data.
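The order sensitivity described above can be sketched with a bare Elman-style recurrence in NumPy (the weights below are random placeholders, not a trained model): the hidden state depends on the current input and the previous state, so reversing the input order changes the final state.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Elman-style RNN: the hidden state carries context from all
    earlier steps, which is what makes the network order-sensitive."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)  # recurrence
        states.append(h)
    return np.array(states)

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 3)) * 0.5   # input-to-hidden weights
W_hh = rng.normal(size=(4, 4)) * 0.5   # hidden-to-hidden weights
b_h = np.zeros(4)
seq = [rng.normal(size=3) for _ in range(5)]

states = rnn_forward(seq, W_xh, W_hh, b_h)
# Feeding the same sequence backwards yields a different final state:
states_rev = rnn_forward(seq[::-1], W_xh, W_hh, b_h)
```

A bag-of-features model would score both orderings identically; the recurrence is what distinguishes them.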

6.2.3 GAN methods

The application of GANs has seen rapid growth in recent years. The main idea of a GAN lies in indirect training through the discriminator, which judges how realistic its input is. The technique has been used for high-fidelity natural image synthesis, data augmentation, producing image summaries, and other applications. Several advantages of the GAN method are: (a) GANs produce data resembling the original data; similarly, they can produce various text, video, and audio versions. (b) GANs learn the fine details of data, so they can easily transform it into various versions, which is suitable for ML work. To mention but a few disadvantages of GANs: (a) difficult training: we have to produce various kinds of data continuously to monitor whether it works precisely or not. (b) it is not easy to produce results from speech or text. The
These papers contributed to understanding different aspects of big data analysis in areas
such as heterogeneity, healthcare, agriculture, classification, and medical applications.
They used the GAN method because it generates new and realistic data samples. GANs are
particularly useful for image generation, data augmentation, and anomaly detection tasks.
GANs consist of two components: a generator and a discriminator. The generator generates
synthetic data samples, while the discriminator tries to distinguish between real and gener-
ated samples. Through an adversarial training process, GANs learn to generate data that
closely resembles real data distribution. This makes GANs a valuable tool in pattern recog-
nition for tasks such as image synthesis, data generation, and data representation learning.
By using GANs, researchers can explore new possibilities in data analysis, improve the
quality of generated samples, and enhance the overall performance of pattern recognition
systems.

6.2.4 AE methods

AEs are neural network models used to learn complex nonlinear relationships between data points. AEs are efficient in learning representations for classification patterns and are employed in several problems, from anomaly recognition, facial recognition, and feature recognition to capturing the meaning of words. The primary applications of AEs are dimensionality reduction and data retrieval; however, novel variations have been used in various tasks. Some advantages of the AE method: (a) the AE eliminates noise from the input signal, leaving a high-value representation of the input; ML algorithms can then operate better because they can learn patterns from a smaller set of valuable inputs. There are also disadvantages: (a) although an AE may cope well with messy data, it may still perform better after data cleaning. (b) another drawback is that significant information in the input data may be removed, and the AE algorithm needs a purposeful measure for assessing the precision of the decoded/encoded input data. Papers that studied AE methods in pattern recognition have utilized the autoencoder method for several reasons.
First, autoencoders are effective in unsupervised learning tasks, where labeled training data
may be scarce or unavailable. They can learn useful representations and extract meaningful
features from the input data without the need for explicit class labels. Second, autoencod-
ers can reduce dimensionality, which is beneficial for handling high-dimensional data and
reducing computational complexity. By compressing the input data into a lower-dimen-
sional latent space, autoencoders can capture the most salient information and discard irrel-
evant or noisy features. Third, autoencoders are used for data reconstruction and denoising.
By training the model to reconstruct the original input from a corrupted or incomplete ver-
sion, autoencoders can effectively denoise and recover missing or distorted patterns in the
data. This makes them particularly useful in applications where data quality and integrity
are crucial. Finally, autoencoders can also be employed for anomaly detection. By learning
the normal patterns in the data, they can identify deviations or anomalies that do not con-
form to the learned representations. This ability to detect anomalies is valuable in various
domains, including fraud detection, cybersecurity, and fault diagnosis. Overall, the autoen-
coder method offers a versatile framework for pattern recognition tasks, providing capa-
bilities such as unsupervised learning, dimensionality reduction, data reconstruction, and
anomaly detection.
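As a hedged illustration of the dimensionality-reduction use above, the sketch below trains a tiny linear autoencoder (a 4→2→4 bottleneck, manual gradient descent on squared reconstruction error) on synthetic data lying near a 2-D subspace; the data, shapes, and learning rate are arbitrary choices for the example, not from any surveyed paper.

```python
import numpy as np

rng = np.random.default_rng(1)
# 200 points in R^4 that lie near a 2-D subspace (synthetic toy data).
basis = rng.normal(size=(2, 4))
codes = rng.normal(size=(200, 2))
X = codes @ basis + 0.05 * rng.normal(size=(200, 4))

W_enc = rng.normal(size=(4, 2)) * 0.1   # encoder: 4 -> 2 bottleneck
W_dec = rng.normal(size=(2, 4)) * 0.1   # decoder: 2 -> 4

def loss(X, W_enc, W_dec):
    """Mean squared reconstruction error."""
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))

initial = loss(X, W_enc, W_dec)
lr = 0.01
for _ in range(500):
    Z = X @ W_enc                        # encode
    E = Z @ W_dec - X                    # reconstruction error
    W_dec -= lr * 2 * Z.T @ E / len(X)   # gradient step on decoder
    W_enc -= lr * 2 * X.T @ E @ W_dec.T / len(X)  # then on encoder
final = loss(X, W_enc, W_dec)            # should be well below `initial`
```

With linear layers this converges toward the PCA subspace of the data; nonlinear activations would give the more general autoencoder discussed above.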

6.2.5 EL methods

The number of applications of massive EL within a reasonable time frame has recently increased, as growing computational capacity makes training large ensembles feasible. This facilitates the use of EL for recognizing common patterns. Change detection, malware detection, intrusion detection, face recognition, and emotion recognition are some applications of this technology. Two primary reasons for utilizing an ensemble over a single model are (a) robustness: an ensemble decreases the dispersion or spread of the forecasts and of model performance. (b) performance: an ensemble can make better forecasts and achieve better performance than any individual member model.
There are some drawbacks too. For instance, (a) an ensemble can be difficult to interpret, which makes it harder to convince decision-makers and end users of its predictions. (b) creating, training, and deploying ensembles are costly. Ensemble learning is a powerful technique that combines multiple individual models to improve
overall predictive performance and robustness. The specific reasons why the mentioned
papers used ensemble learning may vary, but some common motivations include its ability
to reduce bias and variance, enhance generalization, handle complex and high-dimensional
data, and mitigate overfitting. By leveraging the diversity of multiple models or algorithms,
ensemble learning can capture different aspects of the data and make more accurate predic-
tions. Additionally, ensemble methods are known for their flexibility and applicability to
various domains and problem types.
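The "performance" point can be sketched with the simplest ensemble rule, hard majority voting: three weak classifiers that each err on a different sample jointly correct every individual mistake. The models and labels below are fabricated purely for illustration.

```python
import numpy as np

def majority_vote(predictions):
    """Combine hard labels from several models by majority vote.
    `predictions` has shape (n_models, n_samples); ties go to the
    lowest label index."""
    predictions = np.asarray(predictions)
    n_classes = int(predictions.max()) + 1
    counts = np.stack([(predictions == c).sum(axis=0)
                       for c in range(n_classes)])   # (n_classes, n_samples)
    return counts.argmax(axis=0)

truth   = np.array([0, 1, 1, 0, 1])
model_a = np.array([0, 1, 1, 0, 0])   # wrong on sample 4
model_b = np.array([1, 1, 1, 0, 1])   # wrong on sample 0
model_c = np.array([0, 0, 1, 0, 1])   # wrong on sample 1
ensemble = majority_vote([model_a, model_b, model_c])  # matches `truth`
```

Each member is 80% accurate, yet the vote is 100% accurate here because the errors are uncorrelated; diversity among members is exactly what the robustness argument above relies on.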

6.2.6 RL methods

RL has made quick progress on action recognition in the video, which depends on a large-
scale training set. The RL analysis aims to construct a mathematical framework to answer


the problems. Various applications include resource management in computer clusters, traffic light monitoring, robotics, web system configuration, chemistry, and games. The main advantages of RL are (a) maximizing performance and (b) sustaining change over a long period. The disadvantages are: (a) too much reinforcement learning can lead to an overload of states. (b) RL is not beneficial for solving simple issues, and it requires a great deal of data and computation. All considered papers in pattern recognition have utilized reinforcement learning as a powerful approach to improve the performance of pattern recognition systems. Reinforcement learning enables the development of intelligent agents that
can learn optimal decision-making policies by interacting with an environment. In the con-
text of pattern recognition, reinforcement learning algorithms can learn to make sequen-
tial decisions based on observed patterns or features, optimizing their actions to maximize
performance metrics such as accuracy or recognition rates. By employing reinforcement
learning, these papers aim to enhance pattern recognition systems’ adaptability, flexibility,
and robustness, allowing them to learn and improve from experience, adapt to changing
environments, and make optimal decisions in complex and dynamic scenarios. The utiliza-
tion of reinforcement learning techniques in pattern recognition research contributes to the
advancement of intelligent systems capable of autonomously learning and improving their
pattern recognition capabilities.
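To make the sequential decision-making idea concrete, here is a self-contained sketch of tabular Q-learning on a toy five-state corridor; the environment, reward, and hyperparameters are invented for the example, and surveyed papers use far richer settings.

```python
import numpy as np

# Toy corridor: start in state 0, reward +1 only on reaching state 4.
# Actions: 0 = left, 1 = right.
n_states, n_actions, goal = 5, 2, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 0.9, 0.2
rng = np.random.default_rng(0)

def step(s, a):
    s2 = min(s + 1, goal) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == goal)

for _ in range(200):                       # training episodes
    s = 0
    while s != goal:
        # epsilon-greedy: mostly exploit, sometimes explore.
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))
        else:
            a = int(Q[s].argmax())
        s2, r = step(s, a)
        # Q-learning update: bootstrap from the greedy value of s2.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

greedy_policy = Q.argmax(axis=1)           # learned: go right in states 0-3
```

The agent learns purely from interaction, which is the adaptability argument made above; the learned greedy policy heads right toward the reward from every non-terminal state.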

6.2.7 RF methods

In ML, the RF algorithm is commonly used as a classifier, and the RF-based ensemble classifier was presented to tackle complex issues. The suitability of RF for both regression and classification, its handling of missing values, and its capacity to operate on big, high-dimensional datasets are some of the advantages and major aspects of RF application. Some advantages of RF are: (a) providing high accuracy and (b) managing big data with thousands of variables. Some disadvantages of RF are: (a) an RF is less interpretable than an individual decision tree, and (b) a trained RF may require considerable memory for storage. All studied papers that applied the
RF method in pattern recognition topics have employed the random forest method due to
its ability to handle complex and high-dimensional data, robustness against noise and miss-
ing values, and efficient processing of large datasets. Random forest is an ensemble learn-
ing technique that combines multiple decision trees to make predictions. It leverages the
diversity of the ensemble to capture intricate patterns and relationships in the data, provid-
ing accurate and reliable classification or prediction results. Additionally, the random for-
est offers feature importance measures, allowing researchers to identify the most relevant
features for pattern recognition. Its popularity in recent research reflects its effectiveness in
addressing pattern recognition challenges and achieving high-performance outcomes.

6.2.8 MLP methods

Classical MLP neural networks are utilized for prime tasks like encryption, data visualization, and data compression. The MLP is a versatile means of coping with a wide range of complex tasks in pattern recognition and regression owing to its highly adjustable nonlinear structure. Some advantages of MLP are (a) aiding probability-based forecasting or categorizing of items with numerous labels and (b) the ability to learn non-linear models. There are also disadvantages: (a) MLPs with hidden layers have a non-convex loss function with more than one local minimum. (b) MLPs need tuning of several hyperparameters, such as the number of hidden layers, neurons, and iterations. All in all, all papers that used the MLP
method in studying pattern recognition have utilized the MLP method due to its capability
to handle complex and nonlinear relationships within data. MLP is an artificial neural net-
work consisting of multiple layers of interconnected nodes, allowing it to learn intricate pat-
terns and make accurate predictions. It is widely used in pattern recognition tasks because it
captures high-level abstractions and extracts features from input data. MLPs are flexible and
can be trained on various data types, making them suitable for diverse pattern recognition
applications. The popularity of MLPs in recent research signifies their effectiveness in mod-
eling complex patterns and achieving superior recognition performance in various domains.
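The "non-linear models" advantage can be shown with a two-layer MLP whose weights are hand-picked (purely for illustration) to compute XOR, a mapping no single linear layer can represent; removing the ReLU collapses the network into one linear map and the separation disappears.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer with a ReLU nonlinearity; without the
    nonlinearity the whole network reduces to a single linear map."""
    h = np.maximum(0, x @ W1 + b1)      # hidden layer
    return h @ W2 + b2                  # output layer (logit)

# Hand-picked weights that make the network compute XOR.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([[1.0], [-2.0]])
b2 = np.array([0.0])

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
logits = mlp_forward(X, W1, b1, W2, b2).ravel()
preds = (logits > 0.5).astype(int)      # XOR of the two inputs
```

Here the first hidden unit fires on "at least one input active" and the second on "both active"; the output subtracts the second from the first, which is precisely XOR.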

6.2.9 LSTM methods

LSTM is an artificial RNN architecture used in DL that addresses the issue of human
activity recognition and classifying sequences of patterns. LSTM applications include
robot control, time series prediction, handwriting recognition, protein homology detection, and human action recognition. From a beneficial standpoint, (a) LSTM manages various parameters, such as learning rates and input and output biases, so little fine regulation is needed. (b) the complexity of updating each weight is reduced to O(1), as in Backpropagation Through Time (BPTT). A few drawbacks of LSTM: (a) LSTM takes longer to train and needs more memory. (b) dropout is harder to implement in LSTM. All papers that used the LSTM method in studying pattern recognition employed it due to its ability to
model sequential and temporal dependencies in data effectively. LSTM is a type of RNN
that addresses the vanishing gradient problem by introducing memory cells and gates that
regulate the flow of information. This enables LSTMs to capture long-term dependencies
and retain important context information over extended sequences. In pattern recognition
tasks, such as speech recognition, natural language processing, and gesture recognition,
where sequential patterns play a crucial role, LSTM has shown remarkable performance.
Its ability to learn and remember information over extended time steps makes it suitable for
capturing intricate patterns and making accurate predictions. The utilization of LSTM in
recent research underscores its effectiveness in handling sequential data and its impact on
advancing pattern recognition techniques.
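The gating mechanism described above can be sketched as a single LSTM step in NumPy (random, untrained weights; standard gate equations): the forget and input gates decide how much of the old cell state to keep and how much new candidate information to write.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W, U, b stack the four gates (input, forget,
    candidate, output) along the first axis."""
    d = h.size
    z = W @ x + U @ h + b
    i = sigmoid(z[0:d])                 # input gate
    f = sigmoid(z[d:2 * d])             # forget gate
    g = np.tanh(z[2 * d:3 * d])         # candidate cell state
    o = sigmoid(z[3 * d:4 * d])         # output gate
    c = f * c + i * g                   # gated memory update
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
d_in, d_hid = 3, 4
W = rng.normal(size=(4 * d_hid, d_in)) * 0.5
U = rng.normal(size=(4 * d_hid, d_hid)) * 0.5
b = np.zeros(4 * d_hid)

h, c = np.zeros(d_hid), np.zeros(d_hid)
for t in range(6):                      # run a short sequence through
    h, c = lstm_step(rng.normal(size=d_in), h, c, W, U, b)
```

Because the cell state `c` is updated additively through the forget gate rather than squashed at every step, gradients can survive over many time steps, which is how the LSTM mitigates the vanishing-gradient problem mentioned above.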

6.3 Dataset for big data pattern recognition

Datasets are used for ML research and have been discussed in peer-reviewed academic
journals since they are an essential component of the area of ML. Progress in learning algorithms, the availability of high-quality learning datasets, and, to a lesser extent, advances in computer hardware were the primary drivers of development in this sector. With this in mind, a
large number of examples, such as 10,000, is characterized as more than sufficient to learn the topic. This serves as an upper bound on the number of training instances while leaving varied samples for the test set. The proposed models were tested by
fitting them to various-sized learning datasets and evaluating their ability to operate on the
test set. Too few samples will result in poor experiment accuracy because the chosen model
overfits the learning set or the learning set is insufficiently representative of the issue. On
the other side, too many samples will result in outstanding but less-than-perfect accu-
racy; this could be because the chosen model can train the nuance of such a large learning


dataset, or the dataset is over-representative of the issue. A line plot of learning dataset size versus experiment accuracy should show a trend of diminishing returns and possibly even a small drop in performance. In the case of a fixed model and learning dataset, we must
determine how much data is required to achieve a precise approximate model operation.
This subject can be investigated by fitting an MLP with a fixed-sized training set and evalu-
ating the model with variable-sized experiment sets. We can employ a mechanism similar
to that used in the previous section’s study.
A dataset is a data collection based on the capacity of a single database table or a single
statistical data matrix. Each table column represents a significant variable, and rows relate
to a specific dataset member. For ML projects, the real dataset used to create the training
model for operating distinct performances is referred to as the training dataset.

6.3.1 Importance of dataset

The reliance on a dataset for ML is not only unavoidable since AI cannot learn without it,
but it is also the most important aspect that facilitates algorithm training. The significance
of the dataset stems from the observation that the size of the AI team is not as significant as
the appropriate size of the dataset. Data is required at every stage of AI growth, from training, tuning, and model selection to testing. We look at three different datasets: the training set, the validation set, and the testing set. Keep in mind that simply gathering data is not enough; datasets must also be categorized and labeled, which takes a significant amount of effort. Two main datasets used for various purposes during AI projects are the training and test sets. (a) The training dataset is used to train an algorithm, such as a neural network, and to generate results; it includes both input and expected output data. Training sets make up nearly 60% of the total data. (b) The test dataset is used to evaluate the trained algorithm. The training dataset cannot be reused during testing because the algorithm already knows the expected output. Testing sets make up about 20% of the total data. It must be verified that input data is grouped with properly validated outputs, often through human authentication.
Data processing entails selecting the correct data from the ideal dataset and generating a training set. Feature transformation refers to the process of assembling data in the best possible format. Long-term, goal-oriented ML initiatives rely on dynamic, continuously updated datasets. In other words, continuously developing the considered dataset keeps the model as accurate as it can be.
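The 60/20/20 partition described above can be implemented as a simple shuffled split; this is a generic sketch, and real projects often use stratified or time-aware splits instead.

```python
import numpy as np

def split_dataset(X, y, train=0.6, val=0.2, seed=0):
    """Shuffle, then cut into train/validation/test parts; whatever
    remains after the train and validation cuts (here 20%) is the
    test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(train * len(X))
    n_val = int(val * len(X))
    tr = idx[:n_train]
    va = idx[n_train:n_train + n_val]
    te = idx[n_train + n_val:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

X = np.arange(100).reshape(100, 1)
y = np.arange(100) % 2
(train_X, _), (val_X, _), (test_X, _) = split_dataset(X, y)
```

Shuffling before cutting is what prevents the ordering of the raw data from leaking into the split, and fixing the seed keeps the partition reproducible across runs.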

6.3.2 The best public datasets for big data pattern recognition

Here we list some of the common datasets used for DL projects in different categories. To begin with, we mention some dataset finders, including Google Dataset Search, Kaggle, the UCI ML Repository, VisualData, GMU Libraries, the Big Bad NLP Database, and so on. Some general databases include housing datasets, such as the Boston Housing dataset, and geographic datasets, such as Google-Landmarks-v2. The Mall Customers, IRIS, MNIST, and Boston Housing datasets are just a few examples of machine learning datasets.

6.4 Criteria of DL/ML methods

The quality of a model's functioning is assessed by mathematical metrics that provide useful feedback and analysis of an ML/DL pattern's performance. To name but a few critical parameters,


we have to name accuracy, MCC, the confusion matrix, recall, precision, and the F1 score. As previously stated, accuracy is the most significant indicator, demonstrating the fraction of observations that are accurately recognized. When aggregating the values of a confusion matrix, the True Negative and True Positive counts are exploited. The total number of patterns successfully detected is denoted by n, and the entire number of patterns by t in this equation [37]:

A = (n / t) × 100    (1)

Precision, denoted by P, is the number of exact predictions, i.e., the rate of True Positives forecasted relative to the total positively forecasted. Moreover, STP represents the sum of total true positives, while AFP represents the total false positives.

P = STP / (STP + AFP) × 100    (2)

ReCall, the recall criterion, measures the proportion of real positive observations that can be precisely forecasted. Besides, AFN specifies the total false negatives in the equation.

ReCall = STP / (STP + AFN) × 100    (3)

In addition, the F1 score is an overall performance criterion determined from Recall and Precision, representing the harmonic mean of Precision and Recall.

F1score = (2 × ReCall × P) / (ReCall + P)    (4)

Additionally, the confusion matrix is a performance measurement that weighs forecasted against real observations, using the True Positive, True Negative, False Positive, and False Negative labels. All correct predictions are the total of the True Positives and True Negatives, while all wrong predictions are the aggregated False Negatives and False Positives [38].

| STP  AFP |
| AFN  ATN |    (5)
Furthermore, in binary classification, a true positive refers to the correct prediction of
a positive class instance, while a true negative represents the correct prediction of a nega-
tive class instance. On the other hand, false negatives are incorrect predictions of negative
class instances, while false positives are incorrect predictions of positive class instances.
The MCC is an individual value function that encapsulates the entire confusion matrix. It
provides a more informative and accurate evaluation metric than the F1 score and accuracy
in assessing classification challenges. A high MCC score indicates advantageous prediction
outcomes across all four quadrants of the confusion matrix.
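Eqs. (1)-(4) and the MCC can all be computed directly from the four confusion-matrix cells. The sketch below returns fractions rather than the percentage form used in the equations, and the counts are made up for the example.

```python
import math

def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1, and MCC from the four
    confusion-matrix cells (as fractions, not percentages)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # MCC uses all four cells, so it stays informative on
    # imbalanced classes where accuracy and F1 can mislead.
    mcc = ((tp * tn - fp * fn) /
           math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return accuracy, precision, recall, f1, mcc

acc, p, r, f1, mcc = metrics(tp=90, fp=10, fn=5, tn=95)
```

Multiplying the returned fractions by 100 recovers the percentage form of Eqs. (1)-(3).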

6.5 Result and analysis

After everything was said and done, we evaluated 60 publications in 10 categories on using DL/ML techniques in pattern recognition in the previous sections. The more prominent flaws in these articles are neglecting the impact of security and lacking adaptive capacity in these strategies. With all of this in mind, we thoroughly examine the mechanisms under discussion in diverse contexts. Figure 10 displays the DL methods and their frequency in the selected papers.

Fig. 10  DL methods used and their frequency in selected papers

In the case of implementation, simulation, and theoretical analysis of the presented mechanisms, Python is the most popular programming language for this kind of job, which makes it an appealing option for investigators to utilize in future work, as shown in Fig. 11. Each method was applied depending on its suitability for every specific use. Moreover, Fig. 12 shows a geographical distribution map of the countries that contributed to the investigated papers, in which China (23 papers), the USA (7 papers), and Pakistan (5 papers) contributed most frequently. Also, Table 14 depicts the considered parameters in the studied articles.

Fig. 11  The distribution of the utilization of various simulation environments in DL-pattern recognition

13
Multimedia Tools and Applications

Fig. 12  The geo-chart about the studied countries by the studied articles

7 Open issues

Despite all of the breakthrough developments in DL-based pattern recognition algorithms,
some bottlenecks and drawbacks need to be addressed in additional research. Many
investigators have reached promising results by employing a broad range of algorithms,
but there is some overlap across studies, and the joint use of several efficient tools
has been slow to emerge. The lack of consensus regarding the most valuable features and
the optimal neural network architecture might hinder better practical results.
Recognition of continuous patterns remains a remarkable issue; even the best automated
systems struggle with fine pattern distinction. This may partially result from the fact
that many available datasets consist of only limited vocabularies and typical sentences,
while training models for more advanced patterns requires far more extensive libraries
containing varied samples. Realizing pattern connection remains a tough problem for
automated systems. On closer inspection, the reasons for the continued inability of
machines to interpret patterns accurately and continuously are not as puzzling as they
appear. Natural-language characteristics arise from a complicated interaction of multiple
rules and connections, which are difficult to summarize in a mathematical layout that can
be programmed into computers. These drawbacks will therefore likely persist, as this
field remains a significant topic for many research teams worldwide. Several other major
challenges that should be considered in future work in this area include the following:

• Dataset

To begin with, all DL techniques normally necessitate large datasets. Using DL techniques
on moderate-sized datasets is not worthwhile, although increased computer power

Table 14  Considered parameters in studied articles
Type Author Accuracy Latency/Response time Security/Privacy Cost-efficiency Integrity Vulnerability Scalability Robustness Availability Fault Tolerance Flexibility

CNN Awan, et al. [8] ✓  ×   ×   ×  ✓ ✓  ×  ✓ ✓ ✓  × 


Sohangir, et al. [101] ✓ ✓  ×  ✓ ✓ ✓ ✓ ✓  ×  ✓ ✓
Hossain and Muhammad [39] ✓ ✓  ×  ✓ ✓  ×   ×   ×   ×  ✓  × 
Ni [79] ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓  ×   ×   × 


Xu, et al. [122] ✓  ×   ×  ✓ ✓ ✓ ✓  ×  ✓ ✓ ✓
Li, et al. [55] ✓  ×   ×  ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Sevik, et al. [95] ✓ ✓  ×  ✓ ✓ ✓  ×  ✓ ✓  ×  ✓
RNN Jun, et al. [47] ✓ ✓  ×  ✓ ✓ ✓ ✓  ×  ✓ ✓  × 
Chancán and Milford [15] • ✓  ×  ✓  ×  ✓ ✓  ×   ×  ✓  × 
Gao, et al. [30] ✓ ✓  ×  ✓ ✓ ✓ ✓  ×  ✓ ✓  × 
Hasan and Mustafa [36] ✓ ✓  ×  ✓ ✓ ✓ ✓  ×  ✓ ✓  × 
Safarzadeh and Jafarzadeh [91] ✓ •  ×  ✓ ✓ ✓ ✓ ✓ ✓ ✓  × 
Zhao and Jin [138] ✓ •  ×  ✓ ✓ ✓ ✓ ✓ ✓ ✓  × 
GAN Luo, et al. [66] ✓ ✓  ×  ✓ ✓ ✓ ✓  ×  ✓ ✓  × 
Gammulle, et al. [28] • ✓  ×  ✓ ✓ ✓  ×  ✓ ✓ ✓ ✓
Fang, et al. [27] • ✓  ×  ✓ ✓ ✓ ✓  ×  ✓ ✓  × 
Chen, et al. [21] ✓ ✓  ×  ✓ ✓ ✓  ×  ✓ ✓ ✓  × 
Men, et al. [72] ✓ ✓  ×  ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓


AE Simpson, et al. [100] ✓ ✓  ×  ✓ ✓ ✓  ×   ×  ✓ ✓ ✓


Kim, et al. [49] ✓ ✓  ×  ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Utkin, et al. [107] ✓  ×   ×  ✓ ✓  ×  ✓ ✓ ✓ ✓ ✓
Parmar, et al. [84] ✓ ✓  ×  ✓ ✓ ✓  ×  ✓ ✓ ✓  × 
Shi, et al. [99] ✓  ×   ×   ×  ✓  ×  ✓ ✓ ✓ ✓  × 
Xu, et al. [124] ✓ ✓  ×  ✓ ✓ ✓ ✓  ×  ✓ ✓  × 
Chen, et al. [19] ✓ ✓  ×  ✓ ✓ ✓  ×  ✓ ✓ ✓  × 
EL Abbasi, et al. [1] • ✓  ×  ✓ ✓ ✓  ×  ✓ ✓ ✓ ✓
Zhang, et al. [134] ✓ ✓  ×  ✓ ✓  ×   ×  ✓  ×  ✓ ✓
Lee, et al. [52] ✓ ✓  ×  ✓ ✓ ✓ ✓ ✓  ×  ✓  × 
Mohammed, et al. [73] ✓ ✓  ×   ×  ✓ ✓ ✓ ✓  ×  ✓
Khairy, et al. [48] ✓ ✓  ×  ✓ ✓  ×   ×  ✓ ✓ ✓ ✓
Zhang, et al. [136] • ✓  ×  ✓ ✓  ×   ×  ✓ ✓ ✓ ✓
RL Gao, et al. [29] ✓ ✓  ×  ✓ ✓ ✓  ×  ✓ ✓ ✓  × 
Benhamou, et al. [12] ✓  ×   ×  ✓ ✓  ×   ×  ✓ ✓ ✓ ✓
Wang and Deng [114] ✓ ✓  ×  ✓ ✓  ×  ✓ ✓ ✓ ✓  × 
Wang, et al. [115] ✓ ✓  ×  ✓ ✓ ✓ ✓  ×  ✓ ✓  × 
Ma, et al. [69] ✓ ✓  ×  ✓ ✓ ✓  ×  ✓ ✓ ✓ ✓
Gowda, et al. [32] ✓ ✓  ×  ✓ ✓ ✓ ✓  ×  ✓ ✓ ✓
Du, et al. [26] • ✓  ×  ✓ ✓  ×   ×  ✓ ✓ ✓  × 

RF Awan, et al. [9] ✓ ✓ ✓ ✓ ✓ ✓  ×  ✓ ✓  ×  ✓


Moussa, et al. [77] ✓ ✓ ✓ ✓ ✓  ×  ✓ ✓ ✓  × 
Marins, et al. [71] ✓  ×  ✓ ✓ ✓ ✓ ✓  ×  ✓ ✓  × 
Jiao, et al. [46] ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓  ×   × 


Hafeez, et al. [33] ✓ ✓  ×  ✓ ✓ ✓  ×  ✓ ✓ ✓ ✓
Langroodi, et al. [51] ✓  ×   ×  ✓  ×  ✓  ×  ✓ ✓ ✓ ✓
Akinyelu and Adewumi [4] ✓ ✓  ×  ✓ ✓ ✓ ✓ ✓ ✓ ✓  × 
MLP de Arruda, et al. [6] ✓ ✓  ×  ✓ ✓ ✓  ×  ✓ ✓ ✓  × 
Chen, et al. [18] ✓ ✓  ×  ✓ ✓ ✓ ✓ ✓ ✓  ×  ✓
Zhang, et al. [131] ✓ ✓  ×  ✓ ✓ ✓  ×  ✓ ✓ ✓ ✓
Chen, et al. [20] ✓ ✓  ×  ✓ ✓ ✓  ×  ✓ ✓ ✓ ✓
Hou, et al. [41] ✓ ✓  ×  ✓ ✓ ✓ ✓ ✓ ✓ ✓  × 
LSTM Xia, et al. [120] • ✓ ✓ ✓ ✓  ×  ✓  ×   ×  ✓ ✓
Ullah, et al. [106] ✓ • ✓  ×  ✓ ✓ ✓  ×  ✓  ×  ✓
Rao, et al. [88] ✓ ✓ ✓ ✓ ✓  ×  ✓ ✓ ✓  ×  ✓
Huang, et al. [42] ✓ ✓  ×  ✓ ✓  ×  ✓ ✓ ✓  ×   × 
Liu, et al. [64] ✓ ✓ ✓ ✓ ✓  ×  ✓ ✓ ✓  ×  ✓


Hybrid Mao, et al. [70] ✓ ✓ ✓ ✓ ✓ ✓  ×   ×  ✓ ✓  × 


Wang, et al. [118] ✓ ✓ ✓ ✓  ×   ×  ✓  ×  ✓  ×   × 
Nandhini Abirami, et al. [78] ✓  ×   ×  ✓ ✓  ×  ✓  ×  ✓  ×   × 
Butt, et al. [14] ✓ ✓ ✓ ✓  ×  ✓ ✓  ×  ✓ ✓  × 
Subhashini, et al. [104] • ✓  ×  ✓ •  ×  ✓ ✓  ×   ×  ✓

and speed reduce computing costs. Nonetheless, when there are casual or intricate
interactions in the data to be learned through geometric transformations, DL techniques
underperform and fail once dataset size is taken into account. Furthermore, the scarcity
of huge datasets limits deep model training and, in practical applications, the ability
to train dependable supervised techniques. As a result, large, reliable, available, and
homogeneous datasets are fundamentally gathered either by (1) producing synthetic
datasets with current algorithms or (2) manually scanning multiple bags with various
objects and orientations in a lab environment.
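Option (1) can be sketched with simple label-preserving transformations; the `augment` helper and the toy 2 x 2 "image" below are hypothetical illustrations of the idea, not a method taken from the reviewed papers:

```python
import random

def augment(image, rng):
    """Return simple label-preserving variants of a 2-D image (a list of rows):
    a horizontal flip, a vertical flip, and a copy with additive Gaussian noise."""
    h_flip = [row[::-1] for row in image]
    v_flip = image[::-1]
    noisy = [[px + rng.gauss(0, 0.05) for px in row] for row in image]
    return [h_flip, v_flip, noisy]

rng = random.Random(0)  # fixed seed keeps the synthetic set reproducible
base = [[0.0, 0.5],
        [1.0, 0.25]]
# Every variant inherits the label of the original sample.
synthetic = [(variant, "class_a") for variant in augment(base, rng)]
```

Each pass over a real, labeled sample thus yields several new training pairs at essentially no labeling cost.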

• Input type

The main reviewed methods make use of depth modeling, even though some concentrate on
RGB images in detail to simplify effective recognition. Sequential data has been
beneficial as well, most commonly for tracking objects and sites, along with data about
joint positions. At the level of signs, there is a distinction between dynamic and static
signs, with the latter category having a subclass utilized in sequence SLR. With this in
mind, it can be presumed that complex patterns and continuous video will become a
critical focus of further studies; the preconditions for this shift of focus are clearly
in place.

• Synthesizing various features

Several researchers have addressed this problem, but parts of it still require study.
Mixing features to describe numerous parts of the human body is desirable, but this
challenge is generally complicated by the variety of data formats involved: images, depth
and skeleton data, textual elements, and so on. Merging several of these data sources can
result in richer feature engineering and a more accurate model. The torso, hands, and
facial area are the three major body parts on which such characteristics are focused;
limiting the focus to the hands results in imperfect models. Particular parts related to
successful modeling include hand position recognition, hand shapes, and gesture
recognition. It is worth mentioning that quick motion of the neck and face during
language use presents problems.

• Sequence patterns

While remarkable success has been achieved for isolated patterns, where algorithms must
recognize a single word or alphabetic sign, the same cannot be said for sequence
patterns, which involve interpreting longer speech segments. Contextual connections
between signs have a powerful influence on the meaning of sentences; as a result, this
task cannot be reduced to recognizing individual gestures. Finding an appropriate
configuration can tackle these difficulties. We believe that investigation in this area
will dwell on more complex neural network methods that apply more layers and mix multiple
types of layer compositions to gain more processing power.

• Developing recognition accuracy

To ensure commercial utilization and gain credibility among an expanding user base,
technologies must exhibit high levels of accuracy (> 99%) and stability. As vocabulary
size and task complexity increase, there is a higher likelihood of incorrectly detecting
patterns, resulting in false positives or false negatives. Consequently, it becomes
imperative to proceed with the next step, which involves marshaling extensive support and
gathering sufficient resources to achieve optimal accuracy levels. Undoubtedly, the
systems should undergo thorough analysis across different settings and yield valuable
results, even in less-than-ideal external conditions.

• Developing the efficiency of pattern recognition solutions

Previous scientific studies have been limited to improving the principal ability to
meaningfully connect observed body and hand gestures with fixed units of sign language.
While this is understandable at an early stage of scientific investigation, it is
critical to pay more attention to practical applicability in future work. Several pattern
recognition solutions required body-worn sensors and other equipment, but modern systems
are notably less reliant upon them and might involve only a few cameras. Also, the
interaction between users and the system deserves serious consideration as a future
topic, with the notion of granting the user a degree of control over the software
utilized by a system. In addition, feedback methods are being developed to swiftly
discover broad faults while ensuring that user suggestions are honored.

• Poor quality of information

Detection by ML will be impossible if the training data has many flaws, noise, and
outliers. So, for a machine to recognize a pattern correctly, data scientists must pay
extra attention to cleaning the data. One of the main overarching issues in pattern
recognition research is the chronic deficiency of high-quality inputs. This is gradually
changing as the volume of research into pattern recognition increases. On the other hand,
some regional variations in languages, signs, and words have arisen from exclusive mixes
of facial and hand gestures used to express meaning. Also, there is a lack of
sufficiently labeled sets that enable the evaluation of pattern tools under normal
situations; hopefully, improved datasets will eventually simplify the development of
applicable pattern recognition methods.
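A minimal sketch of such a cleaning step, using hypothetical sensor readings and a simple z-score threshold, could look like this:

```python
def remove_outliers(values, z_max=2.0):
    """Drop points whose z-score magnitude exceeds z_max, a common first
    cleaning step before training a recognizer on noisy measurements."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    if std == 0:
        return list(values)
    return [v for v in values if abs(v - mean) / std <= z_max]

readings = [9.8, 10.1, 10.0, 9.9, 10.2, 55.0]  # 55.0 is a sensor glitch
clean = remove_outliers(readings)
```

The threshold `z_max` is a tuning choice: too tight and legitimate variation is discarded, too loose and the glitches survive into training.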

• Well-organized reports

The ability to reap the full benefits of open-access data repositories, in terms of
reusability and data transmission, is frequently hampered by a lack of standards for
reporting consolidated and nonconsolidated data.

• Enough well-trained data

Despite all attempts, ML still falls short here, as most algorithms require a large
quantity of data to perform properly. Huge samples are required to build a new model for
even a common duty; for example, completing an advanced task like image or speech
identification may necessitate millions of samples.

• Nonrepresentative training data

Ensuring that training data is representative of new cases allows the model to generalize
and make more accurate predictions; this remains largely a gap in this area that should
be covered by further investigation.


• Overfitted training data

Overfitting occurs when the model is overly complex and memorizes the training data,
noise included, rather than generalizing from it. As a result, the overfitted model
performs well on training data but fails to generalize, and for a system to be
successfully deployed, this issue has to be solved.

• Underfitted training data

It is the opposite of overfitting and occurs when the model is too simple to learn the
underlying behavior from the data. For instance, fitting a linear model to a series with
multi-collinearity will confidently underfit, and the predictions will be inaccurate.
This issue requires the same attention as overfitting in a well-organized system.
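The two failure modes above can be contrasted on a toy 1-D classification task; the data and the three models below (memorizing, constant, and threshold) are hypothetical illustrations:

```python
# Toy task: the true rule is "label 1 when x > 0.5"; the training labels
# contain two noisy entries (x = 0.3 and x = 0.9).
train = [(0.1, 0), (0.2, 0), (0.3, 1), (0.6, 1), (0.7, 1), (0.9, 0)]
test = [(0.12, 0), (0.4, 0), (0.55, 1), (0.85, 1)]

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

# Overfit: memorize every training point, noise included; unseen inputs
# take the label of the nearest memorized x.
memory = dict(train)
def overfit(x):
    return memory.get(x, memory[min(memory, key=lambda m: abs(m - x))])

# Underfit: a model too simple to use x at all.
def underfit(x):
    return 1

# A reasonably fitted model: a single learned threshold.
def threshold(x):
    return int(x > 0.5)
```

The memorizing model scores perfectly on its own training data yet drops to chance level on unseen points, the constant model is poor everywhere, and only the threshold model generalizes.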

• Useless features

The outputs of an ML system will be unpredictable if the training data contains
irrelevant features. As a result, one of the most important aspects of a successful ML
project is selecting the necessary features.
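One simple, hypothetical screening step is to score each feature by its absolute Pearson correlation with the label and keep only the informative ones:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

# Rows of (features, label); the feature at index 1 is pure noise.
rows = [((1.0, 7.0, 0.0), 0), ((2.0, 3.0, 0.1), 0),
        ((3.0, 9.0, 0.9), 1), ((4.0, 1.0, 1.0), 1)]
labels = [y for _, y in rows]
scores = [abs(pearson([f[i] for f, _ in rows], labels)) for i in range(3)]
keep = [i for i, s in enumerate(scores) if s >= 0.5]  # drop weakly related features
```

Correlation only captures linear relations, so in practice it is one filter among several, but it already removes features that carry no signal at all.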

• Model arrangement and offline Learning

The deficiency of skilled data deployment is one of the biggest issues for ML
practitioners. Developers often need an online source such as Kaggle to collect data and
train the model, which puts offline learning into question and may not be useful for
variable data types.

• Sensing

Issues arise at the input stage, such as sensitivity, latency, bandwidth, distortion,
resolution, and signal-to-noise ratio.

• Grouping and Segmentation

A crucial issue in pattern recognition is segmenting an object into its different parts
and grouping the parts that belong together.
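Grouping can be illustrated with classic connected-component labeling on a binary grid; this is a minimal sketch of the idea, not a method drawn from the reviewed papers:

```python
def connected_components(grid):
    """Group the foreground cells (value 1) of a binary grid into 4-connected segments."""
    rows, cols = len(grid), len(grid[0])
    seen, components = set(), []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and (r, c) not in seen:
                # Flood-fill from this unvisited foreground cell.
                stack, comp = [(r, c)], []
                seen.add((r, c))
                while stack:
                    y, x = stack.pop()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                components.append(sorted(comp))
    return components

grid = [[1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1]]
parts = connected_components(grid)  # two segments: an L-shape and a vertical pair
```

Each returned segment is a candidate object part; deciding which segments belong to the same object is the harder, learned half of the problem.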

• Different Issues

Furthermore, DL approaches employ crucial aspects in a variety of applications such as
NLP, Speech Emotion Recognition (SER), and sequential information processing. Using
supervised algorithms during implementation increases learning from actual data without
the need for manual human labeling. Incorporating various classification models, such as
GMMs and HMMs, requires a larger dataset to gain more accuracy. It is worth noting that
sensitivity to vanishing gradients is a major issue that affects the overall performance
of the RNN; as a result, a customizable SER system based on the DL method known as the
Diagonal Recurrent Neural Network (DRNN) is used for SER. Furthermore, using CNN and RNN
as a hybrid DL modality allows the model to detect patterns with both transient and
frequent dependencies. The RNN model is used for pattern prediction and for constructing
AEs for features. It can also be used to gain greater insight into the operation of
LSTM-based RNNs by utilizing regression models such as Support Vector Regression (SVR).
Last but not least, various fundamental flaws have not been addressed. There are two key
issues: (1) a lack of generalization potential, since a supervised learning process
cannot adapt to a circumstance that was not covered in the training set; and
(2) difficulty deploying on mobile devices, since the sophisticated operation of CNNs is
frequently accompanied by many parameters, which is a problem for real-time computation
on mobile devices. Additionally, multiple directions might guide future study. To begin,
semantic segmentation is a computationally intensive approach for embedded deployment,
so more efficient architectures must be investigated. Second, supervised learning
requires a large amount of annotated data, and labeling data is a time-consuming and
expensive operation. Making accurate predictions in a changeable environment is also
critical.

8 Conclusion and limitation

This research comprehensively explores ML/DL approaches for pattern recognition. The
study begins by discussing the benefits and drawbacks of survey papers, establishing
a foundation for further investigation. Then, the reviewed research articles are evaluated
based on their main ideas, strategies, simulation environments, and datasets, with a
particular focus on assessing their accuracy, security, adaptability, robustness,
availability, integrity, latency, flexibility, and scalability. The findings show that
the majority of the publications were released in 2021, Python is the most common
simulation environment, and the most crucial factors in these studies include accuracy,
flexibility, and fault tolerance.
By highlighting the potential of DL in uncovering patterns and behaviors, this research
provides valuable insights and serves as a comprehensive resource for future studies in DL
approaches for pattern recognition. It offers a well-organized roadmap for researchers and
practitioners interested in implementing established DL methods in real-world infrastruc-
tures, facilitating advancements in intelligent solutions, and driving innovation in pattern
recognition.
Also, our literature review may be limited by the scope of the study and the selection cri-
teria for including papers. It is challenging to cover the entire breadth of research in such a
broad and interdisciplinary field, which may result in some relevant papers being excluded
or overlooked. Besides, assessing the quality and validity of the included papers may be
challenging. The review relies on the available information provided in the selected papers,
and variations in research methodologies, experimental setups, and reporting standards
can impact the overall quality and reliability of the findings. Finally, we identified
several limitations, such as the exclusion of book chapters and literary notes, which
prevents us from benefiting from many studies that could be incorporated into future
research. Another barrier was the inaccessibility of non-English articles, which excluded
various research papers from consideration. In addition, some of the publications we
examined lacked clear explanations of their suggested frameworks and approaches. Our last
limitation was dissatisfaction with various papers released by specific publishers.

Declarations 
Competing interest  The authors declare that they have no known competing financial interests or personal
relationships that could have appeared to influence the work reported in this paper.


References
1. Abbasi A, Javed AR, Chakraborty C, Nebhen J, Zehra W, Jalil Z (2021) ElStream: An ensemble
learning approach for concept drift detection in dynamic social big data stream learning. IEEE Access
9:66408–66419
2. Aghakhani S, Larijani A, Sadeghi F, Martín D, Shahrakht AA (2023) A novel hybrid artificial bee
colony-based deep convolutional neural network to improve the detection performance of backscatter
communication systems. Electronics 12(10):2263
3. Akhavan J, Lyu J, Manoochehri S (2023) A deep learning solution for real-time quality assessment
and control in additive manufacturing using point cloud data. J Intell Manuf. https://​doi.​org/​10.​1007/​
s10845-​023-​02121-4
4. Heidari A, Navimipour NJ, Jamali MAJ, Akbarpour S (2023) A hybrid approach for latency and
battery lifetime optimization in IoT devices through offloading and CNN learning. Sustain Comput
Inform Syst 39:100899. https://​doi.​org/​10.​1016/j.​suscom.​2023.​100899
5. Amiri Z, Heidari A, Navimipour NJ et al (2023) Resilient and dependability management in distrib-
uted environments: a systematic and comprehensive literature review. Cluster Comput 26:1565–1600.
https://​doi.​org/​10.​1007/​s10586-​022-​03738-5
6. de Arruda HF, Reia SM, Silva FN, Amancio DR, Costa LdF (2021) A pattern recognition approach
for distinguishing between prose and poetry. arXiv preprint arXiv:2107.08512
7. Atitallah SB, Driss M, Boulila W, Ghézala HB (2020) Leveraging Deep Learning and IoT big data
analytics to support the smart cities development: Review and future directions. Comput Sci Rev
38:100303
8. Awan MJ, Bilal MH, Yasin A, Nobanee H, Khan NS, Zain AM (2021) Detection of COVID-19 in
chest X-ray images: A big data enabled deep learning approach. Int J Environ Res Public Health
18(19):10147
9. Awan MJ, Khan MA, Ansari ZK, Yasin A, Shehzad HMF (2022) Fake profile recognition using big
data analytics in social media platforms. Int J Comput Appl Technol 68(3):215–222
10. Bagheri M, Zhao H, Sun M, Huang L, Madasu S, Lindner P, Toti G (2020) Data conditioning and
forecasting methodology using machine learning on production data for a well pad. In: Offshore tech-
nology conference. OTC, p D031S037R002
11. Bai X et al (2021) Explainable deep learning for efficient and robust pattern recognition: A survey of
recent developments. Pattern Recogn 120:108102
12. Benhamou E, Saltiel D, Ohana J-J, Atif J (2021) Detecting and adapting to crisis pattern with context
based Deep Reinforcement Learning, in 2020 25th International Conference on Pattern Recognition
(ICPR), IEEE, pp. 10050–10057
13. Bhamare D, Suryawanshi P (2018) Review on reliable pattern recognition with machine learning
techniques. Fuzzy Inf Eng 10(3):362–377
14. Butt H, Raza MR, Ramzan MJ, Ali MJ, Haris M (2021) Attention-Based CNN-RNN Arabic Text
Recognition from Natural Scene Images. Forecasting 3(3):520–540
15. Chancán M and Milford M (2020) Deepseqslam: A trainable cnn+ rnn for joint global description
and sequence-based place recognition, arXiv preprint arXiv:2011.08518
16. Chen P et al (2022) Effectively detecting operational anomalies in large-scale IoT data infrastructures
by using a gan-based predictive model. Comput J 65(11):2909–2925
17. Chen Y, Chen Z, Guo D, Zhao Z, Lin T, Zhang C (2022) Underground space use of urban built-up
areas in the central city of Nanjing: Insight based on a dynamic population distribution. Underground
Space 7(5):748–766
18. Chen Y, Chen S, Zhang N, Liu H, Jing H, Min G (2021) LPR-MLP: a novel health prediction model
for transmission lines in grid sensor networks. Complexity 2021:1–10
19. Chen J, Wu D, Zhao Y, Sharma N, Blumenstein M, Yu S (2021) Fooling intrusion detection systems
using adversarially autoencoder. Digit Commun Netw 7(3):453–460
20. Chen S, Xie E, Ge C, Liang D, Luo P (2021) Cyclemlp: A mlp-like architecture for dense prediction,
arXiv preprint arXiv:2107.10224
21. Chen D, Yue L, Chang X, Xu M, Jia T (2021) NM-GAN: Noise-modulated generative adversarial
network for video anomaly detection. Pattern Recogn 116:107969
22. Cheng L, Yin F, Theodoridis S, Chatzis S, Chang T-H (2022) Rethinking Bayesian learning for
data analysis: The art of prior and inference in sparsity-aware modeling. IEEE Signal Process Mag
39(6):18–52
23. Darbandi M (2017) Proposing new intelligent system for suggesting better service providers in cloud
computing based on Kalman filtering. Published by HCTL Int J Technol Innov Res, (ISSN: 2321-
1814) 24(1):1–9


24. Darbandi M (2017) Kalman filtering for estimation and prediction servers with lower traffic loads
for transferring high-level processes in cloud computing. Published by HCTL Int J Technol Innov
Res,(ISSN: 2321-1814) 23(1):10–20
25. Deng Y, Zhang W, Xu W, Shen Y, Lam W (2023) Nonfactoid question answering as query-focused
summarization with graph-enhanced multihop inference. IEEE Trans Neural Netw Learn Syst. https://​
doi.​org/​10.​1109/​TNNLS.​2023.​32584​13
26. Du S, Krishnamurthy A, Jiang N, Agarwal A, Dudik M, Langford J (2019) Provably efficient RL with
rich observations via latent state decoding, in International Conference on Machine Learning, PMLR,
pp 1665–1674
27. Fang H, Deng W, Zhong Y, Hu J (2020) Triple-GAN: Progressive face aging with triple translation
loss. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Workshops, pp 804–805
28. Gammulle H, Denman S, Sridharan S, Fookes C (2020) Fine-grained action segmentation using the
semi-supervised action GAN. Pattern Recogn 98:107039
29. Gao X, Jin Y, Dou Q, Heng P-A (2020) Automatic gesture recognition in robot-assisted surgery with
reinforcement learning and tree search, in 2020 IEEE International Conference on Robotics and Auto-
mation (ICRA), IEEE, pp 8440–8446
30. Gao L, Li H, Liu Z, Liu Z, Wan L, Feng W (2021) RNN-transducer based Chinese sign language rec-
ognition. Neurocomputing 434:45–54
31. Gong J, Rezaeipanah A (2023) A fuzzy delay-bandwidth guaranteed routing algorithm for video con-
ferencing services over SDN networks. Multimed Tools Appl 82:25585–25614. https://​doi.​org/​10.​
1007/​s11042-​023-​14349-6
32. Gowda SN, Sevilla-Lara L, Keller F, Rohrbach M (2021) Claster: clustering with reinforcement learn-
ing for zero-shot action recognition. arXiv preprint arXiv:2101.07042
33. Hafeez S, Jalal A, Kamal S (2021) Multi-fusion sensors for action recognition based on discrimina-
tive motion cues and random forest, in 2021 International Conference on Communication Technolo-
gies (ComTech): IEEE, pp 91–96
34. Hajipour Khire Masjidi B, Bahmani S, Sharifi F, Peivandi M, Khosravani M, Hussein Mohammed A (2022) CT-ML: Diagnosis of breast cancer based on ultrasound images and time-dependent feature extraction methods using contourlet transformation and machine learning. Comput Intell Neurosci 2022
35. Han C, Fu X (2023) Challenge and opportunity: Deep learning-based stock price prediction by using
bi-directional LSTM model. Front Bus Econ Manag 8(2):51–54
36. Hasan MM, Mustafa HA (2020) Multi-level feature fusion for robust pose-based gait recognition
using RNN. Int J Comput Sci Inf Secur (IJCSIS) 18(1)
37. Heidari A, Javaheri D, Toumaj S, Navimipour NJ, Rezaei M, Unal M (2023) A new lung cancer
detection method based on the chest CT images using Federated Learning and blockchain systems.
Artif Intell Med 141:102572
38. Heidari A, Jafari Navimipour N, Unal M (2023) A secure intrusion detection platform using block-
chain and radial basis function neural networks for internet of drones. IEEE Internet of Things Jour-
nal 10(10):8445–8454. https://​doi.​org/​10.​1109/​JIOT.​2023.​32376​61
39. Hossain MS, Muhammad G (2019) Emotion recognition using deep learning approach from audio–
visual emotional big data. Inf Fusion 49:69–78
40. Hou X et al (2023) A space crawling robotic bio-paw (SCRBP) enabled by triboelectric sensors for
surface identification. Nano Energy 105:108013
41. Hou Q, Jiang Z, Yuan L, Cheng M-M, Yan S, Feng J (2023) Vision permutator: A permutable MLP-
like architecture for visual recognition. IEEE Trans Pattern Anal Mach Intell 45(1):1328–1334.
https://​doi.​org/​10.​1109/​TPAMI.​2022.​31454​27
42. Huang R et al (2020) An lstm approach to temporal 3d object detection in lidar point clouds. Euro-
pean Conference on Computer Vision. Springer, pp 266–282
43. Huang C -Q, Jiang F, Huang Q -H, Wang X -Z, Han Z -M, Huang W -Y (2022) Dual-graph attention
convolution network for 3-D point cloud classification. IEEE Trans Neural Netw Learn Syst. https://​
doi.​org/​10.​1109/​TNNLS.​2022.​31623​01
44. Jafari BM, Luo X, Jafari A (2023) Unsupervised keyword extraction for hashtag recommendation in
social media. In: The International FLAIRS Conference Proceedings, vol 36
45. Jafari BM, Zhao M, Jafari A (2022) Rumi: An intelligent agent enhancing learning management sys-
tems using machine learning techniques. J Softw Eng Appl 15(9):325–343
46. Jiao S, Zou Q, Guo H, Shi L (2021) iTTCA-RF: a random forest predictor for tumor T cell antigens. J
Transl Med 19(1):1–11


47. Jun K, Lee D-W, Lee K, Lee S, Kim MS (2020) Feature extraction using an RNN autoencoder for
skeleton-based abnormal gait recognition. IEEE Access 8:19196–19207
48. Khairy RS, Hussein A, ALRikabi H (2021) The detection of counterfeit banknotes using ensemble learning techniques of AdaBoost and voting. Int J Intell Eng Syst 14(1):326–339
49. Kim J, Kong J, Son J (2021) Conditional variational autoencoder with adversarial learning for end-to-
end text-to-speech. in International Conference on Machine Learning, PMLR, pp 5530–5540
50. Kosarirad H, Nejati MG, Saffari A, Khishe M, Mohammadi M (2022) Feature selection and training
multilayer perceptron neural networks using grasshopper optimization algorithm for design optimal
classifier of big data sonar. J Sens 2022
51. Langroodi AK, Vahdatikhaki F, Doree A (2021) Activity recognition of construction equipment using
fractional random forest. Autom Constr 122:103465
52. Lee K, Laskin M, Srinivas A, Abbeel P (2021) Sunrise: A simple unified framework for ensemble
learning in deep reinforcement learning, in International Conference on Machine Learning, PMLR,
pp 6131–6141
53. Li W et  al (2021) A comprehensive survey on machine learning-based big data analytics for IoT-
enabled smart healthcare system. Mob Netw Appl 26(1):234–252
54. Li J, Chen M, Li Z (2022) Improved soil–structure interaction model considering time-lag effect.
Comput Geotech 148:104835
55. Li P, Chen Z, Yang LT, Zhang Q, Deen MJ (2017) Deep convolutional computation model for feature
learning on big data in internet of things. IEEE Trans Industr Inf 14(2):790–798
56. Li B, Li Q, Zeng Y, Rong Y, Zhang R (2021) 3D trajectory optimization for energy-efficient UAV
communication: A control design perspective. IEEE Trans Wireless Commun 21(6):4579–4593
57. Li Q-K, Lin H, Tan X, Du S (2018) H∞ consensus for multiagent-based supply chain sys-
tems under switching topology and uncertain demands. IEEE Trans Syst Man Cybern: Syst
50(12):4905–4918
58. Li B, Lu Y, Pang W et al (2023) Image colorization using CycleGAN with semantic and spatial
rationality. Multimed Tools Appl 82:21641–21655. https://​doi.​org/​10.​1007/​s11042-​023-​14675-9
59. Li X, Sun Y (2020) Stock intelligent investment strategy based on support vector machine
parameter optimization algorithm. Neural Comput Appl 32:1765–1775. https://​doi.​org/​10.​1007/​
s00521-​019-​04566-2
60. Li X, Sun Y (2021) Application of RBF neural network optimal segmentation algorithm in credit
rating. Neural Comput Appl 33:8227–8235
61. Li B, Tan Y, Wu A-G, Duan G-R (2021) A distributionally robust optimization based method for
stochastic model predictive control. IEEE Trans Autom Control 67(11):5762–5776
62. Lin J (2019) Backtracking search based hyper-heuristic for the flexible job-shop scheduling prob-
lem with fuzzy processing time. Eng Appl Artif Intell 77:186–196
63. Liu Q, Kosarirad H, Meisami S, Alnowibet KA, Hoshyar AN (2023) An optimal scheduling
method in IoT-fog-cloud network using combination of aquila optimizer and african vultures opti-
mization. Processes 11(4):1162
64. Liu Z, Li Z, Wang R, Zong M, Ji W (2020) Spatiotemporal saliency-based multi-stream networks
with attention-aware LSTM for action recognition. Neural Comput Appl 32(18):14593–14602
65. Lu S, Ding Y, Liu M, Yin Z, Yin L, Zheng W (2023) Multiscale feature extraction and fusion of
image and text in VQA. Int J Comput Intell Syst 16(1):54
66. Luo M, Cao J, Ma X, Zhang X, He R (2021) FA-GAN: face augmentation GAN for deformation-
invariant face recognition. IEEE Trans Inf Forensics Secur 16:2341–2355
67. Lv Z, Qiao L, Li J, Song H (2020) Deep-learning-enabled security issues in the internet of things.
IEEE Internet Things J 8(12):9531–9538
68. Lv Z, Yu Z, Xie S, Alamri A (2022) Deep learning-based smart predictive evaluation for interactive
multimedia-enabled smart healthcare. ACM Trans Multimed Comput Commun Appl (TOMM)
18(1s):1–20
69. Ma Y et al (2021) Location-and person-independent activity recognition with WiFi, deep neural
networks, and reinforcement learning. ACM Trans Internet Things 2(1):1–25
70. Mao C, Huang L, Xiao Y, He F, Liu Y (2021) Target recognition of SAR image based on CN-
GAN and CNN in complex environment. IEEE Access 9:39608–39617
71. Marins MA et al (2021) Fault detection and classification in oil wells and production/service lines
using random forest. J Petrol Sci Eng 197:107879
72. Men Y, Mao Y, Jiang Y, Ma W-Y, Lian Z (2020) Controllable person image synthesis with attribute-decomposed GAN. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 5084–5093

Multimedia Tools and Applications

73. Mohammed EA, Keyhani M, Sanati-Nezhad A, Hejazi SH, Far BH (2021) An ensemble learning
approach to digital corona virus preliminary screening from cough sounds. Sci Rep 11(1):1–11
74. Morteza A, Sadipour M, Fard RS, Taheri S, Ahmadi A (2023) A dagging-based deep learning
framework for transmission line flexibility assessment. IET Renew Power Gener 17(5):1092–1105
75. Morteza A, Yahyaeian AA, Mirzaeibonehkhater M, Sadeghi S, Mohaimeni A, Taheri S (2023)
Deep learning hyperparameter optimization: Application to electricity and heat demand prediction
for buildings. Energy Build 289:113036
76. Mousavi A, Sadeghi AH, Ghahfarokhi AM, Beheshtinejad F, Masouleh MM (2023) Improving the
Recognition Percentage of the Identity Check System by Applying the SVM Method on the Face
Image Using Special Faces. Int J Robot Control Syst 3(2):221–232
77. Moussa M, Hmila M, Douik A (2021) Face recognition using fractional coefficients and discrete
cosine transform tool. Int J Electr Comput Eng 11(1):892
78. Nandhini Abirami R, Durai Raj Vincent PM, Srinivasan K, Tariq U, Chang CY (2021) Deep CNN
and deep GAN in computational visual perception-driven image analysis. Complexity 2021:1–30
79. Ni H (2020) Face recognition based on deep learning under the background of big data. Informatica 44(4)
80. Ni Q, Guo J, Wu W, Wang H, Wu J (2021) Continuous influence-based community partition for
social networks. IEEE Trans Netw Sci Eng 9(3):1187–1197
81. Niknam T, Bagheri B, Bonehkhater MM, Firouzi BB (2015) A new teaching-learning-based opti-
mization algorithm for distribution system state estimation. J Intell Fuzzy Syst 29(2):791–801
82. Pan S, Lin M, Xu M, Zhu S, Bian L-A, Li G (2021) A low-profile programmable beam scanning
holographic array antenna without phase shifters. IEEE Internet Things J 9(11):8838–8851
83. Paolanti M, Frontoni E (2020) Multidisciplinary pattern recognition applications: a review. Com-
put Sci Rev 37:100276
84. Parmar G, Li D, Lee K, Tu Z (2021) Dual contradistinctive generative autoencoder. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 823–832
85. Peivandizadeh A, Molavi B (2023) Compatible authentication and key agreement protocol for low
power and lossy network in Iot environment. Available at SSRN 4454407
86. Peng Y, Zhao Y, Hu J (2023) On The Role of Community Structure in Evolution of Opinion Forma-
tion: A New Bounded Confidence Opinion Dynamics. Inf Sci 621:672–690
87. Qu Z, Liu X, Zheng M (2022) Temporal-spatial quantum graph convolutional neural network based on Schrödinger approach for traffic congestion prediction. IEEE Trans Intell Transp Syst. https://doi.org/10.1109/TITS.2022.3203791
88. Rao H, Xu S, Hu X, Cheng J, Hu B (2021) Augmented skeleton based contrastive action learning
with momentum lstm for unsupervised action recognition. Inf Sci 569:90–109
89. Sadi M et al (2022) Special session: On the reliability of conventional and quantum neural network hardware. In: 2022 IEEE 40th VLSI Test Symposium (VTS), IEEE, pp 1–12
90. Saeed R, Feng H, Wang X, Zhang X, Fu Z (2022) Fish quality evaluation by sensor and machine
learning: A mechanistic review. Food Control 137:108902
91. Safarzadeh VM, Jafarzadeh P (2020) Offline Persian handwriting recognition with CNN and RNN-CTC. In: 2020 25th International Computer Conference, Computer Society of Iran (CSICC), IEEE, pp 1–10
92. Salehi S, Miremadi I, Ghasempour Nejati M, Ghafouri H (2023) Fostering the adoption and use of super app technology. IEEE Trans Eng Manag. https://doi.org/10.1109/TEM.2023.3235718
93. Sarbaz M, Manthouri M, Zamani I (2021) Rough neural network and adaptive feedback linearization control based on Lyapunov function. In: 2021 7th International Conference on Control, Instrumentation and Automation (ICCIA), IEEE, pp 1–5
94. Sarbaz M, Soltanian M, Manthouri M, Zamani I (2022) Adaptive optimal control of chaotic system using backstepping neural network concept. In: 2022 8th International Conference on Control, Instrumentation and Automation (ICCIA), IEEE, pp 1–5
95. Sevik A, Erdogmus P, Yalcin E (2018) Font and Turkish letter recognition in images with deep learning. In: 2018 International Congress on Big Data, Deep Learning and Fighting Cyber Terrorism (IBIGDELFT), IEEE, pp 61–64
96. Shahidi S, Vahdat S, Atapour A, Reisizadeh S, Soltaninejad F, Maghami-Mehr A (2022) The clini-
cal course and risk factors in COVID-19 patients with acute kidney injury. J Fam Med Prim Care
11(10):6183–6189
97. Shen G, Han C, Chen B, Dong L, Cao P (2018) Fault analysis of machine tools based on grey relational analysis and main factor analysis. J Phys: Conf Ser 1069(1):012112
98. Shen G, Zeng W, Han C, Liu P, Zhang Y (2017) Determination of the average maintenance time of
CNC machine tools based on type II failure correlation. Eksploatacja i Niezawodność 19(4)


99. Shi Z, Zhang H, Jin C, Quan X, Yin Y (2021) A representation learning model based on variational
inference and graph autoencoder for predicting lncRNA-disease associations. BMC Bioinformatics
22(1):1–20
100. Simpson T, Dervilis N, Chatzi E (2021) Machine learning approach to model order reduction of non-
linear systems via autoencoder and lstm networks. J Eng Mech 147(10):04021061
101. Sohangir S, Wang D, Pomeranets A, Khoshgoftaar TM (2018) Big Data: Deep Learning for financial
sentiment analysis. J Big Data 5(1):1–25
102. Song F, Liu Y, Shen D, Li L, Tan J (2022) Learning control for motion coordination in wafer scan-
ners: Toward gain adaptation. IEEE Trans Industr Electron 69(12):13428–13438
103. Song Y, Xin R, Chen P, Zhang R, Chen J, Zhao Z (2023) Identifying performance anomalies in
fluctuating cloud environments: a robust correlative-GNN-based explainable approach. Futur Gener
Comput Syst 145:77–86
104. Subhashini PS, Ram MSS, Rao DS. DNN-RBF & AHHO for speaker recognition using MFCC
105. Tian J, Hou M, Bian H et al (2023) Variable surrogate model-based particle swarm optimization for high-dimensional expensive problems. Complex Intell Syst 9:3887–3935. https://doi.org/10.1007/s40747-022-00910-7
106. Ullah W, Ullah A, Haq IU, Muhammad K, Sajjad M, Baik SW (2021) CNN features with bi-
directional LSTM for real-time anomaly detection in surveillance networks. Multimed Tools Appl
80(11):16979–16995
107. Utkin L, Drobintsev P, Kovalev M, Konstantinov A (2021) Combining an autoencoder and a variational autoencoder for explaining the machine learning model predictions. In: 2021 28th Conference of Open Innovations Association (FRUCT), IEEE, pp 489–494
108. Vahdat S (2022) A review of pathophysiological mechanism, diagnosis, and treatment of thrombosis
risk associated with COVID-19 infection. IJC Heart & Vasculature 41:101068
109. Vahdat S (2022) The effect of selenium on pathogenicity and mortality of COVID-19: focusing on the
biological role of selenium. J Pharm Negat Results, pp 235–242
110. Vahdat S (2021) Association between the use of statins and mortality in COVID-19 patients: A meta-
analysis. Tob Regul Sci 7(6):6764–6779
111. Vahdat S (2022) The role of IT-based technologies on the management of human resources in the
COVID-19 era. Kybernetes 51(6):2065–2088
112. Vahdat S (2022) Clinical profile, outcome and management of kidney disease in COVID-19
patients—A narrative review. Eur Rev Med Pharmacol Sci 26(6):2188–2195
113. Wang H, Cui Z, Liu R, Fang L, Sha Y (2023) A multi-type transferable method for missing link prediction in heterogeneous social networks. IEEE Trans Knowl Data Eng. https://doi.org/10.1109/TKDE.2022.3233481
114. Wang M, Deng W (2020) Mitigating bias in face recognition using skewness-aware reinforcement
learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition,
pp 9322–9331
115. Wang Y, Dong M, Shen J, Wu Y, Cheng S, Pantic M (2020) Dynamic face video segmentation via reinforcement learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6959–6969
116. Wang H, Gao Q, Li H, Wang H, Yan L, Liu G (2022) A structural evolution-based anomaly detection
method for generalized evolving social networks. Comput J 65(5):1189–1199
117. Wang B, Shen Y, Li N, Zhang Y, Gao Z (2023) An adaptive sliding mode fault-tolerant control of a quadrotor unmanned aerial vehicle with actuator faults and model uncertainties. Int J Robust Nonlinear Control
118. Wang Y, Yan J, Yang Z, Jing Q, Wang J, Geng Y (2022) GAN and CNN for imbalanced partial dis-
charge pattern recognition in GIS. High Voltage 7(3):452–460
119. Wang B, Zhu D, Han L, Gao H, Gao Z, Zhang Y (2023) Adaptive fault-tolerant control of a hybrid canard rotor/wing UAV under transition flight subject to actuator faults and model uncertainties. IEEE Trans Aerosp Electron Syst. https://doi.org/10.1109/TAES.2023.3243580
120. Xia K, Huang J, Wang H (2020) LSTM-CNN architecture for human activity recognition. IEEE
Access 8:56855–56866
121. Xiong Z, Li X, Zhang X et al (2023) A comprehensive confirmation-based selfish node detection algorithm for socially aware networks. J Sign Process Syst. https://doi.org/10.1007/s11265-023-01868-6
122. Xu R, Chen J, Han J, Tan L, Xu L (2020) Towards emotion-sensitive learning cognitive state analysis
of big data in education: deep learning-based facial expression analysis using ordinal information.
Computing 102(3):765–780
123. Xu J, Guo K, Sun PZ (2022) Driving performance under violations of traffic rules: novice vs. experi-
enced drivers. IEEE Trans Intell Veh 7(4):908–917


124. Xu W, Jang-Jaccard J, Singh A, Wei Y, Sabrina F (2021) Improving performance of autoencoder-based network anomaly detection on NSL-KDD dataset. IEEE Access 9:140136–140146
125. Xu J, Pan S, Sun PZH, Hyeong Park S, Guo K (2023) Human-factors-in-driving-loop: driver identification and verification via a deep learning approach using psychological behavioral data. IEEE Trans Intell Transp Syst 24(3):3383–3394. https://doi.org/10.1109/TITS.2022.3225782
126. Yan A et al (2022) LDAVPM: a latch design and algorithm-based verification protected against multi-
ple-node-upsets in harsh radiation environments. IEEE Trans Comput-Aided Des Integr Circ Syst
127. Yang M, Nazir S, Xu Q, Ali S (2020) Deep learning algorithms and multicriteria decision-making used in big data: a systematic literature review. Complexity 2020
128. Yumusak S, Layazali S, Oztoprak K, Hassanpour R (2021) Low-diameter topic-based pub/sub over-
lay network construction with minimum–maximum node degree. PeerJ Computer Science 7:e538
129. Zenggang X et al (2022) Social similarity routing algorithm based on socially aware networks in the
big data environment. J Signal Process Syst 94(11):1253–1267
130. Zerdoumi S et al (2018) Image pattern recognition in big data: taxonomy and open challenges: survey.
Multimed Tools Appl 77(8):10091–10121
131. Zhang DJ et al (2021) MorphMLP: A self-attention free, MLP-like backbone for image and video. arXiv preprint arXiv:2111.12527
132. Zhang X, Huang D, Li H, Zhang Y, Xia Y, Liu J (2023) Self-training maximum classifier discrepancy
for EEG emotion recognition. CAAI Transactions on Intelligence Technology
133. Zhang J, Liu Y, Li Z, Lu Y (2023) Forecast-assisted service function chain dynamic deployment for SDN/NFV-enabled cloud management systems. IEEE Syst J. https://doi.org/10.1109/JSYST.2023.3263865
134. Zhang Z, Mansouri Tehrani A, Oliynyk AO, Day B, Brgoch J (2021) Finding the next superhard
material through ensemble learning. Adv Mater 33(5):2005112
135. Zhang JZ, Srivastava PR, Sharma D, Eachempati P (2021) Big data analytics and machine learning:
A retrospective overview and bibliometric analysis. Expert Syst Appl 184:115561
136. Zhang Y, Wang X, Han N, Zhao R (2021) Ensemble learning based postpartum hemorrhage diagno-
sis for 5g remote healthcare. IEEE Access 9:18538–18548
137. Zhang X, Wen S, Yan L, Feng J, Xia Y (2022) A hybrid-convolution spatial–temporal recurrent network for traffic flow prediction. Comput J:bxac171
138. Zhao H, Jin X (2020) Human action recognition based on improved fusion attention CNN and RNN. In: 2020 5th International Conference on Computational Intelligence and Applications (ICCIA), IEEE, pp 108–112
139. Zheng W et al (2022) A few shot classification methods based on multiscale relational networks. Appl
Sci 12(8):4059
140. Zheng W, Zhou Y, Liu S, Tian J, Yang B, Yin L (2022) A deep fusion matching network semantic
reasoning model. Appl Sci 12(7):3416
141. Zhou L, Ye Y, Tang T, Nan K, Qin Y (2021) Robust matching for SAR and optical images using mul-
tiscale convolutional gradient features. IEEE Geosci Remote Sens Lett 19:1–5
142. Zhou G, Zhang R, Huang S (2021) Generalized buffering algorithm. IEEE Access 9:27140–27157
143. Zong C, Wan Z (2022) Container ship cell guide accuracy check technology based on improved 3D
point cloud instance segmentation. Brodogradnja: Teorija i praksa brodogradnje i pomorske tehnike
73(1):23–35
144. Zong C, Wang H (2022) An improved 3D point cloud instance segmentation method for overhead
catenary height detection. Comput Electr Eng 98:107685

Publisher’s Note  Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under
a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such publishing agreement and applicable
law.


Authors and Affiliations

Zahra Amiri1 · Arash Heidari1,2 · Nima Jafari Navimipour3,4 · Mehmet Unal5 · Ali Mousavi6

* Arash Heidari
arash_heidari@ieee.org
* Nima Jafari Navimipour
nima.navimipour@khas.edu.tr
Zahra Amiri
zahraamiriii1398@gmail.com
1 Department of Computer Engineering, Tabriz Branch, Islamic Azad University, Tabriz, Iran
2 Department of Software Engineering, Haliç University, Istanbul 34060, Türkiye
3 Department of Computer Engineering, Kadir Has Universitesi, Istanbul, Türkiye
4 Future Technology Research Center, National Yunlin University of Science and Technology, Douliou 64002, Taiwan
5 Department of Computer Engineering, Nisantasi Universitesi, Istanbul, Türkiye
6 Islamic Azad University, Sanandaj Branch, Sanandaj, Iran
