Artificial Intelligence and Knowledge Processing Improved Decision-Making and Prediction (Etc.)
Artificial intelligence (AI) and knowledge processing play a vital role in various
automation industries and their functioning in converting traditional industries to
AI-based factories. This book acts as a guide and blends the basics of AI in various
domains, which include machine learning, deep learning, artificial neural networks,
and expert systems, and extends their application in all sectors.
The book discusses the designing of new AI algorithms used to convert general
applications to AI-based applications. It highlights different machine learning and
deep learning models for various applications used in healthcare and wellness,
agriculture, and automobiles. The book offers an overview of the rapidly growing
and developing field of AI applications, along with knowledge of engineering and
business analytics. Real-time case studies are included across several different fields
such as image processing, text mining, healthcare, finance, digital marketing, and
HR analytics. The book also introduces a statistical background and probabilistic
framework to enhance the understanding of continuous distributions. Topics such as
ensemble models, deep learning models, artificial neural networks, expert systems,
and decision-based systems round out the offerings of this book.
This multicontributed book is a valuable source for researchers, academics,
technologists, industrialists, practitioners, and all those who wish to explore the
applications of AI, knowledge processing, deep learning, and machine learning.
Artificial Intelligence and
Knowledge Processing
Improved Decision-Making and
Prediction
Edited by
Hemachandran K, Raul V. Rodriguez,
Umashankar Subramaniam and
Valentina Emilia Balas
Design cover image: © Shutterstock
First edition published 2024
by CRC Press
2385 Executive Center Drive, Suite 320, Boca Raton, FL 33431
and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
CRC Press is an imprint of Taylor & Francis Group, LLC
© 2024 selection and editorial matter, Hemachandran K., Raul V. Rodriguez, Umashankar
Subramaniam, and Valentina Emilia Balas; individual chapters, the contributors
Reasonable efforts have been made to publish reliable data and information, but the
author and publisher cannot assume responsibility for the validity of all materials or
the consequences of their use. The authors and publishers have attempted to trace
the copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we
may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted,
reproduced, transmitted, or utilized in any form by any electronic, mechanical, or
other means, now known or hereafter invented, including photocopying, microfilming,
and recording, or in any information storage or retrieval system, without written
permission from the publishers.
For permission to photocopy or use material electronically from this work, access
www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222
Rosewood Drive, Danvers, MA 01923, 978–750–8400. For works that are not
available on CCC please contact mpkbookspermissions@tandf.co.uk
Trademark notice: Product or corporate names may be trademarks or registered
trademarks and are used only for identification and explanation without intent to infringe.
Library of Congress Cataloging‑in‑Publication Data
Names: K., Hemachandran, editor. | Rodriguez, Raul Villamarin, editor. |
Subramaniam, Umashankar, editor. | Balas, Valentina Emilia, editor.
Title: Artificial intelligence and knowledge processing : improved decision-making
and prediction / Hemachandran K., Raul V. Rodriguez, Umashankar Subramaniam,
and Valentina Emilia Balas.
Description: Boca Raton : CRC Press, 2023. | Includes bibliographical references
and index.
Identifiers: LCCN 2023010710 (print) | LCCN 2023010711 (ebook) |
ISBN 9781032354163 (hardback) | ISBN 9781032357577 (paperback) |
ISBN 9781003328414 (ebook)
Subjects: LCSH: Artificial intelligence—Case studies. | Decision making—Data processing.
Classification: LCC TA347.A78 A7889 2023 (print) | LCC TA347.A78 (ebook) |
DDC 006.3—dc23/eng/20230609
LC record available at https://lccn.loc.gov/2023010710
LC ebook record available at https://lccn.loc.gov/2023010711
ISBN: 978-1-032-35416-3 (hbk)
ISBN: 978-1-032-35757-7 (pbk)
ISBN: 978-1-003-32841-4 (ebk)
DOI: 10.1201/9781003328414
Typeset in Times
by Apex CoVantage, LLC
Contents
Preface
Editors’ Biographies
List of Contributors
Index
Preface
Artificial intelligence (AI) is essential because it gives computers the ability to plan,
comprehend, reason, communicate, and perceive. AI technology is effective because
it efficiently processes vast amounts of data. Knowledge processing, which covers a
variety of distinct ways of creating knowledge, including socialization, externalization,
combination, and internalization, is fundamental to the numerous automation
industries and their operations. At the same time, computers must enhance their
capacity for prediction to become more proficient at making decisions.
According to experts, artificial intelligence is a component of production that can
open up new avenues for growth and transform how labour is performed across sectors.
For instance, a PricewaterhouseCoopers (PwC) article estimates that by 2035,
AI might contribute $15.7 trillion to the world economy. With approximately 70%
of the worldwide influence, China and the United States are well positioned to reap
the greatest rewards from the impending AI boom. According to a Deloitte poll,
75% of firms believe that sharing and maintaining knowledge across their changing
workforces are crucial to their success. Only 9% of businesses claim to be prepared
to confront this trend, while 55% of enterprise data remains inactive.
This book covers applications of analytics techniques and their limits, statistics
and probability, the incorporation of robotics and AI in the medical and health sectors,
AI and Internet of Things (IoT) integration for smart systems, polarity in native
language comments on the Internet, edge and fog computing techniques, denoising
using autoencoders, prediction of terrorist attacks, breast cancer prediction using
machine learning algorithms, and machine learning techniques for detecting and
analysing online fake reviews.
We believe that this book will help students, businesspeople, academics, mentors,
and anyone else interested in learning more about the uses of AI. We are grateful
to our contributors, who come from prestigious institutions and businesses and who
made a significant contribution by sharing their expertise for societal benefit. We
would like to extend our sincere gratitude to our editing and production staff for their
tireless efforts and unwavering support in helping us publish this book on schedule.
Editors’ Biographies
Hemachandran K is currently working as a professor in the Department of Analytics
and Artificial Intelligence at the School of Business, Woxsen University, Hyderabad,
Telangana, India. He is a passionate teacher with 14 years of teaching experience and
5 years of research experience. His research interests are machine learning, deep
learning, computer vision, natural language processing (NLP), knowledge engineering,
and decision support systems. He holds three patents, has more than 20 journal
and international conference publications to his credit,
and has served as a resource person at various national and international scientific
conferences.
Contributors
Balajee A
Assistant Professor
Department of Computer Science and Engineering
Srinivasa Ramanujan Centre
SASTRA (Deemed to be University)
Kumbakonam, India

Mir Aadil
Assistant Professor
School of Computer Science and IT
Jain (Deemed to be University)
Bangalore, India

Tejaswini Aala
Woxsen School of Business
Woxsen University
Hyderabad, India

Kirti Aija
Woxsen University
Hyderabad, India

Reham Alahmadi
Department of Basic Science
College of Science and Theoretical Studies
Medinah-Female Branch
Saudi Electronic University
Riyad, Saudi Arabia

Sirisha Alamanda
Chaitnya Bharathi Institute of Technology
Hyderabad, India

Dheeraj Anchuri
Student
Woxsen University
Hyderabad, India

Sunitha Purushottam Ashtikar
Research Scholar
School of Business
SR University
Warangal, India

Gogineni Venkata Ashwith
Woxsen University
Hyderabad, India

D. Suresh Babu
VR Siddhartha Engineering College
Vijayawada
India

M. Balamurugan
Associate Professor
Department of CSE
CHRIST (Deemed to be University)

Divya Batra
National Institute of Fashion Technology
New Delhi, India

K. Bharath
Assistant Professor of Computer Science & Engineering
Malla Reddy College of Engineering
Hyderabad, India

J. Bhuvana
Associate Professor
Department of CSIT
Jain (Deemed to be University)

Tanushree Biswas
St. Xavier’s University
Kolkata
1.1 INTRODUCTION
Since the creation of the first computer, humans have concentrated on developing
various approaches to decrease computer size and increase operational
capacity. During the evolution of computer systems, researchers became interested
in creating machines that think, work, and act like humans [1]. This enthusiasm
led to the development of the theory of artificial intelligence (AI) and gave rise to
computer-based machines (e.g., robots) with almost human-like intelligence [2].
More precisely, AI is a set of algorithms and techniques, often used in combination,
that are now widely applied to create machines and software solutions
emulating the capabilities of a human being. These solutions perform tasks that
used to be performed solely by humans, with a few additional advantages over
their human counterparts, such as a minimal margin of error and a significantly
shorter time to find results.
According to John McCarthy, the father of AI, artificial intelligence is defined
as “the science and engineering of making intelligent machines, especially intel-
ligent computer programs” [3]. Additionally, the word artificial in AI stands for
human-created; the word intelligence represents the power of thinking. Therefore,
AI is a human-made machine with thinking power [3, 4].
In the literature, AI is divided into two main types: AI type 1, based on capabilities,
and AI type 2, based on functionality. AI type 1 includes three subtypes:
(i) narrow AI, (ii) general AI, and (iii) super AI. Narrow AI (also called weak AI)
is able to perform a dedicated task with intelligence; it is trained for only one
specific task. General AI is a type of intelligence that can handle any task a human
can. The main idea is to let these systems carry out daily tasks without human
intervention. Many research efforts now focus on implementing machines with
general AI, but no such machine exists yet. Super AI is a level of intelligence at
which systems perform tasks better than humans, with cognitive properties such as
the ability to think, plan, learn, and communicate. Super AI is still a theoretical
concept, and the real development of these systems is at a very early stage.
DOI: 10.1201/9781003328414-1
In AI type 2, we can distinguish four subtypes [5]: (i) reactive machines, (ii) limited
memory machines, (iii) theory of mind, and (iv) self-aware AI. Reactive machines
have no memory and do not use past experiences to determine the best
actions. They simply perceive the world and react to it. Limited memory machines
hold data for a short period. However, they cannot add any
new information to the library of their experiences. Theory of mind is the type with
which researchers hope to create machines that imitate human mental models.
Finally, self-aware AI has not yet been developed. In this type, machines
are conscious of themselves. They can perceive their own internal states and others’
emotions and act accordingly.
Nowadays, AI is integrated into our daily activity in many forms, including
computer gaming, Alexa, Google Assistant, etc. Recently, AI has also become
part of many fields like healthcare, social media, education, banking, and finance
[6]. Recent years have witnessed a significant expansion in digitized financial
services. AI has been considered a robust tool in the financial field [7]. Many
analytical tools, including machine learning, are used by firms to analyze data
collected over time. AI improves the pattern recognition step by the use of modern
statistical methods and large volumes of data to provide the best solution to any
defined problem set [8].
In the context of AI, machine learning is now gaining importance. Machine learning
is a popular concept that goes back many decades [9, 10]; it is considered a subfield
of AI [9]. The idea is to develop programming models that can carry out human
activities by using a self-learning approach without any human interaction. In
classical programming models, the human role is crucial, while machine learning is
based on the automation of analytical models, where the system learns from given
data sets and makes decisions with minimal human interference [9–12].
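The contrast between classical programming and machine learning can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the cited works: the data set, the function names (rule_based, learn_threshold), and the fraud-detection framing are all assumptions invented for the example. In the first function a human fixes the decision threshold; in the second the threshold is chosen from labelled data.

```python
# Hypothetical toy data invented for illustration: (amount, is_fraud) pairs.
LABELED = [(12.0, 0), (35.5, 0), (48.0, 0), (51.0, 0),
           (260.0, 1), (310.0, 1), (420.0, 1)]


def rule_based(amount: float) -> int:
    """Classical programming: a human chooses the threshold (100.0 here)."""
    return 1 if amount > 100.0 else 0


def learn_threshold(data):
    """Machine learning in miniature: pick the threshold with fewest errors."""
    values = sorted(x for x, _ in data)
    # Candidate thresholds are the midpoints between neighbouring values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        # Count examples whose predicted label differs from the true label.
        return sum((x > t) != bool(y) for x, y in data)

    return min(candidates, key=errors)


threshold = learn_threshold(LABELED)


def learned(amount: float) -> int:
    """Apply the threshold that was derived from the data."""
    return 1 if amount > threshold else 0
```

If the labelled data change, learn_threshold yields a different rule without anyone rewriting the program, which is the point the paragraph above makes about minimal human interference.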
The rest of this chapter is organized as follows. In Section 1.2, we present the
different types of AI. In Section 1.3, we describe AI systems and subsets. Section 1.4
discusses some relevant AI applications in finance and other domains.
Finally, Section 1.5 concludes the chapter and presents the current challenges and
opportunities in this area of research.
Narrow AI, also called weak AI, performs only one task. Machines that
belong to this type target a single subset of cognitive capabilities and advance
within that scope. Recently, applications of this type have become progressively
more common as machine learning and deep learning methods continue to
develop. Apple Siri, Google Translate, image recognition software, spam filtering,
and Google’s page-ranking algorithm are examples of narrow AI systems.
General AI, also known as strong AI, has the ability to understand, learn, and
perform any task just like a human. In this type, a machine can apply knowledge
and skills in different circumstances. Currently, machines of this kind do not exist.
Researchers are focused on developing machines with general AI.
Super AI exceeds human intelligence. In this type, the machine will be able to
perform any task better than a human can. Super AI is based on the concept that
machines may progress to the point of being extremely close to humans in terms of
sentiments and behaviors. In this type, the machine goes beyond understanding
humans: it also develops its own emotions, needs, beliefs, and desires. Thinking,
solving puzzles, making judgments, and making decisions are some critical
characteristics of super AI. The existence of this type is still speculative.
With respect to functionality, AI may be classified into four subtypes. For instance,
reactive machines and limited memory machines are considered simple AI systems with
limited functionality and performance, whereas theory of mind AI [15] performs human
tasks with a high level of proficiency and is considered the most sophisticated
and developed type of AI.
The AI systems based on functionality (see Figure 1.1) can be classified into four
subtypes: reactive machine, limited memory, theory of mind, and self-aware [2, 4].
Next, we will detail each subtype.
The reactive machine is the most basic type of AI system. In this type, machines
are built without memory-based functionality. They cannot use past
experiences to correct current decisions and are therefore not able
to learn. They only study the current situation and select the best action among
the possible ones. A common example of this type is Deep Blue, the IBM chess
program that beat Garry Kasparov in the 1990s [16]. This system can identify
the pieces on the chessboard and make the best move without retaining memories
or using past experiences. These AI systems act according to a complete and
direct observation of the environment without any previous knowledge about
the world.
In the limited memory type, machines are built with a small amount of memory.
Therefore, they have a limited capability to use past experiences in order to make
new decisions. They can hold data for a short time and have limited capacity. In
addition to reactive capabilities, these machines are able to learn from historical
data to make better decisions. Self-driving vehicles, chatbots, virtual assistants,
and many existing applications fall under this type [16].
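The difference between the reactive and limited memory types can be sketched as two small Python classes. This is an illustrative toy under stated assumptions, not a description of any real driving system: the class names, the distance thresholds (10 m and 20 m), and the window size are hypothetical. The reactive agent looks only at the current reading, while the limited memory agent also keeps a short, bounded window of recent readings.

```python
from collections import deque


class ReactiveAgent:
    """Chooses an action from the current observation only; keeps no state."""

    def act(self, distance_m: float) -> str:
        return "brake" if distance_m < 10.0 else "cruise"


class LimitedMemoryAgent:
    """Keeps a short window of recent observations and uses their trend."""

    def __init__(self, window: int = 3):
        # deque(maxlen=...) silently discards the oldest reading when full.
        self.history = deque(maxlen=window)

    def act(self, distance_m: float) -> str:
        self.history.append(distance_m)
        # The gap is "closing" if the newest reading is below the oldest kept.
        closing = len(self.history) >= 2 and self.history[-1] < self.history[0]
        if distance_m < 10.0 or (closing and distance_m < 20.0):
            return "brake"
        return "cruise"


readings = [30.0, 22.0, 15.0, 9.0]  # hypothetical distances to an obstacle
reactive = ReactiveAgent()
limited = LimitedMemoryAgent()
reactive_actions = [reactive.act(d) for d in readings]
limited_actions = [limited.act(d) for d in readings]
```

On these readings the limited memory agent brakes one step earlier, because the short history shows that the gap is shrinking.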
Theory of mind [15] is a term borrowed from psychology. In the AI field, it means
that machines must have social intelligence and understand human emotions. The
goal of developing such machines is to simulate real emotions and beliefs using
computers. This simulation can influence future decisions. Nowadays, several
models are used to understand human behavior. However, a model that has a mind
of its own has not yet been created. These systems can understand human
requirements and predict behavior. They can also interact with people
and identify their needs and emotions. Bellhop Robot is an example of this type.
It was created for use in hotels and can assess the demands of the guests staying
there.
Up to now, the self-aware type of AI does not exist. Machines of this kind will be
smarter than human beings. In addition to perceiving human emotions, such systems
will be able to understand their own internal state and conditions. They will also
have their own needs, emotions, and beliefs.
The next section explains how an AI system works by describing AI systems and
their subsets.
objectives. Using sensors or specific given inputs, it perceives the real or virtual
environment in order to take the appropriate decisions or actions [17]. The general
structure of AI systems is represented in Figure 1.2.
AI is composed of several subsets. In this chapter, only three subsets are rep-
resented (see Figure 1.3): machine learning, deep learning, and natural language
processing.
TABLE 1.1
ML Techniques Comparison

Dimensionality reduction
Description: Based on converting the higher dimensions data set into a lesser dimensions data set. This conversion should be ensured to conserve similar information.
Advantages: Time, computation, and storage reduction; accuracy increase due to less data interpretation; noise and interference removal.
Disadvantages: Data loss due to reduction; misinterpretation of principal components due to fewer features used; undesirable for non-linear data.

Ensemble
Description: It is a general meta-approach to machine learning that produces better predictive performance by combining the predictions from multiple models.
Advantages: Higher predictive accuracy; useful for linear and non-linear data types.
Disadvantages: More cost to train and deploy; less interpretable; time and space consumption.

Decision tree
Description: The most popular tool for classification and prediction. It is a flowchart-like tree structure that predicts decisions with their outcomes and costs.
Advantages: Simple to interpret; simple to understand; logarithmic order of growth used in training data due to its tree structure; little data preparation.
Disadvantages: Unstable due to major results changing even for small data variations; time consumption in training when inputs increase.

Rules system
Description: It is a system that uses human-made rules in order to store, sort, and manipulate data. Therefore, it mimics human intelligence. Generally, such systems need a source of data and a set of rules for manipulating that data.
Advantages: Availability to users; cost-efficient; end result accurate; high in terms of speed.
Disadvantages: Deep knowledge; manual work; time-consuming to generate rules; less learning capacity.
Many algorithms are used in ML such as decision tree, regression, rules systems,
ensemble approach, and others. Table 1.1 summarizes a comparison between these
techniques.
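To make the ensemble row of Table 1.1 concrete, the following Python sketch combines three one-split decision trees ("stumps") by majority vote. It is a simplified illustration under stated assumptions: the feature indices and thresholds are chosen by hand rather than learned, and the helper names (stump, majority_vote) are hypothetical.

```python
from collections import Counter


def stump(feature_index: int, threshold: float):
    """A one-split decision tree: predict 1 if the feature exceeds the threshold."""
    return lambda row: 1 if row[feature_index] > threshold else 0


def majority_vote(models, row) -> int:
    """Ensemble prediction: the label predicted by most of the models."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical thresholds, picked by hand for illustration (not learned).
models = [stump(0, 50.0), stump(1, 3.0), stump(2, 0.5)]

# Hypothetical rows with three numeric features each.
rows = [(80.0, 5.0, 0.1), (20.0, 1.0, 0.9), (60.0, 2.0, 0.7)]
predictions = [majority_vote(models, row) for row in rows]
```

Each stump is easy to interpret on its own, but the combined vote is harder to explain, which mirrors the accuracy versus interpretability trade-off listed in the table. An odd number of binary voters is used so that ties cannot occur.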
and structures from large data sets. Generally, DL does not use prior data processing.
It automatically extracts features from available data. Figure 1.4 represents the main
difference between ML and DL.
In ML techniques, a feature extraction step is needed before a model can be
applied. This step is complex, may be manual, and usually requires the intervention
of an expert. DL, however, does not necessarily need predefined features. Hence,
there is no need for an expert to manually define any features in the model [19].
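As a rough illustration of what a manual feature extraction step looks like, the Python sketch below turns a raw text message into a few hand-picked numeric features that a classical ML model could consume. The chosen features (length, exclamation marks, share of upper-case letters) and the function name extract_features are assumptions made for this example; in practice an expert selects features suited to the task, and a DL model would instead work from the raw input.

```python
def extract_features(message: str) -> dict:
    """Hand-crafted features that an expert might define for a text classifier."""
    letters = [c for c in message if c.isalpha()]
    upper = sum(c.isupper() for c in letters)
    return {
        "length": len(message),
        "exclamations": message.count("!"),
        # Share of letters that are upper case; 0.0 if there are no letters.
        "upper_ratio": upper / len(letters) if letters else 0.0,
    }


features = extract_features("WIN cash NOW!!")
```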
phonemic rules. The variation in stress and intonation across a sentence is repre-
sented by phonemic rules [23].
Morphology is usually used for identifying the part of the sentence in which
words interact together. It represents the word formation [22]. In addition, the set of
relations between words’ surface forms and lexical forms are also represented in this
level.
The structural relationships between words of a sentence are studied in the
syntax level [23]. Syntax involves the use of Afan Oromo grammar rules. It con-
sists of analyzing the words in a sentence in order to depict its grammatical
structure [22].
Semantic analysis aims to understand the meaning of natural language
[24]. Understanding natural language appears effortless to humans.
Nevertheless, interpreting natural language is an extremely complex task for
machines because of the complexity and subjectivity involved in human language.
Semantic analysis represents the meaning of a given text while
considering the grammatical roles and the logical structure of sentences
[22]. It uses several approaches, such as first-order predicate logic, to
represent the meaning [21].
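As a small worked example of a first-order predicate logic representation, consider the sentence "Every investor owns some stock." The predicate names Investor, Stock, and Owns are chosen here purely for illustration; one possible encoding is:

```latex
\forall x \,\bigl( \mathrm{Investor}(x) \rightarrow
    \exists y \,( \mathrm{Stock}(y) \land \mathrm{Owns}(x, y) ) \bigr)
```

The formula makes explicit a reading that the English sentence leaves open: each investor may own a different stock, because the existential quantifier sits inside the universal one.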
Pragmatics consists of analyzing the real meaning of an expression in a human
language by defining and reviewing it. This can be done by identifying vagueness
encountered by the system and resolving it.
In the next section, we will describe some relevant AI applications in the financial
field as well as in other fields (e.g., healthcare).
1.4 AI APPLICATIONS
The broad areas in which AI is currently operating and contributing include but are
not limited to finance, medicine, education, robotics, information management, biol-
ogy, space, NLP, and many other critical areas supporting people’s various activities
in their daily life.
The rest of this section elaborates on some applications of AI in different areas,
mainly finance and a few other major areas in which AI is considered
a game-changer.
Introduction to Artificial Intelligence 9
The first stage consists of collecting price data from the exchange and
news data from companies such as Reuters and Bloomberg.
The second stage consists of analyzing this data: the trading algorithm
executes complex analyses to identify opportunities for profit.
The third stage consists of running simulations to check the outcomes. For
example, if we decide to buy a particular stock, is this a good option?
The last stage is the decision to buy, sell, or hold, including some details (e.g.,
quantity, the time to trade), which is communicated by sending trading signals.
The signals are directed to the exchanges, and the trading orders are performed
without any human intervention.
In some scenarios, the investor can initiate the trading action manually or simply
ignore the signals. This is a semi-automatic option that can be provided by the system.
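The four stages above, together with the semi-automatic option, can be sketched as a small Python pipeline. Everything in it is a simplifying assumption made for illustration: the hard-coded prices and headline stand in for real exchange and news feeds, the moving-average comparison and keyword count stand in for the "complex analysis", and the function names (collect, analyze, simulate, decide) are hypothetical. It is not a trading strategy.

```python
from statistics import mean


def collect(prices, headlines):
    """Stage 1: gather price and news data (hard-coded here, not a live feed)."""
    return {"prices": list(prices), "headlines": list(headlines)}


def analyze(data, short=3, long=5):
    """Stage 2: compare short and long moving averages; score the news crudely."""
    prices = data["prices"]
    trend = mean(prices[-short:]) - mean(prices[-long:])
    news = sum(("beats" in h) - ("misses" in h) for h in data["headlines"])
    return {"trend": trend, "news": news}


def simulate(analysis, quantity):
    """Stage 3: what-if check; profit if the price moves by the trend."""
    return analysis["trend"] * quantity


def decide(analysis, expected_pnl, quantity, auto=True):
    """Stage 4: emit a signal; auto=False models the semi-automatic mode."""
    if analysis["trend"] > 0 and analysis["news"] >= 0 and expected_pnl > 0:
        action = "buy"
    elif analysis["trend"] < 0 and expected_pnl < 0:
        action = "sell"
    else:
        action = "hold"
    return {
        "action": action,
        "quantity": quantity if action != "hold" else 0,
        "needs_confirmation": not auto,
    }


data = collect([100.0, 101.0, 102.0, 104.0, 106.0], ["ACME beats forecasts"])
analysis = analyze(data)
expected = simulate(analysis, quantity=10)
signal = decide(analysis, expected, quantity=10)
manual = decide(analysis, expected, quantity=10, auto=False)
```

In the fully automatic mode the signal would be sent straight to the exchange; with auto=False the same signal is flagged so that the investor can confirm or ignore it.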
1.5 CONCLUSION
Over the past 60 years, the field of AI has made considerable advancements but
has also been met with periods of hype. However, more recently researchers have
been able to create many technical breakthroughs in rapid succession that enabled
machines to outperform humans in fields that required intelligence. As a result, many
real-world applications, as mentioned earlier in this chapter, have used AI to provide
much-needed benefits for businesses and economies and have contributed
to productivity growth and innovation.
Despite this considerable progress, many challenges persist and prevent
us from applying AI in many other areas or from improving current applications to
cover other aspects or deliver more efficient and reliable results. Therefore, more
effort needs to be put into what is usually referred to as “artificial general
intelligence”, which would allow machines to solve more complex problems in the way
a human being would; however, most researchers consider this level of AI still
unreachable.
A briefing note published in 2018 by McKinsey Global Institute [32] dis-
cusses a few challenges that are hampering the evolution of AI and its expansion
to other areas. One of these challenges is the effort required to manually label
data when using one of the most used techniques in AI, which is the supervised
approach in ML. For example, in the context of an application that diagnoses
patients with a specific medical problem using lab tests as input, it is required
to have huge data sets and label every element in these data sets to determine
if the problem is occurring and use these data sets to train the application to be
able to determine for a new element if it has the medical problem or not. Note
that obtaining this training data is considered another challenge, as in many
areas this data could be simply unavailable, and therefore this type of technique
would not be applicable.
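The dependence of the supervised approach on labelled data can be seen in a minimal Python sketch. The lab values, the two features (glucose and BMI), and the labels below are invented for illustration and have no clinical meaning; the classifier is a plain 1-nearest-neighbour rule, chosen only because it is short.

```python
import math

# Hypothetical labelled lab results: (glucose mg/dL, BMI) -> 1 if the
# condition is present, 0 otherwise. Every example had to be labelled by hand.
TRAIN = [
    ((85.0, 22.0), 0),
    ((90.0, 24.0), 0),
    ((150.0, 31.0), 1),
    ((165.0, 29.0), 1),
]


def predict(sample, train=TRAIN) -> int:
    """1-nearest-neighbour: return the label of the closest labelled example."""
    _, label = min(train, key=lambda pair: math.dist(pair[0], sample))
    return label


new_patient = predict((160.0, 30.0))
healthy_like = predict((88.0, 23.0))
```

Without the labels in TRAIN the function has nothing to return, which is why the effort of labelling and the availability of training data are both listed as challenges.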
Another challenge is the inability to explain
the output of some techniques such as deep learning, which rely on
a complex black box that performs some processing and outputs a decision or
prediction as a result. In some applications, the result needs to be well explained
in order to ensure the application’s transparency, credibility, and trust among
people using this output, as in the case of applications for judicial investiga-
tions, where investigators are required to provide proofs and explanations to get
approval for further actions.
Building generalized learning techniques, where applications can benefit and use
their experiences in other similar circumstances as humans are able to do, is another
challenge that needs to be overcome. Transfer learning is an AI technique that is
currently under study and that could provide a response to this challenge and allow
AI models to apply their previous learning to perform other activities in similar but
different circumstances.
All of these challenges are definitely obstructing the expansion of AI to give a
hand in solving other problems in several areas; however, they also represent oppor-
tunities for this field to grow if they are handled appropriately to develop new AI
techniques or improve the existing ones to overcome their limitations.
REFERENCES
[1] I. Ananth, “Artificial intelligence,” 2018. [Online]. Available: https://witanworld.com/
article/2021/04/17/artificial-intelligence/.
[2] A. Karthikeyan and U.D. Priyakumar, “Artificial intelligence: Machine learning for
chemical sciences,” Journal of Chemical Sciences, 134, 2022, pp. 1–20.
[3] J. McCarthy, “Artificial intelligence tutorial: It’s your time to innovate the future,” 2019.
[Online]. Available: https://data-flair.training/blogs/artificial-intelligence-ai-tutorial/.
[4] V. Jokanović, Artificial intelligence, Taylor and Francis, Boca Raton, London and New
York, 2022.
[5] E. Kambur, “Emotional intelligence or artificial intelligence?: Emotional artificial intel-
ligence,” Florya Chronicles of Political Economy, 7(2), 2021, pp. 147–168.
[6] A. Dhamanda, M. I. M.H. Ahmad, M. S. Arshad, M. Zubair, M. Javed, and P. Bazyar,
“Artificial intelligence applications,” in Artificial intelligence applications, Iksad Publi-
cations, Ankara, Turkey, 2021.
[7] D. Bholat and D. Susskind, “The assessment: Artificial intelligence and financial ser-
vices,” Oxford Review of Economic Policy, 37(3), 2021, pp. 417–434.
[8] H. Arslanian and F. Fischer, “Applications of artificial intelligence in financial services,”
The Future of Finance, 2019, pp. 179–197.
[9] Priyadharshini, “Machine learning: What it is and why it matters,” from Simpli
Learn, 2017. [Online]. Available: www.simplilearn.com/what-is-machine-learning-
and-why-it-matters-article.
[10] P. Sodhi, N. Awasthi, and V. Sharma, “Introduction to machine learning and its basic
application in python,” in Proceedings of 10th international conference on digital strat‑
egies for organizational success, 2019.
[11] R. Choi, A. Coyner, J. Kalpathy-Cramer, M. Chiang, and J. Campbell, “Introduction to
machine learning, neural networks, and deep learning,” Translational Vision Science &
Technology, 9(2), 2020.
[12] T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning: Data
mining, inference, and prediction (vol. 2, pp. 1–758), Springer, New York, 2009.
[13] H. Lu, Y. Li, M. Chen et al., “Brain intelligence: Go beyond artificial intelligence,”
Mobile Networks and Applications, 23(2), 2018, pp. 368–375.
[14] EU High-Level Expert Group on Artificial Intelligence, “Ethics guidelines for trustwor-
thy AI [text],” FUTURIUM–European Commission, 2019. Available: https://ec.europa.
eu/futurium/en/ai-allianceconsultation/guidelines.
[15] L. Tucci, “Ultimate guide to artificial intelligence to enterprise,” 2020. [Online].
Available: https://searchenterpriseai.techtarget.
[16] N. Joshi, “Types of artificial intelligence,” 2020. [Online]. Available: www.forbes.com/
sites/cognitiveworld/2019/06/19/7-types-of-artificialintelligence/#3e68129b233e.
[17] OECD, Scoping the OECD AI principles: Deliberations of the expert group on artificial
intelligence at the OECD (AIGO), OECD Publishing, Paris, France, 2019.
[18] C. Angermueller, T. Pärnamaa, L. Parts, and O. Stegle, “Deep learning for computational
biology,” Molecular Systems Biology, 12(7), 2016.
[19] P. Ongsulee, “Artificial intelligence, machine learning and deep learning,” in 2017 15th
international conference on ICT and knowledge engineering (ICT&KE). IEEE, 2017.
[20] J. Hirschberg and C.D. Manning, “Advances in natural language processing,” Science,
349(6245), 2015, pp. 261–266.
[21] A. Reshamwala, D. Mishra, and P. Pawar, “Review on natural language processing,”
IRACST Engineering Science and Technology: An International Journal (ESTIJ), 3(1),
2013, pp. 113–116.
[22] A. Abeshu, “Analysis of rule based approach for Afan Oromo automatic morphological
synthesizer,” Science, Technology and Arts Research Journal, 2(4), 2013, pp. 94–97.
[23] B. Abera and S. Dechasa, “A review of natural language processing techniques: Appli-
cation to Afan Oromo,” International Journal of Computer Applications Technology and
Research, 10, 2021, pp. 51–54.
[24] T. Nasukawa and J. Yi, “Sentiment analysis: Capturing favorability using natural lan-
guage processing,” in Proceedings of the 2nd international conference on Knowledge
capture, 2003.
[25] C. Kirchner and J. Gade, Implementing social network analysis for fraud prevention,
CGI Gr, Mumbai, India, 2011.
[26] V. Ravi and S. Kamaruddin, “Big data analytics enabled smart financial services:
Opportunities and challenges,” in International conference on big data analytics (pp. 15–39),
Springer, Cham, December, 2017.
[27] X. Qin, “Making use of the big data: Next generation of algorithm trading,” in Interna‑
tional conference on artificial intelligence and computational intelligence (pp. 34–41),
Springer, Berlin, Heidelberg, October 2012.
[28] “Renaissance Technologies. In Wikipedia,” 13 May 2022. [Online].
[29] “New forum center to advance global cooperation on fourth industrial revolution,”
World Economic Forum, [Online]. Available: www.weforum.org/press/2016/10/new-
forum-center-to-advance-global-cooperation-on-fourth-industrial-revolution/.
[Accessed 16 May 2022].
[30] V. Vinolyn, I. R. Thomas, and L. A. Beena, “Framework for approaching blockchain in
healthcare using machine learning,” In Blockchain and machine learning for E-healthcare
systems, n.d.
[31] “Artificial intelligence in education market size, share, analysis report,” Market
Research Engine, 15, 2021. [Online]. Available: www.marketresearchengine.com/
artificial-intelligence-in-education-market.
[32] McKinsey Global Institute, The promise and challenge of the age of artificial intelligence,
Briefing Note, Washington, DC, 2018.
2 AI and Human Cognizance
P. Sanjna, D. Sujith, K. Vinodh, and B. Vasavi
2.1 HISTORY OF AI
The term artificial intelligence (AI) was first introduced in 1956 by John McCarthy at a
conference. The goal of AI is to have machines operate like humans and to understand
whether machines can think and learn by themselves (Christof Koch, 2019). Alan Turing,
a mathematician, turned his hypothesis and questions into a practical procedure for
analyzing whether machines can think. This procedure, known as the Turing test, checks
whether a machine's responses can be told apart from those of a human.
DOI: 10.1201/9781003328414-2
process for thinking derived from the interaction of neurons in the human frontal
cortex. To give just one example, computer-based comprehension has enabled the
destruction and recovery of word-related plans. A handful of people have also intro-
duced the “machine risk thought.” Before AI can be used in general, it must first be
understood. At its core, modernized believing is an extension of human data, and its
progress is reliant on computational advancements. For an unusually lengthy period,
copied data has made the essential strides not to coordinate attention in specialists.
Care clearly refers to several aspects of human understanding that are necessary for
our finest intellectual abilities: opportunity, strength, thrilling experience, learning,
and thought, to name a few.
2.3 TYPES OF AI
The first two types of AI shown in Figure 2.1, reactive AI and limited memory AI, are
simple and would help improve life for all humans. The other two, theory of mind AI and
self-aware AI, would carry greater potential for loss and risk to human civilization,
because they could understand humans and make decisions on their own using the
intelligence they have.
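The difference between the first two types can be made concrete with a minimal Python
sketch. The thermostat-style task, the class names, and the threshold values are
illustrative assumptions and do not come from the chapter: a reactive agent maps only the
current reading to an action, while a limited memory agent also consults a short window of
recent readings.

```python
from collections import deque


class ReactiveAgent:
    """Acts on the current observation only; keeps no history."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    def act(self, reading: float) -> str:
        return "cool" if reading > self.threshold else "idle"


class LimitedMemoryAgent:
    """Acts on the average of the last few observations."""

    def __init__(self, threshold: float, window: int = 3):
        self.threshold = threshold
        self.recent = deque(maxlen=window)  # old readings drop out automatically

    def act(self, reading: float) -> str:
        self.recent.append(reading)
        average = sum(self.recent) / len(self.recent)
        return "cool" if average > self.threshold else "idle"


# Hypothetical temperature readings with a single spike at the third step.
readings = [20.0, 21.0, 30.0, 21.0]
reactive = ReactiveAgent(threshold=25.0)
limited = LimitedMemoryAgent(threshold=25.0, window=3)
reactive_actions = [reactive.act(r) for r in readings]
limited_actions = [limited.act(r) for r in readings]
```

In this sketch the reactive agent responds to the isolated spike, whereas the limited
memory agent smooths it out using its short history. Theory of mind AI and self-aware AI
have no comparably simple implementation and are not sketched here.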
2.6 ETHICS OF AI
AI ethics is a set of beliefs, principles, and techniques that guide moral conduct in
the development and deployment of AI, using widely accepted standards of right and
wrong.
Robot ethics, commonly referred to as roboethics or machine ethics, is the study of
how to build ethical robots and what rules should be followed to guarantee that robots
behave ethically. Roboethics is concerned with issues such as whether robots will pose
a long-term risk to people and whether the use of certain robots, such as killer robots
in wars, will be detrimental to humans. Roboticists must ensure that autonomous systems
exhibit ethically acceptable behavior when robots, AI systems, and other autonomous
systems, such as self-driving vehicles, work with humans (Cangelosi & Schlesinger,
2015; Cangelosi, 2010).
AI and robotics are reshaping human progress as a whole. In the public sector, it is
essential to apply AI ethical principles to the design and implementation of algorithmic
or intelligent systems and AI initiatives. AI ethics aims to ensure that AI is developed
and used in an ethical, safe, and reliable way. AI ethics and safety should be a primary
concern in the design and deployment of
The main purpose of the research is to spotlight the obstacles that engineers have faced
and overcome in developing robots that communicate in a more natural manner, where a
robot can process language naturally as humans do (natural language processing [NLP]).
Important issues in this field include providing solid, verifiable accounts of what robots
currently know and how they perform, of their ability to integrate speech and language,
and of their ability to adapt to new situations and operate mostly on real-time
assumptions rather than analysis, as well as finding a reliable and simple way to enable
robots to speak. Language itself is dynamic, intellectual, and social.
To make robots behave and speak like individuals in practical and fair situations,
approaches are needed that actually allow robots to understand the expressions and
phrases that are connected to a real-world environment. Creating robots that converse
regularly with people in social settings is one example. We need to understand that
language is neither a physical entity nor a genuine signal, but rather a distinct
framework in which the meaning of signs changes depending on the context.
A ‘vernacular’ is a variety of a language that differs from other varieties but is still
understood by people who speak other dialects of the same language. Accent, vocabulary,
and the way speakers structure and organize their speech are examples of how such
varieties can differ from one another. The differences may arise from a combination of
geographical and social factors, and people who speak the same variety are more likely to
live in the same neighborhood.
Linguists use the term ‘lects’ in several circumstances. The term refers to the way
people speak within a speech community that identifies them in some way, but not
because of their social circumstances or where they live.
We share a similar mental structure and are ‘wired’ for language, we live on the same
planet, and we (usually) have comparable experiences. However, when it comes to
language, and particularly language development, we all interpret our experiences
differently and incorporate them differently into the language creation process. For one
person a given sound may represent something, whereas for another person a different
sound may mean the same thing. One person may use one gesture to convey a message,
whereas another may use a different one. As a result, people in different parts of the
country invented their own ways of saying “man,” “woman,” “dog,” and so on.
Once a group of people agreed on a set of sounds and gestures, their descendants either
changed it or migrated away and went through sound and gesture change processes of
their own, creating new tongues.
The languages used for robots are different. They are programming languages such as
Java, Python, C++, and Lisp.
takeoff and landing times, which are all calculated and displayed for smooth
plane operation.
• AI assists in aviation with the navigation of maps and routes, as well as with a
thorough examination of the entire cockpit panel to ensure that everything is in
working order. This has produced very effective and promising results, which is
why its frequent use is recommended. AI’s main goal in the field of air
transportation is to ensure that humans have a comfortable and safe journey.
AI in the manufacturing industry:
In the manufacturing sector, AI has enormous potential. Tasks ranging from
maintenance to work previously performed by humans have been automated. AI
allows the quality of work to be improved while also increasing output. Microsoft,
for example, will transform all information so that workers can perform
better.
AI in organizational intelligence:
Businesses generate a large amount of data from customers, which takes a long
time to process and analyze. Traditional businesses and methods are falling
behind as technology and speed advance. AI enables companies to explore and
analyze data and to predict changes faster than humans can, allowing them to
make quicker and more effective decisions.
AI in urban design:
AI aids in the development and planning of cities. Planning involves a massive
amount of data that needs to be analyzed; AI gathers this data and helps organize
and understand urban areas as they evolve. AI-based analysis can show how growth
has progressed in the past and how it may progress in the future, along with the
utilities required, safety needs, and so on.
AI in education:
The concept of education must evolve from generation to generation, and this
evolution is critical. People in the education industry are always asking where
changes are needed and how to make them (Chatterjee & Bhattacharjee, 2020).
AI has the potential to create a dynamic, systemized, and effective learning
environment for subjects, which could be a game-changer. AI teachers are
another example of useful advancements. AI can be a better tutor by showing
students visualizations in 3D to help them understand concepts better.
AI in fashion:
With the help of AI, the world can better understand people’s buying patterns
and changing behaviors, as well as predict future fashion trends, which is a
huge step forward.
AI in supply management:
AI will be able to make predictions without human bias, apply proper risk
analysis, and reach sound decisions cost-effectively even in difficult situations.
AI will be able to handle more dependencies and more complicated data than humans
can. As a result, proper and effective decisions can be made.
2.11 SINGULARITY
In terms of technology, singularity means a hypothetical future in which technological
growth becomes very rapid, uncontrollable, and irreversible. These powerful
technologies will change rapidly and transform our reality in unpredictable ways. The
term applies to advances involving computers and programs that are improved with the
help of artificial intelligence, which is itself created by humans. These changes would
blur the boundary between humanity and computers. Nanotechnology is said to be one
of the important technologies that will probably make singularity a reality. This
intelligence explosion would have a drastic impact on human civilization. These
computer programs and AI would turn machines and robots into super-intelligent,
high-cognitive-capacity machines beyond human capability and intelligence.
If AI were to turn against human civilization and destroy it, the singularity would have
arrived. When AI forms a society of its own, there is a high chance that it will become
stronger, and humans will be unable to destroy such a vast technology. Humans would
have created an AI that can destroy humankind, and this action is irrevocable. Once AI
has its own society, it is protected and cannot be destroyed.
Singularity would occur when computer programs improve to the point that AI
surpasses human intelligence, potentially erasing the human-computer divide. One
of the main technologies that will make singularity a reality is nanotechnology.
This expansion of intelligence will have a tremendous impact on human society.
These computer programs and AI will evolve into super-intelligent machines with
cognitive powers far exceeding those of humans.
Challenges that society needs to face when AI has its own society,
which would give way to singularity:
• Not all the effects are positive. AI has the potential to cause harm by leaking
private information. Several states and cities have banned the government
REFERENCES
Admoni, H. and Scassellati, B. Social eye gaze in human-robot interaction: A review. Journal
of Human-Robot Interaction, 6(1), pp. 25–63, 2017.
Barrat, J. Our final invention: Artificial intelligence and the end of the human era. Thomas
Dunne Books, New York, 2013.
Cangelosi, A. Grounding language in action and perception: From cognitive agents to
humanoid robots. Physics of Life Reviews, 7(2), pp. 139–151, 2010.
Cangelosi, A. and Schlesinger, M. Developmental robotics: From babies to robots. MIT Press,
Cambridge, Massachusetts, 2015.
Chatterjee, S. and Bhattacharjee, K.K. Adoption of artificial intelligence in higher education:
A quantitative analysis using structural equation modelling. Education and Information
Technologies, 25, pp. 3443–3463, 2020.
Koch, C. Will machines ever become conscious?, 2019.
https://www.scientificamerican.com/article/will-machines-ever-become-conscious/
Dong, Y., Hou, J., Zhang, N. and Zhang, M. Research on how human intelligence,
consciousness, and cognitive computing affect the development of artificial intelligence.
Complexity, 2020, pp. 1–10, 2020.
Dutoit, T. An introduction to text-to-speech synthesis, vol. 3. Springer Science & Business
Media, Berlin, 1997.
Mutlu, B., Yamaoka, F., Kanda, T., Ishiguro, H. and Hagita, N. Nonverbal leakage in robots:
Communication of intentions through seemingly unintentional behavior. In Proceedings
of the 4th ACM/IEEE international conference on Human robot interaction (pp. 69–76),
2009.
Nguyen, S.M. and Oudeyer, P.Y. Socially guided intrinsic motivation for robot learning of
motor skills. Autonomous Robots, 36, pp. 273–294, 2014.
Noda, K., Yamaguchi, Y., Nakadai, K., Okuno, H.G. and Ogata, T. Audio-visual speech recog-
nition using deep learning. Applied Intelligence, 42, pp. 722–737, 2015.
Wang, P. A constructive explanation of consciousness. Journal of Artificial Intelligence and
Consciousness, 7(2), pp. 257–275, 2020.
Wladawsky-Berger, I. The impact of artificial intelligence on the world economy. The Wall
Street Journal, (16), p. 11, 2018.
3 Integration of Artificial Intelligence with IoT for Smarter Systems: A Review
R. Raj Mohan, N.P. Saravanan, and B. Leander
DOI: 10.1201/9781003328414-3
TABLE 3.1
Major Terms and Their Descriptions for Smarter Systems, with Relevant Literature
Key Term          Description with Literature
AI                A possible solution to cognitive problems that are commonly associated with
                  human intelligence, such as learning, problem solving, and pattern
                  recognition [6]
IoT               Linked sensors organized by using the cloud, which enables them to
                  communicate with each other wirelessly [7]
                  An interconnection of things from sensors to smart devices [8]
Machine learning  A set of models built from data given to the machine, aimed at making sense
                  of previously unseen data and at developing a good understanding of the data
                  for tasks such as recognizing images, speech, and patterns, or improving
                  strategies [9]
Big data          Handling a huge quantity of data that is complex and dependent on various
                  sources, characterized by volume, velocity, variety, and veracity [10]
Deep learning     A technique that allows computational models composed of multiple processing
                  layers to learn representations of data at multiple levels of abstraction [11]
Smart             The use of autonomy, context awareness, and connectivity to make a gadget or
                  thing smarter [8]
devices like tablets or smart mobile phones through BLE and allowed to send or
receive a small amount of data only.
Message protocols used for IoT: The IoT environment is generally extremely
heterogeneous, combining gadgets that use various kinds of message protocols and
various communication models. Tschofenig et al. [12] defined four data or message
transfer models: device to device, device to the cloud, device to the gateway, and
back-end data sharing. In device-to-device transfer, protocol design problems that
affect interoperability need to be examined and effective solutions established.
Major challenges:
• The limited data rate that the LP-WAN network can provide,
• More devices will create a greater level of interference with each other,
• Even simple ALOHA channel access by the LP-end gadgets can degrade the
performance of the system, and
• Interoperability is a major challenge, since there are no broad schemes or
tools in the open-source environment to deploy the LP-WAN.
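The interference and ALOHA points above can be illustrated with the classical pure ALOHA
model, in which the expected throughput is S = G·e^(−2G) for an offered load of G frames per
frame time. The sketch below is only an approximation under textbook assumptions (Poisson
traffic, equal frame lengths, no capture effect), and the device counts and per-device
duty cycle are hypothetical values chosen for illustration, not measurements from an
LP-WAN deployment.

```python
import math


def pure_aloha_throughput(offered_load: float) -> float:
    """Expected successful frames per frame time under pure ALOHA: S = G * exp(-2G)."""
    return offered_load * math.exp(-2.0 * offered_load)


def total_offered_load(devices: int, frames_per_device: float) -> float:
    """Total offered load G (frames per frame time) for identical devices."""
    return devices * frames_per_device


per_device = 0.005  # hypothetical: each device transmits 0.5% of the time
for n in (10, 100, 500):
    g = total_offered_load(n, per_device)
    s = pure_aloha_throughput(g)
    success = math.exp(-2.0 * g)  # probability that a given frame avoids collision
    print(f"{n:4d} devices: G = {g:.2f}, S = {s:.3f}, success probability = {success:.3f}")
```

Under this model, throughput peaks at G = 0.5 and then falls as more devices are added,
which is one way to see why a growing device population degrades performance even with
very simple channel access.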
3.10 CONCLUSIONS
This review shows that the IoT offers a novel combination of technology and hardware,
with the opportunity to collect continuous information about all of the physical actions
of a process. Beyond conventional logic, AI makes a system smarter when it is
incorporated appropriately, based on the requirements of the system. A smart system
might also include one or more other technologies, such as big data and the cloud, to
enhance the process. At present, many manufacturing industries and business enterprises
are using AI and IoT to avoid logistics network and supply interruptions or to lessen
their influence on the process. AI-driven IoT units can track and gather
various forms of human interactions and investigations and review the information
before moving from one gadget to another. The importance and contributions of AI
implementation in IoT-based smarter systems or CPS need to be recognized by business
organizations and by the manufacturing and production industries, since it is going to
play a greater role in everyone's life in the near future. This review presents the
benefits of LoRa technology for smarter home automation using LP-WAN. The case study is
discussed from different perspectives of the review to distinguish AI-based systems for
hardware or processor-controlled applications from data handling applications. The rise
of IoT is seen all over the world in creating a relaxed and effortless living environment
through smarter devices such as smart sensing units, controllers, actuators, and many
more. Finally, IoT has reshaped every facet of daily life and has drawn the attention of
major researchers to a new type of lifestyle and to the standard of living achievable
with implemented smart systems.
REFERENCES
[1] Kankanhalli, Atreyi, Yannis Charalabidis, and Sehl Mellouli. 2019. “IoT and AI for
Smart Government: A Research Agenda.” Government Information Quarterly 36 (2):
304–9. https://doi.org/10.1016/j.giq.2019.02.003.
[2] Ghosh, Ashish, Debasrita Chakraborty, and Anwesha Law. 2018. “Artificial Intelligence
in Internet of Things.” CAAI Transactions on Intelligence Technology 3 (4): 208–18.
https://doi.org/10.1049/trit.2018.1008.
[3] Xiaoping, Yang, Dongmei Cao, Jing Chen, Zuoping Xiao, and Ahmad Daowd. 2020.
“AI and IoT-Based Collaborative Business Ecosystem: A Case in Chinese Fish Farm-
ing Industry.” International Journal of Technology Management 82 (2): 151. https://doi.
org/10.1504/ijtm.2020.107856.
[4] Li, Wen, and Sami Kara. 2017. “Methodology for Monitoring Manufacturing Environ-
ment by Using Wireless Sensor Networks (WSN) and the Internet of Things (IoT).”
Procedia CIRP 61: 323–28. https://doi.org/10.1016/j.procir.2016.11.182.
[5] Chatfield, Akemi Takeoka, and Christopher G. Reddick. 2019. “A Framework for Inter-
net of Things-Enabled Smart Government: A Case of IoT Cybersecurity Policies and
Use Cases in U.S. Federal Government.” Government Information Quarterly 36 (2):
346–57. https://doi.org/10.1016/j.giq.2018.09.007.
[6] Ma, Wenting, Olusola O. Adesope, John C. Nesbit, and Qing Liu. 2014. “Intelligent
Tutoring Systems and Learning Outcomes: A Meta-Analysis.” Journal of Educational
Psychology 106 (4): 901–18. https://doi.org/10.1037/a0037123.
[7] Khaled, Mandour, and Salma Raja. 2019. “Smart Homes: Perceived Benefits and Risks
by Swedish Consumers.” Bachelor’s Thesis, Malmo University, Faculty of Technology
and Society, Sweden.
[8] Silverio-Fernández, Manuel, Suresh Renukappa, and Subashini Suresh. 2018. “What Is
a Smart Device?—a Conceptualisation within the Paradigm of the Internet of Things.”
Visualization in Engineering 6 (1). https://doi.org/10.1186/s40327-018-0063-8.
[9] Schuld, Maria, Ilya Sinayskiy, and Francesco Petruccione. 2015. “An Introduction to
Quantum Machine Learning.” Contemporary Physics 56 (2): 172–85.
https://doi.org/10.1080/00107514.2014.964942.
[10] Wu, Xindong, Xingquan Zhu, Gong-Qing Wu, and Wei Ding. 2014. “Data Mining with
Big Data.” IEEE Transactions on Knowledge and Data Engineering 26 (1): 97–107.
https://doi.org/10.1109/tkde.2013.109.
[11] LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015. “Deep Learning.” Nature 521
(7553): 436–44. https://doi.org/10.1038/nature14539.
[12] Tschofenig, Hannes, and Emmanuel Baccelli. 2019. “Cyberphysical Security for the
Masses: A Survey of the Internet Protocol Suite for Internet of Things Security.” IEEE
Security & Privacy 17 (5): 47–57. https://doi.org/10.1109/msec.2019.2923973.
[13] Mukhopadhyay, Subhas Chandra, Sumarga Kumar Sah Tyagi, Nagender Kumar Sury-
adevara, Vincenzo Piuri, Fabio Scotti, and Sherali Zeadally. 2021. “Artificial Intelligence-
Based Sensors for next Generation IoT Applications: A Review.” IEEE Sensors Journal
21 (22): 24920–32. https://doi.org/10.1109/jsen.2021.3055618.
[14] SAS. 2018. “Artificial Intelligence—What It Is and Why It Matters.” www.sas.com/
en_us/insights/analytics/what-is-artificial-intelligence.html.
[15] Yashodha, G., P. R. Pameela Rani, A. Lavanya, and V. Sathyavathy. 2021. “Role of Arti-
ficial Intelligence in the Internet of Things—a Review.” IOP Conference Series: Mate‑
rials Science and Engineering 1055 (1): 012090. https://doi.org/10.1088/1757-899x/
1055/1/012090.
[16] Laxmi, S., N. S. Rudra, K. Hemachandran, and K. N. Santosh. 2021. Machine Learning
Techniques in IoT Applications: A State of the Art. IoT Applications, Security Threats,
and Countermeasures, 105–117.
[17] “What Are LoRa and LoRaWAN?” The Things Network. Accessed July 9, 2022. www.
thethingsnetwork.org/docs/lorawan/what-is-lorawan.
[18] Md. Shahjalal, Moh. Khalid Hasan, Md. Mainul Islam, Md. Morshed Alam, Md. Faisal
Ahmed, and Yeong Min Jang. 2020. “An Overview of AI-Enabled Remote Smart—
Home Monitoring System Using LoRa.” IEEE, 510–13. https://ieeexplore.ieee.org/
document/9065199.
[19] Chen, J., K. Hu, Q. Wang, Y. Sun, Z. Shi, and S. He. 2017. “Narrowband Internet of
Things: Implementations and Applications.” IEEE Internet of Things Journal 4 (6):
2309–14. https://doi.org/10.1109/jiot.2017.2764475.
[20] Ikpehai, Augustine, Bamidele Adebisi, Khaled M. Rabie, Kelvin Anoh, Ruth E. Ande,
Mohammad Hammoudeh, Haris Gacanin, and Uche M. Mbanaso. 2019. “Low-Power
Wide Area Network Technologies for Internet-of-Things: A Comparative Review.” IEEE
Internet of Things Journal 6 (2): 2225–40. https://doi.org/10.1109/jiot.2018.2883728.
[21] Castro Tome, Mauricio de, Pedro H. J. Nardelli, and Hirley Alves. 2019. “Long-Range
Low-Power Wireless Networks and Sampling Strategies in Electricity Metering.”
IEEE Transactions on Industrial Electronics 66 (2): 1629–37. https://doi.org/10.1109/
tie.2018.2816006.
4 Influence of Artificial Intelligence in Robotics
Pingili Sravya, Skanda S. Tallam, Shivani Prabhu, and Anil Audumbar Pise
4.1 INTRODUCTION
A robot is usually treated as a tool, as it can perform only a few limited or specialized
functions; a welding robot is a good example. It has a physical presence in the world
but cannot adapt to fundamental changes in that world. If parts are to be welded by the
robot, clear instructions or plans must be laid out for it to complete the task.
Otherwise, the robot does not function, because the design process focuses on a robot
that executes one particular function, much as a screwdriver is designed to turn screws
and a hammer to drive nails. The process usually includes designing parts and fixtures
that hold the parts in place, making the task easier for the robot.
The alternative to treating a robot as a tool is to treat it as an agent, where an agent
is an entity that can sense and effect change in the world [1]. It is then the human's
job to improve this tool so that it can adapt to changes in the world as humans do. An
automatic vacuum cleaner is a good example of an intelligent robot. Here, the vacuum
cleaner acts as an agent to accomplish its task (vacuuming). It can operate in rooms of
different sizes and layouts because an agent can adjust to new circumstances. By
comparison, a tool may be adapted to unforeseen situations, as when a screwdriver handle
is used to hammer a nail, but the modification was not planned and is usually not the
best option. Science fiction films featuring robots that look and behave like peers to
humans have helped to strengthen the agency view of robots. The design process focuses
on how the agent can interact with the world, especially through sensing and planning,
and on how it adjusts to new but similar situations based on previous experience. To
achieve this, artificial intelligence is used to provide the required functionality.
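The agent view of the vacuum cleaner can be sketched as a small sense, plan, and act loop.
The one-dimensional corridor world, the function name, and the action labels below are
hypothetical simplifications introduced for illustration; a real robot vacuum relies on
far richer sensing and planning.

```python
def run_vacuum_agent(room: list[bool]) -> list[str]:
    """Clean a corridor of cells, where True marks a dirty cell.

    The agent senses only the cell it occupies, decides what to do, and acts,
    so the same code handles any corridor length or dirt layout.
    """
    room = list(room)  # copy so the caller's list is not modified
    position = 0
    actions = []
    while position < len(room):
        dirty = room[position]                 # sense
        action = "suck" if dirty else "move"   # plan
        if action == "suck":                   # act
            room[position] = False
        else:
            position += 1
        actions.append(action)
    return actions


# Two rooms of different size and layout, handled by the same agent.
small_room = run_vacuum_agent([True, False])
large_room = run_vacuum_agent([False, True, True, False, True])
```

A tool-style design would instead hard-code one fixed action sequence for one known room,
which is the contrast drawn in the paragraph above.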
Figure 4.1 shows that robotics encompasses both artificial intelligence (AI) and joint
cognitive systems. The joint cognitive system is a recent approach in which a robot is
considered a member of a human-machine team whose intelligence is synergistic and
emanates from the contributions of each member. Because there is at least one robot and
one human agent on the team, it is often referred to as a mixed team. One example of a
joint cognitive system is the self-driving automobile, where the driver may switch the
autonomous driving on and off. The design process focuses on how the agents work
together and coordinate to achieve the team's objectives. Joint cognitive systems treat
robots differently from peer agents with distinct agendas.
environment. In this case, the cost of 1,000 cars is minimal, but moving the model
from the simulated environment to the actual world is tough.
A key differentiator in reinforcement learning is how the agent is trained. Instead of
learning from a provided dataset, the model interacts with the environment and looks for
ways to maximize rewards. In deep reinforcement learning, neural networks take over the
storage of experiences to improve task completion.
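The interaction loop described above can be sketched with tabular Q-learning, a simpler
technique in which a lookup table stands in for the neural network. The corridor
environment, the reward of 1 at the goal, and all hyperparameter values are assumptions
chosen for illustration and are not taken from the chapter.

```python
import random


def train_q_table(n_states: int = 5, episodes: int = 500, alpha: float = 0.5,
                  gamma: float = 0.9, epsilon: float = 0.2, seed: int = 0):
    """Tabular Q-learning on a corridor: start at state 0, reward 1 at the last state.

    Actions: 0 = step left, 1 = step right. A table replaces the neural
    network that a deep reinforcement learning system would use.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Explore with probability epsilon (or on ties), otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Move the estimate toward reward + discounted best next value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train_q_table()
# Greedy action for every non-goal state after training.
greedy_policy = ["right" if right > left else "left" for left, right in q_table[:-1]]
```

No labelled dataset is supplied here: the agent improves only through the rewards it
collects while acting, which is the distinction the paragraph draws.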
4.5 APPLICATION OF AI
The benefits of combining AI and robots for industrial applications are already evident
in many factories. Even so, a few challenges remain to be overcome. One of the main
issues is the specialized talent needed to integrate AI into industry. There is still a
gap between the professional AI community and industry experts.
AI makes robots more efficient because they learn on their own to recognize new
situations or objects. Today, robotics is used mainly in manufacturing but also in many
other fields, performing a wide range of movements more efficiently and precisely than
humans. Robots perform demanding tasks, such as handling boxes in warehouses. Here, we
examine the applications of AI robotics in different fields and the training data used
to make a robot industry-ready.
of the time. In contrast, the accuracy rate for human health condition detection
was 86%. This accuracy is underpinned by the fact that robots and AI can scan
thousands of cases simultaneously, looking for correlations between hundreds
of variables.
additional personnel, and virtual robots can be deployed at a nominal extra charge.
Virtual service robots (chatbots, virtual agents, and many more) can scale at almost
no additional cost. Such dramatic scalability applies both to virtual service robots
such as chatbots and to “visible” robots such as holograms. For example, some airports
use hologram-based humanoid assistance robots to help passengers and answer general
questions (e.g., flight information, check-in counters, directions to airport hotels,
and many more) in all commonly used languages. These holograms require only affordable
hardware (such as cameras, microphones, speakers, and projectors) and do not take up any
floor space (travelers are encouraged to use luggage carts when the holograms are
complete).
4.7 CONCLUSION
AI is central to new companies that are creating computational intelligence models.
There are ongoing debates about whether AI can be harmful or might even surpass human
intelligence, which may be unsettling. However, AI has many uses, such as
problem-solving, reasoning, and language comprehension. AI makes our lives easier by
carrying out activities such as helping us navigate busy streets or holding many
conversations at once, and it can run on any conventional computer or phone. There is
also no need to develop another type of computer or technology, because we already have
the technology to integrate AI, which can support the vast complexities of human
intelligence. AI is the future of the technology field; it can be harmful or even
surpass human intelligence, but only if it is not monitored or if its programs are not
written clearly.
REFERENCES
[1] Murphy, R. R. (2019). Introduction to AI robotics, second edition (Intelligent robotics
and autonomous agents series) (Second edition). Bradford Books.
[2] Bullock, M. (2022, March 30). Artificial general intelligence in plain English—towards
data science. Medium. https://towardsdatascience.com/artificial-general-intelligence-
in-plain-english-e8f6e9a56555.
[3] GeeksforGeeks. (2019, October 29). Artificial intelligence in robotics. www.geeksfor
geeks.org/artificial-intelligence-in-robotics/.
[4] Escott, E. (2017, October 24). What are the 3 types of AI? A guide to narrow, general,
and super artificial intelligence. Codebots. https://codebots.com/artificial-intelligence/
the-3-types-of-ai-is-the-third-even-possible.
[5] Han, C., Sun, X., Liu, L., Jiang, H., Shen, Y., Xu, X., Li, J., Zhang, G., Huang, J., Lin, Z.,
Xiong, N., & Wang, T. (2016). Exosomes and their therapeutic potentials of stem cells.
Stem Cells International, 2016, 1–11. https://doi.org/10.1155/2016/7653489.
[6] Towards Future Farming: How Artificial Intelligence is Transforming the Agriculture
Industry—Wipro. (2020). Wipro. www.wipro.com/holmes/towards-future-farming-
how-artificial-intelligence-is-transforming-the-agriculture-industry/#:%7E:text=AI%20
systems%20are%20helping%20to,to%20apply%20within%20the%20region.
[7] Kumar, S. (2021, December 12). Advantages and disadvantages of artificial intelligence.
Medium. https://towardsdatascience.com/advantages-and-disadvantages-of-artificial-
intelligence-182a5ef6588c.
5 A Review of Applications of Artificial Intelligence and Robotics in the Medical and
Healthcare Sector
Pokala Pranay Kumar, Dheeraj Anchuri, Pusarla Bhuvan Sathvik, and Raul V. Rodriguez
5.1 INTRODUCTION
In today’s world, people are fighting dangerous diseases that are hard to stop. Compared
with 15 years ago, there has been a great deal of improvement in every sector, including
medicine, finance, manufacturing, technology, and textiles. This change has
revolutionized human thinking and prompted people to think in ways that were not
imaginable before, and in many cases it has saved lives. In the same way, evolutionary
changes happen in human life to protect humans. Today technology plays a major role in
every sector. In manufacturing, for example, the previous generation did the work by
hand or with human-controlled machinery. Now robots have been introduced and have
changed the face of the manufacturing sector with their automation power. The same is
true of the medical field, which has created huge opportunities for individuals to
showcase brilliant work. Previously, there were no x-rays to scan fractures and no
advanced medical facilities to treat harmful diseases such as tumours. Now we have
medicine to treat advanced tumours, and fractures can be detected easily using mobile or
software applications. The medical and healthcare sectors adopt the latest technology to
help people fight diseases. Scientists are trying to improve the biological processes
that make us strong and help us fight harmful diseases through our DNA. DNA helps in
finding solutions to many of these problems. Bioinformatics and DNA sequencing are
fields that combine technology and medicine. These emerging technologies help to save
human lives. For example, COVID-19 spread across the world while no medication was
available. But with advances in technologies such as artificial intelligence (AI),
machine learning (ML), and robotics, scientists found a vaccine that helps people
produce antibodies to fight the COVID virus.
DOI: 10.1201/9781003328414-5
5.3.3 Robotics in Healthcare
Medical robots are changing how operations are performed, streamlining supply delivery
and disinfection, and freeing up time for clinicians to interact with patients. Intel
provides a broad technology portfolio for the development of robots in medical care,
including surgical-assistance, modular, service, social, mobile, and autonomous robots [9].
“A model example is mechanical robots are utilized for the gathering of clin-
ical syringes (FANUC/Farason) or filling and shutting of vials (Stäubli/Zellwag
Pharmatech)” [11].
5.3.3.4 Radiology
Radiology is an important area for the development of robots, given the high levels of
radiation involved and the resulting hazards to human safety. Siemens’ twin robotic
x-ray system is an advance in fluoroscopy, angiography, and 3D imaging. The clinician
can view 3D images in real time from a single room while the robot moves relative to
the patient. This capability plays a major role in x-ray imaging. Conventional 2D
x-rays do not always reveal fine hairline fractures in bone; the aim is to improve
diagnostic accuracy with a 3D scan view. The 3D image can be produced on the same
x-ray system, a task that would otherwise call for a computed tomography (CT)
system [13].
“Error-free Robotic cyberknife treatment (Cyberknife Exactness, Sunnyvale,
USA) is used in malignant growth patients with radiation treatment, it gives stereo-
tactic radiation (SRS) treatment and stereotactic body radiation (SBRT), innovative
automated accuracy therapies wherever in the body and coordinated synchroniza-
tions of movements in real-time” [13].
5.3.3.5 Nanorobots
Although nanotechnology has not yet reached this stage, the trend is clearly moving
towards it. As digestible and digital pills emerge, we are gradually approaching
nanorobots [14]. Researchers at the Max Planck Institute are experimenting with robots
that physically swim through bodily fluids and could be used to deliver medication or
other medical relief in a highly targeted way; they are extremely small, less than
1 millimetre in size [14]. These scallop-like microbots are designed to swim through
non-Newtonian fluids such as the bloodstream, the fluid of the lymphatic system, or the
slippery film on the surface of the skin or eyes. Despite its small size, the origami
robot is equally as stunning as a super-fortified carrier. The capsule containing it is
swallowed and dissolves in the patient’s stomach. The robot can then repair wounds in
the stomach lining or remove foreign objects, for instance swallowed toys, with the aid
of magnetic fields controlled by a technician [14].
genomic data, and electronic health records, to address a range of medical prob-
lems, such as a reduction in error diagnosis rates and a prediction of procedures [15].
between consumer health wearables and medical devices, a single wearable
device can now monitor a range of clinical risk factors [19]. Deep learning is viewed
as a vital component in analysing this new kind of data. There has not yet been any
major evaluation of deep learning on wearable sensor data. A few studies have dealt
with phone and medical monitor data,
“specifically, applicable examinations dependent on profound learning were done
on Human Movement Acknowledgment (HAR)” [19].
5.3.5.4 Telemedicine
Telemedicine has been in use for four decades and has advanced over time. Online
consultations, video conferences, wireless devices, and similar tools are used
extensively. Initial diagnosis, remote monitoring of patients, and medical education
are also carried out remotely. Telesurgery robots perform surgery on the doctor’s
commands in real time, without the doctor being present at the surgery location.
Telemedicine makes personalized treatment possible while reducing patient visits to
the hospital and the cost of travel and admission. It also allows patients to be
monitored from anywhere in the world [27].
5.3.5.5 Opioid Abuse in the United States
Big data is finding use in the healthcare sector for the detection, diagnosis,
prevention, and treatment of diseases and disorders, and it helps in dealing with social
issues that must be addressed to maintain public health. One example is the prevention
of opioid abuse in the United States, which is a national health issue.
Opioid overdoses have become a leading cause of accidental death in the country. Experts
used big data drawn from years of insurance and pharmacy records. A total of 742 risk
factors were identified, and those at risk were alerted. This saves many lives and a
great deal of money for the country [27]. Many further experiments and applications are
under development that may change the face of the medical field.
5.4 CONCLUSION
The applications reviewed here give an overview of how technology has been used to
save lives. Many solutions to currently unsolved medical problems are likely to emerge,
improving the medical and healthcare sectors so that every individual can obtain
medicine at lower cost. These advancements help to reduce costs and save the time of
doctors and specialists. In the future, with the help of robotics, people may be
treated by robots. Robotic surgery is in the testing stage, and medical research
institutes worldwide are working to move it from testing to practical deployment.
Technology gives humankind a chance to live a long, healthy life.
ACKNOWLEDGEMENT
We would like to thank Dr Raul V. Rodriguez for guiding us and thank the authors
and organizations who provided valuable information that helped in the completion
of this chapter.
REFERENCES
[1] K. Shailaja, B. Seetharamulu, and M. A. Jabbar, “Machine learning in healthcare:
A review,” in Proceedings of the 2nd International Conference on Electronics,
Communication and Aerospace Technology, ICECA 2018, Sep. 2018, pp. 910–914, doi:
10.1109/ICECA.2018.8474918.
[2] R. Miotto, F. Wang, S. Wang, X. Jiang, and J. T. Dudley, “Deep learning for healthcare:
Review, opportunities and challenges,” Brief. Bioinform., vol. 19, no. 6, pp. 1236–1246,
May 2017, doi: 10.1093/bib/bbx044.
[3] Michelle Rice, “The Growth of Artificial Intelligence (AI) in Healthcare,” HRS, 2019.
www.healthrecoverysolutions.com/blog/the-growth-of-artificial-intelligence-ai-
in-healthcare (accessed Jun. 29, 2021).
[4] A. Bohr and K. Memarzadeh, “The rise of artificial intelligence in healthcare applica-
tions,” in Artificial Intelligence in Healthcare, Elsevier, 2020, pp. 25–60.
[5] C. Toh and J. P. Brody, “Applications of machine learning in healthcare,” in Smart Manufacturing—
When Artificial Intelligence Meets the Internet of Things, IntechOpen, 2021.
[6] “Top 10 applications of machine learning in healthcare—FWS,” Flatworld Solutions,
2017. www.flatworldsolutions.com/healthcare/articles/top-10-applications-of-machine-
learning-in-healthcare.php (accessed Jun. 29, 2021).
[7] Olena Kovalenko, “12 Real-world applications of machine learning in healthcare—SPD
group blog,” SPD GROUP Blog, 2020. https://spd.group/machine-learning/machine-
learning-in-healthcare/ (accessed Jun. 29, 2021).
[8] Prashant Kathuria, “12+ Machine learning applications enhancing healthcare sector
2021 | upGrad blog,” upgrad, 2021. www.upgrad.com/blog/machine-learning-applications-
in-healthcare/ (accessed Jun. 29, 2021).
[9] “Robotics in healthcare: The future of medical care—intel,” Intel, 2020. www.intel.
com/content/www/us/en/healthcare-it/robotics-in-healthcare.html (accessed Jun.
29, 2021).
[10] P. Veerabhadram, “Applications of robotics in medicine—IJSER journal publication,”
IJSER J., 2011. www.ijser.org/onlineResearchPaperViewer.aspx?Applications-of-
Robotics-in-Medicine.pdf (accessed: Jun. 29, 2021).
[11] “The role of robots in healthcare—international federation of robotics,” IFR, 2021.
https://ifr.org/post/the-role-of-robots-in-healthcare (accessed Jun. 29, 2021).
[12] Mark Crawford, “Top 6 robotic applications in medicine—ASME,” ASME, 2016. www.
asme.org/topics-resources/content/top-6-robotic-applications-in-medicine (accessed
Jun. 29, 2021).
[13] Z. H. Khan, A. Siddique, and C. W. Lee, “Robotics utilization for healthcare digitization
in global COVID-19 management,” Int. J. Environ. Res. Public Health, vol. 17, no. 11,
p. 3819, Jun. 2020, doi: 10.3390/ijerph17113819.
[14] “Benefits of robotics in healthcare: Tasks medical robots will undertake,” The
Medical Futurist, 2019. https://medicalfuturist.com/robotics-healthcare/ (accessed
Jun. 29, 2021).
[15] Tommaso Buonocore, “Deep learning & healthcare: All that glitters ain’t gold | by tom-
maso buonocore | towards data science,” Towards Datascience, 2020. https://towards
datascience.com/deep-learning-in-healthcare-all-the-glitters-aint-gold-4913eec32687
(accessed Jun. 29, 2021).
[16] Y. Xu, H. Yao, and K. Lin, “An overview of neural networks for drug discovery and the
inputs used,” Expert Opinion on Drug Discovery, vol. 13, no. 12. Taylor and Francis Ltd,
pp. 1091–1102, Dec. 02, 2018, doi: 10.1080/17460441.2018.1547278.
[17] Meenu EG, “These are the top applications of deep learning in healthcare,” Analytics Insight,
2021. www.analyticsinsight.net/these-are-the-top-applications-of-deep-learning-
in-healthcare/ (accessed Jun. 29, 2021).
[18] A. Rajkomar et al., “Scalable and accurate deep learning with electronic health records,”
npj Digit. Med., vol. 1, no. 1, p. 18, Dec. 2018, doi: 10.1038/s41746-018-0029-1.
[19] R. Miotto, F. Wang, S. Wang, X. Jiang, and J. T. Dudley, “Deep learning for healthcare:
Review, opportunities and challenges,” Brief. Bioinform., vol. 19, no. 6, pp. 1236–1246,
May 2017, doi: 10.1093/bib/bbx044.
[20] K. Shameer, M. A. Badgeley, R. Miotto, B. S. Glicksberg, J. W. Morgan, and J. T. Dud-
ley, “Translational bioinformatics in the era of real-time biomedical, health care and
wellness data streams,” Brief. Bioinform., vol. 18, no. 1, pp. 105–124, Jan. 2017, doi:
10.1093/bib/bbv118.
[21] S. Liu, S. Liu, W. Cai, S. Pujol, R. Kikinis, and D. Feng, “Early diagnosis of Alzheimer’s
disease with deep learning,” in 2014 IEEE 11th International Symposium on Biomedical
Imaging, ISBI 2014, Jul. 2014, pp. 1015–1018, doi: 10.1109/isbi.2014.6868045.
[22] T. Brosch and R. Tam, “Manifold learning of brain MRIs by deep learning,” in Lecture
Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence
and Lecture Notes in Bioinformatics), 2013, vol. 8150 LNCS, no. PART 2, pp. 633–640,
doi: 10.1007/978-3-642-40763-5_78.
[23] Y. Yoo, T. Brosch, A. Traboulsee, D. K. B. Li, and R. Tam, “Deep learning of image
features from unlabeled data for multiple sclerosis lesion segmentation,” Lect. Notes
Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol.
8679, pp. 117–124, 2014, doi: 10.1007/978-3-319-10581-9_15.
[24] J. Z. Cheng et al., “Computer-Aided Diagnosis with Deep Learning Architecture: Appli-
cations to Breast Lesions in US Images and Pulmonary Nodules in CT Scans,” Sci. Rep.,
vol. 6, no. 1, pp. 1–13, Apr. 2016, doi: 10.1038/srep24454.
[25] S. Dash, S. K. Shakyawar, M. Sharma, and S. Kaushik, “Big data in healthcare: Manage-
ment, analysis and future prospects,” J. Big Data, vol. 6, no. 1, pp. 1–25, Dec. 2019, doi:
10.1186/s40537-019-0217-0.
[26] R. Pastorino et al., “Benefits and challenges of big data in healthcare: An overview of the
European initiatives,” Eur. J. Public Health, vol. 29, no. Suppl 3, pp. 23–27, Oct. 2019,
doi: 10.1093/eurpub/ckz168.
[27] Sandra Durcevic, “18 Examples of big data in healthcare that can save people,”
Datapine, 2020. www.datapine.com/blog/big-data-examples-in-healthcare/ (accessed
Jun. 30, 2021).
[28] R. V. Rodriguez, P. S. Sairam, and K. Hemachandran (eds.), Coded Leadership:
Developing Scalable Management in an AI-induced Quantum World, CRC Press, India, 2022.
6 Impact of the AI-Induced App ‘Babylon’ in the Healthcare Industry
Satheesh Kumar M, Tanmai Sree Musalamadugu,
Neelam Kumari, and Raul V. Rodriguez
6.1 INTRODUCTION
Artificial intelligence can be used in healthcare organization for planning and
resource allocation in health and social care services. Harrow Council, for
instance, is piloting the IBM Watson Care Manager system to improve cost
efficiency. It matches individuals with a care provider that meets their needs within
their allocated care budget. In addition to designing individual care plans, it is said
to offer insights for more effective use of care management resources. Similar
techniques are employed to improve the patient experience. To facilitate patient
engagement, IBM Watson and Alder Hey Children’s Hospital in Liverpool are developing
a “cognitive hospital” that includes an app. Before a visit, the app will identify
patient anxieties, provide information on request, and give doctors data to help them
deliver appropriate treatment.
Artificial intelligence (AI) can be used to analyse and identify patterns in
large and complex data sets faster and more precisely than was previously possible.
It can also be used to combine different types of data and to identify relevant
sources for further study. The canSAR database from the Institute of Cancer Research
combines genetic and clinical data from patients with information from scientific
research and uses AI to predict new targets for cancer drugs. To make the process of
drug discovery faster and more economical, researchers have developed an AI “robot
scientist” called Eve (K. Williams, 2015). AI systems used in healthcare could also
support medical research by helping to match suitable patients to clinical trials.
Clinical care: AI is currently being trialled in some hospitals in the UK to assist in
the diagnosis of disease. AI could likewise be used to analyse clinical data, research
publications, and professional guidelines to help inform treatment decisions
(E. L. Siegel 2013).
The following are some potential uses of AI in clinical care:
• Medical imaging: Medical scans have long been systematically collected and stored,
and they are now readily available for training AI systems. AI could reduce the cost
and time involved in analysing scans, which could allow more scans to be taken so that
treatment is better targeted. AI has shown promising results in detecting conditions
such as pneumonia, skin and breast cancers, and eye diseases (D. Wang, 2016).
• Echocardiography: The Ultromics system, trialled at the John Radcliffe Hospital in
Oxford, uses AI to analyse echocardiography scans that detect patterns of heartbeats
and diagnose coronary heart disease.
• Neurological condition screening: AI tools are being developed to help clinicians
predict psychotic episodes and to identify and monitor the symptoms of neurological
conditions such as Parkinson’s disease.
• Surgery: In research, robotic tools controlled by AI have been used to carry out
specific tasks in keyhole surgery, such as tying knots to close wounds.
DOI: 10.1201/9781003328414-6
The start-up claims that in its own tests, the AI system was accurate 80 percent
of the time and that the tool was never designed to completely replace the advice
of a qualified doctor, only to reduce waiting time and to help with triage and with
making more accurate decisions. The world is facing a severe shortage of
doctors and healthcare professionals, and the technologies offered by Babylon are
one way to help improve healthcare for a large number of people.
According to NHS England, “every safety case [of Babylon] meets
the rules required by the NHS and has been carried out using a sound assessment
strategy to a heightened desire.”
While it may not be a perfect system, Babylon shows that artificial intelligence
has advanced sufficiently to work alongside healthcare professionals and can be a
valuable tool. Even so, patients still need to be their own advocates for their
healthcare. If advice obtained from artificial intelligence does not seem right,
it is wise to ask for a second opinion from a human.
6.10 CONCLUSION
AI technologies are being used or trialled for a range of healthcare and
research purposes, including the detection of disease, the management
of chronic conditions, the delivery of health services, and drug discovery.
However, they may be limited by the quality of the available health data and by the
inability of AI to possess some human traits, such as compassion. Advances in AI can
help to address major health challenges. The use of AI raises a number of significant
social issues, many of which overlap with issues raised by the use of data and of
healthcare technologies more broadly. A key challenge for the future governance of AI
technologies will be ensuring that AI is developed and used in a way that is
transparent and compatible with the public interest, while stimulating and driving
innovation in this field.
BIBLIOGRAPHY
Insights, C.B. 2017. AI, healthcare & the future of drug pricing, s.l.: s.n.
Jacobsmeyer, B. 2012. Focus: Tracking down an epidemic’s source physics, s.l.: s.n.
Moore, S.F. 2018. Harnessing the power of intelligent machines to enhance primary care,
s.l.: s.n.
Release, I.P. 2017. Arthritis research UK introduces IBM watson-powered ‘virtual assistant’ to
provide information to people with Arthritis. s.l., s.n.
Shafner, L. 2017. Evaluating the use of an artificial intelligence platform on mobile devices to
measure and support tuberculosis medication adherence, s.l.: s.n.
Siegel, E.L. 2013. Artificial intelligence in medicine and cardiac imaging, s.l.: s.n.
Wang, D. 2016. Deep learning for identifying metastatic breast cancer, s.l.: s.n.
Williams, K. 2015. Cheaper faster drug development validated by the repositioning of drugs
against neglected tropical diseases, s.l.: s.n.
7 Identification and Prediction of Pneumonia from CXR Images Using Deep Learning
Paul Nirmal Kumar K, Raul V. Rodriguez, and
Pokala Pranay Kumar
7.1 INTRODUCTION
Pneumonia is an inflammatory condition of the lungs that mainly affects the alveoli,
the small air sacs [1]. It is generally caused by infection with viruses, bacteria, or
other microorganisms. In around half of cases, the causative agent cannot be isolated
even with careful testing. The risk of pneumonia is higher in people with conditions
such as asthma, diabetes, chronic obstructive pulmonary disease (COPD), sickle cell
disease (SCD), or a weak immune response, and in those affected by smoking,
alcoholism, exposure to air pollution, malnutrition, or poverty [2]. The disease
presents with symptoms such as difficulty breathing, cough, sharp chest pain, and
fever. Pneumonia causes around 4.5 million premature deaths annually, and around
120 million people are infected [3]. Based on the agent that causes it, pneumonia is
classified into the following types:
DOI: 10.1201/9781003328414-7
Viral Pneumonia: Viruses enter the lungs in droplets that are inhaled through the
mouth and nose. Once they reach the alveoli, the viruses start invading cells and try
to kill them. When the immune system responds to the invading viruses, the condition
gets even worse and eventually impairs the oxygen supply. Viruses cause about
one-third of pneumonia cases in adults. More broadly, viral pneumonia can get out of
control during virus outbreaks such as that of the coronavirus.
Fungal Pneumonia: Fungal pneumonia is seen mostly in people with a weak
immune system, for example, due to acquired immune deficiency syndrome
(AIDS) and other health issues. This kind of pneumonia can even be caused by
a latent infection that is later reactivated by other factors.
Parasitic Pneumonia: Parasitic pneumonia is rare compared with other
causes of pneumonia. Parasites enter the body through direct contact with the skin
or through inhalation. After entering the body, they travel to
the lungs and cause inflammation and an impaired oxygen supply.
The diagnosis is generally confirmed by blood tests, physical examination or chest
x-rays (CXRs) [4–6].
Physical examination: Findings notably include a raised heart rate, decreased oxygen
levels, or crackles heard through a stethoscope. Reduced expansion of the chest during
breathing may also be noted.
CXR: The CXR is the most commonly used method for diagnosing chest-related diseases.
Pneumonia is hard to diagnose from a CXR in the presence of dehydration, obesity, or
other lung-related health issues. Based on CXRs, pneumonia is further classified into
the following types:
Most commonly, pneumonia is classified by where it was acquired, so that the agents
suspected of causing the disease can be identified and the patient treated accordingly.
There are two such classes: community-acquired pneumonia (CAP) and health
care–associated pneumonia (HCAP).
CAP: Pneumonia acquired outside the health care system is called CAP. It is treated
with antibiotics that kill the infecting organisms.
HCAP: Pneumonia acquired within the health care system, such as in hospitals and other
medical clinics, is called HCAP. This type is also called medical care–associated
pneumonia (MCAP).
Artificial intelligence aims to reproduce and amplify one of the most powerful
phenomena in the world: intelligence. The field has seen huge advances in image
recognition and computer vision, which use images as data. Deep learning refers to
artificial neural networks, algorithms inspired by the human brain, that learn from
large amounts of unstructured, highly diverse, and inter-related data. A deep learning
algorithm has a logical, layered structure and is designed to draw conclusions in a
way similar to humans by repeatedly analyzing data. A working deep learning algorithm
can itself assess the accuracy of its predictions, and its performance improves as it
learns from more data. By performing a task repeatedly and making small adjustments
each time, these algorithms learn from experience. The networks have many layers,
that is, they are deep, hence the name deep learning. Since data is the fuel for deep
learning algorithms, the enormous amount of data now being produced makes such
intelligent technology possible. This technology has aided many industries, and health
care is one of them. In this chapter, we use deep learning algorithms, specifically
convolutional neural networks (CNNs), to classify CXR images as either ‘pneumonia’ or
‘normal’ using data provided by Mendeley Datasets: ‘Labeled Optical Coherence
Tomography (OCT) and Chest X-Ray (CXR) Images for Classification’.
specific folders, namely ‘train’ and ‘test’. Each of these folders contains two folders,
namely ‘Normal’ and ‘Pneumonia’. The ‘Normal’ folder contains CXR images with
no pneumonia, and the ‘Pneumonia’ folder contains CXR images with pneumonia.
In the ‘train’ folder, there are 1,349 normal CXR images and 3,883 pneumonia CXR
images, whereas in the ‘test’ folder, there are 234 normal CXR images and 390
pneumonia images. We further split the data into another folder, namely ‘val’,
alongside ‘test’ and ‘train’, in order to validate our model. The ‘val’ folder
likewise contains two folders, ‘Normal’ and ‘Pneumonia’, with eight CXR images in
each. All of these images are in JPEG format and in grayscale. The dataset was
obtained from http://dx.doi.org/10.17632/rscbjbr9sj.2. The final dataset and its
contents are represented in Figure 7.1.
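The split sizes quoted above can be tallied in a few lines, which makes the class imbalance in the training data easy to see. The following is a minimal sketch only: the counts are copied from the text, while the dictionary layout and the helper name `split_summary` are assumptions made for illustration and are not part of the authors' code.

```python
# Image counts per split and class, as reported in the text above.
counts = {
    "train": {"Normal": 1349, "Pneumonia": 3883},
    "test": {"Normal": 234, "Pneumonia": 390},
    "val": {"Normal": 8, "Pneumonia": 8},
}


def split_summary(split_counts):
    """Return (total images, share of pneumonia images) for one split."""
    total = sum(split_counts.values())
    return total, split_counts["Pneumonia"] / total


for name, split_counts in counts.items():
    total, share = split_summary(split_counts)
    print(f"{name}: {total} images, {share:.1%} pneumonia")
```

Roughly three in four training images belong to the pneumonia class, which is worth keeping in mind when reading an overall accuracy figure.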
• Convolution: The first layer of a CNN is the convolution layer, which extracts
features from images. A filter matrix is applied to a region of the input image matrix
to produce an activation, and repeating this across the image produces a map of
activations called a feature map. The height and width of the filter matrix are chosen
to be smaller than those of the input matrix [8, 9]. Convolving an image with different
filters or kernels performs different operations, such as blurring, sharpening, edge
detection and image identification, and creates the corresponding feature map
(Figure 7.3).
• Strides: The stride is the number of pixels by which the filter or convolution kernel
shifts over the input image at each step; with a stride of 1, the filter matrix shifts
by 1 pixel. The stride is set to 1 by default (Figure 7.4) and can be changed as needed.
• Padding: When the filter matrix moves over the input matrix, values at the borders
and corners are covered by the filter less often than interior values and so carry less
weight. Padding is used to overcome this. There are two types: zero padding and valid
padding. In zero padding, zeros are added around the border of the input matrix; in
valid padding, no values are added, and the part of the input matrix that the filter
cannot fully cover is dropped.
FIGURE 7.4 Shifting the filter matrix over the input matrix when the stride is 1.
• Activation function: The activation function decides whether a neuron should be
activated, based on the weighted sum of the neuron’s inputs plus a bias. Several types
of activation function are available, such as the linear, sigmoid, ReLU and tanh
activation functions.
• Pooling: Pooling is used to extract the most important features from the feature
map. It sub-samples the feature map so that only the important features are retained.
There are three pooling methods: max pooling, average pooling and sum pooling. Max
pooling takes the largest value in each spatial neighborhood of the feature map,
average pooling takes the average of the values in each neighborhood, and sum pooling
takes their sum.
Together, a convolutional layer and a pooling layer form a convolutional block.
• Flatten layer: After pooling is done for all the feature maps, a number of
pooled feature maps are produced. In order to feed these features to a neural
network, we use the process of flattening the pooled feature maps, which
results in a vector with all the features (Figure 7.5).
• Fully connected layer: A fully connected layer is a collection of layers of
neurons forming a neural network. This is the last layer of the CNN archi-
tecture. The vector values formed in the flatten layer are fed to these neural
networks, which are totally connected to one another. These fully connected
layers have individual activation functions which are based on the problem
[10–13]. This forms a whole CNN model.
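The building blocks listed above can be traced on a tiny example. The sketch below is a plain-Python illustration rather than the chapter's implementation: the 4 × 4 image, both filters and all function names are hypothetical, and a real model would rely on a deep learning library instead. As is usual in CNNs, the filter is applied without flipping.

```python
def zero_pad(matrix, pad):
    """Surround a 2D list with `pad` rings of zeros (zero padding)."""
    width = len(matrix[0]) + 2 * pad
    padded = [[0.0] * width for _ in range(pad)]
    for row in matrix:
        padded.append([0.0] * pad + list(row) + [0.0] * pad)
    padded.extend([0.0] * width for _ in range(pad))
    return padded


def convolve2d(image, kernel, stride=1, pad=0):
    """Slide `kernel` over `image` and return the resulting feature map."""
    image = zero_pad(image, pad) if pad else [list(r) for r in image]
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            row.append(sum(
                image[top + a][left + b] * kernel[a][b]
                for a in range(k_h) for b in range(k_w)
            ))
        feature_map.append(row)
    return feature_map


def relu(feature_map):
    """ReLU activation: negative values become zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling over size x size neighborhoods."""
    out_h, out_w = len(feature_map) // size, len(feature_map[0]) // size
    return [
        [max(feature_map[i * size + a][j * size + b]
             for a in range(size) for b in range(size))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def flatten(feature_map):
    """Flatten a 2D feature map into a single feature vector."""
    return [v for row in feature_map for v in row]


# Hypothetical 4 x 4 grayscale image and filters, for illustration only.
image = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
]
kernel = [[1, 0], [0, -1]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

feature_map = convolve2d(image, kernel)        # stride 1, no padding: 3 x 3 map
strided = convolve2d(image, kernel, stride=2)  # stride 2: 2 x 2 map
same_size = convolve2d(image, identity, pad=1) # zero padding keeps the 4 x 4 size
vector = flatten(max_pool(relu(same_size)))    # pooled to 2 x 2, then 4 values
```

With a 3 × 3 filter, one ring of zero padding keeps the output the same size as the input, whereas without padding each convolution shrinks the feature map.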
1. Before the CNN is built, data augmentation and data generation are applied to make
the algorithm more effective. Data augmentation artificially increases the size of the
training data. It adds diversity to the data through techniques such as rotating,
flipping and cropping the images. Data generation creates batches of data that are fed
into the network. The batch size is generally 32, 64 or 128 and must be chosen based on
the computational resources available and on the model performance [14–16]. The
current model uses a batch size of 64.
2. The model consists of five convolutional blocks with 16, 32, 64, 128 and 256
filters, respectively. Each block uses a 3 × 3 kernel (filter), zero padding, batch
normalization, the ReLU activation function, max pooling, and dropout to reduce
overfitting. The model is compiled with the Adam optimizer and binary cross-entropy
loss. Adam is an algorithm that optimizes the model by updating the weights of the
neural network. It is an extension of the gradient descent optimizer, and its name
comes from adaptive moment estimation, the technique it uses to adapt the weight
updates.
3. Once the CNN is created, the model uses callbacks such as model checkpointing and
early stopping in order to limit loss, overfitting and underfitting. The model
checkpoint saves a copy of the model at the point in the iterative training process
where it has performed best. Early stopping halts the training iterations when the
generalization gap, the difference between training and validation performance,
starts to increase [17, 18]. When such an increase is detected, the model reverts to
the checkpoint at which training had performed well.
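Two of the ideas in the steps above, the Adam update and early stopping with a checkpoint, can be sketched without any deep learning library. This is an illustrative sketch under stated assumptions, not the chapter's code: the hyperparameter defaults are the commonly used Adam values, the toy loss (w − 3)² stands in for a network loss, and the patience rule on validation loss is one simple way to detect that generalization has stopped improving.

```python
import math


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar weight; returns (w, m, v)."""
    m = beta1 * m + (1 - beta1) * grad         # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


def early_stopping(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    The best epoch is where a checkpoint would be saved; training stops once
    the loss has failed to improve for `patience` consecutive epochs.
    """
    best_loss, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0  # new checkpoint
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Toy problem: minimise (w - 3)^2 with Adam, starting from w = 0.
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    grad = 2 * (w - 3.0)  # gradient of the toy loss
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)

# Hypothetical validation losses, one per epoch.
best_epoch, stop_epoch = early_stopping([0.9, 0.7, 0.6, 0.62, 0.61, 0.65, 0.5])
```

In the hypothetical loss sequence, the best value occurs at epoch 2 and training stops at epoch 5, so the later improvement at epoch 6 is never reached; this is the trade-off a patience setting makes.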
7.5 EVALUATION
The model was developed in the Python programming language. When the model accuracy
and loss are plotted, it is seen that as the number of epochs increases, the loss
decreases and the accuracy increases, and the training and validation values
converge. The confusion matrix created for the model indicates an accuracy of about
91% (Figure 7.6).
The confusion matrix of the model shown in Figure 7.7 summarizes how the
model has performed and yields metrics such as accuracy, precision, recall
and F1-score. Precision is the ratio of correctly predicted positives to the
total positives predicted. Recall is the ratio of correctly predicted
positives to the total positive examples in the data [21]. The F1-score
combines precision and recall as their harmonic mean.
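These definitions translate directly into code. The sketch below computes the four metrics from the cells of a binary confusion matrix. The counts in the usage example are hypothetical and are not the chapter's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives and true negatives.
    """
    precision = tp / (tp + fp)  # correct positives / predicted positives
    recall = tp / (tp + fn)     # correct positives / actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```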
7.6 CONCLUSION
Considering the size of the data used, which was relatively small, the accuracy
gained is comparatively good. Therefore, this deep learning technique can be used
to classify CXR images to identify and detect pneumonia, which can help patients
save time and be prepared for their treatment. Moreover, it is seen that with further
REFERENCES
[1] Nonspecific Interstitial Pneumonia (NSIP) https://my.clevelandclinic.org/health/dise
ases/14804-nonspecific-interstitial-pneumonia-nsip.
[2] India has one doctor for every 1,457 citizens: Govt www.business-standard.com/article/
pti-stories/india-has-one-doctor-for-every-1-457-citizens-govt-119070401127_1.html.
[3] Yuan Tian: Detecting Pneumonia with Deep Learning https://becominghuman.ai/
detecting-pneumonia-with-deep-learning-3cf49b640c14.
[4] Chest radiograph https://en.wikipedia.org/wiki/Chest_radiograph.
[5] X-ray (Radiography)—Chest www.radiologyinfo.org/en/info.cfm?pg=chestrad.
[6] Chest X-rays www.mayoclinic.org/tests-procedures/chest-x-rays/about/pac-20393494.
[7] Deep learning vs machine learning: A simple way to understand the difference www.
zendesk.com/blog/machine-learning-and-deep-learning/.
[8] What Is Deep Learning AI? A Simple Guide With 8 Practical Examples www.
forbes.com/sites/bernardmarr/2018/10/01/what-is-deep-learning-ai-a-simple-guide-
with-8-practical-examples/#454415868d4b.
[9] Deep learning https://en.wikipedia.org/wiki/Deep_learning.
[10] Understanding of Convolutional Neural Network (CNN)—Deep Learning https://
medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-net-
work-cnn-deep-learning-99760835f148.
[11] A Basic Introduction to Convolutional Neural Network https://medium.com/@hima
drisankarchatterjee/a-basic-introduction-to-convolutional-neural-network-8e39019
b27c4.
[12] How Do Convolutional Layers Work in Deep Learning Neural Networks? https://
machinelearningmastery.com/convolutional-layers-for-deep-learning-neural-networks/.
[13] Convolutional Layer—Science Direct www.sciencedirect.com/topics/engineering/
convolutional-layer.
[14] Activation functions in Neural Networks www.geeksforgeeks.org/activation-functions-
neural-networks/.
[15] Convolutional Neural Networks (CNN): Step 3—Flattening www.superdatascience.
com/blogs/convolutional-neural-networks-cnn-step-3-flattening.
[16] Gentle Introduction to the Adam Optimization Algorithm for Deep Learning https://
machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/.
[17] Deep Learning for Detecting Pneumonia from X-ray Images https://towardsdatascience.
com/deep-learning-for-detecting-pneumonia-from-x-ray-images-fc9a3d9fdba8.
[18] Training a CNN to detect Pneumonia https://medium.com/datadriveninvestor/training-
a-cnn-to-detect-pneumonia-c42a44101deb.
[19] Epoch—DeepAI https://deepai.org/machine-learning-glossary-and-terms/epoch.
[20] Epoch vs Batch Size vs Iterations https://towardsdatascience.com/epoch-vs-iterations-
vs-batch-size-4dfb9c7ce9c9.
[21] Idiot’s Guide to Precision, Recall, and Confusion Matrix www.kdnuggets.com/2020/01/
guide-precision-recall-confusion-matrix.html.
8 Pulmonary Cancer
Detection Using
Deep Convolutional
Networks
Tejaswini Aala, Varadaraja Krishna, Megha Gada,
Raul V. Rodriguez, and Abdullah Y. Muaad
8.1 INTRODUCTION
Pulmonary cancer is one of the major causes of cancer deaths across the globe
due to its aggressive nature and its detection only at advanced stages. If
pulmonary cancer can be detected at earlier stages, the number of deaths due
to cancer will decrease drastically. Cancer starts when cells in the body
grow out of control. Pulmonary nodules are small growths of cells inside the
lung and may be cancerous (malignant) or noncancerous (benign). An early-stage
cancer resembles a noncancerous nodule, which makes differentiation
difficult. The diagnosis is based on slight morphological changes, locations, and
clinical biomarkers. Physicians use various diagnostic procedures for the
early diagnosis of malignant lung nodules, such as clinical settings, computed
tomography (CT) scan analysis (morphological assessment), positron emission
tomography (PET) (metabolic assessment), and needle biopsy analysis [1].
Healthcare practitioners mostly use invasive methods such as biopsies or
surgeries to differentiate between benign and malignant lung nodules. For
such a fragile and sensitive organ, invasive methods involve many risks and
increase patient anxiety.
DOI: 10.1201/9781003328414-8
as size, location, shape, adjacent structures, edges, and density, which can increase
the workload of a radiologist significantly to screen a CT scan for the possible exis-
tence of a nodule.
This work done by radiologists can be done by a machine, which can give an accu-
rate result in less time. The tool which can assist the radiologist is computer-aided
detection (CAD). These systems are basically designed to reduce the work for the
radiologist and increase the nodule detection rate. However, the present-day genera-
tion of CAD systems also helps in the screening process by differentiating between
benign and malignant nodules [2, 3].
In Figure 8.1 it is very easy to detect cancer, but in Figure 8.2 the radiologist will
find it difficult to determine if the person has cancer or not.
With advanced deep neural networks, especially in image analysis, CAD
systems are consistently outperforming expert radiologists in both nodule
detection and localization tasks. Results from various researchers show
detection rates ranging from 38% to 100%, with a false-positive (FP) rate of
1 to 8.2 per scan. The categorization of nodules as benign or malignant is
still a challenging problem, however, because the two closely resemble each
other at early stages [4].
The FP reduction phase requires a dataset of labeled true and false nodule
candidates, and the nodule malignancy prediction phase requires a dataset of
nodules labeled with malignancy.
True/false labels for nodule candidates (generally denoted as 0/1) and
malignancy labels for nodules are sparse for lung cancer and may be
nonexistent for some other cancers, so CAD systems that rely on such data
would not apply to other cancers. Convolutional neural networks are better
suited to achieving greater computational efficiency and generalizability to
other cancers.
The present CAD system has a shorter pipeline and requires only the
following data during training: a dataset of CT scans with true nodules
labeled, and a dataset of CT scans with an overall malignancy label. It
starts by preprocessing the 3D CT scans using segmentation, normalization,
downsampling, and zero-centering. The preliminary approach was simply to
input the preprocessed 3D CT scans into 3D CNNs, but the results were poor.
An additional preprocessing step was therefore performed so that only regions
of interest are input into the 3D CNNs. To identify regions of interest, a
convolutional U-Net was trained for nodule candidate detection. Regions
around the nodule candidates detected by the U-Net were then fed into 3D CNNs
to classify the CT scans as positive or negative for lung cancer. The overall
architecture is shown in Figure 8.5.
For each patient, pixel values in each image were first converted to
Hounsfield units (HU), a measurement of radiodensity, and the 2D slices were
stacked into a single 3D image. Because malignancy forms on lung tissue,
segmentation is used to mask out the bone, outside air, and other substances
that would make the data noisy, leaving only lung tissue information for the
classifier. A number of segmentation approaches were tried, including
thresholding, clustering (K-means and mean shift), and watershed. K-means and
mean shift allow very little supervision and did not produce good qualitative
results. Watershed produced one of the best qualitative results but took too
long to run within the available time. Ultimately, thresholding was used.
After segmentation, the 3D image is normalized by applying linear scaling to
squeeze all pixels of the original unsegmented image to values between 0 and
1. Spline interpolation then downsamples each 3D image by a scale of 0.5 in
each of the three dimensions. Finally, zero-centering is performed on the
data by subtracting the mean of all the images in the training set
(Figure 8.6).
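Three of the preprocessing steps above (HU conversion, linear scaling to [0, 1] and zero-centering) can be sketched in plain Python. This is an illustrative sketch under stated assumptions: each scan is represented as a flat list of voxel values, the HU conversion uses the usual linear rescale (stored value times slope plus intercept) with placeholder default values, and zero-centering subtracts a single scalar mean over the whole training set. Spline downsampling is omitted because it needs an interpolation library.

```python
def to_hounsfield(raw_values, slope=1.0, intercept=-1024.0):
    """Convert stored pixel values to Hounsfield units.

    HU = raw * slope + intercept. The default slope and intercept are
    placeholder assumptions; real scans carry their own rescale values.
    """
    return [value * slope + intercept for value in raw_values]


def min_max_normalize(values):
    """Linearly scale values so the minimum maps to 0 and the maximum to 1.

    Assumes the values are not all identical (otherwise the range is zero).
    """
    lowest, highest = min(values), max(values)
    return [(value - lowest) / (highest - lowest) for value in values]


def zero_center(scans):
    """Subtract the mean voxel value of the whole training set from each scan."""
    total = sum(sum(scan) for scan in scans)
    count = sum(len(scan) for scan in scans)
    mean = total / count
    return [[value - mean for value in scan] for scan in scans]


# Tiny made-up example: three stored values from one scan.
hu = to_hounsfield([0, 1024, 2024])
scaled = min_max_normalize(hu)
centered = zero_center([scaled])
print(hu, scaled, centered)
```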
FIGURE 8.6 (a) Histograms of pixel values in HU for sample patients’ CT scans at various
slices. (b) Corresponding 2D axial slices.
8.6 THRESHOLDING
Air is generally around –1000 HU; lung tissue is typically around –500; water, blood,
and other tissues are around 0 HU; and bone is typically around 700 HU, so pixels
that are close to –1000 or above –320 are masked (Figure 8.7).
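A minimal sketch of this thresholding rule follows. The text does not say how close to –1000 HU a pixel must be to count as air, so the tolerance of 100 HU is an assumption, as are the function and parameter names.

```python
def lung_tissue_mask(hu_values, air_hu=-1000, air_tolerance=100, upper_hu=-320):
    """Return True for pixels kept as lung tissue and False for masked pixels.

    A pixel is masked when it is within `air_tolerance` of `air_hu` (air) or
    when it lies above `upper_hu` (water, blood, other tissue and bone).
    The tolerance value is an assumption made for this example.
    """
    mask = []
    for value in hu_values:
        is_air = abs(value - air_hu) <= air_tolerance
        is_dense = value > upper_hu
        mask.append(not (is_air or is_dense))
    return mask


# Air, near-air, lung tissue, boundary value, water/blood and bone.
sample = [-1000, -950, -500, -320, 0, 700]
print(lung_tissue_mask(sample))
```

With these settings only the values around –500 HU (and the boundary value –320 HU itself) survive, which matches the typical radiodensity of lung tissue quoted above.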
8.7 WATERSHED
The segmentation obtained from thresholding has a lot of noise. Many voxels
(the discrete volume elements into which a 3D image is divided) that were
part of lung tissue, especially voxels at the edge of the lung,
FIGURE 8.7 (a) Sample patient 3D image with pixel values greater than 400 HU reveals the
bone segment. (b) Sample patient bronchioles within lung. (c) Sample patient initial mask with
no air. (d) Sample patient final mask in which bronchioles are included.
tended to fall outside the range of lung tissue radiodensity due to CT scan
noise. This means that the classifier will not be able to correctly classify
images in which cancerous nodules are located at the edge of the lung [5]. To
filter noise and include voxels from the edges, we use marker-driven
watershed segmentation. Qualitatively, this produces a much better
segmentation than thresholding: missing voxels (black dots in Figure 8.8) are
largely reincluded. However, it is much less efficient than basic
thresholding, and time limitations made it impossible to preprocess all CT
scans using watershed, so thresholding was used instead. The typical
radiodensities in HU of various substances in a CT scan are shown in
Table 8.1.
TABLE 8.1
Typical Radiodensities in HU of Various Substances in a CT Scan
Substance          Radiodensity (HU)
Air                –1000
Lung tissue        –500
Water and blood    0
Bone               700
FIGURE 8.9 (a) U-Net sample input from LUNA16 validation set. Note that the image has
the largest nodule from the LUNA16 validation set, which we chose for clarity; most nodules
are significantly smaller than the largest one in this image. (b) U-Net predicted output from
LUNA16 validation set. (c) U-Net sample labels mask from LUNA16 validation set showing
nodule location.
able to declare images with nodules as detected by U-Net are positive for lung cancer
and images without any nodules detected by U-Net are negative for lung cancer [6].
However, as shown in Figure 8.9c, U-Net produces a strong signal for the actual
nodule but also produces many FPs, so an additional classifier is needed to
determine malignancy. The model parameters are shown in Table 8.2.
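The output shapes listed in Table 8.2 can be checked with a short shape trace. The sketch below assumes 'same' padding for every convolution (which the table implies, since convolutions leave height and width unchanged) and uses hypothetical helper names. It only tracks tensor shapes and is not an implementation of the network.

```python
def conv(shape, filters):
    """3 x 3 convolution with 'same' padding: only the channel count changes."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool(shape):
    """2 x 2 max pooling with stride 2 halves height and width."""
    height, width, channels = shape
    return (height // 2, width // 2, channels)


def upsample(shape):
    """2 x 2 upsampling doubles height and width."""
    height, width, channels = shape
    return (height * 2, width * 2, channels)


def concat(a, b):
    """Channel-wise concatenation of two maps with equal height and width."""
    assert a[:2] == b[:2], "spatial sizes must match"
    return (a[0], a[1], a[2] + b[2])


def trace_unet(input_shape=(256, 256, 1)):
    """Return the shapes at the skip connections and the output of the U-Net."""
    conv1b = conv(conv(input_shape, 32), 32)
    conv2b = conv(conv(max_pool(conv1b), 80), 80)
    conv3b = conv(conv(max_pool(conv2b), 160), 160)
    conv4b = conv(conv(max_pool(conv3b), 320), 320)
    concat_a = concat(upsample(conv4b), conv3b)   # first skip connection
    conv5b = conv(conv(concat_a, 160), 160)
    concat_b = concat(upsample(conv5b), conv2b)   # second skip connection
    conv6b = conv(conv(concat_b, 80), 80)
    concat_c = concat(upsample(conv6b), conv1b)   # third skip connection
    output = conv(conv(conv(concat_c, 32), 32), 2)
    return {
        "bottleneck": conv4b,
        "concat_a": concat_a,
        "concat_b": concat_b,
        "concat_c": concat_c,
        "output": output,
    }


for name, shape in trace_unet().items():
    print(name, shape)
```

The concatenated channel counts follow from simple addition: 320 + 160 = 480, 160 + 80 = 240 and 80 + 32 = 112, in agreement with the Concat rows of the table.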
A 3D CNN was used as a linear classifier. It uses weighted softmax
cross-entropy loss (the weight for a label is the inverse of the frequency of
that label in the training set) and the Adam optimizer, and the CNNs use ReLU
activation and dropout after each convolutional layer during training [7, 8].
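The weighting scheme can be illustrated as follows. This sketch takes "frequency" to mean the fraction of training examples carrying a label, so the weight is the total count divided by the label count. That reading, the function names and the toy labels are assumptions for illustration.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each label by the inverse of its relative frequency."""
    counts = Counter(labels)
    total = len(labels)
    return {label: total / count for label, count in counts.items()}


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def weighted_cross_entropy(logits, label, weights):
    """Softmax cross-entropy for one example, scaled by its label weight."""
    probabilities = softmax(logits)
    return -weights[label] * math.log(probabilities[label])


# Toy training labels: class 1 (positive) is three times rarer than class 0.
train_labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(train_labels)
print(weights)
print(weighted_cross_entropy([0.0, 0.0], label=1, weights=weights))
```

With this weighting, a misclassified example from the rare class contributes more to the loss than one from the common class, which counteracts class imbalance in the training set.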
8.9 RESULTS
For pulmonary (lung) nodule detection using CT imaging, CNNs have recently
been used as a feature extractor within a larger CAD system. For simplicity in
training and testing, we selected the ratings of a single radiologist. All
experiments used 50% of the data for training, 20% for validation, and 30%
for testing. To estimate the results, we considered a variety of testing
metrics. The accuracy metric
TABLE 8.2
Model Parameters
Layer        Params             Activation   Output
Input                                        256 × 256 × 1
Conv1a       3 × 3 × 32         ReLU         256 × 256 × 32
Conv1b       3 × 3 × 32         ReLU         256 × 256 × 32
Max Pool     2 × 2, stride 2                 128 × 128 × 32
Conv2a       3 × 3 × 80         ReLU         128 × 128 × 80
Conv2b       3 × 3 × 80         ReLU         128 × 128 × 80
Max Pool     2 × 2, stride 2                 64 × 64 × 80
Conv3a       3 × 3 × 160        ReLU         64 × 64 × 160
Conv3b       3 × 3 × 160        ReLU         64 × 64 × 160
Max Pool     2 × 2, stride 2                 32 × 32 × 160
Conv4a       3 × 3 × 320        ReLU         32 × 32 × 320
Conv4b       3 × 3 × 320        ReLU         32 × 32 × 320
Up Conv4b    2 × 2                           64 × 64 × 320
Concat       Conv4b, Conv3b                  64 × 64 × 480
Conv5a       3 × 3 × 160        ReLU         64 × 64 × 160
Conv5b       3 × 3 × 160        ReLU         64 × 64 × 160
Up Conv5b    2 × 2                           128 × 128 × 160
Concat       Conv5b, Conv2b                  128 × 128 × 240
Conv6a       3 × 3 × 80         ReLU         128 × 128 × 80
Conv6b       3 × 3 × 80         ReLU         128 × 128 × 80
Up Conv6b    2 × 2                           256 × 256 × 80
Concat       Conv6b, Conv1b                  256 × 256 × 112
Conv6a       3 × 3 × 32         ReLU         256 × 256 × 32
Conv6b       3 × 3 × 32         ReLU         256 × 256 × 32
Conv7        3 × 3 × 3                       256 × 256 × 2
was the metric used in our evaluations. In our first set of experiments we
considered a range of CNN architectures for the binary classification task.
Early experimentation suggested that the number of filters and neurons per
layer was less significant than the number of layers. Thus, to simplify the
analysis, the first convolutional layer used seven filters of size 5 × 5 × 5,
the second convolutional layer used 17 filters of size 5 × 5 × 3, and all
fully connected layers used 256 neurons. These settings were found to perform
well in general, and we considered the impact of one or two convolutional
layers followed by one or two fully connected layers. The networks were trained
as described earlier and the results of these experiments can be found. Our
results suggest that two convolutional layers followed by a single hidden
layer is one of the best-performing network architectures for this dataset.
Figure 8.10 shows the training error for the 3D CNN.
                     Predicted
                     Abnormal    Normal
Actual   Abnormal    0.853       0.147
         Normal      0.119       0.881
Figure 8.11 shows the confusion matrix achieved on the DSB dataset with the
3D CNN. The accuracy of the model is 86.6%, the misclassification rate is
13.4%, the FP rate is 11.9%, and the false-negative rate is 14.7%. Most
patients are classified correctly. In addition, the efficient U-Net
architecture and segmentation improve accuracy.
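The FP and false-negative rates quoted above can be read directly from the row-normalized confusion matrix. The sketch below does so, treating "abnormal" as the positive class (an assumption). Because the class proportions of the test set are not given here, it computes the balanced accuracy (the mean of the two per-class rates) rather than the overall accuracy.

```python
# Row-normalized confusion matrix: confusion[actual][predicted].
confusion = {
    "abnormal": {"abnormal": 0.853, "normal": 0.147},
    "normal": {"abnormal": 0.119, "normal": 0.881},
}


def error_rates(matrix, positive="abnormal", negative="normal"):
    """Derive per-class rates from a row-normalized binary confusion matrix."""
    sensitivity = matrix[positive][positive]   # true-positive rate
    specificity = matrix[negative][negative]   # true-negative rate
    return {
        "sensitivity": sensitivity,
        "false_negative_rate": matrix[positive][negative],
        "specificity": specificity,
        "false_positive_rate": matrix[negative][positive],
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


for name, value in error_rates(confusion).items():
    print(f"{name}: {value:.3f}")
```

The balanced accuracy comes out at about 0.867, close to the reported overall accuracy. The two need not coincide exactly, since overall accuracy also depends on how many abnormal and normal cases the test set contains.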
8.10 CONCLUSION
In this chapter we developed a deep CNN architecture to detect nodules in patients
with lung cancer and to detect the points of interest using a U-Net architecture.
This is a preprocessing step for the 3D CNN. The 3D CNN model performed the best on the
test set and produced results accordingly. Although we do not reach the
state-of-the-art area under the curve (AUC) of 0.83, we performed well, considering
that we use less labeled data than most state-of-the-art CAD systems. As an interesting
observation, the first layers acted as preprocessing layers for segmentation
using different techniques. Nodules are identified through three techniques:
thresholding, watershed, and U-Net. The network can be trained end-to-end from
raw image patches. Its main requirement is the availability of a training
database; otherwise, no assumptions are made about the objects of interest or
the underlying image modality. A possible advancement is to extend the current
model so that it not only determines whether or not the patient has cancer but
also determines the exact location of the cancerous nodules. The most
immediate improvement to this architecture is to use watershed segmentation as
the basic lung segmentation. Other opportunities for improvement include
making the network deeper and more extensive hyperparameter tuning.
REFERENCES
[1] Y. Xu, T. Mo, Q. Feng, P. Zhong, M. Lai, and E. I. Chang, “Deep learning of feature
representation with multiple instances learning for medical image analysis,” in IEEE
International Conference on Acoustics, Speech and Signal Processing, ICASSP,
pp. 1626–1630, 2014.
[2] D. Kumar, A. Wong, and D. A. Clausi, “Lung nodule classification using deep features
in ct images,” in 2015 12th Conference on Computer and Robot Vision, pp. 133–138,
June 2015.
[3] Y. Bar, I. Diamant, L. Wolf, S. Lieberman, E. Konen, and H. Greenspan, “Chest pathology
detection using deep learning with non-medical training,” in Proceedings—International
Symposium on Biomedical Imaging, vol. 2015, pp. 294–297, July 2015.
[4] W. Sun, B. Zheng, and W. Qian, “Computer aided lung cancer diagnosis with deep learn-
ing algorithms,” in SPIE Medical Imaging, vol. 9785, pp. 97850Z–97850Z, International
Society for Optics and Photonics, 2016.
[5] A. Chon, N. Balachandar, and P. Lu, Deep Convolutional Neural Networks for Lung
Cancer Detection, Technical report, Stanford University, 2017.
[6] Y. LeCun, K. Kavukcuoglu, and C. Farabet, “Convolutional networks and applications
in vision,” in Proceedings of the IEEE International Symposium on Circuits and Systems
(ISCAS), pp. 253–256, IEEE, 2010.
[7] C. Chola, A. Y. Muaad, M. B. Bin Heyat, J. B. Benifa, W. R. Naji, K. Hemachandran, . . .
T. S. Kim, “BCNet: A deep learning computer-aided diagnosis framework for human
peripheral blood cell identification,” Diagnostics, 12(11), 2815, 2022.
[8] K. Hemachandran, A. Alasiry, M. Marzougui, S. M. Ganie, A. A. Pise, M. T. H. Alouane,
and C. Chola, “Performance analysis of deep learning algorithms in diagnosis of malaria
disease,” Diagnostics, 13(3), 534, 2023.
9 Breast Cancer Prediction
Using Machine
Learning Algorithms
Raja Krishnamoorthy, Arshia Jabeen,
and Harshitha Methukula
9.1 INTRODUCTION
Cancer is a disease that occurs when there are changes or mutations in genes relating
to cell growth. These mutations allow the cells to divide and multiply in an
uncontrolled and chaotic manner. These cells keep increasing and making copies
of themselves, which become more and more abnormal [1]. These abnormal cells
later form a tumor. Tumor cells, unlike other cells, don’t die even though the
body doesn’t need them.
Tumors fall into two major types: malignant and benign. Malignant tumors are
cancerous. Their cells keep dividing uncontrollably and start affecting other
cells and tissues in the body [2, 3]. They can spread to other organs of the
body, and this type of cancer is hard to cure. Chemotherapy, radiation therapy
and immunotherapy are treatments that can be given for these types of tumors.
Benign tumors are non-cancerous. Unlike malignant tumors, they do not spread
to the rest of the organs and hence are much less risky. In many cases, such
tumors don’t require any treatment.
DOI: 10.1201/9781003328414-9
• Breast exam
• Mammogram
• Breast ultrasound
• MRI of the breasts
• Removing a sample of breast cells for testing (biopsy)
Instead of taking more tests to check whether the cancer is malignant or benign, ML
can be used to predict the case based on the large amount of available breast
cancer data. The proposed system helps patients by reducing the amount of
money they need to spend on diagnosis alone. Also, if the tumor is benign, then
it is not cancerous, and the patient doesn’t need to go through any of the
other tests. This saves a lot of time as well.
TABLE 9.1
Proposed System – Decision Tree Method
S. No.   Existing System Data   Proposed System Data
1        4.2                    7.0
2        2.6                    4.3
3        3.4                    6.0
4        4.8                    7.9
TABLE 9.2
Proposed System – Random Forest Method
S. No.   Existing System Data   Proposed System Data
1        3.1                    4.0
2        4.0                    7.0
3        6.0                    8.0
4        8.1                    9.0
9.3.5 Design
A. DESIGN GOALS
Under this model, the goal of this project is to create a design to achieve the
following:
B. ACCURACY
Only accurate outcomes can help make this model a good one. It can be
reliable only when all the outcomes are correct and can be trusted. As this
data is required for healthcare purposes, it is important that no errors occur.
C. EFFICIENCY
The model should be efficient as there is no requirement of manual data
entry work or any work by doctors. It takes less time to predict outcomes
after all the ML algorithms have been used on the data.
TABLE 9.3
Proposed System – Logistic Regression Method
S. No.   Proposed System Data   Existing System Data
1        2.0                    2.0
2        2.5                    3.0
3        3.5                    3.0
4        4.5                    2.9
TABLE 9.4
All Proposed System Comparison (Decision Tree, Random Forest and Logistic Regression)
S. No.   Proposed Sys-1     Proposed Sys-2     Proposed Sys-3
         (Decision Tree)    (Random Forest)    (Logistic Regression)
1        7.0                4.0                2.0
2        4.3                7.0                2.5
3        6.0                8.0                3.5
4        7.9                9.0                4.5
9.3.6 System Architecture
As this project does not have any user interface, the architecture is essentially
the dataset and its features. The aim is to understand the dataset and keep
the system as simple as possible. The dataset is first split into
training and testing sets. The training set is first exposed to the ML algorithms
so that the system learns which data gives which type of outcome. After the
system is trained, the testing data is used to test whether the system can
correctly predict the class of the data, as illustrated in Figure 9.8, and to
measure the percentage accuracy of the model.
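The split-train-test workflow described above can be sketched without any ML library. The function names, the 80/20 split and the toy labels below are assumptions for illustration. In practice the predictions would come from the trained decision tree, random forest or logistic regression model.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows reproducibly and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def percentage_accuracy(true_labels, predicted_labels):
    """Percentage of predictions that match the true class."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


# Toy data: ten row identifiers standing in for dataset records.
train_rows, test_rows = train_test_split(range(10), test_fraction=0.2, seed=42)
print(len(train_rows), "training rows,", len(test_rows), "testing rows")

# Hypothetical true and predicted classes for four test records.
print(percentage_accuracy(["M", "B", "B", "M"], ["M", "B", "M", "M"]))
```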
9.4 IMPLEMENTATION
PREPARING THE DATA
Step 1: The first step in the ML process is to prepare the data.
Step 2: After importing all the necessary packages, we need to load the dataset.
We use the help of Pandas to load the dataset.
Step 3: We need to drop the first column of the dataset which consists of IDs, as
this field will not help us in the classification process. This is done as follows:
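import pandas as pd

data = pd.read_csv("data.csv")  # Step 2: load the dataset (file name is an assumption)
data = data.drop(columns=data.columns[0])  # Step 3: drop the ID column
Step 4: Count the total number of records and the number in each diagnosis
category.
diagnosis_all = data.shape[0]  # total number of records
# Counts indexed by label; assumes the labels are "M" (malignant) and "B" (benign).
diagnosis_categories = data["diagnosis"].value_counts()
print("\n\t The data has {} diagnoses: {} malignant and {} benign."
      .format(diagnosis_all, diagnosis_categories["M"], diagnosis_categories["B"]))
The data has 569 diagnoses: 212 malignant and 357 benign.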
9.5 RESULTS
According to the compiled algorithm we have secured the following outcome
9.6 CONCLUSION
In this chapter, a suitable dataset was collected for predictive analysis and
then processed to remove all the junk data. Predictive analysis is being used
in many different fields and is steadily gaining pace. It offers smarter ways
to solve a problem or predict its outcome. This scheme was developed to reduce
the time and cost for patients as well as to minimize the work of a doctor,
and we have tried to use a very simple and understandable
model to do this job. The ML algorithms in this work were fitted on the training
data, and the testing data was used to check the accuracy of the outcome.
REFERENCES
[1] Wolberg, Street and Mangasarian, “Wisconsin Diagnostic Breast Cancer Dataset”,
http://archive.ics.uci.edu/ml.
[2] S. Palaniappan and T. Pushparaj, “A Novel Prediction on Breast Cancer from the Basis
of Association Rules and Neural Network”, International Journal of Computer Science
and Mobile Computing, 2(4), (2013), 269–277.
[3] N. Khuriwal and N. Mishra, “A Review on Breast Cancer Diagnosis in Mammography
Images Using Deep Learning Techniques”, Journal of Image Processing & Pattern Rec‑
ognition Progress, 5(1), (2018), 51–57.
[4] Mengjie Yu, Breast Cancer Prediction Using Machine Learning Algorithm, The Univer-
sity of Texas at Austin, 2017.
[5] Wenbin Yue, Zidong Wang, Hongwei Chen, Annette Payne, and Xiaohui Liu, “Machine
Learning with Applications in Breast Cancer Diagnosis and Prognosis”, Designs, 2(2),
(2018), 13. https://doi.org/10.3390/designs2020013
10.1 BACKGROUND
Cancer involves abnormal and uncontrollable cell growth with the potential to
proliferate and advance to other body parts. This contrasts with benign
tumours, which do not spread to other body parts. With 9.6 million deaths in
2018, cancer ranks as the second most common cause of death worldwide. Low- and
middle-income countries witness close to 70% of these deaths [1]. After skin
cancer, breast cancer is the second most common cancer to affect women.
Mammograms are x-rays which aid in the early detection of breast cancer.
Breast cancer forms in the breast cells. It occurs in both men and women, but
rarely in men. Nearly 500,000 women around the world are killed by breast
cancer. In 2018, breast cancer took the lives of 627,000 women, which makes up
15% of all cancer-related deaths in women, according to estimates from the
World Health Organization (WHO).
As a part of the screening process women are tested to find cancer cells before
any symptoms arise. The use of mammography, professional breast examina-
tions, and breast self-exams are only a few of the various techniques used for
screening [2].
10.2 MAMMOGRAPHY
Mammography employs low-energy x-rays to find breast abnormalities. In
high-resource environments, it has been shown to reduce breast cancer mortality
by about 20%. The WHO position paper on mammography screening concluded that,
in well-resourced environments, women aged 50 to 69 years should undergo
organized, population-based mammography screening if the prerequisite
conditions for program implementation are satisfied. Mammography is not
cost-effective in places with weak health systems and limited resources;
there, early detection should focus on diagnosis via increased awareness. For
women aged 40–49 and 70–75, the WHO advises systematic mammography only in the
context of thorough research and in locations with adequate resources.
10.5 CAUSES
Benign breast problems are generally brought on by a variety of factors,
including breast composition, age, and hormone problems. External treatments
such as birth control pills and hormone therapy can also cause them.
Gynecomastia, or male benign breast disease, is brought on by an imbalance of
hormones. Certain diseases, hormone therapy, and being overweight also
contribute to it.
1. Adenosis:
Adenosis occurs when several lobules (milk-producing sacs) grow larger
and contain more glands than usual. If the enlarged lobules contain
scar-like fibrous tissue, the condition is referred to as sclerosing
adenosis. Adenosis might cause a lump that you or your doctor can feel
(Figure 10.2). A biopsy is required to distinguish adenosis from cancer.
With adenosis, any increase in breast cancer risk seems to be slight.
2. Fibroadenoma:
A fibroadenoma is typically felt as a lump within the breast that is smooth
and moves easily under the skin. Fibroadenomas are sometimes painless;
however, they can feel tender or even painful. See Figure 10.3.
Types of fibroadenoma:
1. Simple fibroadenoma:
Most fibroadenomas are 1–3 cm in size and are referred to as ‘simple
fibroadenomas’. When examined under a microscope, simple
fibroadenomas look the same all over. Simple fibroadenomas don’t
increase the chance of developing breast cancer in the future.
2. Complex fibroadenoma:
Having a complex fibroadenoma increases the risk of developing
breast cancer in the future.
Malignant tumor: malignant tumor cells rapidly spread. These cells may have an
abnormal shape.
1. Ductal carcinoma:
Ductal carcinoma (DC) is a non-invasive carcinoma that is restricted to the
ducts in the breast. Increased use of diagnostic techniques such as
screening has led to a rapid increase in the detection of DC. Sixty-four
thousand cases of DC are identified annually in the United States, and 90%
of DC cases are detected as suspicious calcifications with a linear,
clustered, segmental, focal, or mixed distribution. See Figure 10.6.
Breast Cancer Histopathological Images Classification 105
DC is classified into three types on the basis of the growth rate of its
cells: low grade (grade I), intermediate grade (grade II), and high grade
(grade III). Classifying DC as low or intermediate grade indicates that
the cancer cells are growing at a comparatively slow rate. Low-grade DC
cells resemble atypical ductal hyperplasia cells or normal breast cells in
many ways. Grade II DC cells grow more quickly than normal cells and look
less like them. Because grade III DC cells grow rapidly, grade III DC has
a higher risk of developing into invasive cancer in the first five years
after diagnosis.
2. Lobular carcinoma:
Lobular carcinoma in situ (LC) develops from the lobule at the end of the
duct and shows a widespread distribution throughout the breast, which
explains why it presents as a non-palpable mass in most cases (as shown in
Figure 10.7) [7]. The incidence of LC has doubled over the previous 25 years
and is now 2.8 per 100,000 females.
3. Mucinous carcinoma:
Mucinous carcinoma is another uncommon histological type, seen in less
than 5% of invasive carcinoma cases. It usually appears in
septuagenarians as a palpable mass, or on mammography as a poorly defined
growth with uncommon calcifications, and tends to restrict mucin
production. Type A and type B are the two main kinds, while AB lesions can
have features of either one [8, 9]. Type A mucinous carcinoma is
characterized by larger amounts of extracellular mucin (as shown in the
image), while type B is a distinct variant with endocrine differentiation
and histology showing more granular cytoplasm than type A carcinoma.
4. Papillary carcinoma:
Papillary carcinoma comprises a range of histological subtypes. The two
common types are cystic (non-invasive form) and micropapillary ductal
carcinoma (invasive form). Papillary carcinoma typically affects women
over the age of sixty and makes up about 1%–2% of all breast cancer
cases. Papillary carcinomas are located at the centre of the breast and
might present as a discharge of blood from the breast. They are strongly
positive for the estrogen receptor (ER) and progesterone receptor (PR).
Low mitotic activity in cystic papillary carcinoma results in a prognosis
and course that
10.6 METHODS
Deep learning approaches are able to automatically extract features, retrieve
information, and develop sophisticated abstract representations. Deep learning
is also a set of machine learning algorithms with networks capable of learning,
without supervision, from unstructured data [7].
This chapter analyses breast cancer images using transfer learning with
pre-trained networks. We classify the histopathological images using
ShuffleNet, dividing the dataset according to the analysis. We also use a
convolutional neural network (CNN) for the classification problem and compare
it with transfer learning to obtain better results and performance. A
multi-class classification for diagnosis is also conducted. The experiment
aims to show that ShuffleNet provides better accuracy than the CNN. In this
experiment, the input size must be 224 × 224 for transfer learning. Our final
goal is to provide a deep analysis using transfer learning on the BreaKHis
dataset. We also provide a confusion matrix summarizing the performance of the
classification method. We utilize the 400× sub-dataset.
400×). So far it has 2,480 benign and 5,429 cancerous samples (700 × 460 pixels,
three-channel RGB, 8-bit depth in every channel, PNG format)
(https://web.inf.ufpr.br/vri/databases/breast-cancer-histopathological-database-breakhis/).
The two primary categories of the BreaKHis dataset are benign tumours and
malignant cancers. A lesion that doesn’t satisfy any criteria for malignancy, such
as mitosis, significant cellular atypia, disruption of basement membranes, distribu-
tion, etc., is said to be histologically benign. Typically, benign tumours are relatively
“harmless,” show a modest rate of growth, and remain contained. A lesion that can
spread to distant areas (metastasize) and invade and destroy nearby structures (locally
invasive) resulting in loss of life is referred to as a malignant tumour. In the present
iteration, samples in the dataset were gathered using the surgically obtained biopsy
(SOB) methodology and were designated as excisional biopsy or partial mastectomy.
Compared to other needle biopsy techniques, this type of operation extracts a larger
sample of tissue, and it is carried out in a hospital under anaesthesia. Adenosis (A),
phyllodes tumour (PT), fibroadenoma (F), and tubular adenoma (TA) are the four
distinctive types of benign breast tumours currently present within the dataset, while
ductal carcinoma (DC), mucinous carcinoma (MC), lobular carcinoma (LC), and
papillary carcinoma (PC) are the four malignant tumours, as shown
in Table 10.1.
10.9 ARCHITECTURE
All of the architectural models in this CNN are comparable to one another. Convolution,
pooling, and fully connected layers are included in these architectural models.
10.10 CONVOLUTION
Convolution involves sliding a filter (kernel) over an input image and, at each
position, multiplying the overlapping pixel and filter values element by element
and summing the products to create a new image. Consider, for instance, a 5 × 5
image matrix with pixel values of 0 and 1 and a 3 × 3 filter matrix. Convolving
the 5 × 5 image matrix with the 3 × 3 filter matrix produces a 3 × 3 output,
which is referred to as ‘a feature map’. Using a variety of filters
and numerous convolutions, we generate various feature maps from the input
images. To create a new image, all feature maps are then merged. Edge detection,
blurring, and sharpening are just a few of the many operations that can be
carried out using various filters.
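The 5 × 5 and 3 × 3 example can be sketched in a few lines of Python. The pixel and filter values below are illustrative assumptions (the chapter does not list them), and `convolve2d` is a hypothetical helper written for this sketch, not a MATLAB or library function.

```python
def convolve2d(image, kernel, stride=1):
    """Slide a square kernel over a square image (no padding) and
    return the feature map of summed element-wise products."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    feature_map = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Illustrative 5 x 5 binary image and 3 x 3 filter (assumed values).
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

Each entry of the 3 × 3 feature map is the sum of the nine products for one placement of the filter.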
TABLE 10.1
Images in Subclasses in 400× Magnification
MAGNIFICATION BENIGN MALIGNANT
10.11 STRIDES
The stride indicates how many pixels the convolution filter moves
at each step. The default setting is 1. To reduce the overlap between neighbouring
receptive fields, we can provide higher values, which also decreases the size of
the feature map.
10.12 PADDING
Padding is a technique used when the filter does not exactly fit the input image.
It can be performed in two ways: 1. Pad the border of the image with zeros,
which is known as zero-padding. 2. Drop the part of the image where the filter
does not fit, which is known as valid padding because only the valid part of the
image is kept.
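The effect of stride and padding on the feature-map size can be checked with the standard relation out = floor((n + 2p − f) / s) + 1 for an n × n input, an f × f filter, stride s, and p rows and columns of zero-padding on each side. The sketch below is a minimal illustration; the function names `conv_output_size` and `zero_pad` are hypothetical.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Side length of the feature map for an n x n input and f x f filter."""
    return (n + 2 * padding - f) // stride + 1


def zero_pad(image, padding):
    """Surround a 2-D list with `padding` rows and columns of zeros."""
    width = len(image[0]) + 2 * padding
    padded = [[0] * width for _ in range(padding)]
    for row in image:
        padded.append([0] * padding + list(row) + [0] * padding)
    padded.extend([0] * width for _ in range(padding))
    return padded


print(conv_output_size(5, 3))             # 3: valid padding, stride 1
print(conv_output_size(5, 3, stride=2))   # 2: a larger stride shrinks the map
print(conv_output_size(5, 3, padding=1))  # 5: zero-padding keeps the input size
print(zero_pad([[1, 2], [3, 4]], 1))
```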
1. Minimum pooling
2. Average pooling
3. Sum pooling
4. Maximum pooling
Max pooling takes the largest element from each block of the feature map. For
instance, if our output image size is 4 × 4, applying max pooling with 2 × 2
blocks converts it to a 2 × 2 matrix in which each entry is the largest value
found in the corresponding 2 × 2 block. The process of max pooling is shown in
Figure 10.10:
1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4
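As a minimal sketch, max pooling with non-overlapping 2 × 2 blocks can be applied to the 4 × 4 matrix above as follows. The helper name `max_pool` is hypothetical.

```python
def max_pool(matrix, size=2):
    """Keep the largest value of every non-overlapping size x size block."""
    pooled = []
    for r in range(0, len(matrix), size):
        row = []
        for c in range(0, len(matrix[0]), size):
            block = [matrix[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(max(block))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
print(max_pool(feature_map))  # [[6, 8], [3, 4]]
```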
10.14.1 Average Pooling
It will take the average of all values of each 2 × 2 matrix.
10.14.2 Sum Pooling
It will perform sum operation on each 2 × 2 matrix. Check the process in
Figure 10.11:
1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4
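Average and sum pooling differ from max pooling only in the function applied to each 2 × 2 block. The sketch below passes that function in as a parameter; `pool` and `mean` are hypothetical helper names.

```python
def pool(matrix, reducer, size=2):
    """Apply `reducer` to every non-overlapping size x size block."""
    pooled = []
    for r in range(0, len(matrix), size):
        row = []
        for c in range(0, len(matrix[0]), size):
            block = [matrix[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(reducer(block))
        pooled.append(row)
    return pooled


def mean(values):
    return sum(values) / len(values)


matrix = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
print(pool(matrix, mean))  # [[3.25, 5.25], [2.0, 2.0]]
print(pool(matrix, sum))   # [[13, 21], [8, 8]]
print(pool(matrix, min))   # [[1, 2], [1, 0]]
```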
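\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \mathrm{Recall} = \frac{TP}{TP + FN} \]
\[ \mathrm{Precision} = \frac{TP}{TP + FP} \]
\[ \mathrm{F\_measure} = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}} \]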
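These metrics can be checked numerically. The sketch below takes recall as TP / (TP + FN), the standard definition, which reproduces the values reported for adenosis in Table 10.2 from the counts in Table 10.4 (TP = 7, TN = 170, FP = 2, FN = 4). The function name `class_metrics` is hypothetical.

```python
def class_metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision and F-measure from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
    }


# Counts for the adenosis class, taken from Table 10.4.
adenosis = class_metrics(tp=7, tn=170, fp=2, fn=4)
for name, value in adenosis.items():
    print(f"{name}: {value:.6f}")
```

The accuracy (0.967213), recall (0.636364) and F-measure (0.7) agree with the adenosis row of Table 10.2.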
10.20 SHUFFLENET
Shufflenet is one of the pre-trained networks provided by MATLAB; it is a
CNN trained on a large number of images from the ImageNet database.
In MATLAB, Shufflenet is available in the Deep Learning Toolbox Model package.
If the support package is not installed, the Shufflenet function
provides a link to download it. The Shufflenet input image size is
224 × 224. In this network, there are 173 layers connected to lower layers [11]. This
pre-trained network is mainly intended for mobile devices. It contains residual blocks
called Shuffleunits.
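The operation that gives the network in [11] its name is the channel shuffle, which interleaves the channels of different convolution groups so that information can flow between groups. The sketch below illustrates only that reordering on a list of channel indices; it is not the MATLAB implementation, and the function name `channel_shuffle` is hypothetical.

```python
def channel_shuffle(channels, groups):
    """Reorder channels as reshape (groups, n) -> transpose -> flatten."""
    if len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = len(channels) // groups
    grouped = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    # Take the k-th channel of every group in turn.
    return [grouped[g][k] for k in range(per_group) for g in range(groups)]


# Six channels in three groups: (0, 1), (2, 3), (4, 5).
shuffled = channel_shuffle([0, 1, 2, 3, 4, 5], groups=3)
print(shuffled)  # [0, 2, 4, 1, 3, 5]
```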
10.21 RESULTS
In this section of the chapter, we present the classification results on
the breast cancer histopathological images from the BreaKHis dataset. In this
experiment, we use MATLAB as the platform and programming
language. Training was run on a graphics processing
unit (GPU).
TABLE 10.2
Performance Metrics for Each Class
Class                 Accuracy   Sensitivity   Specificity   Recall        F_score
Adenosis              0.967213   0.636363636   0.988372093   0.636363636   0.7
Ductal carcinoma      0.907104   0.898734177   0.913461538   0.898734177   0.893081761
Fibroadenoma          0.967213   0.875         0.981132075   0.875         0.875
Lobular carcinoma     0.945355   0.5           0.982248521   0.5           0.583333333
Mucinous carcinoma    0.956284   0.764705882   0.975903614   0.764705882   0.764705882
Papillary carcinoma   0.967213   0.714285714   0.98816568    0.714285714   0.769230769
Phyllodes tumour      0.956284   1             0.953488372   1             0.733333333
Tubular adenoma       0.994536   0.923076923   1             0.923076923   0.96
Note: The results are for the augmented dataset on which training was performed.
Table 10.2 shows the results on the 400× dataset. The images are taken
from eighty-two patients.
The table lists the performance metrics for each class.
We divided the dataset into training, validation, and test sets. The training options
were an initial learning rate of 0.002 and 20 epochs. After training, the
validation accuracy is about 89% and the test accuracy is about 90%. The confusion
matrix is provided in Table 10.3.
Table 10.4 includes the confusion matrix values for each class.
The overall performance metrics graph is shown in Figure 10.13. The average
accuracy of all classes is about 95%.
The resultant images are random test images displayed after testing is done. Each
image shows, as a percentage, the probability with which it matches the
training data. See Figure 10.14.
10.22 DISCUSSION
All these results were obtained using the transfer learning method with
pre-trained networks on histopathological images of breast cancer. The approach
can help doctors determine which type of cancer is present in an image.
It reduces the time of clinical testing and allows doctors to provide an immediate
diagnosis.
This analysis is the base for researchers to further explore histopathological
images of breast cancer in the future.
10.23 CONCLUSIONS
We have proposed our methods of using pre-trained networks in the transfer learning
process using MATLAB on histopathological images of breast cancer in this chapter. All
TABLE 10.3
Confusion Matrix for the Eight Classes (rows: actual class; columns: predicted class)
                            A   DC    F   LC   MC   PC   PT   TA
Adenosis (A)                7    0    0    0    1    0    3    0
Ductal carcinoma (DC)       0   71    1    3    1    2    1    0
Fibroadenoma (F)            0    0   21    0    1    0    2    0
Lobular carcinoma (LC)      0    5    0    7    1    0    1    0
Mucinous carcinoma (MC)     2    1    1    0   13    0    0    0
Papillary carcinoma (PC)    0    3    0    0    0   10    1    0
Phyllodes tumor (PT)        0    0    0    0    0    0   11    0
Tubular adenoma (TA)        0    0    1    0    0    0    0   12
TABLE 10.4
Confusion Matrix Values (TP, TN, FP, FN) for Each Class
Class                  TP    TN   FP   FN
Adenosis                7   170    2    4
Ductal carcinoma       71    95    9    8
Fibroadenoma           21   156    3    3
Lobular carcinoma       7   166    3    7
Mucinous carcinoma     13   162    4    4
Papillary carcinoma    10   167    2    4
Phyllodes tumor        11   164    8    0
Tubular adenoma        12   170    0    1
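The per-class values in Table 10.4 follow directly from the matrix in Table 10.3 when its rows are read as the actual class and its columns as the predicted class. The sketch below reproduces that derivation; the names `MATRIX`, `CLASSES` and `per_class_counts` are hypothetical.

```python
CLASSES = ["A", "DC", "F", "LC", "MC", "PC", "PT", "TA"]

# Table 10.3: rows are actual classes, columns are predicted classes.
MATRIX = [
    [7, 0, 0, 0, 1, 0, 3, 0],
    [0, 71, 1, 3, 1, 2, 1, 0],
    [0, 0, 21, 0, 1, 0, 2, 0],
    [0, 5, 0, 7, 1, 0, 1, 0],
    [2, 1, 1, 0, 13, 0, 0, 0],
    [0, 3, 0, 0, 0, 10, 1, 0],
    [0, 0, 0, 0, 0, 0, 11, 0],
    [0, 0, 1, 0, 0, 0, 0, 12],
]


def per_class_counts(matrix):
    """One-vs-rest TP, TN, FP and FN for every class."""
    total = sum(sum(row) for row in matrix)
    counts = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                  # rest of the row
        fp = sum(row[k] for row in matrix) - tp   # rest of the column
        tn = total - tp - fn - fp
        counts.append({"TP": tp, "TN": tn, "FP": fp, "FN": fn})
    return counts


counts = per_class_counts(MATRIX)
for name, c in zip(CLASSES, counts):
    print(name, c)
```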
10.24 ABBREVIATIONS
A—Adenosis
CNN—Convolutional Neural Networks
DC—Ductal carcinoma
ER—Estrogen receptors
F—Fibroadenoma
FN—False negative
FP—False positive
GPU—graphics processing unit
LC—Lobular carcinoma
MC—Mucinous carcinoma
PC—Papillary carcinoma
PNG—Portable Network Graphics
PR—Progesterone receptors
PT—Phyllodes tumor
RGB—Red, Green, Blue, concerning computer display
SOB—Surgically obtained biopsy
TA—Tubular adenoma
TN—True negative
TP—True positive
REFERENCES
[1] American Cancer Society: Phyllodes Tumors of the Breast. www.cancer.org/cancer/
breast-cancer/non-cancerous-breast-conditions/phyllodes-tumors-of-the-breast.html.
Accessed 14 Jan 2020.
[2] American Cancer Society: Adenosis of the Breast. www.cancer.org/cancer/breast-
cancer/non-cancerous-breast-conditions/adenosis-of-the-breast.html. Accessed 14 Jan 2020.
[3] F. Bray, J. Ferlay, I. Soerjomataram, R. L. Siegel, L. A. Torre, and A. Jemal. Global
Cancer Statistics 2018: GLOBOCAN Estimates of Incidence and Mortality Worldwide
for 36 Cancers in 185 Countries. www.wcrf.org/dietandcancer/cancer-trends/
breast-cancer-statistics.
[4] Breastcancer.org: Invasive Ductal Carcinoma (IDC). www.breastcancer.org/symptoms/
types/idc. Accessed 14 Jan 2020.
[5] Breastcancernow: Breast Cancer Causes. https://breastcancernow.org/information-
support/have-i-got-breast-cancer/breast-cancer-causes. Accessed 14 Jan 2020.
[6] Breastcancer.org: IDC Type: Mucinous Carcinoma of the Breast. www.breastcancer.org/
symptoms/types/mucinous. Accessed on 14 Jan 2020.
[7] D. Sarkar. A Comprehensive Hands-on Guide to Transfer Learning with Real-World
Applications in Deep Learning. 2018. https://towardsdatascience.com/a-comprehensive-
hands-on-guide-to-transfer-learning-with-real-world-applications-in-deep-learning-
212bf3b2f27a. Accessed 14 Jan 2020.
[8] Boubacar Efared et al. Tubular Adenoma of the Breast: A Clinicopathologic Study of a
Series of 9 Cases. Clinical Medicine Insights. 2018; doi:10.1177/117955571875749.
[9] E. Roth and D. Weatherspoon. Lobular Breast Cancer: What Are the Prognosis and
Survival Rates? 2018 www.healthline.com/health/breast-cancer/lobular-breast-cancer-
prognosis-survival. Accessed 12 Jan 2020.
[10] Abhishek Sharma. Confusion Matrix in Machine Learning. www.geeksforgeeks.org/
confusion-matrix-machine-learning/. Accessed 14 Jan 2020.
[11] X. Zhang, X. Zhou, M. Lin, and J. Sun. ShuffleNet: An Extremely Efficient
Convolutional Neural Network for Mobile Devices. 2018 IEEE/CVF Conference on Computer
Vision and Pattern Recognition. 2018; doi: 10.1109/CVPR.2018.00716.
11 Machine Learning and Signal Processing Methodologies to Diagnose Human Knee Joint Disorders
A Computational Analysis
Murugan R, Balajee A, Senbagamalar L, Mir Aadil, and Shahid Mohammad Ganie
11.1 INTRODUCTION
Diagnosis of knee joint disorders is an increasingly important topic in the
medical field, and it is complex because of the structure of the
knee joint. At present, two kinds of diagnostic methods are used in practice:
invasive and non-invasive.
An invasive method, such as arthroscopy, is not only expensive
but also unsuitable for regular diagnostics. Another disadvantage of this method
is that it is prone to infection [1]. Non-invasive methods are those which
diagnose without surgery. They include
computed tomography (CT), x-rays, magnetic resonance imaging (MRI), ultrasonography
(US), etc.
Vibroarthrography (VAG) is a non-invasive method that involves analyzing
vibrations in the knee joint. These vibrations arise from the relative
movement of the articular surfaces of the synovial joints. In a healthy state, the
surface of the joint is covered by smooth and slippery hyaline cartilage, which
ensures optimal arthrokinematic movement quality. Osteoarthritis is frequently
observed in the patellofemoral joint (PFJ), a portion of the knee
joint. VAG signals are non-linear, multi-component, and
non-stationary in nature. Thus, conventional digital signal processing methods
are not well suited to analyzing VAG signals. The VAG test is most
sensitive for signals recorded from the superficial position
of the knee.
VAG signals play a vital role in discriminating between the different levels of joint
disorders, and they could also act as a database for future reference [2]. A
number of models have been proposed for binary classification, whereas
multi-class classification based on the level of abnormality remains unaddressed [3].
Binary classification performed directly on the feature vector
lacks performance in terms of sensitivity [4–6]. Combining optimization
methods with abnormality-level identification could improve the overall performance
of VAG systems [7].
Thus, the initial goal of this chapter is to analyze the materials and methods used
for binary classification of VAG data samples using signal processing
and machine learning methods [8]. For signal processing, we considered
the recently proposed complete ensemble empirical mode decomposition with adaptive
noise (CEEMDAN) method. This method not only
decomposes the raw VAG signals but also acts as a pre-processing
step that creates the data matrix used as input to feature
extraction and classification, which could improve the overall performance
of VAG-based disorder diagnosis. Two machine learning methods are
adopted for analyzing the classification performance on the data samples: the least
squares support vector machine (LS-SVM) and support vector machine recursive feature
elimination (SVM-RFE), each of which supports the performance improvement of
the VAG system.
The VAG method is a dynamic measurement system that achieves higher
performance metrics than the other non-invasive modalities. The
results achieved by the VAG methods using the considered classification systems
are presented in the results section. X-ray and US imaging were also used to visualize
the knee joint at the time of recording the signal samples [9, 10]. Invasive methods are
commonly performed via image review, and they cannot provide information about
early bone disorders [11]. For the early diagnosis of bone
joint disorders, on the other hand, a non-invasive method was employed. The surface
of the bone joint was analyzed using LabVIEW software with the aid of an acquired
VAG signal.
11.2 METHODOLOGIES
This section covers the major methodologies used for analyzing
knee joint disorders with VAG signals. Earlier decomposition techniques
such as variational mode decomposition (VMD) do not always split the signal
successfully into intrinsic mode functions (IMFs), whereas the recently proposed
CEEMDAN method [7] achieves a better split of mode functions so that the
noise can be removed. The noise removal relies on adding a small amount of
noise via transformation metrics, which also gives better pre-processing
of the available VAG signal samples. Including the
existing decomposition strategies as alternatives could also improve the overall
signal pre-processing results. By including CEEMDAN, VMD, and the timing
decomposition techniques, an empirical signal processing method can be adopted, as
shown in Figure 11.1.
Figure 11.1 shows the overall block diagram of the feature extraction methods.
The pipeline is designed to work well for non-stationary and non-linear
data, which is converted into a time-frequency representation in the form of
time-frequency images.
The features extracted from the time-frequency images are classified using the SVM,
LS-SVM, and SVM-RFE machine learning algorithms to analyze the efficiency
of each algorithm in clinical classification. Finally, the
classification system identifies the healthy and unhealthy samples, as shown in
Figure 11.1.
\[ S_1(I) = X - S(I) \tag{11.1} \]
FIGURE 11.1 Block diagram for VAG signal-based classification of healthy and unhealthy
knee joints.
ALGORITHM: CEEMDAN
Input: Time-frequency signal images i_1, i_2, i_3, ..., i_n
Output: Pre-processed signal data
Begin
Step 1: Add the white noise to each signal sample using the following equation:
\[ I_p = I + \beta_0 X_1(W_P) \tag{11.2} \]
where W_P denotes the white noise included in each of the signal samples.
Step 2: Calculation of the mean value, along with the R real values, is performed using the
following notation:
\[ R_1 = S(I_p) \tag{11.3} \]
Step 3: The outliers that are left unaddressed till the mean value calculation are removed
using the following notation:
\[ O_1 = I - R_1 \tag{11.4} \]
Step 4: The overall residues that are associated with the original samples are calculated by
finding the iterative average of the local means as follows:
\[ O_2 = R_1 - S(R_1 + \beta_0 X_2(W_P)) \tag{11.5} \]
End
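Steps 1 to 3 of the algorithm above can be illustrated with a deliberately simplified sketch. Two stand-ins are assumptions made only for this example: the local-mean operator S is replaced by a centred moving average (EMD normally uses spline envelopes), and the operator X_1 is taken as the identity, so the noise is added directly. The names `local_mean` and `first_residue`, the noise amplitude and the number of realizations are all hypothetical.

```python
import math
import random


def local_mean(signal, window=5):
    """Stand-in for S(.): centred moving average (assumption)."""
    half = window // 2
    out = []
    for k in range(len(signal)):
        lo, hi = max(0, k - half), min(len(signal), k + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def first_residue(signal, beta0=0.2, realizations=50, seed=0):
    """Average the local means of noisy copies, then subtract from the signal."""
    rng = random.Random(seed)
    acc = [0.0] * len(signal)
    for _ in range(realizations):
        # Step 1: I_p = I + beta0 * W_p (X_1 taken as the identity here).
        noisy = [x + beta0 * rng.gauss(0.0, 1.0) for x in signal]
        # Step 2: accumulate S(I_p) over the noise realizations.
        for k, value in enumerate(local_mean(noisy)):
            acc[k] += value
    r1 = [value / realizations for value in acc]
    # Step 3: O_1 = I - R_1.
    o1 = [x - r for x, r in zip(signal, r1)]
    return r1, o1


# Synthetic test signal: a slow trend plus a fast oscillation.
signal = [0.05 * t + math.sin(2.0 * t) for t in range(100)]
r1, o1 = first_residue(signal)
print(len(r1), len(o1))
```

With a real EMD implementation, the moving average would be replaced by the envelope-based local mean and X_1 by the first mode of the noise.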
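\[ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{11.6} \]
\[ 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0 \]
\[ g(x) = \sum_{i \in SV} \alpha_i y_i K(x_i, x) + b \tag{11.7} \]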
Finally, detect the knee joint disorder using the SVM classifier.
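The SVM decision function can be evaluated directly once the support vectors, their multipliers, their labels and the bias are known. The sketch below uses the standard form g(x) = Σ α_i y_i K(x_i, x) + b with a radial basis function kernel. The kernel choice, the `gamma` value and the two support vectors are illustrative assumptions, not values from the chapter.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision(x, support_vectors, alphas, labels, b, kernel=rbf_kernel):
    """g(x): a positive value means class +1, a negative value class -1."""
    return sum(
        alpha * y * kernel(sv, x)
        for sv, alpha, y in zip(support_vectors, alphas, labels)
    ) + b


# Toy model (assumed): one support vector per class, sum(alpha_i * y_i) = 0.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, 1.0]
labels = [-1, +1]
bias = 0.0

near_positive = decision((2.0, 2.0), support_vectors, alphas, labels, bias)
near_negative = decision((0.0, 0.0), support_vectors, alphas, labels, bias)
print(near_positive > 0, near_negative < 0)  # True True
```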
ALGORITHM: SVM-RFE
Input: Original dataset that has a number of samples s_1, s_2, s_3, ..., s_n
Output: Feature vector with absolute set of features
Begin
Step 1: Provide input data subset
\[ S_i = \{1, 2, \ldots, m\} \]
\[ V = \sum_{i=1}^{m} \beta_i p_i q_i \]
Here i denotes the entities and p and q denote two identical target classes relevant
to the subset. \beta_i is the Lagrange multiplier estimated from the training set.
Step 5: Calculate the ranking criteria by computing V^2
Step 6: Rank the attributes based on the calculated criteria
Step 7: Update the rank list after every iteration
Step 8: Features with minimal ranking criteria are eliminated from the vector
Step 9: Once the criteria reach the smallest rank (i.e. <0.01), stop performing the ranking
End
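One elimination round of the ranking procedure can be sketched as follows. The sketch assumes the common linear SVM-RFE form, in which the weight of feature k is the sum over samples of multiplier × label × feature value, and the ranking criterion is the squared weight. The multipliers are assumed to come from an already trained SVM; here they are made-up values, and `rfe_step` is a hypothetical helper.

```python
def rfe_step(samples, labels, multipliers, active):
    """Rank the active features by squared weight and drop the weakest one."""
    weights = {
        k: sum(b * y * x[k] for b, y, x in zip(multipliers, labels, samples))
        for k in active
    }
    criteria = {k: w * w for k, w in weights.items()}
    worst = min(criteria, key=criteria.get)
    remaining = [k for k in active if k != worst]
    return remaining, worst, criteria


# Two samples with three features each (assumed toy data).
samples = [(1.0, 0.1, 2.0), (-1.0, 0.2, -2.0)]
labels = [1, -1]
multipliers = [0.5, 0.5]  # assumed output of a trained SVM

remaining, eliminated, criteria = rfe_step(samples, labels, multipliers, [0, 1, 2])
print(remaining, eliminated)  # [0, 2] 1
```

In a full run, the classifier would be retrained on the remaining features before each new round, and the loop would stop once the stopping criterion in Step 9 is met.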
The VAG signals are decomposed into IMF signals. The VMD method
performs this decomposition into modes: it estimates the total number of
modes and their center frequencies, such that each mode reproduces the input signal
smoothly after demodulation to baseband; the spectral decomposition is shown in
Figure 11.4.
Spectrum-based decomposition of the VAG signal can be applied to any band
with a number of modes. The modes are extracted concurrently using a non-recursive
variational mode decomposition model. The model searches for an ensemble of modes
and their corresponding center frequencies such that the modes collectively
reproduce the input signal, each being smooth after demodulation into baseband,
as shown in Figure 11.5.
FIGURE 11.5 Reconstructed mode of input signal for the VMD method.
FIGURE 11.7 Input signal for various decomposition modes using the CEEMDAN method.
FIGURE 11.8 Shifting iteration of each mode for normal and abnormal samples.
the original vectors of the considered sample. This mode avoids the overlap-
ping issue, and it combines the functions to create a new vector. Whenever the
individual vector has a higher value that could not be accommodated within the
normal mode function, then the bivariate EMD can be adopted for creating
the mode functions that are equivalent to the dual step of shifting the sequence
taking place in the traditional decomposition schemes. Signal masking techniques
are adopted by CEEMDAN to avoid the noise that occurs with each of the
data samples, as shown in Figure 11.7.
The overlap that occurs in modes is eliminated through the scaling and partitioning
schemes of decomposition methods. The scale partition capabilities of EEMD
enable the elimination of the mode mixing problem. The CEEMDAN method
improves the shifting process for complete shifting in multiple-mode operation,
as shown in Figure 11.8.
FIGURE 11.11 LS-SVM using feature extraction for healthy and unhealthy.
11.4 CONCLUSION
In this chapter, we analyzed various features that are key factors in discriminating
between normal and abnormal data samples. The analysis also covers the
conversion techniques that relate the mode functions to the
original class of the data. The data analysis was implemented on a dataset
of non-stationary and non-linear signals using different processing techniques. The
Hilbert transformation was applied to the different mode functions
of the TVF-EMD, VMD, and CEEMDAN methods. Adding noise and performing the
shifting process reconstructs the input signal as time-frequency images. The
time-frequency data is fed to the LS-SVM and SVM-RFE algorithms for feature
extraction and pattern classification, where SVM-RFE shows the best
results. Finally, VAG signals are analyzed and classified for both healthy
and unhealthy knee joint samples.
REFERENCES
[1] G. Rajalakshmi, C. Vinothkumar, A. Anne Frank Joe, and T. Thaj Mary Delsy. “Vibroar-
thographic signal analysis of bone disorders using Arduino and piezoelectric sensors”.
International Conference on Communication and Signal Processing, April 4–6, 2019,
India.
[2] A. C. D. Faria, G. R. C. Pinheiro, J. Neri, and P. L. Melo. “Instrumentation for the
analysis of changes in the knee joint of patients with rheumatoid arthritis: Focus on
low-frequency vibrations”. Journal of Physics: Conference Series, 1044, 2018, conference 1.
[3] Manish Sharma and U. Rajendra Acharya. “Analysis of knee-joint vibroarthographic
signals using bandwidth-duration localized three-channel filter bank”. DOI: 10.1016/j.
compeleceng.2018.08.019.
[4] Saif Nalband, Amalin Prince, and Anita Agrawal. “Entropy-based feature extraction
and classification of vibroarthographic signal using complete ensemble empirical mode
decomposition with adaptive noise”. IET Science, Measurement & Technology, 12 (3),
2018, pp. 350–359.
[5] Jawad F. Abulhasan and Michael J. Grey. “Anatomy and physiology of knee stability”.
Journal of Functional Morphology and Kinesiology, 2 (4), 2017, pp. 34.
[6] F. Picard, A. Deakin, N. Balasubramanian, and A. Gregori. “Minimally invasive total
knee replacement: Techniques and results”. European Journal of Orthopaedic
Surgery & Traumatology, 28, 2018, pp. 781–791.
[7] Saif Nalband, Aditya Sundar, A. Amalin Prince, and Anita Agarwal. “Feature selection and
classification methodology for the detection of knee-joint disorders”. 2016. DOI:
10.1016/j.cmpb.2016.01.020.
[8] K. Hemachandran, S. Khanra, R. V. Rodriguez, and J. Jaramillo (Eds.). Machine
Learning for Business Analytics: Real-Time Data Analysis for Decision-Making. CRC Press,
2022.
[9] M. T. Hirschmann and W. Müller. “Complex function of the knee joint: The current
understanding of the knee”. Knee Surgery, Sports Traumatology, Arthroscopy, 23, 2015,
pp. 2780–2788.
[10] Dawid Bączkowicz, Edyta Majorczy, and Krzysztof Kręcisz. “Age-related impair-
ment of quality of joint motion in vibroarthrographic signal analysis”. 2015. DOI:
10.1155/2015/591707.
[11] Yunfeng Wu, Pinnan Chen, Xin Luo, Hui Huang, Lifang Liao, Yuchen Yao, Meihong
Wu, and Rangaraj M. Rangayyan. “Quantification of knee vibroarthrographic
signal irregularity associated with patello femoral joint cartilage pathology based on
entropy and envelope amplitude measures”. 2016. DOI: 10.1016/j.cmpb.2016.03.021.
[12] Mei-Ling Huang, Yung-Hsiang Hung, et al. “SVM-RFE based feature selection and
Taguchi parameters optimization for multiclass SVM classifier”. Hindawi Publishing
Corporation, The Scientific World Journal, 2014, Article ID 795624.
[13] Jianchen Wang, Ganlin Shan et al. “Improved SVM-RFE feature selection method for
multi-SVM classifier”. 2011 International Conference on Electrical and Control
Engineering. DOI: 10.1109/ICECENG.2011.6058060.
12 Diagnostics and Treatment Help to Patients at Remote Locations Using Edge and Fog Computing Techniques
T. Sunil, J. Gladson Maria Britto, and K. Bharath
12.1 INTRODUCTION
In the current scenario, it is very important that diagnosis and treatment happen
at the location where the patient is. The framework is designed
to reach the patient in a remote location and to
monitor the patient 24/7 by making use of technologies such as fog and edge
computing [1].
The basic idea is that the patient is monitored all the time with the assistance
of different layers, namely the fog and edge layers, which are connected in the
network. All the required devices, such as sensors, are connected to the
patient in order to monitor them based on the data captured by the various
sensors.
The data captured by the various sensors is the input for the
edge computers. After filtration there, the data is
transferred to the fog systems, where it is filtered again,
and ultimately the data is transferred to the cloud system or
server [2].
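The sensor to edge to fog to cloud flow described above can be sketched as two filtering stages. Everything specific in the example is an assumption made for illustration: the reading format, the sensor names, the normal ranges, and the rule that only out-of-range readings are forwarded to the cloud.

```python
def edge_filter(readings):
    """Edge layer: drop readings with missing values before forwarding."""
    return [r for r in readings if r["value"] is not None]


def fog_filter(readings, limits):
    """Fog layer: keep only readings outside the normal range of their sensor."""
    relevant = []
    for r in readings:
        low, high = limits[r["sensor"]]
        if not (low <= r["value"] <= high):
            relevant.append(r)
    return relevant


# Hypothetical sensor readings and normal ranges.
readings = [
    {"sensor": "heart_rate", "value": 72},
    {"sensor": "heart_rate", "value": None},
    {"sensor": "heart_rate", "value": 134},
    {"sensor": "temperature", "value": 36.8},
    {"sensor": "temperature", "value": 39.4},
]
limits = {"heart_rate": (50, 110), "temperature": (35.5, 38.0)}

cloud_store = fog_filter(edge_filter(readings), limits)
print(len(readings), "readings generated,", len(cloud_store), "sent to the cloud")
```

In this toy run, five readings are generated and two reach the cloud, which is the traffic reduction the layered design aims for.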
Here the basic use of edge computing is to get the data from the sensors and other
devices which monitor the patient and then send it to the fog
computers; care is taken to see that the required treatment is also provided to the
patient in case of any emergency [3].
The basic advantage is that only the required data is sent to the cloud server,
whereby the traffic is controlled and reduced. The nodes connected at
the edge of the network, which are referred to as edge systems, help in generating
faster results, as they process the data and transfer it to the next layer. Fog
systems, which sit between the edge and the cloud systems, help in the filtration
process and thereby reduce the amount of data transferred
to the cloud server [4].
Here we also make use of the concept of the Internet of Things, which is used
to connect the devices to the patient and to capture the data from those devices.
All the sensors are connected to the edge system, and after the data is collected
and filtered, it is sent to the fog computer. The data collected at the
edge is kept local and, based on need, escalated to the fog
system for further processing and finally sent to the cloud server. The overall
aim is to reduce the traffic on the cloud [5].
The system is designed with multiple connected systems
to ensure that the data transferred to the cloud is minimal, which
reduces the bandwidth required. The devices, such as
sensors and actuators, that are connected to the body of the patient
to monitor their health generate a lot of data. Instead of sending the generated
data directly to the cloud, the data is first transferred to the first layer in the
system, which is the edge computer. The edge computer is responsible for handling
the data generated by the various connected devices.
The next layer in the system is the fog system. This system is responsible for
accepting all the data transferred by the edge systems. The task of
the fog system is to analyze the data correctly, check
its relevance, and pull out all the data which is not relevant
before transferring the rest to the cloud system.
With the fog system in place, only the data which is relevant for
monitoring the health of the patient is transferred to the cloud, not all the
data that is generated. The data is voluminous when it is generated
by the devices connected to the network, but it reaches the cloud only after
multiple layers of processing and filtration.
By introducing multiple layers, the system transfers only
the required data, which helps the experts make decisions from the cloud server
faster, as there is less data to search and all of it is
useful.
system, these layers are introduced. These layers will help to reduce the quantity of
data which needs to be on the cloud.
The data generated is sent to the various nodes connected at the edge of the
network, which are referred to as edge systems or computers. The data given as
input to the edge systems or nodes is processed at the local level, so that some
amount of data is reduced at this level.
The data is processed at every edge of the network, taking into consideration
only the important data which is to be sent to the cloud. Processing
and collecting the data at every edge helps the system transfer only the
required data to the cloud server.
The next layer in the system is the layer where the fog systems are available.
These systems receive the stream of data from the first layer, which is the
edge layer or edge nodes. With the introduction of a fog system between the
edge and cloud, data latency is reduced and data efficiency is increased. As
the fog systems receive the stream of data from
the edge systems, the data is processed on the basis of parameters
which are important for monitoring the health of the patient, and it is also
filtered so that only relevant and important data is transferred to the
cloud server.
The system consists of edge and fog systems, which create platforms
so that the data is collected at the place where it is generated. These layers
are introduced to maintain the efficiency of the system along with its security
and capacity.
The system can avoid using the fog systems but cannot afford to
skip the edge nodes connected at the edge of the network. Processing
the collected data right near the point where it is generated
helps reduce the data transferred to the cloud system. Collecting the
data and storing it at the point where it is generated helps to reduce the cost of
the system, as the data need not be transferred to the cloud server.
The purpose of reducing the data sent to the cloud server is achieved when the edge
and fog layers are introduced. The concept is that the data is gathered at the point
where it is generated by various Internet of Things (IoT) devices such as sensors.
Bringing storage and processing near to the connected devices or the
application used helps to minimize the time required for processing the data
and also uses less Internet bandwidth [3].
To make the information pertaining to the patient available on the
cloud server, so that decisions on the health of the patient can be taken and
the right treatment provided, the system makes use of edge and fog computing layers.
The advantages are improved data efficiency and reduced bandwidth use and
latency.
Figure 12.1 shows how different layers are used to control traffic on a network.
By making use of different layers in the system, the required bandwidth is reduced
while system efficiency and speed increase. The application on the cloud that is used
by the end user has to be simple and clear in providing information about the
patient. The system is designed so that it collects the data from the various IoT
devices, which are wearable devices connected to the patient’s body and capable of
generating data that should be processed as shown in Figure 12.2.
FIGURE 12.1 Different layers are used in order to control traffic on a network.
FIGURE 12.2 Flow chart to show how the data is captured and sent to the cloud.
If all the connected devices generate and send their data directly to the cloud, the
system will be clunky and costly, and the decision-making process will take a back
seat. Introducing edge nodes at the edges of the network helps in collecting the data
at the point of generation, thereby providing more security to the data that is
collected. The edge computers store the generated data and process it in order to
reduce the load on the next layer. This means the edge nodes store and buffer the
data and also process it before it is sent to the fog systems in the network.
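The buffering and filtering role of an edge node described above can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the `Reading` and `EdgeNode` names, the parameter names, and the normal ranges are hypothetical and are not taken from the implemented system.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Reading:
    patient_id: str
    parameter: str  # hypothetical parameter name, e.g. "heart_rate"
    value: float


# Hypothetical normal ranges; real limits would be set by domain experts.
NORMAL_RANGES: Dict[str, Tuple[float, float]] = {
    "heart_rate": (60.0, 100.0),
    "spo2": (95.0, 100.0),
}


def is_relevant(reading: Reading) -> bool:
    """A reading is forwarded only if it is monitored and out of range."""
    limits = NORMAL_RANGES.get(reading.parameter)
    if limits is None:
        return False  # unmonitored parameter stays at the edge
    low, high = limits
    return not (low <= reading.value <= high)


class EdgeNode:
    """Buffers raw readings locally and forwards only the relevant ones."""

    def __init__(self) -> None:
        self.buffer: List[Reading] = []

    def collect(self, reading: Reading) -> None:
        self.buffer.append(reading)

    def flush(self) -> List[Reading]:
        """Return the readings for the next layer and clear the buffer."""
        relevant = [r for r in self.buffer if is_relevant(r)]
        self.buffer.clear()
        return relevant
```

In this sketch only out-of-range readings leave the edge, which is one simple way to reduce the load on the fog layer; a real deployment might also forward periodic summaries.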
The fog systems, which are connected between the edge and the cloud server, are
responsible for processing the captured data. The data is analyzed on the basis of
certain parameters and filtered so that only the relevant and important data reaches
the cloud server.
The system provides multiple advantages in terms of improved efficiency and reduced
bandwidth requirements. It also helps in reducing data congestion during transfer to
the cloud system.
12.4 CONCLUSION
The designed system was implemented using devices such as sensors connected to a
patient at a remote location. The related data was captured by the edge computers
and transferred to the fog computers, where the required data was extracted and the
irrelevant data was filtered out in order to reduce traffic and congestion on the
cloud server. The data from the cloud server was then used by the various domain
experts as needed to provide the right treatment for the patient in need. The
designed framework has helped patients to get the experts’ advice on time. Because
the system makes use of the edge and fog computing concept, the computers connected
at the edges can capture the data at various locations, store it locally, and filter
and analyze it so that only the relevant data from all the edges passes on to the fog
system. The edges receive the data from the connected devices at every fixed time
interval. This data is filtered and then transferred to the fog layer, which sits
between the edges and the cloud server. The system provides considerable advantages
in terms of required bandwidth and congestion on the cloud server, as filtering and
removal of irrelevant data are also done at the fog level.
REFERENCES
[1] Mora-Sánchez, O.B., López-Neri, E., Cedillo-Elias, E.J., Aceves-Martínez, E. and Larios, V.M., 2020. Validation of IoT infrastructure for the construction of smart cities solutions on living lab platform. IEEE Transactions on Engineering Management, 68(3), pp. 899–908.
[2] Fortino, G., Fotia, L., Messina, F., Rosaci, D. and Sarné, G.M., 2020. Trust and reputation in the internet of things: State-of-the-art and research challenges. IEEE Access, 8, pp. 60117–60125.
[3] Goudarzi, M., Wu, H., Palaniswami, M. and Buyya, R., 2020. An application placement technique for concurrent IoT applications in edge and fog computing environments. IEEE Transactions on Mobile Computing, 20(4), pp. 1298–1311.
[4] Ali, B., Pasha, M.A., ul Islam, S., Song, H. and Buyya, R., 2020. A volunteer-supported fog computing environment for delay-sensitive IoT applications. IEEE Internet of Things Journal, 8(5), pp. 3822–3830.
[5] Sarkar, S., Wankar, R., Srirama, S.N. and Suryadevara, N.K., 2019. Serverless management of sensing systems for fog computing framework. IEEE Sensors Journal, 20(3), pp. 1564–1572.
13 Image Denoising Using Autoencoders
Mursal Furqan Kumbhar
FIGURE 13.2 An overview of how the entire procedure works to denoise the noisy images.
In their paper [2], Wu, C. and Gao, T. describe the concept of image denoising,
share a complete list of common image noises, and discuss several classic algorithms
used in traditional denoising procedures. They also assess the shortcomings of the
older methodologies. They then summarize the deep learning–based image denoising
techniques built on several structures, including GAN, DnCNN, REDNet, Noise2Noise,
and CBDNet, along with the concepts and structures of the different methods. Finally,
they examine the limitations of image denoising as well as potential future fields
of study.
In [3], Chen, J., Chao, H. et al. used blind image denoising to study methods for
removing unknown noise from noisy images. Discriminative learning algorithms such
as DnCNN can produce state-of-the-art denoising results. They are not suitable for
this setting, however, because no paired training data is available. To solve this
problem, Chen J. et al. proposed a new and efficient two-step approach. First, a
generative adversarial network (GAN) is trained to model the noise in the noisy
input images and to generate noise samples. Second, the noise blocks from the first
step are used to create training pairs, which are then used to train a deep
convolutional neural network (CNN).
As described by Guo, S. in [4], algorithms for removing Gaussian noise have been
successful in recent years, especially when the Gaussian noise is synthetically
generated or occurs in a regular pattern. Real-world noise, on the other hand, is
chaotic and complex. Properly decreasing noise and capturing a high-quality image
requires improved, high-quality hardware devices. Furthermore, the captured image
may be distorted, hazy, or low resolution. As a result, figuring out how to recover
the latent clean picture from the superposed noisy image is critical. In addition,
deep learning algorithms need ground truth in order to learn features, but real-world
noisy images do not come with ground truth. These are significant issues that
academics and researchers must solve [4].
In [5], Li presents a wavelet denoising approach based on an unsupervised learning
model. The technique constructs an unsupervised dictionary learning algorithm for
noise reduction dictionary synthesis using wavelet transform properties such as
sparsity, multiresolution structure, and proximity to the human visual system. An
adaptive dictionary is developed by learning the wavelet decomposition of the noisy
picture using the K-singular value decomposition (K-SVD) approach. According to
experimental results on benchmark test images, the technique surpasses
state-of-the-art denoising algorithms in terms of peak signal-to-noise ratio (PSNR),
structural similarity index (SSIM), and visual quality at varying noise levels.
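PSNR, one of the quality measures mentioned above, is computed from the mean squared error between a clean and a denoised image as 10·log10(MAX²/MSE). The following Python sketch shows that standard definition; the function name and the flat-list image representation are illustrative assumptions, not part of any of the cited works.

```python
import math


def psnr(clean, denoised, max_value=255.0):
    """PSNR in dB between two equally sized images given as flat pixel lists."""
    if not clean or len(clean) != len(denoised):
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(clean, denoised)) / len(clean)
    if mse == 0:
        return math.inf  # identical images: no noise left to measure
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR indicates that the denoised image is closer to the clean reference; `max_value` should be 1.0 instead of 255.0 when pixels are scaled to the range [0, 1].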
13.3 METHODOLOGY
As mentioned earlier, image noise is caused by several intrinsic or external factors,
and it is difficult to deal with. Denoising is therefore a major problem in the
processing and visualization of images, and it is useful in various industries
where the acquisition of original images is crucial to strong performance. Figure 13.3
shows the appearance of noisy images.
13.3.2 Implementation
Because the implementation in this study is done in Python, the libraries and other
relevant details mentioned here refer to the Python programming language.
13.3.2.2 Dataset
In this study, the Modified National Institute of Standards and Technology (MNIST)
dataset is used. MNIST is a foundational dataset in computer vision. It is composed
of labelled greyscale 28 × 28 images of handwritten digits. The MNIST dataset is
divided into two portions for model optimization:
13.4 RESULTS
Denoising autoencoders (DAEs) avoid learning the identity function and, unlike
traditional noise reduction filtering methods, do not produce overly smooth images
and can be computed rapidly. DAEs, in general, use a modified autoencoder approach,
which is essentially based on injecting noise into the input and reconstructing the
output from the corrupted picture. This change to the general autoencoder approach
prevents DAEs from simply copying the input to the output, forcing them to eliminate
noise from the input before extracting valuable data.
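The noise-injection step described above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: it assumes pixel values scaled to [0, 1], additive Gaussian noise, and images stored as nested lists, and the function names and the default `sigma` are hypothetical.

```python
import random


def add_gaussian_noise(image, sigma=0.2, rng=None):
    """Return a corrupted copy of `image` with pixels clipped to [0, 1]."""
    if rng is None:
        rng = random.Random()
    return [
        [min(1.0, max(0.0, pixel + rng.gauss(0.0, sigma))) for pixel in row]
        for row in image
    ]


def make_training_pair(clean_image, sigma=0.2, rng=None):
    """A DAE is trained on (corrupted input, clean target) pairs."""
    return add_gaussian_noise(clean_image, sigma, rng), clean_image
```

Because the target is the clean image rather than the corrupted input, a network trained on such pairs cannot reduce its loss by copying the input to the output.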
A CNN was used in our DAE approach because of its effectiveness in denoising and in
retaining spatial relationships within images. Furthermore, when arbitrary-sized
images are used as input, employing a CNN serves the goal of lowering dimensionality
and computational complexity. An example of different images from the MNIST
dataset is shown in Figure 13.8.
13.5 CONCLUSION
In this study, an autoencoder model is developed that effectively cleans very noisy
pictures that it has never seen before, i.e., those outside its training data. The
cleaned images do, however, include certain hard-to-notice abnormalities.
Nevertheless, the model can greatly clean distorted and noisy images, demonstrating
that our method is effective in recovering damaged images. In the future, this
autoencoder model might be expanded and integrated into picture enhancement software
to increase the clarity and crispness of the images.
FIGURE 13.7 Denoised images.
REFERENCES
[1] Li, X., Yan, G., Mei Li, X.-and Chen, L. 2007. Image denoise based on soft-threshold and edge enhancement. Second Workshop on Digital Media and its Application in Museum & Heritages (DMAMH 2007).
[2] Wu, C. and Gao, T. 2021. Image denoise methods based on deep learning. Journal of Physics: Conference Series, 1883(1), p. 012112.
[3] Chen, J., Chen, J., Chao, H., et al. 2018. Image blind denoising with generative adversarial network-based noise modelling [C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, Utah, pp. 3155–3164.
[4] Guo, S. 2019. A Study on Real Camera Image Denoise Based on Convolutional Neural Network [D]. Harbin Institute of Technology.
[5] Li, G.X. 2006. A study on image wavelet denoise based on wavelet shrinkage method [J]. Journal of Remote Sensing, 5, pp. 697–702.
[6] Hemachandran, K., Khanra, S., Rodriguez, R. V. and Jaramillo, J. (Eds.). 2022. Machine Learning for Business Analytics: Real-Time Data Analysis for Decision-Making. CRC Press, New York.
14 Genetic Disorder Prediction Using Machine Learning Techniques
Sirisha Alamanda, T. Prathima, Rajesh Kannan K, and P. Supreeth Reddy
14.1 INTRODUCTION
A genetic disorder is an illness typically brought on by abnormalities in the DNA
or by alterations to the size or structure of the chromosomes. Hereditary gene
mutations are linked to a number of prevalent disorders. Studies show that the
frequency of hereditary diseases has increased rapidly in parallel with population
growth. Hereditary illnesses are becoming more common because people are not
sufficiently aware of the value of genetic testing. Genetic testing should be done
when children are young because many children die from these conditions.
Existing systems for diagnosing genetic disorders are very expensive and
time-consuming. They need state-of-the-art equipment and cutting-edge technology,
which only a few top-level institutes and hospitals can provide and which are out of
reach for most people. Even after genome sequencing, it is difficult to find the
abnormalities in a gene. There are no direct tests to find out which genetic disorder
a person has. Genetic illnesses are still a mystery to scientists, and the subject
is still in the study stage.
There are two types of disorders that are genetically determined: “single gene
inheritance diseases” and “mitochondrial gene inheritance disorders”. A single-gene
illness results from genetic changes to one gene. Single-gene diseases are highly
diverse and can have an impact on all aspects of functioning because they can arise
in any gene. The majority of inborn metabolic illnesses are mitochondrial gene
inheritance abnormalities [1], and 1.6 out of every 5000 persons are affected accord-
ing to [2].
In this study, the authors used machine learning and deep learning models
to predict genetic disorders. Attributes used in this study don’t require complex
equipment and don’t need genome sequencing. The authors have used features
like white blood cell count, history of radiation exposure, substance abuse, birth
asphyxia, history of previous pregnancies, number of previous abortions, birth
defects, (institute masked) tests, assisted conception IVF/ART, folic acid detail
(peri-conceptional), etc.
The patients in the dataset considered for the study are in the age group of
0–15 years, which gives time to detect, acknowledge, and prevent further
deterioration of the patient at an early stage. All the data for the various
attributes mentioned earlier is retrieved from the patient and their biological
parents. In this study, an autoencoder model was used for feature extraction, and
machine learning models were used for classification: an OvR Classifier with SVM,
XGBoost, and an artificial neural network.
The proposed approach is presented earlier in the AEML-GDP algorithm. The current
study is made up of the following four phases:
1. Data preparation
2. Feature extraction
3. Classification model development
4. Performance evaluation of models
1. The missing values are filled using random sample imputation for continuous
variables and the mode for categorical variables.
2. Some continuous variables, such as the mother’s and father’s ages, are discretized.
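The two preparation steps above can be sketched in plain Python. This is only an illustration under stated assumptions: missing values are represented by `None`, the function names are hypothetical, and the bin edges used for discretization are not taken from the study.

```python
import bisect
import random
from collections import Counter


def random_sample_impute(values, rng=None):
    """Fill None entries of a continuous column with randomly drawn observed values."""
    if rng is None:
        rng = random.Random()
    observed = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(observed) for v in values]


def mode_impute(values):
    """Fill None entries of a categorical column with the most frequent category."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [v if v is not None else mode for v in values]


def discretize(value, edges):
    """Return the index of the bin that `value` falls into (edges must be sorted)."""
    return bisect.bisect_right(edges, value)
```

For example, with the hypothetical age edges `[25, 35, 45]`, a parent aged 40 falls into bin 2. Random sample imputation keeps the spread of the observed values, whereas filling with a single constant would not.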
TABLE 14.1
Distribution of Genetic Disorder Class Label in the Training Data
Type of Disorder % of Tuples
Mitochondrial genetic inheritance disorder 44
Single-gene inheritance diseases 33
Other (multifactorial genetic inheritance disorders + missing label) 23
is trained and the confident model makes predictions for every two-class labelling
problem.
Consider, for instance, a multi-class classification problem with examples for the
classes “red”, “blue”, and “green”. The following three binary classification
problems might be created from this:
This method’s potential drawback is that it necessitates the creation of one model for
each class. The authors have employed the OvR Classifier with SVM.
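The one-vs-rest decomposition can be sketched as follows, using the colour classes from the example above. The helper names are hypothetical, and the sketch shows only how labels are relabelled and how the per-class confidences are combined; the binary models themselves (SVMs in this study) are left out.

```python
def one_vs_rest_labels(labels):
    """For each class, build a binary label vector: 1 for that class, 0 for the rest."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def predict_ovr(confidences):
    """Choose the class whose binary model reports the highest confidence."""
    return max(confidences, key=confidences.get)
```

With three classes this yields three binary problems (red vs. the rest, blue vs. the rest, green vs. the rest), which is why one model per class has to be trained.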
An SVM can be used for both regression and classification. Although it can address
regression problems, it is best suited to classification. The SVM method looks for
a hyperplane in an N-dimensional space that classifies the data points distinctly.
The dimension of the hyperplane depends on the number of features. When there are
two input features, the hyperplane is effectively a line. With three input features,
the hyperplane becomes a 2D plane. Visualizing it is challenging when there are more
than three features. In this study the authors have achieved an accuracy of 85.27%
using the OvR Classifier with SVM.
14.3.3.2 XGBoost
By combining several weak classifiers, the ensemble modelling technique known as
“boosting” aims to create a powerful classifier. It does so by building weak models
in series. First, a model is created using a training set of data. A second model is
then created in an effort to fix the previous model’s errors. Models are added in
this manner until either the full training dataset is predicted correctly or the
maximum number of models has been added.
In the XGBoost approach, the decision trees are generated sequentially. Weights are
significant in this approach. A weight is assigned to each independent variable
before it is fed into the decision tree that performs the prediction. Variables that
the tree predicted incorrectly are given more weight before being fed into the second
decision tree. These individual predictors are then combined to produce an accurate
and robust model.
XGBoost is an implementation of gradient boosted decision trees designed for speed
and performance. Gradient boosting is a technique in which new models are trained to
predict the errors, or residuals, of the earlier models, and the models are then
combined to produce the final prediction. The method is called “gradient boosting”
because it uses a gradient descent procedure to reduce the loss as new models are
added. The authors obtained an accuracy of 88.9% from XGBoost.
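The residual-fitting idea behind gradient boosting can be shown with a deliberately small sketch. It is not XGBoost itself, which adds second-order gradient information, regularization, and many engineering optimizations; it assumes a single numeric feature, squared-error loss, and one-split "stumps" as weak learners, and all names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    # Every distinct value except the largest is a candidate threshold,
    # so both sides of the split are always non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Start from the mean and repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the scaled contribution of every stump."""
    base, learning_rate, stumps = model
    return base + sum(
        learning_rate * (lm if x <= t else rm) for t, lm, rm in stumps
    )
```

Each round corrects part of the error left by the previous rounds, so the training error shrinks as models are added; the learning rate controls how large each correction is.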
decisions in a way that is human-like. The various layers that can be used in an
ANN include:
Input layer: This accepts inputs from the user in various formats.
Hidden layer: This sits between the input and output layers. It performs all the
computations necessary to uncover patterns and hidden features.
Output layer: This layer is used to communicate the output after the input has
undergone a number of transformations in the hidden layer.
The most popular ANN model in the deep learning field is the multilayer perceptron
(MLP). The MLP has multiple layers of neurons, and each layer multiplies the features
by weights and applies an activation function such as tanh, ReLU, or sigmoid. In the
final layer of the MLP, a softmax function is used, which produces a probability for
each class of disorder.
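A single dense layer and the softmax function can be written in a few lines of plain Python. The sketch below is illustrative only: the function names are hypothetical and the weights are supplied by the caller rather than learned.

```python
import math


def relu(z):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, z)


def dense(inputs, weights, biases, activation):
    """One layer: weights[j] holds the input weights of neuron j."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def softmax(logits):
    """Turn raw output scores into probabilities that sum to 1."""
    largest = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

In an MLP the output of one `dense` call becomes the input of the next, and `softmax` is applied to the scores of the final layer so that each class receives a probability.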
In this study, the authors built the ANN as a basic sequential model made up of a
sequence of dense layers and dropout layers. Preprocessed data with 37 columns is
given as input to the input layer, which is followed by five hidden layers of
neurons with the rectified linear unit (ReLU) as the activation function and an
output layer with three neurons corresponding to the three classes of genetic
inheritance disorders: (a) multifactorial, (b) mitochondrial, and (c) single-gene.
The ANN model uses the categorical cross-entropy loss and the Adam optimizer. The
training set achieves an accuracy of 92.62%, while the test set achieves an accuracy
of 94.13%. Figure 14.2 shows that the ANN model stabilizes after approximately 70–75
epochs, with no signs of overfitting, as the training and test losses exhibit
similar patterns.
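The categorical cross-entropy loss named above compares the predicted class probabilities with one-hot encoded true labels. A minimal sketch of the standard definition follows; the function name and the small `eps` guard against log(0) are illustrative choices, not details taken from the study.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean loss over samples; y_true holds one-hot rows, y_pred probability rows."""
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        total -= sum(
            t * math.log(max(p, eps)) for t, p in zip(true_row, pred_row)
        )
    return total / len(y_true)
```

The loss is zero when the model assigns probability 1 to the correct class and grows as probability mass moves to the wrong classes, which is the quantity tracked per epoch in a training curve such as the one described above.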
The performance of all these models is evaluated, and the comparison of the
results is discussed in detail in the next section.
$$\text{Precision} = \frac{TP}{TP + FP} \tag{14.1}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{14.2}$$

$$\text{F1-Score} = \frac{2\,(\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}} \tag{14.3}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{14.4}$$
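Equations 14.1 to 14.4 translate directly into code. The sketch below takes the four counts for one target class and returns the four measures; the function name and the choice to return 0.0 when a denominator is zero are assumptions made for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, F1-score, and accuracy from the four confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }
```

For a multi-class problem such as this one, the counts are computed separately for each class by treating that class as the target and all other classes as non-target.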
From Table 14.3, it can be observed that the ANN has the best accuracy and can be
chosen as the best model for genetic disorder prediction. However, because the study
is on genetic disorder prediction, the cost of a false negative is much higher than
the cost of a false positive, so false negatives should be given more importance.
The lower the recall, the more false negatives the model produces, and a model with
low recall can be considered a bad model.
Table 14.3 also shows that the OvR Classifier performs well for mitochondrial
genetic disorders, but its precision is not good for multifactorial and single-gene
genetic disorders. Its recall is very low (0.43) for multifactorial genetic
inheritance disorders, so it is not a good model. The XGBoost model performs well
for mitochondrial and single-gene genetic disorders but not so well for
multifactorial genetic disorders (recall of 0.66), whereas the ANN model performs
well for all classes of disorders. The ANN can therefore be considered the optimal
model for genetic disorder prediction.
TABLE 14.2
Parameter Description
Parameter             Definition
True Positives (TP)   Number of concerned (target) class tuples, labelled as such.
False Positives (FP)  Number of other (non-target) class tuples, labelled as concerned (target) class tuples.
True Negatives (TN)   Number of other (non-target) class tuples, labelled as such.
False Negatives (FN)  Number of concerned (target) class tuples, labelled as other (non-target) class tuples.
TABLE 14.3
Comparison of Prediction Models (SVM, XGBoost and ANN) Used in the Current Study
Prediction Model Used          Type of Disorder                             Precision  Recall  F1-score  Accuracy
OnevsRest classifier with SVM  Mitochondrial genetic inheritance disorder   0.97       0.88    0.92      85%
                               Multifactorial genetic inheritance disorder  0.71       0.43    0.54
                               Single-gene inheritance diseases             0.76       0.93    0.84
XGBoost                        Mitochondrial genetic inheritance disorder   0.90       0.96    0.92      89%
                               Multifactorial genetic inheritance disorder  0.83       0.66    0.74
                               Single-gene inheritance diseases             0.89       0.87    0.88
ANN                            Mitochondrial genetic inheritance disorder   0.90       0.99    0.95      94%
                               Multifactorial genetic inheritance disorder  1.00       0.82    0.90
                               Single-gene inheritance diseases             0.99       0.91    0.95
After the detailed analysis of the obtained results, the framework is compared
with parallel research results on the same dataset in Table 14.4. From the table it
is evident that feature extraction with the autoencoder helps to extract more
significant features and thereby predicts the genetic disorder more accurately.
TABLE 14.4
Comparison of Prediction Models Used in [19] and the Current Study
Author, Year              Pre-processing  Feature Extraction Method  Prediction Models Used           Accuracy Observed
Nasir et al., 2022 [19]   yes             Linear Regression          SVM                              60.1%
                                                                     Artificial Neural Network (ANN)  84.9%
Proposed framework, 2022  yes             Autoencoder                One-vs-Rest Classifier with SVM  85.29%
                                                                     Artificial Neural Network (ANN)  94.13%
14.5 CONCLUSION
Many genetic disorders are hard to detect and can be very dangerous if they go
undetected. Many people suffer from them, and many die because of them, so detecting
them early can be very helpful. The existing systems for detecting genetic disorders
can be very expensive, and some disorders are hard to identify. This has inspired the
authors to create a system that can quickly identify the disorder in people who are
suffering from it. The authors have trained the OvR Classifier with SVM, XGBoost, and
ANN models on the genetic disorder data from Kaggle and compared their performance.
The results have shown that the best model is the ANN. The authors were able to
address the limitation identified in genetic disorder prediction with sequential data
by using patient medical history information. Many lives can be saved with the use
of this method to predict the genetic adversity early based on medical data. In the
future, this system can be made accessible to everyone in the world, anywhere and
anytime, by creating web applications and mobile applications. The system’s accuracy
can be increased even more by working on large amounts of data.
REFERENCES
[1] C. R. Ferreira, C. D. M. van Karnebeek, J. Vockley and N. Blaue, “A proposed nosology of inborn errors of metabolism”, Genetic Medicine, vol. 21, no. 1, pp. 102–106, 2019.
[2] J. Tan, M. Wagner, S. L. Stenton, T. M. Storm, S. B. Wortmaan et al., “Lifetime risk of autosomal recessive mitochondrial disorders calculated from genetic databases”, Lancet, vol. 54, pp. 111–119, 2019.
[3] Y. Park and M. Kellis, “Deep learning for regulatory genomics”, Nature Biotechnology, pp. 825–826, 2015.
[4] J. Menche, A. Sharma, M. Kitsak, S. Ghiassian, M. Vidal et al., “Uncovering disease-disease relationships through the incomplete human interactome”, Science, vol. 347, no. 6224, pp. 1257601, 2015.
[5] S. Won, H. Choi, S. Park, J. Lee, C. Park et al., “Evaluation of penalized and nonpenalized methods for disease prediction with large-scale genetic data”, BioMed Research International, p. 605891, 2015.
[6] N.G, B. A et al., Cardiovascular Disease Prediction using Genetic Algorithm and Neural Network. IEEE, 2012.
[7] K. R. Gray et al., “Alzheimer’s disease neuroimaging initiative. Random forest-based similarity measures for multimodal classification of Alzheimer’s disease”, Neuro Image, vol. 65, pp. 167–175, 2013.
[8] Y. Liu, D. A. Tennant, Z. Zhu, J. K. Health, X. Yao et al., “Dime: A scalable disease module identification algorithm with application to glioma progression”, PloS One, vol. 9, no. 2, pp. 866–876, 2014.
[9] S. D. Ghiassian, J. Menche and A. L. Barabasi, “A disease module detection (diamond) algorithm derived from a systematic analysis of connectivity patterns of disease proteins in the human interactome”, PloS Computational Biology, vol. 11, no. 4, pp. 1004120, 2015.
[10] W. Hoskins, Y. Zhang, Y. Guo, and J. Tang, Down syndrome prediction/screening model based on deep learning and Illumina genotyping array. IEEE, 2017.
[11] W. Kim, D. Qiao, M. H. Cho, S. H. Kwak, K. S. Park et al., “Selecting cases and controls for DNA sequencing studies using family histories of disease”, Statistics in Medicine, vol. 36, pp. 2081–2099, 2017.
[12] Jungsoo Gim, Wonji Kim, Soo Heon Kwak, Hosik Choi, Changyi Park, Kyong Soo Park, Sunghoon Kwon, Taesung Park, and Sungho Won, “Improving disease prediction by incorporating family disease history in risk prediction models with large-scale genetic data”, Genetics, vol. 207, no. 3, pp. 1147–1155, 1 November 2017.
[13] Jaydeep Patil, Gene Expression Analysis for Early Lung Cancer Prediction Using Machine Learning Techniques. IEEE, 2018.
[14] Frangly Francis and T. N. Namitha, Ensemble Approach for Predicting Genetic Disease. IEEE, 2018.
[15] Psomagen, “How can genetics help predict diseases?” Psomagen, 31 August 2020. [Online]. Available: https://psomagen.com/how-can-genetics-help-predict-diseases-psomagen/.
[16] A. Romagnoni, S. Jégou, K. Van Steen et al., “Comparative performances of machine learning methods for classifying Crohn Disease patients using genome-wide genotyping data”, Scientific Reports, vol. 9, pp. 10351, 2019.
[17] Duc-Hau Le, “Machine learning-based approaches for disease gene prediction”, Briefings in Functional Genomics, vol. 19, no. 5–6, pp. 350–363, 2020.
[18] M. Bracher-Smith, K. Crawford, and V. Escott-Price, “Machine learning for genetic prediction of psychiatric disorders: A systematic review”, Molecular Psychiatry, vol. 26, pp. 70–79, 2021.
[19] M. U. Nasir, M. A. Khan, M. Zubair, T. M. Ghazal, R. A. Said, and H. Al Hamadi, “Single and mitochondrial gene inheritance disorder prediction using machine learning”, CMC – Computers Materials & Continua, vol. 73, no. 1, pp. 953–963, 2022.
[20] Amit Kumar, “Predict the genetic disorders dataset-of genomes: Dataset of genomes and genetics”, ML Challenge Hackerearth, 2021. www.kaggle.com/datasets/aibuzz/predict-the-genetic-disorders-datasetof-genomes.
15 Bayesian Models in Cognitive Neuroscience
Gabriel Kabanda
15.1 INTRODUCTION
Big Data analytics, artificial intelligence (AI) and robotics, machine learning
(ML), cybersecurity, blockchain technology, and cloud computing are some of the
most revolutionary technologies available today. In essence, the Fourth Industrial
Revolution (4IR) is about cyber-physical technologies that allow the physical and
virtual worlds to intersect. In order to optimize production chains as part of the
4IR, which drives scientific and technological developments, cyber-physical systems
(CPS) are built on the Internet of Things (IoT) and its supporting technologies.
Cybersecurity refers to the mix of technologies, processes, and operations used to
secure information systems, computers, devices, programs, data, and networks against
internal or external threats, injury, damage, attacks, or illegal access [1]. In
order to stop an attack from happening, cybersecurity combines policies,
technologies, processes, and strategies into a coherent set that protects the
confidentiality, integrity, and availability of computing resources, networks,
software, and data.
A combination of laws, methods, technologies, and procedures is needed to
protect the availability, confidentiality, and integrity of computing resources,
networks, software, and data against attack. Ref. [2] asserts that when computers,
associated telecommunications equipment, and other components that allow for the
fast transport of vast amounts of data are connected, a human-created information
environment called “cyberspace” is created. The use of IP addresses reveals the vir-
tual nature of cyberspace. Unlike addresses in the physical domain, IP addresses give
users navigational information without necessarily relating to a physical place. The
networked devices and data that make up cyberspace were all created by humans.
Cyberspace is divided into three layers: the physical layer, the intellectual layer, and
the social layer.
The escalating usage of the Internet and the dangers it poses have given rise to
network intrusion detection systems (NIDSs). An NIDS is a computer program that
monitors system activity to find behavior that violates security rules and can
discriminate between malicious and legitimate network users [3]. Misuse detectors
and anomaly detectors are the two types of NIDSs. Misuse detection systems maintain
an extensive base of known attacks and monitor all incoming traffic for any
sequences that appear in it. Anomaly detection systems, on the other hand,
concentrate on identifying new, previously unknown threats. Ref. [3] asserts that
research on network anomaly detection has utilized a number of well-known AI
paradigms, including support vector machines, fuzzy logic, genetic algorithms,
finite automata, and neural networks. The best tool for integrating misuse detectors
and anomaly detectors is a set of Bayesian networks. Following a period of training,
the Bayesian network learns how the model behaves and can then predict its outcome.
Bayesian statistics provide a framework for drawing conclusions about the
fundamental nature of the universe on the basis of observations and prior
assumptions. In the Bayesian approach to data analysis, a set of potential causes
for the observed data is taken into account, and each is given a probability. The
variance of a probability density function (PDF), or the width of the PDF, used to
convey a belief about the state of the world is a measure of how uncertain that
belief is. A critical component of the Bayesian methodology is that Bayesian systems
take this uncertainty into account and use it to weight various sources according to
their varying degrees of accuracy. A sample Bayesian network is displayed in
Figure 15.1. The directed graph’s nodes correspond to the variables that make up the
problem, while its edges show their conditional dependencies.
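The idea of weighting sources by their uncertainty can be made concrete for the simplest case: two independent Gaussian estimates of the same quantity. Each source is weighted by its precision (the inverse of its variance), which is the standard result for combining Gaussian beliefs. The function name below is hypothetical, and the Gaussian assumption is made only for illustration.

```python
def fuse_gaussian(mean_a, var_a, mean_b, var_b):
    """Combine two independent Gaussian estimates, weighting each by 1/variance."""
    weight_a = 1.0 / var_a
    weight_b = 1.0 / var_b
    fused_mean = (weight_a * mean_a + weight_b * mean_b) / (weight_a + weight_b)
    fused_var = 1.0 / (weight_a + weight_b)
    return fused_mean, fused_var
```

The fused estimate lies closer to the more certain source, and its variance is smaller than that of either source alone, reflecting the gain from combining evidence.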
The study looks into how the healthy population is affected by dysfunctions in the
hierarchical Bayesian inference process from the standpoint of perception and belief
fixation [4–8]. These hierarchical Bayesian models were inspired by an influential
theory in cognitive and computational neuroscience that models the brain as a
“probabilistic prediction machine” striving to minimize the mismatch between
internally generated predictions of its sensory inputs and the sensory inputs
themselves [9–13]. Computational psychiatry has emerged as a paradigm for
understanding how changes in brain functions can lead to the onset of severe
psychiatric symptoms, and this understanding is largely due to growing knowledge of
the brain as an organ of predictive inference [14].
Researchers in the newly developed discipline of computational psychiatry have
recently tried to address these problems by appealing to dysfunctions in the
hierarchical Bayesian inference process that is hypothesized to underlie perception
and belief fixation in the healthy population [4–8]. These hierarchical Bayesian
models have been greatly influenced by predictive processing, also known as
hierarchical predictive coding, a significant theory in cognitive and computational
neuroscience that conceptualizes the brain as a “probabilistic prediction machine”
aiming to reduce the discrepancy between internally generated predictions of its
sensory inputs and the sensory inputs themselves [9–13].
15.1.1 Background
Bayesian networks (BNs) are directed acyclic graphs with a corresponding prob-
ability distribution function that are described as graphical probabilistic models
for multivariate analysis [3]. BNs are a widely diverse family of models, accord-
ing to [15], that may be used to depict nested, acyclic statistical models of almost
any type of non-pathological joint probability distribution. A Bayesian network
is easily described by [16] as a directed acyclic graph (DAG) with nodes and
arcs, where the nodes stand in for random variables (RVs) and the directed arcs
between nodes show dependencies between the RVs. The edges of the directed
graph show conditional relationships between the variables that make up the
problem. The probability function also shows how strong these connections are
in the graph.
Let us formalize the definition of a Bayesian network B as a pair, B = (D, P), where
D is a DAG, P = {p(x1|Ψ1), . . . , p(xn|Ψn)} is the set of n conditional probability
functions (one for each variable), and Ψi is the set of parent nodes of the node Xi in D.
The set P defines the joint probability density function [3]:
\[ P(x) = \prod_{i=1}^{n} p(x_i \mid \Psi_i) \]
Equivalently, for a network with graph G,
\[ P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} p(X_i \mid \mathrm{Pa}_G(X_i)) \]
where PaG(Xi) denotes the set of parent nodes of Xi in G, and p(Xi |PaG(Xi)) describes
the conditional probability distribution (CPD) for Xi given PaG(Xi) [17].
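The factorization above can be made concrete with a short sketch. The three-node network (Rain, Sprinkler, WetGrass), its CPD values, and the function name `joint_probability` are hypothetical and chosen only for illustration; they do not come from the chapter or its references.

```python
from itertools import product

# Hypothetical three-node network: Rain -> WetGrass <- Sprinkler.
# parents[X] lists Pa_G(X); cpd[X] maps a tuple of parent values to P(X = True | parents).
parents = {"Rain": (), "Sprinkler": (), "WetGrass": ("Rain", "Sprinkler")}
cpd = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.1},
    "WetGrass": {
        (True, True): 0.99,
        (True, False): 0.90,
        (False, True): 0.80,
        (False, False): 0.05,
    },
}


def joint_probability(assignment):
    """Return P(X1, ..., Xn) as the product of p(Xi | Pa_G(Xi))."""
    prob = 1.0
    for var, pa in parents.items():
        pa_values = tuple(assignment[p] for p in pa)
        p_true = cpd[var][pa_values]
        prob *= p_true if assignment[var] else 1.0 - p_true
    return prob


# Sanity check: the joint distribution sums to 1 over all 2^3 assignments.
names = list(parents)
total = sum(
    joint_probability(dict(zip(names, values)))
    for values in product([True, False], repeat=len(names))
)
```

Because each factor depends only on a node and its parents, the full joint over n binary variables never has to be stored as a table of 2^n entries.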
The most crucial feature of BNs is their ability to calculate the probability that
a particular hypothesis is true given a historical dataset (for example, the
probability that an email is spam or authentic). It can be demonstrated
mathematically that the Bayesian technique combines knowledge under uncertainty
optimally. However, to determine whether people combine information in similar ways,
we must first determine whether these models are effective in cognitive
neuroscience. Both humans and animals have been shown to exhibit behavior that is
close to the optimum predicted by Bayesian theory. The human brain can be viewed,
at its most basic level, as a machine whose function is, at least in part, to infer
the state of the environment. Bayesian theory suggests that individuals hold prior
beliefs that influence how they process information. By applying Bayes’ rule
iteratively, it becomes possible to model the process of learning from a sequence
of observations. Ref. [15] cites the following claims in support of using BNs for
this kind of research:
1) To help us make decisions and direct our actions in the world, the brain uses
Bayesian inference.
2) Probability distributions are how the brain represents sensory data.
Bayesian Models in Cognitive Neuroscience 161
\[ p(h \mid e) = \frac{p(e \mid h)\, p(h)}{p(e)} \]
Under a set of plausible assumptions, Bayes’ Theorem defines the optimal calculus
for updating beliefs in the face of uncertainty. If e is a piece of evidence and h is
a potential explanation for this evidence, Bayes’ Theorem asserts that the
probability of the hypothesis given the evidence, p(h|e), is proportional to the
likelihood of the evidence given the hypothesis, p(e|h), multiplied by the prior
probability p(h), which is the probability of the hypothesis evaluated independently
of the evidence. In light of this equation, one should revise one’s beliefs in
accordance with Bayes’ Rule. The study of cognitive phenomena has seen “a boom in
research employing Bayesian models” in recent years [18–20]. At least two important
factors have contributed to this “revolution” [21]: first, a growing understanding
that across many psychological domains, the main challenge that the brain faces is
inference and decision-making under uncertainty; and second, a growing understanding
of the ways in which Bayesian statistics and decision theory can be used to capture
the solutions to such problems in mathematically precise and empirically
illuminating ways [19].
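As an illustration of learning from a sequence of observations, the sketch below applies Bayes’ rule once per observation, using each posterior as the next prior. The coin scenario, the probabilities 0.8 and 0.5, and the function names are assumptions made for this example only.

```python
def bayes_update(prior, likelihood_h, likelihood_not_h):
    """One application of Bayes' rule: return p(h | e)."""
    evidence = likelihood_h * prior + likelihood_not_h * (1.0 - prior)  # p(e)
    return likelihood_h * prior / evidence


# Hypothetical setup: h = "the coin is biased towards heads" (P(heads) = 0.8);
# the alternative is a fair coin (P(heads) = 0.5).
P_HEADS_H, P_HEADS_NOT_H = 0.8, 0.5


def update_on_sequence(prior, flips):
    """Return the belief in h after each flip; yesterday's posterior is today's prior."""
    belief = prior
    history = [belief]
    for flip in flips:
        if flip == "H":
            belief = bayes_update(belief, P_HEADS_H, P_HEADS_NOT_H)
        else:
            belief = bayes_update(belief, 1 - P_HEADS_H, 1 - P_HEADS_NOT_H)
        history.append(belief)
    return history


history = update_on_sequence(0.5, "HHTH")
```

Each head raises the belief in h and each tail lowers it; the final belief depends only on how many heads and tails were seen, not on their order.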
Although Bayesian models have been used to explore a wide range of cognitive
phenomena, including categorization, causal learning and inference, language
processing, abstract thinking, and more, they have been most frequently utilized in
perceptual and sensorimotor psychology [18, 19]. Sometimes these models are designed
purely as normative “ideal observer” models, with no pretension of being
descriptively adequate: they seek to capture optimal performance on a cognitive task.
The Bayesian models considered in this study, however, aim to provide descriptive
accounts of actual cognitive systems. The success of such descriptions inspired the
“Bayesian brain hypothesis” [22], which proposes that the brain itself represents
information probabilistically and uses it to perform Bayesian inference.
The Bayesian brain hypothesis faces at least two significant challenges. First,
exact Bayesian inference is time-consuming and frequently computationally
intractable. As a result, algorithms for approximate Bayesian inference have
received a lot of attention in statistics and AI, with sampling and variational
methods being the most common [23]. Second, scientists must explain how the
approximation algorithms they have selected are implemented in the brain’s neural
networks. In other words, for Bayesian cognitive science to be descriptively
realistic, the three-tiered schema for computational explanation of Ref. [24]
requires researchers to find credible hypotheses at both the “algorithmic” and
“implementational” levels.
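To give a feel for what a sampling-based approximation looks like, the following sketch estimates a posterior mean by likelihood weighting and compares it with the exact answer. The problem (a coin with a uniform prior on its bias) is deliberately one where exact inference is cheap, so that the approximation can be checked; the function names, sample size, and seed are assumptions for illustration.

```python
import random


def exact_posterior_mean(heads, tails):
    """Posterior mean of a coin's bias under a uniform prior, i.e. Beta(heads+1, tails+1)."""
    return (heads + 1) / (heads + tails + 2)


def sampled_posterior_mean(heads, tails, n_samples=20000, seed=0):
    """Approximate the same quantity by likelihood weighting (importance sampling)."""
    rng = random.Random(seed)
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(n_samples):
        theta = rng.random()                            # draw a bias from the uniform prior
        weight = theta ** heads * (1 - theta) ** tails  # likelihood of the observed flips
        weighted_sum += weight * theta
        weight_total += weight
    return weighted_sum / weight_total


exact = exact_posterior_mean(7, 3)
approx = sampled_posterior_mean(7, 3)
```

In realistic models the exact posterior is not available in closed form, and only an estimate of this kind can be computed; its accuracy then depends on the number of samples drawn.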
Numerous studies have addressed these problems. Predictive processing, also known
as hierarchical predictive coding, is the most well-known of the resulting proposals
[9–13, 25, 26, 27]. Predictive processing has been extensively studied in both the
scientific and the philosophical literature, and excellent overviews are available
elsewhere in Refs. [10, 13], and [27]. Here, the study focuses solely on three
issues: how to formulate approximate Bayesian inference in terms of
15.1.3 Scope of Research
The purpose of the study is to determine whether or not the brain can be viewed as
a Bayesian machine and whether perception can be regarded as a form of Bayesian
inference.
The objectives of this research are to:
observational data is the greedy equivalence search (GES). According to Ref. [28],
GES is a two-phase score-based method that consists of a forward equivalence search
(FES) followed by a backward equivalence search (BES). Since each forward and
backward step in GES requires scoring a single node given its parents, a node-wise
decomposable score is required.
A Bayesian method for learning a BN structure searches for a structure that has a
high posterior probability given a dataset. Let D be a dataset containing n discrete
variables X = {X1, X2, . . . , Xn}, where each variable Xi can take ri values and its
parents Pa(Xi) can take qi distinct instantiations [28].
The structure we want to score is G. The posterior probability of graph G given
data D is as follows, according to Bayes’ Theorem:
\[ P(G \mid D) = \frac{P(D \mid G) \cdot P(G)}{P(D)} \]
where P(G) is the structural prior, P(D) is the probability of the data, and P(D|G) is
the marginal likelihood of the data. Because P(D) is a normalization constant that
does not depend on the model, we define the score of model G as in Ref. [28], using
the marginal likelihood:
\[ P(D \mid G) = \int_{\theta} P(D \mid G, \theta) \cdot P(\theta \mid G) \, d\theta \]
where θ denotes the parameters of the model.
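For discrete variables the integral above has a closed form once a prior over the parameters is fixed. The sketch below assumes a uniform Dirichlet prior (the K2 metric of Cooper and Herskovits), which may differ from the prior used in Ref. [28]; the toy dataset and function names are likewise hypothetical. It also shows the node-wise decomposition that GES relies on: the score of a graph is a sum of one term per node given its parents.

```python
from collections import Counter
from math import lgamma


def log_marginal_likelihood(data, child, parents, arity):
    """Log marginal likelihood term of one node given its parents (uniform Dirichlet prior)."""
    r = arity[child]
    counts = Counter()         # N_ijk: (parent configuration, child value) -> count
    parent_totals = Counter()  # N_ij: parent configuration -> count
    for row in data:
        config = tuple(row[p] for p in parents)
        counts[(config, row[child])] += 1
        parent_totals[config] += 1
    score = 0.0
    # Unobserved parent configurations contribute exactly 0, so they can be skipped.
    for config, n_ij in parent_totals.items():
        score += lgamma(r) - lgamma(n_ij + r)
        for value in range(r):
            score += lgamma(counts[(config, value)] + 1)
    return score


def structure_score(data, graph, arity):
    """Decomposable score: sum of node-wise terms; graph maps node -> tuple of parents."""
    return sum(log_marginal_likelihood(data, x, pa, arity) for x, pa in graph.items())


# Hypothetical data in which Y copies X in 18 of 20 rows.
data = [{"X": 0, "Y": 0}] * 9 + [{"X": 1, "Y": 1}] * 9 + [{"X": 0, "Y": 1}, {"X": 1, "Y": 0}]
arity = {"X": 2, "Y": 2}
score_independent = structure_score(data, {"X": (), "Y": ()}, arity)
score_dependent = structure_score(data, {"X": (), "Y": ("X",)}, arity)
```

On this toy dataset the structure with the edge from X to Y receives the higher score, which is what a search procedure would use to prefer it.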
According to Ref. [28], the proposed IGES method is a promising strategy for learning
a BN structure that captures the interactions between the variables of a particular
instance T more accurately than a population-wide model does.
Ref. [29] used a BN to examine the effects of family characteristics, socioeconomic
status, the biophysical environment, institutional support, and farm features on
local inhabitants’ decisions on reforestation. The BN was effective in pinpointing
the key variables affecting landowners’ planted forest area, their interactions with
one another, and the constraints on tree planting, and in highlighting the
complexity of decision-making. The BN, also known as a belief, causal, or
probabilistic network, is a popular technique for managing and evaluating
real-world data. A BN can study scenario-based subjects, combine qualitative
variables with quantitative and spatially explicit data, and make robust, accurate
predictions even with relatively small sample sizes without over-fitting [29]. A BN
is also useful for qualitative reasoning: it helps identify the pertinent factors
when deciding how to encourage or discourage particular options among a complicated
group of interrelated elements. Making decisions can be a difficult process affected
by many interconnected variables [29]. A BN is therefore a suitable method for such
questions because it is known to capture these interdependencies well without
penalizing significant factors that do not have the strongest influence.
A BN G is a probabilistic graphical model that uses conditional dependencies to
describe a joint probability distribution over a set of variables X = {X1, X2, . . . , Xn}.
Each node of this DAG represents a random variable, and an edge indicates a direct
probabilistic connection between the two nodes it links [17]. The BN represents the
joint probability distribution faithfully as
\[ p(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} p(X_i \mid \mathrm{Pa}_G(X_i)) \]
where p(Xi | PaG(Xi)) defines the CPD of Xi given PaG(Xi), and PaG(Xi) denotes the set of
parent nodes of Xi in G [17].
There are different types of probabilistic graphical models, each with its own
properties, structure, and benefits. They can be classified into three categories:
undirected graphical models, factor graphical models, and directed graphical models,
as shown in Figure 15.2.
The undirected graphical model is a model of the (full) joint probability distri-
bution of a collection of random variables. It is also known as the Markov network
or Markov random field. It can express some dependencies that a BN is unable to
describe (such as cyclic dependencies), but it is unable to represent some dependen-
cies that a BN is capable of (such as induced dependencies).
A second class of graphical models, which resembles undirected graphs, is the factor
graph: an undirected bipartite graph connecting variable nodes and factor nodes.
Each factor node represents a probability distribution over the variables it is
linked to. A Bayesian network, often called a belief network, is a probabilistic
graphical model whose DAG depicts a set of variables and their probabilistic
connections. For example, a Bayesian network may represent the probabilistic
relationships between diseases and symptoms. Given a collection of symptoms, the
network can be used to determine how likely it is that a particular group of
diseases is present.
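The disease and symptom example can be sketched as a minimal network with one disease node and two symptom nodes, queried by enumeration. All probabilities, the symptom names, and the function name `posterior_disease` are hypothetical values chosen only to illustrate the computation.

```python
# Hypothetical numbers chosen only for illustration.
P_DISEASE = 0.05                 # prior P(Disease = True)
P_SYMPTOM = {                    # P(symptom = True | Disease)
    "fever": {True: 0.90, False: 0.10},
    "cough": {True: 0.80, False: 0.20},
}


def posterior_disease(observed):
    """P(Disease = True | observed symptoms), by enumerating both disease states."""
    joint = {}
    for disease in (True, False):
        p = P_DISEASE if disease else 1.0 - P_DISEASE
        for symptom, present in observed.items():
            p_true = P_SYMPTOM[symptom][disease]
            p *= p_true if present else 1.0 - p_true
        joint[disease] = p
    return joint[True] / (joint[True] + joint[False])


p_both = posterior_disease({"fever": True, "cough": True})
p_fever_only = posterior_disease({"fever": True})
```

Symptoms that are not observed are simply left out of the product, which corresponds to summing them out of the joint distribution; observing a second symptom raises the posterior further above the prior.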
Dynamic Bayesian networks (DBNs) are used to model sequential data, which is present
everywhere. For instance, in speech recognition, eye tracking, or financial
forecasting, temporal data describes a system that is dynamically changing or
evolving through time. In text processing or biosequence analysis, sequential data
can also depict system changes, such as state changes. Such data give rise to
problems of classification, segmentation, state estimation, fault diagnosis, and
prediction. DBNs facilitate reasoning in fields where the values of variables change
over time. Data are collected in regular time slices, and the network structure is
reproduced for each slice (i.e., it is assumed that the relationship between
variables in the same time slice is stationary).
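A minimal sketch of state estimation in a DBN is given below, assuming the simplest possible case: one hidden binary state and one binary observation per time slice, with the same transition and observation models reused in every slice. The probability tables and the function name `forward_filter` are hypothetical.

```python
# Hypothetical two-state DBN: hidden state 0 = "normal", 1 = "faulty"; one binary observation.
PRIOR = [0.9, 0.1]           # P(state_0)
TRANSITION = [[0.95, 0.05],  # P(state_t | state_{t-1}), identical for every slice
              [0.10, 0.90]]
EMISSION = [[0.8, 0.2],      # P(obs_t | state_t)
            [0.3, 0.7]]


def forward_filter(observations):
    """Return P(state_t | obs_1..t) for each time slice t."""
    belief = PRIOR[:]
    beliefs = []
    for obs in observations:
        # Predict: push the current belief through the transition model.
        predicted = [
            sum(belief[prev] * TRANSITION[prev][cur] for prev in range(2))
            for cur in range(2)
        ]
        # Update: weight by the observation model and renormalize.
        unnormalized = [predicted[s] * EMISSION[s][obs] for s in range(2)]
        total = sum(unnormalized)
        belief = [u / total for u in unnormalized]
        beliefs.append(belief)
    return beliefs


beliefs = forward_filter([0, 1, 1, 1])
```

With this observation sequence, the belief in the "faulty" state grows slice by slice as the evidence accumulates.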
Matrix representation is the most practical way to represent graphs. A DBN over n
variables is represented by an n × 2n matrix. The first submatrix (1..n, 1..n)
represents the observation model and the second submatrix (1..n, n+1..2n)
represents the transition model, as shown in Figure 15.3.
FIGURE 15.3 An example for a DBN (a) drawn using BNT, its matrix representation (b) and
its equivalent initial PrDBN (c).
The entry DBN(i, j) is a real number between 0 and 1 that represents the probability
that an edge exists between nodes i and j. The initial probability of every candidate
edge obtained from the constraint step is 0.5. Entries corresponding to edges that
the search space restriction phase removes from the candidate set are set to 0. This
matrix is called the probabilistic DBN (PrDBN).
(a) [network diagram not reproduced]
(b) 0  1  1  0  1  0  0  0
    0  0  1  0  0  0  0  0
    0  0  0  1  0  0  0  0
    0  0  0  0  0  0  0  1
(c) 0  0.5  0.5  0    0.5  0  0  0
    0  0    0.5  0    0    0  0  0
    0  0    0    0.5  0    0  0  0
    0  0    0    0    0    0  0  0.5
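The step from matrix (b) to matrix (c) can be sketched in a few lines. The matrix values are those of Figure 15.3; the function names `initial_prdbn` and `split_submatrices` are hypothetical helpers introduced for this example.

```python
# Matrix (b) from Figure 15.3: n = 4 variables, so the DBN matrix is 4 x 8.
DBN = [
    [0, 1, 1, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
]


def initial_prdbn(candidate_edges, initial_probability=0.5):
    """Give every candidate edge the same initial probability; removed edges stay at 0."""
    return [
        [initial_probability if edge else 0.0 for edge in row]
        for row in candidate_edges
    ]


def split_submatrices(matrix):
    """Split an n x 2n matrix into its (1..n, 1..n) and (1..n, n+1..2n) blocks."""
    n = len(matrix)
    first = [row[:n] for row in matrix]
    second = [row[n:] for row in matrix]
    return first, second


PrDBN = initial_prdbn(DBN)
first_block, second_block = split_submatrices(PrDBN)
```

Here `PrDBN` reproduces matrix (c), and the two blocks correspond to the two submatrices described in the text.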
15.2.2 Predictive Processing
We can first formulate Bayesian inference by assuming Gaussian distributions (i.e.,
density functions), as shown later. The prediction error is calculated by comparing
the mean m of the prior distribution with the mean e of the evidence: it is the
difference between these two numbers. As a result, the prediction error
such superficial pyramidal cells at the “implementation level” result in changes in the
prediction and prediction error carrying capacities of deep and superficial cortical
pyramidal cells, respectively. This provides precision weighting. Second,
neuromodulators like dopamine, serotonin, and acetylcholine play a role in at least
some of the pathways that affect postsynaptic strengthening [4].
Some authors hold that hierarchical predictive coding captures information processing
only in the visual cortex [26]. Others have extended it into a framework for
understanding all cortical information processing [10, 13, 25, 38]. In particular, it
has been advanced as a potential explanation of “perception and action, and
everything in between, the mind,” as part of a comprehensive theory of brain
activity [13]. According to this perspective, which is conceptually connected to the
“free energy principle” as stated by Ref. [11], “the brain is an organ for minimizing
prediction mistakes” [39]. In other words, all brain activity is structured to
minimize long-term prediction error [10, 12, 40]. Although this description of
predictive processing is incomplete and leaves out many crucial details, it is
adequate for the purposes of this chapter, which are to gauge its applicability in
explaining the emergence and persistence of delusions.
In the modern brain sciences, the most powerful models are computational in nature. . . .
The relatively novel approach [of computational psychiatry] harnesses these powerful
computational models and applies them to psychiatric and neurological disorders.
A flood of research using the Bayesian brain hypothesis and predictive processing
has emerged in recent years to shed light on psychopathologies such as autism [45],
anxiety disorders [46], and depression [47]. It is crucial to clarify how dysfunctions
in the hierarchical Bayesian inference procedure described in Section 15.2 lead to
the development and maintenance of delusional beliefs. The following points should
be kept in mind:
This explanatory approach has been applied to autism [45], anxiety disorders [46],
and depression [47], which form the starting point of this discussion. We concentrate
first on recent attempts to explain the development and maintenance of delusional
beliefs in terms of dysfunctions in the hierarchical Bayesian inference process.
This strategy is intended to pinpoint structural problems [4, 48]. The two points
are specifically as follows:
Consider the Capgras delusion as an illustration. People with this condition develop
the erroneous belief that a visually indistinguishable (or almost indistinguishable)
impostor has taken the place of someone dear to them (often a spouse or loved one).
According to Ref. [55], the likely cause of this delusion is damage to the area of
the brain that generates the autonomic reactions that agents usually experience when
visually recognizing faces. In these conditions, the agent sees the face but does
not register the typical autonomic signals associated with that face. This suggests
the following explanations:
All two-factor accounts are based on a conceptual distinction (and an empirical discon-
nect) between perception and cognition. The first factor is cognitive abnormalities and
the second factor is deficits in cognitive assessment.
with excessively accurate predictions [59]. Refs. [4, 8] present the core of this
proposal:
Pervasive psychotic symptoms can be explained by the inability to express the accu-
racy of beliefs about the world. The delusional system can become elaborate when
sensory evidence permeates it too precisely. There are primary pathologies here which
show substantial metacognitive properties. In the sense that it is based on beliefs, is
about beliefs, and more importantly, there are no obstacles necessary for the formation
of predictions or prediction errors. [4]
Here, traditional statistical inference serves as a useful parallel [4, 13]. Consider
comparing the mean of some data with the null hypothesis that the mean is zero. The
prediction error is the difference between these two values, and it counts as
evidence against the null hypothesis. However, to calculate how much evidence we
have, we must take into account the precision of the prediction error. If the data
are highly variable (low precision), we should not reject the null hypothesis: most
likely, the prediction error merely reflects noise. On the other hand, the null
hypothesis ought to be rejected if the precision-weighted prediction error is large
enough. This illustration demonstrates how errors in second-order statistics can
result in significant inference errors. These are mistakes that arise not when
comparing predictions with the available data, but when weighting the resulting
prediction error by its precision, or reliability. Such a miscalculation could very
easily determine whether a novel medicine is regarded as safe or dangerous to the
broader populace.
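The parallel can be made concrete with a small sketch in which two samples produce the same prediction error but very different precision-weighted errors. The two samples and the function name `precision_weighted_error` are hypothetical; the weighted error computed here is the familiar one-sample t statistic, used as one simple way of weighting an error by its precision.

```python
from math import sqrt
from statistics import mean, stdev


def precision_weighted_error(sample, null_mean=0.0):
    """Return (prediction error, precision-weighted error) for a sample against a null mean."""
    error = mean(sample) - null_mean                    # raw prediction error
    standard_error = stdev(sample) / sqrt(len(sample))  # low precision -> large standard error
    # Dividing by the standard error scales the error by the square root of the
    # precision of the sample mean; the result is the one-sample t statistic.
    return error, error / standard_error


# Two hypothetical samples with the same mean (1.0) but very different variability.
tight = [0.9, 1.1, 1.0, 0.8, 1.2, 1.0, 0.9, 1.1]
noisy = [-4.0, 6.0, 1.0, -3.0, 5.0, 1.0, -2.0, 4.0]

error_tight, weighted_tight = precision_weighted_error(tight)
error_noisy, weighted_noisy = precision_weighted_error(noisy)
```

The raw errors are identical, yet only the low-variability sample yields a weighted error large enough to count against the null hypothesis. If the variability of the noisy sample were underestimated, the same raw error would wrongly be treated as strong evidence.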
Proponents of predictive processing theories of delusions cite this deficiency as a
major pathology underlying the development of delusional beliefs in conditions like
schizophrenia [4, 7, 48]. It has been proposed, for example, that errors in the
precision-weighting procedure overstate the reliability of sensory evidence relative
to prior, “higher-level” beliefs. Persistent, highly weighted prediction errors
inform the agent that its model of the world is incorrect, which prompts rapid
revision of that model. However, because message passing in the predictive processing
architecture is bidirectional, these revised predictions are transmitted back down
and affect how incoming sensory data are interpreted. Even worse, high-precision
prediction errors require updating both inference and learning (the construction of
models based on those inferences), which ultimately requires significant revisions
to the agent’s model of the world. To illustrate this phenomenon, Ref. [7] mentions a
psychologist who describes his own experience of paranoid schizophrenia.
I had to make sense of all these macabre coincidences. I did so by radically changing
my conception of reality. [60]
They write,
This proposal, like most accounts of delusions currently offered in the literature,
is still very rudimentary. It does, however, have a number of well-known
attractions.
what the agent believes is responsive to what she perceives, but not vice versa.
In contrast, advocates of hierarchical Bayesian models stress the influence of
top-down predictions (i.e. priors) on the agent’s perceptual experiences.
iii. In particular, it is not merely that exceptionally high-precision prediction
errors cause abrupt revisions to the agent’s “higher-level” worldview.
These revisions then act as the priors with which the agent interprets
(predicts) her own experience, a recurrent process known as an “insulated
self-confirming loop” [63]. This results in false perceptions that appear to
confirm the very high-level beliefs that they ought to test.
iv. The third and most significant difference, however, is that hierarchical
Bayesian models of delusion disavow the very notion that perceptual
experiences and beliefs (and thus hallucinations and delusions) should be
treated “as distinct entities” from the perspective of psychiatric explanation
in the first place. This difference does not stem from the use of a single
deficit or the emphasis on bidirectional message passing as such [6, 7, 49].
Ref. [49], for example, writes that positing two factors “is only necessary
in so far as there is a clear distinction between perception and inference: a
distinction which is not actually compatible with what is known about how
the brain deals with the world.”
This third distinction is significant in that it is more fundamental than the first two.
For example, a single deficit, such as a dopaminergic abnormality at the core of
predictive processing, may be the primary cause of the formation and maintenance of
delusional beliefs, and yet this dysfunction may play two roles. Similarly, while
maintaining the distinction between perception and belief, we might also concede
that an agent’s beliefs can affect their perception of the world. Two-way
communication between perception and belief does not negate the fact that they are
distinct systems.
The third distinction goes far beyond pointing out the possibility of a single deficit
and of cognitive penetration. According to the two-factor model, the main
responsibility of the perceptual system is to select the most plausible explanation
of the agent’s sensory information, which is then communicated to the belief-fixation
system. This latter system determines what to believe and how to act by combining
this perceptual information with prior knowledge and reasoning skills [51]. In
contrast, there is only one system in a hierarchical Bayesian model. In other words,
there is just one unified hierarchy of inference, and our intuitive distinction
between perceptions and beliefs tracks only vaguely defined regions of that
hierarchy [9, 13]. This is a fundamental departure from conventional theories of
delusion and from much of the relevant cognitive psychology [50]. Proponents of
hierarchical Bayesian models regard this unified hierarchy as plausible; however, we
contend in the following section that this notion is unwarranted.
15.3.4 Integrative Conclusion
Hierarchical Bayesian models of delusion are put forth as a strong challenge to the
popular two-factor paradigm. Such models specifically reject the separation of
perception and cognition in favor of a unified inferential hierarchy in which
bidirectional message passing approximates Bayesian inference. Delusional beliefs,
it is then asserted, result from dysfunctions in this information-processing
architecture. This approach is based on current developments in computational
neuroscience, and it is supported by a strong body of theoretical and empirical
arguments. The end result is an impressive effort to shed light on the enigmatic
process by which people lose their grip on reality, an effort that justifies its
reputation as one of the most promising and fascinating products of the new
discipline of computational psychiatry [44].
However, even those who support these models admit that hierarchical Bayesian
models of delusion have limitations [48]. For instance, how can a single dysfunction
explain the difference between those who have the Capgras delusion and those who
are not delusional yet share the same experience [64]? Advocates of hierarchical
Bayesian models of delusion have yet to respond to this challenge [47], which applies
to all monothematic delusions. This suggests that the application of such models may
at best be limited to polythematic delusions of the type that are present in conditions
like schizophrenia [65, 66].
There are still more unanswered questions.
a) How, in the first place, can such theories explain the typically social content
of delusional beliefs [48], or the temporal sequence by which one dysfunction
transforms into the other [67]?
b) Does the underlying dysfunction consist of abnormally precise sensory evidence,
abnormally precise priors, some combination of the two, or neither of these?
Hierarchical Bayesian models of delusion have the following two main summary
features:
a hierarchy of reasoning must be able to describe the kinds of phenomena that take
place in delusional thinking for this extreme departure from conventional theory to
be acceptable. We must therefore understand precisely how the inference hierarchy
is construed in hierarchical Bayesian models, and where delusions are meant to be
located within it, in order to determine whether this criterion is met.
First, it is frequently suggested in the literature that representations at “higher”
levels of the inference hierarchy correspond to what we intuitively think of as beliefs.
On this view, the brain contains a hierarchy of such inference mechanisms, with the
lower levels being more relevant to perception and the higher levels being more
relevant to beliefs.
model, it is generally agreed that the low-level, largely unconscious systems involved
in cognition and motor control provide the strongest support for the Bayesian model.
We have observed them to occur and exist at higher and even the greatest levels of
the “hierarchy” of thinking, even among advocates of hierarchical Bayesian models.
Therefore, showing Bayesian optimality in the area of sensorimotor processing does
not bear directly on whether the mechanisms underpinning belief fixation are Bayesian.
It should be noted that:
of ML methods, notably ANNs, decision tree C4.5, random forests, and SVMs, is
addressed as an alternative to the current solutions.
Cybercrime is steadily on the rise, and there is growing concern about the security
and access control of stored data. Host-based intrusion detection systems (HIDSs)
and network-based intrusion detection systems (NIDSs) are the two main forms of
IDSs. A BN is a DAG that serves as the visual representation of a set of variables
and their probabilistic dependencies. A BN G is a probabilistic graphical model that
uses conditional dependencies to describe a joint probability distribution over a
set of variables X = {X1, X2, . . . , Xn}. It should be emphasized that the Bayesian
network classifier can be trained using training data, with structure learning and
conditional probability distribution estimation used in the learning process.
Evaluations with a portion of the KDDCUP’99 dataset, which was employed in this
study, revealed that the misuse detection module produced a high detection rate with
a low false-positive rate, while the anomaly detection component was able to
discover novel intrusions. From the data, the annealed maximum a posteriori
probability (MAP) of the BN was developed, and the computations were displayed on the
BN’s descriptive statistics and structural equation modeling.
Graphical models come in a variety of forms, each with unique characteristics,
structures, and advantages. These forms can be arranged into three groups:
undirected graphical models, factor graphical models, and directed graphical models.
Each node of a BN is associated with a random variable and carries a label that
represents an attribute of the problem. These attributes are binary and can take the
values TRUE or FALSE. The two main types of hybrid IDSs used today are sequence-based
and parallel-based. Sequence-based hybrid IDSs apply either anomaly detection or
misuse detection first and then the other, whereas parallel-based hybrid IDSs apply
multiple detectors simultaneously and base their final decision on the outputs of
all of them. Distributed Denial of Service (DDoS) attacks are a serious threat to
Internet security because they aim to render services unreachable by saturating the
server’s network and end-user systems with artificially generated traffic.
Intrusion attacks can now be detected effectively and efficiently using BN
classifiers with strong reasoning capabilities. An ideal IDS must be able to
function continuously without human oversight. The intrusion detection process
usually comprises the phases of data collection, data pre-processing, intrusion
recognition, reporting, and response. To combat highly sophisticated cyber-attacks,
effective and efficient IDSs are required to detect and block intrusions quickly. A
novel distributed IDS was created using a BN classification modeling technique to
detect and stop attacks such as denial of service, probes, user to root, and remote
to user attacks. Methods for anomaly-based intrusion detection build models of
normal activity and flag audited data by calculating the difference between the
observed behavior and the built models. Sequential data is present everywhere. For
example, temporal data models a system that is dynamically changing or evolving over
time in speech recognition, visual tracking, or financial forecasting. Sequential
data also represents changes in the system, such as changes in state, in biosequence
analysis or in text processing. One way in which BNs have been used in anomaly
detection is naive Bayes, a two-layer Bayesian network that assumes complete
independence between the feature nodes given the class node. Each example in the
KDD99 dataset records the attribute values of a connection in the network data flow,
and each example is assigned either the label “attack” or “normal.”
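A naive Bayes classifier of the kind described above can be sketched in a few lines. The six toy records below only imitate the shape of connection records (a protocol, a connection flag, and a failed-login bucket); they are hypothetical and are not taken from the KDD99 dataset, and the function names and the use of Laplace smoothing are choices made for this example.

```python
from collections import Counter, defaultdict
from math import log


def train_naive_bayes(records, labels):
    """Count classes and, per class, the values seen for each feature position."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> Counter of values
    vocab = defaultdict(set)             # feature index -> set of values seen
    for features, label in zip(records, labels):
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict(model, features):
    """Return the label with the highest log posterior under the independence assumption."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = log(count / total)  # log prior of the class
        for i, value in enumerate(features):
            # Laplace smoothing keeps unseen values from zeroing out the product.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(vocab[i])
            score += log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy records: (protocol, connection flag, failed-login bucket).
records = [
    ("tcp", "SF", "none"), ("tcp", "SF", "none"), ("udp", "SF", "none"),
    ("tcp", "S0", "none"), ("tcp", "S0", "many"), ("icmp", "S0", "many"),
]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
model = train_naive_bayes(records, labels)
```

Calling `predict(model, ("tcp", "SF", "none"))` scores the record against both classes and returns the more probable label. A real IDS would train on far more records and features, and would need the low false-positive and false-negative rates discussed in the text.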
An issue of significant interest in the training of IDSs is how to choose important
and efficient features from a vast array of potentially related features. The system
is required to have very low false-negative and false-positive rates. Although BNs
are particularly successful for studying microarray data, they only work when there
are no cyclic relationships. A DBN is the extension of BN analysis to time and is
used to model sequential data. Because DBNs describe their domains as partially
observable Markov processes, nodes can connect both to other nodes inside the same
time slice and to nodes in the next slice. The directed graph’s nodes correspond to
the variables that make up the problem, while its edges show the conditional
relationships between those variables.
15.6 CONCLUSION
In the developing field of computational psychiatry, hierarchical Bayesian models
of delusions have recently become very popular and are frequently hailed as one of
the most significant success stories. The two basic theoretical elements of such
models, the hierarchical and the Bayesian, are thoroughly documented in the
literature. Given that there is more to thought than a single information
processing inference hierarchy and that the mechanisms underlying belief fixation
are not Bayesian, we face significant challenges that have not been addressed. These
models try to explain delusional beliefs in terms of the dysfunction of the mechanisms
underlying hierarchical Bayesian inference. If this is the case, then both the
hierarchical Bayesian model of delusion and the more general hierarchical Bayesian
model of the brain, which aims to explain “perception and action and everything in
between the mind,” are in trouble. Instead of focusing directly on a hierarchical Bayesian
model of neural information processing in general, the research study concentrated
on explaining delusional beliefs. Instead of functionally differentiating between
bi-hallucinations and delusions, the idea of using Bayesian reasoning to explain
delusional beliefs conveys a directional message. Proponents of predictive processing
try to explain everything in terms of minimizing prediction errors, and these models
are a collective effort to apply that idea to one particular high-level cognitive
phenomenon. Hierarchical Bayesian delusion models thus offer a crucial test case for
predictive processing. Attempts by predictive processing to explain delusional ideas
encounter significant issues with complexity and clarity, as well as a dearth of
supporting data and convincing arguments for central hypotheses. Computational
psychiatry must give up universal theories of brain function and cognitive optimality
models in favor of other fields, particularly cognitive science and science, if it is to
fulfill its promise of illuminating the information processing mechanisms behind
mental disorders.
NOTES
1 Technically, a widespread assumption is that such dysfunctions must be harmful [47].
2 Ref. [6] argues that positing two factors is “only necessary insofar as there is a clear
distinction between perception and inference: a distinction which is not actually compat-
ible with what is known about how the brain deals with the world.” This objection to the
two-factor framework is confused, however. First, the argument for positing two factors
is that it accounts for certain dissociations—cases in which individuals share the same
anomalous experience but do not form the delusions—and has nothing to do with whether
perception is inferential [52], [64]. Second, the distinction advocated in the two-factor
framework is between perception and cognition, not perception and inference. One can
think that perception is inferential in a computational sense without abandoning this dis-
tinction. In fact, that is the mainstream view in classical cognitive psychology [57], [58].
Take note that hierarchical Bayesian models can embrace two factors.
REFERENCES
[1] Kabanda, G. (2020, May). Performance of machine learning and other artificial
intelligence paradigms in cybersecurity. Oriental Journal of Computer Science and
Technology, 13(1), 1–21, ISSN: 0974–6471, Online ISSN: 2320–848,
http://www.computerscijournal.org/vol13no1/performance-of-machine-learning-and-other-artificial-intelligence-paradigms-in-cybersecurity/.
[2] Berman, D.S., Buczak, A.L., Chavis, J.S., and Corbett, C.L. (2019). Survey of deep learn-
ing methods for cyber security. Information 2019, 10, 122. doi:10.3390/info10040122.
[3] Bringas, P.B., and Santos, I. (2010). Bayesian Networks for Network Intrusion
Detection, Bayesian Network, Ahmed Rebai (Ed.), ISBN: 978–953–307–124–4, InTech,
http://www.intechopen.com/books/bayesian-network/bayesiannetworks-for-network-intrusion-detection.
[4] Adams, R., Stephan, K., Brown, H., Frith, C., and Friston, K. (2013). The
computational anatomy of psychosis. Frontiers in Psychiatry, 4.
http://dx.doi.org/10.3389/fpsyt.2013.00047.
[5] Corlett, P., Taylor, J., Wang, X., Fletcher, P., & Krystal, J. (2010). Toward a neurobiology
of delusions. Progress in Neurobiology, 92 (3), 345–369. http://dx.doi.org/10.1016/j.
pneurobio.2010.06.007.
[6] Fletcher, P., and Frith, C. (2009). Perceiving is believing: A Bayesian approach to
explaining the positive symptoms of schizophrenia. Nature Reviews Neuroscience,
10(1), 48–58. http://dx.doi.org/10.1038/nrn2536.
[7] Frith, C.D., and Friston, K.J. (2013) False perceptions and false beliefs: Understand-
ing schizophrenia. Neurosciences and the Human Person: New Perspectives on Human
Activities, 121, 1–15. [RMR].
[8] Schmack, K., Gomez-Carrillo de Castro, A., Rothkirch, M., Sekutowicz, M., Rossler, H.,
Haynes, J., et al. (2013). Delusions and the role of beliefs in perceptual inference. Journal of
Neuroscience, 33(34), 13701–13712. http://dx.doi.org/10.1523/jneurosci.1778–13.2013.
[9] Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of
cognitive science. Behavioral And Brain Sciences, 36(3), 181–204.
http://dx.doi.org/10.1017/s0140525x12000477.
[10] Clark, A. (2016). Surfing Uncertainty. Oxford: Oxford University Press.
[11] Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews
Neuroscience, 11 (2), 127–138. http://dx.doi.org/10.1038/nrn2787.
[12] Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., and Pezzulo, G. (2017a).
Active Inference: A process theory. Neural Computation, 29(1), 1–49. http://dx.doi.
org/10.1162/neco_a_00912.
[13] Hohwy, J. (2013). The Predictive Mind. Oxford: Oxford University Press.
[14] Griffin, J., and Fletcher, P. (2017). Predictive processing, source monitoring, and psycho-
sis. Annual Review of Clinical Psychology, 13 (1), 265–289. http://dx.doi.org/10.1146/
annurev-clinpsy-032816-045145.
[15] Margaritis, D. (2003, May). Learning Bayesian Network Model Structure from Data.
PhD Thesis, Carnegie Mellon University, Pittsburgh, PA.
[16] Boudali, H., and Dugan, J.B. (2006, March). A continuous-time Bayesian network
reliability modeling, and analysis framework. IEEE Transactions on Reliability, 55(1).
[17] Xiao, L. (2016). Intrusion Detection Using Probabilistic Graphical Models. PhD Dis-
sertation, Iowa State University.
[18] Chater, N., Oaksford, M., Hahn, U., and Heit, E. (2010). Bayesian models of cogni-
tion. Wiley Interdisciplinary Reviews: Cognitive Science, 1 (6), 811–823. http://dx.doi.
org/10.1002/wcs.79.
[19] Tenenbaum, J., Kemp, C., Griffiths, T., and Goodman, N. (2011). How to grow a mind:
Statistics, structure, and abstraction. Science, 331(6022), 1279–1285.
http://dx.doi.org/10.1126/science.1192788.
[20] Oaksford, M., and Chater, N. (2007). Bayesian Rationality. Oxford: Oxford University
Press.
[21] Hahn, U. (2014). The Bayesian boom: Good thing or bad? Frontiers In Psychology, 5.
http://dx.doi.org/10.3389/fpsyg.2014.00765.
[22] Knill, D., and Pouget, A. (2004). The Bayesian brain: The role of uncertainty in
neural coding and computation. Trends In Neurosciences, 27(12), 712–719.
http://dx.doi.org/10.1016/j.tins.2004.10.007.
[23] Penny, W. (2012). Bayesian models of brain and behaviour. ISRN Biomathematics,
2012, 1–19. http://dx.doi.org/10.5402/2012/785791.
[24] Marr, D. (1980). Vision. New York: Freeman.
[25] Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the
Royal Society B: Biological Sciences, 360(1456), 815–836.
http://dx.doi.org/10.1098/rstb.2005.1622.
[26] Rao, R. P. and Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional
interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1),
79–87. 10.1038/4580.
[27] Seth, A. K. (2015). The cybernetic Bayesian brain—From interoceptive inference to
sensorimotor contingencies. In T. Metzinger and J. M. Windt (Eds.), Open MIND: 35(T).
Frankfurt am Main: MIND Group. doi:10.15502/9783958570108.
[28] Jabbari, F., Visweswaran, S., and Cooper, G.F. (2018). Instance-specific Bayesian net-
work structure learning. Proceedings of Machine Learning Research, 72, 169–180,
PGM 2018.
[29] Tran, T.M., Ko, D.W., Ryul, C., and Dinh, H. (2019). A Bayesian network analysis of
reforestation decisions by rural mountain communities in Vietnam. Forest Science and
Technology. doi:10.1080/21580103.2019.1581665.
[30] Hohwy, J. (2017). Priors in perception: Top-down modulation, Bayesian perceptual
learning rate, and prediction error minimization. Consciousness and Cognition, 47,
75–85. http://dx.doi.org/10.1016/j.concog.2016.09.004.
[31] Friston, K., Rosch, R., Parr, T., Price, C., and Bowman, H. (2017). Deep temporal mod-
els and active inference. Neuroscience and Biobehavioral Reviews, 77, 388–402.
[32] Friston, K. (2008). Hierarchical models in the brain. PLoS Computational Biology,
4(11), e1000211.
[33] Williams, D. (2017). Predictive processing and the representation wars. Minds and
Machines. doi:10.1007/s11023-017-9441-6.
[34] Williams, D., and Colling, L. (2017). From symbols to icons: The return of
resemblance in the cognitive neuroscience revolution. Synthese.
https://doi.org/10.1007/s11229-017-1578-6.
[35] Lee, T. S., and Mumford, D. (2003). Hierarchical Bayesian inference in the visual cor-
tex. Journal of the Optical Society of America, A, 20, 1434–1448.
[36] Mathys, C., Daunizeau, J., Iglesias, S., Diaconescu, A., Weber, L., Friston, K., and
Stephan, K. (2012). Computational modeling of perceptual inference: A hierarchical
Bayesian approach that allows for individual and contextual differences in weighting
of input. International Journal of Psychophysiology, 85 (3), 317–318. http://dx.doi.
org/10.1016/j.ijpsycho.2012.06.077.
[37] Mathys, C., Lomakina, E., Daunizeau, J., Iglesias, S., Brodersen, K., Friston, K., and
Stephan, K. (2014). Uncertainty in perception and the Hierarchical Gaussian Filter.
Frontiers in Human Neuroscience, 8. http://dx.doi.org/10.3389/fnhum.2014.00825.
[38] Bastos, A., Usrey, W., Adams, R., Mangun, G., Fries, P., and Friston, K. (2012).
Canonical microcircuits for predictive coding. Neuron, 76 (4), 695–711. http://dx.doi.
org/10.1016/j.neuron.2012.10.038.
[39] Hohwy, J. (2016). Attention and conscious perception in the hypothesis testing brain.
Frontiers in Psychology, 3. doi:10.3389/fpsyg.2012.00096.
[40] Friston, K. J., and Frith, C. (2015). A duet for one. Consciousness and Cognition, 36,
390–405.
[41] Murphy, D. (2006). Psychiatry in the Scientific Image. Cambridge: MIT Press.
[42] Montague, P., Dolan, R., Friston, K., and Dayan, P. (2012). Computational
psychiatry. Trends In Cognitive Sciences, 16(1), 72–80.
http://dx.doi.org/10.1016/j.tics.2011.11.018.
[43] Teufel, C., and Fletcher, P. (2016). The promises and pitfalls of applying computational
models to neurological and psychiatric disorders. Brain, 139(10), 2600–2608.
http://dx.doi.org/10.1093/brain/aww209.
[44] Friston, K., Stephan, K., Montague, R., and Dolan, R. (2014). Computational psychia-
try: The brain as a phantastic organ. The Lancet Psychiatry, 1 (2), 148–158. http://dx.doi.
org/10.1016/s2215-0366(14)70275-5.
[45] Lawson, R., Rees, G., and Friston, K. (2014). An aberrant precision account of autism.
Frontiers In Human Neuroscience, 8. http://dx.doi.org/10.3389/fnhum.2014.00302.
[46] Seth, A., Suzuki, K., and Critchley, H. (2012). An interoceptive predictive coding
model of conscious presence. Frontiers In Psychology, 2.
http://dx.doi.org/10.3389/fpsyg.2011.00395.
[47] Chekroud, A. (2015). Unifying treatments for depression: An application of the free energy
principle. Frontiers In Psychology, 6. http://dx.doi.org/10.3389/fpsyg.2015.00153.
[48] Corlett, P., Honey, G., and Fletcher, P. (2016). Prediction error, ketamine and psychosis:
An updated model. Journal Of Psychopharmacology, 30(11), 1145–1155.
http://dx.doi.org/10.1177/0269881116650087.
[49] Corlett, P., and Fletcher, P. (2015). Delusions and prediction error: Clarifying the roles
of behavioural and brain responses. Cognitive Neuropsychiatry, 20 (2), 95–105. http://
dx.doi.org/10.1080/13546805.2014.990625.
[50] Firestone, C., and Scholl, B. (2015). Cognition does not affect perception: Evaluating
the evidence for “top-down” effects. Behavioral And Brain Sciences, 39. http://dx.doi.
org/10.1017/s0140525x15000965.
[51] Fodor, J. (1983). The Modularity of Mind. Cambridge: The MIT Press.
[52] Maher, B. (1974). Delusional thinking and perceptual disorder. Journal of Individual
Psychology, 30, 98–113.
[53] Hemsley, D., and Garety, P. (1986). The formation of maintenance of delusions:
A Bayesian analysis. The British Journal of Psychiatry, 149(1), 51–56.
http://dx.doi.org/10.1192/bjp.149.1.51.
[54] Bortolotti, L. (2016). Delusion. In The Stanford Encyclopedia of Philosophy (Spring
2016 Edition), Edward N. Zalta (Ed.).
https://plato.stanford.edu/archives/spr2016/entries/delusion/.
[55] Ellis, H., and Young, A. (1990). Accounting for delusional misidentifications. The Brit-
ish Journal of Psychiatry, 157 (2), 239–248. http://dx.doi.org/10.1192/bjp.157.2.239.
[56] Coltheart, M. (2007). The 33rd Sir Frederick Bartlett lecture: Cognitive
neuropsychiatry and delusional belief. The Quarterly Journal of Experimental
Psychology, 60(8), 1041–1062. [RMR].
[57] Coltheart, M. (2013). On the distinction between monothematic and polythematic
delusions. Mind & Language, 28 (1), 103–112. http://dx.doi.org/10.1111/mila.12011.
[58] Ross, R., McKay, R., Coltheart, M., and Langdon, R. (2016). Perception, cogni-
tion, and delusion. Behavioral And Brain Sciences, 39. http://dx.doi.org/10.1017/
s0140525x15002691.
[59] Friston, K. J., and Frith, C. D. (2005). Active inference, communication and
hermeneutics. Cortex: A Journal Devoted to the Study of the Nervous System and
Behavior, 68, 129–143.
[60] Chadwick, P. (1993). The stepladder to the impossible: A first hand phenomenological
account of a schizoaffective psychotic crisis. Journal of Mental Health, 2 (3), 239–250.
http://dx.doi.org/10.3109/09638239309003769.
[61] Brown, H., Adams, R., Parees, I., Edwards, M., and Friston, K. (2013). Active inference,
sensory attenuation and illusions. Cognitive Processing, 14 (4), 411–427. http://dx.doi.
org/10.1007/s10339-013-0571-3.
[62] Teufel, C., Kingdon, A., Ingram, J., Wolpert, D., and Fletcher, P. (2010). Deficits in
sensory prediction are related to delusional ideation in healthy individuals.
Neuropsychologia, 48(14), 4169–4172. http://dx.doi.org/10.1016/j.neuropsychologia.2010.10.024.
[63] Deneve, S., and Jardri, R. (2016). Circular inference: Mistaken belief, misplaced
trust. Current Opinion in Behavioral Sciences, 11, 40–48.
http://dx.doi.org/10.1016/j.cobeha.2016.04.001.
[64] Bortolotti, L., and Miyazono, K. (2015). Recent work on the nature and development of
delusions. Philosophy Compass, 10(9), 636–645.
[65] Gadsby, S. (2019). Body Representations and cognitive ontology: Drawing the bounda-
ries of the body image. Consciousness and Cognition, 74, 102772.
[66] Gadsby, S., and Hohwy, J. (2021). Why use predictive processing to explain psychopa-
thology? The case of anorexia nervosa. In S. Gouveia, R. Mendonça, and M. Curado
(Eds.), The Philosophy and Science of Predictive Processing, London: Bloomsbury.
[67] Notredame, C., Pins, D., Deneve, S., and Jardri, R. (2014). What visual illusions teach us
about schizophrenia. Frontiers In Integrative Neuroscience, 8.
http://dx.doi.org/10.3389/fnint.2014.00063.
[68] Ernst, M., and Banks, M. (2002). Humans integrate visual and haptic information
in a statistically optimal fashion. Nature, 415(6870), 429–433.
http://dx.doi.org/10.1038/415429a.
[69] Sanborn, A., and Chater, N. (2016). Bayesian brains without probabilities. Trends In
Cognitive Sciences, 20(12), 883–893. http://dx.doi.org/10.1016/j.tics.2016.10.003.
16 Knowledge Representation in AI
Ashwin Kumaar K, Krishna Sai Talupula,
Gogineni Venkata Ashwith, and Mitta
Chaitanya Kumar Reddy
16.1 INTRODUCTION
Artificial intelligence (AI) systems are designed to process and analyze data,
make decisions, and perform tasks that would normally require human-level
intelligence. In order to do this effectively, AI systems must be able to represent
and manipulate the knowledge and information that they use to make decisions
and solve problems. This process of representing knowledge is known as knowl-
edge representation.
There are many ways in which knowledge can be represented in AI systems, and
the choice of representation can have a significant impact on the system’s perfor-
mance and effectiveness. Some of the most common methods of knowledge repre-
sentation in AI include:
1. Rule-based systems
2. Ontologies and semantic networks
3. Decision trees
4. Neural networks
5. Case-based reasoning
In this chapter, we will examine each of these methods in more detail and discuss
their strengths and limitations [1].
16.1.1 Rule-Based Systems
One of the simplest and most widely used methods of knowledge representation in
AI is the rule-based system. In a rule-based system, knowledge is represented as a set
of rules that specify the conditions under which a particular action should be taken.
For example, consider a simple rule-based system for diagnosing medical condi-
tions. The system might contain rules such as:
• If the patient develops a fever and a cough, then they may have the flu.
• If the patient develops a rash and joint pain, then they probably have a form
of arthritis.
To use the system, the user would input the symptoms that the patient is expe-
riencing, and the system would apply the rules to determine the most likely
diagnosis.
Rule-based systems are relatively easy to design and implement, and they can
be very effective for solving simple problems with well-defined rules. However,
they can be difficult to scale up to more complex problems, and they can be inflex-
ible, as it can be difficult to add or modify rules once the system has been imple-
mented [1].
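The diagnosis example above can be sketched in a few lines of Python. This is a minimal illustration rather than a real expert system: the `RULES` list and the `diagnose` function are hypothetical names introduced here, and each rule simply fires when all of its conditions appear among the reported symptoms.

```python
# Each rule pairs a set of required symptoms with a tentative conclusion.
# The two rules mirror the bullet points above; names are illustrative only.
RULES = [
    ({"fever", "cough"}, "may have the flu"),
    ({"rash", "joint pain"}, "probably has a form of arthritis"),
]


def diagnose(symptoms):
    """Return the conclusion of every rule whose conditions are all present."""
    observed = set(symptoms)
    return [conclusion for conditions, conclusion in RULES if conditions <= observed]


print(diagnose(["fever", "cough", "headache"]))  # the flu rule fires
print(diagnose(["rash"]))                        # no rule fires
```

Adding a rule means appending one more pair to `RULES`, which is easy for a handful of rules but becomes harder to manage as rules start to interact, in line with the scaling limitation noted above.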
16.1.4 Neural Networks
The structure and operation of the human brain served as the inspiration for the
machine learning algorithm known as neural networks. They consist of interconnected
layers of “neurons,” which process and transmit information through the network.
16.2 ADVANTAGES
There are several key advantages to using neural networks for knowledge represen-
tation in AI:
1. Neural networks are highly flexible and can learn to represent a wide range
of knowledge, from simple patterns to complex relationships.
2. They can learn to represent knowledge automatically, without the need for
explicit programming or manual feature engineering.
3. They can learn from large and complex datasets, making them well-suited to
tasks that involve extracting knowledge from unstructured or noisy data.
4. They can handle incomplete or uncertain data, making them robust to
missing or ambiguous information.
However, there are also some limitations to using neural networks for knowledge
representation:
One of the key advantages of CBR is that it allows the system to learn and adapt
over time by adding new cases to its memory as they are encountered. This makes
CBR well-suited to tasks that involve complex or dynamic environments, where the
knowledge base may need to be updated and refined continuously.
CBR systems are also generally easy to design and implement, as they do
not require explicit programming of rules or decision-making logic. This makes
them a popular choice for tasks that involve open-ended or unstructured prob-
lems, where it may be difficult to define a set of rules or a clear decision-making
process.
However, there are also some limitations to using CBR for knowledge represen-
tation. One limitation is that CBR systems may be less efficient than other methods,
as they may need to search through a large number of cases to find the most similar
one. In addition, CBR systems may struggle with tasks that require more abstract or
theoretical reasoning, as they are typically based on concrete examples rather than
general principles.
Overall, CBR is a useful method of knowledge representation in AI and has been
applied to a wide range of tasks, including diagnosis, planning, and problem-solving.
It is particularly well-suited to tasks that involve adapting to changing environments
or adapting to new problems based on past experience [1].
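The retrieve-and-retain cycle described in this section can be sketched as follows. This is a minimal sketch under stated assumptions: cases are stored as sets of features, similarity is measured with the Jaccard index (one possible choice among many), and the names `case_memory`, `retrieve`, `solve`, and `retain` are hypothetical.

```python
# Hypothetical case memory: (feature set, known solution) pairs.
case_memory = [
    ({"fever", "cough"}, "flu"),
    ({"rash", "joint pain"}, "arthritis"),
]


def similarity(a, b):
    """Jaccard similarity between two feature sets (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(problem):
    """Linear search over all stored cases for the most similar one."""
    return max(case_memory, key=lambda case: similarity(problem, case[0]))


def solve(problem):
    """Reuse the solution of the most similar past case."""
    return retrieve(set(problem))[1]


def retain(problem, solution):
    """Learn by adding a newly solved case to memory."""
    case_memory.append((set(problem), solution))


print(solve({"fever", "cough", "fatigue"}))
retain({"sneezing", "itchy eyes"}, "allergy")
print(solve({"sneezing"}))
```

The linear scan in `retrieve` is the efficiency cost mentioned above: every stored case is compared with the new problem, so lookup time grows with the size of the memory.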
16.6 ONTOLOGIES
An ontology is a formal representation of the concepts and relationships in a
particular domain of knowledge. It is used to define the terms and concepts that
describe the domain and to specify the relationships between them [2].
Ontologies are usually written in a formal language and can be represented
as a graph structure, with the concepts as nodes and the relationships as edges.
They can be used to represent both the structure of the domain (e.g., the hierar-
chical relationships between concepts) and the attributes and properties of the
concepts [2].
Ontologies are useful for knowledge representation in AI because they allow for the
representation of complex domain-specific knowledge in a structured and formalized
way. They are often used in natural language processing and information retrieval systems
to help understand the meaning of words and phrases and to disambiguate them based
on their context. They are also used in the development of intelligent agents and other AI
systems that need to reason about and interact with a specific domain of knowledge [2].
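The graph view of an ontology, with concepts as nodes and relationships as edges, can be illustrated with a small sketch. The concepts, the relation names `is_a` and `has_property`, and the helper functions below are all hypothetical; real ontologies are normally written in a dedicated formal language rather than as Python lists.

```python
# Edges are (subject, relation, object) triples; all names are made up.
EDGES = [
    ("Dog", "is_a", "Mammal"),
    ("Mammal", "is_a", "Animal"),
    ("Dog", "has_property", "barks"),
]


def parents(concept):
    """Direct superclasses of a concept (objects of its is_a edges)."""
    return [obj for subj, rel, obj in EDGES if subj == concept and rel == "is_a"]


def ancestors(concept):
    """All concepts reachable from `concept` by following is_a edges."""
    found = []
    stack = parents(concept)
    while stack:
        node = stack.pop()
        if node not in found:
            found.append(node)
            stack.extend(parents(node))
    return found


def is_a(concept, category):
    """True when the hierarchy places `concept` under `category`."""
    return category in ancestors(concept)


print(ancestors("Dog"))
print(is_a("Dog", "Animal"), is_a("Animal", "Dog"))
```

Walking the `is_a` edges is a simple form of the reasoning over hierarchical relationships mentioned above: the fact that a dog is an animal is never stored directly but follows from two edges.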
16.7 FRAMES
A frame is a structure that represents knowledge about a particular concept and the
relationships between different aspects of that concept. It consists of a set of slots,
which represent the different characteristics or attributes of the concept, and a set of
values, which fill those slots and provide information about the concept [2].
For example, consider a frame for the concept of a “car.” This frame might have
slots for the make and model of the car, the year it was manufactured, its color, and
its engine size. Each car would be represented by a separate frame, with the values
for the slots providing specific information about that car [2].
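The car frame can be sketched as a Python data class, where each field plays the role of a slot and each instance supplies the values. The class name `CarFrame`, the field names, the unit chosen for engine size, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass
class CarFrame:
    """A frame for the concept "car": one field per slot."""
    make: str
    model: str
    year: int
    color: str
    engine_size_l: float  # engine size in liters (unit is an assumption)


# One frame instance per individual car; the values fill the slots.
my_car = CarFrame(make="Toyota", model="Corolla", year=2018,
                  color="blue", engine_size_l=1.8)

print(asdict(my_car))  # slot names mapped to their values
```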
Frames are useful for knowledge representation in AI because they allow for the
representation of complex concepts and the relationships between their different
aspects in a structured and organized way. They are often used in expert systems and
other AI applications that need to reason about and manipulate complex concepts [2].
16.9 CONCEPT-BASED
A concept map is a graphical representation of the relationships between concepts. It
consists of a set of nodes, which represent the concepts, and edges, which represent
the relationships between the concepts [3].
Concept maps are similar to semantic networks, but they are typically more
focused on showing the relationships between concepts, rather than the attributes
of the concepts themselves. They are often used to visualize and organize complex
knowledge structures and to help identify the key concepts and relationships within
a domain of knowledge [3].
In AI, concept maps can be used to represent and reason about knowledge in
a variety of applications, such as natural language processing and information
retrieval. They can also be used as a tool for organizing and structuring knowledge
in the development of intelligent agents and other AI systems [3].
16.13 CONCLUSION
Knowledge representation is a central aspect of AI, as it determines how knowledge
is represented, stored, and used by AI systems. There are several techniques and
approaches used in knowledge representation, including symbolic representations,
REFERENCES
[1] “Knowledge Representation in Artificial Intelligence - Javatpoint,” www.javatpoint.com,
2022. https://www.javatpoint.com/knowledge-representation-in-ai (accessed Jan.
05, 2023).
[2] Wikipedia Contributors, “Knowledge Representation and Reasoning,” Wikipedia,
Nov. 14, 2022. https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning
(accessed Jan. 05, 2023).
[3] “What is Knowledge Representation In AI? Usage, Types & Methods | upGrad
Blog,” upGrad blog, Sep. 17, 2020.
https://www.upgrad.com/blog/what-is-knowledge-representation-in-ai/ (accessed Jan. 05, 2023).
[4] Kabanda, G., & Kannan, H. (2023). A systematic literature review of reinforcement
algorithms in machine learning. In Handbook of Research on AI and Knowledge Engineering
for Real-Time Business Intelligence, IGI Global, USA, pp. 17–33.
[5] softlogicsys, “Knowledge Representation in AI,” Software Training Institute in
Chennai with 10–% Placements - Softlogic, Jul. 04, 2022.
https://www.softlogicsys.in/knowledge-representation-in-ai/ (accessed Jan. 05, 2023).
17 ANN Model for Analytics
Gaddam Venkat Shobika, S. Pavan Siddharth,
Student, Vishwa KD, Hemachandran K, and
Manjeet Rege
17.1 INTRODUCTION
An artificial neural network (ANN) is a machine learning algorithm used to model
trends in data. ANNs, like the human brain, are made up of a network of
interconnected nodes, or neurons, that can recognize patterns in input data.
An ANN is an artificial intelligence technique designed to mimic the neural
networks of the human brain so that computers can make decisions in a way
comparable to humans. ANNs are created by programming computers to behave like
interconnected brain cells. The human brain consists of roughly 86 billion
neurons, each with somewhere between 1,000 and 100,000 connections to other
neurons. Information is stored in the brain in a distributed way, enabling
us to extract more than one piece of information when needed. A neural network
is composed of processing nodes that are connected by edges. Each neuron
receives input from other neurons and generates output, which is then passed on to
other neurons. Synapses are the edges that connect neurons, and synaptic weight
is the strength of the connection between two neurons. A neural network’s output
is determined by the weights of the connections between neurons and the inputs
to neurons.
The weights are adjusted in such a way that the network’s output is a close
approximation of the desired output. This is referred to as network training. Once
trained, the network can be used to make predictions about new data. This is known
as inference. A neural network that has been trained to recognize photographs of
cats, for example, can be used to identify a cat in a new image. A simple
illustration is a digital logic gate, which receives inputs and produces an output.
In the case of an “OR” gate with two inputs, the output is “On” if one or both of
the inputs are “On.” When both of the inputs are set to “Off,” the output is “Off.”
In this instance, the output depends only on the inputs. Because our brain’s
neurons continuously learn, its outputs-to-inputs relationship is constantly
changing.
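The statement that a network’s output depends on its weights and inputs can be made concrete with a single artificial neuron that behaves like the OR gate. This is a minimal sketch: the weights and bias below are picked by hand rather than learned, and the step activation and the names `neuron`, `OR_WEIGHTS`, and `OR_BIAS` are assumptions chosen for illustration.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs followed by a step activation (1 = On, 0 = Off)."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


# Hand-picked (not learned) parameters that make the neuron act as an OR gate.
OR_WEIGHTS = [1.0, 1.0]
OR_BIAS = -0.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", neuron([a, b], OR_WEIGHTS, OR_BIAS))
```

Changing the bias to -1.5 would turn the same neuron into an AND gate, which shows how adjusting weights changes the input-to-output relationship without changing the structure.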
An ANN has the advantage of being able to be trained to recognize patterns
that are too complicated for typical machine learning techniques. For example,
ANNs have been used to model handwriting recognition, marketing, operations,
the telecom industry, image classification, and even stock market prediction. The
downside of using ANNs is that they can be very computationally intensive and
therefore require powerful computers to train. Additionally, ANNs can be diffi-
cult to understand and interpret, making them less transparent than other machine
learning algorithms.[1]
17.3.1 Supervised Training
During supervised training, both the inputs and the desired outputs are provided. After
processing the inputs, the network compares its actual outputs to the desired
outputs. The errors are then propagated back through the network, causing the weights
that govern it to change. This process is repeated as the weights are adjusted
over and over. A “training set” is the collection of data used for training.
Because the connection weights are refined gradually, the same data is processed
many times while the network is being trained.
Supervised training must set aside a portion of the data to test the system once
it has been trained. If a network is unable to solve the task, the developer must
review the inputs, outputs, number of layers, number of elements per layer,
connections between layers, the summation, transfer, and training functions, and
the initial weights. The modifications necessary to create a strong network are
where the “science” of neural networking is found.
Finally, when the system has been successfully trained and no further learning is
required, the weights can be “frozen” if desired. In some systems, this finalized
network is then implemented in hardware so that it can run quickly. Other systems
do not lock themselves in and instead keep learning while in production use.[4]
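The supervised cycle described above (present the training set repeatedly, compare actual with desired outputs, adjust the weights, then evaluate on held-back data) can be sketched with a single neuron. This sketch swaps in the simple perceptron learning rule instead of full backpropagation, and the toy data, learning rate, and function names are all assumptions made for illustration.

```python
# Toy labelled data (made up for illustration): the label is 1 when x + y > 1.
TRAIN = [((0.0, 0.0), 0), ((0.2, 0.3), 0), ((0.5, 0.2), 0), ((0.1, 0.6), 0),
         ((1.0, 1.0), 1), ((0.9, 0.6), 1), ((0.4, 0.9), 1), ((0.8, 0.8), 1)]
TEST = [((0.1, 0.1), 0), ((0.9, 0.9), 1)]  # held back, never used for training


def predict(weights, bias, x):
    """Step-activated neuron: 1 if the weighted sum is positive, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train(data, learning_rate=0.1, max_epochs=1000):
    """Perceptron rule: nudge the weights whenever the output is wrong."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(max_epochs):            # the same data is presented repeatedly
        mistakes = 0
        for x, target in data:
            error = target - predict(weights, bias, x)  # desired minus actual
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if mistakes == 0:                  # nothing left to correct: stop training
            break
    return weights, bias                   # these are the "frozen" weights


weights, bias = train(TRAIN)
accuracy = sum(predict(weights, bias, x) == y for x, y in TEST) / len(TEST)
print("test accuracy:", accuracy)
```

Keeping `TEST` separate from `TRAIN` reflects the point above that some data must be reserved to evaluate the system after training.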
17.3.2 Unsupervised Training
The other kind of training is unsupervised training, in which the network is given
inputs but not the desired outputs. The system itself must then decide which
features to use to group the input data. This is frequently described as
self-organization or adaptation. The concept of unsupervised learning is still
developing. This ability to adapt to their surroundings would allow science
fiction–style computers to keep learning on their own when confronted with
unusual scenarios and new locations.
There are many instances in life where precise training sets are lacking. Some
of these circumstances involve military action, where new weaponry and battle
strategies may be encountered. Because of the unpredictable nature of life and
people’s desire to be prepared, this remains an active subject of
study.[4]
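Self-organization without target outputs can be illustrated with a very small competitive-learning sketch, in which each input pulls the nearest prototype toward itself. This is one simple unsupervised scheme chosen for illustration, not necessarily the method the chapter has in mind; the one-dimensional data, the starting prototypes, and the names `nearest` and `self_organize` are assumptions.

```python
def nearest(prototypes, x):
    """Index of the prototype closest to x (the "winning" unit)."""
    return min(range(len(prototypes)), key=lambda i: abs(prototypes[i] - x))


def self_organize(samples, prototypes, learning_rate=0.5, epochs=10):
    """Winner-take-all updates: only the closest prototype moves toward each sample."""
    prototypes = list(prototypes)
    for _ in range(epochs):
        for x in samples:
            i = nearest(prototypes, x)
            prototypes[i] += learning_rate * (x - prototypes[i])
    return prototypes


SAMPLES = [0.1, 0.2, 0.15, 0.9, 0.8, 0.85]   # inputs only, no labels supplied
prototypes = self_organize(SAMPLES, prototypes=[0.0, 1.0])
groups = [nearest(prototypes, x) for x in SAMPLES]
print(prototypes, groups)
```

No desired output is ever supplied; the two groups emerge only from the structure of the inputs.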
17.4.2 Pattern Recognition
The area of quality control sees the most use of neural networks as pattern
recognizers. Several automated applications are currently in use; they are designed
to pick out a single defective item from hundreds or thousands. Human inspectors
lose focus or grow weary, whereas these systems now assess solder connections,
welds, and cuts. One manufacturer is currently developing a prototype of a system
that assesses paint colour. To check whether fresh paint batches are the proper
hues, this system digitizes images of the paint samples.
The use of neural networks as processors for sensors is another significant area
where they are being applied to pattern recognition. Sensors can produce so much
data that the few useful bits of information get lost in it. People watching a
display, searching for “the needle in the haystack,” can become bored. There are
numerous uses for sensor processing in the
defence sector. These neural network systems have demonstrated success in target
recognition. Sensor processors collect information using infrared cameras, seismic
recorders, and sonar sensors. Then, potential phenomena are identified using
those data.
17.4.3 Finance
The financial industries are embracing neural networks in significant ways. Lending
institutions, banks, and credit card businesses all deal with imprecise choices. This
is where the ANN comes into the picture. Forms must be filled out as part of the
loan approval procedure for a loan officer to decide whether to approve the loan.
Neural networks trained on the data from previous decisions now use
the information from these forms. In fact, such packages report which
input, or combination of inputs, weighed most heavily on the decision,
which helps lenders comply with government requirements to state the
reasons why applications are denied.
17.4.4 Image/Data Compression
Numerous experiments have been conducted to demonstrate the ability of neural
networks to compress and decompress data in real time. These auto-associative
networks can reduce 8 bits of data to 3 and then reverse the process, restoring
8 bits again. They are not lossless, though. Due to this loss, they are unable
to compete effectively with conventional techniques.
17.4.5 Servo Control
One of the most intriguing applications of neural networks is the control of complex
systems. The majority of conventional control systems use a single set of formulas to
model how each of the system’s processes operates. Those formulas need to be man-
ually tweaked to adapt a system for a particular procedure. It is a time-consuming
process that requires adjusting parameters until the right mixture is discovered that
yields the anticipated outcome.[4]
17.5.2 Hamming Network
The Hamming network is an extension of the Hopfield network that adds a
maximum likelihood classifier to the front end. The Hamming network is
composed of three layers. The input layer has as many nodes as there are
distinct binary attributes. The category layer, which is the Hopfield layer,
has as many nodes as there are categories, or classes. This is very different
from the formal Hopfield architecture, which has the same number of
intermediate layer nodes as input nodes. Finally, there is an output layer with
the same number of nodes as the category layer. See Figure 17.3.
17.5.5 Recirculation
Data is processed in only one direction in a recirculation network, and learning
is completed entirely with local data. The state of the processing element,
together with the input value on the specific connection being adapted, supplies
most of that information. Because the recirculation network uses unsupervised
learning, no desired output vector is required at the final layer. The network
is auto-associative when there are exactly as many outputs as inputs.
The visible and hidden layers of this network are situated between the input
and output layers. The purpose of the learning rule is to construct, in the
hidden layer, an internal representation of the data presented at the visible
layer. Compressing the input data by using fewer processing elements in the
hidden layer is a good illustration of this. In this instance, the hidden
representation can be thought of as a compressed form of the visible
representation. The visible and hidden layers are fully interconnected in both
directions. Additionally, every element in both the visible and hidden layers is
connected to a bias element. The weights on these connections adapt in the same
way as the other variable weights in the network.[4] See Figure 17.6.
17.7.1 TensorFlow
TensorFlow is one of the most prominent deep learning frameworks in use today,
with firms such as NVIDIA and Uber using it in addition to Google. It is also
used regularly by AI practitioners and data scientists. TensorFlow is a Python
framework for developing deep learning models. However, it requires a lot of
code to design the network structure. TensorFlow relies on resources such as a
powerful graphics processing unit (GPU) for efficient computations, which can
make training expensive and time-consuming. A fundamental disadvantage of
TensorFlow, at least in its original 1.x design, is that it operates on a static
computation graph, which must be rebuilt and rerun each time any modification is
made. However, the platform has been widely accepted because of its
extensibility and capability, implying that the tradeoff may still be
worthwhile.
17.7.2 PyTorch
One of TensorFlow’s biggest competitors in the deep learning framework market is
PyTorch. Facebook developed this open-source tool, which is built with Python.
Besides powering most of Facebook’s services, it is also used by other
companies, such as Johnson & Johnson and Twitter. PyTorch is a Python-based
library that emphasizes machine learning and offers support for debugging tools.
In contrast to TensorFlow’s static graph, PyTorch builds its computation graph
dynamically as the code runs, which makes it easy to inspect what the code is
doing. This makes it easier for developers to see how their code will impact a
project without needing to set things up beforehand. PyTorch simplifies neural
network training by utilizing modern technologies such as data parallelism and
distributed learning. The
PyTorch community is also very active, with regularly published pre-trained models.
TensorFlow, on the other hand, exceeds PyTorch in terms of cross-platform compati-
bility, thanks to Google’s vertical integration with Android, which makes additional
resources available to TensorFlow users.
17.7.3 Keras
Keras is a deep learning framework that was created on top of well-known frame-
works, notably TensorFlow as well as the Microsoft Cognitive Toolkit. Keras isn’t
quite as configurable as PyTorch or TensorFlow. However, it is the best place
for beginners to start learning neural networks. Keras makes it possible to
create large and complicated models with a few commands. While this reduces
configurability, it also makes Keras more accessible as an application
programming interface (API), which makes it useful in many contexts.[8]
17.8 CONCLUSION
ANNs have already become an important component of the technology field. They
can be used to recognize handwritten text, which can be useful in businesses like
banking, as well as many other vital fields such as medicine.[9] ANNs consist of arti-
ficial neurons, which are computational models that try to imitate the human brain.
They can be used to do complex analyses in a wide range of fields, from medicine to
engineering, as well as to design the future generation of computers. Neural networks
are useful in the field of medicine because they may be used to design models of the
human anatomy that can aid doctors to diagnose ailments more accurately. Because
of advances in ANN technology, intricate medical scans may now be evaluated more
accurately and efficiently. Many complex problems will be handled by neural net-
work–based devices themselves. They will learn and improve from their mistakes.
Perhaps in the future, we will be able to connect humans and machines. This would
translate into humans controlling or operating machines and robots. We might be
able to engage with our surroundings through our thoughts.
REFERENCES
[1] Kukreja, Harsh, N. Bharath, C. S. Siddesh, and S. Kuldeep. 2016. “An introduction to the artificial neural network.” International Journal of Advance Research and Innovative Ideas in Education 1: 27–30.
[2] “Architecture of Artificial Neural Network.” 2022. Dot Net Tutorials, June 30. https://dotnettutorials.net/lesson/architecture-of-artificial-neural-network/.
[3] “Artificial Neural Networks - Javatpoint.” 2021. www.javatpoint.com. https://www.javatpoint.com/keras-artificial-neural-networks.
[4] Anderson, Dave, and George McNeill. 1992. “Artificial Neural Networks Technology.” Kaman Sciences Corporation 258 (6): 1–83.
[5] Hopfield, J.J. 1988. “Artificial Neural Networks.” IEEE Circuits and Devices Magazine 4 (5): 3–10. doi:10.1109/101.8118.
[6] Hemachandran, K., S. Khanra, R. V. Rodriguez, and J. Jaramillo (Eds.), 2022. Machine Learning for Business Analytics: Real-Time Data Analysis for Decision-Making. CRC Press, New York.
[7] “Introduction to ANN | Set 4 (Network Architectures) - GeeksforGeeks.” 2018. GeeksforGeeks, July 17. https://www.geeksforgeeks.org/introduction-to-ann-set-4-network-architectures/.
[8] Kaushik, Vanshika. 2021. “8 Applications of Neural Networks | Analytics Steps.” AnalyticsSteps. https://www.analyticssteps.com/blogs/8-applications-neural-networks.
[9] Baxt, W.G. 1995. “Application of Artificial Neural Networks to Clinical Medicine.” The Lancet 346 (8983): 1135–1138. doi:10.1016/s0140-6736(95)91804-3.
18 AI and Real-Time
Business Intelligence
J. Bhuvana, M. Balamurugan, Mir Aadil, and
Hemachandran K
18.1 INTRODUCTION
The use of artificial intelligence (AI) in the workplace is fundamentally altering
how firms operate. There are business applications that use AI to improve cus-
tomer service and increase sales. Enterprise leaders want to employ AI to enhance
their companies’ operations and guarantee a return on their investment, but they
confront significant obstacles. Expert systems are produced using knowledge
engineering technologies. Expert systems combine a rules engine and a sizable,
extensible knowledge bank. They are used to support decision-making in a range
of areas, including manufacturing, customer service, healthcare, and financial
services.
This chapter aims to analyze the origin, evolution, and development of business
intelligence (BI) and its relationship with AI. The aim is to define the incidence of BI
in business activities and analyze scientific activity and advances of BI to define new
research horizons in this field.
It explores the history and development of BI and its link with AI in order
to establish new research directions in this area; to that end, it
characterizes the scope of BI in corporate activities and in scientific
activity and advances.
18.2 BACKGROUND
18.2.1 The Beginning of Business
In the current industrial age, there is a fusion of AI, robotics, analytics, and other
advanced technologies. As AI becomes a massive market, expect business dynamics
to evolve. Without a solid AI business model, no one would be able to extract the full
value of an AI-based idea or technology.
Prediction has been an element of human life since the dawn of commerce.
Today, organizations must use prediction if they want to keep up with rapid
developments in technology. Kodak, for example, flatly dismissed the digital
camera as a possibility for the future because it failed to grasp the potential
of such a device. A BI system is required to make such predictions quickly and
effectively (Jourdan, Rainer et al. 2008).
18.2.3 Impact of AI in Business
AI is used by businesses for a number of tasks, including data collection and
process simplification. AI can process massive volumes of data far faster
than a human brain. AI software can then present synthesized courses of
action to the human user. This enables us to use AI to speed up
decision-making by playing out the potential outcomes of each action.
by AI business models. Business people are better able to identify product catego-
ries and market segments once their business strategy is clear. To avoid inconsis-
tencies, the key is to be explicit about the business strategy and to ask questions.
Use AI to assess whether option A or option B is the superior choice by focusing
on certain ideas.
Many businesses seek to use AI technology to boost productivity, reduce
operating expenses, improve customer satisfaction, and increase revenue.
Businesses can gain greatly from using AI. However, great advantages come with
great obstacles. Intelligent AI technologies such as machine learning and
natural language processing can renew products and business operations, and
this is one of the best ways to enhance the current business model. AI has a
big impact on how businesses operate. By implementing and integrating AI
technology, we can automate and improve traditional procedures that consume
time and money. Operational effectiveness and productivity standards will also
increase with the support of AI. With the proper AI business model, it is
possible to make quicker and more logical business decisions. AI also helps
prevent human errors (Wixom and Watson 2010; Liang and Liu 2018).
Changes in business operations made possible by AI technology are known
as business model innovations. These might consist of novel approaches to client
engagement, data management, and analysis, as well as novel approaches to market-
ing and sales of goods.
18.2.4 Responsible AI
Responsible AI (RAI) is a governance structure that outlines how a company
tackles the issues related to AI. It is up to software engineers and data
scientists to create fair, reliable AI standards. Different companies have
different requirements for the steps needed to prevent bias and guarantee
transparency. Google and Microsoft have both explicitly called for AI
legislation. The word “responsible” is a catch-all term that covers both
ethics and democratization. The data used to train machine learning models
can frequently introduce bias into AI. It stands to reason that when the
training data is skewed, the decisions made by the program will be similarly
skewed. As software systems with AI elements proliferate, it is increasingly
clear that standards in AI are required.
One of the main goals of RAI is to avoid situations where a small change in the
value of an input drastically changes the results of a machine learning model.
According to corporate governance principles, the model-development procedure
should be meticulously documented in a form that cannot be altered by humans or
other programs. In addition, bias should not be introduced into machine
learning models when training on data. See Figure 18.3.
18.4.2 Retail
BI tools are one of the most important assets, since retaining existing
customers is a successful, long-term business plan. If businesspeople have the
most recent information on their customers, especially the most lucrative ones,
they will be able to market the items and services that are most pertinent to
those customers’ needs and preferences. This includes gathering and analyzing
data to identify which products should be enhanced and which should be phased
out.
18.4.3 Trading
The banking and finance sectors have quickly embraced personalization, which is a
hot topic in every market. Therefore, having an edge over the competition is essential.
Based on the data availability, they can quickly drive the customer experience using
BI technology. Market trends may be used by businesses to plan out new investment
opportunities, consumer behavior can be predicted using analytics, and products can
be made for the unique needs of each customer.
18.4.4 E-Commerce
Most businesses store data across many different systems and formats. Data
processing and reporting become challenging and time-consuming as a result.
Using a BI solution helps reduce the difficulties associated with data kept in
many tools and spreadsheets. BI systems use up-to-the-minute information to
provide a more comprehensive view of workplace activity at any given time. The
numbers don’t lie. A fully integrated BI solution helps the whole organization
succeed.
18.4.5 Marketing
The data from customer relationship management (CRM) can be used to calculate
the profitability of marketing initiatives. The effectiveness of a campaign as a whole, the
cost of advertising, and email performance can all be utilized to determine where
messages are reaching customers and where they need to be improved.
TABLE 18.1 (Continued)
Top 10 Tools Used in Business Intelligence (Selecthub.com)
S. No.: 9
Name of Tool: Sisense
Description:
• End-to-end data analytics platform with an embeddable, scalable architecture for the whole customer and provider base of the business; it enables corporate analysts to merge enormous datasets from various sources into a single, unified database.
• On the front end, anyone with any level of technical expertise may create visualizations, reports, and dashboards to explore and share insights that advance enterprises. Teams are given the ability to examine important metrics and data insights right where they work thanks to Fusion, its AI-driven, cloud-native analytics service. All sizes of enterprises can use it, and it is available as an
Pros:
• Consumer assistance and guidance
• Data synthesis
• Data representation
• User friendly
• Cost
Cons:
• Data preparation, modeling
• Training
The various entities under investigation use these tools to gather data and base
judgments on it. The major objective of these tools (Selecthub 2020) is to
support strategic decisions and to present the analytics that could be a
deciding factor. They differ in functionality and user interface.
18.7 CONCLUSION
Businesspeople use BI for many different reasons, and it helps them obtain
accurate and reliable recommendations in fields such as manufacturing,
commercialization, compliance, and employment. Making better business
decisions is just one of the many advantages that companies may experience
after incorporating BI into their operational frameworks. Additional benefits
include greater data quality; rapid, factual reporting and data analysis;
better employee satisfaction; and improved economics.
REFERENCES
Casadesus-Masanell, R., & Ricart, J. E. (2011). How to design a winning business model.
Harvard Business Review, 89(1/2), 100–107.
Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of
statistical learning: Data mining, inference, and prediction (Vol. 2, pp. 1–758).
New York: Springer.
Hedman, J., & Kalling, T. (2003). The business model concept: Theoretical underpinnings and
empirical illustrations. European Journal of Information Systems, 12(1), 49–59.
Johnson, M. W., Christensen, C. M., & Kagermann, H. (2008). Reinventing your business
model. Harvard Business Review, 86(12), 57–68.
Jourdan, Z., Rainer, R. K., & Marshall, T. E. (2008). Business intelligence: An analysis of the
literature. Information Systems Management, 25(2), 121–131.
León, M. C., Nieto-Hipólito, J. I., Garibaldi-Beltrán, J., Amaya-Parra, G., Luque-Morales,
P., Magaña-Espinoza, P., & Aguilar-Velazco, J. (2016). Designing a model of a digi-
tal ecosystem for healthcare and wellness using the business model canvas. Journal of
Medical Systems, 40(6), 1–9.
Liang, T. P., & Liu, Y. H. (2018). Research landscape of business intelligence and big data
analytics: A bibliometrics study. Expert Systems with Applications, 111, 2–10.
Mishra, S., & Tripathi, A. R. (2021). AI business model: An integrative business approach. Jour-
nal of Innovation and Entrepreneurship, 10(1), 1–21.
Rong, K., Lin, Y., Shi, Y., & Yu, J. (2013). Linking business ecosystem lifecycle with platform
strategy: A triple view of technology, application and organization. International Jour-
nal of Technology Management, 62(1), 75–94.
Selecthub. (2020). Business intelligence software tools comparison. Accessed online at
https://www.selecthub.com/business-intelligence-tools/, March 2020.
Technology Advice. (2020). Guide to business intelligence software. Accessed online at
https://technologyadvice.com/business-intelligence/, March 2020.
Turban, E., Sharda, R., & Delen, D. (2010). Decision support and business intelligence sys-
tems (9th ed.). Upper Saddle River, NJ: Prentice Hall Press.
Wirtz, B. W. (2011). Business model management. Design–Instrumente–Erfolgsfaktoren von
Geschäftsmodellen, 2(1).
Wixom, B., & Watson, H. (2010). The BI-based organization. International Journal of Busi-
ness Intelligence Research (IJBIR), 1(1), 13–28.
19 Introduction to Statistics
and Probability
Abha Singh and Reham Alahmadi
19.1 INTRODUCTION
In artificial intelligence (AI), a computer may mimic human behavior, thanks to
advancements in technology and science. Machine learning is a type of AI that
lets machines learn on their own from previous data without the need for
explicit programming. The objective of AI is to create a computer system that
is as intelligent as humans in order to tackle complicated problems. AI is
frequently misinterpreted in debates, so it is worth stressing that AI includes
machine learning as one of its parts.
The relationship between machine learning and statistics is so close that the lines
between the two are blurred at times, and explicit references to statistical study are
made when appropriate. Several years before the buzz began, an integrated curricu-
lum between the schools of statistics and computer science taught AI, data science,
and machine learning with the goal of training data scientists. We may say that the
perspectives of machine learning and statistics are substantially overlapping in terms
of content.
Reasoning based on probabilities and statistics has had an impact on a lot of
different parts of the theory behind AI. Do the prior interpretations of probability
do justice to the fresh applications that have emerged over the course of the last
several decades? Gambling and other games of chance were the inspiration for
the development of probability theory. What fresh insights into the philosophy of
probability can we get from the contemporary use of probability in the field of AI?
19.2 STATISTICS
Statistics is a branch of mathematics that deals with how data is collected,
organized, analyzed, interpreted, and presented. When statistics is used to
solve a scientific problem, it is common to start with a statistical population
or model that will be studied in more depth later. During the course
of their professional and personal lives, engineers and scientists are continuously
confronted with collections of information or data, which may be overwhelming.
Techniques for organizing and summarizing data, as well as methods for deriving
conclusions based on the information contained in the data, are provided by the field
of statistics.
When it comes to comprehending the world around us, statistical ideas and
approaches are not only valuable, but they are also often required. They give
avenues for acquiring fresh insights into the behavior of a wide range of phenomena
that you will face in your chosen area of engineering or scientific concentration. In
the face of uncertainty and variance, the field of statistics teaches us how to make
educated judgements and informed choices based on available information. There
would be no need for statistical procedures or statisticians if there were no uncer-
tainty or variation.
Inferential statistics are used in political polling. Interviewing every Indian
of voting age would be costly and impracticable; this whole group is known as
the population. Statisticians can instead survey just a few thousand people to
gauge the opinion of the whole Indian electorate; this smaller group is known
as a sample. Statisticians use data from a random sample of voters to infer
(reach a conclusion about) how the whole population intends to vote.
Statistical inference provides the methods for making such inferences.
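The polling idea can be illustrated with a short sketch. The electorate below is entirely hypothetical (100,000 voters, 55% of whom support a candidate), and the function name `estimate_support`, the sample size, and the seed are assumptions made only for illustration.

```python
import random

def estimate_support(population, sample_size, seed=0):
    """Estimate the share of 1s in `population` from a simple random sample."""
    rng = random.Random(seed)                      # seeded so the sketch is repeatable
    sample = rng.sample(population, sample_size)   # draw without replacement
    return sum(sample) / sample_size

# Hypothetical electorate: 1 = supports the candidate, 0 = does not.
population = [1] * 55_000 + [0] * 45_000
true_share = sum(population) / len(population)

estimate = estimate_support(population, sample_size=2_000)
print(f"true share = {true_share:.3f}, sample estimate = {estimate:.3f}")
```

With a sample of a few thousand voters, the estimate is normally within a couple of percentage points of the population value, which is why surveying everyone is unnecessary.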
Population
The group of all people or things that are being looked at in a statistical study.
Sample
The subset of the population from whom information is gathered and analyzed.
Random sample
A random sample is one that is selected at random. More precisely, it is a
sample chosen so that every member of the population has an equal chance of
being selected.
Statistical inference
Statistical inference refers to methods for drawing and assessing the trustworthi-
ness of inferences about a population based on information acquired from a
sample of the population.
Data
When we talk about data, we’re talking about discrete bits of information like
numbers or percentages. Data is, in a more technical sense, a collection of
numerical or qualitative values pertaining to one or more individuals or things.
19.2.1 Collection of Data
There are a number of ways for collecting or obtaining data for use in statistical analysis.
Three of the most often used techniques are as follows:
1. Observational Method
This method depends on observing things, people, markets, and so on.
2. Experimental Method
A coin is tossed into the air. Assuming that the coin does not land on its
edge, the experiment has two potential results: heads or tails. When this
experiment is performed, no one can predict the outcome in advance. If
desired, you may toss the coin as many times as you like.
Many studies use the experimental method, such as measuring blood pressure
(BP), measuring health indicators, or inspecting products manufactured by a
machine.
3. Survey Method
A survey is a method used to collect information from people, as in pre-election
polls and marketing surveys. One of the most essential survey metrics is the
response rate (i.e., the proportion of those selected who complete the survey).
Surveys may be administered in several ways, such as:
• personal interview;
• telephone interview; and
• self-administered questionnaire
Data comes in two types: qualitative and quantitative. Quantitative data in
turn has two types: continuous and discrete.
Discrete data
Discrete data takes a finite or countable number of values, such as 0, 1, 2, 3, . . .
For example, the number of potatoes in a bag is a countable number.
Continuous data
Continuous data takes values from a continuous range of numbers without gaps,
stops, interruptions, or leaps, resulting in infinitely many potential values.
For example, a measured volume of petrol might be 32.6789 liters.
Measurement level
There are four levels of measurement for data: nominal, ordinal, interval, and
ratio. Knowing the level of the data we collect helps us choose an appropriate
technique when applying statistics to real-world situations.
Nominal level: Data at the nominal level consists of names, labels, or
categories only; it cannot be arranged in any meaningful order (an ordering
such as low or high belongs to the ordinal level).
For example:
1. Person responds like yes/no/cannot say.
2. Is a particular place beautiful or not?
55, 35, 97, 74, 65, 53, 32, 78, 75, 62, 55, 35, 97, 74, 65, 53, 32, 78, 75, 62
Because this data is raw, finding the maximum (highest mark) or minimum
(lowest mark) means searching through the whole list. If we arrange the raw
data in ascending or descending order, we can easily determine the highest and
lowest marks that students have received.
The data range refers to the difference between the highest and lowest data
values.
So, the range = highest mark – lowest mark = 97 – 32 = 65
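As a quick check, the range can be computed directly from the marks exactly as they are listed above. This is only a sketch; the variable names are illustrative.

```python
# Marks copied from the list above.
marks = [55, 35, 97, 74, 65, 53, 32, 78, 75, 62,
         55, 35, 97, 74, 65, 53, 32, 78, 75, 62]

ordered = sorted(marks)                    # ascending order exposes the extremes
lowest, highest = ordered[0], ordered[-1]
data_range = highest - lowest              # range = highest value - lowest value
print(lowest, highest, data_range)
```

Sorting is not strictly required (`min` and `max` would do), but it mirrors the text's suggestion of arranging the raw data in order first.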
Example: The measured BPs of 30 patients at one hospital are as follows:
10 20 36 92 95 40 50 56 60 70 92 88 80 70 72
70 36 40 36 40 92 40 50 50 56 60 70 60 60 88
Recall that the “frequency” of a BP value is the number of patients for whom
that value was recorded. For example, four patients have a BP of 70, so the
frequency of the value 70 is 4. We have presented the information in the form
of a table (see Table 19.1).
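The tally behind an ungrouped frequency distribution can be sketched in a few lines. The readings are the 30 BP values listed above; `Counter` from the Python standard library does the counting, and the variable names are illustrative.

```python
from collections import Counter

# The 30 BP readings from the example above.
bp_readings = [10, 20, 36, 92, 95, 40, 50, 56, 60, 70, 92, 88, 80, 70, 72,
               70, 36, 40, 36, 40, 92, 40, 50, 50, 56, 60, 70, 60, 60, 88]

frequency = Counter(bp_readings)           # maps each BP value to its count
for value in sorted(frequency):
    print(f"{value:>3} mmHg: {frequency[value]} patient(s)")
```

The frequencies always add up to the number of observations, which is a useful sanity check on any hand-built table.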
Example: 100 weekly observations from the city’s cost-of-living index study were
included in the following list:
96 67 28 32 65 65 69 33 98 96
76 42 32 38 42 40 40 69 95 92
75 83 76 83 85 62 37 65 63 42
89 65 73 81 49 52 64 76 83 92
93 68 52 79 81 83 59 82 75 82
86 90 44 62 31 36 38 42 39 83
87 56 58 23 35 76 83 85 30 68
69 83 86 43 45 39 83 75 66 83
92 75 89 66 91 27 88 89 93 42
53 69 90 55 66 49 52 83 34 36
TABLE 19.1
An Ungrouped Frequency Distribution Table or a Simple Frequency
Distribution Table
Blood pressure (mmHg)               10 20 36 40 50 56 60 70 72 80 88 92 95
Number of patients (the frequency)   1  1  3  4  3  2  4  4  1  1  2  3  1
TABLE 19.2
Cost of Living Index vs. Number of Weeks Sample 1
Cost of Living Index Number of Weeks
20–29 3
30–39 14
40–49 12
50–59 8
60–69 18
70–79 10
80–89 23
90–99 12
Total 100
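A grouped frequency distribution like the one above can be built by assigning each observation to a class of width 10. The sketch below uses the 100 weekly observations listed earlier; the class width and the variable names are choices made for illustration.

```python
from collections import Counter

# The 100 weekly cost-of-living index observations listed above.
observations = [
    96, 67, 28, 32, 65, 65, 69, 33, 98, 96,
    76, 42, 32, 38, 42, 40, 40, 69, 95, 92,
    75, 83, 76, 83, 85, 62, 37, 65, 63, 42,
    89, 65, 73, 81, 49, 52, 64, 76, 83, 92,
    93, 68, 52, 79, 81, 83, 59, 82, 75, 82,
    86, 90, 44, 62, 31, 36, 38, 42, 39, 83,
    87, 56, 58, 23, 35, 76, 83, 85, 30, 68,
    69, 83, 86, 43, 45, 39, 83, 75, 66, 83,
    92, 75, 89, 66, 91, 27, 88, 89, 93, 42,
    53, 69, 90, 55, 66, 49, 52, 83, 34, 36,
]

class_width = 10
# Integer division maps, e.g., 67 to the class whose lower limit is 60.
counts = Counter((x // class_width) * class_width for x in observations)

for lower in sorted(counts):
    print(f"{lower}-{lower + class_width - 1}: {counts[lower]}")
print("Total", sum(counts.values()))
```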
TABLE 19.3
Cost of Living Index vs. Number of Weeks Sample 2
Cost of Living Index Number of Weeks
19.5–29.5 3
29.5–39.5 14
39.5–49.5 12
49.5–59.5 8
59.5–69.5 18
69.5–79.5 10
79.5–89.5 23
89.5–99.5 12
Total 100
TABLE 19.4
Chocolate Likes Poll
Type of Chocolate Galaxy Twix Dairy Milk Kit-Kat Cadbury Oreos
Men 10 8 9 8 7 6
Women 5 4 7 10 10 10
Children 8 6 3 6 10 9
Example: Take a look at the chocolate made by a few different companies and
take a poll to find out which chocolate is the most popular among men, women, and
children. See Table 19.4.
Now we found their average response as follows:
Our aim here is to discuss the mean (or average): the mean is the sum of all
observations divided by the total number of observations. The mean is denoted
by $\bar{x}$, read as “x bar.” Therefore, we may refer to the average stated
earlier in terms of the mean.
Example: Find the mean of the given data 3, 5, 7, 9, 5, 4, 3.
Mean $(\bar{x}) = \frac{3 + 5 + 7 + 9 + 5 + 4 + 3}{7} = \frac{36}{7} \approx 5.143$
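In symbols, the verbal definition of the mean can be written as follows. The notation $x_1, \dots, x_n$ for the $n$ observations is an assumption introduced here for illustration.

$$
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i
$$

For the data in the example, $n = 7$ and $\sum_{i=1}^{7} x_i = 36$, which gives $\bar{x} = 36/7 \approx 5.143$.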
Example: In Table 19.5, we will now determine the mean BP of hospital patients
(i.e., mean of the ungrouped frequency distribution).
TABLE 19.5
Determine the Mean BP of Hospital Patients
Blood pressure (mmHg) (x)               10 20  36  40  50  56  60  70 72 80  88  92 95
Number of patients (the frequency) (f)   1  1   3   4   3   2   4   4  1  1   2   3  1
fx                                      10 20 108 160 150 112 240 280 72 80 176 276 95
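The mean of an ungrouped frequency distribution is the sum of the fx row divided by the sum of the frequencies. The following sketch reproduces the fx row of Table 19.5 and carries the calculation through; the variable names are illustrative.

```python
# x and f rows of Table 19.5.
bp_values   = [10, 20, 36, 40, 50, 56, 60, 70, 72, 80, 88, 92, 95]
frequencies = [ 1,  1,  3,  4,  3,  2,  4,  4,  1,  1,  2,  3,  1]

fx = [x * f for x, f in zip(bp_values, frequencies)]   # the fx row of the table
total_fx = sum(fx)               # sum of value * frequency
total_f = sum(frequencies)       # number of patients
mean_bp = total_fx / total_f     # mean = (sum of fx) / (sum of f)
print(fx, total_fx, total_f, mean_bp)
```

Multiplying each value by its frequency gives the same total as adding up all 30 raw readings one by one, only with less work.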
TABLE 19.6
Grouped Frequency Distribution Table
19.2.2.2 Median
The median is another often used measure of the center. In essence, the median of a
dataset is the value that separates the lowest and highest 50% of the data.
DEFINITION: The median of a set of data is the middle value when the data is
sorted in ascending order; if the number of observations is even, it is the
mean of the two middle values.
Example: Consider two instances of shift overtime compensation data for a single
business. Determine the median of the datasets.
Dataset I (salary, $): 500 500 800 940 300 300 400 300 700 550 900 750 2060
Dataset II (salary, $): 500 940 300 400 700 550 900 750 2060 400
Solution: To find the median of dataset I, first we arrange the data in increasing
order: 300 300 300 400 500 500 550 700 750 800 900 940 2060. The total number of
observations is 13, so (n + 1) / 2 = (13 + 1) / 2 = 7. Therefore, the seventh observation
in the sorted list, which is 550, represents the median.
To find the median of dataset II, initially, we organize the data in ascending order:
300 400 400 500 550 700 750 900 940 2060. The number of observations is 10,
so (n + 1) / 2 = (10 + 1) / 2 = 5.5. Therefore, the median is 625, which is the midpoint
between the fifth and sixth observations in the sorted list.
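The procedure used in the solution (sort, then take the middle value or the midpoint of the two middle values) can be sketched as a small function. The function name `median` and the dataset variable names are illustrative; Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Return the median: the middle value of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                            # odd n: single middle value
    return (ordered[middle - 1] + ordered[middle]) / 2    # even n: mean of the two middle values

dataset_1 = [500, 500, 800, 940, 300, 300, 400, 300, 700, 550, 900, 750, 2060]
dataset_2 = [500, 940, 300, 400, 700, 550, 900, 750, 2060, 400]
print(median(dataset_1), median(dataset_2))
```

Note that the single large salary of 2060 does not pull the median upward, which is why the median is often preferred for skewed data such as pay.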
19.2.2.3 Mode
The value that comes up most often in a set of data is called the mode.
If no value comes up more than once in the dataset, there is no mode.
Example: I- The mode of the set 4 5 3 3 3 4 7 8 8 10 is 3.
II- The set 3 5 6 4 7 2 10 11 has no mode, because no value occurs more than
once.
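Both cases in the example can be checked with a short sketch. The function name `modes` is illustrative; it returns a list so that it can report no mode (an empty list) as well as ties between several values.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []                     # no value occurs more than once
    return sorted(v for v, c in counts.items() if c == highest)

set_1 = [4, 5, 3, 3, 3, 4, 7, 8, 8, 10]
set_2 = [3, 5, 6, 4, 7, 2, 10, 11]
print(modes(set_1), modes(set_2))
```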
TABLE 19.7
Skill Measures
Classes High Skills Measures Low Skills Measures
5th 12 6
6th 11 0
7th 15 8
8th 20 11
9th 17 12
10th 10 10
11th 14 9
6. Data on student skills were compiled. During one year, the numbers of
students in different classes of a college were tabulated by high skill and
low skill, which resulted in Table 19.7 (the figures are approximate).
Calculate the mean of both the high skills and low skills measurements.
19.2.2.4 Probability
In this section, you will learn how to compute the likelihood that a certain
result will be obtained from an experiment. Although probability was first used
in gambling, it has since been applied in many different fields, such as the
physical sciences, business, the biological and medical sciences, and even
weather forecasting. We carry out various experiments, such as tossing coins or
throwing dice, and observe their results.
The collection of all the potential results of an experiment is referred to as
the sample space X. Points y in X are referred to as sample outcomes,
realizations, or elements. Subsets of X are called events.
Example. If a coin is flipped twice, then the sample space is X = {HH, HT, TH, TT}.
The event that the first coin toss is heads is denoted by A = {HH, HT}.
Probability is denoted by “P,” and events are denoted by symbols such as “A,” “B,”
and “C.” So P(A) is the probability that event A will happen, P(B) is the probability
that event B will happen, P(C) is the probability that event C will happen, and so on.
One basic way to estimate a probability is to carry out (or watch) a certain process
many times and keep track of the number of times event A actually takes place.
Alternatively, assume that a certain process includes n distinct simple events and
that each of those simple events has an equal probability of taking place. If event A
can occur in s of these n ways, then
P(A) = s / n.
Rules of Probability:
(i) A probability should always be expressed as a fraction or decimal value that
falls between 0 and 1.
(ii) The chance (probability) of something that can never happen is zero.
(iii) The chance (probability) of an event that will definitely take place is equal
to one.
(iv) The chance (probability) of each given occurrence, denoted by the letter A,
ranges between 0 and 1. To put it another way, 0 ≤ P(A) ≤ 1.
Complementary Events: The set of all possible outcomes in which the event A does
not take place is referred to as the complement of A, written Ā, and its probability
is P(Ā) = 1 - P(A).
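The definitions above (sample space, event, probability under equally likely outcomes, and complement) can be illustrated with the two-coin example. This is a small sketch, not part of the original text; the function name classical_probability is an assumption introduced for illustration, and it divides the number of outcomes in the event by the number of outcomes in the sample space.

```python
from fractions import Fraction
from itertools import product

# Sample space for two coin flips: HH, HT, TH, TT.
sample_space = {"".join(flips) for flips in product("HT", repeat=2)}

# Event A: the first toss is heads.
event_a = {outcome for outcome in sample_space if outcome[0] == "H"}


def classical_probability(event, space):
    """P(event) = outcomes in event / outcomes in space, assuming every
    outcome in the space is equally likely."""
    return Fraction(len(event), len(space))


p_a = classical_probability(event_a, sample_space)
# The complement of A is every outcome of the sample space that is not in A.
p_not_a = classical_probability(sample_space - event_a, sample_space)
print(p_a, p_not_a)
```

Using Fraction keeps the probabilities exact, so the complement rule can be compared with equality rather than with a floating-point tolerance.
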
Example: There are 8 red balls, 7 blue balls, and 6 green balls in a box. One ball is
chosen by chance. How likely is it that the ball is neither red nor green?
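Solution: Let E be the event that the ball is neither red nor green, that is, the
ball is blue. Then n(E) = 7, and the total number of balls is n(S) = 8 + 7 + 6 = 21.
Therefore, P(E) = n(E) / n(S) = 7/21 = 1/3.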
Example: Table 19.8 contains data from a study of two airlines which fly to Riyadh,
in the kingdom of Saudi Arabia.
Find the probability that the flight selected is a Flynas Airlines flight that was
on time.
Solution: P(Flynas Airlines and on time) = 43/87.
TABLE 19.8
Study of Two Airlines
                   No. of Flights on Time   No. of Late Flights   Totals
Saudi Airlines     33                       6                     39
Flynas Airlines    43                       5                     48
Totals             76                       11                    87
Addition Rule: The addition rule gives P(A or B), the probability that event A
occurs, or event B occurs, or both occur, as the single result of a procedure. The
key word is “or.” This is the inclusive or, because it may indicate either one or
the other, or even both.
Formal Addition Rule:
P(A or B) = P(A) + P(B) - P(A and B)
where P(A and B) signifies the probability that A and B both occur at the same time
as a result of a trial of the procedure.
Disjoint or Mutually Exclusive: Events A and B are disjoint (or mutually exclusive)
if they cannot occur at the same time.
An event A and its complement Ā are always disjoint, because it is not possible for
an event and its complement to happen at the same time.
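The addition rule, P(A or B) = P(A) + P(B) - P(A and B), and its disjoint special case can be checked on a fair six-sided die. The events chosen here are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

die = set(range(1, 7))  # the six equally likely faces


def prob(event):
    """Probability of an event (a set of faces) on a fair die."""
    return Fraction(len(event & die), len(die))


a = {2, 4, 6}  # A: the roll is even
b = {4, 5, 6}  # B: the roll is greater than 3

# Events that overlap: subtract P(A and B) so shared outcomes count once.
p_a_or_b = prob(a | b)
p_by_rule = prob(a) + prob(b) - prob(a & b)

# Disjoint events: P(C and D) = 0, so the rule reduces to a plain sum.
c = {1, 2}
d = {5, 6}
p_c_or_d = prob(c | d)
print(p_a_or_b, p_by_rule, p_c_or_d)
```
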
Venn diagrams for events that are not disjoint and for events that are disjoint are
shown in Figure 19.2 and Figure 19.3.
Example: There are 20 boys and 45 girls in a class. What is the likelihood that a
pupil chosen at random will not be a boy?
Solution: The class has 20 + 45 = 65 pupils, of whom 45 are not boys, so
P(not a boy) = 45/65 = 9/13.
FIGURE 19.2 Venn diagram for events that are not disjoint.
Example: What is the probability of getting a number higher than three on a 6-sided die?
Solution: The die numbers that are greater than 3 are 4, 5, and 6, so the
probability is 3/6 = 1/2.
A and B are independent events if the occurrence of one does not affect the
probability of the other. For any two events, the multiplication rule states that
P(A and B) = P(A) • P(B | A) = P(B) • P(A | B).
When A and B are independent, P(B | A) = P(B), so P(A and B) = P(A) • P(B).
TABLE 19.9
Survey of Undergraduate Students at a University’s Faculty of Science
            Major
Gender      Biology   Physics   Mathematics   Total
Male        22        10        8             40
Female      3         5         2             10
Total       25        15        10            50
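As an illustration of the multiplication rule, the counts in Table 19.9 can be used to compute a joint probability, a conditional probability, and a check for independence. This is a sketch added for illustration; the variable names are assumptions, and the choice of the events "male" and "biology major" is arbitrary.

```python
from fractions import Fraction

# Counts from Table 19.9: rows are gender, columns are major.
counts = {
    "Male": {"Biology": 22, "Physics": 10, "Mathematics": 8},
    "Female": {"Biology": 3, "Physics": 5, "Mathematics": 2},
}
total = sum(sum(row.values()) for row in counts.values())

males = sum(counts["Male"].values())
biology = sum(row["Biology"] for row in counts.values())

p_male = Fraction(males, total)
p_biology = Fraction(biology, total)
p_male_and_biology = Fraction(counts["Male"]["Biology"], total)
# Conditional probability: restrict attention to the male students only.
p_biology_given_male = Fraction(counts["Male"]["Biology"], males)

# Multiplication rule: P(A and B) = P(A) * P(B | A).
rule_holds = p_male_and_biology == p_male * p_biology_given_male
# Independence would require P(Biology | Male) = P(Biology).
independent = p_biology_given_male == p_biology
print(p_male_and_biology, p_biology_given_male, rule_holds, independent)
```

In this table the two events are not independent, because the share of biology majors among male students differs from the share among all students.
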
TABLE 19.10
Study of Two Airlines that Fly to Riyadh
No. of Flights on Time No. of Late Flights Totals
Saudi Airlines 33 6 39
Flynas Airlines 43 5 48
Totals 76 11 87
TABLE 19.11
Probabilities
A B Totals
D 0.2 0.4 0.6
E 0.28 0.12 0.4
Totals 0.48 0.52 1
Find the probability that the flight selected is Flynas Airlines, which was on time.
Ans: The probability of P(Flynas Airlines on time) = 43/87
5. A ticket is picked at random from a hat containing tickets numbered from 1 to 20.
What is the chance that the winning ticket has a number that is a multiple of
three or five?
6. Suppose events A, B, D, and E have probabilities as given in Table 19.11.
REFERENCES
[1] K. Morik, “A note on artificial intelligence and statistics.” In Applications in Statistical
Computing, pp. 127–138. Springer, Cham, 2019.
[2] Z. Ghahramani, “Probabilistic machine learning and artificial intelligence.” Nature 521,
no. 7553 (2015): 452–459.
[3] William A. Gale, and P. Daryl, “Artificial intelligence research in statistics.” AI Maga-
zine 5, no. 4 (1984): 72–72.
[4] William Mendenhall, Robert J. Beaver, and Barbara M. Beaver, Introduction to Proba-
bility and Statistics. Cengage Learning, Boston, United States, 2012.
[5] Jiaying Liu, Xiangjie Kong, Feng Xia, Xiaomei Bai, Lei Wang, Qing Qing, and Ivan
Lee, “Artificial intelligence in the 21st century.” IEEE Access 6 (2018): 34403–34421.
[6] K. Hemachandran, S. Khanra, R. V. Rodriguez, and J. Jaramillo (Eds.), Machine Learn-
ing for Business Analytics: Real-Time Data Analysis for Decision-Making. CRC Press,
New york, 2022.
[7] Jay L. Devore, Probability and Statistics for Engineering and the Sciences. Cengage
Learning, Boston, United States, 2011.
20 Real Impacts of Machine
Learning in Business
B. R. S. S. Sowjanya, S. Pavan Siddharth,
and Vishwa KD
20.2 ALGORITHMS
20.2.1 Decision Tree Algorithm
Decision trees are popular machine learning methods that can be applied to both
classification and regression problems. A decision tree employs a tree-like structure
to model decisions and their potential outcomes and consequences. Every internal
node denotes a test on an attribute, and every branch reflects an outcome of the
test. A tree with more nodes can fit the training data more closely, although a very
deep tree may overfit. Decision trees have the virtue of being logical and simple to
use, but they are not the absolute best when it comes to accuracy. Decision trees
are widely applied in machine learning, operations research, strategic planning, and
decision analysis.
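To make the node-and-branch structure concrete, the following sketch walks a small hand-built tree. The attributes (income, existing_debt), thresholds, and labels are hypothetical and chosen only for illustration; in practice a library would learn the tests from training data.

```python
# A hand-built decision tree. Internal nodes test one attribute against a
# threshold; leaves carry a label. All names and numbers are hypothetical.
tree = {
    "attribute": "income",
    "threshold": 50000,
    "below": {"label": "reject"},
    "at_or_above": {
        "attribute": "existing_debt",
        "threshold": 10000,
        "below": {"label": "approve"},
        "at_or_above": {"label": "review"},
    },
}


def predict(node, sample):
    """Follow the branch chosen by each test until a leaf is reached."""
    while "label" not in node:
        passed = sample[node["attribute"]] >= node["threshold"]
        node = node["at_or_above"] if passed else node["below"]
    return node["label"]


decision = predict(tree, {"income": 60000, "existing_debt": 5000})
print(decision)
```
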
20.2.3 Clustering Algorithm
Clustering is the process of grouping data points into several clusters based on
their similarities and differences. The items that have the most in common stay in
the same group and have little to no overlap with items from other groups.
Numerous activities, including image segmentation, statistical data analysis, and
market segmentation, can benefit from clustering techniques. Popular clustering
algorithms include K-means clustering, hierarchical clustering, and density-based
spatial clustering of applications with noise (DBSCAN) [3].
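As a rough illustration of K-means, the sketch below clusters one-dimensional values by alternating between assigning each point to its nearest centroid and moving each centroid to the mean of its cluster. The data (hypothetical monthly spend per customer), the starting centroids, and the function name k_means_1d are assumptions made for this example.

```python
def k_means_1d(points, centroids, iterations=10):
    """Plain K-means on one-dimensional data with fixed starting centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


spend = [12, 15, 14, 90, 95, 88]  # hypothetical monthly spend per customer
centroids, clusters = k_means_1d(spend, centroids=[0, 100])
print(centroids, clusters)
```
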
FIGURE 20.3 The advantages resulting from the application of machine learning in a
business organization.
• PayPal, a corporation that offers financial services, found that using the
H2O.ai service in the fraud detection domain enhanced the accuracy by
94.8% and cut the model training time to just under two hours.
• At the Canadian company Imagia, the use of Google Cloud AutoML in the
healthcare sector reduced test processing time from 16 hours to 1 hour
and improved diagnosis outcomes.
• In the media and entertainment industry, Meredith Company used Google
Cloud AutoML to classify content and increase customer knowledge about
emerging trends.
• Using DataRobot’s email marketing system, a Danish marketing firm, One
Marketing Ltd., reduced spam for clients, raised mail open and click rates
by 14% and 24%, and increased the ticket sales by 83%.
20.3.1 Google
Google has been at the forefront of machine learning use since its inception. It
offers a helpful and individualized experience to its consumers by using machine
learning algorithms. For instance, Google uses machine learning in its picture search
and translation capabilities, and to help stop unlawful work or commerce, such as
illicit fishing, by utilizing satellite data. Google services such as Gmail and
Search already incorporate machine learning [4].
20.3.1.1 Gmail
Gmail sorts incoming mail into categories such as primary, social, and promotional
by classifying each email. The classifier’s threshold is tuned using machine
learning, so when a user tags a message consistently, Gmail automatically makes
real-time adjustments to its threshold and applies what it has learned to
subsequent categorization.
20.3.1.3 OK Google
This is an intelligent personal assistant app that facilitates task completion, informa-
tion discovery, and reservation making. When it’s pouring outside, you can quickly
find nearby restaurants, purchase movie tickets on the go, and locate the theater that
is the closest to your location. It also aids in your navigation to the theater. In other
words, if you have a smartphone, you don’t need to worry about a thing because
Google handles everything.
20.3.2 IBM
IBM’s Watson is an artificial intelligence and machine learning platform designed
to improve both the intelligence of your company and the performance of your
employees. Watson offers a variety of cutting-edge application programming
interfaces (APIs), specialized tools, and software-as-a-service applications.
Watson was created with complex use cases and expert users in mind, so it connects
readily with the platforms they already use for their everyday work, which gives
easy access to the information needed to make the best decisions.
Your data, models, learning, and APIs, the cornerstone of your competitive edge,
remain completely under your control with Watson. Because of its tremendous
learning capacity, it can learn more with less. Watson can assist in making
judgments that help organizations generate more money by forecasting trends
using data.
20.3.3 NASA
Discovering new extraterrestrial objects increasingly relies on machine learning.
Machine learning is essential for finding patterns in the massive amount of data
produced by NASA satellites and spacecraft in order to make exciting future
discoveries. It can be used to intelligently oversee spacecraft repairs, find
undiscovered planets in other galaxies, and uncover other fascinating things.
Rank: A 1–10 scale representing the streaming time of each show for a given date.
Year to Date Rank: A 1–10 scale representing the overall rank relative to all
other shows that year (these rankings shift around quite a lot, as this is
recalculated daily).
Last Week Rank: A 1–10 scale showing the overall rank for the prior week.
Title: The name of the show or special in question.
Netflix Exclusive: Whether the show is a Netflix exclusive.
Netflix Release Date: The date on which the show debuted on Netflix.
Days in Top 10: The total number of days a show has appeared in the top 10 as of
the given date.
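The fields above lend themselves to simple aggregation, such as finding the shows with the most days in the top 10 or counting days at number 1. The sketch below uses a few made-up rows; the dictionary keys are hypothetical stand-ins for the dataset's column names, and the values are not real data.

```python
from collections import defaultdict

# Hypothetical rows: one row per show per date.
rows = [
    {"as_of": "2021-01-01", "rank": 1, "title": "Show A", "days_in_top_10": 5},
    {"as_of": "2021-01-01", "rank": 2, "title": "Show B", "days_in_top_10": 2},
    {"as_of": "2021-01-02", "rank": 1, "title": "Show B", "days_in_top_10": 3},
    {"as_of": "2021-01-02", "rank": 2, "title": "Show A", "days_in_top_10": 6},
]

days_in_top_10 = defaultdict(int)
days_at_number_1 = defaultdict(int)
for row in rows:
    title = row["title"]
    # "Days in Top 10" is cumulative, so the largest value is the aggregate.
    days_in_top_10[title] = max(days_in_top_10[title], row["days_in_top_10"])
    if row["rank"] == 1:
        days_at_number_1[title] += 1

# The 15 shows with the most aggregate days in the top 10.
most_frequent = sorted(days_in_top_10.items(), key=lambda item: item[1], reverse=True)[:15]
print(most_frequent, dict(days_at_number_1))
```
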
FIGURE 20.5 The 15 most frequent top shows with aggregate days.
FIGURE 20.6 Plotting of the shows/series as the days on number 1 and days in top 10 simultaneously.
Shazam created a model that examined song excerpts and assigned a “signature”
to each in order to address this problem. This functioned by building a spectrogram
for the track fragment and then searching for amplitude peaks. See Figure 20.7.
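The idea of deriving a signature from amplitude peaks can be sketched for a single analysis frame. This is not Shazam's actual algorithm; it is a simplified illustration that computes a magnitude spectrum with a naive discrete Fourier transform and keeps the frequency bins that stand out as local maxima. A spectrogram would repeat this over successive windows of the track. The synthetic fragment, the function names, and the threshold are assumptions.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Magnitudes of the first half of a naive discrete Fourier transform."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n // 2)
    ]


def amplitude_peaks(spectrum, min_height):
    """Indices of bins that exceed min_height and both of their neighbours."""
    return [
        k
        for k in range(1, len(spectrum) - 1)
        if spectrum[k] >= min_height
        and spectrum[k] > spectrum[k - 1]
        and spectrum[k] > spectrum[k + 1]
    ]


# Synthetic fragment: two tones that fall exactly on frequency bins 5 and 12.
n = 64
fragment = [
    math.sin(2 * math.pi * 5 * t / n) + 0.5 * math.sin(2 * math.pi * 12 * t / n)
    for t in range(n)
]
signature = amplitude_peaks(magnitude_spectrum(fragment), min_height=1.0)
print(signature)
```
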
The two “feature” elements for every track were then created from these track
signatures so that they could be represented graphically. This provided a heat map
of how each musical genre appeared visually after being divided into the human-
assigned genres, which could be used to cross-reference with some other tracks to
assign a genre automatically.
Even though the specifics are a little complicated, the final result is that Shazam
can automatically evaluate songs and assign tags and genres to them while maintain-
ing a high level of accuracy.
learning models speculate whether the market is going up or down in the next
moment. Here, we discuss a model which predicts the price of the stock based on the
fundamentals of the company by using the XGBoost algorithm. See Figure 20.8.
20.4 CONCLUSION
Machine learning is a technology that enables companies to efficiently derive insights
from unstructured data. Machine learning algorithms can be used to constantly learn
from a set of data to identify patterns and actions, among other things. The machine
learning approach is dynamic and constantly evolving, allowing businesses to stay
up to date on market and client expectations.
REFERENCES
[1] Osisanwo, F. Y., Akinsola, J. E. T., Awodele, O., Hinmikaiye, J. O., Olakanmi, O., &
Akinjobi, J. (2017). Supervised machine learning algorithms: Classification and comparison.
International Journal of Computer Trends and Technology (IJCTT), 48(3), 128–138.
[2] Evgeniou, T., & Pontil, M. (1999, July). Support vector machines: Theory and
applications. In Advanced Course on Artificial Intelligence (pp. 249–257). Springer.
[3] Mirtaheri, S. L., & Shahbazian, R. (2022). Machine Learning: Theory to Applications.
CRC Press.
[4] Ahmed, H., Jilani, T. A., Haider, W., Abbasi, M. A., Nand, S., & Kamran, S. (2017).
Establishing standard rules for choosing best KPIs for an e-commerce business based
on Google Analytics and machine learning technique. International Journal of Advanced
Computer Science and Applications, 8(5).
[5] Taecharungroj, V. (2021). Google Maps amenities and condominium prices:
Investigating the effects and relationships using machine learning. Habitat
International, 118, 102463.
[6] Ono, M., Rothrock, B., Otsu, K., Higa, S., Iwashita, Y., Didier, A., . . . Park, H. (2020,
March). MAARS: Machine learning-based analytics for automated rover systems.
In 2020 IEEE Aerospace Conference (pp. 1–17). IEEE.
[7] Hemachandran, K., Khanra, S., Rodriguez, R. V., & Jaramillo, J. (Eds.). (2022). Machine
Learning for Business Analytics: Real-Time Data Analysis for Decision-Making. CRC
Press, New York.
21 A Study on the Application of Natural Language Processing Used in
Business Analytics for Better Management Decisions
A Literature Review
Geetha Manoharan, Subhashini Durai, Gunaseelan Alex Rajesh,
and Sunitha Purushottam Ashtikar
21.1 INTRODUCTION
The field of artificial intelligence and data science known as natural language
processing (NLP) is expanding rapidly, and it makes use of sophisticated speech and
text processing tools. The goal of this line of research is to provide methods for the
automatic analysis and presentation of human language. Automatic summarization,
part-of-speech tagging, disambiguation, entity and relation extraction, sentiment
analysis, natural language understanding (NLU), and speech recognition are only some
of the methods used by NLP to make sense of ambiguities in human language. Numerous
NLP-related software tasks, such as morphological and syntactic analysis, have been
satisfactorily solved for online use, so they can be offered as software, just like
software as a service (SaaS) applications in cloud computing.
With the cloud computing approach, users can easily and quickly access a large pool
of flexible computing resources over the Internet with minimal overhead. The
technologies that form the basis of this paradigm include the Internet,
virtualization tools, grid computing, and web services. Cloud computing thus
combines SaaS and utility computing. The idea of cloud computing is to make
flexible, affordable, and reliable computing resources available on demand.
Cloud-based NLP analysis services are currently attracting considerable interest
from researchers and other users. They enable researchers to set up, share, and
utilize language processing tools and components in line with the SaaS and PaaS
(platform as a service) models. Reviews of cloud-based NLP services, however, are
surprisingly scarce. Some of the most well-known cloud-based NLP services and
application programming interfaces (APIs) are Amazon Comprehend, Microsoft Azure
Cognitive Services, Google Cloud Natural Language, and several third-party
alternatives. Amazon Comprehend, an AWS service, uses machine learning to identify
the language of a text and extract key phrases from it. Tokenization, sentiment
analysis, and automatic organization of text files are all features of Amazon
Comprehend, which integrates with any AWS-supported application.
Microsoft Azure Cognitive Services is a portfolio of NLP tools organized into
various more specialized services. For example, programmers can use the Azure Text
Analytics API to create tools that analyse the sentiment or determine the language
of a given text. The Azure Language Understanding Intelligent Service, on the other
hand, is capable of deciphering things like user intent, which is invaluable for
developers creating chatbots, voice-powered products, and customer care platforms.
Additionally, Google Cloud Natural Language can perform entity extraction,
sentiment analysis, syntax analysis, and classification. The difference between
this API and others is that it is powered by the same deep learning models that
power Google’s own language understanding and query comprehension systems for
Google Search and Google Assistant.
There are numerous third-party services and APIs for NLP. MonkeyLearn and other
suppliers offer services to automate procedures based on unstructured data, while
businesses like Diffbot provide features via a paid API that enable users to
precisely extract data from websites. In contrast to the reviews mentioned earlier,
we investigate the NLP-related methods and technologies that are currently in use,
as well as cloud-hosted NLP services, and we discuss NLP and big data methods and
technologies, including information extraction through NLP within big data. This
chapter’s primary goal is to present a review of the literature on various studies
carried out in different fields, in order to learn about and highlight the
essentials of the NLP services currently used in those fields for various purposes.
interface. Users who lack the time or motivation to learn the machine’s language
will benefit from NLP because it facilitates communication. One way to define
a language is as a system of rules, while another way is to view it as a system of
symbols. When two or more symbols are combined, the result is a transmission or
broadcast of information.
NLP consists of NLU and natural language generation (NLG), which are concerned
with reading and writing text, respectively. NLU and NLG approaches and
developments are related to big data. A few common NLP techniques include
probabilistic context-free grammar, part-of-speech tagging, word sense
disambiguation, and lexical acquisition. Text mining techniques based on NLP
include information extraction, topic modelling, text summarization,
classification, clustering, question answering, and opinion mining, to name just a
few. In linguistics, “annotation” most often refers to metadata that characterizes
words, sentences, and so on. Automatically assigning descriptors to input tokens
is known as “tagging” in the annotation process. Tagging words as nouns, verbs,
and adverbs, among other categories, is called part-of-speech tagging. This method
maps a natural language like English onto a meta-language of word categories. To
begin any NLP task, segmentation must be performed. The unprocessed form of
electronic text is merely a string of characters, so it needs to be broken down
into smaller linguistic units. These include, among others, words, punctuation
marks, numbers, alphanumeric strings, and other symbols.
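The segmentation and tagging steps described above can be sketched in a few lines of Python. This is a toy illustration, not a production tagger: the regular expression, the tiny lexicon, and the tag names (DET, NOUN, and so on) are assumptions, and real systems learn tags from annotated corpora.

```python
import re

# Words and numbers form one kind of token; every other non-space
# character (punctuation, symbols) is a token on its own.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

# Toy lexicon mapping a few lower-cased words to part-of-speech tags.
TOY_LEXICON = {"the": "DET", "cat": "NOUN", "sat": "VERB", "quietly": "ADV"}


def segment(text):
    """Break a raw character string into linguistic units (tokens)."""
    return TOKEN_PATTERN.findall(text)


def tag(tokens):
    """Assign a descriptor to every input token."""
    tagged = []
    for token in tokens:
        if token.isdigit():
            label = "NUM"
        elif not token.isalnum():
            label = "PUNCT"
        else:
            label = TOY_LEXICON.get(token.lower(), "UNK")
        tagged.append((token, label))
    return tagged


tokens = segment("The cat sat quietly, 3 times.")
tagged = tag(tokens)
print(tagged)
```
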
Tokenization is another name for this process. Linguistic entities (tokens)
must be assigned to classes for NLP tasks.
Keyword spotting, lexical affinity, and statistical techniques all fall under the
category of syntax-centred NLP. The most simplistic strategy is keyword spotting,
which is also likely the most popular due to its availability and affordability.
In comparison to keyword spotting, lexical affinity is a little more sophisticated.
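Keyword spotting can be sketched as a lookup of words against fixed vocabularies. The word lists and the function name spot_keywords below are hypothetical. The second call shows why the approach is considered simplistic: a negated phrase is still matched as positive.

```python
import re

# Hypothetical keyword lists for two classes.
KEYWORDS = {
    "positive": {"good", "great", "excellent", "happy"},
    "negative": {"bad", "poor", "terrible", "unhappy"},
}


def spot_keywords(text):
    """Return, per class, the keywords that literally occur in the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    hits = {label: sorted(words & vocabulary) for label, vocabulary in KEYWORDS.items()}
    return {label: found for label, found in hits.items() if found}


clear_case = spot_keywords("The service was great and the staff were happy.")
negated_case = spot_keywords("This is not good.")
print(clear_case, negated_case)
```
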
Since the late 1990s, statistical NLP has been the primary focus of researchers in
the field. It relies on language models that employ well-known machine learning
approaches such as support vector machines, expectation maximization, conditional
random fields, and maximum likelihood. Statistical methods are often semantically
weak.
Web data analysis automatically downloads, extracts, and assesses data from web
documents and services in order to locate pertinent information. The fields of
database management, information retrieval, NLP, and text mining all have
connections to web analysis. The study of the Web’s data can be broken down into
three distinct areas: content mining, structure mining, and usage mining. Text,
photos, audio, video, code, metadata, and hyperlinks are just some of the many data
sources that can be mined in web content mining. Web structure mining involves
models for understanding the architecture of hyperlinks on the Web. However,
recovery is a digital problem that needs fixing quickly.
Sentiment analysis of data gleaned from social networking sites like Twitter calls
for extensive data pre-processing. Because of the data’s enormous volume, high
degree of unstructuredness, and incredible rate of production, parallel
implementations of pre-processing algorithms are required. Pre-processing
techniques include frequency mapping, removing unnecessary words or symbols, and
converting a string to a vector.
B. Bharathi and Josephine Varsha (2022) compared seven transformer model variations
on five difficult NLP tasks and seven datasets. They designed experiments to focus
on the models’ capacity for long-range attention while isolating the effects of
pretraining and hyperparameter settings. They also present various approaches for
analysing attention behaviours to shed light on model details beyond metric scores.
They found that long-range transformers’ attention has benefits for content
selection and query-guided decoding, but it also has previously unrecognised
drawbacks, such as failing to pay enough attention to tokens that are far away.
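The pre-processing techniques named above (removing unnecessary words or symbols, frequency mapping, and converting a string to a vector) can be sketched with the standard library. The stop-word list, the sample posts, and the function names are illustrative assumptions; the sketch is sequential, whereas the text notes that real social media volumes call for parallel implementations.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "is", "to", "and", "of"}  # tiny illustrative list


def clean(text):
    """Lower-case, drop URLs, mentions, hashtags, symbols, and stop words."""
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text.lower())
    return [word for word in re.findall(r"[a-z]+", text) if word not in STOP_WORDS]


def frequency_map(tokens):
    """Map each remaining word to the number of times it occurs."""
    return Counter(tokens)


def to_vector(tokens, vocabulary):
    """Convert a token list to a count vector over a fixed vocabulary."""
    counts = frequency_map(tokens)
    return [counts[word] for word in vocabulary]


posts = [
    "The battery is great, great value! #happy",
    "Battery died... http://example.com @support",
]
token_lists = [clean(post) for post in posts]
vocabulary = sorted({word for tokens in token_lists for word in tokens})
vectors = [to_vector(tokens, vocabulary) for tokens in token_lists]
print(vocabulary, vectors)
```
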
the enormous amount of energy that is increasingly being used for training and running
computational models, the climate impact of artificial intelligence, and of NLP
research in particular, has become a serious issue. As a result, efficient NLP is
gaining popularity. This significant initiative does not, however, have clear
guidelines that would enable systematic climate reporting on NLP research. They
contend that this shortcoming is one of the reasons why few NLP publications provide
the key data that would enable a more in-depth analysis of environmental impact. As
a remedy, they suggest a climate performance model card whose main goal is to be
practically applicable with little knowledge of the experiments and underlying
computer hardware. They also explain why taking this action will help raise
awareness of how NLP research affects the environment and open the door to more
in-depth discussions.
Balkir et al. (2022) and Belz (2022) say that methods in explainable artificial
intelligence (XAI) are frequently driven by the desire to identify, measure, and
mitigate bias as well as to improve the fairness of machine learning models.
However, it is frequently not made clear how an XAI method can aid in overcoming
biases. They discuss the ways explainability methods are now used to detect and
reduce bias, and they briefly review current explainability and fairness research
trends. They also highlight the barriers that limit the use of XAI techniques to
address fairness issues.
analytical approach of machine learning allows for rapid evaluations and decisions
to be made regardless of the dataset being used.
critical look at current approaches and made recommendations for how to use NLP
to enhance content-based recommenders.
Development of software requires the use of the problem-solving method of
decomposition. However, it is regarded as the hardest programming ability for begin-
ners to learn. Researchers studied decomposition in basic programming classes using
case studies, surveys, and guided experiments. The exponential development of fields
like machine learning and NLP has undoubtedly paved the way for more scalable
solutions.
According to Nikita Klyuchnikov et al. (2022), neural architecture search (NAS)
is a rapidly evolving field of study with promising potential. However, because it
requires substantial computational resources for training numerous neural
networks, NAS is impractical for researchers who have limited or no access to
high-performance clusters and supercomputers. As a way around this problem and to
guarantee reproducible experiments, a handful of benchmarks with pre-computed
performances of neural architectures have recently been introduced. While useful
for computer vision applications, these benchmarks were developed using only image
datasets and convolution-derived architectures. The authors instead use the
language modelling task, which is fundamental to NLP, and make several
contributions: the creation of a search space of recurrent neural networks trained
on text datasets, the intrinsic and extrinsic evaluation of the trained models
using semantic relatedness and language understanding benchmarks, and the testing
of multiple NAS algorithms to show how the pre-computed results can be used. It is
believed that the benchmark will aid in the development of new NAS techniques that
are well suited for recurrent architectures and give the community access to more
trustworthy empirical findings.
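The value of a benchmark with pre-computed performances is that a search algorithm can look up a score instead of training each candidate network. The sketch below illustrates this with random search over a made-up table; the architectures, the perplexity values, and the function name random_search are hypothetical and are not taken from NAS-Bench-NLP.

```python
import random

# Hypothetical pre-computed benchmark:
# (cell type, hidden size) -> validation perplexity (lower is better).
BENCHMARK = {
    ("rnn", 64): 152.0,
    ("rnn", 128): 141.5,
    ("lstm", 64): 118.3,
    ("lstm", 128): 104.9,
    ("gru", 64): 121.7,
    ("gru", 128): 109.2,
}


def random_search(benchmark, budget, seed=0):
    """Sample `budget` architectures and return the best one found.

    Each evaluation is a table lookup, so no network is trained.
    """
    rng = random.Random(seed)
    candidates = rng.sample(sorted(benchmark), k=budget)
    return min(candidates, key=benchmark.get)


best_small_budget = random_search(BENCHMARK, budget=3)
best_full_budget = random_search(BENCHMARK, budget=len(BENCHMARK))
print(best_small_budget, best_full_budget)
```

Because every evaluation is a lookup, different search algorithms can be compared under the same budget on an ordinary laptop.
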
The essential components of this method are organizing the text, extracting aspect-
based sentiment scores for each text item, and applying an opinion mining algorithm.
The sentiment analysis data table is subjected to an action rule mining method. The
strategy’s proposed application is to the challenge of increasing customer satisfac-
tion ratings. The dataset of consumer reviews of repair services was extensively and
appropriately assessed. The research findings were also used to help create a user-
friendly recommendation system for the Web that may advise companies on how to
improve their services in order to increase their profitability.
21.13 CONCLUSION
The analysis of Scopus-listed journal articles that employ NLP as their primary
analytic method shows how textual data may be used to advance management ideas in
many fields. The discussion began with the introduction of NLP as an analytical
technique, along with the necessary toolkits and procedures, as well as its
advantages and disadvantages. This study draws attention to the technological and
managerial limitations associated with the application of NLP in the field of
management research; this will help direct future studies. Furthermore, the study
discusses the use of NLP in the field of business analytics, an emerging trend in
global business management. Using NLP improves the performance of business
analytics. Like business analytics, big data analytics is also a topical subject to
be learned and adopted in various fields of analytics. The study discusses how
information is extracted and the process involved in using NLP in big data
analytics. The study also discusses the use of NLP in various fields such as
education, linguistic analysis, machine learning, cyber security, business, social
media, and medicine. Thus, this study helps readers learn about NLP, its
application techniques, and its current use in various fields.
REFERENCES
Balkir, E., Kiritchenko, S., Nejadgholi, I., & Fraser, K. C. (2022). Challenges in Applying
Explainability Methods to Improve the Fairness of NLP Models.
http://arxiv.org/abs/2206.03945.
Belz, A. (2022). A Metrological Perspective on Reproducibility in NLP. Computational
Linguistics, 1–11. https://doi.org/10.1162/coli_a_00448
Berbatova, M. (2019). Overview on NLP Techniques for Content-Based Recommender
Systems for Books. Proceedings of the Student Research Workshop Associated with
RANLP, 55–61. https://doi.org/10.26615/issn.2603-2821.2019_009
Chen, J., Tam, D., Raffel, C., Bansal, M., & Yang, D. (2021). An Empirical Survey of Data
Augmentation for Limited Data Learning in NLP. http://arxiv.org/abs/2106.07499.
Cheng, L., Ge, S., & Liu, H. (2022). Toward Understanding Bias Correlations for Mitigation
in NLP. http://arxiv.org/abs/2205.12391.
Gardner, M., Artzi, Y., Basmova, V., Berant, J., Bogin, B., Chen, S., Dasigi, P., Dua, D., Elazar,
Y., Gottumukkala, A., Gupta, N., Hajishirzi, H., Ilharco, G., Khashabi, D., Lin, K., Liu,
J., Liu, N. F., Mulcaire, P., Ning, Q., . . . Zhou, B. (2020). Evaluating Models’ Local
Decision Boundaries via Contrast Sets. http://arxiv.org/abs/2004.02709.
Garrido-Muñoz, I., Montejo-Ráez, A., Martínez-Santiago, F., & Alfonso Ureña-López, L.
(2021). A Survey on Bias in Deep NLP. https://doi.org/10.20944/preprints202103.0049.v1
Haque, R., Islam, N., Islam, M., & Ahsan, M. M. (2022). A Comparative Analysis on Suicidal
Ideation Detection Using NLP, Machine, and Deep Learning. Technologies, 10(3), 57.
https://doi.org/10.3390/technologies10030057.
Hershcovich, D., Frank, S., Lent, H., de Lhoneux, M., Abdou, M., Brandl, S., Bugliarello,
E., Piqueras, L. C., Chalkidis, I., Cui, R., Fierro, C., Margatina, K., Rust, P., &
Søgaard, A. (2022). Challenges and Strategies in Cross-Cultural NLP.
http://arxiv.org/abs/2203.10020.
Hershcovich, D., Webersinke, N., Kraus, M., Bingler, J. A., & Leippold, M. (2022). Towards
Climate Awareness in NLP Research. http://arxiv.org/abs/2205.05071.
Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., de Laroussilhe, Q., Gesmundo, A.,
Attariyan, M., & Gelly, S. (2019). Parameter-Efficient Transfer Learning for NLP.
http://arxiv.org/abs/1902.00751.
Hsu, E., Malagaris, I., Kuo, Y.-F., Sultana, R., & Roberts, K. (2022). Deep Learning-Based
NLP Data Pipeline for EHR-Scanned Document Information Extraction. JAMIA Open,
5(2). https://doi.org/10.1093/jamiaopen/ooac045.
Klyuchnikov, N., Trofimov, I., Artemova, E., Salnikov, M., Fedorov, M., Filippov, A., &
Burnaev, E. (2022). NAS-Bench-NLP: Neural Architecture Search Benchmark for
Natural Language Processing. IEEE Access, 10, 45736–45747.
https://doi.org/10.1109/ACCESS.2022.3169897.
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M.,
Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-Augmented
Generation for Knowledge-Intensive NLP Tasks. http://arxiv.org/abs/2005.11401.
Lorenzini, J., Kriesi, H., Makarov, P., & Wüest, B. (2021). Protest Event Analysis:
Developing a Semiautomated NLP Approach. American Behavioral Scientist.
https://doi.org/10.1177/00027642211021650
A Study on the Application of Natural Language Processing 261
Malamas, N., Papangelou, K., & Symeonidis, A. L. (2022). Upon Improving the Performance
of Localized Healthcare Virtual Assistants. Healthcare (Switzer land), 10(1). https://doi.
org/10.3390/healthcare10010099
Pais, S., Cordeiro, J., & Jamil, M. L. (2022). NLP-Based Platform as a Service: A Brief
Review. Journal of Bi g Data, 9(1). https://doi.org/10.1186/s40537-022-00603-5.
Rau, D., & Kamps, J. (2022). The Role of Complex NLP in Transformers for Text Ranking?
https://doi.org/10.1145/3539813.3545144
Varsha, J., & Bharathi, B. (2022). Proceedings of the Second Workshop on Speech and
Language Technologies for Dravidian Languages, pages 158-164 SSNCSE NLP@
TamilNLP-ACL2022: Transformer based approach for detection of abusive comment
for Tamil language.
Voa, N. N. Y., Vu, Q. T., Vu, N. H., Vu, T. A., Mach, B. D., & Xu, G. (2022). Domain-
Specific NLP System to Support Learning Path and Curriculum Design at Tech Uni-
versities. Computers and Education: Artificial Int elligence, 3. https://doi.org/10.1016/j.
caeai.2021.100042.
Wang, L., Shen, Y., Peng, S., Zhang, S., Xiao, X., Liu, H., Tang, H., Chen, Y., Wu, H., &
Wang, H. (2022). A Fine-grained Interpretability Evaluation Benchmark fo r Neural
NLP. http://arxiv.org/abs/2205.11097.
22 Detection of Polarity in the Native-Language Comments of Social Media Networks
Sudeshna Sani, Dipra Mitra, and Soumen Mondal
22.1 INTRODUCTION
The quantity of information stored on the Internet grows by the day in our digital
age, and a massive amount of data has accumulated to date. It is no longer possible
to study manually the patterns and behaviors in this massive amount of data from
diverse types of people. This data, however, is public and contains extremely
useful information on the attitudes of many people from many categories all across
the world. As a result, it has become critical to use automated systems to summarize
this massive volume of data.
The analysis of such a vast population’s thoughts has become increasingly
challenging, demanding the application of new approaches. Several studies on English
sentiment analysis have been undertaken, and many of these studies have produced
remarkable results. However, because Bengali-language sentiment analysis has
received so little attention, there are many research prospects in this area.
This chapter shows that words’ vector representations and sentiment information
can be employed jointly in the analysis of Bengali comments for sentiment. Each
comment is rated as good or bad based on the opinion of the person who posted it
in a Bengali microblogging service. A collection of single-line and multiline
comments expressing people’s viewpoints was collected through surveys. We observe
that the categorization of sentiment is influenced by the sentiment information in
the comments as well as by the context of the remarks. A new method for combining
these two forms of data has been developed, and it has produced impressive results.
studies. Paul Lewis et al. [1] and Cui et al. [2] focused on reviews of products
found online. They separated the input into two groups, positive and negative, and
examined over 100,000 product reviews from a variety of sources. Jagtap et al.
used the support vector machine (SVM) and hidden Markov model (HMM) [3].
Alm and colleagues used a mixed classification algorithm to extract the teacher
evaluation emotion, and it worked effectively [4]. Emotions were classified as
positive, negative, or neutral, so phrases were split into three polarity categories.
Using the Winnow parameter-tuning approach, they attained 63 percent accuracy.
Agarwal et al. [5] used a unigram model as a baseline. To extract Twitter sentiments,
they used a tree and feature-based model, which outperformed the unigram model.
They obtained a 61 percent accuracy rate. Zou et al. [6] proposed a method for
learning word embeddings from a multilingual unlabeled dataset; in terms of
semantic similarity, their model outperformed baselines. Turian et al. [7]
investigated Brown clusters, the embeddings of Collobert and Weston (2008), and
hierarchical log-bilinear embeddings. Chen et al. [8] provided a few methods for
distinguishing publicly released word embedding models. They demonstrated that,
even without access to structure, embeddings may capture surprising semantics in
texts. Tang et al. [9] proposed sentiment-specific word embedding, a technique for
obtaining both contextual and sentiment information about words. They used their
model to extract sentiments from Twitter and achieved an accuracy of roughly
83 percent.
Omer Levy and Yoav Goldberg [10, 11] (2014) generalized the skip-gram model with
negative sampling of Mikolov et al. (2013) for word representation. They extracted
dependency-based contexts and demonstrated that these produce different kinds of
similarities. According to Andreas and Klein [12], word embeddings have three
potential benefits: vocabulary extension, statistical sharing, and embedding
structure. Lebret and Collobert [13] applied Hellinger PCA to a word co-occurrence
matrix to determine how words are represented in context; an accuracy of roughly
89 percent was reported. To evaluate word embedding models, Levy et al. [14] used
a neural network and word embedding models. A few studies address Bengali. To
determine the sentiment of Bengali microblog posts, Chowdhury and Chowdhury [15]
used SVM and maximum entropy (MaxEnt). They tried combining the two strategies
with different types of features. Hasan et al. [16] described contextual valency
analysis as a tool for detecting sentiment in Bengali text. They computed total
positivity, total negativity, and total neutrality, all of which can be determined
with the help of a part-of-speech (POS) tagger, and then calculated the final
results. Das [17] proposed a computational approach for detecting emotions in
Bengali and English texts. He categorizes emotions into six groups: happiness,
sadness, anger, disgust, fear, and surprise. Hasan et al. [18] introduced an
emotion analyzer that can determine people’s feelings. Positive and negative
sentiment phrase patterns, as well as sentiment orientations, provide the sentiment
information. Islam et al. [19] detected the sentiment of Facebook statuses in
Bengali; they employed a naive Bayes model with bigrams and achieved an F-score of
0.72. We detect the sentiment of Bengali comments using Hellinger PCA and word
embedding [11]. A word co-occurrence matrix is generated using skip-gram to
establish the context. Sliding windows are used to capture the pertinent terms
within each window, and information from the comments is collected.
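The sliding-window idea can be illustrated with a minimal sketch. It assumes the comments are already tokenized; the function name `cooccurrence_counts`, the window size, and the transliterated toy tokens are illustrative assumptions, not the chapter's actual implementation or data.

```python
from collections import defaultdict

def cooccurrence_counts(sentences, window=2):
    """Count how often each word pair appears within `window` tokens of each other."""
    counts = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # The window covers `window` tokens on each side of position i.
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts

# Toy, transliterated tokens used purely for illustration (hypothetical data).
comments = [
    ["khub", "bhalo", "cinema"],
    ["khub", "kharap", "cinema"],
]
counts = cooccurrence_counts(comments, window=1)
```

Each row of `counts` is the raw context profile of one word; a dimensionality reduction step such as Hellinger PCA would operate on a matrix built from these rows.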
22.3 METHODOLOGY
The Subjectivity Word List and SentiWordNet (Esuli et al., 2006; Wilson et al.,
2005) are two extensively used English lexical resources for subjectivity detection.
SentiWordNet is an automatically generated English lexical database that gives each
WordNet synset a positivity and a negativity score ranging from 0 to 1. SentiWordNet
1.1 for English was released and the same authors have offered English translations.
The subjectivity lexicon was compiled from hand-crafted resources and entries
extracted from corpora. Its entries are categorized by reliability as either
strongly subjective or weakly subjective.
POS Tagger: A POS tagger reads text in a given language and assigns a part of
speech, such as noun, verb, or adjective, to each word (and other token); however,
most computational applications utilize finer-grained POS tags like ‘noun-plural.’
Kristina Toutanova created the original tagger. Since then, Dan Klein, Christopher
Manning, William Morgan, Anna Rafferty, Michel Galley, and John Bauer have worked
to improve its speed, performance, usability, and language support.
To tag our sentences, we use Stanford’s POS tagger. We then keep only those parts
of speech that can influence the polarity of a sentence, that is, the polar terms.
Table 22.1 lists the abbreviations for several parts of speech.
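A minimal sketch of this filtering step follows. It assumes the sentence has already been tagged (the tagged input here is written by hand, not produced by the Stanford tagger), and the names `PENN_TO_SWN` and `polar_candidates` are hypothetical.

```python
# Tags from Table 22.1 mapped to the SentiWordNet part-of-speech letters.
PENN_TO_SWN = {"NN": "n", "NNS": "n", "JJ": "a", "JJS": "a", "VB": "v", "RB": "r"}

def polar_candidates(tagged_tokens):
    """Keep only (word, swn_pos) pairs whose tag is listed in Table 22.1."""
    return [(word, PENN_TO_SWN[tag]) for word, tag in tagged_tokens if tag in PENN_TO_SWN]

# Hand-tagged English example used only to show the data flow.
tagged = [("the", "DT"), ("movie", "NN"), ("is", "VBZ"), ("very", "RB"), ("good", "JJ")]
selected = polar_candidates(tagged)
```

The surviving pairs are the candidates whose polarity would then be looked up in the lexicon.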
TABLE 22.1
POS Types
POS name                  POS abbreviation   SentiWordNet abbreviation
Noun                      NN                 n
Adjective                 JJ                 a
Verb                      VB                 v
Adverb                    RB                 r
Noun (plural)             NNS                n
Adjective (superlative)   JJS                a
TABLE 22.2
Statistics on Bengali Corpus
                                                  NEWS     BLOG
Number of documents in total                      200      –
Number of sentences in total                      4434     500
Average number of sentences in each paragraph     50       –
Number of different word forms                    45,807   8675
Average number of word forms per document         488      –
Total number of unique word forms                 35,176   2478
Similar words tend to occur in similar contexts. WORD2VEC [20] converts each word
into a vector representation, and similar words cluster together in the WORD2VEC
model’s vector space. WORD2VEC preserves the words’ syntactic relationships and
groups words by syntactic similarity. Because of their similar syntactic behavior,
equivalent words stay close together in the WORD2VEC vector space, but words of
opposite sentiment polarity may also stay close together, resulting in poor
sentiment classification results.
As a result, in sentiment analysis, the polarity score of each word is crucial.
Polarity detection proceeded in two steps: we used WORD2VEC word embedding to
gather contextually related terms and SentiWordNet to gather lexically similar
words. We created a novel strategy that combines WORD2VEC’s similarity scores for
co-occurring words with the sentiment polarity score of each word in the query
comment, which overcomes this disadvantage of WORD2VEC. We used corpora from two
separate domains, namely NEWS and BLOG, to assess subjectivity. Although sentiment
lexicons are often domain agnostic, they are a useful place to start; the
literature offers adaptation and fine-tuning approaches for further domains.
SentiWordNet (Bengali) is used in a subjectivity classifier that evaluates its
coverage using a modest number of rules. Table 22.2 shows the size of both the
corpus and the sample.
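One way to picture such a combination is a similarity-weighted average of lexicon scores. The chapter does not give its exact formula, so the sketch below is an assumption throughout: the function `weighted_polarity`, the two-dimensional toy vectors, and the lexicon scores are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def weighted_polarity(word, vectors, lexicon):
    """Average the lexicon polarity of other words, weighted by similarity to `word`."""
    num = den = 0.0
    for other, score in lexicon.items():
        if other == word or other not in vectors:
            continue
        sim = max(0.0, cosine(vectors[word], vectors[other]))  # ignore negative similarity
        num += sim * score
        den += sim
    return num / den if den else 0.0

# Hypothetical embedding vectors and polarity scores, for illustration only.
vectors = {"good": [1.0, 0.1], "great": [0.9, 0.2], "bad": [0.1, 1.0], "fine": [0.8, 0.3]}
lexicon = {"good": 0.7, "great": 0.8, "bad": -0.7}
score = weighted_polarity("fine", vectors, lexicon)
```

In this toy setting, "fine" has no lexicon entry of its own but inherits a positive score because its nearest neighbors are positive words.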
Then we present our technique for determining the overall polarity of the
sentence, which takes into account terms that can strengthen or weaken the
polarity of the related word.
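A minimal sketch of how such polarity-modifying terms might be applied is given below. The shifter lists, their weights, the treatment of negation, and the function `sentence_polarity` are assumptions for illustration; the chapter does not enumerate its own shifters, and English tokens stand in for Bengali ones.

```python
# Hypothetical shifter lists; a real system would use Bengali terms.
NEGATORS = {"not", "never"}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}

def sentence_polarity(tokens, lexicon):
    """Sum word polarities, letting each shifter modify the next polar word."""
    total, factor = 0.0, 1.0
    for tok in tokens:
        if tok in NEGATORS:
            factor *= -1.0               # flip the sign of the next polar word
        elif tok in INTENSIFIERS:
            factor *= INTENSIFIERS[tok]  # scale the next polar word
        elif tok in lexicon:
            total += factor * lexicon[tok]
            factor = 1.0                 # a shifter applies to one polar word only
    return total

lexicon = {"good": 0.6, "bad": -0.6}
plain = sentence_polarity(["good"], lexicon)
negated = sentence_polarity(["not", "good"], lexicon)
boosted = sentence_polarity(["very", "good"], lexicon)
```

The design choice here is that a shifter stays pending until the next polar word is reached, so intervening neutral words do not cancel it.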
For comparison with SentiWordNet (English), the same subjectivity detection
methodology was applied to the English IMDB Movie Review and Multi-Perspective
Question Answering (MPQA, NEWS) corpora. According to the results of the
subjectivity classifier on both corpora, SentiWordNet (Bengali) has considerable
coverage. The subjectivity word list used in the subjectivity detection method was
developed using the same IMDB corpus as in this study. SentiWordNet (Bengali), on
the other hand, is corpus-independent and has excellent coverage.
The goal of this test is to determine how reliable the polarity scores of the
sentiment lexicon are. A frequent approach to sentiment analysis is to begin with
a lexicon of positive and negative words and phrases. In such lexicons, entries
are labeled with their prior, out-of-context polarity. In what ways can the
present SentiWordNet (Bengali), a prior polarity lexicon, help with text polarity
identification? To test the reliability of the SentiWordNet (Bengali) polarity
scores, a polarity classifier was developed using SentiWordNet (Bengali) and
various other linguistic features. According to the feature ablation approach, the
produced SentiWordNet (Bengali) is reliable in terms of its associated scores. The
results of the SentiWordNet-based polarity classifier are shown in Table 22.3.
The SentiWordNet (Bengali) polarity scores therefore deserve to be taken seriously.
Unfortunately, to our knowledge no paper in the literature discusses the polarity
classification accuracy obtained using a prior polarity lexicon alone. A comparable
investigation will be necessary in the future, but our findings suggest that
SentiWordNet (Bengali) could provide a solid foundation (approximately 50 percent
accuracy).
TABLE 22.3
Using SentiWordNet for Polarity-Wise Performance (Bengali)
Polarity   Precision   Recall
Positive   66.59%      62.89%
Negative   85.57%      75.87%
negative remarks. Because the positive and negative training datasets are founded
on the views of a variety of persons, the training datasets have a high level of clarity
and accurately reflect the actual situation. Though this sort of labelling reflects the
actual situation, uncertainty may develop due to the variety of tags. This ambiguity
can be eliminated by considering the opinions of a vast number of people, which we
have done.
TABLE 22.4
Average Classification Accuracy at Each Step for Bengali Corpus
Step of Execution   Data                 Accuracy Obtained in Terms of TPR and TNR (%)
1                   4000                 90%
2                   8000                 91%
3                   12,000               91.5%
4                   16,000               92.5%
5                   20,000               93%
Average             20,000 sample data   93.20%
TABLE 22.5
Precision and Recall for Highest Number of Dataset
Language   Domain    Precision   Recall
English    MPQA      86.08%      93.33%
English    IMDB      89.90%      96.55%
English    Average   87.99%      94.94%
Bengali    NEWS      82.16%      96.00%
Bengali    BLOG      84.6%       90.4%
Bengali    Average   83.38%      93.20%
TABLE 22.6
Confusion Matrix Bengali Blog
                           Model Predicted as Positive   Model Predicted as Negative
Really Positive Comments   48.4%                         4.6%
Really Negative Comments   5%                            42%
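For readers who want to relate a confusion matrix to summary measures, the short sketch below applies the textbook definitions of precision, recall (true positive rate), true negative rate, and accuracy to the four cells of Table 22.6. The function name `confusion_metrics` is hypothetical, and the chapter does not state how its own reported figures were computed or averaged, so the values need not match those reported elsewhere.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard measures computed from the four cells of a binary confusion matrix."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # also called the true positive rate (TPR)
        "tnr": tn / (tn + fp),           # true negative rate
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }

# Cell values taken from Table 22.6, expressed as percentages of all comments.
blog = confusion_metrics(tp=48.4, fn=4.6, fp=5.0, tn=42.0)
```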
22.7 CONCLUSION
The way words are represented in a sentence can influence their properties, and
the context determines the meaning of phrases. Word embedding captures both the
context and the properties of words across many sentences. Other statistical
techniques for Bengali sentiment analysis rely heavily on sentence structure.
With word embedding, however, sentiment is determined from the surrounding context
of the words, independently of sentence patterns. Because it is a novel technique
for such analysis, we applied word embedding to our own recently collected set of
Bengali comments, articles, and blogs. A collection of strongly positive and
strongly negative terms, along with their polarity scores, is created. Accuracy
increases when these results are combined with valence shifters such as
neutralizing words.
The result is 93.20 percent, which is highly encouraging and important for future
research and tuning for native languages. Because the accuracy produced by our
model rises with the size of the dataset, we are optimistic that, if a
gold-standard dataset can be produced, this strategy can be applied to native
languages for which a sufficient dataset of comments is not yet available. The
accuracy graph shows that accuracy increases as the dataset grows larger; we
therefore aim to enhance the outcomes further.
REFERENCES
[1] M. Paul Lewis, Gary F. Simons and Charles D. Fennig (eds.), Ethnologue: Languages of
the World, Nineteenth edition. Dallas, Texas: SIL International, 2016.
[2] Hang Cui, Vibhu Mittal and Mayur Datar, “Comparative Experiments on Sentiment
Classification for Online Product Reviews,” Proceedings of the 21st National
Conference on Artificial Intelligence, AAAI, Boston, MA, 2006.
[3] Balaji Jagtap and Virendrakumar Dhotre, “SVM and HMM Based Hybrid Approach
of Sentiment Analysis for Teacher Feedback Assessment,” International Journal of
Emerging Trends & Technology in Computer Science (IJETTCS), Volume 3, Issue 3,
May–June 2014.
[4] C. Alm, D. Roth and R. Sproat, “Emotions from Text: Machine Learning for
Text-Based Emotion Prediction,” Proceedings of Human Language Technology Conference
and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP),
ACM, pages 579–586, 2005.
[5] Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow and Rebecca Passonneau,
“Sentiment Analysis of Twitter Data,” LSM’11 Proceedings of the Workshop on Languages
in Social Media, pages 30–38, 2011.
[6] Will Y. Zou, Richard Socher, Daniel Cer and Christopher D. Manning, “Bilingual Word
Embeddings for Phrase-Based Machine Translation,” Proceedings of the Conference on
Empirical Methods in Natural Language Processing, pages 1393–1398, 2013.
[7] Joseph Turian, Lev Ratinov and Yoshua Bengio, “Word Representations: A Simple and
General Method for Semi-Supervised Learning,” Proceedings of the 48th Annual
Meeting of the Association for Computational Linguistics, pages 384–394, Uppsala,
Sweden, 11–16 July 2010.
[8] Yanqing Chen, Bryan Perozzi, Rami Al-Rfou and Steven Skiena, “The Expressive
Power of Word Embeddings,” ICML 2013 Workshop on Deep Learning for Audio,
Speech, and Language Processing, Atlanta, USA, June 2013.
[9] Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu and Bing Qin, “Learning
Sentiment-Specific Word Embedding for Twitter Sentiment Classification,” Proceedings
of the 52nd Annual Meeting of the Association for Computational Linguistics, pages
1555–1565, Baltimore, Maryland, USA, June 23–25, 2014.
[10] Omer Levy and Yoav Goldberg, “Dependency-Based Word Embeddings,” Proceedings
of the 52nd Annual Meeting of the Association for Computational Linguistics (Short
Papers), pages 302–308, Baltimore, Maryland, USA, June 23–25, 2014.
[11] Md. Saiful Islam, Md. Al-Amin and Shapan Das Uzzal, “Word Embedding with
Hellinger PCA to Detect the Sentiment of Bengali Text,” The 19th International
Conference on Computer and Information Technology (ICCIT-2016), December 18–20, North
South University, Dhaka, 2016.
[12] Jacob Andreas and Dan Klein, “How Much Do Word Embeddings Encode About
Syntax?” Proceedings of the 52nd Annual Meeting of the Association for Computational
Linguistics (Volume 2: Short Papers), June 2014.
[13] Remi Lebret and Ronan Collobert, “Word Embeddings through Hellinger PCA,” Idiap
Research Institute, Martigny, Switzerland, arXiv preprint arXiv:1312.5542, 2013.
[14] Omer Levy, Yoav Goldberg and Ido Dagan, “Improving Distributional Similarity with
Lessons Learned from Word Embeddings,” Transactions of the Association for
Computational Linguistics, Volume 3, pages 211–225, 2015.
[15] Shaika Chowdhury and Wasifa Chowdhury, “Sentiment Analysis for Bengali Microblog
Posts,” International Conference on Informatics, Electronics & Vision (ICIEV), 2014.
[16] K. M. Azharul Hasan, Mosiur Rahman and Badiuzzaman, “Sentiment Detection from
Bengali Text using Contextual Valency Analysis,” 17th Int’l Conf. on Computer and
Information Technology, Daffodil International University, Dhaka, Bangladesh, 22–23
December 2014.
[17] Dipankar Das, “Analysis and Tracking of Emotions in English and Bengali Texts:
A Computational Approach,” Proceedings of the 20th International Conference on
World Wide Web, WWW 2011, Hyderabad, India, March 28–April 1, 2011.
[18] K. M. Azharul Hasan, Sajidul Islam, Mashrur-E-Elahi and Mohammad Navid Izhar,
“Sentiment Recognition from Bangla Text,” Technical Challenges and Design Issues in
Bangla Language Processing, IGI Global, 2013.
[19] Md. Saiful Islam, Md. Afjal Hossain, Md. Ashiqul Islam and Jagoth Jyoti Dey,
“Supervised Approach of Sentimentality Extraction from Bengali Facebook Status,” The
19th International Conference on Computer and Information Technology (ICCIT-2016),
December 18–20, North South University, Dhaka, 2016.
[20] Tomas Mikolov, Kai Chen, Greg Corrado and Jeffrey Dean, “Efficient Estimation of
Word Representations in Vector Space,” Proceedings of International Conference on
Learning Representations, ser. ICLR ’13, 2013.
23 Machine Learning Techniques for Detecting and Analyzing Online Fake Reviews
Yashwitha Buchi Reddy, Ch. Prathima, Swetha Jaladi, B. Dinesh, and J.R. Arun Kumar
23.2.1 Methodology
Naïve Bayes: Naïve Bayes is a machine learning model for classification. The
classifier’s crux is built on Bayesian statistics, specifically Bayes’ theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

We can use Bayes’ theorem to calculate the probability of A happening given that B
has already happened. A is the hypothesis, and B is the evidence [7]. The
individual features are assumed to be independent. In other words, the presence of
one feature has no bearing on the presence of another, which is why the method is
called naïve. Naïve Bayes predicts the probability of each class from multiple
attributes using this technique. The method is most typically used for text
categorization and multiclass problems. Predicting the class of a test data set is
simple and quick, and the method can predict multiple classes. When the
independence assumption holds, a naïve Bayes classifier outperforms other
algorithms, such as regression models, and requires a smaller training data set.
It performs better with categorical input variables than with numeric ones; for
numerical variables, a normal distribution is assumed. Naïve Bayes is a popular
text classification method because of its independence assumption and strong
performance in multiclass problems [6]. Hence it is used in various applications
that use smaller data sets.
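To make the independence assumption concrete, here is a minimal sketch of a multinomial naïve Bayes text classifier with add-one smoothing. It is not the system evaluated in this chapter: the function names `train_nb` and `predict_nb` and the four toy reviews are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns class counts, word counts, vocabulary."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict_nb(tokens, model):
    """Return the label maximizing log P(label) + sum of log P(word | label)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        for tok in tokens:
            # Each word contributes independently of the others (the naive assumption).
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy reviews, for illustration only.
docs = [
    (["great", "food", "friendly", "staff"], "genuine"),
    (["nice", "place", "good", "food"], "genuine"),
    (["best", "best", "best", "buy", "now"], "fake"),
    (["amazing", "amazing", "buy", "now"], "fake"),
]
model = train_nb(docs)
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied.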
KNN: K-nearest neighbor (KNN) is a simple machine learning technique based on
supervised learning, and it offers a straightforward and quick way to estimate the
class of a test data set. The KNN method stores all available data and classifies
a new data point according to its similarity to the stored points [1]. This means
that, using the KNN method, new data can be quickly assigned to a well-defined
category. The KNN technique is useful for both classification and regression
applications; however, it is more typically employed for classification. Figure
23.1 shows groups of data classified using KNN.
KNN is a non-parametric technique, meaning that it makes no assumptions about the
underlying data [8]. Because it does not learn from the training data set
immediately, it is also known as a lazy learner algorithm. Rather, it stores the
data set and performs the classification only when a new point needs to be
classified.
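The lazy-learning behavior can be seen in a minimal sketch: there is no training step, only a distance computation at prediction time. The function `knn_predict`, the choice of Euclidean distance, and the two-dimensional toy points are assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label). Majority vote among the k nearest points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Invented two-feature points forming two well-separated groups.
train = [
    ((1.0, 1.0), "genuine"), ((1.2, 0.8), "genuine"), ((0.9, 1.1), "genuine"),
    ((4.0, 4.2), "fake"), ((4.1, 3.9), "fake"), ((3.8, 4.0), "fake"),
]
label_a = knn_predict(train, (1.1, 1.0))
label_b = knn_predict(train, (4.0, 4.0))
```

Because every prediction scans the stored data, the cost of KNN grows with the size of the data set rather than with a training phase.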
Logistic Regression: Logistic regression is a supervised learning approach used to
predict a categorical dependent variable. In essence, logistic regression may be
useful if you have a large quantity of data to categorize. For example, if you
were given a cat and an apple and asked whether each was an animal or not, you
would expect the cat to be classified as an animal and the apple as a non-animal.
Your goal is to correctly label each item, and the label depends on your data.
There are just two possible outcomes in this example: animal or not an animal.
However, you may also configure your regression analysis with more than two
possible classes (multinomial logistic regression). Figure 23.2 shows data points
classified using logistic regression.
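As a minimal sketch of the binary case, the code below fits a logistic regression by batch gradient descent on the log loss. The learning rate, the number of epochs, the function names, and the one-feature toy data are assumptions for illustration, not settings used in this chapter.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(x, w, b):
    """Probability that x belongs to class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy one-feature data: label 1 for larger feature values.
xs = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
```

A predicted probability above 0.5 is read as class 1 and below 0.5 as class 0; the multinomial case replaces the sigmoid with a softmax over several classes.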
Advantages:
i. The precision is excellent.
ii. Simple to understand.
iii. Extremely effective.
iv. There is no requirement for skilled personnel.
FIGURE 23.6 Graph of accuracies of test cases for the assessment of fake reviews.
23.3.1 Results
We evaluated our proposed system on the Yelp data set. First, we obtained the
accuracy of each algorithm. Table 23.1 lists the test cases considered for the
assessment of fake reviews, Table 23.2 gives the accuracy of each algorithm, and
Figure 23.6 plots these accuracies.
TABLE 23.1
Test Cases for the Assessment of Fake Reviews
Test Case                            Input                                   Expected Output                                         Actual Output                              P/F
Examine the test data.               Test-data path.                         The data set must be successfully read.                 The data set was successfully retrieved.   Pass
Preparing the dataset for analysis   Pre-processing starts                   Pre-processing should be done on the dataset.           Pre-processing was finished successfully   Pass
Model construction                   Model construction for the clean data   Models must be created using the necessary algorithms.  The model was successfully created [5]     Pass
Fake review estimation               Input provided.                         Output should be whether review is fake or not.         Resulted successfully                      Pass
TABLE 23.2
Accuracy Assessment of Fake Reviews
S. No   Algorithm             Accuracy
1       KNN                   0.60175
2       Naïve Bayes           0.73725
3       XGBOOST               0.88075
4       Logistic Regression   0.88625
5       CATBOOST              0.88875
6       LGBM                  0.89025
23.3.2 Conclusion
We have built a detection system for fake reviews, which are found on numerous
sites, based on the light gradient boosting model (LGBM). It is implemented as a
web application using Python on the Django framework. More feature choices and the
ability to detect other types of bogus reviews might be implemented in the future.
With an improved data set, we want to examine detection approaches and deploy the
most effective and valid computational methods for recognition. Figure 23.7 shows
the accuracy of the LGBM approach for detecting fake reviews.
REFERENCES
[1] C. C. Aggarwal, “Opinion mining and sentiment analysis,” in: Machine Learning for
Text. Springer, Cham. pp. 413–434, 2018.
[2] O. A. A. C. A. I. R. Barbado, “A framework for fake review detection in online consumer
electronics retailers,” Information Processing & Management, pp. 1234–1244, 2019.
[3] S. Tadelis, “The economics of reputation and feedback systems in e-commerce,” IEEE
Internet Computing, vol. 20, no. 1, pp. 12–19, 2016.
[4] M. J. H. Mughal, “Data mining: Web data mining techniques, tools and algorithms,”
Information Retrieval, vol. 9, no. 6, pp. 12–15, 2018.
[5] V. V. B. L. A. N. G. A. Mukherjee, “What yelp fake review filter might be doing?” in
Seventh International AAAI Conference on Weblogs and Social Media, 2013.
[6] N. J. A. B. Liu, “Review spam detection,” in Proceedings of the 16th International
Conference on World Wide Web, 2007.
[7] E. E. A. A. Gherbi, “Detecting fake reviews through sentiment analysis using machine
learning techniques,” Iaria/Data Analytics, pp. 564–569, 2017.
[8] R. P. A. U. A. P. W. V. Singh, “Sentiment analysis of movie reviews and blog posts,”
Advance Computing Conference (IACC), pp. 893–898, 2013.
24 A Study on the Application of Expert Systems as a Support System for Business Decisions: A Literature Review
Geetha Manoharan, Subhashini Durai, Gunaseelan Alex Rajesh, and Sunitha Purushottam Ashtikar
24.1 INTRODUCTION
Gabriel Lanzaro and Michelle Andrade (2022) noted that speed limits balance safety
and traffic flow. Establishing a speed limit typically entails choosing a base
speed (such as operational speed or design speed) and modifying it in accordance
with a number of additional factors. For instance, the typical recommendations in
Brazil list a number of factors that influence speed limits but do not outline how
to select a speed limit for a particular stretch of highway. So, in accordance
with Brazilian practise, the decision-maker must decide each case in a way that
depends primarily on expert opinion. Their study proposes a fuzzy expert system
for determining Brazil’s highway speed limits. The system takes into account six
input variables. Membership functions and fuzzy rules were generated from expert
evaluations of simulated highway scenarios. The experts used linguistic variables
and suggested speed limits as they assessed the scenarios. Afterwards, a Mamdani
fuzzy controller was created. For the simulated highway scenarios, the experts’
responses were compared to the controller’s outputs. For additional system
validation, some case studies of Brazilian highway segments were used. Results
demonstrated that the fuzzy system can generate outputs that concur with
professional assessments and current speed limits. The study’s fuzzy controller
can be used to help professionals set speed limits on Brazilian highways.
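The general shape of a Mamdani controller can be sketched in a few lines. Everything specific below is a hypothetical stand-in: the reviewed study uses six input variables and expert-derived membership functions that are not given here, whereas this sketch uses two invented inputs (`curvature` and `traffic` on a 0 to 10 scale), two invented rules, and triangular membership functions chosen only for illustration.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def suggest_speed(curvature, traffic):
    """Mamdani inference: fuzzify, fire rules, clip, aggregate, take the centroid."""
    # Fuzzification of the two hypothetical inputs.
    curv_low, curv_high = tri(curvature, -1, 0, 6), tri(curvature, 4, 10, 11)
    traf_low, traf_high = tri(traffic, -1, 0, 6), tri(traffic, 4, 10, 11)
    # Rule strengths: AND is taken as min, OR as max.
    fire_fast = min(curv_low, traf_low)    # IF both are low THEN speed is high
    fire_slow = max(curv_high, traf_high)  # IF either is high THEN speed is low
    # Clip each output set by its rule strength, aggregate with max,
    # and defuzzify with a discrete centroid over 40 to 120 km/h.
    num = den = 0.0
    for speed in range(40, 121):
        slow = min(fire_slow, tri(speed, 30, 40, 90))
        fast = min(fire_fast, tri(speed, 70, 120, 130))
        mu = max(slow, fast)
        num += mu * speed
        den += mu
    return num / den if den else None

easy_road = suggest_speed(0, 0)     # straight road, light traffic
hard_road = suggest_speed(10, 10)   # sharp curves, heavy traffic
```

The centroid step is what turns the overlapping fuzzy conclusions into a single crisp speed, which is why intermediate inputs yield intermediate recommendations rather than a jump between two fixed limits.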
According to Megdad et al. (2022), mint is a grassy, perennial plant that grows
quickly and widely. Its leaves are green, fragrant, tart, and refreshing, and its
square-shaped stems are branched and erect and can be as tall as 10–201 cm. It is
native to Europe and Asia. Its significant effects are pain relief, relief of
gallbladder issues, gas release, anti-inflammatory properties, and nerve
relaxation. The mint plant also has a number of other advantages. Even though mint
is the best plant to use as a starter crop in gardens, it is susceptible to a
number of common diseases that stunt its development. The primary objective of
this expert system is to provide the proper disease diagnosis and treatment. The
paper proposes an expert system designed for farmers and agriculture enthusiasts
to diagnose mint diseases such as mint rust, Verticillium wilt, anthracnose,
powdery mildew, black stem rot, stem and stolon canker, and Septoria leaf spot.
The study provides a summary of the diseases as well as information on their
causes and, whenever possible, treatment recommendations. The proposed expert
system was designed and implemented using the CLIPS expert system language.
Results: Al Azhar University agricultural students and a group of
agriculture-interested friends deemed the proposed mint disease expert system
satisfactory. The proposed expert system can be extremely helpful to farmers and
those who are interested in agriculture.
In the review by Mohd Sani et al. (2022), 15 research papers were retained after the
filtering stage. The key findings highlighted three main areas of interest: the causes
of hypertension, expert system techniques, and the kinds of sensors used in wearable
technology. Blood pressure is the hypertension-related factor most frequently measured
by wearable technology. Among expert system techniques, machine learning, neural
networks, and fuzzy logic are the three most popular methods. In research on
hypertension, the wrist band is the most popular sensor for wearable devices.
As Japanese is increasingly used as a tool for international communication,
conventional Japanese instruction in colleges is no longer adequate. To address the
needs and growth of society, teacher shortages, and the neglect of students'
fundamental knowledge in college Japanese teaching, the study by Chu (2022) uses an
expert system as the theoretical foundation and back-propagation (BP) neural network
technology to build an auxiliary teaching system for Japanese teachers and students.
Test questions are categorised and summarised according to knowledge domain and level
of difficulty, which helps teachers create tests. By identifying knowledge gaps, the
system allows students to learn independently and practise more effectively with less
effort. After extensive data testing and operation, the system proved to be realistic
and useful.
are combined to create a novel PVC recognition algorithm. Features were extracted
from ECG heartbeats by a long short-term memory-based autoencoder (LSTM-AE) network
and grouped by K-means clustering. Templates were then created and selected using the
clustering results as a starting point. Finally, a set of rules, including template
matching and rhythm characteristics, was used to identify the PVC heartbeats.
Three quantitative parameters, sensitivity (Se), positive predictive value (P+), and
accuracy (ACC), were used to evaluate the effectiveness of the proposed method
using data from the MIT-BIH Arrhythmia database and the St. Petersburg Institute
of Cardiological Technics database. The method reached 92.47 percent and 93.18 percent
training accuracy on the two databases, and the test accuracy was 87.51 percent and
87.92 percent, respectively. On the training set of the third China Physiological
Signal Challenge 2020, the PVC scores were 36,256 and 46,706. These scores could rank
first among open-source entries. The results showed that combining an expert system
with deep learning can improve PVC identification from single-lead ECG recordings.
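The three evaluation parameters named above follow directly from beat-level confusion counts. The sketch below shows the standard definitions; the function name and the counts are hypothetical and are not taken from the study.

```python
def pvc_metrics(tp, fp, tn, fn):
    """Return (Se, P+, ACC) from beat-level confusion counts.

    tp: PVC beats detected as PVC        fp: non-PVC beats detected as PVC
    tn: non-PVC beats left unflagged     fn: PVC beats that were missed
    """
    se = tp / (tp + fn)                    # sensitivity (recall)
    ppv = tp / (tp + fp)                   # positive predictive value (precision)
    acc = (tp + tn) / (tp + fp + tn + fn)  # overall accuracy
    return se, ppv, acc


# Hypothetical counts for illustration only.
se, ppv, acc = pvc_metrics(tp=90, fp=10, tn=880, fn=20)
print(f"Se={se:.3f}  P+={ppv:.3f}  ACC={acc:.3f}")
```

Because normal beats usually far outnumber PVC beats, ACC alone can look high even when Se is modest, which is one reason all three values are reported together.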
was proposed as a means of reducing the ambiguity and uncertainty inherent in making
a breast cancer diagnosis and of relieving the load on the network nodes of the
underlying fuzzy neural network system by removing irrelevant features used for
prediction or diagnosis. Using an enhanced Gini index random forest (RF)-based feature
importance measure algorithm, the five most appropriate features were selected from
the dataset in the Wisconsin breast cancer diagnostic database. Classification models
were created using logistic regression, SVM, K-nearest neighbour, random forest, and
Gaussian naive Bayes on two feature sets: all features (32) and the five best
features (31). The comparison demonstrates that, in terms of accuracy, sensitivity,
and specificity, the models with the five fittest features performed better than
those with the full feature set. The five fittest features were then used to build an
expert system with 99.33% accuracy, 99.41% sensitivity, and 99.24% specificity. The
system is more accurate, sensitive, and specific than previous studies that used fuzzy
neural networks or other artificial intelligence techniques on the same dataset to
diagnose breast cancer. The z-test result demonstrated improved accuracy for early
breast cancer diagnosis.
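The idea behind a Gini-based feature importance measure can be illustrated with a deliberately simplified sketch. It ranks each feature by the largest Gini impurity decrease that a single threshold split can achieve, whereas a random forest would average such decreases over many trees; it is not the enhanced algorithm of the cited study. The feature names and values are hypothetical toy data.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_gini_decrease(values, labels):
    """Largest impurity decrease over all threshold splits of one feature."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:  # each distinct value except the largest
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def top_k_features(rows, labels, names, k):
    """Return the k feature names with the highest single-split Gini decrease."""
    scores = {
        name: best_gini_decrease([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 1 = malignant, 0 = benign.
names = ["radius", "texture", "noise"]
rows = [(1.0, 5.0, 3.0), (1.2, 7.0, 1.0), (0.9, 6.0, 2.0),
        (2.1, 6.5, 3.0), (2.4, 8.0, 1.0), (2.0, 5.5, 2.0)]
labels = [0, 0, 0, 1, 1, 1]
print(top_k_features(rows, labels, names, k=2))
```

Keeping only the top-ranked features before training the classifier is what reduces the number of input nodes the downstream network has to handle.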
due to uncertainty, ambiguity, and vagueness. The belief rule base with interval-valued
references (BRB-IR) proposed in this paper makes it possible to build models by
combining qualitative knowledge with quantitative data. First, a non-linear
optimisation algorithm is used to optimise the interval-valued referential values
provided by experts. The P-CMA-ES algorithm then optimises the other model parameters.
A pipeline leak detection case study was used to validate the model. The comparison
shows that the classic BRB is less efficient than the proposed BRB-IR and fails to
capture expert knowledge adequately.
24.16 CONCLUSION
Traditional management modes require a lot of manpower and time to monitor
massive amounts of data, so they can’t perform real-time evaluations and miss the
best time to solve problems. Therefore, effective computing techniques are needed.
REFERENCES
Algehyne, E. A., Jibril, M. L., Algehainy, N. A., Alamri, O. A., & Alzahrani, A. K. (2022).
Fuzzy Neural Network Expert System with an Improved Gini Index Random Forest-
Based Feature Importance Measure Algorithm for Early Diagnosis of Breast Cancer
in Saudi Arabia. Big Data and Cognitive Computing, 6(1). https://doi.org/10.3390/
bdcc6010013
Al-Qadi, M. H., El-Habibi, M. F., Megdad, M. M. M., Alqatrawi, M. J. A., Sababa, R. Z., &
Abu-Naser, S. S. (2022). Developing an Expert System to Diagnose Tomato Diseases.
International Journal of Academic Engineering Research, 6. www.ijeais.org/ijaer.
Aslem, Y. I., & Abu-Naser, S. S. (2022). CLIPS-Expert System to Predict Coriander Diseases.
International Journal of Engineering and Information Systems (IJEAIS), 6. www.ijeais.
org/ijeais.
Cai, Z., Wang, T., Shen, Y., Xing, Y., Yan, R., Li, J., & Liu, C. (2022). Robust PVC Identifica-
tion by Fusing Expert System and Deep Learning. Biosensors, 12(4): 185. https://doi.
org/10.3390/bios12040185
Chu, H. (2022). Research on Expert System of Japanese Auxiliary Teaching Based on BP
Neural Network. Mobile Information Systems. https://doi.org/10.1155/2022/7719392.
El-Habibi, M. F., Megdad, M. M. M., Al-Qadi, M. H., Alqatrawi, M. J. A., Sababa, R. Z., &
Abu-Naser, S. S. (2022). A Proposed Expert System for Obstetrics & Gynecology Diseases
Diagnosis. International Journal of Academic Multidisciplinary Research, 6.
www.ijeais.org/ijamr.
El-Hamarnah, H. A., Lafi, O. I. A., Radwan, H. I. A., Al-Saloul, N. J. H., & Abu-Naser, S. S.
(2022). Proposed Expert System for Pear Fruit Diseases. International Journal of Aca-
demic and Applied Research, 6. www.ijeais.org/ijaar.
Lafi, O. I. A., El-Hamarnah, H. A., Al-Saloul, N. J. H., Radwan, H. I. A., & Abu-Naser, S. S.
(2022). A Proposed Expert System for Broccoli Diseases Diagnosis. International Journal
of Engineering and Information Systems (IJEAIS), 6. www.ijeais.org/ijeais.
Lanzaro, G., & Andrade, M. (2022). A Fuzzy Expert System for Setting Brazilian Highway
Speed Limits. International Journal of Transportation Science and Technology. https://
doi.org/10.1016/j.ijtst.2022.05.003.
Megdad, M. M. M., Ayyad, M. N., Al-Qadi, M. H., El-Habibi, M. F., Alqatrawi, M. J. A., Sababa,
R. Z., & Abu-Naser, S. S. (2022). Mint Expert System Diagnosis and Treatment. International
Journal of Academic Information Systems Research, 6. www.ijeais.org/ijaisr.
Mohd Sani, M. I., Abdullah, N. A. S., & Mohd Rosli, M. (2022). Review on Hypertension
Diagnosis Using Expert System and Wearable Devices. International Journal of Electri-
cal and Computer Engineering, 12(3), 3166–3175. https://doi.org/10.11591/ijece.v12i3.
pp3166-3175.
Pawan, E., Thamrin, Rosiyati M. H., Widodo, Widodo, Bei, Sariaty H. Y., & Luanmasa, Junus
J. (2022). Implementation of Forward Chaining Method in Expert System to Detect
Diseases in Corn Plants in Muara Tami District. International Journal of Computer and
Information System, 3(1).
Radwan, H. I. A., El-Hamarnah, H. A., Al-Saloul, N. J. H., Lafi, O. I. A., & Abu-Naser, S. S.
(2022). A Proposed Expert System for Passion Fruit Diseases.
International Journal of Academic Engineering Research, 6. www.ijeais.org/ijaer
Raihan, M., Hassan, M. M., Hasan, T., Bulbul, A. A. M., Hasan, M. K., Hossain, M. S., Roy, D. S.,
& Abdul Awal, M. (2022). Development of a Smartphone-Based Expert System for
COVID-19 Risk Prediction at Early Stage. Bioengineering, 9(7). https://doi.org/10.3390/
bioengineering9070281.
Salabun, W., Wieckowski, J., & Watrobski, J. (2022). Swimmer Assessment Model (SWAM):
Expert System Supporting Sport Potential Measurement. IEEE Access, 10, 5051–5068.
https://doi.org/10.1109/ACCESS.2022.3141329
Soleymani, M., & Nejad, M. O. (2018). Supply Chain Risk Management using Expert Systems.
International Journal of Current Engineering and Technology, 8(04). https://doi.
org/10.14741/ijcet/v.8.4.12.
Sun, C., Yang, R., He, W., & Zhu, H. (2022). A Novel Belief Rule Base Expert System with
Interval-Valued References. Scientific Reports, 12(1). https://doi.org/10.1038/s41598-
022-10636-8.
Tang, J., & Deng, Y. (2022). The Design Model of English Graded Teaching Assistant Expert
System Based on Improved B/S Three-Tier Structure System. Mobile Information Systems.
https://doi.org/10.1155/2022/4167760.
Wulansari, R. E., Sakti, R. H., Ambiyar, A., Giatman, M., Syah, N., & Wakhinuddin, W.
(2022). Expert System for Career Early Determination Based on Howard Gardner’s
Multiple Intelligence. Journal of Applied Engineering and Technological Science, 3(2).
Zhou, W., Zhao, X., Wang, X., Zhou, Y., Wang, Y., Meng, L., Fan, J., Shen, N., Zhou, S., Chen,
W., & Chen, C. (2022). A Hybrid Expert System for Individualized Quantification of
Electrical Status Epilepticus During Sleep Using Biogeography-Based Optimization.
IEEE Transactions on Neural Systems and Rehabilitation Engineering, 30, 1920–1930.
https://doi.org/10.1109/TNSRE.2022.3186942
25 Applications of
Artificial Intelligence on
Customer Experience
and Service Quality of
the Banking Sector
An Overview
Sravani Elaprolu, Channabasava Chola,
Varanasi Chandradhar, and Raul V. Rodriguez
25.1 INTRODUCTION
The banking industry involves many stages, from processing customers' loan
applications to ensuring safe transactions for every customer for as long as they
remain with the bank. Customers look for better service wherever they interact with
products and services; in other words, they demand a better customer experience.
As technology has evolved over the past decades, industries have begun to use
state-of-the-art technology, namely, artificial intelligence, to deliver superior
quality of service to customers. The importance of the banking industry and its
influence on the development of a country are discussed in Section 25.1, the ways in
which artificial intelligence and its applications improve banking processes are
discussed in Section 25.2, and Section 25.3 concludes the chapter.
asset at a later point in time but also to enhance the process of loan allocation to
the right projects. Approving a loan to a non-profitable project will indicate poor
investment of resources, which affects the performance of the banks as well as eco-
nomic growth of a country. Because providing loans is one of the key functions of
the banks, failing in that core activity will severely affect the growth of the banks
(Park, 2012).
Besides, banks have to lend money to borrowers to make a profit (Ince and Aktan,
2009), and this lending contributes to the growth of financial, economic development,
and industrial activities (Cetorelli and Gambera, 1999). At the same time, bank loan
availability drops significantly if a banking crisis occurs, which reduces the loan
supply offered by banks (Huber, 2018). In India, public-sector banks hold more than
three-fourths of the total assets of the entire banking sector, and the State Bank of
India alone holds 17 percent of total commercial banking assets (Goldberg, 2009).
Banking institutions can provide loans to individuals and firms effectively and in
line with demand if the bank's market share is large. When banks charge a high
interest rate, aim for a high margin, and decide to decrease the loan supply,
economic growth and job creation suffer and the unemployment rate rises (Feldmann,
2015). In addition, if the entry barrier to the banking industry is high, initial
expenses will be higher; as a result, banks will charge a high interest rate to make
a profit, and foreign banks will hesitate to invest.
Gross domestic product (GDP) is a measure of economic development: the monetary
value of the output (finished goods) produced by the various industries within a
country over a particular period (Atay and Apak, 2013). GDP estimates are strongly
affected when banking industry output is overstated (Oulton, 2013). Non-performing
assets should be recovered to strengthen the banking industry in a stable manner
(Tan and Floros, 2012), which in turn contributes to economic activity.
25.1.3 Banking Crisis
When depositors withdraw money from a bank because they perceive it to be
untrustworthy, the banking system will fail. Without customer deposits, it is
difficult for banks to run their business, whether the situation is normal or in
crisis (Kunt et al., 2000). Depositors always look for well-functioning banks and
higher interest rates when investing their money (Goldberg, 2009). During a crisis,
bank lending is drastically reduced and bank assets drop, which reduces the growth of
output and investment. A crisis affects output growth not only in the year of the
crisis but also in the following year. A few years after a crisis, output growth can
recover, but bank credit growth may not recover within the same time frame. Deposit
interest rates are higher during the crisis and the subsequent year in order to
attract more deposits and retain existing ones (Kunt et al., 2000). The inflation
rate and output growth are negatively correlated (Haslag, 1995). To attract deposits,
the interest rate should be higher than the inflation rate; however, there is no
significant difference in interest rates before and after a crisis, and no evidence
that banks have offered an interest rate higher than the inflation rate. Banks that
have maintained liquidity even after depositors withdraw most of their deposits need
help from the central bank and its authorities (Kunt et al., 2000).
management to serve customers better if they focus on these four major elements:
retaining existing customers, attracting new customers, motivating customers to
deepen their relationship with the bank, and keeping customers updated on the bank's
new services (Laketa et al., 2015). Additionally, the banking industry can gain more
deposits from retail depositors if it treats them properly (Puri and Rocholl, 2008).
credit risk and negative credit risk of the applicants. Positive credit risk indicates
a high probability that the applicant will fail to repay the loan, whereas negative
credit risk indicates a low probability of failure to repay. Bank managers, who are
overwhelmed with customer data, must make the right decision: approve applications
with negative credit risk and deny applications with positive credit risk. Artificial
intelligence now helps managers make better decisions (Eletter et al., 2010).
As mentioned earlier, classification is the technique used to distinguish between
good and bad credit of an applicant. A credit score can be assigned to a company,
municipality, state, financial institution, and so on. It is the value obtained from
credit score processing and is used by lenders, bond buyers, and government officers.
The risk involved is inversely related to the credit score value, which is based on
indicators such as the applicant's economic condition, the capital involved, the
collateral offered, the applicant's capacity, and the applicant's behavior history.
The most commonly used models are logistic regression and linear discriminant
analysis. Logistic regression predicts dichotomous outcomes and does not require the
multivariate normality assumption, but it assumes a linear relationship between the
variables; linear discriminant analysis likewise assumes that the variables are
linearly related, whereas in reality the relationships are non-linear. Artificial
intelligence techniques such as decision trees, genetic algorithms, artificial neural
networks, and support vector machines give better results than traditional
statistical methods. Three models, namely, support vector machine, neural network,
and decision tree, are used for classification along with the fuzzy C-means
clustering technique. Hybrid approaches, however, outperform the individual methods
mentioned above in prediction accuracy (Ghodselahi and Amirmadhi, 2011).
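As a point of reference for the traditional baseline, a logistic regression classifier for good versus bad credit can be sketched in a few lines. This is a minimal illustration fitted by plain batch gradient descent. The two features (a debt-to-income ratio and a scaled count of late payments), the toy applicants, the learning rate, and the number of epochs are all assumptions made for the example, not values from the cited studies.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this applicant drives the gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def bad_credit_probability(w, b, x):
    """Estimated probability that an applicant with features x is a bad credit."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical applicants: (debt-to-income ratio, scaled late payments); 1 = bad credit.
X = [(0.1, 0.0), (0.2, 0.0), (0.3, 0.1), (0.7, 0.6), (0.8, 0.8), (0.9, 1.0)]
y = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(X, y)
print(round(bad_credit_probability(w, b, (0.85, 0.9)), 3))
```

The decision boundary of this model is a straight line in feature space, which is the linearity limitation noted above and the reason non-linear and hybrid models can do better on real applicant data.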
25.2.3 Phishing Websites
Phishing websites lure people into disclosing their usernames and passwords, which
are then used for various illegal transactions. Data mining algorithms, one family of
artificial intelligence techniques, are used to detect phishing websites, and these
websites can also be predicted with associative and classification algorithms.
Various estimates have revealed that the cost per victim keeps increasing. In
particular, banking customers are lured into this trap by spam emails sent
continually to many people. Data mining helps extract the information most pertinent
to the user from the large volume of data available. There are 27 major features,
grouped into indicator categories such as URL and domain identity, security and
encryption, source code and JavaScript, page style and content, web address bar, and
social human factors. Aburrous et al. (2010a) compared several approaches, namely,
PART, PRISM, JRip, C4.5, MCAR, and CBA, to find the best one. MCAR outperformed all
the other methods in terms of accuracy and speed. A fuzzy data mining algorithm is
used to automatically identify phishing websites, particularly e-banking websites,
but identifying the important features is still not easy with this technique
(Aburrous et al., 2010b).
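To make the notion of URL-based indicators concrete, the sketch below extracts a handful of simple features from a URL and flags it when several are present. The five indicators, the length threshold of 75 characters, and the rule of two or more hits are illustrative assumptions; they are not the 27 features or the classifiers used in the cited studies.

```python
import ipaddress
from urllib.parse import urlparse


def url_indicators(url):
    """Return a dict of simple, hypothetical phishing indicators for one URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError unless host is a raw IP
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "host_is_ip": host_is_ip,                # raw IP instead of a domain name
        "has_at_symbol": "@" in url,             # '@' can hide the real destination
        "long_url": len(url) > 75,               # assumed length threshold
        "no_https": parsed.scheme != "https",    # no transport encryption
        "many_subdomains": host.count(".") > 3,  # deeply nested host name
    }


def looks_suspicious(url, min_hits=2):
    """Flag the URL when at least min_hits indicators are present."""
    return sum(url_indicators(url).values()) >= min_hits


print(looks_suspicious("http://192.0.2.1/login@bank"))       # three indicators hit
print(looks_suspicious("https://www.example.com/accounts"))  # no indicators hit
```

In a data mining setting, indicator values like these would form the feature vector on which a rule learner or associative classifier is trained, instead of being combined by a fixed count.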
25.2.4 Banking Failures
Banking failures happen when banks do not make a profit. The reasons include high
competition in the market, emerging non-banking institutions, unexpected threats to
loan portfolios, and financial distress. The failure of big banks is dangerous
because it disrupts financial activity as a whole. In the 1980s, big banks failed to
secure themselves against non-performing loans, which was one of the reasons for bank
failure and system collapse (Boyd and Gertler, 1994). Predicting risk and making the
right credit approval decisions help to avoid situations such as bankruptcy and fraud
(Moro et al., 2015).
Financial soundness indicators (FSI) are used to measure the financial
vulnerabilities of banks and are classified into two main groups: the encouraged
indicators and the core set indicators. They can be summarised under criteria such as
capital adequacy, asset quality, management quality, earning ability, liquidity, and
sensitivity to the market. Three models, namely, discriminant, logit, and probit
analyses, were introduced to reveal banking failure three years in advance (Fernando
et al., 2011). The adaptive neuro-fuzzy inference system (ANFIS) is one of the
techniques applied in finance and is useful for predicting failure events in the
banking system (Messai and Gallali, 2015). Banks not only support the economic
stability of a country but also reinforce its financial system. Fuzzy logic and
neural network techniques are prominent techniques for detecting changes in the
efficiency and productivity of banks (Sharma et al., 2013).
In addition, three models were used to predict currency crises: logit regression,
decision tree, and artificial neural network (ANN). The distress of a bank can be
estimated from the ratio of non-performing loans to total gross loans. Because
non-performing loans are the major indicator of a probable financial crisis, the ANN
is trained with the main variables of bank distress, which are loan loss reserves of
non-performing loans, average return on equity, and the ratio of loan loss provisions
to gross loans (Messai and Gallali, 2015). The neural network gives a better success
rate in predicting bank failure, as concluded by Messai and Gallali (2015) and
Elzamly et al. (2017).
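The distress ratio described above is a simple quotient, sketched here for concreteness. The function name and the loan figures are hypothetical.

```python
def npl_ratio(non_performing_loans, total_gross_loans):
    """Ratio of non-performing loans to total gross loans (same currency units)."""
    if total_gross_loans <= 0:
        raise ValueError("total_gross_loans must be positive")
    return non_performing_loans / total_gross_loans


# Hypothetical bank: 45 million non-performing out of 900 million gross loans.
ratio = npl_ratio(45.0, 900.0)
print(f"NPL ratio: {ratio:.1%}")
```

A value like this would be one input among several, alongside return on equity and loan loss provisions, when a classifier is trained to flag distressed banks.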
25.2.5 Alarm System
To improve the security of banks and ATMs against robberies, artificial intelligence
offers a better solution than the conventional emergency-button alarm system. The
system works in three stages: artificial vision first takes a photo and processes
the image to extract features, the ANN then classifies the event from the obtained
pattern, and finally the status of the warning message is set. The output class is
determined by the neural network's classification. If the output is 1, the alarm
should be activated and a warning message should be sent using global system for
mobile communication (GSM) technology (Ortiz et al., 2016).
25.2.7 Customer Loyalty
The relationship between banker and customer is paramount not only for retaining
existing customers but also for enhancing customer loyalty. A strong customer
relationship can be built if banking institutions fulfil the needs and expectations
of customers, which change over time. Customer loyalty can be improved if customers
are attracted by good-quality services provided at low prices. Customer loyalty in
the banking industry can be predicted using an ANN, which other industries already
use for the same purpose. After the data are collected, the important variables are
selected from all the available variables by factor analysis, which prepares the data
for further modeling. The prediction model uses an ANN trained with the feed-forward
back-propagation algorithm. K-fold cross-validation is used, in which the data are
divided into K subsets during training, and the performance of the algorithm is
evaluated after testing using the coefficient of efficiency and the root mean square
error. The results show that the ANN can predict customer loyalty with high accuracy
(Kishada et al., 2016).
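The evaluation procedure described above can be sketched independently of the network itself. In the example below a least-squares line stands in for the ANN so that the code stays self-contained, and the coefficient of efficiency is assumed to be the usual form, one minus the ratio of the residual sum of squares to the total sum of squares about the mean. The data and the fold count are hypothetical.

```python
import math


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if not start <= i < start + size]
        yield train, test
        start += size


def fit_line(xs, ys):
    """Least-squares slope and intercept (stand-in for training the ANN)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def coefficient_of_efficiency(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validate(xs, ys, k):
    """Pool out-of-fold predictions, then score them once."""
    preds = [0.0] * len(xs)
    for train, test in k_fold_indices(len(xs), k):
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = slope * xs[i] + intercept
    return rmse(ys, preds), coefficient_of_efficiency(ys, preds)


# Hypothetical data: a linear trend with a small alternating disturbance.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(xs)]
rmse_value, ce_value = cross_validate(xs, ys, k=5)
print(f"RMSE={rmse_value:.3f}  CE={ce_value:.4f}")
```

Every observation is predicted by a model that never saw it during fitting, so the two scores estimate performance on unseen customers, not the fit to the training data.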
risk from the previously mentioned feature vectors. This combination gives a better
result, which is consistent after training the model properly.
25.2.10 Chatbots
If customers have an issue or enquiry related to the products or services offered by
a banking institution, they have to contact the officers to get the problem solved,
but this process is tedious, repetitive, and time-consuming. Chatbot technology has
advanced, many industries have benefitted from it, and it is working well in
business. For example, Watson, developed by IBM, is designed to answer queries; it
applies machine learning algorithms and natural language processing (NLP), which help
to retrieve information and represent the inbuilt domain knowledge. Implementing such
bots is very useful for serving customers better, and most big, growing banks have
already done so (Singh et al., 2018).
25.3 CONCLUSION
Millions of customers carry out multiple transactions every day as a matter of
routine. This generates data, which is stored and maintained in large databases.
Moreover, most processes in the banking industry involve a lot of manual work.
Artificial intelligence has now made this work easier for both bank employees and
customers. Thanks to machine learning techniques, work that was once sophisticated
has become a simple task, which has never been seen before.
The banking sector has been improving its service quality by providing various
effective tools to ensure the safety and comfort of customers. Technology keeps
improving day by day, so it is worth incorporating it into the different areas of
the business. State-of-the-art technology is essential for maintaining and enhancing
the security of the banking system, and other sections of the banking industry are
ready to implement the latest technology. In this digital era,
customers also expect their bank to be up to date. Upgrading technology will not only
improve service and security but also enhance the reputation of the bank. Nowadays,
Internet banking and mobile banking are attractive to customers because of their
effectiveness and user-friendliness.
Many studies show that different models have been introduced to maximize the accuracy
of banking processes, which benefits both the banking industry and its customers; it
is a win-win situation. Because of competition from the non-banking sector, banks
have to adopt the latest technologies of the digital era to improve service quality.
Technology has a largely positive effect on the banking industry. To make business
processes smooth and responsive, artificial intelligence techniques should be
utilized in the banking industry. Fortunately, artificial intelligence offers a
plethora of applications that help banks reach their greatest efficiency, which also
paves the way for a new dimension of banking.
REFERENCES
Aburrous, M., Hossain, M.A., Dahal, K., & Thabtah, F. (2010a). Associative Classification
Techniques Predicting e-Banking Phishing Web Sites. MCIT. pp. 9–12.
Aburrous, M., Hossain, M. A., Thabtah, F., & Dahal, K. (2010b). Intelligent phishing detection
system for e-banking using fuzzy data mining. Journal of Expert Systems with Applica-
tions, 37 (12). pp. 7913–7921.
Atay, E., & Apak, S. (2013). An overview of GDP and internet banking relations in the
European Union versus China. Procedia - Social and Behavioral Sciences, 99. pp. 36–45.
doi:10.1016/j.sbspro.2013.10.469.
Awasthi, P., & Sangle, P. S. (2013). The importance of value and context for mobile CRM services
in banking. Business Process Management Journal, 19 (6). pp. 864–891. doi:10.1108/
BPMJ-06-2012-0067.
Boyd, J. H., & Gertler, M. (1994). The role of large banks in the recent U.S. Banking crisis.
Federal Reserve Bank of Minneapolis Quarterly Review, 18 (1). pp. 1–21.
Campiglio, E. (2015). Beyond carbon pricing: The role of banking and monetary policy
in financing the transition to a low-carbon economy. Ecological Economics, 121.
pp. 220–230.
Cetorelli, N., & Gambera, M. (1999). Banking Market Structure, Financial Dependence and
Growth: International Evidence from Industry Data. Federal Reserve Bank of Chicago.
pp. 1–39.
Dahari, Z., Abduh, M., & Fam, K. S. (2015). Measuring service quality in Islamic banking:
Importance-performance analysis approach. Asian Journal of Business Research, 5 (1).
pp. 15–28. DOI 10.14707/ajbr.150008.
Dubey, V. (2019). FinTech innovations in digital banking. International Journal of Engineer-
ing Research & Technology (IJERT), 8 (10), pp. 597–601.
Eletter, S. F., Yaseen, S. G., & Elrefae, G.A. (2010). Neuro-based artificial intelligence model
for loan decisions. American Journal of Economics and Business Administration, 2 (1),
pp. 27–34.
Elzamly, A., Hussin, B., Naser, S. S. A., Shibutani, T., & Doheir, M. (2017). Predicting critical
cloud computing security issues using artificial neural network (ANNs) algorithms in
banking organizations. International Journal of Information Technology and Electrical
Engineering, 6 (2). pp. 40–45.
Feldmann, H. (2015). Banking system concentration and unemployment in developing coun-
tries. Journal of Economics and Business, 77. pp. 60–78. https://doi.org/10.1016/j.
jeconbus.2014.08.002.
Fernando, C., Chakraborty, A., & Mallick, R. (2011). The Importance of Being Known: Rela-
tionship Banking and Credit Limits. Accounting and Finance. Faculty Publication Series.
Paper 4. pp. 1–28.
Ghodselahi, A., & Amirmadhi, A. (2011). Application of artificial intelligence techniques
for credit risk evaluation. International Journal of Modeling and Optimization, 1 (3).
pp. 243–249.
Goldberg, L. S. (2009). Understanding Banking Sector Globalization. IMF Staff Papers, 56,
171–197. doi:10.1057/imfsp.2008.31.
Gutierrez, P. A., Segovia-Vargas, M. J., Salcedo-Sanz, S., Hervas-Martinez, C., Sanchis, A.,
Portilla-Figueras, J. A., & Fernandez-Navarro, F. (2010). Hybridizing logistic regression
with product unit and RBF networks for accurate detection and prediction of banking
crises. Omega, 38, pp. 333–344. doi:10.1016/j.omega.2009.11.001.
Haslag, J. H. (1995). Monetary Policy, Banking, and Growth. Federal Reserve Bank of Dallas.
pp. 1–29.
Heng, S. (2015). Augmented reality: Specialised applications are the key to this fast-growing
market for Germany. Deutsche Bank Research, Current Issues Sector Research.
pp. 1–14.
Huber, K. (2018). Disentangling the effects of a banking crisis: Evidence from German
firms and counties. American Economic Review, 108 (3). pp. 868–898. https://doi.
org/10.1257/aer.20161534.
Ince, H., & Aktan, B. (2009). A comparison of data mining techniques for credit scoring in
banking: A managerial perspective. Journal of Business Economics and Management,
10 (3). pp. 233–240.
Johnston, R. (1997). Identifying the critical determinants of service quality in retail banking:
Importance and effect. International Journal of Bank Marketing, 5/4. pp. 111–116.
Kishada, Z. M. E., Wahab, N. A., & Mustapha, A. (2016). Customer loyalty assessment in
Malaysian Islamic banking using artificial intelligence. Journal of Theoretical and
Applied Information Technology, 87 (1). pp. 80–91.
Kunt, A. D., Detragiache, E., & Gupta, P. (2000). Inside the crisis: An empirical analysis of bank-
ing systems in distress. Journal of International Money and Finance, 25 (5). pp. 702–718.
Laketa, M., Dusica, S., Laketa, L., & Misic, Z. (2015). Customer Relationship Management: Con-
cept and Importance for Banking Sector. UTMS Journal of Economics, 6 (2). pp. 241–254.
Messai, A. S., & Gallali, M. I. (2015). Financial Leading indicators of banking distress: A micro
prudential approach: Evidence from Europe. Asian Social Science, 11 (21). pp. 1–13.
Moro, S., Cortez, P., & Rita, P. (2015). Business intelligence in banking: A literature analysis
from 2002 to 2013 using text mining and latent Dirichlet allocation. Expert Systems with
Applications, 42 (3). pp. 1314–1324.
Ortiz, J., Marin, A., & Gualdron, O. (2016). Implementation of a banking system security
in embedded systems using artificial intelligence. Advances in Natural and Applied
Sciences, 10 (17). pp. 95–101.
Oulton, N. (2013). Has the growth of real GDP in the UK been overstated because of mis-
measurement of banking output? Centre for Economic Performance. pp. 1–12.
Park, J. (2012). Corruption, soundness of the banking sector, and economic growth:
A cross-country study. Journal of International Money and Finance, 31, pp. 907–929.
doi:10.1016/j.jimonfin.2011.07.007.
Payne, E. M., Peltier, J. W., & Barger, V. A. (2018). Mobile banking and AI-enabled mobile
banking: The differential effects of technological and non-technological factors on digi-
tal natives’ perceptions and behavior. Journal of Research in Interactive Marketing, 12
(3). pp. 328–346. https://doi.org/10.1108/JRIM-07-2018-0087.
Puri, M., & Rocholl, J. (2008). On the importance of retail banking relationships. Journal of
Financial Economics, 89. pp. 253–267. doi:10.1016/j.jfineco.2007.07.005.
Sharma, D., Sharma, A. K., & Barua, M. K. (2013). Efficiency and productivity of banking
sector: A critical analysis of literature and design of conceptual model. Qualitative
Research in Financial Markets, 5 (2). pp. 195–224.
Singh, M., Singh, R., Pandey, A., & Kasture, P. (2018). Chat-Bot for Banking Industry. Inter-
national Conference on Communication, Security and Optimization of Decision Support
Systems (IC-CSOD 2018). pp. 247–249. ISBN: 978-0-9994483-1-1.
Sundarkumar, G. G., & Ravi, V. (2015). Engineering Applications of Artificial Intelligence.
Elsevier Ltd. pp. 368–377.
Tan, Y., & Floros, C. (2012). Bank profitability and GDP growth in China: A note. Journal of
Chinese Economics and Business Studies, 10 (3). pp. 267–273.
Tavana, M., Abtahi, A.R., Caprio, D, D., & Poortarigh, M. (2018). An artificial neural network
and Bayesian network model for liquidity risk assessment in banking. Neurocomputing,
275. pp. 2525–2554.
26 Prediction of Terrorist Attacks throughout the Globe Using the Global Terrorism Database
A Comparative Analysis of Machine Learning Prediction Algorithms
Happy and Manoj Yadav
26.1 INTRODUCTION
Terrorist attacks are among the most significant threats faced by the government
of any geographical area. The Institute for Economics and Peace (IEP), in its annual
Global Terrorism Index (GTI) report for 2020 [8], states that 13,826 people lost
their lives to terrorism worldwide, with property damage of around $1,777.60 million,
even though deaths fell by 15.5 percent, the fifth consecutive annual decline since
the peak in 2014.
The Global Terrorism Database (GTD) [5] is one of the most widely used unclassified,
open-source online databases on terrorism. It is managed by the National Consortium
for the Study of Terrorism and Responses to Terrorism (START) and is available for
research to both individuals and industry.
Because of the demand for systems that predict future terrorist attacks, most
researchers approach the problem through analysis of historical attack data. Prior
research focuses mainly on three aspects of the GTD. The first is predicting values
that are missing (null) in the GTD, such as the organization responsible for an
attack. The second is predicting future events with machine learning and deep
learning models. The third is predicting the behavior of terrorist organizations
and the relationships between them.
This chapter is organized as follows. Section 26.1.1 reviews the related work
available in this context. Section 26.2 describes the research methodology: the
data set, the data pre-processing steps, the tools used, and the details of the
proposed system. Section 26.4 contains the results after data
Xia & Gu, 2019 [18] proposed a model called the Terrorist Knowledge Graph (TKG).
The graph is updated regularly with information extracted from Wikipedia pages to
improve prediction. The TKG also makes the results easier for both humans and
machines to interpret.
Alhamdani et al., 2018 [2] review a recommendation system for the GTD that uses
deep learning techniques to uncover terrorist propaganda on social media.
Kolajo & Daramola, 2017 [9] proposed a system model that draws on different
social media sources to identify terrorist activities, using Apache Spark to
achieve the desired results. Zijuan & Shuai, 2017 [20] use big data analysis methods
to study terrorist attacks and analyze strategies for reviewing the data on previous
attacks.
Toure & Gangopadhyay, 2016 [17] address various aspects of terrorism with the
help of several software tools and methodologies. With their system, they aim to
calculate the risk in different locations and the risk of terrorist attacks in the
near future. Hegde et al., 2016 [6] presented visual analytics on the GTD combined
with current events from social media, i.e., social network analytics.
Pagan, 2010 [12] analyzed different pre-processing techniques for reducing the
noise in the data and then applied other ML techniques to minimize the error and
enhance the accuracy of the system. Ozgul et al., 2009 [11] proposed a model for
the terrorist attacks that remained unsolved in the GTD. Using a clustering
method, they applied the crime prediction model (CPM) to predict unsolved attacks in
Turkey between 1970 and 2005.
26.2.1 Dataset
For the proposed work, the GTD is used [5]. The GTD is an open-source repository of
terrorist incidents from 1970 to 2020 and is updated periodically. The National
Consortium for the Study of Terrorism and Responses to Terrorism (START) at the
University of Maryland is responsible for regularly updating and maintaining it [8].
The database holds information on more than 201,183 terrorist incidents and is
updated every year. Each incident is described by more than 135 attributes, or
columns, that capture the circumstances of the event as fully as possible. Several
essential attributes of the GTD are as follows:
1. iyear, imonth, iday: Year, month, and day, respectively, in numeric form,
on which the terrorist attack happened.
2. country, country_txt, city, latitude, longitude: Country code, country name,
and city in which the attack happened, and the coordinates of that location
as latitude and longitude.
3. crit1, crit2, crit3: Binary numerical values that identify the purpose of
the terrorist attack.
4. attacktype1_txt: Category of the attack. Examples: hijacking,
bombing/explosion, and many more.
5. weaptype1_txt: Type of weapon used in the attack. Examples: explosives,
firearms, incendiary, and many more.
6. nkill, nkillter: nkill counts the total number of people who died in the
incident, while nkillter counts only the terrorists who died in it.
7. nwound: Number of confirmed non-fatal injuries for both victims and
terrorists.
A detailed description of all 135 attributes can be found in the GTD
Codebook of the National Consortium for the Study of Terrorism and Responses
to Terrorism, ID: 231483. This document was initially uploaded on
7 September 2019 and updated in August 2021, as shown in Figure 26.1.
on the user’s requirement, so some of the attributes or columns were dropped in the
pre-processing step. The irrelevant columns were dropped based on the following
criteria.
2.2.3. Based on the user requirement: Remove the columns that do not fit the
user’s requirements.
2.2.4. Renaming the columns: Column (attribute) names in the GTD were renamed
for ease of work. The new names are easier to understand and reduce
confusion during data analysis. For example, ‘gname’ is renamed to
‘Group_Name,’ ‘targtype1_txt’ to ‘Targtype_Name,’ and so on.
2.2.5. Replacing unknown values: We use the most-frequent-item method to
replace null values. Unknown cities, attack types, and target types are
replaced with the most frequent city, attack type, and target type in
the same country, and unknown group names and weapon types are replaced
with the most frequent group name and weapon type in the same region.
Pseudo-code for replacing unknown values is shown in Algorithm 1.
2.2.6. Handling irrelevant values: Even after null values are removed, some
GTD columns still contain irrelevant values, such as negative values in
the ‘doubtter’ attribute or ‘0’ (zero) in ‘iday,’ ‘imonth,’ and ‘iyear.’
For ‘doubtter,’ negative values were replaced directly with ‘1’ because
they are most likely to be true for this attribute. Unknown values of
‘iday,’ ‘imonth,’ and ‘iyear’ were replaced with the most frequent day,
month, and year, respectively.
2.2.7. Dropping duplicate rows and rows with missing values: Finally, we drop
duplicate rows and rows with missing values from the GTD to remove
redundancy and data inconsistency.
2.2.8. Terrorist attack criteria: One of the most fundamental problems for law
enforcement agencies is identifying whether an attack is a terrorist
attack. To differentiate between attacks, we use ‘Crit1,’ ‘Crit2,’ and
‘Crit3,’ which capture the different motivations behind a terrorist
attack.
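The cleaning steps 2.2.5 to 2.2.7 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's Algorithm 1: the placeholder string "Unknown", the helper names, and the four sample records are hypothetical, and only the city column is imputed to keep the example short.

```python
from collections import Counter, defaultdict

UNKNOWN = "Unknown"  # assumed placeholder for unknown text values


def most_frequent_by_group(rows, group_key, target_key):
    """Map each group (e.g. a country) to its most frequent known target value."""
    counts = defaultdict(Counter)
    for row in rows:
        value = row[target_key]
        if value not in (None, UNKNOWN):
            counts[row[group_key]][value] += 1
    return {group: c.most_common(1)[0][0] for group, c in counts.items()}


def clean(rows):
    """Return a cleaned copy of rows; the input list is left unchanged."""
    rows = [dict(r) for r in rows]
    city_mode = most_frequent_by_group(rows, "country_txt", "city")
    day_mode = Counter(r["iday"] for r in rows if r["iday"] != 0).most_common(1)[0][0]

    cleaned, seen = [], set()
    for row in rows:
        # 2.2.5: unknown city -> most frequent city of the same country
        if row["city"] in (None, UNKNOWN):
            row["city"] = city_mode.get(row["country_txt"])
        # 2.2.6: negative 'doubtter' -> 1, day 0 -> most frequent day
        if row["doubtter"] < 0:
            row["doubtter"] = 1
        if row["iday"] == 0:
            row["iday"] = day_mode
        # 2.2.7: drop rows that still have a missing value, and duplicates
        key = tuple(sorted(row.items()))
        if None in row.values() or key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Hypothetical records using GTD-style column names
sample = [
    {"country_txt": "Iraq", "city": "Baghdad", "iday": 25, "doubtter": 0},
    {"country_txt": "Iraq", "city": "Baghdad", "iday": 25, "doubtter": 0},  # duplicate
    {"country_txt": "Iraq", "city": "Unknown", "iday": 0, "doubtter": -9},
    {"country_txt": "Iraq", "city": "Mosul", "iday": 15, "doubtter": 1},
]
result = clean(sample)
print(len(result), result[1])
```

Grouping by country before taking the most frequent value keeps an imputed city inside the country where the attack happened. The same helper could be reused with a region column for group names and weapon types.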
26.3.1 Proposed System
The proposed system focuses mainly on analysis and visualization of the existing
historical data to uncover information such as the year-wise attack count, prime
targets, attack methods, region-wise terrorist activity, and the most affected
countries, along with the prediction of future terrorist attacks using machine
learning algorithms: kNN, naive Bayes, neural network, logistic regression, random
forest (RF), decision tree, CN2 rule inducer, and SVM. Machine learning algorithms
are further used to predict the severity of terrorist attacks based on the
consolidated output of the data analysis and visualization.
The accuracy of the proposed system is then compared across the machine learning
algorithms to find the most suitable algorithm for this problem, and the best result
is compared with existing systems to evaluate the improvement achieved by the
proposed approach.
The system architecture in Figure 26.3 shows, in diagrammatic form, the methodology
used to carry out the proposed work. The pseudo-code for one of the machine learning
algorithms used, kNN, for training on and applying it to the GTD, is shown in
Algorithm 2.
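As an illustration of how kNN classifies an incident, the following is a minimal pure-Python sketch. It is not the chapter's Algorithm 2 or the Orange implementation: the two-dimensional feature vectors, the labels, and the choice of k = 3 are hypothetical and stand in for already-encoded GTD attributes.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of query by majority vote among its k nearest neighbours."""
    distances = sorted(
        ((euclidean(features, query), label)
         for features, label in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical encoded features (e.g. attack type, weapon type)
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = [1, 1, 1, 0, 0, 0]  # 1 = successful attack, 0 = unsuccessful

print(knn_predict(train_x, train_y, (1.1, 0.9)))
print(knn_predict(train_x, train_y, (5.1, 5.0)))
```

Because kNN relies on distances, categorical GTD attributes would need a numeric encoding and comparable scales before a sketch like this could be applied to them.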
26.4 RESULT
According to the analysis in Figure 26.8, Iraq is the nation most affected by
terrorism, followed by Pakistan and Afghanistan; India is the fourth most affected
country.
The visual representation in Figure 26.9 shows that in South Asia, facility/
infrastructure attacks, followed by armed attacks, were the primary attack types
used by the terrorists belonging to particular groups.
Figure 26.10 shows that the Taliban became the most active terrorist group worldwide
after the year 2000.
The visual representation in Figure 26.11 reveals the relationship between the
number of attacks and the number of people killed in a particular country.
Figure 26.12 shows that May, April, and August are the most exposed months, with
the highest risk ratios for terrorist attacks (10.05%, 9.68%, and 9.25%,
respectively), and that the 25th and 15th of every month are the riskiest days for a
terrorist attack compared to other days of the month. The 25th has the highest risk
ratio, 3.67%, followed by the 15th with 3.61%, as shown in Figure 26.13.
Month Percentage
5 10.05%
4 9.68%
8 9.25%
6 9.03%
7 9.00%
11 8.57%
10 8.27%
1 8.14%
3 7.44%
9 7.02%
12 6.96%
2 6.57%
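The month and day percentages are shares of all recorded incidents. A short sketch of that calculation follows; the function name and the ten sample month values are hypothetical, and the real figures come from the full GTD rather than from this toy list.

```python
from collections import Counter


def share_by_value(values):
    """Return {value: percentage of all records}, ordered from highest share to lowest."""
    counts = Counter(values)
    total = sum(counts.values())
    shares = {value: round(100 * n / total, 2) for value, n in counts.items()}
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))


# Hypothetical 'imonth' values for ten incidents
months = [5, 5, 5, 4, 4, 8, 8, 1, 2, 5]
month_share = share_by_value(months)
print(month_share)
```

The same function applied to the 'iday' column would give the per-day shares.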
Day Percentage
1 3.48%
2 3.36%
3 3.11%
4 3.28%
5 2.87%
6 3.02%
7 3.52%
8 3.52%
9 3.16%
10 3.53%
11 3.24%
12 3.30%
13 3.42%
14 3.12%
15 3.61%
16 3.32%
17 3.10%
18 3.10%
19 3.33%
20 3.18%
21 3.37%
22 3.01%
23 3.06%
24 3.13%
25 3.67%
26 3.47%
27 3.40%
28 3.30%
29 2.96%
30 3.15%
31 1.74%
Further, the terrorist attack prediction model, created in the Orange data mining
tool to predict whether terrorist attacks are successful, is shown in Figure 26.14.
The terrorist attack prediction model produces its results with several machine
learning prediction algorithms: RF, neural network, naive Bayes, logistic
regression, kNN, CN2 rule inducer, decision tree, and SVM.
The area under the ROC curve (AUC), classification accuracy (CA), F1, precision,
and recall for predicting whether an attack is successful are shown in
Figure 26.15.
The tree representation of attack severity prediction for successful terrorist
attacks, produced with machine learning tree algorithms, is shown in Figure 26.16.
The severity of a terrorist attack is predicted with an accuracy of 89.8 percent
using the neural network algorithm, along with others, as shown in Table 26.1.
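To make the reported metrics concrete, the sketch below computes CA, precision, recall, and F1 from a confusion matrix, assuming the per-class values are averaged with weights proportional to class size. The averaging scheme and the 898-to-102 class split are assumptions chosen for illustration. The example is a baseline classifier that always predicts the larger class, which is a useful reference point when reading accuracy tables for imbalanced classes.

```python
def weighted_metrics(confusion):
    """Support-weighted precision, recall and F1 from a square confusion matrix.

    confusion[i][j] = number of instances of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n_classes)) / total
    precision = recall = f1 = 0.0
    for c in range(n_classes):
        support = sum(confusion[c])                   # true instances of class c
        predicted = sum(row[c] for row in confusion)  # instances predicted as class c
        p = confusion[c][c] / predicted if predicted else 0.0
        r = confusion[c][c] / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"CA": accuracy, "precision": precision, "recall": recall, "F1": f1}


# Hypothetical 1,000 attacks: 898 of class 0, 102 of class 1,
# and a baseline that predicts class 0 for every attack.
always_majority = [[898, 0],
                   [102, 0]]
scores = weighted_metrics(always_majority)
print({name: round(value, 3) for name, value in scores.items()})
```

A baseline of this kind cannot rank attacks at all, so its AUC would sit near 0.5 even though its accuracy is high. This is why AUC is worth reading alongside CA.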
TABLE 26.1
Attack Severity Prediction Accuracy Matrix
Model AUC CA F1 Precision Recall
Neural Network 0.504 0.898 0.849 0.806 0.898
Naive Bayes 0.503 0.898 0.849 0.806 0.898
Logistic Regression 0.504 0.898 0.849 0.806 0.898
KNN 0.502 0.898 0.849 0.806 0.898
Random Forest 0.505 0.898 0.849 0.823 0.898
Decision Tree 0.504 0.898 0.849 0.821 0.898
CN2 Rule Inducer 0.504 0.898 0.849 0.821 0.898
SVM 0.502 0.501 0.595 0.816 0.501
When the output of the proposed system is compared with existing systems, the
proposed system reveals some critical information about past terrorist attacks. It
also produces better results for the prediction of terrorist attacks than existing
systems that use the same data set. The model recommended by Sattar et al., 2021
[14] predicts with an accuracy of 98.12%; Pan, 2021 [13] proposed a system that
predicts terrorist attacks with an accuracy between 96.82% and 97.16%; Olabanjo
et al., 2021 [10] proposed a hybridized classical model with an accuracy of 97.81%;
and Huamaní et al., 2020 [7] predict terrorist attacks with an accuracy between
75.45% and 90.414%, which is significantly lower than the proposed system. With
better pre-processing techniques and attribute selection, the proposed system
achieves an accuracy of 98.2% for the prediction of attacks and 89.8% for the
prediction of the success or severity of an attack.
Table 26.2 compares the proposed system with the existing systems that use
different methods and practices on the GTD.
TABLE 26.2
A Comparison of Intended Systems Work on GTD
Model Proposed By Accuracy (%)
Sattar et al., 2021 [14] 98.12%
Pan, 2021 [13] 96.82%–97.16%
Olabanjo et al., 2021 [10] 97.81%
Huamaní et al., 2020 [7] 75.45%–90.414%
Singh et al., 2019 [15] 82%
Agarwal et al., 2019 [1] 68%–82%
Toure et al., 2016 [17] 96.3%
Gao et al., 2019 [4] 94.8%
Proposed work 98.20%
26.5 CONCLUSION
This chapter compared eight machine learning prediction algorithms for the
proposed system: RF, neural network, naive Bayes, logistic regression, kNN,
CN2 rule inducer, tree, and SVM. The experiment shows that RF, neural network,
naive Bayes, kNN, and logistic regression have the highest CA, up to 98.2%,
followed by logistic regression and CN2 rule inducer, which have the same
classification accuracy but a low AUC. SVM produces the weakest result, with an
accuracy of 7.9% and an AUC of 50%.
For the prediction of attack severity, RF produces the best results, with an
accuracy of 89.8% and an AUC of 50.5%, while SVM again produces the worst accuracy,
50.10%.
Data visualization and machine learning prediction algorithms disclose a great deal
of vital information about attack patterns, weapons used, and the most vulnerable
locations, months, and days. May is the most vulnerable month, and the 25th and
15th are the most vulnerable days of the month. The analysis also shows that every
terrorist group has its own geographical area of operation: the most active
terrorist groups operate in different geographical locations and never operate in
other groups’ areas. Other findings reveal that the Taliban and the Kurdistan
Workers’ Party (PKK) have an attack pattern that makes them almost unpredictable.
Our prediction system also predicts future terrorist attacks, which is one of the
basic but essential problems of law enforcement agencies.
REFERENCES
Agarwal, P., Sharma, M., & Chandra, S. (2019). Comparison of machine learning approaches
in the prediction of terrorist attacks. 2019 12th International Conference on
Contemporary Computing (IC3), 1–7. https://doi.org/10.1109/IC3.2019.8844904.
Alhamdani, R. S., Abdullah, M. N., & Sattar, I. A. (2018). Recommender system for global
terrorist database based on deep learning. International Journal of Machine Learning and
Computing, 8(6), 6.
Bhatia, K., Chhabra, B., & Kumar, M. (2020). Data analysis of various terrorism activities
using big data approaches on global terrorism database. 2020 6th International
Conference on Parallel, Distributed and Grid Computing (PDGC), 137–140.
https://doi.org/10.1109/PDGC50313.2020.9315784.
Gao, Y., Wang, X., Chen, Q., Guo, Y., Yang, Q., Yang, K., & Fang, T. (2019). Suspects
prediction towards terrorist attacks based on machine learning. 2019 5th International
Conference on Big Data and Information Analytics (BigDIA), 126–131.
https://doi.org/10.1109/BigDIA.2019.8802726.
Hegde, L. V., Sreelakshmi, N., & Mahesh, K. (2016). Visual analytics of terrorism data. 2016
IEEE International Conference on Cloud Computing in Emerging Markets (CCEM),
90–94. https://doi.org/10.1109/CCEM.2016.024.
Huamaní, E. L., Mantari, A., & Roman-Gonzalez, A. (2020). Machine learning techniques to
visualize and predict terrorist attacks worldwide using the global terrorism database.
International Journal of Advanced Computer Science and Applications, 11(4).
https://doi.org/10.14569/IJACSA.2020.0110474.
Kolajo, T., & Daramola, O. (2017). Leveraging big data to combat terrorism in developing
countries. 2017 Conference on Information Communication Technology and Society
(ICTAS), 1–6. https://doi.org/10.1109/ICTAS.2017.7920662.
Olabanjo, O. A., Aribisala, B. S., Mazzara, M., & Wusu, A. S. (2021). An ensemble machine
learning model for the prediction of danger zones: Towards a global counter-terrorism.
Soft Computing Letters, 3, 100020. https://doi.org/10.1016/j.socl.2021.100020.
Ozgul, F., Erdem, Z., & Bowerman, C. (2009). Prediction of past unsolved terrorist attacks.
2009 IEEE International Conference on Intelligence and Security Informatics, 37–42.
https://doi.org/10.1109/ISI.2009.5137268.
Pagan, J. V. (2010). Improving the classification of terrorist attacks a study on data
pre-processing for mining the global terrorism database. 2010 2nd International
Conference on Software Technology and Engineering, 5608902.
https://doi.org/10.1109/ICSTE.2010.5608902.
Pan, X. (2021). Quantitative analysis and prediction of global terrorist attacks based on machine
learning. Scientific Programming, 2021, 1–15. https://doi.org/10.1155/2021/7890923.
Sattar, I. A., Alhamdani, R. S., & Abdulla, M. N. (2021). Design and implementation
recommender system for Iraqi terrorist database based on deep learning. 2021 7th International
Engineering Conference “Research & Innovation amid Global Pandemic” (IEC), 32–36.
https://doi.org/10.1109/IEC52205.2021.9476083.
Singh, K., Chaudhary, A. S., & Kaur, P. (2019). A machine learning approach for enhancing
defence against global terrorism. 2019 12th International Conference on Contemporary
Computing (IC3), 1–5. https://doi.org/10.1109/IC3.2019.8844947.
Spiliotopoulos, D., Vassilakis, C., & Margaris, D. (2019). Data-driven country safety
monitoring terrorist attack prediction. Proceedings of the 2019 IEEE/ACM International
Conference on Advances in Social Networks Analysis and Mining, 1128–1135.
https://doi.org/10.1145/3341161.3343527.
Toure, I., & Gangopadhyay, A. (2016). Real time big data analytics for predicting terrorist
incidents. 2016 IEEE Symposium on Technologies for Homeland Security (HST), 1–6.
https://doi.org/10.1109/THS.2016.7568906.
Xia, T., & Gu, Y. (2019). Building terrorist knowledge graph from global terrorism database
and Wikipedia. 2019 IEEE International Conference on Intelligence and Security
Informatics (ISI), 194–196. https://doi.org/10.1109/ISI.2019.8823450.
Zhenkai, L., Yimin, D., & Jinping, L. (2020). Analysis model of terrorist attacks based on big
data. 2020 Chinese Control and Decision Conference (CCDC), 3622–3628.
https://doi.org/10.1109/CCDC49329.2020.9164626.
Zijuan, L., & Shuai, D. (2017). Research on prediction method of terrorist attack based on
random subspace. 2017 International Conference on Computer Systems, Electronics and
Control (ICCSEC), 320–322. https://doi.org/10.1109/ICCSEC.2017.8446815.
27 Deep Learning Approach for Identifying Bird Species
B. Harsha Vardhan, T. Monish,
P. Srihitha Chowdary, S. Ravi Kishan,
and D. Suresh Babu
27.1 INTRODUCTION
In the study of birds, knowing the species of a bird is critical. Many people enjoy
birdwatching in their free time, but their skills are often insufficient to
identify the birds they see. Even professionals such as ornithologists have
difficulty memorizing all of the species, and even the species they do remember are
hard to recognize because birds come in a wide variety of sizes, shapes, and colors.
Birds are usually recognized from a photo, an audio recording, or a video. Audio
and video can be processed to identify birds, but other sounds in the surroundings,
such as insects and real-world noise, make such recordings harder to process.
For this reason, it is better to identify birds from a photo than from audio or
video. Collecting and organizing bird photos is itself difficult, and even once
they are collected, identifying the birds by referring to books is complex and
time-consuming. Rather than depending on books, we created an interface based on a
deep learning model that identifies the species quickly.
similarity. They used the cosine similarity function. Several methods can reduce
the dimensionality of high-dimensional vectors, such as multidimensional scaling
(MDS), principal component analysis (PCA), and isometric feature mapping (Isomap).
They used the PCA method.
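Cosine similarity compares the direction of two feature vectors and ignores their length. The sketch below uses the standard definition; the three example vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical feature vectors for three items
item_a = [1.0, 2.0, 3.0]
item_b = [2.0, 4.0, 6.0]   # same direction as item_a
item_c = [3.0, 0.0, -1.0]  # orthogonal to item_a

print(cosine_similarity(item_a, item_b))
print(cosine_similarity(item_a, item_c))
```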
A. Thakur et al. [2] proposed a multi-layer alternating sparse-dense framework to
identify the species of a bird. They began by using the short-time Fourier
transform (STFT) to convert the input audio recording into a magnitude spectrogram,
using 512 FFT points on 20-ms frames with 50% overlap. With archetypal analysis
(AA), 256 archetypes were learned for each class. The frames before and after the
current frame are concatenated with it to add context to the current frame of the
spectrogram. A class’s spectrograms are then converted into a super-frame-based
matrix representation. AA is also used to handle outliers in the data. To identify
an audio clip, the bird vocalizations are segmented, and their super-frames are
converted into the compressed super-frame format.
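The framing parameters quoted above translate into sample counts once a sample rate is fixed. The sketch below assumes a 16 kHz sample rate and a Hann window, neither of which is stated in the cited work, and uses a plain discrete Fourier transform so that it needs only the standard library.

```python
import cmath
import math


def magnitude_spectrogram(signal, sample_rate, frame_ms=20, overlap=0.5, n_fft=512):
    """Return one magnitude spectrum (n_fft // 2 + 1 bins) per frame."""
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per 20-ms frame
    hop = int(frame_len * (1 - overlap))            # 50% overlap -> half a frame
    if frame_len > n_fft:
        raise ValueError("frame is longer than the FFT size")
    # Hann window reduces spectral leakage at the frame edges
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Zero-padding to n_fft is implicit: padded samples add nothing to the sum
        spectrum = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n in range(frame_len)))
            for k in range(n_fft // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


# Assumed 16 kHz sample rate; 50 ms of a 1 kHz tone stands in for a bird call
sample_rate = 16000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(800)]
spec = magnitude_spectrogram(tone, sample_rate)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin * sample_rate / 512)
```

In practice a fast Fourier transform from a numerical library would replace the inner sum, which is written out here only to keep the example dependency-free.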
B. Chandu et al. [3] proposed a two-stage method. In the first stage, they
collected sound recordings of different bird species and applied several
pre-processing techniques to them. Audio is generally recorded with a microphone,
which captures unwanted frequencies along with the bird’s sound. To reduce these,
they applied pre-emphasis, which uses a filter to suppress the unwanted
frequencies. Because an audio signal is not stationary, its properties change over
time, so they divided each recording into a number of frames and then extracted
the signal. The frame length was calculated from the total length of the audio
signal and the sampling period. After framing, they removed silence from each
frame using a thresholding function: any signal that fell below the threshold was
considered background noise. After the background noise was eliminated, the signal
was reconstructed by joining all of the frames that remained after framing and
silence removal. The next step was to create a spectrogram by transforming the
time-domain data to the frequency domain with Fourier transforms. This process was
repeated for all the audio recordings. Spectrograms make the species easier to
identify from their recordings because the spectrogram of one species differs from
that of the others. Next, they trained a neural network on these spectrograms.
This method achieves 97% accuracy.
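The pre-processing chain described above (pre-emphasis, framing, silence removal, reconstruction) can be sketched as follows. The filter coefficient 0.97, the energy threshold, the frame length, and the toy signal are assumptions for illustration and are not taken from the cited work.

```python
def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n - 1]."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def split_frames(signal, frame_len):
    """Cut the signal into consecutive, non-overlapping frames (the tail is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def remove_silence(frames, threshold):
    """Keep only frames whose mean energy reaches the threshold."""
    return [f for f in frames if sum(x * x for x in f) / len(f) >= threshold]


def reconstruct(frames):
    """Join the remaining frames back into one signal."""
    return [x for frame in frames for x in frame]


# Toy signal: loud segment, near-silent segment, loud segment
signal = [0.5, -0.5, 0.5, -0.5, 0.01, -0.01, 0.01, -0.01, 0.4, -0.4, 0.4, -0.4]
frames = split_frames(pre_emphasis(signal), frame_len=4)
voiced = remove_silence(frames, threshold=0.1)
print(len(frames), len(voiced), len(reconstruct(voiced)))
```

The reconstructed signal would then be converted to a spectrogram before training.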
C. N. Silla et al. [4] proposed a method with three steps: feature extraction,
model construction, and evaluation of the model’s performance. They used the
MARSYAS framework to extract features and took three approaches to classification.
The first is flat classification, the second is a classifier with a local model
per parent node, and the third is a global-model hierarchical classification
method. The flat approach is an ordinary multi-class classification and can be used
for problems whose instances are labeled down to the leaf nodes; for it, they used
the classic naïve Bayes algorithm. For the local-model hierarchical classification,
they used the local classifier per parent node (LCPN) approach, in which a
multi-class classifier is trained for each non-leaf node of the class hierarchy,
again using naïve Bayes, and testing proceeds top-down. In the third technique,
global-model hierarchical classification, a single model predicts the class at any
level of the hierarchy; for this, they used the global model naïve Bayes (GMNB)
algorithm. In their experiments, the global approach achieved higher accuracy than
the flat and local-model approaches. These experiments covered 74 bird species only.
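The local classifier per parent node idea can be illustrated with a small top-down sketch. A nearest-centroid classifier is swapped in for naïve Bayes to keep the example short, and the two-level hierarchy, species names, and two-dimensional features are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NearestCentroid:
    """Tiny stand-in classifier (the cited work uses naive Bayes instead)."""

    def fit(self, samples, labels):
        grouped = {}
        for x, y in zip(samples, labels):
            grouped.setdefault(y, []).append(x)
        self.centroids = {
            y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in grouped.items()
        }
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda y: euclidean(x, self.centroids[y]))


def train_lcpn(samples, paths):
    """Train one classifier per parent node; paths[i] is the root-to-leaf label path."""
    per_parent = {}
    for x, path in zip(samples, paths):
        parent = "root"
        for child in path:
            xs, ys = per_parent.setdefault(parent, ([], []))
            xs.append(x)
            ys.append(child)
            parent = child
    return {p: NearestCentroid().fit(xs, ys) for p, (xs, ys) in per_parent.items()}


def predict_top_down(models, x):
    """Start at the root and follow the local predictions down to a leaf."""
    node, path = "root", []
    while node in models:
        node = models[node].predict(x)
        path.append(node)
    return path


# Hypothetical hierarchy: family -> species
samples = [(0, 0), (0, 1), (5, 5), (5, 6)]
paths = [("owls", "barn owl"), ("owls", "snowy owl"),
         ("finches", "goldfinch"), ("finches", "zebra finch")]
models = train_lcpn(samples, paths)
print(predict_top_down(models, (0.1, 0.9)))
print(predict_top_down(models, (5.2, 5.1)))
```

A flat classifier would instead train a single model over the four leaf labels, and a global model would predict the whole path at once.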
G. Sarasa et al. [5] employed the normalized compression distance (NCD) as a
similarity metric to identify bird species. They tested the performance of NCD
with six compression algorithms from different compression families. Four of them
(LZMA, LZMA2, PPMD, and Deflate) are provided by the 7z software, and the other
two (Zlib and BZlib) are included in the CompLearn Toolkit. They then applied the
MQTC-based hierarchical clustering technique. Based on their findings, NCD can be
used as an alternative method for determining the species of a bird.
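The normalized compression distance is commonly defined as NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The sketch below uses compressors from the Python standard library rather than 7z or the CompLearn Toolkit, and the byte strings are toy stand-ins for encoded recordings.

```python
import bz2
import lzma
import zlib


def ncd(x: bytes, y: bytes, compress=zlib.compress) -> float:
    """Normalized compression distance: small for similar inputs, near 1 for unrelated ones."""
    cx, cy = len(compress(x)), len(compress(y))
    cxy = len(compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy byte strings standing in for encoded bird recordings
song_a = b"tweet tweet chirp " * 40
song_b = b"tweet tweet chirp " * 39 + b"tweet chirp chirp "
noise = bytes((i * 37 + 11) % 256 for i in range(720))

for name, compressor in [("zlib", zlib.compress), ("bz2", bz2.compress),
                         ("lzma", lzma.compress)]:
    print(name,
          round(ncd(song_a, song_b, compressor), 3),
          round(ncd(song_a, noise, compressor), 3))
```

On inputs this short, the fixed overhead of each compressor affects the values noticeably, so comparisons are more meaningful on longer recordings.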
M. M. M. Sukri et al. [6] identified bird species using an artificial neural
network (ANN). They first collected bird sound recordings and pre-processed them
with the power spectral density (PSD) method. They then trained the ANN on the
pre-processed data. The classifier in the ANN is a multilayer perceptron (MLP),
which takes the chosen features as input and produces a unique output for each
bird species. Before these operations, the raw data, recorded in mp3 format, was
converted to .wav files.
M. T. Lopes et al. [7] collected 1619 song recordings of 75 bird species. For the
feature set, they compared different frameworks: MARSYAS, IOIHC, and Sound Ruler.
Their results showed that the MARSYAS feature set performed better than the other
two, so they used the MARSYAS framework to extract the features. Next, they split
all the audio recordings into pulses, which are short sound intervals with high
amplitude, using the Audacity audio processing tool. They performed the experiments
on the first and second databases along three dimensions: the use of pulses, the
choice of classifier, and the selection of the most frequent classes. They compared
the naïve Bayes, KNN, SMO-polynomial, MLP, and SMO-Pearson classifiers on the two
databases. According to their testing results for the first database for classes
3, 5, and 8, the SMO with a polynomial kernel
27.3 METHODOLOGY
27.3.1 Dataset
A large collection of images of 315 different bird species is used. For each
species, a minimum of 110 images is included to cover different shapes, sizes,
colors, and angles. The images are organized into training and testing sets.
• There are 47,555 photos of 315 distinct bird species in the dataset.
• For training, 45,980 photos from 315 species were used.
• For testing, 1,575 photos were used, 5 images for each of the 315 species.
27.3.2 Proposed Work
During training, all the images are loaded and resized to 224 by 224 pixels, and
image augmentation is applied to them to avoid overfitting. The augmentation
includes shear, zoom, and horizontal flip transformations. Figure 27.1 shows the
image augmentation of a bird.
• The shear range is used for slanting the image: one axis is fixed and the
image is stretched at a certain angle.
• The zoom range is used to zoom the image. A value less than 1.0 zooms in on
the image, and a value greater than 1.0 zooms out.
• The horizontal flip mirrors the image horizontally, swapping its left and right sides.
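Two of these transformations are simple enough to show on a tiny example. The sketch below treats an image as a list of pixel rows and a shear as a coordinate mapping. It illustrates the geometry only and is not the augmentation code used for the model; the function names and the 45-degree angle are assumptions.

```python
import math


def horizontal_flip(image):
    """Mirror an image (list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


def shear_point(x, y, angle_degrees):
    """Shear along x: y stays fixed, x shifts in proportion to y."""
    return x + y * math.tan(math.radians(angle_degrees)), y


image = [[1, 2, 3],
         [4, 5, 6]]
print(horizontal_flip(image))
print(shear_point(0.0, 10.0, 45.0))
```

Flipping twice returns the original image, which is a quick sanity check when wiring up an augmentation pipeline.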
27.3.3 Proposed Diagram
Figure 27.2 shows the process as a diagram. First, the dataset of different bird
species is loaded and pre-processing is applied. We applied transformations such as
rescaling, shear, zoom, and horizontal flip to avoid overfitting. After the
transformations, the species are labeled 0, 1, 2, . . ., which maps each bird to
its species index; these indices are what the model predicts. The next step is to
train the model. The training dataset consists of 45,980 images of 315 different
bird species. We used the Adam optimizer with categorical cross-entropy as the loss
function and trained for 50 epochs. During training, the model learns feature maps
for characteristics such as the body, color, and size of the birds, and the trained
model is then saved. The next step is to build an interface around the trained
model using Streamlit so that the user can conveniently upload an image. When the
user passes in an image of a bird, the model locates the bird in the image,
extracts its features, and identifies the species based on its body, color, and
size.
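The label indexing and the loss function mentioned above can be illustrated for a single sample. This is a minimal sketch: the species names are hypothetical, the sorted ordering of the indices is an assumption, and three classes stand in for the 315 used in the chapter.

```python
import math


def class_indices(species_names):
    """Map each species name to an integer index 0, 1, 2, ... (sorted order assumed)."""
    return {name: i for i, name in enumerate(sorted(set(species_names)))}


def one_hot(index, n_classes):
    """Target vector with 1.0 at the true class and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(n_classes)]


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Loss for one sample: -sum(true_i * log(pred_i))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# Hypothetical species names
indices = class_indices(["ROBIN", "ALBATROSS", "KIWI", "ROBIN"])
target = one_hot(indices["KIWI"], len(indices))
confident = categorical_cross_entropy(target, [0.05, 0.90, 0.05])
unsure = categorical_cross_entropy(target, [0.40, 0.30, 0.30])
print(indices, round(confident, 3), round(unsure, 3))
```

The loss is small when the predicted probability of the true species is high and grows as that probability falls. The optimizer works to reduce this value over the training epochs.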
27.4 ARCHITECTURE
Figure 27.3 represents the architecture of the model. The model is built from 1 input
layer, 13 Conv2D layers, 5 MaxPooling2D layers, 1 flatten layer, and 1 dense layer.
The input layer takes an image of 224 by 224 pixels and passes it to the Conv2D layers.
A Conv2D layer is a convolution layer that generates a tensor of outputs by convolving
a kernel with the layer’s input. A MaxPooling2D layer downsamples its input by taking
the maximum over each window of the size defined by the pool size. The flatten layer
converts the data into a one-dimensional array for input to the following layer. The
dense layer connects every output of the preceding layer to each of its neurons, and
each neuron produces one output.
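The effect of the pooling layers on the image size can be traced with a few lines of
Python. The sketch below assumes a VGG16-style arrangement of the 13 convolution layers
into five blocks (2, 2, 3, 3, 3) with 64 to 512 filters, "same" padding, and 2 by 2
pooling; none of these details is stated in the text, so they are labeled as
assumptions. The dense layer is assumed to have one output per species (315).

```python
def trace_shapes(input_size=224,
                 blocks=((2, 64), (2, 128), (3, 256), (3, 512), (3, 512))):
    """Follow the spatial size through conv blocks, each ended by 2x2 max pooling.

    `blocks` holds (number of conv layers, filters) per block; this layout is an
    assumption borrowed from VGG16, not taken from the chapter.
    """
    size, channels, conv_layers = input_size, 3, 0
    for convs_in_block, filters in blocks:
        conv_layers += convs_in_block   # "same" padding keeps height and width
        channels = filters              # each conv layer outputs `filters` channels
        size //= 2                      # 2x2 pooling with stride 2 halves the size
    flattened = size * size * channels  # length of the vector after the flatten layer
    return conv_layers, size, channels, flattened


conv_layers, size, channels, flattened = trace_shapes()
num_species = 315                                     # number of classes given in the text
dense_parameters = flattened * num_species + num_species   # weights plus biases
```

Under these assumptions the 224 by 224 input shrinks to 7 by 7 after five pooling
steps, and the flatten layer hands a vector of 25,088 values to the dense layer.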
27.5 RESULTS
Figure 27.4 represents the user interface. The basic design of the user interface is
developed using Streamlit. Streamlit is an open-source application framework written
in the Python language. It enables us to quickly develop web applications for data
science and machine learning. Major Python libraries like scikit-learn, Keras, PyTorch,
SymPy (LaTeX), NumPy, pandas, and Matplotlib are all compatible with it.
Figure 27.5 represents the selection of the input image. The interface shown in
Figure 27.4 has a browse file option through which the user can select the required
image. The image should be in JPG or PNG format. When the user selects the browse file
option, it asks for an image to upload, as shown in Figure 27.5.
Figure 27.6 shows the predicted species of a bird. After selecting the required image,
the user can obtain the species of the given bird image with the predict button, as
shown in Figure 27.6. Figure 27.7 represents the confusion matrix without
normalization. Figure 27.8 represents the confusion matrix with normalization, which
is plotted from the normalized values.
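The difference between the two confusion matrices can be shown with a small example.
The sketch below counts predictions per class and then divides each row by its total,
so that every row sums to 1. Row-wise normalization (by true class) is an assumption,
since the text does not say which normalization was used, and the labels are
hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count how often each true class (row) is predicted as each class (column)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[true][predicted] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its sum so entries become per-class fractions."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


true_labels = [0, 0, 1, 1, 2, 2, 2, 2]        # hypothetical ground truth
predicted_labels = [0, 1, 1, 1, 2, 2, 0, 2]   # hypothetical predictions
counts = confusion_matrix(true_labels, predicted_labels, num_classes=3)
fractions = normalize_rows(counts)
```

With normalization, the diagonal entries can be read directly as the fraction of each
species that was classified correctly, which is easier to compare when the classes have
different numbers of images.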
Figure 27.9 represents the accuracy graph. We can conclude that this model achieves
better accuracy compared to other models and gives better and more accurate results.
Figure 27.10 represents the loss graph. We can conclude that this model achieves a low
loss during training.
Figure 27.11 represents the receiver operating characteristic (ROC) curve. Each
line in Figure 27.11 corresponds to one bird species in the dataset.
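One line of such a plot can be computed by treating a single species as the positive
class and all other species as negative (one-vs-rest). The sketch below is a minimal
illustration of that idea, assuming distinct scores; the labels and scores are
hypothetical and the function names are not from the chapter.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points for one class.

    `labels` holds 1 for the species of interest and 0 for all others;
    `scores` holds the predicted probability for that species.
    Tied scores are not merged, so distinct scores are assumed.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    true_positives = false_positives = 0
    # Lower the decision threshold one sample at a time, highest score first.
    for score, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            true_positives += 1
        else:
            false_positives += 1
        points.append((false_positives / negatives, true_positives / positives))
    return points


def area_under_curve(points):
    """Integrate the ROC curve with the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


labels = [1, 0, 1, 0]            # hypothetical: 1 = this species, 0 = any other
scores = [0.9, 0.8, 0.6, 0.3]    # hypothetical predicted probabilities
curve = roc_points(labels, scores)
auc = area_under_curve(curve)
```

Repeating this for every species gives one curve per class, as in Figure 27.11.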
REFERENCES
[1] A. K. Reyes and J. E. Camargo, “Visualization of Audio Records for Automatic Bird
Species Identification,” 2015 20th Symposium on Signal Processing, Images and Com-
puter Vision (STSIVA), 2015, pp. 1–6, doi: 10.1109/STSIVA.2015.7330415.
[2] A. Thakur, V. Abrol, P. Sharma and P. Rajan, “Compressed Convex Spectral Embed-
ding for Bird Species Classification,” 2018 IEEE International Conference on Acous-
tics, Speech and Signal Processing (ICASSP), 2018, pp. 261–265, doi: 10.1109/
ICASSP.2018.8461814.
[3] B. Chandu, A. Munikoti, K. S. Murthy, G. Murthy and C. Nagaraj, “Automated Bird
Species Identification using Audio Signal Processing and Neural Networks,” 2020
International Conference on Artificial Intelligence and Signal Processing (AISP), 2020,
pp. 1–5, doi: 10.1109/AISP48273.2020.9073584.
[4] C. N. Silla and C. A. A. Kaestner, “Hierarchical Classification of Bird Species Using
Their Audio Recorded Songs,” 2013 IEEE International Conference on Systems, Man,
and Cybernetics, 2013, pp. 1895–1900, doi: 10.1109/SMC.2013.326.
28.1 INTRODUCTION
In recent times artificial intelligence (AI) has gained relevance in various sectors.
The term is broad, and almost all sectors utilize this advanced technology. AI is
becoming more and more important in the energy sector because its potential is
enormous and it can shape the future design of the sector. The applications where AI
can be utilized are almost limitless: smart grids, electricity trading, sector
coupling, etc. Digitalization is here, and the energy sector is moving along with it.
AI is versatile in that it can make the energy industry more efficient and evaluate
data to improve the performance of a digitized energy sector.
As electricity supplies more departments, sectors, and applications, the power
sector is becoming the most crucial pillar of the world’s energy supply. From
conventional energy to renewable energy, AI has a part in everything. AI meets
requirements such as forecasting, coordination, digitization, and power tracking to
establish smooth operation in power grids and other parts of the energy sector. AI is
already helping to overcome the hurdles of the energy transition in various domains,
and it also concentrates on grid operation optimization, energy distribution, and
demand-side management. AI applications in the energy sector have been successful and
promising. Through innovation and an accelerated transition, energy sectors are
becoming highly efficient and interconnected for a better future [1].
However, AI holds greater potential to provide more innovations to help with
global improvement. Harnessing the power of AI for a global transition is certainly
possible, provided industry collaboration for more innovations takes place.
28.2.2 Plant Reliability
28.2.2.1 Predictive
Advanced analytics is used to monitor the performance of machines and equipment and
to predict errors, disruptions, and failures with enough lead time to plan corrective
action [1].
330 DOI: 10.1201/9781003328414-28
AI in the Energy Sector 331
28.2.2.2 Values-Based
Advanced analytics is used to find the best balance between the time and expense of
repairs and the resulting increase in plant performance [1].
28.2.3 Efficiency
The amount of electricity generated for each unit of fuel consumed is optimized [1].
28.2.4 Performance Management
A series of tools is used to help optimize the performance of various aspects of
day-to-day plant operations [1].
learning and data analytics, to optimize energy systems, improve efficiency, and enable
sustainable energy transitions. The figure provides a visual representation of the role of
AI in revolutionizing the energy sector towards cleaner and more renewable sources.
AI is no silver bullet, and no technology in any form can transform the energy sector
on its own. However, given the urgency, scale, and complexity of the global
transition, we have to be very specific about what we are looking for [2]. AI
accelerates the energy transition while also extending access to energy services,
encouraging innovation, and supporting affordable and resilient service. Now is the
time for industries and foundations to enable an AI-assisted energy future and to
build trust and collaboration around the AI-enabled transition [2].
AI and the energy sector are far more interrelated than people might imagine. AI
stands at the very centre of revolutionizing the entire energy sector in the future.
AI can efficiently help to overcome the variable and unpredictable nature of energy
supply by accelerating and adapting conventional energy sources [2].
Freezing temperatures played a big role in the Texas power outages of February 2021.
Electricity infrastructure around the world is slowly starting to fail due to weather
and climate conditions [2].
28.10.2 Energy-Efficiency Programs
The Sustainable Development Goal of energy efficiency deserves serious attention.
An AI-powered energy efficiency programme oversees energy consumption, creates
smart forecasts, and regulates peak usage. Using model-based predictive control, the
energy efficiency of a facility can be increased by 10.2% to 40% on average.
Up-to-date forecasts can be provided using predictive analytics and ML. The design
and implementation of energy-efficiency initiatives within businesses, municipali-
ties, and states are therefore based on these projections [3].
28.10.3 Smart Heaters
Due to their control over the entire heating system, smart heaters can be used in
contemporary green solutions. The key strategy here is to direct electricity sensibly
so that any unused energy can be diverted to specific locations [3]. The end-to-end
signup process for smart heaters was developed to create a self-learning home system,
replacing a legacy flow that was incomplete and missing payment capability [4].
A self-learning home system helps to reduce the cost of energy consumption. This
matters most in a heating system, an element that uses considerably high amounts of
energy. By obtaining usage data, AI can ensure that the heating coil is used only when
it is required and can control the amount of water heated, the temperature, and so
on [3].
prior intrusions. In this field, ML has already had considerable success, as seen in the
detection of and protection against Trojans, for instance [5].
Many consumers have negative views of AI, particularly in the context of smart
home devices. This makes sense given that the data are collected in the most private
sphere and disclose a lot about users. According to studies, the main barrier to the
adoption of smart metres is the uncertainty around the usage of personal data. These
worries are well-founded because there is still no law on how to handle this private
information, which is crucial for the development of the future electrical grid.
A separate criticism of AI is its own energy consumption. Large
amounts of electricity are used in the processing of data. Analyzing how to con-
struct data centres to be as energy-efficient and climate-neutral as feasible is vital
when using AI to improve the energy system [6]. This conundrum might be resolved
by placing data centres close to renewable energy production facilities, delaying
power-intensive computing operations to times when there is plenty of power avail-
able, using more energy-efficient IT hardware, or writing software that uses the least
amount of processing power possible [5].
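The idea of delaying power-intensive computing to times when plenty of power is
available can be sketched as a simple selection problem. The example below picks the
hours with the highest forecast supply for a deferrable job. It is a minimal sketch
under stated assumptions: the function name, the hourly forecast values, and the
three-hour job length are hypothetical and do not come from the cited sources.

```python
def schedule_deferrable_job(forecast_mw, job_hours):
    """Pick the `job_hours` hours with the highest forecast available power.

    `forecast_mw[h]` is the (hypothetical) forecast of available renewable
    power in hour h. Returns the chosen hours in chronological order.
    """
    ranked = sorted(range(len(forecast_mw)),
                    key=lambda hour: forecast_mw[hour],
                    reverse=True)
    return sorted(ranked[:job_hours])


# Hypothetical forecast for eight consecutive hours, in megawatts.
forecast = [5, 3, 2, 8, 12, 15, 9, 4]
chosen_hours = schedule_deferrable_job(forecast, job_hours=3)
```

A real scheduler would also have to respect deadlines and whether a job can be split
across non-adjacent hours; the sketch ignores both for brevity.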
A wide range of relevant application scenarios for AI in the energy sector is avail-
able to support the energy transition and provide a climate-friendly energy system.
But it will be essential to safeguard user information and make AI use clear and
understandable [5].
28.15 CONCLUSION
The potential for AI to speed up the global energy transition is significant.
Although AI applications in the energy sector have been creative and promising,
adoption is still relatively low. With AI, there is a fantastic chance to hasten the
development of the interconnected, highly efficient, and emission-free energy
infrastructure that we will need for a better future. The intersection of AI and energy
is a fantastic place to start for individuals wishing to alter the future of the energy
sector. Technology innovation has fundamentally altered how we think about these
two industries and how they interact. This is the perfect environment for creative
thinkers to leave their imprint, with the potential to impact the world in ways we
haven’t even imagined [7].
REFERENCES
[1] “Artificial intelligence,” Next-kraftwerke.com, 2018.
https://www.next-kraftwerke.com/knowledge/artificial-intelligence (accessed Dec. 20, 2022).
[2] B. Boswell, S. Buckley, B. Elliott, M. Melero, and M. Smith, “An AI power play: Fueling
the next wave of innovation in the energy sector,” McKinsey & Company, May 12, 2022.
https://www.mckinsey.com/capabilities/mckinsey-digital/how-we-help-clients/an-ai-
power-play-fueling-the-next-wave-of-innovation-in-the-energy-sector (accessed Dec. 20,
2022).
[3] Espen Mehlum, D. Hischier, M. Caine, and World Economic Forum, “Here’s how AI
will accelerate the energy transition,” World Economic Forum, Sep. 2021. https://www.
weforum.org/agenda/2021/09/this-is-how-ai-will-accelerate-the-energy-transition/
(accessed Dec. 20, 2022).
[4] “3 ways AI is powering innovation in the energy sector,” Aimagazine.com, May 23,
2022. https://aimagazine.com/technology/3-ways-ai-is-powering-innovation-in-the-
energy-sector (accessed Dec. 20, 2022).
[5] “Artificial intelligence (AI) in the energy sector of the future | Informatec,” Informatec.com,
2020. https://www.informatec.com/en/artificial-intelligence-ai-energy-sector-future
(accessed Dec. 20, 2022).
[6] “Artificial intelligence in energy: Use cases and solutions,” N-ix.com, Sep. 12, 2022.
https://www.n-ix.com/artificial-intelligence-in-energy/ (accessed Dec. 20, 2022).
[7] A. Israel, “How AI will transform the energy sector,” Sifted, Oct. 05, 2021. https://sifted.
eu/articles/ai-energy-transform/ (accessed Dec. 20, 2022).
29 Artificial Intelligence in Fashion Design and IPRS
Kaja Bantha Navas Raja Mohamed, Divya Batra,
Shivani, Vikor Molnar, and Sunny Raj
29.1 INTRODUCTION
Artificial intelligence (AI) is the ability of robust computer-based technology to
perform tasks and generate results that are usually produced by humans. In fashion, AI
plays major roles in advisory services, purchase and trend prediction, online fashion
retail, smart wearable products, mass customer interaction, electronic textiles, and
fashion and lifestyle accessories [1]. The use of AI alongside machine learning, deep
learning, natural language processing, visual recognition, and data analytics has
reshaped the fashion industry. Automatic tagging and virtual dressing room
applications are interesting examples. Through machine vision techniques, we can
eliminate manual tagging and streamline the fashion catalogue management process with
visual analytics and AI. Customers can vary and validate their own images based on
background, shape, texture, and pose using generative adversarial network (GAN) and
deep convolutional generative adversarial network (DCGAN) techniques. This chapter
describes how the adoption of AI has contributed to fashion design. It is mainly
divided into two parts. The first part briefly explains the role of AI in fashion
design. The second part cites design IPRs from the World Intellectual Property
Organization (WIPO) Hague design registration portal in terms of AI. Lastly, the
chapter discusses the limitations of this study and the breach of our own privacy
caused by using AI in our lives.
including healthcare, robotics, marketing, and business analytics. People often tend
to think that AI, machine learning, and deep learning are the same since they have
common applications, but they are different. AI is the science of getting machines
to mimic human behaviors, whereas machine learning is the subset of AI that focuses
on getting machines to make decisions by feeding them data. Deep learning, on the
other hand, is the subset of machine learning that uses the concept of neural networks
to solve complex problems. To sum up, AI, machine learning, and deep learning are
interconnected fields. Machine learning and deep learning aid AI by providing the set
of algorithms and neural networks to solve data-driven problems; however, AI is not
restricted to machine learning and deep learning. It covers a vast domain of fields,
including natural language processing, object detection, computer vision, robotics,
expert systems, and so on. AI can be structured along three evolutionary stages:
artificial narrow intelligence, artificial general intelligence, and artificial super
intelligence. Artificial narrow intelligence, also known as weak AI, involves applying
AI only to specific tasks, for example, Alexa, face verification on the iPhone, and the
Autopilot feature of Tesla. Such a system operates within a limited, predefined range
of functions; however sophisticated it is, it has no genuine intelligence and no
self-awareness. Artificial general intelligence, also known as strong AI, involves
machines that possess the ability to perform any intellectual task that a human being
can. Machines do not yet possess such human-like abilities. We have robust processing
units that can perform high-level computations, but they are not yet capable of
thinking and reasoning like a human. Many experts doubt that artificial general
intelligence will ever be possible. AI has an intelligent agent structure. An
intelligent agent is an entity that makes decisions on its own to put AI into action.
It works on the basis of perception and action, using three components: sensors,
actuators, and effectors. Through sensors, the agent observes changes in the outer
environment, and with actuators it performs the role of controlling and moving a
system.
Five different types of AI are shown in Figure 29.1.
and time-consuming tasks. Because of this, the designer can spend the saved time
on idea building by focusing more on creativity and design aspects. AI acts like a
personal assistant for designers; one example is the Sensei Stitch of Adobe. It helps
in creating better marketing experiences for clients and shortens the time between
ideation and execution.
AI analyzes user behavior: AI can detect what our target market or client needs
and wants. With the help of algorithms, it can identify the designs and platforms that
users like, so designers can create more user-oriented designs. It also suggests
interesting tips and tricks for enhancing creativity. For example, Prisma Labs is a
platform that helps users to express their feelings with the camera. Lensa is its
Android version. It makes complex photo edits such as face retouching and eyebrow
shading easy.
AI creates multiple versions: AI can create multiple forms after recognizing a
pattern. The algorithm extracts the colors and patterns of a design and then creates
thousands of variants within the range of the identified colors and patterns. This can
be very helpful in logo and banner design, can be used to create various unique
designs, and can give new ideas to designers when they are struggling with creativity.
Multiple visual options give customers a variety of choices and help them to choose
easily. For example, Nutella Unica used an algorithm to create millions of
combinations, and millions of jars were sold.
AI brings value to customers: Personalized user experience is the key to customer
satisfaction. Platforms that make users feel loved and cared for attract and
retain users. Look at Netflix. It recommends a series based on the user’s history,
and it also tells you why it is recommending the series: ‘because you watched
Lucifer’. This saves users a lot of effort. Instead of going through hundreds of
available options, the user finds the content they are interested in without any
effort, which builds a strong bond between the brand and the user. In today’s era
everyone wants individuality and personalized experiences, and the companies that
provide them to their customers gain profits along with greater customer satisfaction
and loyalty.
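The variant generation described above under "AI creates multiple versions" can be
pictured as combining every extracted color with every extracted pattern and layout.
The sketch below shows only this combinatorial step. It is an illustration under
stated assumptions and not the algorithm of any product named in the text: the
attribute lists and the function name are hypothetical.

```python
from itertools import product


def generate_variants(colors, patterns, layouts):
    """Return one design description for every color/pattern/layout combination."""
    return [
        {"color": color, "pattern": pattern, "layout": layout}
        for color, pattern, layout in product(colors, patterns, layouts)
    ]


# Hypothetical attributes extracted from an existing design.
colors = ["red", "teal", "gold"]
patterns = ["dots", "stripes"]
layouts = ["centered", "tiled"]
variants = generate_variants(colors, patterns, layouts)
```

The number of variants is the product of the list lengths (3 x 2 x 2 = 12 here), which
is why a modest set of colors and patterns can already yield thousands of designs.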
inventory tracking, reducing the risk of raw material waste. This is exactly
what AI provides to inventory management. AI in the supply chain has a pleth-
ora of applications [4–6]. Operational procurement using intelligent data and
chatbots, supply chain planning to forecast demand and supply, warehouse
management to optimize stock, faster and more accurate shipping to reduce
lead times and transportation expenses, and optimal supplier selection using
real-time data are all areas of impact in the supply chain and logistics [7].
In styling: Furthermore, the use of AI in fashion is helping each of us to
choose the best outfits that suit our body types and fashion preferences. These
AI-enabled clothing and costumes are adjusted not only for different situations
and weather but also for the user’s preferences and needs, body type, colors,
and current fashion trends [8, 9].
Customer experience: The AI customer experience creates hyper-relevant
digital ads. According to Joanna Coles, the former chief content officer of
Hearst Magazines, “People hate advertising.” This is because, according to
Marc Pritchard, the chief brand officer at Procter & Gamble, ads are often
irrelevant and sometimes “just silly, ridiculous or stupid.” Ninety-one percent
of people say advertisements are very invasive these days [10]. To deal with the
challenge, brands are using AI to show relevant ads to the viewers. Machine
learning is helping companies to predict what kind of advertisement viewers
would like to watch based on their online behavior, profile, and audience seg-
mentation [11].
Empowers personalized search: Companies are using AI to enhance the online user
experience. Consumers like to buy from brands that provide them with a more
personalized experience and individuality, as today’s generation seeks individuality
in the products and services it buys for its own use. Many e-commerce platforms
categorize and recommend products and services based on user behavior. This makes it
easier for customers to reach their choice of products, thus reducing their frustration
in getting what they want.
24/7 customer service: Every brand needs to provide customer service in order to
operate smoothly, but it is very difficult and expensive to provide it via human teams
only. AI helps companies to provide such services via email, chat, messaging, SMS, and
voice platforms, taking customer queries through their preferred channels. With human
staff only, the omnichannel experience that customers want would be hard to manage and
costly to provide.
Visual search: Visual search scans and detects user-input photos, similar to
text-based search, and returns the most relevant search results. Customers may
search for what they need without having to explain it, making online shopping
more convenient and enjoyable. Users can capture screenshots of online outfits,
recognize shoppable gear and accessories in the image, and then find the same
outfit and shop for comparable fashions using AI-enabled apps.
Automated authentication: Computer vision and machine learning are also used to
detect fashion forgeries and counterfeit items. Detecting fakes used to require
trained customs professionals or the expert eye of other law enforcement
professionals. AI systems can now detect counterfeit items that increasingly resemble
the real thing [11]. Customs and border officials are using AI to help verify the
authenticity of high-quality items that are commonly counterfeited, such as handbags
and sunglasses. Ordinary buyers may fail to spot counterfeit items from a third-party
seller while browsing vast online marketplaces. When a customer buys a product that
appears to be genuine but isn’t, it can leave a sour taste in their mouth and affect
their perception of the brand.
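The visual search described above is commonly built by turning each image into a
feature vector and returning the catalogue items whose vectors are closest to that of
the query photo. The sketch below shows only this ranking step with cosine similarity.
It is a minimal sketch under stated assumptions: the three-number feature vectors, the
item names, and the function names are hypothetical, and a real system would obtain
the vectors from a trained vision model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def visual_search(query_vector, catalogue, top_k=2):
    """Return the names of the `top_k` catalogue items most similar to the query."""
    ranked = sorted(catalogue.items(),
                    key=lambda item: cosine_similarity(query_vector, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]


# Hypothetical feature vectors for three catalogue images.
catalogue = {
    "red_dress": [0.9, 0.1, 0.0],
    "blue_jeans": [0.0, 0.2, 0.9],
    "crimson_skirt": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]          # hypothetical vector of the user's screenshot
matches = visual_search(query, catalogue)
```

The same ranking idea can support authentication: an item whose vector is very close
to a protected design but that comes from an unknown seller can be flagged for review.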
theft, including theft of digital data stored in the cards carried in your wallet.
The smart tracker comes with location tracking and anti-lost alarm system fea-
tures [14]. The alarm rings on your smartphone and smart tracker as soon as your
tracker goes out of range. The replaceable battery offers a battery life of over six
months.
Amazon’s Echo Look replaced by Style by Alexa: Amazon’s Echo Look came out in 2017
to help people get fashion and styling advice digitally [15]. But by 2020, it was
replaced by Amazon’s own Style by Alexa and the StyleSnap application or webpage.
StyleSnap is an AI-powered feature built into the Amazon app that helps users find
looks they love quickly and easily [16]. All the user has to do is take a photograph
or screenshot of an outfit and upload it to the Amazon app, which then presents items
that look just like the ones in the picture. Sometimes they are the exact same items.
Brand originality authentication AI: Dupe Killer by Deloitte: Dupe Killer is a new
piece of technology that uses AI to search for design infringements by learning the
shape or configuration of a product and seeking out copies. This is different from
detecting counterfeit goods, where the name is stolen and traded upon. Instead, Dupe
Killer operates in a world where the only clues are visual. Counterfeits claim to be
the brand, while design infringements lean on the brand’s key features without ever
mentioning the original product and are tricky to track and remedy. Some of the listed
industrial design IPRs for fashion and accessory design are shown in Figure 29.2. The
figure also includes the World Intellectual Property Organization (WIPO) reference
numbers of the industrial designs. We captured these industrial design images from the
WIPO global industrial design database [17, 18].
29.6 LIMITATIONS
During the research it was observed that the products listed under WIPO are
patented, but there are not enough AI-powered products in the world that are patented.
Although the innovations are amazing, they come with some drawbacks that have kept
inventors from gaining a patent for their technologically intelligent creations, for
example, the VOLT+ sunglasses mentioned earlier. Also, there are AI products, such as
some wallets sold through online and offline channels, that have not yet received a
patent or a trademark. The Tag8 wallet is one such example. Another factor that limits
AI in some way is that AI takes away our security, as it breaches our privacy by
capturing all our data without knowing what is personal and what is not [19, 20].
Having a smartphone is itself like giving away your privacy, for it collects every
detail of our day and stores it where we cannot reach it or prevent its misuse.
29.7 CONCLUSION
AI has been available for two decades now; what we see today is the evolution of AI
in machines and technology. But it is not limited to these; it is being incorporated
by every industry in the world because of the obvious benefits it has. There is much
more to explore about what can be done with the help of AI. AI has gained
considerable space in the fashion and design industry, with complete rights or IPRs,
especially during the last five to seven years.
REFERENCES
[1] Artificial Intelligence and the Fashion Industry, Professor Francesca Masciarelli,
Marianna Pupillo.
[2] Artificial Intelligence for Fashion: How AI is Revolutionizing the Fashion Industry by
Leanne Luce.
[3] Artificial Intelligence for Fashion Industry in the Big Data Era By Sébastien Thomassey
and Xianyi Zeng.
[4] https://www.igi-global.com/article/artificial-intelligence/211137.
[5] https://www.sciencedirect.com/science/article/pii/S014829632030583X.
[6] The Use of AI in Inventory Management - TFOT (thefutureofthings.com).
[7] https://builtin.com/artificial-intelligence/ai-in-supply-chain.
[8] How AI is Changing Fashion: Impact on the Industry with Use Cases, https://medium.
com/vsinghbisen/how-ai-is-changing-fashion-impact-on-the-industry-with-use-cases-
76f20fc5d93f.
[9] https://textilelearner.net/artificial-intelligence-in-fashion-industry.
[10] https://www.netomi.com/ai-customer-experience.
[11] https://hbr.org/2022/03/customer-experience-in-the-age-of-ai.
[12] https://www.theverge.com/2020/9/1/21404004/anti-procrastination-smart-glasses-
productivity-boosting-auctify-specs-indiegogo.
[13] https://www.forbes.com/sites/bernardmarr/2021/06/21/ai-glasses-you-can-try-on-and-
try-out-with-ar/?sh=4b22b4297e6d.
[14] https://www.croma.com/tag8-dolphin-wallet-tracker-rfid-protect-800021-black-/
p/233155?utm_source=google&utm_medium=ps&utm_campaign=sok_pla_ssc-
other_home_appliances&gclid=Cj0KCQjwxtSSBhDYARIsAEn0thRj9UvW
49WK3X8W-mvqPKOMBCV9TYvc00T4G0Hm73ZSsY8l-N8Bi-8aAufUEALw_wcB.
[15] https://www.lutzker.com/ai-and-copyright-in-the-fashion-industry/.
[16] http://tesi.luiss.it/25378/1/212661_PUPILLO_MARIANNA.pdf.
[17] https://digitalcommons.uri.edu/cgi/viewcontent.cgi?article=1007&context=tmd_major_
papers.
[18] https://www.amazon.com/stylesnap.
[19] https://www.insider.com/guides/style/amazon-stylesnap-review#click-the-camera-icon-
in-the-upper-right-hand-corner-of-the-amazon-app-1.
[20] https://www.engadget.com/2016-03-18-ralph-lauren-polotech-review.html.
30 Artificial Intelligence in Education
A Critic on English Language Teaching
Dr. Harishree and Jegan Jayapal
30.1 INTRODUCTION
Artificial intelligence (AI) is an evolving field of study which plays a crucial role in
post-pandemic education. The COVID-19 pandemic lockdown made every field rely on
web-based technology to sustain itself. The education field likewise moved away from
its traditional teaching methods and shifted to e-learning platforms. Students have
been encouraged to self-learn with the guidance of teachers through online networks.
The usage of various information and communication technology (ICT) tools has paved
the way for teachers to reach students effectively in a lockdown situation. Because of
this abrupt shift in teaching-learning platforms, the scope of artificial intelligence
in education (AIEd) has grown immensely. The term artificial intelligence has been
“used to describe a collection of technologies that can solve problems and perform
tasks to achieve defined objectives without explicit human guidance” (Schmidt &
Strasser, 2022). Consequently, we can be confident that AI can aid academicians in
achieving a successful teaching-learning process post-pandemic. Besides, AI comprises
multi-layered technologies driven by up-to-date data and algorithms that can help
education to transform according to the need of the hour.
Moreover, integrating AI-powered tutors has become a vital part of the future of
education. The approval by governments across nations of courses offered by virtual
universities (e.g., the Virtual University of India) is a case in point for the
conspicuous role of AI in education. Virtual universities can operate online without a
physical building, and universities and recognized institutes offer online degree
programmes in addition to their physical programmes. In these virtual universities,
students can earn their degrees entirely through online learning, as with the online
Master of Business Administration at the University of Illinois Urbana-Champaign.
India’s National Education Policy (NEP) 2020 has also encouraged more technology-based
courses. As per recent modifications in the educational policy of India, all
engineering graduate students must take at least one online course on massive open
online course (MOOC) platforms during their
course of study to earn their degree in India. Many foreign universities provide online
diplomas for students who can learn the subject remotely (e.g., PG Diploma Business in
Finance by the University of Essex). Most of these courses are monitored by the
course-offering faculty, and each student of the course is evaluated individually by
the teacher with the help of AI systems. Online study also makes it economically viable
for students worldwide to gain knowledge from high-quality teachers in well-reputed
universities.
DOI: 10.1201/9781003328414-30
According to Ouyang and Jiao (2021), AIEd has three paradigms: AI-directed,
AI-supported, and AI-empowered. Across these paradigms, the learner takes different
roles: recipient, collaborator, and leader. So, while using AI in teaching and
learning, the roles of the teacher and the student should be considered. As a
recipient, the learner receives the results of the tasks performed through AI; as a
collaborator, the learner uses AI to improve their learning; and as a leader, the
learner uses advances in AI to lead others while providing language learning tasks.
But there were a few issues in delivering education to the target learners in each
paradigm. For instance, AI could not identify the amount and kind of information it
requires about the learners to deliver assistance, the extent to which the learner’s
information should be integrated with AI systems, or the ways to address the
complexity of AI systems within educational contexts. Therefore, frameworks have
evolved in AIEd to help practitioners implement AI tools and techniques in classroom
teaching according to the teaching-learning environment. A few notable frameworks are
discussed in this chapter to give insight into integrating AI in the teaching-learning
process of language teaching and education. Moreover, the chapter also discusses the
usage of AI in various aspects of teaching foreign languages in the present pandemic
situation.
Later, Hwang et al. (2020) developed a framework (Figure 30.1) describing the four
roles of AI in education: intelligent tutor, intelligent tutee, intelligent learning
tool or partner, and policymaking advisor. First, the intelligent tutor works through
intelligent tutoring systems (ITSs), adaptive/personalized learning systems, and
recommendation systems to foster students’ education. For example, AutoTutor is a
dialogue-based tutoring system that instructs students in physics, computer literacy,
and critical thinking, and ASSISTments gives real-time feedback to students and
data-driven reports to teachers. Second, the intelligent tutee has received little
attention in the field of AI, but there are some examples, such as Microsoft Tay (a
chatbot), which was shut down due to inappropriate comments. Third, an intelligent
learning tool or partner helps learners collect and analyse data so that they can focus
on higher-order thinking skills (i.e., analysis, synthesis, and evaluation). Finally, a
policymaking advisor can support policymaking in the education field: AI tools help
identify current trends and issues in educational settings so that new educational
policies can be built and evaluated accordingly.
This framework helps in understanding the role of AI in education by addressing
the different roles and responsibilities in the field. Teaching, teaching tools,
policymaking, and individual attention to learners’ progress are all part of
the education system, and AI can contribute to developing each of them. It is
therefore vital for teachers and academicians to nurture AI
in our educational curriculum.
Following this, a common framework for AI in higher education was proposed
by Jantakun et al. (2021), using seven components to implement AI in higher
education: user interactive components of technology of AI, components and
technology of AI, roles of AIEd, machine learning and deep learning, decision
support system (DSS) modules (student, teaching, and research modules), application
of AI in education, and AI to enhance campus efficiencies.
Subsequently, the eXplainable AI in education (XAI-ED) framework was developed
by Khosravi et al. (2022). The framework concentrates on human-computer interaction
and the cognitive and learning sciences and has six main aspects, as represented
in Figure 30.2. It provides information on the basic concepts of AIEd: as its name
suggests, it explains the models available in AI, the approaches that can be used in
AI, the pitfalls of AI, the main stakeholders of AI, the main benefits of using this
AI interface framework, and the design of effective AI tools. In brief, the framework
considers the users, the existing tools, the issues that arise in using AI, and
effective methods for developing new and more advanced tools.
AI and machine learning technology for adaptive learning has yet to be developed
to the point of meeting all learners’ needs.
Schmidt and Strasser (2022) have proposed a domain model, an evaluation model,
and a learner model as three foundational pillars of the adaptation model for the
machines to facilitate adaptive learning. According to the authors, a domain model
contains information about the parts of the language that have been addressed in the
exercises, which address all levels of grammatical concepts. Further, the evaluation
model will assess the learners’ performance individually, and the learner model will
collect and update the learners’ proficiency with respect to the domain model’s con-
tent. Hence, this serves best to improve the learner’s adaptive learning environment.
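As a rough illustration of how the three pillars described above could fit together, the following Python sketch pairs a domain model, an evaluation model, and a learner model. It is not the implementation of Schmidt and Strasser (2022): the class and function names, the exact-match scoring rule, the starting proficiency of 0.5, and the update rate are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DomainModel:
    # Maps each exercise to the grammatical concept it practises (hypothetical data).
    exercise_concepts: dict


@dataclass
class LearnerModel:
    # Estimated proficiency per concept in [0, 1]; unseen concepts default to 0.5.
    proficiency: dict = field(default_factory=dict)

    def update(self, concept, score, rate=0.3):
        """Move the estimate for `concept` a fraction `rate` towards the latest score."""
        old = self.proficiency.get(concept, 0.5)
        self.proficiency[concept] = old + rate * (score - old)


def evaluate(answer, expected):
    """Evaluation model (deliberately crude): 1.0 for an exact match, else 0.0."""
    return 1.0 if answer.strip().lower() == expected.strip().lower() else 0.0


def next_exercise(domain, learner):
    """Adaptation step: pick the exercise whose concept has the lowest estimate."""
    return min(
        domain.exercise_concepts,
        key=lambda ex: learner.proficiency.get(domain.exercise_concepts[ex], 0.5),
    )


# Usage with made-up exercises and answers.
domain = DomainModel({"ex1": "past_simple", "ex2": "articles"})
learner = LearnerModel()
learner.update("past_simple", evaluate("went", "went"))  # correct answer
learner.update("articles", evaluate("a", "the"))         # incorrect answer
print(next_exercise(domain, learner))
```

The design choice worth noting is the separation of concerns: the evaluation function knows nothing about the learner's history, and the learner model knows nothing about how an answer was scored, so either part could be replaced by a more sophisticated component.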
In addition, Interact4School is a government-funded interdisciplinary project con-
centrating on Focus on Form (FOF). FOF is a language education approach where
learners are made aware of the grammatical form of the language features they use for
communicative purposes in the target language. In this project, a mixed-methods
approach (language testing, classroom videography, questionnaires, and interviews with
learners and teachers) has been applied to study the conditions and effects of adaptive,
individualized practice with intelligent feedback (Schmidt & Strasser, 2022). The
framework for user modelling and adaptation (FUMA) is a student-based framework that
applies machine learning to courses on MOOC-like platforms to provide personalized
tutoring to students. Machine learning has also enabled hyper-personalization, as in
Carnegie Learning, an intelligent instructional program that offers immediate guidance
to students and interacts closely with learners. Correspondingly, these AI tools and
software facilitate language tutoring and learning in our current 21st-century
technology-based education system.
Further, adaptive learning can be assisted by employing question-answering AI tools
as part of language teaching.
providing feedback and the status of the learning progress personally to all individual
learners. In connection with feedback, AI also provides tools and software that grade
learners’ performance automatically and give individual feedback, thereby reducing a
major part of teachers’ workload.
AI-based teaching, and so on. Hence, academicians should take the necessary steps
to improve educational policies, and educational institutes should cater to the needs
of upcoming AI-based education.
30.7 CONCLUSION
Many governments across the world still suggest that educational institutes use
online modes in the classroom. AIEd has become accessible to everyone worldwide,
but rural institutes still cannot use advanced technology above a certain level. Since
educational institutes have already experimented with the implementation of AI in
pedagogy, they can investigate the limitations and obstacles between the students
and AIEd. This chapter throws light on the development of AI in the educational
field through different frameworks and how AI tools can be implemented in teaching
the English language. The education tools are not limited to language tutoring, but
can be adapted to any field of study. As mentioned in the chapter, AI is still bloom-
ing, and different avenues to integrate AI into classroom teaching can be explored.
Although AI provides many advantages, there are also some practical difficulties
in applying AI in education in real life. Hence, academicians and researchers should
concentrate on possible solutions to overcome those difficulties. Eventually, AI can
provide excellent support in academic institutes’ teaching and learning processes.
AI can assist teachers in promoting students’ learning, but it is important to note
that the pedagogical intervention of teachers is crucial in any learning environment.
REFERENCES
Akhtar, B. (2015). An Automated Grading and Feedback System for a Computer Literacy
Course. Appalachian State University, Boone, NC.
Guan, C., Mou, J., & Jiang, Z. (2020). Artificial intelligence innovation in education: A twenty-
year data-driven historical analysis. International Journal of Innovation Studies, 4(4),
134–147. https://doi.org/10.1016/j.ijis.2020.09.001.
Hwang, G.-J., Xie, H., Wah, B. W., & Gašević, D. (2020). Vision, challenges, roles and research
issues of artificial intelligence in education. Computers and Education: Artificial
Intelligence, 1. https://doi.org/10.1016/j.caeai.2020.100001.
Jantakun, T., Jantakun, K., & Jantakoon, T. (2021). A common framework for artificial
intelligence in higher education (AAI-HE model). International Education Studies, 14(11).
https://doi.org/10.5539/ies.v14n11p94.
Khosravi, H., Shum, S. B., Chen, G., Conati, C., Tsai, Y.-S., Kay, J., . . . Gašević, D. (2022).
Explainable artificial intelligence in education. Computers and Education: Artificial
Intelligence, 3. https://doi.org/10.1016/j.caeai.2022.100074.
Langley, P. (2019). An integrative framework for artificial intelligence education. The Ninth
AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-19),
9670–9677.
Ouyang, F., & Jiao, P. (2021). Artificial intelligence in education: The three paradigms.
Computers and Education: Artificial Intelligence, 2.
https://doi.org/10.1016/j.caeai.2021.100020.
Ramesh, D., & Sanampudi, S. K. (2022). An automated essay scoring systems: A systematic
literature review. Artificial Intelligence Review, 55(3), 2495–2527.
https://doi.org/10.1007/s10462-021-10068-2.
Schmidt, T., & Strasser, T. (2022). Artificial intelligence in foreign language learning and
teaching: A CALL for intelligent practice. Anglistik: International Journal of English
Studies, 33(1), 165–184.
Sing, S., Das, N., Michael, R., & Tanwar, P. (2016). The question answering system using NLP
and AI. International Journal of Scientific & Engineering Research, 7(12), 55–60.
Thongprasit, J., & Wannapiroon, P. (2022). Framework of artificial intelligence learning
platform for education. International Education Studies, 15(1).
https://doi.org/10.5539/ies.v15n1p76.
Yang, S. J. H., Ogata, H., Matsui, T., & Chen, N.-S. (2021). Human-centered artificial
intelligence in education: Seeing the invisible through the visible. Computers and Education:
Artificial Intelligence, 2. https://doi.org/10.1016/j.caeai.2021.100008.
31 Review of Learning Analytics Techniques and Limitations in Higher Education
A 21st-Century Paradigm
Tuhin Utsab Paul and Tanushree Biswas
31.1 INTRODUCTION
The present century is the age of analytics, and analytics has also made inroads into the
educational sector, where it is known as learning analytics. Learning analytics is the
process of analyzing data about students and using the outcome of the analysis to improve
educational performance. These analytics also help educational institutes
to improve their educational policies and move towards a more outcome-based pedagogy.
Students around the world need advanced skills to succeed in the globalized,
knowledge-based world of today. To succeed in this highly connected, complex world,
students need certain 21st-century skills:
1. Collaboration
2. Skilled communication
3. Knowledge construction
4. Self-regulation
5. Real-world problem solving and innovation
6. Use of ICT in learning
Learning analytics can serve as a tool for assessing the outcomes of a skill-based
education system and for guiding the improvement and planning of future education policy.
The National Education Policy 2020 by the Government of India also stresses 21st-century
skill-based education and outcome-based assessment. Higher education institutions must
evaluate learners’ needs as well as the requirements of Industry 4.0 and develop
curriculum and pedagogy based on them. Learning analytics is one of the tools that help
in assessing the outcomes and performance of learners. Its main purpose is to improve
the outcome-based learning experience of students and to improve teaching pedagogy and
the teaching-learning environment for better productivity. Conventionally, questionnaires
and surveys are used for identifying the strategy of the teaching-learning environment.
DOI: 10.1201/9781003328414-31
But in the 21st century, and with the effects of a global pandemic, information and
communication technology (ICT)–based education has taken a front seat. Because teaching
has moved online, computer-aided analysis can be used in addition to survey-based analysis
to assess the learning outcomes of students. This chapter focuses on the various
learning analytics technologies that are currently in use and provides a detailed
review of cutting-edge research studies in the field of learning analytics.
Moreover, this chapter also discusses the research gaps in the area of learning
analytics and how those gaps can lead the way forward for further research.
A wide variety of technologies has been used to achieve the goals of learning
analytics, including adaptive learning, artificial intelligence, predictive
modeling, data mining, clustering, machine learning, and neural networks. Various
data mining techniques such as support vector machines, clustering, and linear
regression are used to analyze students’ performance. Statistical measures such as the
Pearson and Spearman correlation coefficients, the Bray-Curtis dissimilarity, and the
squared Euclidean distance are used for evaluating the performance of learners. With the
growth of the online teaching-learning environment, learning analytics is also used in
online teaching. Mobile technology, social network analysis, IoT technologies, and
computer vision techniques are used for learning analytics.
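To make the statistical measures named above more concrete, the following Python sketch computes a Pearson correlation, a Spearman rank correlation, a Bray-Curtis dissimilarity, and a squared Euclidean distance using only the standard library. The study-hours, exam-score, and activity-count figures are invented for illustration and do not come from any of the studies reviewed here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative count vectors."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(abs(a + b) for a, b in zip(u, v))


def squared_euclidean(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Hypothetical data: weekly study hours and exam scores for five learners.
hours = [2, 4, 6, 8, 10]
scores = [50, 55, 65, 80, 100]
print(pearson(hours, scores), spearman(hours, scores))

# Hypothetical activity counts (forum posts, quiz attempts, videos) for two learners.
learner_a = [3, 0, 5]
learner_b = [1, 2, 5]
print(bray_curtis(learner_a, learner_b), squared_euclidean(learner_a, learner_b))
```

The correlation coefficients relate two variables across learners, whereas the Bray-Curtis and squared Euclidean measures compare two learners' profiles, which is why the latter are typically used as inputs to clustering.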
Schatten et al. (2014) [6] explored how machine learning systems can be used for
specific learning requirements, such as sequencing and performance prediction, and
proposed a minimally invasive algorithm for web services–based sequencing that
integrates learning analytics with e-learning systems.
Since, in today’s Internet-driven world, learners learn not only from books and
classrooms but also from the Internet and social networking sites, Rohloff et al. [7]
applied web analytics, which is an established field of study, to learning analytics
for collecting and analyzing learners’ interactions on online learning platforms. This
work uses Google Analytics with a MOOC platform to capture behavioral data about
learners’ activities. However, the study was limited in terms of learner-specific
metrics and raised a concern about the data privacy of learners.
Ali et al. [8] developed a learning analytics tool called LOCO-Analyst, which analyzes
feedback on different aspects of teaching-learning related to student activities in a
learning content management system (LCMS). Tools such as TeacherADVisor (TADV) and
Student Inspector work on a predefined set of factors that trigger the system and alert
the teacher regarding student performance. LOCO-Analyst, on the other hand, examines
feedback from different domains by tracking the students’ activities and their
interactions among themselves as well as with the teachers, and broadly categorizes the
analysis into local and global feedback.
On the other hand, eLAT, a learning analytics toolkit developed by Dyckhoff
et al. [9], can be integrated with any virtual learning environment (VLE). It
extracts the data and segregates it by criteria such as teacher-student
profiles, lecture notes and resource materials, assignments and submission data, and
student activities like posting queries, responding, or communicating in the VLE.
The researchers experimented with a prototype at RWTH Aachen University and
developed it through several iterations. The eLAT user interface provides an
interactive panel where teachers can easily monitor the
students’ visibility in the VLE.
Some learning analytics tools have also been developed that focus on students’
posts, chats, or any other communication over a discussion forum; for example, SNAPP
[10]. Wise et al. [11] discussed how learning analytics could also play a vital role
specifically on online discussion platforms. The researchers emphasized a platform
called Visual Discussion Forum, where the tool represents all the discussions and
comments made on the platform as a tree structure. This gives the students a
clear graphical representation of the topic of conversation. The researchers further
careful analysis and evaluation, learning analytics can easily monitor the progress of
the learner and can also detect possible dropouts. Since little research
has been done on implementing learning analytics in MOOCs,
there is scope for exploring and validating the concept.
31.6 CONCLUSION
The education system has been transformed by the COVID-19 pandemic. As a result,
teaching pedagogy has changed dramatically, which has paved the way for new reforms
in the system, especially the concept of the e-classroom.
Since teaching has shifted from the physical classroom to remote, digital platforms,
keeping all the students concentrated throughout an online session has become a
daunting task for the teacher. The lack of physical classroom teaching also makes it
harder for students to focus on every topic. This could affect the quality and
effectiveness of teaching.
If learning analytics is applied to the students’ behavior during online classes,
the teacher can examine that behavior, which in turn
could help in improving teaching efficiency.
Computer vision techniques can be used to analyze student behavior patterns
throughout a series of online classes with the help of a high-definition webcam along
with headphones and a microphone. Given stable bandwidth and a stable online teaching
platform, one could analyze the attention level of a student on the topic of discussion
in the class. During an online class, the screen and the eye and head movements of
students can be recorded and analyzed with pattern recognition techniques, so that the
duration for which a student concentrates on the screen can be captured.
Hence, an estimate of the focused and concentrated learning phase can be determined.
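The estimate described above can be sketched in a few lines of Python. The sketch assumes that some upstream computer vision component (not shown, and hypothetical here) has already labelled each sampled video frame as looking at the screen or not; the function name, the sampling rate parameter, and the sample data are likewise assumptions for illustration.

```python
def attention_summary(on_screen, samples_per_second=1.0):
    """Summarise per-frame gaze flags into simple attention measures.

    on_screen: sequence of booleans, one per sampled frame, True when the
    (hypothetical) gaze estimator judged the student to be looking at the screen.
    """
    total = len(on_screen)
    focused = sum(1 for flag in on_screen if flag)

    # Longest uninterrupted run of focused frames.
    longest = current = 0
    for flag in on_screen:
        current = current + 1 if flag else 0
        longest = max(longest, current)

    return {
        "focused_seconds": focused / samples_per_second,
        "focused_fraction": focused / total if total else 0.0,
        "longest_focus_seconds": longest / samples_per_second,
    }


# Made-up flags for eight one-second samples.
flags = [True, True, False, True, True, True, False, False]
print(attention_summary(flags))
```

Reporting the longest uninterrupted run alongside the overall fraction is a deliberate choice: two students with the same total screen time may differ considerably in how sustained their attention was.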
Data mining techniques and statistical methods such as regression analysis,
regression trees, correlation matrices, Pearson and Spearman correlation, Bray-Curtis
dissimilarity, various clustering methods, and visualization techniques could be used to
assess student performance. This would give a concise picture of a teacher’s teaching
characteristics as well as the students’ concentration level during the class. The
teachers could thus identify students who need guidance and modify certain
teaching styles in their online sessions. Those analyses can also help in modifying
the teaching pedagogy to make the class more interesting and helpful to the
learners.
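As one possible illustration of how clustering might help identify students who need guidance, the Python sketch below runs a tiny two-cluster k-means over two features per student. The features (an attention fraction and a quiz score, both scaled to the range 0 to 1), the student records, and the deterministic initialisation are assumptions for this example, not a method taken from the studies reviewed in this chapter.

```python
def two_means(points, iterations=10):
    """Minimal k-means with k=2.

    Cluster 0 is initialised at the point with the lowest coordinate sum and
    cluster 1 at the point with the highest, so cluster 0 starts as the
    'low engagement' group.
    """
    centers = [min(points, key=sum), max(points, key=sum)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min((0, 1), key=lambda k: sum((a - c) ** 2 for a, c in zip(p, centers[k])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for k in (0, 1):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


def students_needing_guidance(records):
    """Return the names of students assigned to the low-engagement cluster."""
    labels, _ = two_means(list(records.values()))
    return [name for name, label in zip(records, labels) if label == 0]


# Hypothetical records: (attention fraction, quiz score).
records = {
    "s1": (0.90, 0.85),
    "s2": (0.80, 0.90),
    "s3": (0.30, 0.40),
    "s4": (0.35, 0.30),
    "s5": (0.85, 0.80),
}
print(students_needing_guidance(records))
```

A grouping like this is only a starting point for the teacher's own judgment; with real data the features would need scaling and the number of clusters would need to be chosen with care.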
REFERENCES
[1] Charles S. Elliott, “JACME2T: An industry - academic consortia to enhance continuing
engineering education”, FIE Conference, 1998.
[2] Khalid Isa, Shamsul Mohamad and Zarina Tukiran, “Development of INPLANS: An
analysis on students’ performance using neuro-fuzzy”, Symposium on Information
Technology, vol 3, pp. 1–7, 2008.
[3] Carlos Márquez-Vera, Cristóbal Romero Morales and Sebastián Ventura Soto, “Predict-
ing school failure and dropout by using data mining techniques”, IEEERITA, vol. 8,
pp. 7–14, 2013.
[4] Usamah bin Mat, Norlida Buniyamin, Pauziah Mohd Arsad and Rosni Abu Kassim, “An
overview of using academic analytics to predict and improve students’ achievement:
A proposed proactive intelligent intervention”, IEEE Conference on Engineering Edu-
cation (ICEED), pp. 126–130, 2013.
[5] Hsu-Chen Cheng and Wen-Wei Liao, “Establishing an lifelong learning environment
using IoT and learning analytics”, ICACT, pp. 1178–1183, 2012.
[6] Carlotta Schatten, Martin Wistuba, Lars Schmidt-Thieme and Sergio Gutiérrez-Santos,
“Minimal invasive integration of learning analytics services in intelligent tutoring
systems”, ICALT, pp. 746–748, 2014.
[7] T. Rohloff, S. Oldag, J. Renz and C. Meinel, “Utilizing web analytics in the context of
learning analytics for large-scale online learning,” 2019 IEEE Global Engineering Edu-
cation Conference (EDUCON), Dubai, United Arab Emirates, pp. 296–305, 2019.
[8] Liaqat Ali, Marek Hatala, Dragan Gašević and Jelena Jovanović, “A qualitative evalua-
tion of evolution of a learning analytics tool”, Computers & Education, Vol. 58, No. 1,
pp. 470–489, 2012.
[9] Anna Lea Dyckhoff, Dennis Zielke, Mareike Bültmann, Mohamed Amine Chatti and
Ulrik Schroeder, “Design and implementation of a learning analytics toolkit for teach-
ers”, Journal of Educational Technology & Society, Vol. 15, No. 3, pp. 58–76, 2012.
[10] S. Dawson, A. Bakharia and E. Heathcote, “SNAPP: Realising the affordances of real-
time SNA within networked learning environments”, Learning, pp. 125–133, 2010.
[11] Alyssa Friend Wise, Yuting Zhao and Simone Nicole Hausknecht, “Learning analyt-
ics for online discussions: A pedagogical model for intervention with embedded and
extracted analytics”, Proceedings of the Third International Conference on Learning
Analytics and Knowledge, pp. 48–56, 2013.