Debabala Swain · Prasant Kumar Pattnaik · Pradeep K. Gupta (Editors)
Machine Learning and Information Processing
Proceedings of ICMLIP 2019
Advances in Intelligent Systems and Computing
Volume 1101
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland
Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing,
Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering,
University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University,
Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas
at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao
Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology,
University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute
of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro,
Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management,
Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering,
The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications
on theory, applications, and design methods of Intelligent Systems and Intelligent
Computing. Virtually all disciplines such as engineering, natural sciences, computer
and information science, ICT, economics, business, e-commerce, environment,
healthcare, life science are covered. The list of topics spans all the areas of modern
intelligent systems and computing such as: computational intelligence, soft comput-
ing including neural networks, fuzzy systems, evolutionary computing and the fusion
of these paradigms, social intelligence, ambient intelligence, computational neuro-
science, artificial life, virtual worlds and society, cognitive science and systems,
Perception and Vision, DNA and immune based systems, self-organizing and
adaptive systems, e-Learning and teaching, human-centered and human-centric
computing, recommender systems, intelligent control, robotics and mechatronics
including human-machine teaming, knowledge-based paradigms, learning para-
digms, machine ethics, intelligent data analysis, knowledge management, intelligent
agents, intelligent decision making and support, intelligent network security, trust
management, interactive entertainment, Web intelligence and multimedia.
The publications within “Advances in Intelligent Systems and Computing” are
primarily proceedings of important conferences, symposia and congresses. They
cover significant recent developments in the field, both of a foundational and
applicable character. An important characteristic feature of the series is the short
publication time and world-wide distribution. This permits a rapid and broad
dissemination of research results.
** Indexing: The books of this series are submitted to ISI Proceedings,
EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink **
Editors

Debabala Swain
Department of Computer Science
Rama Devi Women’s University
Bhubaneswar, Odisha, India

Prasant Kumar Pattnaik
School of Computer Engineering
Kalinga Institute of Industrial Technology, Deemed to be University
Bhubaneswar, Odisha, India

Pradeep K. Gupta
Department of Computer Science and Engineering
Jaypee University of Information Technology
Waknaghat, Himachal Pradesh, India
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Mobile Device Transmission Security Policy Decision Making Using PROMETHEE
Abstract This paper identifies which brand of mobile device available in the market offers the better transmission security policy for a customer’s needs. However, each brand of mobile device has its own criteria and functions, so choosing a reasonable option among the accessible alternatives is a challenge and leads to a decision-making problem. Such problems can be addressed with a multiple-criteria decision-making (MCDM) approach. PROMETHEE is one of the decision-making methods that helps users choose an appropriate option based on their own observations and the criteria they take into consideration. The experiment has been performed on the feedback provided by users, and the outcome is recorded.
1 Introduction
The increasing demand for mobile phones among all types of users has brought a significant change in the market scenario, but mobile phones are not free from security threats [1]. The number of attackers performing malware attacks has been rising in recent years, targeting the various kinds of applications running on cell phones [2]. For example, one kind of Trojan can infect a user’s web browser and change web pages or tamper with online transactions. Several approaches may be used to shield users from this kind of attack, for example, approval and validation of online transactions, site and customer authentication, security code development, and so forth. The role of security has been acknowledged to
3 Literature Review
There are numerous MCDM procedures such as ELECTRE [3], TOPSIS [4], the analytic hierarchy process (AHP) [5], the analytic network process (ANP) [6], and GRA. Our work focuses on the PROMETHEE technique, which is based on a pairwise comparison of the alternatives with respect to each of the chosen criteria. To rank the alternatives with PROMETHEE, a preference function P(a, b) must be defined for alternatives a and b on each criterion. Let f (a) be the value of the criterion for alternative a and f (b) the value of the same criterion for alternative b. Alternative a is considered superior to b if f (a) > f (b). The preference takes values from 0 to 1 (Table 1).
A higher value in this interval indicates a stronger preference. The decision maker has to choose a specific preference function for each criterion. Figure 1 shows the six generalized criterion functions obtained from the table of criteria; all six generalized functions are linear except for the Gaussian criterion. For every criterion, the value of the preference function of action “a” relative to action “b” must be determined, and from it the index of preference of a relative to b is computed. The index of preference (IP) is calculated in the following way:
$$IP(a, b) = \sum_{j=1}^{n} W_j\, P_j(a, b), \qquad \sum_{j=1}^{n} W_j = 1 \tag{1}$$
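The net outranking flows used for ranking follow directly from Eq. (1). Below is a minimal Python sketch of a PROMETHEE-style ranking, assuming the simple “usual” preference function (P = 1 when f(a) > f(b), else 0) and illustrative data; the paper’s actual criteria values, weights, and preference functions are those of Tables 1–3.

import numpy as np

def promethee_net_flows(X, weights, pref=lambda d: (d > 0).astype(float)):
    # X: alternatives x criteria matrix (higher is better); weights sum to 1
    # pref: preference function P applied to differences d = f(a) - f(b)
    n = X.shape[0]
    pi = np.zeros((n, n))                  # aggregated preference index IP(a, b)
    for a in range(n):
        for b in range(n):
            if a != b:
                pi[a, b] = np.sum(weights * pref(X[a] - X[b]))
    phi_plus = pi.sum(axis=1) / (n - 1)    # positive outranking flow
    phi_minus = pi.sum(axis=0) / (n - 1)   # negative outranking flow
    return phi_plus - phi_minus            # net flow used for ranking

# toy example: 3 alternatives evaluated on 2 criteria (illustrative values only)
X = np.array([[7.0, 5.0], [6.0, 8.0], [5.0, 4.0]])
w = np.array([0.6, 0.4])
print(promethee_net_flows(X, w))           # higher net flow = better rank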
4 An Analogy
Consider the situation of a mobile transmission security policy: apart from the four perspectives of possession, reliability, connectivity, and confidentiality, the feedback showed that profile settings, technology, culture, and economy are other important viewpoints to consider when defining a mobile phone transmission security policy. The following part describes the eight aspects of the security policy applied in this investigation.
4.1 Criteria
The criteria for said analogy have been discussed in Table 2 with possession as C1,
reliability as C2, connectivity as C3, confidentiality as C4, economy as C5, profile
setting as C6, technology as C7, culture as C8.
Figure 1 shows the model for the mobile transmission security policy, which consists of eight criteria and 10 alternatives, with possession, reliability, connectivity, confidentiality, economy, profile settings, technology, and culture as criteria and a1–a10 as the different mobile device brands.
The feedback collected from different mobile device users is represented in tabular form in Table 3, where each column gives the value of each criterion for the corresponding brand of mobile device.
The PROMETHEE technique is applied to Table 3, and the resulting net flow is given in Table 4. The ranking is then derived from the net flow.
Table 3 Criteria, alternatives, and weights of the respective criteria
Fig. 2 Each alternative along with their net flow and rank
Hence, mobile brand a2 is better than the others in terms of security and economy, so the customer should prefer mobile brand a2. Figure 2 represents each alternative with its criteria values shown as different legends.
In Fig. 2, the net flow value of each alternative is shown as column values (left) and the rank is shown as line values (right); the alternatives are the legends. The lower the line value, the more suitable the brand of mobile device.
5 Conclusion
References
Abstract Speech is an individual’s verbal way of conversing with others, and it often reveals whether the person is in a stressful or a normal condition. Appropriately classifying speech signals into different stress types can therefore indicate whether the individual is in a fit state of mind. In this work, stress identification and classification algorithms are developed with the aid of machine learning (ML) and artificial intelligence (AI) together with MFCC feature extraction methods. The ML- and AI-based approaches use an intelligent combination of feature selection and neural optimization algorithms to train the system and to improve its classification and identification accuracy. A comparison is made with classical neural network and fuzzy inference classifiers. The proposed system is suitable for real-time speech and is language and word independent. The work is implemented in MATLAB 2014.
N. P. Dhole (B)
Department of Electronics and Telecommunication Engineering, PRMIT&R Badnera, Badnera,
Maharashtra 444701, India
e-mail: npdhole34@gmail.com
S. N. Kale
Department of Applied Electronics, Sant Gadge Baba Amravati University, Amravati,
Maharashtra 444602, India
e-mail: sujatakale@sgbau.ac.in
1 Introduction
Through speech, an individual’s features like loudness, pitch, intensity, frequency, time, amplitude, etc. can be noted, and these features can be calculated for both stressed and normal speech. It is important to detect the state of a person suffering from a high workload, noise, and physical environmental factors. The various problems faced by stressed persons are psychological, perceptual, physiological, and physical [1].
Table 1 lists the stress types and the various issues related to them. The need to study stress in speech has grown in recent years, drawing more attention to a person’s behavior and outlook. Speech-based stress identification has also gained importance through significant contributions to national and international industries as well as societies. Detecting stress in speech remains a challenge for human perception and intelligence. Stress and emotion go hand in hand; emotions may be sentiments such as fury, sorrow, cheerfulness, terror, hatred, bliss, upset, and tediousness, but the core stress issues are left uncovered by emotions alone. Stress always involves both mental and physical aspects. Knowledge of a speaker’s stressed state can also be beneficial in improving the performance of speech identification systems that interact with the human stress mechanism. Researchers have been working for decades on stress recognition from speech signals so that a person heading toward depression or suicide can be helped in time. There is scope to work in this field for the benefit of society, and it is essential to establish a system that identifies stress and its different types.
The speech data used in this research study is selected from benchmark stressed and anxiety speech databases in different languages.
The Berlin disturbing and stressed speech database (corpus) consists of ten sentences [2]. This database is in the German language and contains audio wave files of sentences spoken by both genders under seven stressed sentiment classes, all related to the anxiety of a terrified person. The recording was done under studio conditions with high-quality recording equipment. The Berlin emotional speech database is broadly used in emotional and stress speech recognition; it was chosen because the quality of its recording is very good and it is a public and popular database for emotion recognition. Berlin University created the emotional and disturbing speech database in the Department of Communication Science; it consists of 535 stressed speech wave files sampled at 16 kHz with durations of 3–8 s. The number of samples is provided in Table 2.
$$H(z) = 1 - b\,z^{-1}, \qquad 0 \le b \le 1$$
the edges of each frame. The MATLAB syntax used for hamming window is, w =
hamming (L) where w is an L point symmetric hamming window. To each frame,
discrete Fourier transform is applied and is given by,
$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j\,2\pi k n / N}, \qquad 0 \le k \le N-1$$
The Fourier spectrum of the signal is non-uniformly quantized by the Mel filter bank procedure. First, the window frames are equally spaced on the Mel scale and then transformed back to the frequency spectrum. This spaced spectrum is multiplied with the Fourier power spectrum to obtain the Mel filter bank coefficients. A Mel filter bank consists of filters that are linearly spaced at low frequencies and logarithmically spaced at high frequencies, capturing the phonetically important qualities of the stressed input signal while smoothing out irrelevant variation in the higher frequency spectrum. The Mel-frequency coefficient is calculated using the following equation.
$$F(m) = \log\!\left(\sum_{k=0}^{N-1} \lvert X(k)\rvert^{2}\, H_m(k)\right), \qquad 0 \le m \le M$$
The discrete cosine transform (DCT) of the filter bank output provides the MFCC coefficients:

$$d[n] = \sum_{m=0}^{M} F(m)\,\cos\!\left(\frac{\pi\, n\,(m-1)}{2M}\right), \qquad 0 \le n \le M$$
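A minimal Python/NumPy sketch of the MFCC pipeline described above (pre-emphasis, framing, Hamming windowing, DFT power spectrum, Mel filter bank, and DCT) is shown below. The paper’s implementation is in MATLAB; the frame length, hop size, and filter counts used here are illustrative assumptions, and librosa is used only to build the Mel filter bank.

import numpy as np
import librosa                      # used only for the Mel filter bank
from scipy.fftpack import dct

def mfcc_frames(y, sr, b=0.97, frame_len=400, hop=160, n_mels=26, n_mfcc=13):
    # Pre-emphasis filter H(z) = 1 - b*z^{-1}
    y = np.append(y[0], y[1:] - b * y[:-1])
    # Framing and Hamming windowing
    n_frames = 1 + (len(y) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hamming(frame_len)
    # DFT power spectrum |X[k]|^2
    power = np.abs(np.fft.rfft(frames, n=frame_len)) ** 2
    # Mel filter bank energies F(m) = log(sum_k |X(k)|^2 H_m(k))
    mel_fb = librosa.filters.mel(sr=sr, n_fft=frame_len, n_mels=n_mels)
    log_mel = np.log(power @ mel_fb.T + 1e-10)
    # DCT of the log filter-bank energies gives the MFCCs
    return dct(log_mel, type=2, axis=1, norm="ortho")[:, :n_mfcc]

# usage: y, sr = librosa.load("stressed_speech.wav", sr=16000); feats = mfcc_frames(y, sr)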
• Selected features.
• Number of neurons.
• Learning function.
• Training samples.
• Accuracy.
The neurons in the starting (input) layer are the selected MFCC features. These observations are stored in machine learning lookup tables (MLUTs).
Once the MLUT is filled with sufficient entries, the artificial intelligence (AI) selection step performs classification of the speech signal. It evaluates every neural net stored in the MLUT and obtains a classification result from each entry. From all these results, the most frequently occurring class is selected, and the solutions that classified the evaluation sequence into that class are marked with +1 while the rest are marked with −1. After several evaluations, all negatively marked solutions are discarded, and the MLUT filling algorithm is applied to fill those discarded slots. During this step, previously evaluated signals are also added to the training set so that the system can self-learn and adapt to changing inputs.
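The selection step described above (evaluate every stored network, keep the majority class, mark agreeing entries +1 and the rest −1, then refill the discarded slots) can be sketched as follows. This is a schematic reconstruction, not the paper’s MATLAB code; the MLUT is represented here as a plain list of trained classifier objects with a scikit-learn-style predict method, which is an assumption.

from collections import Counter

def evaluate_mlut(mlut, features):
    # mlut: list of trained classifiers (one per lookup-table entry)
    votes = [clf.predict(features.reshape(1, -1))[0] for clf in mlut]
    majority_class = Counter(votes).most_common(1)[0][0]
    marks = [+1 if v == majority_class else -1 for v in votes]   # +1 / -1 marking
    return majority_class, marks

def refill_mlut(mlut, marks, trainer, X_train, y_train):
    # discard negatively marked entries and refill the freed slots by retraining
    # on the growing training set (the self-learning step)
    kept = [clf for clf, mark in zip(mlut, marks) if mark > 0]
    while len(kept) < len(mlut):
        kept.append(trainer(X_train, y_train))
    return kept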
The proposed algorithm is tested on three different databases, and the results and
observations are described in the next section. It is also compared with existing neural
network-based approaches in terms of classification accuracy and sensitivity.
The stress detection system classifies speech into five different categories, namely:
• Stress Type 1
• Stress Type 2
• Stress Type 3
• Stress Type 4
• Normal speech or no stress.
The proposed classifier approach is compared with existing neural network-based approaches, and an improvement in overall system performance is obtained.
Eliminating noise from the speech signal is a difficult task in stressed speech recognition. An algorithm is developed using a function fitting network (FitNet) for noise elimination, which is developed and trained separately to filter noise out of stressed speech. This network has an input layer in combination with hidden and output layers. The input layer has as many neurons as samples of the noisy speech signal, the improved signal after noise removal forms the output-layer neuron, and the number of outputs is five. The FitNet has its own architecture and optimized training rules and parameters for reducing noise from the speech signal; it is capable of boosting the rate and enhancing the speech, achieving a higher PSNR and lower MSE [6].
The artificial neural network (ANN) approach and the FitNet algorithm for signal enhancement help to eliminate noise in every speech signal [7]. The optimization algorithm used in the training process is gradient descent with momentum (GDM) with the function fitting network. The whole process can be seen in Fig. 1
, which includes the original speech signal with added noise and, after filtering, the denoised speech signal, which is an untrained signal. The untrained signal is given to all seven classifiers and its accuracy is calculated.
Fig. 1 Original, noisy, and denoised waves for an untrained speech signal using FitNet
The training algorithm used here is gradient descent with momentum (GDM), which adjusts the ANN weights to decrease the MSE; it is amongst the best training algorithms for reducing the largest possible errors [9]. For training multilayer feed-forward networks, GDM is a standard optimization algorithm for improving the performance of neural network training. The GDM training algorithm revises the weights and biases of the neural network in the direction in which the error function decreases. The FitNet is built as a feed-forward network containing a hidden layer with adequate neurons, and the output is calculated directly from the input through the FitNet. The process for GDM with FitNet (a minimal sketch of the momentum update follows the list below) includes:
1. Initialization of weights.
2. Compute error signal.
3. Pass error signal through the modified subband filter.
4. Update the weight based on normalized least mean square adaptive filter.
5. Perform weight stacking to get the new weights.
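A minimal sketch of the gradient-descent-with-momentum update at the heart of steps 1–5 is given below. The paper uses MATLAB’s FitNet with the GDM training function; this NumPy version only illustrates the momentum weight update on a toy least-squares problem, with the learning rate and momentum values chosen as assumptions.

import numpy as np

def gdm_step(w, grad, velocity, lr=0.01, momentum=0.9):
    # one gradient-descent-with-momentum (GDM) weight update
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# toy use: fit y = X @ w by minimising the MSE
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 3)), rng.normal(size=100)
w, v = np.zeros(3), np.zeros(3)
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the MSE w.r.t. w
    w, v = gdm_step(w, grad, v)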
An adaptive filter is used in this algorithm for speech enhancement. Because noise is a nonstationary signal that overlaps with the frequencies of the speech signal, it reduces the identification efficiency. An adaptive filter therefore becomes important, since it operates properly in unknown and time-varying environments and improves performance by exploiting the statistical characteristics of its input.
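Step 4 above refers to a normalised least-mean-square (NLMS) adaptive filter. A minimal NumPy sketch of NLMS-based noise cancellation is shown below, assuming a correlated noise reference signal is available [8]; the filter order and step size are illustrative, and the paper’s actual enhancement is performed by the GDM-trained FitNet.

import numpy as np

def nlms_denoise(noisy, reference, order=32, mu=0.5, eps=1e-6):
    # Adaptive noise cancellation: predict the noise in `noisy` from the
    # correlated `reference` signal; the error signal is the cleaned speech.
    w = np.zeros(order)
    cleaned = np.zeros(len(noisy))
    for n in range(order, len(noisy)):
        x = reference[n - order:n][::-1]        # filter input vector
        noise_hat = w @ x                       # estimated noise sample
        e = noisy[n] - noise_hat                # error = denoised sample
        w = w + (mu / (eps + x @ x)) * e * x    # NLMS weight update
        cleaned[n] = e
    return cleaned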
4 Results
All the designed ANN classifiers, namely the multilayer perceptron, radial basis function, support vector machine, recurrent neural network, feed-forward back-propagation neural network, adaptive neuro-fuzzy inference system, and the proposed classifier, are exhaustively studied. All three databases are applied to each of these classifiers to compute the stress identification accuracy and to perform sensitivity analysis.
The APIA percentage (average percentage of identification accuracy) for the SVM classifier is shown in Table 5. The APIA values are calculated using different SVM kernels, namely linear, quadratic, polynomial, multilayer perceptron, and Gaussian, and their accuracy in identifying stress types is obtained.
Table 5 Stress APIA percentage of SVM classifier using various SVM kernels

Databases                           Linear kernel   Quadratic kernel   Polynomial kernel   Multilayer perceptron kernel   Gaussian kernel
Berlin database                     58.6            55.3               54.3                54.6                           57.5
Database developed                  57.9            54.2               53.2                57.6                           56.5
Toronto emotional speech database   58.9            53.2               56.7                58.9                           54.9
The APIA percentages for the FFBP NN on the three databases, namely Berlin, Toronto emotional, and the database developed, are displayed in Tables 6, 7, and 8. Here, the observations are considered for 12 training algorithms such as Levenberg–Marquardt (LM),
The APIA percentage for ANFIS on the three databases is shown in Tables 9 and 10. Accuracy is calculated for stress by setting the number of membership functions and varying the number of training epochs of the ANFIS classifier (Table 11).
The MFCC features are extracted for the classifiers SVM, RBF, RNN, MLP, ANFIS,
FFBPNN, and proposed classifier. Table 12 shows percentage accuracies of all neural
network classifiers for three databases (Fig. 2).
Fig. 2 Graphical representation of accuracy of different classifiers
The proposed classifier is used to classify the stress types defined as Type 1, Type 2, Type 3, Type 4, and normal. For this classifier, the stress-type and normal-speech accuracies are computed for the three databases, namely Berlin, Toronto emotional speech, and the database developed. Table 13 depicts the average APIA (percentage) of the proposed classifier for each stress type.
Table 15 Performance parameters of GDM algorithm

Database             MMSE    PSNR (dB)   Time required (s)
Berlin               0.023   27.98       61.56
TES database         0.016   34.76       64.80
Database developed   0.020   37.24       60.12
In this research work, the proposed classifier builds strong evidence that stress levels depend on gender. The gender-specific stress accuracy is shown in Table 14 for the Berlin database and the database developed; these databases comprise both male and female speech signals.
The stress classification results are also reported after noise removal. The training of the FitNet using the GDM training algorithm is shown in Table 15, which lists the MMSE, the PSNR in dB, and the time required in seconds for training with the GDM algorithm. These three parameters are calculated for all three databases. The PSNR is computed at the output, where the signal is completely free from noise.
The experiments in Table 16 are based on testing the GDM training algorithm; the sensitivity of all seven classifiers against noise is calculated.
5 Conclusions
This paper presented a proposed classifier for detecting stress from speech signals. Various classifiers, namely SVM, RBF, RNN, MLP, ANFIS, FFBP NN, and the proposed classifier, are implemented to identify stress in speech signals and compared. For the SVM classifier, the linear kernel possesses the highest accuracy. The FFBPNN classifier attains its highest APIA percentage with the OSS learning rule among all twelve learning rules. For the ANFIS classifier, the APIA percentage is highest when the number of training epochs is increased. The performance parameters MMSE, PSNR, and time of the GDM algorithm are determined for every database, wherein the ANN model FitNet is constructed to enhance the speech signal and to eliminate noise from any cluttered stressed speech signal. Finally, this work concludes that among all the classifiers, the proposed classifier possesses the highest accuracy of 97.52%, including gender-based stress identification, and its sensitivity to noise, 2.48%, is the lowest among all classifiers.
References
1. Steeneken, H.J.M., and J.H.L. Hansen. 1999. Speech under stress conditions: overview of the
effect on speech production and on the system performance. In IEEE International Conference
on Acoustics, speech, and Signal Processing (ICASSP), Atlanta, Georgia, vol. 4, 2079–2082.
2. Burkhardt, F., A. Paeschke, M. Rolfes, W.F. Sendlmeier, and B. Weiss. 2005. A database of
German emotional speech. In International Conference on Speech Communication (INTER-
SPEECH), vol. 5, 1517–1520.
3. Toronto Emotional Speech Set. Database available on http://tspace.library.utoronto.com.
4. Database Developed: By using audacity software download from http://audacity.sourceforge.
net/download.
5. Bors, A.G., and G. Gabbouj. 1994. Minimal topology for a radial basis function neural network
for pattern classification. Digital Signal Processing: A Review Journal 4 (3): 173–188.
6. Omaima, N.A., and A.L. Allaf. 2015. Removing noise from speech signals using different
approaches of artificial neural network. International Journal of Information Technology and
Computer Science 7: 8–18.
7. Karam, M., H.F. Khazaal, H. Aglan, and C. Cole. 2014. Noise removal in speech processing
using spectral subtraction. Journal of Signal and Information Processing 1026–1033.
8. Widrow, B., et al. 1975. Adaptive noise cancelling: Principles and applications. Proceedings
of the IEEE 63: 1692–1716.
9. Chatterjee, Kalyan, et al. 2013. Adaptive filtering and compression of bio-medical signals using
neural networks. International Journal of Engineering and Advanced Technology (IJEAT) 2
(3): 323–327.
10. MathWorks, Neural Network Toolbox 7.0, MathWorks Announces Release 2014a of the
MATLAB and Simulink Product Families, MathWorks Inc. 2014.
e-Classroom Using IOT to Revolutionize the Classical Way of Knowledge Sharing
1 Introduction
2 Literature Survey
Many attendance systems have been developed using RFID, and recently a lot of work has been done to improve them. In [1], the authors proposed an RFID-based attendance system in which students carry RFID tags and readers are installed in every class. A server application installed on a laptop processes the information from all the readers, which transmit it using a Wi-fi module. The proposed system may require additional hardware cost to install the RFID infrastructure. Another drawback is that tag collision will occur if more than one person enters the room at a time. Moreover, there are chances of fraudulent access.
Biometric-based attendance systems make individual identification easy without using any cards or tags, and they also provide enhanced accuracy and security. The paper [2] describes portable attendance management using fingerprint biometrics and Arduino. This approach has the drawback of additional hardware and maintenance costs; another limitation is that passing a fingerprint scanner around during a lecture can cause distraction.
Hand gesture recognition system has received great attention in the recent few
years because of its manifold applications and its ability to make human–computer
interaction easier and natural. Gestures can be static which are less complex or
dynamic with high complexity. Different methods have been proposed for acquiring the information necessary for gesture recognition. Some methods use hardware devices such as data gloves to extract gesture features. Other methods are based on the appearance of the hand, using skin color to segment the hand and processing the extracted image to identify the gesture [3]; the respective action is then taken. Proper training and storage of images are required to recognize hand gestures.
In this paper [4], interactive projector screen is controlled using hand gestures.
In the proposed system, camera detects hand gesture and applies image processing
techniques to identify the respective command stored in the database which acts
as a virtual mouse and controls the OS of the projector screen. The downside of
this methodology is that the presenter has to be in a particular area, i.e., in the range
covered by camera for proper gesture recognition. Further, it requires complex image
processing techniques for diagnosing the gestures and processing the action of the
virtual mouse, which is slower than the actual performance of a touch screen. Separating the gesture from the background is troublesome. The proposed method requires a high-definition camera to capture good-quality images, which increases the cost; extreme environmental lighting sometimes disturbs the system, and the orientation of gestures also increases its complexity.
3 Proposed System
The main aim of this module is to automate the traditional method of keeping records of attendees. Two Android applications are implemented, one for the teacher and one for the students, using IEEE 802.11a technology.
The flowcharts shown below explain the working of the Android apps for the speaker and the audience.
Steps for educator:
A. Login into the system.
B. Fill the session details.
C. Start attendance.
Steps for attendees:
A. Login into the system.
B. Scan for available Wi-fi networks.
C. Select the SSID of educator (Fig. 1).
D. Verification of SSID.
E. Authentication of user using face recognition.
In order to overcome the problem of false attendance, it has to be ensured that attendance is marked from the lecture room and that an authenticated attendee is marking it.
The first step, i.e., scanning the available APs and selecting the educator’s SSID, verifies that attendance is recorded from the room. User authentication is accomplished using a face recognition API: the captured image is compared with the copy stored in the user’s application at the time of registration. If and only if authentication is successful, the credentials are sent to the server for marking attendance (Fig. 2).
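The paper does not name the face recognition API it uses, so the following sketch uses the open-source face_recognition Python package as a stand-in to show the comparison of the registration photo with the freshly captured image; the tolerance value and file-path arguments are assumptions.

import face_recognition

def authenticate(stored_image_path, captured_image_path, tolerance=0.6):
    # compare the photo stored at registration with the newly captured one
    stored = face_recognition.load_image_file(stored_image_path)
    captured = face_recognition.load_image_file(captured_image_path)
    stored_enc = face_recognition.face_encodings(stored)
    captured_enc = face_recognition.face_encodings(captured)
    if not stored_enc or not captured_enc:
        return False                     # no face found in one of the images
    return bool(face_recognition.compare_faces(
        [stored_enc[0]], captured_enc[0], tolerance=tolerance)[0])

# attendance is marked only if authenticate(...) returns True and the
# educator's SSID check has already passed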
Speech recognition has gained a lot of importance in recent years and is used for various purposes such as dictation and offline speech recognition tools.
The purpose of this module is to make the session delivery appear natural and
hands free. This module is explained as follows:
A database of various instructions such as Next Slide, Go ahead 3 slides, etc., is maintained. The command given by the speaker is amplified using a wireless microphone; thus, the speaker need not be loud and can access the software from anywhere in the room (Fig. 3).
For further processing of this command, the Google speech recognition API is used. The output Python string from the speech recognition tool (Google API) is compared with the data in the database. When a match is found, the appropriate command is executed using a keypress event, which is simulated using the ctypes library of Python.
Pseudo Code:
A. Set the block size to be processed at once.
B. Set the sampling frequency.
C. Set the buffer size.
D. Record audio using speech recognition module of Python.
E. Send the recorded speech to Google speech recognition API.
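A minimal Python sketch of the recording-and-matching loop described by the pseudo code is given below, using the speech_recognition package and its Google recognizer; the sampling rate, buffer size, and command-to-key mapping are assumptions, and the keypress itself (done with ctypes in the paper) is only indicated in a comment.

import speech_recognition as sr

COMMANDS = {"next slide": "right", "previous slide": "left"}   # assumed key mapping

recognizer = sr.Recognizer()
with sr.Microphone(sample_rate=16000, chunk_size=1024) as source:   # steps B and C
    recognizer.adjust_for_ambient_noise(source)
    audio = recognizer.listen(source)                                # step D
try:
    text = recognizer.recognize_google(audio).lower()                # step E (Google API)
    if text in COMMANDS:
        print("matched:", text, "-> simulate keypress", COMMANDS[text])
        # the paper simulates the keypress via ctypes; pyautogui.press(COMMANDS[text])
        # would be a simpler cross-platform substitute
except sr.UnknownValueError:
    pass   # speech was unintelligible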
The design goal of this module is to digitize the sharing of notes. For implementing this, the speech recognition module on a Raspberry Pi is used. The working of this module is described as follows:
A. Content of blackboard is captured on the command given by the speaker.
B. The captured content is shared as deliverables via cloud or is directly sent to
students as an attachment.
C. The PowerPoint presentations are also shared as an email attachment.
The NEXT SLIDE command transitions the current slide to the next slide on the voice command given by the user. Similarly, the system is tested for various commands such as PREVIOUS SLIDE, GO BACK 3 SLIDES, GO FORWARD 3 SLIDES, etc., and it responds correctly to the commands.
For capturing notes, a camera is placed over the projector, and snapshots of the blackboard are captured on the command given by the speaker. The images captured through the camera are shared with the attendees via cloud or email.
5 Conclusion
In this paper, an advanced approach has been proposed to revolutionize the classical
methods of conducting a knowledge sharing seminar (or lecture). The proposed tech-
nique has overcome the drawbacks of tracking attendance using RFID. By replacing
References
1. Nguyen, H.K., and M.T. Chew. 2017. RFID-based attendance management system. In 2017
Workshop on Recent Trends in Telecommunications Research (RTTR). Palmerston North, New
Zealand: IEEE Publications.
2. Zainal, Nur Izzati, Khairul Azami Sidek, Teddy Surya Gunawan, Hasmah Manser, and Mira
Kartiwi. 2014. Design and development of portable classroom attendance system based
on Arduino and fingerprint biometric. In The 5th International Conference on Information
and Communication Technology for The Muslim World (ICT4M). Kuching, Malaysia: IEEE
Publications.
3. Jadhav, Dnyanada, and L.M.R.J. Lobo. 2014. Hand gesture recognition system to control
slide show navigation. International Journal of Application or Innovation in Engineering &
Management (IJAIEM) 3 (1).
4. Sharma, Rishabh, Raj Shikher, Nutan V. Bansode, and Prachi R. Rajarapollu. 2016. Interactive
projector screen with hand detection using gestures. In International Conference on Automatic
Control and Dynamic Optimization Techniques (ICACDOT). Pune, India: IEEE Publications.
An Efficient Heart Disease Prediction System Using Machine Learning
Abstract With changing lifestyles, large populations of the world are becoming prone to heart disease, which is now one of the topmost causes of loss of life. As the death rate from coronary disease increases, healthcare professionals depend largely on patient data to predict whether a patient is at risk of heart disease. Doctors cannot always go through every minute detail of the data and predict accurately; doing so is time consuming and risky. The aim of this paper is to find the best prediction algorithm, which can help non-specialized doctors or medical technicians in predicting the risk of the disease. The prediction system uses different machine learning algorithms such as logistic regression, support vector machine, k-nearest neighbor, Gaussian naïve Bayes, decision tree classifier, and random forest classifier. The prediction accuracy of logistic regression is found to be the highest among all, at 88.29%.
1 Introduction
D. Swain (B)
Department of IT-MCA, Vishwakarma Institute of Technology, Pune, India
e-mail: debabrata.swain7@yahoo.com
P. Ballal · V. Dolase
Department of Masters in Computer Application, Vishwakarma Institute of Technology, Pune,
India
B. Dash
School of Computer Engineering, K.I.I.T. University, Bhubaneswar, Odisha, India
J. Santhappan
NorthBay Solutions, Houston, TX, USA
© Springer Nature Singapore Pte Ltd. 2020
D. Swain et al. (eds.), Machine Learning and Information Processing,
Advances in Intelligent Systems and Computing 1101,
https://doi.org/10.1007/978-981-15-1884-3_4
and stroke. By 2030, the number of deaths caused by heart diseases is expected to
reach 23.3 million [1]. Coronary heart disease is an abnormality in the blood vessels
that connect to the heart muscles. The most common reason behind an individual
suffering from heart disease is an unhealthy diet, physical inactivity, tobacco, alcohol, and cigarettes. The effects of these appear in very harmful ways as increased blood pressure, increased glucose, increased blood lipids, obesity, and increased body mass index (BMI). These dangerous effects on the heart can be measured at any primary medical care facility or hospital, and they indicate the chances of developing heart attack, stroke, heart failure, and many other such heart complications.
Unhealthy lifestyle leads to obesity which is one of the major reasons behind
heart stroke or attack. Because of such fatty deposit on the inner lining of blood
vessels, the blood vessels become weak and fragile. The walls of the heart thicken
with an increased pressure of blood in the arteries which causes blockage. If the
walls become thick, it slows down the flow of blood to the heart eventually leading
to heart disease which can be life threatening [2]. Heart disease risk can be avoided
or prevented by ceasing the use of cigarettes, alcohol and tobacco. Reducing the use
of salt, reducing the consumption of fried products, consuming fruits and vegetables,
daily exercise or small physical activity can also reduce the risk behind heart disease.
Medical treatment of diabetes, blood pressure and glucose can also be done to be
safe. These medical treatments can be given by medical authorities after detecting the
presence or risk of heart disease. The detection of coronary disease depends upon the
clinical information of the patient. If the clinical information is present accurately,
only then can the risk of heart disease be predicted accurately. The health care and
medical industry is regularly collecting clinical information of patients on a large
scale which can be processed. The data processing technique leads to discovery of
unknown or concealed information. This clinical information will assist the doctors
or medical authorities in taking more accurate and effective decisions which can be
lifesaving [9].
Heart disease can be prevented or addressed in an appropriate way by early detec-
tion of the risk of disease. People who have been detected early in life about the risk
of heart disease can be counseled about many ways in which they can prevent it or
be prepared for the same. Doctors or medical technicians can analyze the patient and
his/her lifestyle characteristics and predict if the patient can be prone to heart disease
or no. But, it is a time consuming and risky process to follow [10].
To predict if a person may have heart disease, immense understanding of data
and its attributes relationship with the target variable has to be studied. Machine
learning gives a wide range of algorithms which help in co-relating this relationship
and predicting the value of the target variable. The dataset that is used in this research
work is Framingham Dataset which includes 16 attributes. The factors which affect
the risk prediction of heart disease are age, number of cigarettes smoked per day,
the presence of blood pressure, diabetes, cholesterol level, body mass index (BMI),
heart rate and glucose level.
2 Literature
Aakash Chauhan, Aditya Jain et al. have studied the rules for prediction of the risk of
coronary disease among patients using evolutionary learning. A deep analysis of data
mining had been carried out to achieve higher accuracy. Computational intelligence
approach is used to find the relation among disease and patients. The dataset used for
this research is Cleveland Dataset with 13 attributes included [2]. Data preprocessing
is done to get the data clean and in quality. Keel is a Java tool to simulate evolutionary
learning, and hence, it is used [2]. An accuracy of 0.53 has been achieved on test
dataset.
Frequent pattern growth association mining is applied on dataset to obtain better
association rules. The study has evaluated to the conclusion that more the rules, better
is the prediction of coronary disease.
A. H. Chen, S. Y. Huang et al. have developed a system using artificial neural
network that evaluates the patient’s clinical data to help healthcare professionals in
predicting the heart-related risk factors. The dataset used by authors is from UCI
machine learning repository. The approach used included mainly three steps. First,
13 out of the 14 features were selected which are sex, age, trestbps, cholesterol,
chest pain type, blood sugar, number of vessels colored, thal, resting ECG, heart
rate, exercise-induced angina, slope and old peak [3]. Second, the author proposed
a classification-based algorithm which uses artificial neural network based on the
13 features. An accuracy of nearly 80% is achieved. In the third and last step, they
developed a graphical user interface to get user-friendly access.
Ankita Dewan and Meghna Sharma designed an efficient hybrid algorithm with a backpropagation approach for prediction. The authors propose using neural networks, from among the different classification techniques, for nonlinear data. The focus of the proposal is the backpropagation (BP) algorithm, considered the best classifier for artificial neural networks, which updates the weights by propagating the errors backward [4]. A drawback is also identified, namely the risk of getting stuck in a local minimum, so an efficient optimization technique is used to further improve the accuracy. The authors review various techniques that predict heart disease and propose a hybrid methodology to be implemented in the future with an accuracy of almost 100% or minimal error [4].
Aditi Gavhane, Gouthami Kokkula et al. have proposed a heart disease prediction
system using multilayer perceptron (MLP) in machine learning that gives a result if a
user may get CAD. MLP gives a better accuracy and efficiency than other algorithms
[5]. If the usability of the system increases, it will spread an awareness to reduce
the heart disease death rate. The system uses Cleveland dataset which is available in
UCI data repository. It is developed in PyCharm IDE using Python coding. Finally, a
graphical representation of the result is shown to conclude that multilayer perceptron
gives better accuracy [5].
Shashikant U. Ghumbre and Ashok A. Ghatol have suggested the implementation
of SVM for diagnosis of heart disease. A detailed description of heart disease and a
support system for decision making, which diagnoses heart disease using the support vector machine algorithm and a radial basis function network structure, are proposed by them. The data is collected from various patients. A comparison of algorithms such as support vector machine (SVM), multilayer perceptron (MLP), BayesNet, rule-based classifiers, radial basis function (RBF), and J48 is shown in the paper in tabular format [6].
The diagnosis is performed on this data and results are obtained, concluding that the support vector machine (SVM) trained with sequential minimal optimization is as good as the artificial neural network (ANN) and some other machine learning models. The overall accuracy obtained is 85.51% with fivefold cross-validation and 85.05% with tenfold cross-validation [6]. Since the performance of SVM classification is high, it can be considered for heart disease diagnosis.
R. Kavitha and E. Kannan have shown the analysis of feature selection and fea-
ture extraction in this paper. A framework is created for the proposed system. The
framework creation includes various steps which are feature extraction using princi-
pal component analysis and outlier detection. In order to obtain better performance,
wrapper filter along with feature subset selection is implemented. The scoring func-
tions like Pearson’s correlation coefficient and Euclidean distance show an average
or below average performance in comparison with the proposed system [7]. The
proposed framework not only predicts the presence of risk of heart disease but also
shows a reduction in the attributes required to diagnose heart disease. The authors also present future scope: addressing outliers at the class boundary has shown a noticeable improvement in the performance of the system, which can also be used to predict class labels that are unknown [7].
C. Sowmiya and Dr. P. Sumitra have conducted an assessment of five different
classification algorithms for heart disease prediction. The five algorithms are deci-
sion tree classifier, naïve Bayesian neural network, artificial neural network, KNN,
support vector machine. The authors have proposed an algorithm of support vec-
tor machine (SVM) and Apriori algorithm [8]. The authors have also performed
data mining technique. Data mining techniques mainly involve four sub-techniques
mainly classification, cluster, feature selection and association rule. They have con-
cluded that the techniques used in classification-based algorithm are more effective
and show better performance compared to the previous implemented algorithms. The
paper also includes brief description of the different types of heart diseases which
are angina, arrhythmia, fibrillation, congenital heart disease, coronary artery disease,
myocardial infarction and heart failure [8].
3 Logistic Regression
Classification is done on a daily basis in many ways by people, for example, determining the type of a tumor or cancer, or checking whether a received email is promotional, social media, or spam. Logistic regression is a type of classification algorithm in machine learning. It is quite similar to linear regression and is a very simple, elementary algorithm that easily solves many real-world classification problems. It can be used for binomial as well as multi-valued classification.
Logistic regression’s main purpose is to relate the data and interpret the relation shared between the dependent variable and one or more independent variables. It is an appropriate classifier to use when the target or dependent variable takes two values, that is, it is binary. There are three types of logistic regression, namely the binary, multi-valued, and ordinal classifiers.
4 Dataset Description
5 Proposed System
The objective of the system proposed in this paper is to detect the risk of heart disease using patient data. The dataset contains one dependent variable and 15 independent variables, and logistic regression has shown the best performance among the evaluated algorithms. Logistic regression originated in statistics and is used here for binary classification, i.e., for problems with two class values. The logistic (sigmoid) function is the core of logistic regression; it was originally developed for modeling population growth.
The dataset used for training and testing purpose may contain errors and dupli-
cate data. In order to create a reliable dataset and to maximize the quality of data,
the data cleaning procedure is important. The following steps for identification and
classification of errors have been used.
1. The first step used to clean the data is finding incomplete data or null values and dealing with them to improve the performance of the model. Missing values are found in 489 out of 4239 rows, which is approximately 12% of the data.
2. In this case, the appropriate solution is to remove the rows that contain null values, as sketched below.
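A minimal pandas sketch of this cleaning step, assuming the Framingham data is available as a CSV file named framingham.csv:

import pandas as pd

df = pd.read_csv("framingham.csv")                         # assumed file name
print("rows with missing values:", df.isnull().any(axis=1).sum())
df = df.dropna()                                           # drop the ~12% of rows containing nulls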
A large amount of data sometimes produces worse performance in data analytics applications. As the algorithm operates column-wise, it becomes slower as the number of data columns grows. Hence, the first step in data reduction is to minimize the number of columns in the dataset while losing the minimum possible amount of information.
For this, backward feature elimination is used, in which the process starts with all features and the least significant feature is removed in each iteration to improve the performance of the model. This process is repeated until no improvement is detected on removal of a feature. The eliminated data column is BPMeds, as it did not perform well; the remaining 14 features, which gave better performance and higher results, are selected.
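The paper does not state which estimator drives the elimination, so the sketch below uses scikit-learn's recursive feature elimination (RFE) around a logistic regression model as a stand-in for backward feature elimination; the target column name TenYearCHD is the usual Framingham label and is an assumption here.

from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X = df.drop(columns=["TenYearCHD"])      # features (df from the cleaning step above)
y = df["TenYearCHD"]                     # assumed label column name

# drop the least significant feature one at a time until 14 columns remain
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=14, step=1)
selector.fit(X, y)
X_selected = X.loc[:, selector.support_]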
A dataset may contain complex and unknown relations between variables, and discovering and classifying feature dependencies is important for any model. The Pearson coefficient is applied to identify the correlation between two features.
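A one-step pandas check of the Pearson correlations, with df as in the cleaning sketch above and again assuming the label column is named TenYearCHD:

corr = df.corr(method="pearson")              # pairwise Pearson coefficients
print(corr["TenYearCHD"].sort_values())       # correlation of each feature with the label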
5.4 Flowchart
See Fig. 1.
5.5 Algorithm
may happen: the model can overfit or underfit, which is harmful for the results of the system. In order to prevent that, the splitting proportion is taken as 90–10%.
Model Training and Testing. In model training, the relationship between a label and its features is identified. This is done by showing the model a set of examples from the dataset; each example helps to define how each feature affects the label. This is known as model training.
Logistic regression has performed the best among the evaluated algorithms. It is a supervised classifier that trains the model with labeled data, where the output is the target variable, which takes only binary discrete values for a given set of input features. The target variable is binomial, depending on the number of class values. Logistic regression has the following types:
1. Multinomial classifier: the dependent target variable can have more than two unordered classes, such as ‘Class 1’ and ‘Class 2’ versus ‘Class 3.’
2. Binomial classifier: the dependent target variable has exactly two classes, 0 and 1, representing false and true, respectively.
3. Ordinal classifier: it deals with a dependent target variable with ordered classes. A test outcome can be sectioned as very poor, poor, good, and very good, with each section given an outcome value such as 0, 1, 2, and 3 (Fig. 2).
The constant b0 moves the curve left or right, while the slope b1 defines the steepness of the curve. With a few modifications, the following equation can be derived in terms of the odds ratio:
$$\frac{p}{1-p} = \exp(b_0 + b_1 x)$$

$$p = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p)}}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
The testing data contains 376 rows whose classification values are shown in Figs. 3
and 4.
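A minimal scikit-learn sketch of the 90–10 split, training, and accuracy computation described above, reusing X_selected and y from the feature-selection sketch; the random_state is fixed only for reproducibility and is an assumption.

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_selected, y, test_size=0.10, random_state=42)        # 90-10 split

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))   # ~0.88 reported in the paper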
6 Performance Analysis
The suggested flowchart was followed to calculate and analyze the accuracy of
different classification models. Table 2 shows the accuracy percentage of each
classifier.
In this paper, several binary classification algorithms that are very useful in detecting heart disease are analyzed. Logistic regression has proven to be the best classification algorithm for classifying the risk of heart disease, with 88.29% accuracy; it is a classification technique that contributes high effectiveness and obtains higher accuracy compared to the other algorithms. Future work can be set as: (1) analyzing the performance (accuracy) of the system using parameter tuning, and (2) implementing an ensemble-based classifier to improve the performance of the system.
References
1. Ali, S., S.M. Adnan, T. Nawaz, M. Obaidullah, and Sumair Aziz. 2017. Human heart sounds
classification using ensemble methods.
2. Chauhan, A., A. Jain, P. Sharma, and V. Deep. 2018. Heart disease prediction using evolu-
tionary rule learning. In 2018 4th International Conference on Computational Intelligence &
Communication Technology (CICT ), Ghaziabad, 1–4.
3. Chen, H., S.Y. Huang, P.S. Hong, C.H. Cheng, and E.J. Lin. 2011. HDPS-Heart disease
prediction system. In 2011 Computing in Cardiology, Hangzhou, 557–560.
4. Dewan, A., and M. Sharma. 2015. Prediction of heart disease using a hybrid technique in data
mining classification. In 2015 2nd International Conference on Computing for Sustainable
Global Development (INDIACom), New Delhi, 704–706.
5. Gavhane, Aditi, Gouthami Kokkula, Isha Pandya, and Kailas Devadkar. Prediction of heart
disease using machine learning, 1275–1278. https://doi.org/10.1109/iceca.2018.8474922.
6. Ghumbre, S.U. and A.A. Ghatol. 2012. Heart disease diagnosis using machine learning algo-
rithm. In Proceedings of the International Conference on Information Systems Design and
Intelligent Applications 2012 (INDIA 2012) held in Visakhapatnam, India, January 2012, vol
132, ed. S.C. Satapathy, P.S. Avadhani, and A. Abraham. Advances in Intelligent and Soft
Computing. Berlin: Springer.
7. Kavitha, R., and E. Kannan. 2016. An efficient framework for heart disease classification using
feature extraction and feature selection technique in data mining. In 2016 International Con-
ference on Emerging Trends in Engineering, Technology and Science (ICETETS), Pudukkottai,
1–5.
8. Sowmiya, C., and P. Sumitra. 2017. Analytical study of heart disease diagnosis using classifica-
tion techniques. In 2017 IEEE International Conference on Intelligent Techniques in Control,
Optimization and Signal Processing (INCOS), Srivilliputhur, 1–5.
9. Swain, Debabrata, Santosh Pani, and Debabala Swain. 2019, April. An efficient system for the
prediction of Coronary artery disease using dense neural network with hyper parameter tuning.
International Journal of Innovative Technology and Exploring Engineering (IJITEE) 8 (6S).
ISSN: 2278–3075.
10. Swain, Debabrata, Santosh Pani, and Debabala Swain. 2019, July. Diagnosis of Coronary artery
disease using 1-D convolutional neural network. International Journal of Recent Technology
and Engineering (IJRTE) 8 (2). ISSN: 2277–3878.
Automatic Helmet Detection in Real-Time and Surveillance Video
Abstract In the current world, the number of vehicles is growing day by day, resulting in more accidents, so the helmet is one key element for ensuring the safety of bike riders. Knowing this fact, people still tend to avoid wearing helmets. The government has imposed various rules making it compulsory to wear a helmet, with fines levied on offenders, but it is not possible to track each rider with the current manual tracking and video surveillance system. A model for detecting and classifying bike riders who are wearing helmets as well as those who are not is proposed in this paper. The proposed model, trained on the COCO dataset, uses only one neural network per image and is therefore quicker than R-CNN and Fast R-CNN, which use multiple networks.
1 Introduction
not wearing helmets. Additionally, about 98 two-wheeler users without helmets lost
their lives in road accidents every day.
As per the survey of 2018, over 57% of the Indian motorcyclists do not wear
helmets. For enforcement, government has introduced many systems which used
real-time CCTV to capture riders without helmets in order to punish them and issue an
e-Challan. But this system still needs human intervention to manually identify riders
without helmets from the control room hence reduces the efficiency and effectiveness
of the model as it will not be available 24 * 7.
The problem of identifying whether a person is wearing a helmet can be categorized as an object detection problem, and object detection can be treated as either a classification or a regression problem. Techniques such as Faster R-CNN, Fast R-CNN, and R-CNN treat it as a classification problem, but the main issue with these methods is that the processing is split across a large number of convolutional evaluations (on average thousands of region proposals per image), which takes a longer time and is not well suited for real-time processing.
In contrast, You Only Look Once (YOLO) and the single-shot detector (SSD) treat the problem as a regression problem. As their names suggest, they process the image only once, with no need to process it again and again as in R-CNN and related methods, which makes YOLO and SSD faster and best suited for real-time processing as used in a real-time traffic surveillance system.
The aim of this paper is to find a method with less human intervention that achieves good accuracy in recognizing riders without helmets. The proposed approach is divided into five phases: background subtraction, object detection, classification of vehicles, segmentation of the region of interest (ROI), and detection of the helmet.
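The first phase, background subtraction, can be sketched with OpenCV as follows (an illustrative setup assuming OpenCV 4 and a local video file traffic.mp4, not the paper’s exact pipeline); the foreground blobs it yields are the candidate regions handed to the later detection and classification phases.

import cv2

cap = cv2.VideoCapture("traffic.mp4")                       # assumed input video
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)                       # moving riders become foreground
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN,
                               cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))
    # contours of the foreground blobs are candidate regions for the detector
    contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cap.release()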
2 Related Work
Noticing the seriousness of this problem, researchers have carried out a lot of work in this domain. Prasad and Sinha [1] proposed a method to detect objects in complex, light-variant, changing backgrounds; it is used to solve the problem of detecting an object of the same colour as its background. Pre-processed frames obtained using contrast stretching and histogram equalization are provided as input to the object detection module, which applies various operations such as background subtraction, colour segmentation, and morphological operations (dilation and erosion). Recognition is done by analysing the dimensions of the shapes, filtering objects by area in the binary image, and analysing the estimated perimeter.
Chiverton [2] proposed a method of helmet detection from a video feed. A classifier (SVM) was trained on histograms derived from the head regions of the image data. The motorcycle rider was then automatically segmented from the video by extracting the background, and the results for the motorcycle were obtained as sequences of regions called tracks, which were further classified and averaged. The drawback is that every circular object around the bike was recognized as a “helmet”, resulting in very low accuracy. Additionally, vehicles at speed were also identified, which needs high computation, leading to a very expensive model.
The model of Gujarathi et al. [4] counts traffic offenders. Classification of common objects was done using the COCO model, the training dataset was developed and the classification of the person riding the bike and of the helmet was done using TensorFlow, while number plate recognition was performed with optical character recognition (OCR) using Tesseract.
Aires et al. [3] implemented a helmet detection model using a hybrid descriptor based on the local binary pattern (a visual descriptor), histograms of oriented gradients (a feature descriptor), the Hough transform (a feature extraction technique) and a support vector machine classifier. Being a classification-based approach with many processing stages, it is comparatively time consuming.
Desai et al. [5] aimed to reduce the number of road accidents and developed a system to detect helmets. They divided the work into two tasks: (a) fall detection and (b) helmet detection.
Fall detection was done using background subtraction and OCR, whereas helmet detection was achieved using background subtraction followed by a HOG descriptor. The system is linked with GPS: if a fall is detected, it extracts the licence plate number using OCR, finds the owner's information and reports to the nearby hospital and the owner's family.
In [6], Duan et al. suggested a robust approach to track vehicles in real time from a single camera. To obtain faster results, an integrated memory array processor (IMAP) was used; however, the dedicated hardware requirement makes the model expensive and thus inefficient.
In [7], Dahiya et al. divided their work into two phases: (a) detection of bike riders and (b) detection of bike riders without helmets. For detecting bike riders, features were extracted using HOG, the scale-invariant feature transform (SIFT) and LBP, and an SVM classified objects into two categories, 'bike riders' and 'others'. Once the bike riders were detected, features were again extracted from the upper one-fourth of the image using HOG, SIFT and LBP, and SVM then classified each as 'helmet' or 'without helmet'.
3 Proposed Methodology
Fig. 2 A visual representation of mean subtraction of RGB values
A CNN is like any other neural network: it consists of neurons with learned weights (W) and biases (B). Every neuron receives inputs, takes their weighted sum, passes it through an activation function and produces an output. As shown in Fig. 3, a convolutional neural network has its neurons organized in three dimensions [W (width), H (height) and D (depth, i.e. the BGR colour channels)].
A CNN operates over volumes: each layer transforms a 3D input volume into a 3D output volume of neuron activations.
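As a minimal sketch of this volume-to-volume mapping (written with Keras, which is not necessarily the authors' tooling), the layer below maps the 32 × 32 × 3 input of Fig. 4 through 5 × 5 × 3 filters to 28 × 28 activation maps; the number of filters (6) is an arbitrary illustration.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(filters=6, kernel_size=(5, 5),   # 5 x 5 x 3 filters, valid convolution
                           padding="valid", activation="relu",
                           input_shape=(32, 32, 3)),         # W x H x D input volume
])
model.summary()  # each of the 6 filters yields a 28 x 28 activation map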
Fig. 4 Filter (5 × 5 × 3) over image (32 × 32 × 3) generates 2D activation map (28 × 28)
5. The Dth depth slice of size W2 × H2 in the output is the result of a valid convolution.
(*No pooling layer is used.)
Fully connected layer: each neuron is connected to all the neurons of the preceding layer.
4. Highlighting detected helmets in the frame: Darknet gives the coordinates of the regions containing helmets. Using those coordinates, a rectangular box is plotted on the frame with the label "helmet" and the confidence score.
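A minimal OpenCV sketch of this highlighting step is given below; frame, the box coordinates and confidence are hypothetical stand-ins for the values returned by Darknet.

import cv2

def highlight_helmet(frame, x, y, w, h, confidence):
    # Draw a rectangle around the detected helmet region.
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    # Attach the label and confidence score above the box.
    label = "helmet {:.2f}".format(confidence)
    cv2.putText(frame, label, (x, max(y - 5, 0)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return frame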
4 Activation Function
The task of object detection consists of identifying the image locations where objects are present and also classifying those objects. Earlier techniques, like R-CNN and its variants, use a pipeline that completes the task in multiple stages; this can be slow and hard to optimize because every component must be trained independently. We accomplish the whole task with a single neural network. A neural network is fed with a lot of information, and the activation (loss) function helps by suppressing irrelevant data points; the results obtained after applying it make the model better optimized and less prone to producing outputs far from the expected ones.
The function used in the proposed model is given below. Since there is a lot to cover, it is explained in parts.
A. First part:
\lambda_{cod} \sum_{a=0}^{L_2} \sum_{b=0}^{L_1} \mathbb{1}_{ab}^{obj} \left[ (x_a - \hat{x}_a)^2 + (y_a - \hat{y}_a)^2 \right]  (1)
(Loss function)
Equation (1) computes the loss related to the predicted bounding-box position (x, y). The function sums over each bounding-box predictor (b = 0, …, L1) of each grid cell (a = 0, …, L2). The indicator 1_ab^obj is defined as:
1, if an object is present in grid cell a and the bth bounding-box predictor is "responsible" for that prediction; 0, otherwise.
B. Second part:
\lambda_{d} \sum_{a=0}^{L_2} \sum_{b=0}^{L_1} \mathbb{1}_{ab}^{obj} \left[ \left(\sqrt{w_a} - \sqrt{\hat{w}_a}\right)^2 + \left(\sqrt{v_a} - \sqrt{\hat{v}_a}\right)^2 \right]  (2)
(Predicted box dimensions)
Using Eq. (2), the loss is linked to the predicted box dimensions, i.e. width or
height.
C. Third part:
\sum_{a=0}^{L_2} \sum_{b=0}^{L_1} \mathbb{1}_{ab}^{obj} \left( C_a - \hat{C}_a \right)^2 + \lambda_{nd} \sum_{a=0}^{L_2} \sum_{b=0}^{L_1} \mathbb{1}_{ab}^{nd} \left( C_a - \hat{C}_a \right)^2  (3)
(Confidence level)
Equation (3) calculates the loss related to the confidence level of each bounding box. C denotes the confidence level, and Ĉ is the intersection over union (IOU) of the predicted bounding box with the ground truth. The obj indicator equals one when an object is found inside the cell and 0 otherwise; the no-object indicator (nd) is its complement.
The λ parameters individually weight the components of the loss function, which is important for the stability of the model: the highest cost is given to coordinate predictions (λ_d = 5) and the lowest to confidence predictions when no object is present (λ_nd = 0.5).
D. The last section is related to classification loss:
\sum_{a=0}^{L_2} \mathbb{1}_{a}^{obj} \sum_{c \in cls} \left( p_a(c) - \hat{p}_a(c) \right)^2  (4)
(Classification loss)
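For illustration only, the NumPy sketch below evaluates the four terms of Eqs. (1)-(4) given per-cell and per-box arrays and the indicator masks; it is a simplified stand-in, not the Darknet implementation, and all array names are our assumptions.

import numpy as np

def detection_loss(xy, xy_hat, wh, wh_hat, c, c_hat, p, p_hat,
                   obj_mask, noobj_mask, cell_obj_mask,
                   lam_coord=5.0, lam_noobj=0.5):
    # xy, wh: true box centres and sizes, shape (L2, L1, 2); *_hat are predictions.
    # c: true confidence (IOU), shape (L2, L1); p: class probabilities, shape (L2, n_classes).
    coord = lam_coord * np.sum(obj_mask[..., None] * (xy - xy_hat) ** 2)        # Eq. (1)
    dims = lam_coord * np.sum(obj_mask[..., None] *
                              (np.sqrt(wh) - np.sqrt(wh_hat)) ** 2)             # Eq. (2)
    conf = (np.sum(obj_mask * (c - c_hat) ** 2)
            + lam_noobj * np.sum(noobj_mask * (c - c_hat) ** 2))                # Eq. (3)
    cls = np.sum(cell_obj_mask[:, None] * (p - p_hat) ** 2)                     # Eq. (4)
    return coord + dims + conf + cls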
The videos used for testing were captured by a CCTV camera installed on public roads, recording both day and night. The videos were captured at 30 frames per second (FPS) with an image resolution of 1280 × 720 pixels, and the clips considered were about 3 min long on average. Python and the OpenCV library were the key tools used to implement all the algorithms.
YOLOv3 performs well compared with other models, outperforming roughly 80% of existing models when evaluated with Formula (5). The neural network framework we used is Darknet, a framework written in the C language.
Fig. 5 Detection of helmets and their confidence scores after applying the algorithm to the video frames
Calculations:
Accuracy = TP / (TP + FP) × 100  (5)
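Formula (5) can be evaluated directly once the true-positive and false-positive detections have been counted; a one-line helper (ours, not the authors' code) is:

def accuracy(tp, fp):
    # Formula (5): fraction of detected helmet regions that are correct, as a percentage.
    return tp / (tp + fp) * 100.0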
6 Conclusion
References
1. Prasad, Shashank, and Shubhra Sinha. 2011. Real-time object detection and tracking in an unknown environment.
2. Chiverton, J. 2012. Helmet presence classification with motorcycle detection and tracking, 8–14.
3. Silva, Romuere, Kelson Aires, Thiago Santos, Kalyf Abdala, and Rodrigo Veras. 2013.
Automatic detection of motorcyclists without helmet, 5–11.
4. Devadiga, Kavyashree, Pratik Khanapurkar, Shubhankar Deshpande, and Yash Gujarathi. 2018.
Real time automatic helmet detection of bike riders, 2–4.
5. Desai, Maharsh, Shubham Khandelwal, Lokneesh Singh, and Shilpa Gite. 2016. Automatic
helmet detection on public roads, 1–2.
6. Duan, B., W. Liu, P. Fu, C. Yang, X. Wen, and H. Yuan. 2009, February. Real-time on road
vehicle and motorcycle detection using a single camera, 1–6.
7. Dahiya, Kunal, Dinesh Singh, C. Krishna Mohan. 2016. Automatic detection of bike-riders
without helmet using surveillance videos in real-time, 2–5.
Camera Model Identification Using
Transfer Learning
Abstract A wide range of forensic problems, from copyright infringement to ownership attribution, can be solved by identifying the camera model used to capture an image. Many methods have been proposed for camera model detection. This paper proposes a method to identify the camera model of any image, involving feature extraction and classification; CNN-based architectures are best suited for this image classification task.
1 Introduction
Consider an investigator who wants footage of a crime that was captured on camera. Even if the investigator manages to obtain the footage, the question remains whether it is genuine. The basic aim of this project is to classify images according to the camera model with which they were taken. This can help the investigation team and the general public to know the untampered truth. Further, it can stop the media from producing fake news for the sake of publicity and business, and it can also impede political parties from using false methods in campaigning to cheat the common man and come to power. This problem can be solved with machine learning using convolutional neural networks [1].
This paper proposes a unique solution to this problem by identifying the camera
model of the input image by using metadata and deep learning. Metadata can be a
source to identify the camera model from which the image/video has been captured
as it contains detailed information such as DPI, ISO number, shutter speed, and date
and time when the image was clicked.
The issue with this method is that metadata, which contains information about the make and model of the camera and the timestamp of capture, can easily be tampered with and manipulated using simple computer software. This makes it an unreliable source that cannot be trusted.
The second method uses deep learning to classify the input image according to
the camera model based on the factors such as compression ratio, pixel density, color
contrast, and other image properties which are unique to the lens of the camera. Deep
learning uses convolutional neural networks to train the model and classify images.
To increase the performance of the trained model, we used transfer learning. The
dataset consists of images from three different camera models [2]. The data had to
be pre-processed and cleaned to boost the computation and to increase the accuracy.
2 Literature Review
A SmallNet consists of three convolutional layers [3]. The image is taken as a matrix of dimensions 256 × 256, which is pre-processed by passing it through a kernel of size 5 × 5; the filtered image is of size 252 × 252. The output is then given as input to three convolutional layers producing 64, 64 and 32 feature maps of sizes 126 × 126, 63 × 63 and 63 × 63, respectively, with the ReLU activation function. The output is max pooled with 3 × 3 kernels and then classified using three fully connected layers with 256, 4096 and 3 nodes, respectively, and a softmax activation function.
The illustration is shown in Fig. 1. This model gave an accuracy of about 49.04% on the training data after 10 epochs. Three types of camera models were used in the training data (Fig. 1).
A cycle is not formed between the layers of this type of neural network: information flows in a single direction, from the input layer through the hidden layers to the output layer, and is not fed back to the input. Each layer consists of nodes, and each connection between nodes carries a weight. At each node, the products of the weights and inputs are summed and the result is compared to a threshold; the neuron fires if the value is greater than the threshold. There are three kinds of layers, viz. the input layer, hidden layers and the output layer. A neural network must have at least one hidden layer but can have as many as necessary.
The details of the architecture used are as follows (a minimal sketch is given after the list):
Input layer—8192 nodes
Hidden layer—64 nodes
Output layer—3 nodes.
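A minimal Keras sketch of this 8192-64-3 feed-forward architecture is shown below; the activation functions and optimizer are our assumptions, not necessarily those of the referred architecture.

import tensorflow as tf

ffn = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(8192,)),  # hidden layer: 64 nodes
    tf.keras.layers.Dense(3, activation="softmax"),                     # output layer: 3 camera models
])
ffn.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])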
3 CNN Architecture
CNNs have become an effective machine learning method for image recognition and classification. The basic skeleton of a CNN is: Input → Convolution layer → Pooling layer → Convolution layer → Pooling layer → Fully Connected Layer → Output [4].
Extracting features from an input image is the primary goal of convolution in a CNN. Every image is treated as a matrix of pixel values. Consider a small matrix of size n × n with pixel values 0 and 1, and another smaller matrix of size m × m (where m < n), called the feature detector. The convolution of the first matrix with the feature detector is computed, and the resultant matrix is called the 'Feature Map' [5].
The following three characteristics control the dimensions of the resultant Feature Map (a helper computing the output size is sketched after the list):
(a) Depth: The number of filters used for convolution is the Depth [4].
(b) Stride: When scanning the feature horizontally and vertically, the number of
pixels which we skip is the Stride [4].
(c) Zero-padding: Zeros are added to the matrix of Feature Map to tweak its
dimensions if required [4].
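The effect of kernel size, stride and zero-padding on the Feature Map dimensions follows the standard relation sketched below (a helper of ours, not code from the paper):

def feature_map_size(n, m, stride=1, padding=0):
    # n: input width/height, m: kernel width/height.
    return (n - m + 2 * padding) // stride + 1

print(feature_map_size(256, 5))   # 252, matching the SmallNet example above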
The pooling layer reduces the dimension of each feature map while keeping only the most important information, which makes computation faster. Max pooling is performed after ReLU is applied individually to every feature map. Together, these layers extract important features from the input images, introduce nonlinearity and make the features equivariant to scale and translation; the overall output of the convolution and pooling layers is the set of features extracted from the input images [6].
The fully connected layer takes its input from the output of the last pooling layer. It uses these features to classify the input images into the various classes, using the softmax activation function in the output layer; all neurons in the previous layer are connected to all neurons in the next layer.
The whole process of classification using convolutional neural networks can be
recapitulated as shown below:
1. Initialize all the filters with some arbitrary values.
2. Input to the network is an image which is then passed to the above-discussed
steps—convolution, ReLU, and the pooling layers followed by the fully con-
nected layer. Based on the calculations, the output probability is calculated for
every class.
3. The error is calculated at the output layer as the sum of squared differences between the expected probability and the output probability for each class. The goal is to minimize this error so that the output accuracy is maximum; this is done using the back-propagation technique in the convolutional neural network [7].
4. Using back-propagation, the gradient of error w.r.t all the weights is calculated.
Now by using gradient descent algorithm, the initially assigned arbitrary values
are updated according to the value of error.
5. Now Step 2 to Step 4 is repeated for every image in the training dataset.
Thus, this requires heavy computation, and usually, a graphics processing unit
(GPU) is preferred to boost the computation power and reduce the required overall
training time.
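As a toy NumPy illustration of steps 3 and 4 (squared error followed by a gradient-descent update), consider a single linear layer; this is a simplification for exposition, not the network actually trained here.

import numpy as np

def sgd_step(W, x, target, lr=0.01):
    out = x @ W                                # simplified forward pass (step 2)
    error = np.sum((target - out) ** 2)        # step 3: sum of squared differences
    grad = -2.0 * np.outer(x, target - out)    # step 4: gradient of the error w.r.t. W
    return W - lr * grad, error                # gradient-descent update of the weights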
Transfer learning [3] is a machine learning method in which a model created for one task is reused to develop a model for another task. The weights and biases of a pre-trained model capture the features of its original dataset. Very few neural network models are built from scratch (random initialization of weights, as discussed in Sect. 3) because it is difficult to obtain huge amounts of data for a particular use case. Hence, using the weights of an already available, relevant pre-trained network as starting weights can be very useful for building a strong and accurate classification network [8].
For example, if a model is trained on a huge dataset of car images, that model will contain features for detecting edges or horizontal/vertical lines that can be reused for other problem statements [3].
The data, i.e. the images, were pre-processed by cropping each image to a matrix of 512 × 512 pixels. The training data used for the ResNet-50 [9] CNN model contain 825 files from three different camera models, of which 80% were used for training and 20% were kept for validation; different camera-captured images were used for testing. We used the Keras framework in Python with a TensorFlow [10] backend to create the convolutional neural network for classification.
Residual networks provide much better results than many other image classification models. They prevent the saturation of accuracy as the network gets deeper by making skip connections between layers, which in turn reduces the time complexity. The version used in this project is ResNet-50, where '50' stands for 50 layers. Each skip connection jumps over two 3 × 3 convolution layers, with the last layers being unfrozen fully connected layers ending in an output layer of three nodes representing the three camera models, i.e. the three classes.
We achieved a significant improvement of 20% in accuracy by using ResNet as the transfer learning model with the initial layers frozen. ResNet gives much faster results with reduced error, thus solving the saturation problem [11].
If many layers are used in a residual network, there is a chance that the gradient may vanish, which can negatively affect the expected accuracy. To counter this, skip connections are used in ResNet-50. We fine-tuned the parameters of the fully connected layer to increase the overall accuracy of our model.
The ResNet-50 architecture is shown in Fig. 2.
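A hedged Keras sketch of this setup (ImageNet-pretrained ResNet-50 with the convolutional base frozen and a new three-node softmax head, fed 512 × 512 crops) is given below; the optimizer and pooling choice are our assumptions.

import tensorflow as tf

base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                       input_shape=(512, 512, 3), pooling="avg")
base.trainable = False                                   # freeze the pre-trained layers

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(3, activation="softmax"),      # three camera-model classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])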
4 Conclusion
The trained model has been tested, and the following results were obtained. The accuracy after 25 epochs was 80.61%, and the loss decreased to 0.6345 after using the ResNet-50 model for transfer learning.
The following results were obtained for the three classes (Table 1).
From the results, it can be seen that ResNet-50 gave the most desirable results for camera model identification when compared with the other training models. It can also be seen that ResNet-50 is considerably smaller than other models whose accuracy is close to that of ResNet-50.
The performance of the model rises more quickly during training when transfer learning is used than when the model is trained without it. With the use of skip connections, there is very little chance that deeper layers will worsen the result provided by the previous layers.
The power of ResNet can only be utilized with proper, clean input data. The accuracy can be improved significantly by using more images in the training dataset, i.e. by increasing the data, and by increasing the number of epochs.
Acknowledgements The authors are grateful to, and wish to express their profound indebtedness to, their guide Prof. Milind Kamble, Department of Electronics and Telecommunication, Vishwakarma Institute of Technology, Pune. The authors also express their gratitude to Prof. Dr. R. M. Jalnekar,
Director, and Prof. Dr. Shripad Bhatlawande, Head, Department of Electronics and Telecom-
munication, for their help in completion of the project. The authors also thank all the anonymous
reviewers of this paper whose comments helped to improve the paper.
References
1. Tuama, Amel, Frédéric Comby, Marc Chaumont. 2016. Camera model identification
with the use of deep convolutional neural networks. Available via http://www.lirmm.fr/
~chaumont/publications/WIFS-2016_TUAMA_COMBY_CHAUMONT_Camera_Model_
Identification_With_CNN_slides.pdf. Accessed January 13, 2018.
2. Antorsae. 2018. IEEE’s Signal Processing Society—Camera Model Identification Kaggle Com-
petition. Available via https://github.com/antorsae/sp-society-camera-model-identification.
Accessed March 25, 2018.
3. Shah, Anuj. 2018. Transfer learning using Keras. Available via https://github.com/
anujshah1003/Transfer-Learning-in-keras—custom-data. Accessed February 3, 2018.
4. Anonymous. 2018. Convolutional neural networks for visual recognition. Available via http://
cs231n.github.io/convolutional-networks/. Accessed February 21, 2018.
5. Bondi, Luca, Luca Baroffio, David Guera, Paolo Bestagini, Edward Delp, and Stefano Tubaro.
2015 First steps toward camera model identification with convolutional neural networks.
Journal of Latex Class Files 14: 1–4. arXiv:1603.01068.
6. Ujjwalkarn. 2016. An intuitive explanation of convolutional neural networks. Available via
https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/. Accessed March 17, 2018.
7. Kuzin, Artur, Artur Fattakhov, Ilya Kibardin, Vladimir Iglovikov, and Ruslan Dautov. 2018.
Camera model identification using convolutional neural networks, 1–4. arXiv:1810.02981.
8. Jay, Prakash. 2017. Transfer learning using Keras. Available via https://medium.com/
@14prakash/transfer-learning-using-keras-d804b2e04ef8. Accessed February 21, 2018.
9. Kaggle. 2018. ResNet-50 Pre-trained Model for Keras. Available via https://www.kaggle.com/
keras/resnet50. Accessed February 26, 2018.
10. Google. 2015. Tensorflow. Available via https://www.tensorflow.org/. Accessed January,
February 9, 2018.
11. Koustubh. 2018. ResNet, AlexNet, VGGNet, inception: Understanding various architectures
of convolutional networks. Available via http://cv-tricks.com/cnn/understand-resnet-alexnet-
vgg-inception/. Accessed March 23 2018.
12. He, K, X. Zhang, S. Ren, and J. Sun. 2015. Deep residual learning for image recognition, 1–8.
arXiv:1512.03385.
Sentiment Classification of Online
Mobile Reviews Using Combination
of Word2vec and Bag-of-Centroids
1 Introduction
customer's reviews or the ratings given by them for that product [3, 7]. The problem is that it becomes difficult to manually analyse the thousands of reviews posted daily on various digital platforms before deciding whether to buy the product. The solution is sentiment classification, which mines this large number of reviews and classifies them into different classes. Feature vector extraction is an important phase of sentiment classification, and the choice of features used can improve classification performance. A context-based feature vector, Word2vec, is used in our approach; it is more efficient than traditional feature vectors such as Bag-of-Words and term-based feature vectors [5].
The rest of the paper is divided as follows. Section 2 gives a brief review of the
related work in sentiment analysis. Section 3 explains the proposed work. Section 4
discusses and analyzes the experimental results. It also gives the conclusion of the
paper and the future aspects of the proposed work.
2 Related Work
A lot of work has been done in the field of sentiment classification during the last couple of years. Pang et al. [13] used movie review data for sentiment classification with unigrams as the feature vector; they also experimented with other feature vectors such as bigrams, POS-based features and their combinations, and found that unigram features with SVM performed best. Pang and Lee [12] introduced a novel machine learning method for sentiment classification in which only the subjective sentences were extracted as the feature vector from the whole text by applying the minimum-cut graph technique, giving a significant improvement in classification accuracy. Narayanan et al. [10] applied negation handling, used a bag of n-grams for feature extraction and mutual information for feature selection; they found a significant improvement with a naive Bayes classifier and achieved an accuracy of 88.80% on the IMDB movie reviews dataset. Bansal et al. [2] used Word2vec as the feature vector with different window sizes and dimensions and found improvement when the dimensions were increased; the combination of CBOW with random forest performed best among all the classifiers, with 90.6622% accuracy and an AUC score of 0.94. Alshari et al. [1] experimented with the Word2vec feature vector by lowering its dimensionality, clustering the word vectors around the opinion words of a sentiment dictionary; the experimental results showed that the proposed feature vector performed well in comparison with plain Word2vec, Doc2vec and Bag-of-Words feature vectors. Zhang et al. [17] proposed a method in which a candidate feature vector is obtained by clustering the Word2vec feature vectors, producing groups of synonyms close to each product feature; lexicon-based and POS-based feature selection was then applied to the clustered Word2vec features, and the results improved on a TF-IDF feature vector.
3 Proposed Method
In this section, the workflow of the proposed method is given, and then the steps of
the method are explained in detail (Fig. 1).
The corpus is taken from the Amazon unlocked mobile phone reviews publicly
available on Kaggle [6] which consists of online customer reviews of mobile phones
sold on Amazon. For the analysis purpose, we took only reviews and corresponding
rating field from the dataset. As we are doing machine learning-based classification,
the dataset is divided into training and testing set in the ratio of 70:30.
The first step consists of obtaining cleaned data from the raw data. Unwanted digits, symbols, HTML tags and stop words that do not contribute to the classification process are removed from the reviews. This step also involves stemming and converting all words from upper case to lower case.
Fig. 1 Workflow of the proposed method: pre-processing of the data reviews, Word2vec and Bag-of-Centroids feature extraction, and their combination (W2V + BOC)
The second step involves transforming the preprocessed data into the feature vector: the preprocessed data is tokenized into words and converted to a numerical representation so that it can be given as input to the machine learning classifier.
In our proposed method, the feature vectors used are
• Combination of Bag-of-Centroids and Word2vec (Unigrams)
• Combination of Bag-of-Centroids and Word2vec with phrases (Unigrams + Bigrams).
Unigrams are single words, whereas bigrams are sequences of two words. In Word2vec with phrases, not all unigrams and bigrams are kept, since many of them are not really useful; only those phrases (unigrams + bigrams) that satisfy a threshold value are taken into consideration.
The calculation of the feature vector involves the following steps. First, the
Word2vec feature vector is calculated from preprocessed data. Then Bag-of-
Centroids’ feature vector is calculated. Then both Word2vec and Bag-of-Centroids
are combined, which will be given as input to the sentiment classification phase.
In this sub-step, the preprocessed data is converted into Word2vec feature vec-
tor. First, a brief description of Word2vec with its mathematical representation is
explained and then the implementation details of Word2vec are given.
Word2vec
Word2vec is a context-related feature vector representation developed by Mikolov et al. [8] at Google in 2013; they later enhanced the work by introducing n-gram phrases with Word2vec [9]. The word vectors produced are located in the vector space such that words with similar contexts are close to each other. There are two models of Word2vec: CBOW (Continuous Bag-of-Words), in which the target word is predicted from the surrounding context words, and Skip-gram, which uses the opposite approach and predicts the surrounding context words from the target word.
The hidden layer consists of D neurons, which is equal to the dimension of the resulting word vector. The input-to-hidden connection can be represented by a matrix W_VD (of size V × D) that maps the words of the input layer to the hidden layer. The output of the hidden layer is calculated as:
H = X · W_VD.  (1)
The output layer consists of V neurons, equal to the number of words in the vocabulary, containing Softmax values. The hidden-to-output connection can be represented by a matrix W_DV (of size D × V) that maps the result of the hidden layer to the output layer, which produces the output as:
O = H · W_DV.  (2)
Since we need the probabilistic values for the words in the vocabulary, the values
of the output layer are converted to corresponding probabilistic values by using
Softmax function. The Softmax function to calculate probability can be represented
as:
P(\mathrm{Word}_{target} \mid \mathrm{Word}_{context}) = \frac{\exp(\mathrm{score}(\mathrm{Word}_{target}, \mathrm{Word}_{context}))}{\sum_{w \in vocab} \exp(\mathrm{score}(\mathrm{Word}_w, \mathrm{Word}_{context}))}  (3)
The numerator measures the similarity between the target word and the context word: the higher the similarity, the higher the probability. The denominator normalizes over the other words in the vocabulary with the same context word so that the probabilities sum to one. As discussed above, the model is trained by maximizing its log-likelihood on the training set, and training is repeated for the other target-context pairs in the vocabulary. In this way, the Word2vec model learns the relationships between words and finds the vector representation of each word in the vocabulary.
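A toy NumPy version of Eqs. (1)-(3) is sketched below, with a small illustrative vocabulary size V and embedding dimension D (values chosen arbitrarily):

import numpy as np

V, D = 10, 4                         # vocabulary size, embedding dimension
W_VD = np.random.randn(V, D)         # input-to-hidden weights
W_DV = np.random.randn(D, V)         # hidden-to-output weights

x = np.zeros(V)
x[2] = 1.0                           # one-hot vector of a context word
H = x @ W_VD                         # Eq. (1)
O = H @ W_DV                         # Eq. (2)
P = np.exp(O) / np.exp(O).sum()      # Eq. (3): Softmax probabilities over the vocabulary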
The implementation is done as follows. First, we parse each of the reviews of
the training dataset into sentences and obtain the corresponding Word2vec feature
vector vocabulary for each word present in training dataset. Four forms of Word2vec
are considered, i.e., CBOW, CBOW with phrases, Skip-gram, and Skip-gram with
phrases. In the second step, the transformation of the training and testing data reviews
to Word2vec feature vectors is done; i.e., for each training/testing review, we get its
corresponding Word2vec feature vector. In this process, a review is transformed into
a feature vector by averaging feature vectors of words appeared in that review and
in the vocabulary list created. This process is done for each of the four forms of
Word2vec that were considered in the previous step.
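A sketch of this step using gensim is given below; the exact keyword names depend on the gensim version, `sentences` is an assumed list of tokenised review sentences, and the vector size, window and threshold are placeholders rather than the settings actually used.

from gensim.models import Word2Vec
from gensim.models.phrases import Phrases, Phraser
import numpy as np

bigram = Phraser(Phrases(sentences, threshold=10.0))           # learn unigram + bigram phrases
phrased = [bigram[s] for s in sentences]

cbow = Word2Vec(phrased, vector_size=300, window=5, sg=0)      # CBOW (sg=0)
skipgram = Word2Vec(phrased, vector_size=300, window=5, sg=1)  # Skip-gram (sg=1)

def review_vector(words, model):
    # Average the vectors of the review's words that occur in the vocabulary.
    vecs = [model.wv[w] for w in words if w in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)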
In this step, the training and testing sets are transformed into the Bag-of-Centroids (BOC) feature vector: the training and testing reviews are represented over clusters of the Word2vec feature vectors obtained in the previous step, and those clusters are used as features. Since Word2vec places semantically related word vectors near each other, the BOC feature vector exploits this property. The clustering algorithm used is K-means. The number of clusters is set by allocating roughly five word vectors per cluster from the Word2vec vocabulary, and K-means is then applied to the CBOW Word2vec vocabulary to obtain the clusters. The training and testing reviews are then transformed into BOC as follows: an empty array is created whose size equals the number of clusters; then, looping over the words of the review, whenever a word is found in the vocabulary, the count of the cluster that word belongs to is incremented by one.
• Find the minimum Euclidean distance and assign each of the data point from O1 ,
O2 , …, OV to the nearest cluster centroid.
If the cluster centroid is denoted as ccn , then each of the data point O is assigned
to a cluster by calculating
\arg\min_{cc_n \in CC} d(cc_n, O)^2, \quad \text{where } d(\cdot) \text{ is the Euclidean distance}  (5)
• Take the average of the data points in each cluster and find the new cluster centroid.
The new cluster centroid is calculated as:
cc_n = \frac{1}{|P_i|} \sum_{O_i \in P_i} O_i  (6)
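A scikit-learn sketch of this clustering and the Bag-of-Centroids transformation follows, reusing the cbow model from the earlier gensim sketch (attribute names depend on the gensim version); the variable names and random_state are ours.

import numpy as np
from sklearn.cluster import KMeans

word_vectors = cbow.wv.vectors                  # all Word2vec vocabulary vectors
vocab = list(cbow.wv.index_to_key)
n_clusters = max(1, len(vocab) // 5)            # about five word vectors per cluster

kmeans = KMeans(n_clusters=n_clusters, random_state=0).fit(word_vectors)
word_to_cluster = dict(zip(vocab, kmeans.labels_))

def bag_of_centroids(review_words):
    boc = np.zeros(n_clusters)
    for w in review_words:
        if w in word_to_cluster:                # word present in the vocabulary
            boc[word_to_cluster[w]] += 1        # increment that cluster's count
    return boc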
In this step, the Bag-of-Centroids feature vector and the Word2vec feature vector are combined to obtain the final feature vector, which is then fed to the machine learning classifiers (a sketch follows the list below). This combination can boost classification performance in terms of accuracy and F1 score compared with Word2vec used alone. The combination is done by appending the clusters obtained in the previous step to each of the four forms of Word2vec considered previously, i.e.
• CBOW Word2vec + Clusters of CBOW.
• CBOW with phrases (CBOW-PH) Word2vec + Clusters of CBOW.
• Skip-gram (SG) Word2vec + Clusters of CBOW.
• Skip-gram with phrases Word2vec (SG-PH) + Clusters of CBOW.
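A short sketch of the combination and classification step (using the review_vector and bag_of_centroids helpers from the earlier sketches) is shown below; the classifier settings are defaults, not the tuned values behind the reported tables.

import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.neural_network import MLPClassifier

def combined_features(review_words, w2v_model):
    # Final feature vector: averaged Word2vec vector + Bag-of-Centroids counts.
    return np.hstack([review_vector(review_words, w2v_model),
                      bag_of_centroids(review_words)])

# X_train, y_train are assumed to be built by applying combined_features to every review.
lrcv = LogisticRegressionCV(max_iter=1000)
mlp = MLPClassifier()
# lrcv.fit(X_train, y_train); mlp.fit(X_train, y_train)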
Fig. 2 Confusion matrix (predicted negative/positive) of the MLP classifier (W2V + BOC) with 40,277 instances and skip-gram (unigrams + bigrams) phrases
Tables 1, 2, 3 and 4 show that the results obtained with the proposed feature vector are better than our previous results with Word2vec alone. The LRCV and MLP classifiers both performed well in terms of accuracy and F1 score for all dataset sizes, and the best accuracy and F1 score were obtained with the Skip-gram phrases (Unigrams + Bigrams) model. As the number of reviews in the dataset increases, the feature vector performs better.
There are many aspects of the proposed work that can be investigated in the future. The feature vector can be tried on other product domains. In the proposed method, we used K-means clustering; other clustering algorithms, such as BIRCH, in which incremental and dynamic clustering is possible, can be implemented in the future.
Table 1 Comparison of accuracy on different classifiers with Word2vec on different database sizes
No. of reviews  Classifier used  CBOW (unigrams)  CBOW-PH (unigrams + bigrams)  SG (unigrams)  SG-PH (unigrams + bigrams)
6655 MLP 0.808713 0.792188 0.810215 0.803205
LRCV 0.819229 0.812719 0.824737 0.826740
RF 0.794692 0.805709 0.753130 0.704056
DT 0.754632 0.799199 0.708563 0.661492
GNB 0.621432 0.665999 0.571357 0.553330
16,678 MLP 0.784572 0.788569 0.765987 0.712030
LRCV 0.844924 0.837730 0.842326 0.840727
RF 0.784772 0.799161 0.757994 0.738209
DT 0.767986 0.740807 0.743805 0.709233
GNB 0.603717 0.611511 0.585132 0.562150
26,764 MLP 0.837983 0.801494 0.815567 0.781320
LRCV 0.850560 0.838356 0.850311 0.841843
RF 0.785181 0.784433 0.780448 0.731258
DT 0.739975 0.768742 0.748319 0.720672
GNB 0.610336 0.601494 0.600623 0.573350
40,227 MLP 0.848123 0.817135 0.813903 0.819869
LRCV 0.852846 0.848206 0.851852 0.851106
RF 0.787223 0.787555 0.768001 0.740244
DT 0.773469 0.750352 0.751346 0.701135
GNB 0.603778 0.591515 0.596404 0.576187
Table 3 Comparison of F1 score on different classifiers with Word2vec on different database sizes
No. of reviews  Classifier used  CBOW (unigrams)  CBOW-PH (unigrams + bigrams)  SG (unigrams)  SG-PH (unigrams + bigrams)
6655 MLP 0.849488 0.834727 0.849663 0.846544
LRCV 0.861314 0.855263 0.866003 0.866615
RF 0.838455 0.851227 0.802088 0.751367
DT 0.803055 0.849756 0.756485 0.698752
GNB 0.655738 0.707584 0.597744 0.576046
16,678 MLP 0.825510 0.828747 0.805190 0.744549
LRCV 0.881091 0.874730 0.878485 0.876911
RF 0.831112 0.843580 0.806085 0.789321
DT 0.822123 0.789482 0.798110 0.764677
GNB 0.638469 0.646031 0.618523 0.590850
26,764 MLP 0.875108 0.840088 0.853468 0.819638
LRCV 0.885584 0.875456 0.885349 0.878213
RF 0.830600 0.830576 0.827411 0.779481
DT 0.788364 0.821253 0.798283 0.779167
GNB 0.642849 0.633280 0.633108 0.600420
40,227 MLP 0.884768 0.856437 0.851749 0.858739
LRCV 0.888105 0.883948 0.887064 0.886775
RF 0.833398 0.834538 0.814594 0.790118
DT 0.823271 0.802335 0.799358 0.752962
GNB 0.638221 0.623836 0.632294 0.607775
References
6. https://www.kaggle.com/PromptCloudHQ/amazon-reviews-unlocked-mobile-phones/.
Accessed 12 Mar 2019.
7. Liu, Bing. 2012. Sentiment analysis and opinion mining. Synthesis Lectures on Human
Language Technologies.
8. Mikolov, T., K. Chen, G. Corrado, and J. Dean. 2013. Efficient Estimation of Word
Representations in Vector Space. arXiv: 1301.3781v3 [cs.CL].
9. Mikolov, T., I. Sutskever, K. Chen, G. Corrado, J. Dean. 2013. Distributed Representations of
Words and Phrases and their Compositionality. arXiv:1310.4546v1 [cs.CL].
10. Narayanan, Vivek, Ishan Arora, and Arjun Bhatia. 2013. Fast and accurate sentiment classifi-
cation using an enhanced naive bayes model. In Intelligent Data Engineering and Automated
Learning, IDEAL 2013. Lecture Notes in Computer Science 8206, 194–201.
11. Pang, B., and L. Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends
in Information Retrieval 2: 1–135.
12. Pang, B, and L. Lee. 2004. A sentimental education: Sentiment analysis using subjectivity
summarization based on minimum cuts. Association for Computational Linguistics.
13. Pang, B., L. Lee, and S. Vaithyanathan. 2002. Thumbs Up? sentiment classification using
machine learning techniques. In Annual Conference on Empirical Methods in Natural Language
Processing, Association for Computational Linguistics.
14. Rong, Xin. 2016. word2vec parameter learning explained. arXiv: 1411.2738v4 [cs.CL].
15. Van Looy, A. 2016. Sentiment analysis and opinion mining (business intelligence 1). In Social
media management, Springer texts in business and economics. Cham: Springer International
Publishing Switzerland. https://doi.org/10.1007/978-3-319-21990-5_7.
16. Zalik, K.R. 2008. An efficient k-means clustering algorithm. Pattern Recognition Letters 29:
1385–1391.
17. Zhang, Dongwen, Hua Xu, Zengcai Su, and Yunfeng Xu. 2015. Chinese comments sentiment
classification based on word2vec and SVMperf. Expert Systems with Applications 42: 1857–
1863.
Emotion Analysis to Provide Counseling
to Students Fighting from Depression
and Anxiety by Using CCTV Surveillance
Abstract Around 18.5% of Indian students suffer from depression and around 24.4% suffer from anxiety disorder. Depression and anxiety are treatable through counseling and certain medicines, and to provide this huge percentage of students with the help they require, many reputed colleges and universities hire professional counselors to cater to their needs. However, there is a major problem: many students do not voice their need for counseling, for various reasons, and hence never go to counselors to get their lives back on a happy track. To address this, a solution is proposed in this paper: using the CCTV surveillance recordings now readily available in various colleges and universities, together with emotion analysis of each student. Their emotional well-being is monitored through facial landmark recognition, and if certain students show signs of depression through their activities, their names are given to their respective counselors so that they can be offered care, support and the right guidance to start their lives afresh. This paper makes use of computer vision, image processing and convolutional neural networks to complete the above-mentioned task.
1 Introduction
Every phase of a person's life is affected by grim conditions like depression and anxiety, which are together termed mental pain and affect one's career, social life, sense of self-worth and interpersonal relationships [1]. Physical pain is more dramatic than mental pain, but mental pain is harder to acknowledge; it is hard to cope with and is frequently kept under wraps, which only worsens its impact. There is no single definition of depression, as all its victims have unique stories to tell about how depression and anxiety have engulfed them and left them living in the dark [1].
To fight depression and anxiety, one needs to acknowledge them and seek help and advice from counselors who are experts in psychological problems. Students in schools and colleges are provided with the facility of seeking advice from counselors whenever they need it, but because of the habit of concealing, these facilities are often left unused and many students find themselves dragged into the deepest oceans of depression. To overcome this problem, this paper is written with the main motive of making counselors reach out to the students who need their advice and care. This raises the issue of determining which students need help, which our proposed system addresses by studying the emotions of every student in classrooms using the CCTV cameras installed in almost all schools and colleges. Every student's emotions are monitored in real time and used to create feedback on which emotion was predominant on a monthly basis. These feedback records are sent to the counselor appointed by the particular school or college, and a special mail containing the names of students who showed predominantly negative emotions is sent to the counselor so that they can keep track of those students and help them with the right guidance.
The proposed system is divided into four segments. It starts with face recognition, using OpenCV, to determine the names of the students who are under surveillance [2]. Facial recognition automates the identification of people and is used to verify a person against an existing database; it is used here to identify individual students and to update a database consisting of each student's name and roll number according to the identification done in the first segment. The second segment is facial expression recognition, in which convolutional neural networks and machine learning are used intensively [3]. Here, emotion analysis is performed on all students under surveillance using facial landmarks across multiple frames; emotions are categorized into seven classes, namely happy, sad, neutral, depressed, surprised, angry and disgust. The assessment based on these emotions is updated in the database in real time for all students. The third segment is feedback generation, for which Python libraries are used. Feedback is generated from the updated database on a daily basis and sent to the counselors for further study, and the names of students showing negative emotions such as anger, disgust, sadness or depression are sent via mail to the counselors so that they can keep an eye on their
activities and reach out to them to help them out of their miseries. These segments
will be discussed in detail in later sections.
This paper consists of five sections. The second section covers work already done in this field. The third explains a few basic concepts that should be known before proceeding further. The fourth section describes the proposed system architecture, in which the four segments mentioned above are discussed in detail along with the results obtained after completing them. Finally, the fifth section covers future work and the conclusion, along with the shortcomings of the proposed system.
2 Related Works
These tough tasks have been made easier by immense advancement in the field of machine learning. In some difficult visual tasks, the emergence of deep convolutional neural networks has made it possible for machines to match, and sometimes even exceed, human performance. This section presents a few research papers that demonstrate similar kinds of systems.
For LBPH-based improved face recognition at low resolution, reference was taken from the work of Aftab Ahmed et al.; their paper employs the local binary pattern histogram (LBPH) algorithm to address real-time human face recognition at low resolution [4].
Byeon and Kwak proposed video-based emotion recognition developed using a three-dimensional CNN; their system achieved an accuracy of 95% with a dataset consisting of 10 people [5].
Monika Dubey and Prof. Lokesh Singh gave a review, Automatic Emotion Recognition Using Facial Expression, introducing the needs and applications of facial expression recognition; they worked on six basic facial expressions [6].
3 Basic Concepts
This paper uses the term positive images for images containing a face and negative images for images not containing a face [7]. Some of the concepts that should be known before moving on to the proposed system architecture are as follows.
The Haar cascade algorithm was proposed by Viola and Jones [8] and is trained on millions of negative and positive images. The trained algorithm is then used to detect faces in other images. The algorithm has four stages:
Selection of Haar features. Haar features are similar to convolutional kernels [8] but differ slightly in how they are applied to the image. To determine a single feature, each window is placed over the picture; the feature is a single value computed by subtracting the sum of pixels under the white part of the window from the sum of pixels under the black part. Performing this step at scale would be expensive, which is where integral images come in [9]. Figure 1 shows the selection of Haar features.
Creation of integral images. An integral image is one in which each pixel holds the sum of all the pixel values above and to the left of it, including itself [10]. An example of an integral image along with the normal image is shown in Fig. 2.
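The integral image can be built with two cumulative sums, as in the NumPy sketch below (OpenCV's cv2.integral offers an equivalent, slightly padded, result):

import numpy as np

def integral_image(img):
    # Each output pixel holds the sum of all input pixels above and to its left.
    return np.cumsum(np.cumsum(img.astype(np.int64), axis=0), axis=1)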
AdaBoost training. AdaBoost is an approach for keeping only the relevant features and discarding the irrelevant ones [11]. It helps improve the accuracy of classifying positive/negative images, as it is a training procedure for face detection that chooses only known features. AdaBoost refers to a particular method of training a boosted classifier; a boosted classifier is a classifier of the form
G_T(a) = \sum_{t=1}^{T} g_t(a)  (1)
where each gt is a weak learner that takes an object a as input and returns a value
indicating the class of the object.
Cascading classifiers. In a cascade of classifiers, the features are grouped into successive stages of classifiers instead of applying all the features to a window at once [12]. The stages are applied one by one: if a window fails the first stage, it is discarded and no further features are applied; if it passes, the second stage of features is applied, and so on. When a window passes all the stages, it is declared a face region.
Faces are composed of micro visual patterns, and LBP uses these patterns to extract a feature vector that classifies a positive image from a negative one; LBP is therefore also known as a visual texture descriptor [4]. During training, images are divided into blocks as shown in Fig. 3.
LBP uses nine pixels at a time, i.e. a 3 × 3 window centred on a focus pixel, for every block. The value of the centre pixel is compared with each neighbouring pixel in that window: if a neighbouring pixel's value is greater than or equal to the centre pixel's value, it is set to 1, otherwise to 0. Reading the resulting bits in a clockwise direction forms a binary number, which is converted to a decimal number that becomes the new value of the centre pixel. This is done for every pixel in a block. Figure 4 shows an example of how LBP works.
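A plain NumPy sketch of this per-pixel LBP computation (no interpolation, one fixed clockwise ordering) is shown below; in practice a library routine such as skimage.feature.local_binary_pattern is normally used.

import numpy as np

def lbp_value(block, r, c):
    center = block[r, c]
    # neighbours read clockwise, starting at the top-left pixel
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    bits = [1 if block[r + dr, c + dc] >= center else 0 for dr, dc in offsets]
    # interpret the bits as a binary number and return its decimal value
    return sum(bit << i for i, bit in enumerate(reversed(bits)))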
A histogram is then built from the block values, so for each block of an image a histogram is obtained; Fig. 5 depicts one such histogram. Finally, a single feature vector for the image is obtained by concatenating all the block histograms; this feature vector contains all the features relevant to the image.
The most effective form of visual communication is facial expression, which provides information about a person's emotional state and mindset [13]. Facial expressions are changes in facial appearance that reflect a person's reaction to emotional states, intentions or social communication; they are the most powerful, natural, nonverbal and immediate way for humans to communicate emotions and express intentions. Figure 6 shows the block diagram of the proposed system architecture; each step is explained in detail below.
The first stage of any vision system is the acquisition of an image or video; the proposed system requires video acquisition. OpenCV simplifies this task with a built-in library function:
cap = cv2.VideoCapture(0)
Once a video is captured, a frame from the video, referred to here as an image, is examined to classify whether it is a positive image or a negative image (as defined in the basic concepts). For classifying the image as positive or negative, the proposed system uses the Haar cascade algorithm, and OpenCV again facilitates this with built-in library functions:
face_cascade = cv2.CascadeClassifier("Haarcascades/haarcascade_frontalface_default.xml")
faces = face_cascade.detectMultiScale(image_array, scaleFactor=1.5, minNeighbors=5)
The above code snippet detects multiple faces in an image, and this process is repeated for the large set of images constituting a video.
Results obtained after recognizing faces in an image using Haar cascades are
given in Figs. 7 and 8.
Images of various people are captured and stored in a database with their unique college/university ID numbers. After creating the dataset consisting of all the images of students and faculty members, LBP features, as explained earlier in the basic concepts, are extracted from each test image; the information obtained from these LBP features is used to classify and recognize the various faces. For the proposed system, a total of 2500 images were taken, 100 images per person, i.e. faces of 25 people were used to create the dataset. After an image is classified as positive or negative, it is compared with the database
images. For comparison, the LBP features of the testing image are matched against the LBP features of the images available in the database. If a sufficient number of features match, the face is said to be identified; otherwise it is termed an 'unknown face.' The classifier draws the similarity between the stored images and the testing images [3]; if the testing image matches one of the stored images, the ID assigned to that image is displayed on the frame. The local binary pattern histogram (LBPH) face recognizer is used in the proposed system:
model = cv2.face.createLBPHFaceRecognizer()
Results obtained after face identification done by comparing the LBP features of
the testing image with the images in the data set are shown in Figs. 9 and 10.
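A short sketch of training and querying the LBPH recognizer is given below (requires the opencv-contrib package; the factory-function name varies between OpenCV versions, e.g. createLBPHFaceRecognizer() in older releases). train_images, labels and test_gray are assumed, pre-loaded grayscale arrays.

import cv2
import numpy as np

recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.train(train_images, np.array(labels))         # labels: the students' ID numbers
predicted_id, distance = recognizer.predict(test_gray)   # lower distance = closer match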
The natural, most powerful, non-verbal and instant way for humans to communicate is through facial expressions, which give an insight into one's emotional state, intentions and social interactions. Facial expressions result from changes in facial appearance caused by the movement of facial features such as the eyebrows, mouth, nose and eyes. Of all the ways of communicating, facial expression is the fastest [14], and conversations are more effective with facial expressions than those involving spoken words alone. Emotions play a vital role, as they convey one's feelings about a subject and are essential for effective interaction in society. Humans experience a great many emotions, but six of them are termed basic facial expressions by modern psychology: happiness, sadness, surprise, disgust, fear and anger [15]. In this paper, another expression is also taken into account, the most common expression shown when a person is not communicating with other people, which is termed neutral in the proposed system. Facial muscle movement helps determine which type of emotion a person is depicting.
In this paper, we make use of the emotions_classifier.predict method to predict the emotion shown in each frame.
Table 1 gives out proper definitions of the emotions and also gives a brief descrip-
tion about the movement of the facial features that result in a particular emotion
[6].
Figures 11 and 12 show results acquired during run-time emotion analysis.
After obtaining the percentages of all six emotions, they are inserted into a dataset along with the ID of each student as the primary key for identification. The emotion data is then used to generate line graphs for easy understanding of the emotions shown by each individual. The graphs further help in feedback generation, which is done by creating a list of the averages of all the emotions shown by the individual over the whole day; the max operation then determines the maximum value, and the name of the corresponding emotion is reported as the overall emotion shown by the individual. Graphical representations of the emotion analysis are displayed in Figs. 13 and 14.
The final feedback, obtained from the emotion visualizations produced by the proposed system using a few calculations, is displayed in Figs. 15 and 16.
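A small pandas sketch of this feedback step is given below; emotion_log is an assumed DataFrame with one row per analysed frame, and the column names are illustrative.

import pandas as pd

EMOTIONS = ["happy", "sad", "neutral", "surprised", "angry", "disgust"]

def daily_feedback(emotion_log: pd.DataFrame) -> pd.Series:
    # Average each emotion over the day for every student, then pick the dominant one.
    averages = emotion_log.groupby("student_id")[EMOTIONS].mean()
    return averages.idxmax(axis=1)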
This paper proposes an emotion analysing system that will help analyse a person's emotions so that they can be helped if they display negative emotions for a substantially long time. The system is built using Haar cascades, local binary pattern histograms, 3D convolutional neural networks and machine learning. The proposed system currently works for single-face detection and identification in a frame, but in the near future it will be extended to recognize and identify multiple faces in a single frame. The system works with an accuracy of 83%.
The proposed system works on a unique dataset created for a college, recognizes the students' emotional well-being and provides a single-word remark summarizing a person's overall emotional state, which can then be sent as a mail to the counselor for further study so that they can help students who are in need but never say so. The system is suitable for an organization that has its entire campus under CCTV surveillance. This paper evaluates the emotions shown by the students and faculty of universities and colleges to determine whether they would need any type of counseling by the assigned counselor. The feedback generated by the system is given to the counselor for studying the behaviour of students and faculty so that they can reach out a helping hand to the students who require it, when required.
References
1. Balasubramanian, S. 2017. #LetsTalk, Health and Life, Mental Health, Taboos: ‘I didn’t want
to Say I’m Depressed. Help Me.’ A Student Opens Up. https://www.youthkiawaaz.com/2017/
04/depression-college-students-india.
2. Xu, Z., Y. Jiang, Y. Wang, et al. 2019. Local polynomial contrast binary patterns for face
recognition. Neurocomputing 355: 1–12.
3. Revina, I.M., and W.R.S. Emmanuel. 2018. A survey on human face expression recognition
techniques. Journal of King Saud University-Computer and Information Sciences 1–8.
4. Ahmed, A., J. Guo, F. Ali, et al. 2018. LBPH based improved face recognition at low resolution.
In 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD), 144–147.
5. Byeon, Y.H., and K.C. Kwak. 2014. Facial expression recognition using 3D convolutional neu-
ral network. International Journal of Advanced Computer Science and Applications (IJACSA)
5 (12): 107–112.
6. Dubey, M., and P.L. Singh. 2016. Automatic emotion recognition using facial expression: a
review. International Research Journal of Engineering and Technology 03 (02): 488–492.
7. Dang, L.M., S.I. Hassan, et al. 2019. Face image manipulation detection based on a
convolutional neural network. Expert Systems with Applications 129: 156–168.
8. Viola, P., and M. Jones. 2001. Rapid object detection using a boosted cascade of simple features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
9. Park, K.Y., and S.N. Hwang. 2014. An improved Haar-like feature for efficient object detection.
Pattern Recognition Letters 42: 148–153.
10. Badgerati. 2010. Computer vision—the integral image. https://computersciencesource.
wordpress.com/2010/09/03/computer-vision-the-integral-image/.
11. Chai, S., B. Kisačanin, and N. Bellas. 2010. Special issue on embedded vision. Computer
Vision and Image Understanding 114 (11): 1115–1316.
12. Kun, B., L.L. Zhao, L. Fang, and S. Lian. 2012. 3D face recognition method based on cascade
classifier. Procedia Engineering 29: 705–709.
13. Khan, R.A., A. Meyer, H. Konik, et al. 2013. Framework for reliable, realtime facial expression
recognition for low resolution images. Pattern Recognition Letters 34 (10): 1159–1168.
14. Lv, C., Z. Wu, X. Wang, and M. Zhou. 2019. 3D facial expression modeling based on facial
landmarks in single image. Neurocomputing 355: 155–167.
15. Babu, D.R., R.S. Shankar, G. Mahesh, et al. 2017. Facial expression recognition using Bezier
curves with hausdorff distance. In 2017 International Conference on IoT and Application
(ICIOT), 1–8.
Reconstructing Obfuscated Human Faces
with Conditional Adversarial Network
Abstract In today’s era of advanced forensic and security technologies, the prob-
lem of identifying a human face from a low-quality image obtained from low-quality
hardware or for other reasons is a major challenge. Extracting meaningful information from such images is very difficult. These low-quality images mainly contain two kinds of distortion, i.e. blurring and pixelation. Prior attempts have been made using different machine learning and deep learning techniques, but the desired high-quality images were not obtained. In this paper, we use a conditional adversarial network to reconstruct highly obfuscated human faces. Various previous works on conditional adversarial networks have suggested them as a general-purpose solution for image-to-image translation problems. The conditional adversarial network is able to learn a mapping from the provided input image to the resulting output image; it is also able to learn the loss function used to train this mapping. We examine the results of this model using a pixel loss function that compares the reconstructed high-quality face with the ground truth, and we further examine the capabilities of the model on very highly obfuscated images.
1 Introduction
to note, it is often just the face that is obfuscated because it has the most identifying features, while the rest of the image may not be obfuscated and may be of better quality.
Also, when we capture images using low-quality cameras, we often do not get enough detail when we zoom in, for example with closed-circuit cameras used for security. In this case as well, we can take the low-quality image and reconstruct a better-quality version using a conditional generative adversarial network.
In this project, we examine the use of conditional adversarial nets to reconstruct obfuscated human faces under two types of obfuscation, i.e. pixelation and blurring. This technique is widely accepted as a general-purpose technique for image-to-image translation. Such tasks are usually addressed using convolutional neural networks (CNNs), but we found that generative adversarial networks produce much better results.
Furthermore, we also explore the model’s capabilities by using images with high
levels of obfuscation and evaluating its performance on reconstructing the original
images.
2 Related Work
Similar work has been done in the past focusing on obfuscated images and super-resolution. Lander and Chuang demonstrate that subjects can still recognize the faces of famous people in static images despite blurring [1], and that higher accuracy can be achieved by showing the obfuscated faces in motion. Lander et al. also show that faces obscured using methods such as pixelation can still be recognized [2].
Gross et al. also show that privacy is not protected when low pixelation levels are used, as obfuscated faces can still be recognized and their image quality enhanced [3]. McPherson et al. examine the topic of classifying images that underwent various forms of obfuscation, highlighting issues with obfuscation [4].
In that work, they show how neural networks can classify images that underwent obfuscation processes such as pixelation and blurring by examining different datasets such as MNIST, CIFAR-10, AT&T, and FaceScrub. They demonstrate that neural networks can be used to accurately classify images despite obfuscation, since these images, while unrecognizable to humans, still contain enough information to enable appropriate classification.
There are four main approaches to the super-resolution task as shown by Yang
et al.: prediction models, edge-based models, image statistical models, and patch-
based models [5]. Interpolation-based methods are used by prediction models to
generate smoother regions in high-resolution images. Edge-based models try to learn
priors to generate sharp images, but they usually lack texture.
Image statistical models use image properties to predict the details in higher
resolution images. Lastly, patch-based models learn mapping functions from smaller
lower resolution patches to their corresponding high-resolution version.
3 Method
Generative adversarial networks (GANs) are generative models that learn a mapping from a noise vector z to an output image y, G: z → y. In contrast, conditional GANs learn a mapping from an observed image x together with a noise vector z to y, G: {x, z} → y. We train the generator to produce outputs that cannot be distinguished from “real” images by a discriminator D, which is trained in an adversarial manner; the discriminator is trained to detect the “fakes” produced by the generator.
3.1 Objective
For the purpose of testing the significance of conditioning our discriminator, a comparison is made to an unconditional variant in which the input x is not taken into account by the discriminator. In addition, the adversarial objective is mixed with an L1 (pixel) loss, which encourages the generated output to stay close to the ground truth:

L_L1(G) = E_{x,y,z}[ ‖y − G(x, z)‖₁ ]   (3)
For the construction of our generator and discriminator, we take reference from the architectures in [6]. Convolution–BatchNorm–ReLU [7] is used as the basic module of both the generator and the discriminator (Fig. 1).
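As a concrete illustration of the Convolution–BatchNorm–ReLU module mentioned above, here is a minimal PyTorch sketch of one downsampling block; the filter counts, kernel size and stride are common pix2pix-style defaults and should be read as assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class ConvBNReLU(nn.Module):
    """One downsampling building block: Convolution -> BatchNorm -> ReLU."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

# A 128 x 128 RGB input is halved in spatial size by each block.
x = torch.randn(1, 3, 128, 128)
print(ConvBNReLU(3, 64)(x).shape)  # torch.Size([1, 64, 64, 64])
```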
One of the most prominent attributes of image-to-image translation problems is that a high-resolution input image is translated to a high-resolution output image. Also, for the problem we consider, the input and output images differ in surface appearance, but both are renderings of the same underlying structure, so the structure in the input is closely aligned with the structure in the output. We design the generator with these considerations in mind.
Many earlier approaches [8–12] to tasks in this field have used an encoder–decoder network [13]. In such models, the input is fed through layers that progressively downsample until a bottleneck layer is reached, after which the process is reversed. This design requires all of the information to pass through every layer, including the bottleneck. In many image translation problems, a substantial amount of low-level information is shared between the input and output, and it would be useful to pass this information directly across the network.
For the purpose of modeling high frequencies, focusing on the structure in local image patches is enough. Therefore, we use a discriminator architecture called PatchGAN that only scans the image at the scale of N × N patches. The discriminator attempts to determine whether each individual N × N patch of the given image is genuine or not. We run the discriminator convolutionally across the image and average all responses to obtain the output of D.
N can be much smaller than the full image and still produce high-quality outputs. This is beneficial because a smaller PatchGAN has fewer parameters, runs faster, and can be applied to larger inputs. Such a discriminator effectively models the image as a Markov random field, assuming independence between pixels separated by more than a patch diameter. This assumption is also common in models of texture and style, so the PatchGAN can be understood as a type of texture/style loss.
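The following is a minimal sketch of a PatchGAN-style discriminator in PyTorch for illustration; the three-layer, 64-filter layout follows the description in the Experiment section, while the exact normalisation, padding and activation choices are assumptions.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Convolutional discriminator that scores N x N patches instead of whole images."""
    def __init__(self, in_ch: int = 6, base: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base, 4, stride=2, padding=1), nn.BatchNorm2d(base), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base, 4, stride=2, padding=1), nn.BatchNorm2d(base), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, 1, 4, stride=1, padding=1),  # one logit per patch
        )

    def forward(self, obfuscated: torch.Tensor, candidate: torch.Tensor) -> torch.Tensor:
        # The conditional discriminator sees the input image and the (real or generated) output.
        patch_logits = self.net(torch.cat([obfuscated, candidate], dim=1))
        return patch_logits.mean(dim=[1, 2, 3])  # average all patch responses into one score

x = torch.randn(2, 3, 128, 128)
y = torch.randn(2, 3, 128, 128)
print(PatchDiscriminator()(x, y).shape)  # torch.Size([2])
```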
4 Dataset
For our experiments, we use the CelebA human face dataset. The features of the
CelebA dataset are as follows:
• 202,599 face images of various celebrities.
• 10,177 unique identities, but names of identities are not given.
• 40 binary attributes annotations per image.
• 5 landmark locations.
We perform preprocessing on this dataset. For each image, we first crop it to 128 × 128, keeping the face at the center of the image, so that we can focus on the reconstruction of the face and ignore the majority of the background. Then we obfuscate each image: for each obfuscation method (pixelation and Gaussian blur), we copy the original image and apply that method to it (Fig. 2).
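A minimal OpenCV sketch of this preprocessing is given below; the centre-crop size and the sigma = 4 Gaussian blur follow the text, while the pixelation block size, the kernel size and the file path are illustrative assumptions.

```python
import cv2
import numpy as np

def center_crop(img: np.ndarray, size: int = 128) -> np.ndarray:
    """Crop a size x size window around the image centre (face assumed centred)."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def pixelate(img: np.ndarray, block: int = 8) -> np.ndarray:
    """Downscale, then upscale with nearest-neighbour interpolation to pixelate."""
    h, w = img.shape[:2]
    small = cv2.resize(img, (w // block, h // block), interpolation=cv2.INTER_LINEAR)
    return cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)

def gaussian_blur(img: np.ndarray, ksize: int = 9, sigma: float = 4.0) -> np.ndarray:
    return cv2.GaussianBlur(img, (ksize, ksize), sigma)

img = center_crop(cv2.imread("celeba_sample.jpg"))   # path is illustrative
blurred, pixelated = gaussian_blur(img), pixelate(img)
```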
5 Experiment
For our experiments, we trained on the dataset of 200 k human faces for 10 epochs with a batch size of 100 images. As discussed earlier, we used the unet_256 generator and the basic discriminator. The input images and the output images both have 1–3 channels.
We manually blurred the dataset using Gaussian blur and similarly pixelated the images for training purposes. The generator generates a random image at first; this first image is just noise. That image is then sent to the discriminator along with the corresponding original image, and the discriminator tells the generator whether the generated image is real or fake.
In the first attempts, the discriminator will say it is fake, and the generator will try to improve the images it generates. This happens multiple times until the discriminator cannot tell the difference between real and fake images and labels the fake image as real.
Once the generator produces images that the discriminator judges to be real, we can pass a sample obfuscated image to the model and obtain the reconstructed face.
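A highly condensed PyTorch sketch of this adversarial training loop is shown below for illustration; the loss mix, the λ weight on the L1 term, and the generator/discriminator objects (e.g. a U-Net generator and the PatchDiscriminator sketched earlier) are assumptions standing in for the pix2pix-style components the paper references, not the authors' exact code.

```python
import torch
import torch.nn as nn

def train_step(G, D, g_opt, d_opt, x_obf, y_real, lam=100.0):
    """One cGAN + L1 update: x_obf is the obfuscated input, y_real the clean face."""
    bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

    # --- Discriminator: real pairs -> 1, fake pairs -> 0 ---
    y_fake = G(x_obf).detach()
    d_real, d_fake = D(x_obf, y_real), D(x_obf, y_fake)
    d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # --- Generator: fool D and stay close to the ground truth (L1 / pixel loss) ---
    y_fake = G(x_obf)
    d_fake = D(x_obf, y_fake)
    g_loss = bce(d_fake, torch.ones_like(d_fake)) + lam * l1(y_fake, y_real)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```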
With the U-Net model, low-level information is able to pass directly across the network. We test whether this produces improved outputs. The encoder–decoder is obtained by cutting off the skip connections of the U-Net; in our tests, the encoder–decoder model is not able to learn to produce realistic outputs.
The advantage of the U-Net does not appear to be specific to conditional GANs: when L1 alone is used to train both the U-Net and the encoder–decoder network, the U-Net again produces superior outputs.
There are two layers present in the generator architecture, and the number of filters is reduced from its initial value down to 1 by repeatedly dividing it by 2.
In the discriminator, we have three convolution layers, each with 64 filters. A PatchGAN is used here: it takes patches of size N × N in the generated image and penalizes structure at the scale of patches, deciding whether each N × N patch is real or not. We run this discriminator convolutionally across the image and take the mean of all responses to obtain the final discriminator output.
6 Evaluation
We evaluate the accuracy of a generated image by comparing each pixel value of the generated image with the corresponding pixel of the original image; we do this for all the pixels in the image and take the average. The accuracy we obtain is around 95%.
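The pixel-wise accuracy described here can be computed as in the following NumPy sketch; the tolerance used to decide when two pixel values “match” is an assumption, since the text does not state one.

```python
import numpy as np

def pixel_accuracy(generated: np.ndarray, original: np.ndarray, tol: int = 10) -> float:
    """Fraction of pixels whose generated value is within `tol` of the original (0-255 images)."""
    diff = np.abs(generated.astype(np.int16) - original.astype(np.int16))
    return float((diff <= tol).mean())

def mean_accuracy(pairs) -> float:
    """Average the per-image accuracy over the held-out 10% test split."""
    return float(np.mean([pixel_accuracy(g, o) for g, o in pairs]))
```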
For blurred images, we used a Gaussian blur with sigma fixed at 4 and varied the kernel size to obtain different degrees of blurring. The reported result is the average over the 10% of total samples held out from our dataset as the testing set.
The results obtained are shown in Figs. 3, 4, 5, 6 and 7. The results for blurred images, for different Gaussian kernel sizes, are given in the tables below.
Kernel size      7 × 7    9 × 9    11 × 11
Accuracy (%)     97.2     96.4     95.1

Kernel size      7 × 7    9 × 9
Accuracy (%)     96       95.6
7 Conclusion
References
1. Lander, K., and L. Chuang. 2005. Why are moving faces easier to recognize? Visual Cognition
12 (3): 429–442.
2. Lander, K., V. Bruce, and H. Hill. 2001. Evaluating the effectiveness of pixelation and blurring
on masking the identity of familiar faces. Applied Cognitive Psychology 15 (1): 101–116.
3. Gross, R., L. Sweeney, F. de la Torre, and S. Baker. 2006. Model-based face de-identification.
In 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW’06),
161–161, June 2006.
4. McPherson, R., R. Shokri, and V. Shmatikov. 2016. Defeating image obfuscation with deep
learning. CoRR, abs/1609.00408.
5. Yang, C.-Y., C. Ma, and M.-H. Yang. 2014. Single-image superresolution: A benchmark. In
Proceedings of European Conference on Computer Vision.
6. Radford, A., L. Metz, and S. Chintala. 2016. Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR.
7. Ioffe, S., and C. Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML.
8. Pathak, D., P. Krahenbuhl, J. Donahue, T. Darrell, and A.A. Efros. 2016. Context encoders: Feature learning by inpainting. In CVPR.
9. Wang, X., and A. Gupta. 2016. Generative image modeling using style and structure adversarial networks. In ECCV.
10. Johnson, J., A. Alahi, and L. Fei-Fei. 2016. Perceptual losses for real-time style transfer and super-resolution. In ECCV.
11. Zhou, Y., and T.L. Berg. 2016. Learning temporal transformations from time-lapse videos. In ECCV.
12. Yoo, D., N. Kim, S. Park, A.S. Paek, and I.S. Kweon. 2016. Pixel-level domain transfer. In ECCV.
13. Hinton, G.E., and R.R. Salakhutdinov. 2006. Reducing the dimensionality of data with neural networks. Science 313 (5786): 504–507.
14. Ronneberger, O., P. Fischer, and T. Brox. 2015. U-net: Convolutional networks for biomedical image segmentation. In MICCAI.
A Proposed Wireless Technique
in Vehicle-to-Vehicle Communication
to Reduce a Chain of Accidents Over
Road
be helpful to avoid serial collisions of vehicles, because knowing the forward vehicle's movement, the rear vehicle can slow down in order to avoid an accident.
1 Introduction
Due to the rapid increase in population, the use of vehicles is rapidly increasing, which leads to an increase in accidents. The proposed technique is helpful in critical conditions where sensors fail to measure data; such conditions include dust or mud covering a sensor, physical damage to a sensor, very high temperature, etc. V2V communication began in the early 1980s. V2V communication should be intelligent and effective enough to solve traffic problems, and it should also have features to avoid accidents. The communication should include messages regarding lane changes and slowing down. This information can be shared with a set of vehicles so that they can take corrective measures to avoid any type of hazard or accident. A safe driving system is collaborative in nature, which means safe driving should consider a set of vehicles. Collaborative driving will be hazard-free only when the communication between vehicles is effective, because an accident in busy traffic hampers not only the vehicle at fault but also all the vehicles near it. Collaborative driving focuses on five strategies: (a) cruise, (b) tracking, (c) combination, (d) splitting, and (e) lane changing. These strategies control the vehicle movement. The parameters regarding vehicle movement are stored in the vehicle's storage system. This information, along with the above five strategies, can be used to develop an effective control mechanism for a vehicle communication system. The proposed technique will be helpful in avoiding collisions in un-signaled zones and can predict the collision time between intermediate vehicles.
2 Background Study
Wireless communication
driver before an accident happens, so that corrective measures can be taken to avoid the accident, such as automatic braking to slow down the vehicle [2].
WAVE
WAVE stands for wireless access in vehicular environments. It supports applications based on intelligent transportation systems. IEEE 802.11p is an amendment to the 802.11 standard for the inclusion of WAVE. It defines enhancements to the 802.11 standard such as data transmission between two high-speed vehicles or between a vehicle and infrastructure [3].
Zigbee Technology
3 Related Work
Zunyi Zhang et al. have given a model to avoid accidents near traffic junctions, with a thoughtful study of the accelerating and decelerating behavior at these points. A warning message in voice or image form is to be provided at the downhill road section of the intersection. Dynamic information is found to be more appropriate than static information, and a voice-only warning is found to be more effective than a combined voice-and-image warning technique.
Saleh Yousefi et al. analyzed the effect of inter-vehicle communication on chain accidents, using a Markov chain to model the problem. The model can evaluate the influence of different road traffic parameters (i.e., vehicle density, velocities, and decelerations) and network parameters (i.e., delay, penetration rate) on the probability of chain collisions. They also studied the effect of driver behavior as well as vehicle characteristics on the probability of accidents, and the effect of the technology penetration rate is considered in the proposed model. Simulation results reveal that the proposed model has acceptable accuracy.
Mahmoud Taghizadeh et al. reported on chains of accidents for a group of vehicles traveling in a convoy, where collaborative communication is used to avoid collisions. Various parameters such as reaction time, time headway, and recommended speed are used to calculate the number of collisions under different situations. A combined analytical model is designed to explore the number of collisions under low market penetration rates.
Gaur Saurabh Kumar et al. presented a vehicle application model to be deployed over a networking framework, considering some important vehicle parameters; this requires productive research work. A vehicular ad hoc network (VANET) can be used as an alert system, meaning it can give alert signals for traffic jams, which helps reduce road travel time. When an accident occurs, the drivers of rear vehicles can be given emergency signals, and traffic police and an ambulance can be called by sending emergency messages.
Jaiswal Siddhant and D. S. Adane proposed an algorithm based on a routing mechanism to be used in hybrid situations. The technique uses clustering methods for the transfer of packets, is designed to run in adverse situations, and supports static as well as dynamic infrastructure.
Tejpreet Singh et al. provided a thorough study of VANETs, focusing mainly on the dynamic nature arising from changes in network topology, which can lead to security and communication issues. They suggested that the wormhole attack is one of the most crucial attacks to be taken care of, because it can create problems in the communication system; such attacks can result in denial-of-service (DoS) attacks, masquerading, data tampering, etc. Different routing protocols (AODV, OLSR, and ZRP) are analyzed on the basis of performance metrics such as throughput, delay, and jitter. Performance is evaluated in two cases: one with a wormhole attack and the other without a wormhole attack.
Lee et al. proposed a protocol called the wireless token ring MAC protocol (WTRP) for communication among a set of cooperatively driving vehicles. R-ALOHA and slot reservation MAC protocols are also discussed for inter-vehicle communication.
Bin Hu et al. have proposed a vehicle-to-vehicle/vehicle-to-roadside communication protocol to avoid collisions in vehicular ad hoc networks (VANETs). Emergency warning messages are transmitted via vehicle-to-vehicle (V2V) and vehicle-to-roadside (V2R) communications to achieve multipath diversity routing. A multi-channel (MC) technique is proposed to improve communication reliability and achieve low latency. Simulation results show that the proposed technique is capable of improving the message delivery ratio and obtaining low latency.
E. Abinaya and R. Sekar presented a model to optimize traffic at intersection points. It needs real-time vehicle data such as speed and position. The delay can be minimized by using the oldest job first (OJF) algorithm. A comparison is given with other techniques such as the Webster method, the pre-timed signal control method, and the vehicle-actuated method.
Xue Yang et al. have proposed vehicle-to-vehicle (V2V) and vehicle-to-roadside (V2R) communication technology for cooperative collision warning, which can help avoid collisions by giving warning messages. Achieving low latency is a challenge here; the simulation shows that the low-latency condition is met and that bandwidth is used effectively in different road environments.
4 Proposed Architecture
The proposed architecture consists of a wireless receiver, an LCD display, brake and indicator inputs, and an ATmega-328 microcontroller with multiple I/O ports. The ATmega 328 provides high processing speed with low power consumption. It can achieve a throughput of about 1 MIPS per MHz. The chip operates within 1.8–
5.5 V. It provides serial communication techniques and supports SPI protocol. The
memory unit includes 2 KB SRAM, 1 KB EEPROM, 32 KB ISP flash memory, and
32 general-purpose registers. The other units are SPI serial port, timer with internal
oscillator, serial programmable USART, internal and external interrupts, etc.
The ATmega-328 receives data from the front vehicle through a wireless receiver and can also transmit data to the rear vehicle using the wireless transmitter. Input data such as speed, steering, brake, and indicator status are taken from the vehicle system. The ATmega-328 microcontroller recognizes a critical condition by analyzing these parameters.
When the microcontroller recognizes a critical condition, it informs the rear vehicle, where the warning is shown on an LCD display system; the LCD display can be combined with a buzzer sound as a warning signal.
A critical condition can be:
(i) The speed of the vehicle suddenly drops to zero.
(ii) Left or right movement of the vehicle without the indicator.
A minimal sketch of this detection logic is given below.
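This sketch is written in Python purely for illustration; on the real device the logic would run as firmware on the ATmega-328, and the field names, thresholds and sampling window are assumptions, not values taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class VehicleState:
    speed_kmph: float
    steering_deg: float      # signed steering angle
    indicator_on: bool

def is_critical(prev: VehicleState, curr: VehicleState,
                sudden_stop_kmph: float = 20.0, turn_deg: float = 15.0) -> bool:
    # (i) speed suddenly drops to zero between two consecutive readings
    sudden_stop = prev.speed_kmph >= sudden_stop_kmph and curr.speed_kmph == 0.0
    # (ii) left/right movement (large steering change) without the indicator on
    turn_without_indicator = abs(curr.steering_deg) >= turn_deg and not curr.indicator_on
    return sudden_stop or turn_without_indicator

print(is_critical(VehicleState(45, 0, False), VehicleState(0, 0, False)))    # True: sudden stop
print(is_critical(VehicleState(40, 2, False), VehicleState(38, 25, False)))  # True: turn, no indicator
```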
We use an IR sensor to detect the rear vehicle. After detection, the front vehicle sends a password in order to communicate with that vehicle using Bluetooth. This pairs the two vehicles, which then proceed with data communication. Every two minutes, the IR sensor resumes its operation; if the rear vehicle has changed lane and a new vehicle has taken its position, the IR sensor detects this in its next iterative phase [4].
LCD
A liquid-crystal display (LCD) is used as the output unit in the proposed system. The LCD is connected to the microcontroller and displays the message that the microcontroller wants to give. The microcontroller decides the type of message to be generated, which is also forwarded to the transmitter for transmission.
Ultrasonic Sensor
An ultrasonic sensor uses ultrasonic sound waves to compute the distance to an object. The best thing about ultrasonic sensors is that they can sense all types of materials, and they can work within a 4–10 m range. The sensor uses a transducer to send and receive the sound pulse, and the time gap between sending and receiving the pulse is used to calculate the distance to the object, which in our case is a vehicle. When an object is detected, the ultrasonic sensor sends a signal to the microcontroller, which then indicates through the LCD that the distance is within a critical range. In the proposed architecture, there are two ultrasonic sensors: one at the front and another at the rear of the vehicle.
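As a small illustration of this time-of-flight calculation, the distance is half the echo's round-trip time multiplied by the speed of sound; the 343 m/s figure assumes air at roughly 20 °C and is not stated in the paper.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at ~20 degrees Celsius

def distance_m(echo_round_trip_s: float) -> float:
    """Distance to the object from the ultrasonic echo's round-trip time (seconds)."""
    return SPEED_OF_SOUND_M_S * echo_round_trip_s / 2.0

print(distance_m(0.02))  # a 20 ms round trip corresponds to about 3.43 m
```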
For the system design to be carried out, we have to consider the factors that can be regarded as its advantages and the issues that can be regarded as its challenges.
Advantages: The system creates a group of vehicles connected with each other, which means it creates a combined traffic control system where critical or urgent information related to an accident is shared in real time.
It increases the number of vehicles that can be controlled in traffic.
It helps drivers organize their time for other productive areas, since the vehicle is self-organized and well-coordinated to avoid collisions.
Challenges: Natural calamities and extreme weather conditions, such as heavy rainfall or heavy snowfall, can disrupt the communication system.
In the near future, self-communicating vehicles along with artificial intelligence are going to lead the transportation sector, which may take away the jobs of many drivers.
An additional communication unit has to be added to the vehicle design, so extra cost is added in the manufacturing process and the vehicle becomes more expensive.
5 Conclusion
We found from a survey that nearly 1.25 million people die in road accidents each year. As the population increases day by day, the use of vehicles increases, which has a great impact on road accidents. The proposed approach is useful for controlling chains of accidents. We will use an IR sensor and the ATmega-328 microcontroller in our device, along with a Bluetooth mechanism, to accomplish wireless transmission and reception of data. The main objectives of our system are:
(a) To provide effective communication between two vehicles, that is, to focus on less communication time and faster response in order to avoid accidents quickly.
(b) Safer driving depends on the range of the IR sensor that we have used.
(c) Wireless transmitters must deliver data along the line-of-sight direction; otherwise, the communication can lead to a system failure.
The proposed architecture must focus on the above important issues to design a fail-safe and robust system.
References
Abstract In existing conversation systems, the user interacts with the virtual system by voice and the virtual system replies based on what the user speaks. In this context, whenever the user makes gestures to communicate with the virtual system, the virtual system misses those communications. For example, instead of speaking, the user may nod the head for “yes” or “no”, or may use hand signals to respond to the virtual system. If these events are not addressed, the conversation is not very interactive, and natural human-like interaction starts losing important information. The paper describes how the user's body movements/gestures help in effective conversation with the virtual system, and how the virtual conversation system can effectively understand misspelled or missed conversation from the user's gestures/body movements.
1 Introduction
The present paper describes a method and system to predict human body movements/gestures in images or video streaming data, which helps the virtual system handle missed conversation, misspelled conversation, and the fusion of voice with the user's body movements. This makes the virtual machine conversation more human-like and effective. A key point-based method using deep learning approaches is used to predict the human body movements.
Human body movement/gesture detection is an important problem in human–machine interface systems for effective virtual conversation/interaction between the machine and the user. Natural interaction is built on speech, touch, contextual and environmental awareness, and immersive 3D experience, with the goal of a computer that can see, listen, learn and talk like a human [1]. In this way, systems can be used to infer non-verbal communication between the user and machines. There are challenges in this task, such as identifying body parts in an image or a sequence of images and inferring body gestures from them, for example thumbs up, fist, victory, head nodding, head rotation, and multiple people sitting.
The existing approaches [2, 3] for gesture detection assume prior conditions, for example that the body parts are shown in a predefined area of the image, and then identify gestures from that area using computer vision algorithms. A few other approaches for predicting different body movements use heuristics which cannot easily be extended to new movements or gestures. This paper describes a system which does not require any such prior assumptions and jointly performs detection of different body gestures using the geometry of different body parts and other features paired with machine learning algorithms. Along with the geometry, the system also uses key points of the human body, which requires drastically less training data and gives more accurate inferences. The prediction of body movements/gestures in real time helps make the virtual system more interactive with the user.
To compute the user's body movements in real time, the system uses machine learning methods and geometry-based approaches for the detection of human body parts and movements, such as hand detection and head detection, and for gestures such as hand gesture identification, head movements, a person standing or sitting, and the human count.
For human body parts' detection, our approach consists of a deep learning-powered method which robustly locates body parts in images/video frames and is robust enough to handle any kind of orientation, position, lighting, background, and other variations. For gesture identification, after detection of body parts in an image or frame, we use a machine learning-based classification approach which can be trained for the existing gestures as well as new gestures. For generalization, the system employs key point-based prediction to identify the left hand/left leg or right hand/right leg and the gestures associated with the hands and legs, such as fist, thumbs up, victory, ankle lock and withdrawn feet; this requires less data for training the model and allows prediction in real time. The prediction of these body movements or gestures helps when the voice conversation is missed, as the virtual system is able to carry on the conversation based on the predicted user gestures and thus communicate with the user efficiently.
The system also helps when the user's conversation is misspelled: the user's gestures help in understanding the user's context, and communication becomes easy in the virtual environment. In this system, we can augment or fuse the user's voice with the user's movements/gestures to communicate with the user in the virtual environment.
The rest of the paper is organized as follows. Section 2 describes the related work. Section 3 discusses the detailed solution approach for human body movement detection, covering hand gestures and other body parts' detection using key points, geometrical and shape features. A subsection of Sect. 3 also discusses the role of gesture event computation in virtual machine interaction. Section 4 deals with real-time human body movement detection using a video streaming application. Section 5 outlines the application scenario in the context of user interviews. Section 6 provides the conclusions and future applications, followed by the References.
2 Related Work
This section discusses the existing work carried out on human gesture/body move-
ment detection and the role of these movements or gestures in virtual conversation
systems.
Wu and Koike [4] proposed a real-time human motion forecasting system which visualizes the future pose/movements of a human using a simple RGB camera in virtual reality. The authors used residual neural networks and recurrent neural networks to estimate the human 2D pose, and residual linear networks to recover the 3D pose from the predicted 2D pose. The lattice optical flow algorithm is used for joint movement estimation.
Badler [5], Kuffner [6] and Ng [7] described different approaches for estimating human movements for autonomous action prediction. They also portrayed 3D virtual humans for interactive conversational applications. They used a mid-level PaT-Net parallel finite state machine controller, low-level motor skills and a high-level conceptual action representation system that can be used to drive virtual humans through complex tasks, offering a deep connection between natural language instructions and animation control.
Nguyen [8] used a key point-based approach for identifying gestures. A probability distribution for each key point, indicating the likelihood of producing an observable output at that key point, is also derived using this approach. The characteristic data pattern obtained for a new gesture can be compared with the patterns of previously stored known gestures. These comparisons are used to compute a confusion matrix, which describes possible similarities between new and known gestures. This approach is a purely probabilistic matching approach to identify the match between existing and new gestures.
Zhou and Cheng [9] outlined hidden Markov models (HMM) and geometrical feature distributions of a user's hand trajectory to achieve adaptive gesture recognition. The method acquires a sequence of input images of a specific user and recognizes the gesture of that user from the sequence of images using the trained gesture model and geometrical features extracted from the user's hand trajectory. The geometrical features are extracted from the gesture model based on the relative positions of the state transition points and the starting point of the gesture. For detection, the authors again used sequences of images to recognize the gesture.
Huang et al. [10] and Smith [11] describe dynamic gesture recognition using geometrical classification. In each image, the hand area is segmented from the background and used to estimate the parameters of all five fingers. The proposed method classifies the hand image as one of the postures in a predefined database and applies a geometrical classification algorithm to recognize the gesture. The algorithm also combines a skin colour model with motion information to achieve real-time hand segmentation performance, treats each dynamic gesture as a multi-dimensional volume, and uses a geometrical algorithm to classify each volume. The authors applied this approach to gesture recognition for computer, TV control and game control systems.
Kurakin et al. [12], Chen et al. [13] and Marin et al. [14] used depth information for dynamic hand gesture recognition. The authors used sensors and 3D images to capture depth information to recognize gestures. Hand gesture recognition helps in understanding human activity for many practical applications. This is a real-time approach for dynamic hand gesture recognition that is fully automatic and robust to variations in speed, style and hand orientation. The approach is based on an action graph, which shares similar robustness properties with standard HMMs but requires less training data by allowing states to be shared among different gestures. The system was evaluated on a challenging dataset of twelve dynamic American Sign Language (ASL) gestures and obtained promising results.
Shan et al. [15] experimented with particle filter and mean shift algorithms for visual tracking. The authors found that both algorithms have their own respective strengths and weaknesses and integrated the advantages of the two approaches for improved tracking. By incorporating mean shift (MS) optimization into particle filtering to move particles to local peaks in the likelihood, the proposed mean shift embedded particle filter improves the sampling efficiency. The authors applied these algorithms in the context of developing a hand-control interface for a robotic wheelchair.
The next section describes the detailed solution approach, followed by its application in the domain of virtual conversation using human body movements/gestures.
3 Solution Approach
The proposed approach uses machine learning and a geometry-based approach for hand gesture detection. Key point-based deep learning approaches are used for the detection of other body part movements such as standing and sitting. The solution approach has three stages. The first stage is model training, where the model is trained on a huge corpus of data for detecting hand gestures and other body part movements or gestures. The second stage describes the role of the gesture/body part movements associated with the trained model in virtual conversation for effective
interaction between machine and human. The final stage explains how the trained models and gesture events help the virtual conversation system in real-time streaming applications (Fig. 1).
The video corpus data collected from different sources, such as YouTube and self-recorded videos, are used for training; in these videos the hands and other body parts of the human are visible. The training model (stage 1) has two parts: a hand gesture detection model for gestures such as victory, fist and thumbs up, and a body parts' detection model for movements such as standing and sitting. The training models are explained in detail in the next subsections. A subsequent subsection explains how the trained models for gesture and body movement events fill the gap between machine and human in virtual conversation (stage 2). In stage 3, real-time events are generated from real-time streaming data using the trained models. These events help the virtual conversation so that human and machine interact in an effective way. The details are given in Sect. 4.
As part of the training model, the system trains hand gesture detection on the corpus of training data. The detailed steps are shown in Fig. 2, and each step is explained below.
Hand Detection—this step locates hands in an image. The input to this module is a raw image, and the output is a rectangular bounding box (the location of the hand) for one or multiple hands. This module is powered by a deep learning model which is trained for the hand detection task using region-based convolutional neural networks (R-CNN) or convolutional neural networks (CNN) on hand images. The CNN/R-CNN performs a binary classification for each pixel in the image. The system collects a huge data set of images containing hands (either using simulation and automated labelling or manual image collection and manual labelling). Labelling of these images includes locating the regions of the hands (the pixels corresponding to hands) or drawing a bounding rectangle around each hand. Once we have this labelled data set, we train a CNN detector (SSD (single-shot detector)/R-CNN/YOLO) [16] which takes an image as input and gives the location of the hand in the image as output.
Hand Normalization—in this step, the detected hand is resized to a standard size (e.g. 256 × 256) and identified as left or right. If it is a right hand, a reflection transform (mirroring about the y-axis) is applied to convert it into a left hand. For hand and hand side (left or right) detection, one of the following methods can be used:
1. Jointly predict the hand and the hand side (left or right) by training on labelled data of right and left hands using a convolutional neural network (CNN) classifier for hand side prediction.
2. Otherwise, in a first step train for hand prediction in the images and, in a second step, classify the hand side as left or right. These two methods require extra training and computation. An alternative method using key points for hand and hand side detection is provided in the next step.
3. Key point detection:
a. It is a CNN powered module which takes hand cropped image as input and
gives key points as output. To train this CNN, collect a labelled data set which
consists of hand cropped image with annotated joint coordinates.
b. For this, use a simulation tool in which human characters of different age, gen-
der, skin colours are simulated with different hand poses. In this simulation,
the coordinates of palm and all finger joints are known.
c. After simulating each new frame, capture its image, and save its finger joint
coordinates as label. Once large labelled data set is available, train the model
using CNN. The trained model provides key points of every input image in
real time.
Using key points to predict the hand side (left or right):
• Obtain the key points of the input image for both the original and the reflection-transformed image (explained in the hand normalization step above).
• Predict the probabilities of the key points for both images, the original and the reflection-transformed one.
• For example, the probability that a key point belongs to the thumb is computed using the CNN classifier, giving a probability vector such as:
p = (0.005, 0.00067, 0.86, …, 0.0009)ᵀ   (example: probability of the thumb key point)
• Take the average of the key-point probabilities for both images, and use the image with the larger average probability for further computation; that image determines the hand side (left or right). A minimal sketch of this selection step is given below.
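In this sketch, the probability arrays are assumed to come from the CNN key-point classifier described above; the array shapes and the left-as-canonical convention are assumptions for illustration.

```python
import numpy as np

def choose_hand_side(probs_original: np.ndarray, probs_reflected: np.ndarray) -> str:
    """Each argument holds one probability per detected key point (e.g. from the CNN).
    The image whose key points are, on average, more confidently explained wins."""
    if probs_original.mean() >= probs_reflected.mean():
        return "left"   # original crop already matches the canonical (left) hand
    return "right"      # the reflected copy fits better, so the crop was a right hand

# Toy example: the reflected image gets a clearly higher average key-point probability.
print(choose_hand_side(np.array([0.2, 0.3, 0.1]), np.array([0.7, 0.8, 0.9])))  # right
```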
Geometric and Shape Feature Extraction—the training model uses geometric features and shape features along with key points as the input feature vector for the hand classifier. The following features are computed in this step:
1. Curvature of fingers—for each finger, the curvature is computed using the following formula.
For the ith finger F_i, we have the coordinates of all its joints (key points):
F_i = [(x1, y1), (x2, y2), (x3, y3), (x4, y4)]
First, compute the mid-point of each finger bone:
m1 = ((x1 + x2)/2, (y1 + y2)/2)   (1)
m2 = ((x2 + x3)/2, (y2 + y3)/2)   (2)
m3 = ((x3 + x4)/2, (y3 + y4)/2)   (3)
The curvature of the finger is the angle between the vectors (m1 − m2) and (m3 − m2):
θ = arccos[ (m1 − m2) · (m3 − m2) / (‖m1 − m2‖ ‖m3 − m2‖) ]   (4)
2. Angle between consecutive fingers—similarly, the angle between two consecutive fingers is computed from the vectors formed by their joint key points, e.g. (p2 − p1) and (p5 − p1):
φ = arccos[ (p2 − p1) · (p5 − p1) / (‖p2 − p1‖ ‖p5 − p1‖) ]   (5)
The angle between the edges is shown in Fig. 3. For one hand, we compute 4 such angles between the consecutive pairs of the five fingers.
3. Direction vector of finger bones—to obtain the direction vector of the ith finger bone with edge coordinates (x1, y1) and (x2, y2), we use direction vector = v/|v|, where v = (x2 − x1, y2 − y1). For each finger, we compute 3 direction vectors, which gives 6 (= 3 × 2) features.
4. Key points of finger joints—the coordinates of the finger joints (key points), in a reference frame whose origin is the mid-point of the palm, are taken as features. For 21 key points, we get 21 × 2 features (each point gives two features, its x-coordinate and its y-coordinate).
5. Concatenate all these features and normalize the resulting vector to unit norm. A feature-extraction sketch following these steps is given below.
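This sketch follows Eqs. (1)–(5) and steps 3–5 above; the 21-key-point layout, the finger index quadruples and the use of fingertip angles for Eq. (5) are assumptions made for illustration.

```python
import numpy as np

def angle_between(u: np.ndarray, v: np.ndarray) -> float:
    """Angle (radians) between two 2D vectors, as in Eqs. (4) and (5)."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def finger_features(joints: np.ndarray) -> np.ndarray:
    """joints: (4, 2) array holding one finger's key points (x1, y1) ... (x4, y4)."""
    m1, m2, m3 = (joints[0] + joints[1]) / 2, (joints[1] + joints[2]) / 2, (joints[2] + joints[3]) / 2
    curvature = angle_between(m1 - m2, m3 - m2)                       # Eq. (4)
    bones = joints[1:] - joints[:-1]                                  # 3 finger-bone vectors
    directions = bones / (np.linalg.norm(bones, axis=1, keepdims=True) + 1e-9)
    return np.concatenate([[curvature], directions.ravel()])          # 1 + 6 features per finger

def hand_feature_vector(keypoints: np.ndarray, finger_idx) -> np.ndarray:
    """keypoints: (21, 2) hand key points expressed relative to the palm mid-point (origin).
    finger_idx: five quadruples of key-point indices, one per finger (layout is an assumption)."""
    feats = [finger_features(keypoints[list(idx)]) for idx in finger_idx]
    tips = [keypoints[idx[-1]] for idx in finger_idx]
    feats.append(np.array([angle_between(tips[i], tips[i + 1]) for i in range(4)]))  # Eq. (5)-style angles
    feats.append(keypoints.ravel())                                   # raw key points (feature 4)
    vec = np.concatenate(feats)
    return vec / (np.linalg.norm(vec) + 1e-9)                         # unit norm (step 5)
```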
Hand Gesture Identification/Classification: a classifier is used for gesture identification. Here, a multi-class SVM classifier with an RBF kernel is used to train the hand gesture model. Training happens on a labelled data set where each sample consists of the features extracted from a cropped image (geometrical and shape features) and the gesture type of that image, for example victory, thumbs up, fist, etc. We train the SVM model with these labelled data sets. The model's input feature vector is small yet expressive enough to represent each gesture or body movement type, so even a smaller data set is enough for training the classifier. C_i for i = 1, 2, … is the class label of fist, victory, thumbs up, etc., as shown in Fig. 2.
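For illustration, a multi-class SVM with an RBF kernel can be trained on such feature vectors with scikit-learn as sketched below; the data arrays, feature dimensionality and class labels are placeholders, not the authors' data.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X: one row of geometric/shape/key-point features per cropped hand image (placeholder data).
# y: gesture class labels C_i, e.g. "fist", "victory", "thumbs_up".
X = np.random.rand(60, 51)
y = np.array((["fist"] * 20) + (["victory"] * 20) + (["thumbs_up"] * 20))

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
clf.fit(X, y)
print(clf.predict(X[:3]))  # predicted gesture labels for the first three samples
```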
Other body movements, such as a person standing, a person sitting, the number of people present in a frame, and head nodding for “yes” and “no”, are also very important for virtual conversation between machine and human. Models are trained for these human body movements/gestures, and the trained models are used in real time to derive the gesture events that help in virtual interaction between the machine and the human.
In general, the models are based on key points derived from human body parts, as explained for hand gesture classification in the above subsection. In addition to the key points, geometrical features and shape features are also used for training the classifiers for the different body part movements. The key points are derived from convolutional neural networks trained on different body parts, and the geometrical and shape features are obtained from the joint key points of those body parts. The model is trained with a simple SVM classifier using key points, geometrical features and shape features, as shown in Fig. 4.
Head rotation and nodding for “yes” and “no”: head-related gestures such as head rotation and head nodding for “yes” and “no” are computed using time series-based yaw, roll and pitch features. These features capture the oscillation and derive the events for head rotation and nodding for “yes” and “no”. Using the key points derived from the CNN model, the method computes the Euler angles of roll, yaw and pitch (refer to Fig. 5) to derive head rotation and head nodding events for “yes” and “no”. The gesture needs to be completed within a time period, and flags are set whenever there is an oscillation. The oscillation provides the head rotation of the user and the user's opinion for the “yes” and “no” events. Here, the gesture/head movements are computed from the key points and geometric features without the need for a classifier. These events help the conversation for effective interaction of the user in the virtual environment.
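The oscillation test can be sketched as follows: counting direction reversals of the pitch (nod) and yaw (rotation/shake) angle series over a short window is one simple way to set the flags described above. The thresholds, the jitter filter and the nod-means-yes mapping are assumptions, not values taken from the paper.

```python
import numpy as np

def count_oscillations(angles: np.ndarray, min_swing_deg: float = 5.0) -> int:
    """Number of direction reversals in a time series of head angles (degrees)."""
    deltas = np.diff(angles)
    deltas = deltas[np.abs(deltas) >= 0.5]          # ignore small jitter
    signs = np.sign(deltas)
    reversals = int(np.sum(signs[1:] != signs[:-1]))
    swing = angles.max() - angles.min()
    return reversals if swing >= min_swing_deg else 0

def head_gesture(pitch_series: np.ndarray, yaw_series: np.ndarray) -> str:
    """Pitch oscillation -> nod ('yes'); yaw oscillation -> rotation/shake ('no')."""
    nod, shake = count_oscillations(pitch_series), count_oscillations(yaw_series)
    if max(nod, shake) < 2:
        return "none"
    return "nod_yes" if nod >= shake else "shake_no"

t = np.linspace(0, 2 * np.pi, 30)
print(head_gesture(10 * np.sin(3 * t), np.zeros(30)))  # nod_yes
```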
The human gesture/body movements are identified with the trained classifier models as explained in the above subsections. Here, we describe the role of these gesture events in the virtual conversation with the user. These events help in filling the following gaps for effective conversation between the machine and the human.
Missed conversation: instead of speaking, the user may nod his/her head for “yes” or “no”. The gesture events help the virtual system understand the user's intent from these gestures. In another case, the user can also use hand signals such as “thumbs up”, or give responses with fingers “1” or “2”, to respond to the virtual system. These gestures enable effective interaction between human and machine where the user's spoken conversation is missed, as the virtual system can make out the user's conversation from these gestures.
Misspelled/semantic conversation: using the user's body movements/gestures, even though the user misspells words while communicating with the virtual machine, the virtual machine is able to figure out the user's intention from the misspelled conversation. For example, the user says “I want to go room” and makes a hand signal. The hand signal refers to the rest room, while the user misspelled it as “room” instead of “rest room”. The virtual system is able to figure out the user's intention from the user's body movements/gestures and interacts with the user accordingly.
Fusion of voice with the gestures: with the fusion of voice and gesture, the virtual system is able to figure out the user's intention, and the conversation with the user takes place accordingly. For example, leg and feet gestures, as shown in Fig. 6, can provide the most accurate clues to someone's mental state. In the ankle lock, in a seated position, people sometimes lock their ankles and withdraw their feet below the chair, as shown in Fig. 6. A person doing this gesture is holding back a negative reaction, and behind a negative reaction there is always some negative emotion; so a person doing this gesture simply has a negative emotion that he/she is not expressing and might be afraid or angry. By fusing this gesture with the voice, the virtual system is able to figure out the intent of the user while interacting.
This section describes the real-time generation of body movement events for the virtual machine's understanding in a video streaming application. The method uses the trained gesture models to generate the events in real time. The steps for real-time body movement generation are shown in Fig. 7.
The video streaming data are generated in real time, and the frames or sequences of frames are sent to the trained gesture models to generate gestures; those gestures are used by the virtual machine while interacting with the user in real time. These gestures are handy to the virtual system when the user shows some hand or body movements while communicating with the virtual machine. The identification of body movements makes the interaction between machine and human more interactive.
The approach described in the above sections can be used in many tasks involving the extraction/interpretation of hand movements in videos/images and action recognition. The solution is deployed in a virtual interview platform where the user interacts with the virtual system for an interview. The user's body part movements/gestures help the virtual system communicate effectively. The solution was tested on many scenarios with respect to performance, and we observed that its accuracy in identifying hand-related gestures is more than 90%, and more than 80% for other body part movement detection. A few scenarios where the body movements helped the virtual system interact effectively with the user are listed below.
1. The user interacts with the virtual interviewer, which asks a question such as “Do you know the answer to this particular question?” The user nods the head for “yes”/“no” instead of replying by voice. The system is able to predict that the user is replying yes/no and, instead of waiting for a voice response, the virtual system goes ahead and proceeds to the next question or another context.
2. The user also interacts with the virtual interviewer with hand signals. In the interview context, the virtual system asks the user multiple-choice questions, and the user gives the answer using hand signals such as 1, 2, 3, or 4 shown with the fingers. The system understands the user's hand signals and proceeds with the next question.
3. In another scenario, the user provides hand signals such as STOP or waving to the virtual system to provide contextual information rather than voice. The virtual system understands the hand gestures and interacts with the user in the same context more effectively. Hence, the user's body movements/gestures are of foremost importance for a conversation system where voice-related information is missing or misspelled during interaction in the virtual environment.
The solution proposed in this paper discusses a new approach for human body movement or gesture detection and the role of these gestures in virtual conversation systems. The method uses key points, geometrical features and shape features to train models for the different human body movements. These features, along with the key points, are used in an SVM classifier to train the model instead of using deep learning approaches, which need high-end computing systems for training; the solution also does not require a huge amount of data to train the model. The geometrical and shape features, along with the key points, help to train the model for the detection of different human body movements. Using the trained models, real-time events are generated with a video streaming application, and these events are consumed by the virtual system for effective communication with the candidate in the virtual interview scenario. The candidate's body movement events help the interview to go smoothly and make it more interactive with the virtual machine. The solution was tested on multiple interview scenarios, and we observed different accuracy values for different body part movements: around 90% accuracy for hand gestures, and an average of about 80% accuracy for other body parts.
In the future, this solution will be extended to other applications in the domain of smart health care, where patient and machine will interact for diet-, medicine- and health-related consultancy. This solution can also be extended to the teaching and retail domains, where the virtual machine provides feedback and resolves user queries.
References
1. Kaushik, Manju, and Rashmi Jain. 2014. Natural user interfaces: Trend in virtual interaction.
arXiv preprint arXiv: 1405.0101.
2. Eshed, Ohn-Bar, and Mohan Manubhai Trivedi. 2014. Hand gesture recognition in real
time for automotive interfaces: A multimodal vision-based approach and evaluations. IEEE
Transactions on Intelligent Transportation Systems 15 (6): 2368–2377.
3. Molchanov, Pavlo, Shalini Gupta, Kihwan Kim, and Jan Kautz. 2015. Hand gesture recognition
with 3D convolutional neural networks. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition Workshops, 1–7.
4. Wu, Erwin, and Hideki Koike. 2018. Real-time human motion forecasting using a RGB camera.
In Proceedings of the 24th ACM Symposium on Virtual Reality Software and Technology. ACM.
5. Badler, I. Norman. 1997. Real-time virtual humans. In Proceedings of the Fifth Pacific
Conference on Computer Graphics and Applications. IEEE.
6. Kuffner, J.James. 1998. Goal-directed navigation for animated characters using real-time path
planning and control. International Workshop on Capture Techniques for Virtual Environments,
171–186. Berlin: Springer.
7. Ng, Kia. 2004. Music via motion: Transdomain mapping of motion and sound for interactive
performances. Proceedings of the IEEE 92 (4): 645–655.
8. Nguyen, H. Katerina. 2001. Method and apparatus for real-time gesture recognition. U.S. Patent
No. 6,256,033, 3 July 2001.
9. Zhou Jie, and Pu Cheng. 2016. System and method for gesture recognition. U.S. Patent No.
9,323,337, 26 Apr 2016.
10. Huang, Kuang-Man, Ming-Chang Liu, and Liangyin Yu. 2013. System and method for dynamic
gesture recognition using geometric classification. U.S. Patent No. 8,620,024, 31 Dec 2013.
11. Smith Dana, S. 2014. Geometric shape generation using multi-stage gesture recognition. U.S.
Patent Application 13/846,469, filed 18 Sept 2014.
12. Kurakin, Alexey, Zhengyou Zhang, and Zicheng Liu. 2012. A real time system for dynamic
hand gesture recognition with a depth sensor. In 2012 Proceedings of the 20th European Signal
Processing Conference (EUSIPCO). IEEE.
13. Chen, L., H. Wei, and J. Ferryman. 2013. A survey of human motion analysis using depth
imagery. Pattern Recognition Letters 34 (15): 1995–2006.
14. Marin, Giulio, Fabio Dominio, and Pietro Zanuttigh. 2014. Hand gesture recognition with
leap motion and kinect devices. In 2014 IEEE International Conference on Image Processing
(ICIP). IEEE.
15. Shan, Caifeng, Tieniu Tan, and Yucheng Wei. 2007. Real-time hand tracking using a mean
shift embedded particle filter. Pattern Recognition 40 (7): 1958–1970.
16. Redmon, Joseph, and Ali Farhadi. 2018. Yolov3: An incremental improvement. arXiv preprint
arXiv: 1804.02767.
Automatic Diagnosis of Attention
Deficit/Hyperactivity Disorder
1 Introduction
Fig. 1 Workflow of the proposed tests (Tests 1–4) involving the patient and an informant
The proposed system would administer tests to the patient prior to the visit and produce a report based on the tests. The tests check for the symptoms stated in DSM-5, according to which an Attention Deficit/Hyperactivity Disorder patient shows at least 6 out of 9 inattentive and/or hyperactive-impulsive symptoms for more than 6 months.
To automate the diagnosis of ADHD, the authors designed four tests to be taken by the patient to check for these symptoms (refer Fig. 1). The tests are designed so that symptoms are rechecked multiple times through various scenarios.
Test 1—Completing a task
The patient would be given a task to match the blocks on the screen. While undergoing this test, he/she would be examined for how accurately the task is completed, attention and patience. This test would look for symptoms such as: often has difficulty organizing tasks and activities; often avoids or is reluctant to engage in tasks that require sustained mental effort; often does not follow through on instructions and fails to finish work; often has difficulty sustaining attention in tasks or play activities; often leaves seat in situations when remaining seated is expected; often unable to play or engage in leisure activities quietly.
Test 2—Audio Questionnaire for patient
The patient would be given questions in audio format and has to answer through a microphone. This checks for hyperactivity and impulsivity, examining symptoms such as: often blurts out an answer before a question has been completed; often has difficulty waiting for his or her turn; often interrupts or intrudes on others.
Test 3—Responding to Audio and Visuals
The patient would be given words in audio/image format which he/she has to recognize and enter in the text field provided on screen. This test would look for symptoms such as: often fails to give close attention to details or makes careless mistakes; often does not seem to listen when spoken to directly; often has difficulty sustaining attention in tasks or play activities.
The fourth test would examine the overall symptoms in different settings such as home, school and society, checking the patient's behaviour in each of these settings.
The Web application for these tests has been designed taking into consideration users prone to Attention Deficit/Hyperactivity Disorder, as per the guidelines stated in the paper "Designing for Attention Deficit/Hyperactivity Disorder: in search of guidelines" [2]. The output of this model is a report containing the Attention Deficit/Hyperactivity Disorder symptoms found, which can then be taken to the consultant for further decision.
2 Related Work
Fig. 2 Attention Deficit/Hyperactivity Disorder age group statistics [Apollo Hospital and others]
3 Proposed Model
The user would be given a console with a few shapes which he/she has to match with the impressions beside them. While completing this activity, he/she would be monitored for the moods the person goes through, for whether he/she loses attention by letting it wander, and for the time required to complete the task. To check the first two aspects, facial expressions would be recognized, and after every defined interval a photograph would be taken to check whether the user is in front of the screen and to assess his/her mood (Fig. 3).
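As a rough illustration of this monitoring step, the sketch below (not the authors' implementation; the interval length, camera index and the absence of any mood model are all assumptions) captures a webcam frame at a fixed interval and checks whether a face is present in front of the screen using OpenCV's bundled Haar cascade.

import time
import cv2

INTERVAL_SECONDS = 30  # "defined interval" between photographs; value assumed

# OpenCV ships a pre-trained frontal-face Haar cascade
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def user_is_present(frame):
    # A face detected in the frame is taken as "user is before the screen"
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0

def monitor(duration_seconds=300):
    camera = cv2.VideoCapture(0)
    observations = []
    start = time.time()
    while time.time() - start < duration_seconds:
        ok, frame = camera.read()
        if ok:
            observations.append((time.time() - start, user_is_present(frame)))
        time.sleep(INTERVAL_SECONDS)   # wait until the next scheduled photograph
    camera.release()
    return observations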
The user would be provided with an audio questionnaire of general questions, which he/she has to answer in audio format through a microphone. Using audio analysis of the recorded answers, hyperactivity and impulsivity would be checked, as described for Test 2.
The user would be provided with a set of words as audio and visuals. These words have to be written in the text field provided. This test would check for accuracy, involvement level and attention level. An Attention Deficit/Hyperactivity Disorder patient tends to neglect things spoken directly or to be inattentive to small details. For example, when a word such as "CAT" is dictated or shown, the patient may overlook the details and positioning of the letters and end up jumbling them as "CTA" (Fig. 5). Thus, this test will help the consultant check the accuracy level of the patient and whether he/she observes details carefully or overlooks them.
4 Experimentation
the kids for more than 6 months. One case was a false positive and three were true positives. The confusion matrix of the tests conducted is given in Table 1; the results were verified by a psychiatrist. The accuracy of our model is 98%, which signifies that it is a very effective model for determining the symptoms of Attention Deficit/Hyperactivity Disorder, especially in youngsters.
This model addresses a problem of society, as it would help the consultant attend to more patients in less time by reducing the number of tests to be performed in person and would support the decision on whether the disorder is present. It would also help patients get a quick diagnosis of the disorder, saving time and the overhead of many scheduled appointments. In the future, this model could also help the consultant suggest treatment based on the severity and span of the disorder. Thus, the proposed system saves diagnosis time by performing the tests before the visit to the consultant.
Acknowledgements Special thanks to Dr. Amit Kharat, MBBS, DMRD, DNB, PhD, FICR, Quality Assurance Radiologist, and Dr. Makarand Pantoji, Junior Resident, Department of Psychiatry, NIMHANS, for their guidance and support in this research.
References
1. Nass, Ruth. 2006. Evaluation and assessment issues in the diagnosis of attention deficit hyperactivity disorder. Seminars in Pediatric Neurology 12: 200–216. https://doi.org/10.1016/j.spen.2005.12.002.
2. McKnight, Lorna. 2010. Designing for ADHD in search of guidelines. In IDC 2010 Digital Technologies and Marginalized Youth Workshop.
3. https://www.healthline.com/health/adhd/conners-scale#1.
4. Pantoji, Makarand. Junior Resident, Department of Psychiatry, NIMHANS.
5. Matthews, Marguerite, et al. 2006. Attention deficit hyperactivity disorder. Current Topics in Behavioral Neurosciences 16: 235–266. https://doi.org/10.1007/7854_2013_249.
6. https://www.cdc.gov/ncbddd/adhd/data.html.
7. American Psychiatric Association. Diagnostic and statistical manual of mental disorders.
Classification of Types of Automobile
Fractures Using Convolutional Neural
Networks
Nikhil Sonavane, Ambarish Moharil, Fagun Shadi, Mrunal Malekar,
Sourabh Naik and Shashank Prasad
1 Introduction
classification involves extracting features and then classifying those features on the basis of their contribution.
Traditionally, researchers extract features from labeled images and apply classification algorithms such as SVM and random forest to the extracted features. These traditional machine learning algorithms require feature extractors to obtain features from images; Haar, HOG and LBP are some of the well-known feature extractors. SVM is one of the image classification techniques used in computer vision and gives good performance on small datasets, but it requires the time-consuming process of first extracting features with Haar, HOG or LBP and only then training the model on them. Random forest usually requires a lot of data but can give a robust model. These traditional supervised algorithms also require the data to be structured as feature matrices with a label vector whose length equals the number of observations.
Hence, the use of various pre-built algorithms is on the rise today, and the most popular choice for image classification is the convolutional neural network, which falls under the category of deep learning. Many deep learning approaches and algorithms have been discussed and researched in past years. In [1], the authors discuss convolutional neural networks (CNNs), restricted Boltzmann machines (RBMs), and deep belief networks (DBNs) in great detail; they also briefly discuss the challenges posed in deep learning and effective ways to tackle them. Convolutional neural networks eliminate the time-consuming feature extraction process because their convolutional and pooling layers take care of the feature engineering part, so there is no need to extract features manually. Neural networks are considered more flexible than traditional machine learning algorithms. In [1], the authors also discuss how deep learning has contributed strongly to computer vision, natural language processing, and artificial intelligence. Neural networks perform exceptionally well when the number of predictor variables is large, although compared with SVM they perform well only when the dataset is large. However, the size of a dataset can be increased diligently using image augmentation techniques. Thus, neural networks are seen to be more flexible and easier to use than the traditional machine learning algorithms.
A convolutional neural network [2] is a deep learning model with applications in domains such as image classification, natural language processing and visual recognition. Convolutional networks are currently among the most efficient and accurate deep learning models for classifying pictorial data and have become the model of choice for almost every image-related problem. Figure 1 shows the representation of a general convolutional neural network.
2.1 Architecture
The convolutional layer is a major part of the neural network whose main aim is to extract features from the input images. The convolution operation is able to detect features such as edges, corners, and sharp transitions from a given input image representation [2]. Convolution is performed using a kernel/filter and the input image: the kernel is slid over the input image with a stride equal to 1, and an element-wise multiplication and summation is performed at every location. This operation results in the formation of feature maps (Fig. 2).
Fig. 2 Convolution

As noted earlier, traditional approaches extract labeled features and classify them with algorithms such as SVM and random forest, whereas a CNN learns the features and performs the classification itself, which makes it efficient for image classification.
The activation function used in the convolutional layer is ReLU, which introduces nonlinearity into the feature maps [3].
The pooling layer is used after the convolutional layer to reduce the number of parameters and the computation required. It reduces the dimensions of the image while retaining its important information. Pooling can be of three types, namely max pooling, sum pooling, and average pooling. Max pooling is used in this classification process of micrographs of automotive components: it takes the element with the largest value from each patch of the input and creates a new pooled feature map. This reduces the size of the image and also prevents overfitting by removing irrelevant information from the input images. The result of pooling is a reduced-dimension matrix containing the important information, which is then flattened into a single vector and fed to the hidden layers [4] (Fig. 3).
The flattened matrix obtained from the pooling layer is given to the fully connected layer. The fully connected layers consist of many neurons and help in performing classification based on the learned input features. In a fully connected layer, each neuron is connected to the neurons of the previous layer, and every connection is assigned a weight [2]. After a certain number of epochs, set by the user, the model differentiates between the dominating features and the low-level features in the images. These features are then classified using the softmax technique, and the output layer is placed at the end of the last fully connected layer.
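The layer stack described above can be sketched in Keras (which, as noted later, is the library the authors used); the filter counts, input size and dense width below are illustrative assumptions rather than the paper's exact architecture, and the softmax output shown here follows this paragraph (a sigmoid output is mentioned later).

from tensorflow.keras import layers, models

# Minimal sketch: convolution + ReLU -> max pooling -> flatten -> fully connected -> softmax
model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),                  # input image size assumed
    layers.Conv2D(32, (3, 3), activation="relu"),     # convolution with ReLU nonlinearity
    layers.MaxPooling2D(pool_size=(2, 2)),            # max pooling keeps the largest value per patch
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),                                 # flatten pooled feature maps to a vector
    layers.Dense(128, activation="relu"),             # fully connected (hidden) layer
    layers.Dense(3, activation="softmax"),            # brittle / ductile / fatigue
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])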
An activation function is a mathematical function applied at the perceptron; it provides a mapping of nonlinear, complex functions between the input and output responses. The main purpose of an activation function in a multilayered artificial neural network (ANN) is to derive an output from an input signal. In an ANN architecture with a multilayer perceptron, the input signals $X_i$ from the $m$ input nodes are multiplied by the weights $W_i$ of each node, their combined summation is fed into a function $f(x)$, and the computed output signal is fed as an input to the adjoining layer.
The activation functions that we used in our model were the rectifier and sigmoid activation functions. The rectifier activation function was used in the input layer and the hidden layers, as it outputs the weighted sum of the inputs directly if it is positive and outputs zero if the sum of the products of the weights and input signals is negative (Fig. 4).
The weighted input sum and the rectifier activation function are given as follows:

$$f(x) = \sum_{i=1}^{m} W_i X_i$$

$$\phi(x) = \max(x, 0)$$
The activation function used at the output layer was the sigmoid activation function. The purpose of using the sigmoid is that it gives a value between 0 and 1, just as in logistic regression, so the output can be measured in probabilistic terms; this helps in analyzing the results, especially when solving a classification problem.
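For reference, the sigmoid referred to here is the standard logistic function

$$\sigma(x) = \frac{1}{1 + e^{-x}},$$

which maps any real-valued input into the interval (0, 1).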
The most important part of this model was the data, and it was difficult to fetch it from a single source. We created a dynamic Web scraper to parse images from various sources [5]. After careful observation, filtering and manual sorting, we arrived at 420 (four hundred and twenty) images that were suitable for training the model. Although 420 images is a very small dataset for a deep network to train on, the model was built and constructed successfully after numerous alterations and changes.
The images were sorted into three categories as brittle, ductile, and fatigue, and
the aim was to train the CNN on this available data and predict the type of failure or
fracture accurately.
To build this algorithm, TensorFlow and Theano were taken into consideration, but Keras was used to script the algorithm because of the ease it offers, because it avoids low-level computation, and because it runs on top of both TensorFlow and Theano.
The dataset we fetched was a set of three-channel images with RGB matrix values, which had to be converted into single-channel images for ease of training. With single-channel images, it was easy to perform the image augmentation procedures.
To increase the accuracy of the model, the first and most important factor was to increase the dataset, and instead of a purely quantitative approach we went for a qualitative one. Several layers of image augmentation techniques, such as rotation, shearing, blurring, grayscale conversion, adding salt-and-pepper noise, scaling, translation, flipping, nonlinearizing, perspective transformation, and alteration of lighting conditions, were applied using the Keras API.
So, this dataset of 420 images was expanded to a final set of 3890 images.
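A minimal sketch of how some of these geometric and lighting augmentations could be set up with Keras's ImageDataGenerator; the parameter values and the directory layout are assumptions, and the noise, blur, grayscale and perspective steps mentioned above would need custom preprocessing and are not shown.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,            # rotation
    shear_range=0.2,              # shearing
    zoom_range=0.2,               # scaling
    width_shift_range=0.1,        # translation
    height_shift_range=0.1,
    horizontal_flip=True,         # flipping
    brightness_range=(0.7, 1.3),  # alteration of lighting conditions
)

# Hypothetical layout: one sub-folder per class (brittle / ductile / fatigue)
train_generator = train_datagen.flow_from_directory(
    "micrographs/train", target_size=(64, 64),
    batch_size=32, class_mode="categorical")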
The total number of hidden layers (multilayer perceptrons) used in the tail ANN was 6. The optimizer used for the model was the 'Adam' optimizer, and the loss function used was 'categorical_crossentropy'. While training the ANN, the model reached inconsistent accuracy values, and it became very difficult to obtain a model with low loss and high validation accuracy when the whole batch was trained for a fixed number of 500 epoch cycles.
Ensemble learning is a process practiced quite often in machine learning; models such as decision trees and random forests work on the ensemble learning principle. Ensemble learning can be described as combining the output of multiple models into a single output: several predictive models are connected in parallel, and their outputs are merged into one [6].
Fig. 5 Representation of ensemble learning

Initially, many classification models were created using different splits of the same training dataset. The first classifier (a convolutional neural network in this case) was constructed. The shape of the input image was (64, 3, 3), and 64 feature maps
were obtained at the end of convolution. Then, these 64 feature maps were further
subsampled by the fourth layer or the max pooling layer into 128 output matrices.
Four convolutional and max pooling layers were used. The max pooling layer as
explained in Sect. 2.1.2 extracted the impact features and then gave them as the
output to the flattening layer for flattening these matrices to feed them into the input
layer of the full connection. The full connection through majority voting of the
multilayer perceptrons classified the input image as ductile, brittle, or fatigue.
The second classifier (convolutional neural network in this case) was constructed.
The shape of the input image was (32, 3, 3), and 32 feature maps were obtained after
the first convolutional layer operation. Then, these 32 feature maps were further
subsampled by the fourth layer or the max pooling layer into 64 output matrices. In
the second classifier too, around four convolutional and pooling layers were used.
The max pooling layer as explained in Sect. 2.1.2 extracted the impact features and
then gave them as the output to the flattening layer for flattening these matrices to
feed them into the input layer of the full connection. The full connection through
majority voting of the multilayer perceptrons classified the input image as ductile,
brittle, or fatigue.
The third classifier (convolutional neural network in this case) was constructed.
The shape of the input image was (128, 3, 3), and 128 feature maps were obtained
as the output. Then, these 128 feature maps were further subsampled by the fourth
layer or the max pooling layer into 256 output matrices. In the third classifier, four
convolutional and max pooling layers were used. The max pooling layer as explained
in Sect. 2.1.2 extracted the impact features and then gave them as the output to the
flattening layer for flattening these matrices to feed them into the input layer of
the full connection. The full connection through majority voting of the multilayer
perceptrons classified the input image as ductile, brittle, or fatigue.
All these three models were combined into a single predictive model, and the
weighted average of all these three models was obtained as the final output (Fig. 5).
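As a hedged illustration of this combination step (not the authors' exact code), the three trained Keras classifiers can be merged by a weighted average of their class-probability outputs; the equal weights and the names model_a, model_b and model_c are assumptions.

import numpy as np

def ensemble_predict(models, x, weights=None):
    # Weighted average of the softmax outputs of the individual CNNs
    weights = weights or [1.0 / len(models)] * len(models)
    averaged = sum(w * m.predict(x) for w, m in zip(weights, models))
    # Class index with the highest averaged probability (class order assumed)
    return np.argmax(averaged, axis=1)

# Usage: labels = ensemble_predict([model_a, model_b, model_c], x_test)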
The model tended to reach its maximum, most suitable values somewhere in the middle of training, and having these peaks while ending up with final epochs of high loss and low validation accuracy was a major drawback. We successfully overcame this drawback by using the callback method [7]. The callback method is a probabilistic approach to extracting the most accurate epoch. A limit of 500 epoch cycles was decided, and a patience value of 47 epoch cycles was given in the callback method (the value 47 was decided after numerous training iterations and cycles). When the model reached a peak value, i.e., high validation accuracy and low loss, it would carefully observe the next 47 cycles after this peak was attained; if the accuracy did not reach as high as the peak, the training iterations would stop, and the peak model was saved as the best model. If, while training went on after the peak was attained, the model reached higher accuracy values and a lower difference between training and validation loss, it gave up the initial peak value, considered this new epoch as the new peak, and again observed the next 47 cycles, until the maximum of 500 epoch cycles was reached. This process saved the best models during training and helped to achieve higher validation accuracy.
The classes of the callback method were built to stop the training process if the algorithm was unable to find an epoch cycle with accuracy higher than that of the most recent peak within the defined patience window.
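The same behaviour can be approximated with Keras's built-in callbacks, as in the hedged sketch below (the authors describe their own callback classes; train_generator and val_generator are assumed to exist):

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # Stop if validation accuracy has not beaten the best peak for 47 epochs,
    # and roll back to the weights of that best epoch
    EarlyStopping(monitor="val_accuracy", patience=47, restore_best_weights=True),
    # Also persist the best epoch to disk as training proceeds
    ModelCheckpoint("best_model.h5", monitor="val_accuracy", save_best_only=True),
]

# history = model.fit(train_generator, epochs=500,
#                     validation_data=val_generator, callbacks=callbacks)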
In this section, we explain the working and basic structure of our model along with a discussion of its output.
3.1 Dataset
The automobile fracture dataset consisted of 420 images extracted from the internet using Web scraping. The images were micrographs from various sources and were labeled properly before being used in the model.
3.2 Architecture
To get the best model among the various iterations, we used a callback function which stops the model from iterating further if there is no progress in its accuracy, thus helping to extract the best model (Figs. 6, 7 and 8).
Finally, we achieved a highest validation accuracy of 92.43%, with a training accuracy of 90.68% and training and validation losses of 0.2987 and 0.6866, respectively.
125/500 [==============================] - 4 s 153 ms/step - loss:
0.2987 - acc: 0.9068 - val_loss: 0.6866 - val_acc: 0.9243
Figure 9 shows the best model epoch.
4 Conclusion
In the approach discussed in this paper, we have trained a model to classify automobile fracture types into the categories brittle, fatigue, and ductile on the basis of micrographs of these fractures. The classification is done using convolutional neural networks trained on 420 images. By tuning the convolutional layers, pooling layers, batch size, and similar parameters, we can see a drastic change in the efficiency of the model, and by using the callback function we can stop the training if the accuracy of the model is no longer increasing.
References
1. Jia, Xin. 2017. Image recognition method based on deep learning. In 29th Chinese Control and
Decision Conference (CCDC).
2. Guo, Tianmei, Jiwen Dong, Henjian Li, Yunxing Gao. 2017. Simple convolutional neural
network on image classification. In 2017 IEEE 2nd International Conference on Big Data
Analysis.
3. https://www.datascience.com/blog/convolutional-neuralnetwork.
4. Yim, Junho, Jeongwoo Ju, Heechul Jung, and Junmo Kim. 2015. Image classification using
convolutional neural networks with multi-stage feature. In Robot Intelligence Technology and
Applications 3. Advances in Intelligent Systems and Computing, ed. J.-H. Kim et al., vol. 345,
587. Springer International Publishing Switzerland.
5. Tajbakhsh, N., et al. 2016. Convolutional neural networks for medical image analysis: Full
training or fine tuning? IEEE Transactions on Medical Imaging 35 (5).
6. Krizhevsky, A., I. Sutskever, and G.E. Hinton. 2012. ImageNet classification with deep con-
volutional neural networks. In Proceedings of Advances in neural information processing
systems.
7. He, K., X. Zhang, S. Ren, and J. Sun. 2016. Deep residual learning for image recognition. In
Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, June.
8. Boureau, Y.L., F. Bach, Y. LeCun, and J. Ponce. 2010. Learning midlevel features for
recognition. In CVPR.
9. https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-
the-eli5-way-3bd2b1164a53.
10. Jmour, Nadia, Sehla Zayen, and Afef Abdelkrim. 2018. Convolutional neural networks for image classification. IEEE.
11. Wang, Tao, David J. Wu, Adam Coates, and Andrew Y. Ng. End-to-end text recognition with
convolutional neural networks.
12. Vo, An Tien, Hai Son Tran, Thai Hoang Le. 2017. Advertisement image classification using
convolutional neural network. In 2017 9th International Conference on Knowledge and Systems
Engineering (KSE).
A Subjectivity Detection-Based
Approach to Sentiment Analysis
Abstract With the rise of Web 2.0, where loads of complex data are generated every day, effective subjectivity classification has become a difficult task. Subjectivity classification refers to classifying information into subjective (expressing feelings) or objective (expressing facts). In this paper, we use the Yelp reviews dataset.
Our aim is to prove that a dataset with the objective sentences removed from each
review gives better results than the dataset containing both subjective and objec-
tive sentences. To achieve this, we have used two approaches, each divided into two
phases. The first phase of both the approaches is mainly the subjectivity classification
phase where we filter out the objective sentences and keep the subjective sentences in
the reviews, thus creating a new dataset with purely subjective reviews. The second
phase of the first approach uses CountVectorizer which creates word vectors, and we
fit the model to the classifiers. The second phase of first approach is repeated for both
the datasets, and we get better results for the newly created dataset which contains
purely subjective reviews. The second phase of the second approach uses Word2Vec,
an implementation of neural network which creates distributed word vectors. We fit
this Word2Vec model to the classifier, and we analyze the results. Again, the newly
created dataset gives better results after we repeat this phase of the second approach
for both the datasets.
N. Das (B)
TATA Consultancy Services, Gitanjali Park, Kolkata, India
e-mail: nilanjanadas010@gmail.com
S. Sagnika
School of Computer Engineering, Kalinga Institute of Industrial Technology (Deemed to be
University), Bhubaneswar, Odisha, India
e-mail: santwana.sagnika@gmail.com
1 Introduction
We have discussed some of the past work in this field in Sect. 2. Section 3 discusses
the approaches we follow. Section 4 discusses the experimental setup and the results
of the experimentation.
2 Related Work
In this section, we will study some of the related works done previously on subjectiv-
ity classification. In [1], different concepts and methodologies on sentiment analysis
are discussed by Pawar et al., where sentiment classification can be done both at the
sentence level and document level. Bravo-Marquez et al. [2] use a combination of
meta-level features for subjectivity and sentiment classification. In [8], three differ-
ent categories of methods are used for subjectivity detection by Chaturvedi et al. The
categories used are hand-crafted, automatic, and multi-modal.
SemEval 2016 stance detection task dataset was used by Dey et al. [9] where
the labels were positive, negative, and neutral stances. A feature-driven model was
developed, where in the first phase the tweets were classified into either neutral or
non-neutral; i.e., subjectivity classification was carried out. In the second phase, the
non-neutral tweets were classified into either positive or negative; i.e., sentiment
classification was carried out.
Rashid et al. [10] present a survey which discusses the various supervised, unsupervised, case-based reasoning, and machine learning techniques used for the computational treatment of sentiments in opinion mining. A semi-supervised
machine learning technique is used in [11] by Esuli and Sebastiani to determine
whether a term has a positive, neutral, or negative connotation. Zhuang et al. [12] use
a multi-knowledge-based approach which is the integration of statistical analysis,
WordNet, and movie knowledge to extract features from opinion, thus classifying
them into positive or negative.
Kim and Hovy [13] propose a system that could align the pros and cons to their
respective sentences in each review. Then, they train a maximum entropy model on
the resulting dataset and use it to extract pros and cons from review datasets which
are available online and are not labeled explicitly.
Xuan et al. [14] use the concept of features extracted from the text. They use a
movie review dataset proposed by Pang and Lee earlier. They proposed 22 syntax-
based patterns to extract the linguistic features to classify a sentence into subjective
or objective using the maximum entropy model (MaxEnt). Rustamov in [15] uses different supervised machine learning techniques like the adaptive neuro-fuzzy inference system (ANFIS), hidden Markov model (HMM), and fuzzy control system (FCS) for subjectivity detection.
Keshavarz and Abadeh in [6] propose three metaheuristic methods, namely genetic algorithm, simulated annealing, and asexual reproduction optimization, to create subjectivity lexicons in which words are classified as either subjective or objective. By comparing accuracy and F-measure against the baseline, they conclude that the proposed approaches perform better.
3 Proposed Approach
We aim to prove that a dataset which has its objective sentences from each review
removed will train a classifier more efficiently. We have followed two approaches to
gain the desired results. For achieving this, we use Yelp reviews dataset. The dataset
had 10 columns, out of which we used only 2 columns which served our purpose.
The columns used were customer reviews and the star ratings given by the customers.
For our purpose, we considered star ratings 1 and 5. We had divided our overall work
into two phases to achieve our aim. Let us study it phase-wise.
Phase I—Preprocessing
Phase I is common for both the approaches. All the reviews in the Yelp reviews
dataset have a star rating from 1 to 5. Each review however may contain some
objective sentences which express mere facts and no opinion or emotions toward
the place. Our aim is to remove these objective sentences and create a new dataset
with only the subjective sentences present in each review. This process of finding out
subjectivity instances is known as subjectivity detection. To achieve this, we follow
a series of steps.
(1) Subjectivity Detection: At first, we take each review and tokenize it into a
list of sentences. We then find the subjectivity score of each sentence using an
available Python library TextBlob. The TextBlob library provides a subjectivity
score for sentences using an available function. The subjectivity score ranges
from 0 to 1. We consider 0 as an extremely objective score but that does not mean
it does not contain any trace of subjectivity. Tokenized sentences of a particular
review with subjectivity score higher than 0 are concatenated directly, whereas
those sentences with score equal to 0 have to undergo a POS tagging process,
discussed in the next step.
(2) POS: Part-of-speech (POS) tagging is very effective in subjectivity detection. Most researchers have concluded that adjectives play a very important role in expressing opinions or emotions and are a good indicator of subjectivity. In [7], it is discussed that apart from adjectives, nouns like great, brilliant and delicious, and verbs like hate and amaze can also indicate subjectivity in a sentence. As sentences with a subjectivity score equal to 0 may still have some trace of subjectivity, we filter those sentences by checking the POS tags (adjectives, nouns, and verbs) of each word in each sentence of a review. If any word is found to possess any tag from among these three, then the sentence is assumed to be subjective. The sentences filtered in this way are concatenated, and we get the final review with subjective sentences.
(3) Data Cleaning: The review which we finally obtain has to undergo a cleaning
process. Using the Beautiful Soup library, we remove any HTML markup and
tags present in the review. This reduces the noise present in the review. After
this, any non-alphabet character, for example, digits, ‘!’ and ‘)’, is removed from
the reviews. Finally, all the words are converted into lowercase and we get the
clean review. The cleaned subjective review is then updated in the new dataset.
These steps are carried out for each review present in the Yelp dataset. Also, we do not remove the stopwords, as that may lead to loss of information, so it is better to keep them. The new dataset is used in phase two, where we work out different methods to prove that the new dataset trains a classifier more efficiently than the older one. A consolidated sketch of these preprocessing steps is given below, after which we discuss phase II.
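The following is a minimal sketch of the Phase I pipeline described in steps (1)–(3), assuming TextBlob's Penn Treebank tag prefixes (JJ/NN/VB) stand for the adjective/noun/verb check; it is an illustration, not the authors' code.

import re
from bs4 import BeautifulSoup
from textblob import TextBlob

SUBJECTIVE_TAG_PREFIXES = ("JJ", "NN", "VB")   # adjectives, nouns, verbs (assumed mapping)

def is_subjective(sentence):
    # Step 1: TextBlob subjectivity score; Step 2: POS fallback when the score is 0
    if sentence.sentiment.subjectivity > 0:
        return True
    return any(tag.startswith(SUBJECTIVE_TAG_PREFIXES) for _, tag in sentence.tags)

def clean(text):
    # Step 3: strip HTML with Beautiful Soup, drop non-alphabet characters, lowercase
    text = BeautifulSoup(text, "html.parser").get_text()
    text = re.sub(r"[^a-zA-Z]", " ", text)
    return text.lower()

def subjective_review(review_text):
    # Keep only the subjective sentences of one review and return the cleaned result
    kept = [str(s) for s in TextBlob(review_text).sentences if is_subjective(s)]
    return clean(" ".join(kept))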
Phase II—Feature Representation
Phase II is different for both the approaches, and we will discuss them one by one.
This phase is applied to Yelp and the newly created dataset to compare the results.
In the first approach, we use a simple bag-of-words model. Since the Yelp dataset and the newly created dataset have star ratings from 1 to 5 and only 10,000 rows of data, we consider only the rows with star ratings 1 and 5 for phase II of the first approach on each dataset; this makes training the model more efficient. We apply CountVectorizer to each dataset one by one; it creates a feature vector for each of the reviews in a dataset. Once the feature vectors are created, we use them to train our classifiers. In this approach, we have used multinomial naive Bayes, random forest, and support vector machine classifiers, as they suit our purpose.
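A hedged sketch of this first approach's phase II with scikit-learn (the split ratio and parameters are assumptions; 'reviews' and 'stars' stand for the cleaned review texts and their 1/5 labels):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

def evaluate_bow(reviews, stars):
    x_train, x_test, y_train, y_test = train_test_split(
        reviews, stars, test_size=0.2, random_state=42)
    vectorizer = CountVectorizer()                  # bag-of-words feature vectors
    x_train_bow = vectorizer.fit_transform(x_train)
    x_test_bow = vectorizer.transform(x_test)
    clf = MultinomialNB()                           # random forest and SVM were used the same way
    clf.fit(x_train_bow, y_train)
    return clf.score(x_test_bow, y_test)            # accuracy on the held-out reviews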
In phase II of our second approach, we use the concept of distributed word vectors created by the Word2Vec algorithm. The Word2Vec model was initially published by Google in 2013. Word2Vec is an implementation of neural networks that creates distributed word vectors; one of its advantages is that it performs well on unlabeled datasets, and it works by placing words with similar meanings in a single cluster. To implement Word2Vec, we need the gensim package installed. Although Word2Vec performs well on unlabeled data, we still use the same labeled Yelp dataset and the newly created dataset to compare our final results in the second approach, and here too we consider only the rows with star ratings 1 and 5. Since the Word2Vec model works on single sentences, each represented as a list of words, what we ultimately pass to the Word2Vec model is a list of lists. For this purpose, each review is tokenized into sentences one by one and passed for data cleaning similar to that discussed in phase I. The same steps are carried out for each sentence, except that, again, stopwords are not removed, so that we get high-quality word vectors. Each sentence is returned as a list of words, and after every sentence in every review has been processed we obtain a list of sentences, each as a list of words. The only difference between the data cleaning functions of the two approaches is that the first approach returns each sentence as a whole, whereas the second returns sentences as lists of words. Finally, the list of parsed sentences is used to train the Word2Vec model.
The trained Word2Vec model creates a feature vector for each vocabulary word: the number of rows equals the number of words in the vocabulary, and the number of columns equals the feature vector size. Next, we proceed to vector averaging. Since the Yelp reviews dataset and the newly created dataset contain reviews of variable length, we need a method that takes the word vectors of each review and converts them into a feature set of the same length for every review. For this, we first send the training and testing reviews to the data cleaning method again; this time it returns each review as a list of words, unlike before, where each sentence was returned as a list of words. This list of reviews, each as a list of words, is then processed for vector averaging: we compute the average of the word vectors for each review and pass these averaged vectors to train our random forest or support vector machine classifier (Fig. 1).
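A hedged sketch of this step with gensim (version 4 API assumed; the hyperparameters are illustrative, not the authors' settings):

import numpy as np
from gensim.models import Word2Vec

def train_word2vec(tokenized_sentences, size=300):
    # tokenized_sentences: list of sentences, each a list of cleaned words
    return Word2Vec(sentences=tokenized_sentences, vector_size=size,
                    window=5, min_count=5, workers=4)

def review_vector(model, review_words):
    # Average the word vectors of one review; zeros if no word is in the vocabulary
    vectors = [model.wv[w] for w in review_words if w in model.wv]
    return np.mean(vectors, axis=0) if vectors else np.zeros(model.vector_size)

The averaged vectors can then be stacked into a matrix and passed to the random forest or support vector machine classifier exactly as in the first approach.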
4 Experiments
This is the section where we discuss our experiments and their results. We compare
the classification reports of both the datasets for each of the approaches.
Data Description
As discussed earlier, we used the Yelp reviews dataset in this paper. It is a predefined dataset available on Kaggle. It had 10 columns, of which only 2, namely the customer reviews and the star ratings given by the customers, were used in this paper. Since the dataset is small (10,000 rows) and the star ratings range from 1 to 5, we considered only star ratings 1 and 5 to get better results; it would have been ineffective to try to classify reviews into 5 star ratings on such a small dataset. Our aim in this paper is to prove that a dataset containing purely subjective reviews, when used to train a classifier, gives better results than a dataset with mixed types of reviews. We applied subjectivity detection to each sentence of a review using the Python library TextBlob, which gives a score between 0 and 1. Sentences with a score above 0 were assumed to be subjective, whereas sentences with a score equal to 0 were considered objective and underwent further processing: a POS tagging method in which each word was checked for tags such as adjective, noun, verb and adverb, and if any word carried one of these tags, the sentence was assumed to be subjective. The final subjective review then undergoes a data cleaning method, where we remove all HTML markup and tags using Beautiful Soup, an available Python package; another library, 're', is used to remove any non-alphabet characters; and all words are converted into lowercase using Python's lower function. Stopwords are not removed anywhere in this paper.
The above processes are carried out for each review in the dataset, and we finally collect the subjective reviews in a new dataset. After this, we first use CountVectorizer, which creates word vectors for each review of both datasets; the resulting features are then fit to a classifier. Here we use three classifiers, support vector machine, random forest, and naive Bayes, and compare their outputs for both datasets one by one, showing how accurately each classifier predicts whether a review belongs to the 1 or 5 star rating. As another approach, after CountVectorizer we tried Word2Vec, which creates a feature vector for each vocabulary word; this model is then fit to a classifier. In this approach, random forest and support vector machine classifiers are used for both datasets to again predict reviews into 1 and 5 star ratings, and the final classification reports are compared for both datasets.
Evaluation
Our original dataset is the Yelp reviews dataset, and the subjective dataset is the new dataset. All the values that we mention here are averages obtained by running the code five times for each dataset, for each classifier, in each approach; this helped us avoid overfitting and underfitting. In the first approach, where we use
CountVectorizer, three classifiers, namely support vector machine, random forest,
and naive Bayes, are used for both the datasets. Naive Bayes gives an accuracy of
94.13% on the new dataset and 93.89% on the Yelp dataset using CountVectorizer.
Support vector machine gives an accuracy of 92.91% on the new dataset and 92.79%
on the Yelp dataset using CountVectorizer. Random forest classifier gives an accuracy
of 86.41% on the new dataset and 86.18% on the Yelp dataset using CountVectorizer.
On the new dataset using the naive Bayes algorithm, we get precision of 0.93, recall of 0.74, and F1-score of 0.82 for the 1 star rating. For the 5 star rating, precision is 0.94, recall is 0.99, and F1-score is 0.96. On the new dataset using the support vector machine algorithm, we get precision of 0.86, recall of 0.73, and F1-score of 0.79 for the 1 star rating. For the 5 star rating, precision is 0.94, recall is 0.97, and F1-score is 0.96. On the new dataset using random forest, we get precision of 0.97, recall of 0.27, and F1-score of 0.42 for the 1 star rating. For the 5 star rating, precision is 0.86, recall is 1.00, and F1-score is 0.92.
On the Yelp dataset using the naive Bayes algorithm, we get precision of 0.92, recall of 0.73, and F1-score of 0.81 for the 1 star rating. For the 5 star rating, precision is 0.94, recall is 0.99, and F1-score is 0.96. On the Yelp dataset using the support vector machine algorithm, we get precision of 0.85, recall of 0.74, and F1-score of 0.79 for the 1 star rating. For the 5 star rating, precision is 0.94, recall is 0.97, and F1-score is 0.96. On the Yelp dataset using random forest, we get precision of 0.97, recall of 0.25, and F1-score of 0.40 for the 1 star rating. For the 5 star rating, precision is 0.86, recall is 1.00, and F1-score is 0.92. Accuracy is higher for naive Bayes, random forest, and support vector machine on the new dataset. Considering F1-scores, for the new dataset using naive Bayes the F1-score is 0.82 − 0.81 = 0.01 higher for the 1 star rating, and for random forest it is 0.42 − 0.40 = 0.02 higher for the 1 star rating. The other F1-scores for the first approach are the same, as can be inferred from Table 1.
Now coming to the second approach, which uses Word2Vec, let us analyze its results. On the new dataset using the random forest classifier, we get precision of 0.76, recall of 0.48, and F1-score of 0.59 for the 1 star rating. For the 5 star rating, precision is 0.89, recall is 0.97, and F1-score is 0.93. On the new dataset using the support vector machine, we get precision of 0.85, recall of 0.46, and F1-score of 0.59 for the 1 star rating. For the 5 star rating, precision is 0.89, recall is 0.98, and F1-score is 0.93. On the Yelp dataset using the random forest classifier, we get precision of 0.80, recall of 0.44, and F1-score of 0.56 for the 1 star rating. For the 5 star rating, precision is 0.88, recall is 0.97, and F1-score is 0.93. On the Yelp dataset using the support vector machine, we get precision of 0.84, recall of 0.45, and F1-score of 0.58 for the 1 star rating. For the 5 star rating, precision is 0.89, recall is 0.98, and F1-score is 0.93. The F1-score for the new dataset using the random forest classifier is 0.59 − 0.56 = 0.03 higher for the 1 star rating, and for the support vector machine it is 0.59 − 0.58 = 0.01 higher for the 1 star rating. The remaining F1-scores are the same for the second approach, as can be inferred from Table 2.
Table 1 Classification report using CountVectorizer

Classifier | Star rating | Original data (Precision / Recall / F1-score / Accuracy %) | Subjectivity-filtered data (Precision / Recall / F1-score / Accuracy %)
Naive Bayes | 1 | 0.92 / 0.73 / 0.81 / 93.89 | 0.93 / 0.74 / 0.82 / 94.13
Naive Bayes | 5 | 0.94 / 0.99 / 0.96 | 0.94 / 0.99 / 0.96
Support vector machine | 1 | 0.85 / 0.74 / 0.79 / 92.79 | 0.86 / 0.73 / 0.79 / 92.91
Support vector machine | 5 | 0.94 / 0.97 / 0.96 | 0.94 / 0.97 / 0.96
Random forest | 1 | 0.97 / 0.25 / 0.40 / 86.18 | 0.97 / 0.27 / 0.42 / 86.41
Random forest | 5 | 0.86 / 1.00 / 0.92 | 0.86 / 1.00 / 0.92

This paper mainly aims to prove that a dataset which has the objective sentences removed from each review gives better accuracy than its original dataset. In the first phase of both approaches, we computed the subjectivity score for each sentence of a review, and the objective sentences underwent a POS tagging process in which each sentence was tokenized into words and checked for POS tags.
If any word in the sentence had one of these POS tags, the sentence was assumed to be subjective. The final subjectivity-rich reviews then went through a data cleaning method. In the second phase of the first approach, we used the CountVectorizer model with three classifiers: naive Bayes, random forest, and support vector machines. We saw that the accuracies for the new dataset were greater than those for the Yelp dataset. The accuracies for naive Bayes, random forest, and support vector machines for the new dataset were 94.13%,
86.41%, and 92.91%, and for the Yelp dataset they were 93.89%, 86.18%, and 92.79%, respectively. In the second phase of the second approach, we used the Word2Vec model with the random forest classifier and the support vector machine. Again, the accuracies for the new dataset were greater than those for the Yelp dataset: for the new dataset they were 87.70% and 88.53% for random forest and support vector machine, and for the Yelp dataset they were 87.65% and 88.29%, respectively. We aim to further improve the subjectivity detection by applying several other techniques, as discussed in the subjectivity/objectivity analyzer of [7]. The more the reviews are enriched with subjectivity, the higher the accuracy.
References
1. Pawar, A.B., M.A. Jawale, and D.N. Kyatanavar. 2016. Fundamentals of sentiment analy-
sis: Concepts and methodology. Sentiment analysis and ontology engineering, 25–48. Cham:
Springer.
2. Bravo-Marquez, F., M. Mendoza, and B. Poblete. 2014. Meta-level sentiment models for big
social data analysis. Knowledge-Based Systems 69: 86–99.
3. Liu, B. 2012. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language
Technologies 5 (1): 1–167.
4. Pandey, S., S. Sagnika, and B.S.P. Mishra. 2018. A technique to handle negation in sentiment analysis on movie reviews. In 2018 IEEE international conference on communication and signal processing (ICCSP), 0737–0743.
5. Baldonado, M., C.-C.K. Chang, L. Gravano, and A. Paepcke. 1997. The Stanford digital library metadata architecture. International Journal on Digital Libraries 1: 108–121.
6. Keshavarz, H.R., and M. Saniee Abadeh. 2018. MHSubLex: Using metaheuristic methods for
subjectivity classification of microblogs. Journal of AI and Data Mining 6 (2): 341–353.
7. Kamal, A. 2013. Subjectivity classification using machine learning techniques for mining
feature-opinion pairs from web opinion sources. arXiv preprint arXiv:1312.6962.
8. Chaturvedi, I., E. Cambria, R.E. Welsch, and F. Herrera. 2018. Distinguishing between facts
and opinions for sentiment analysis: Survey and challenges. Information Fusion 44: 65–77.
9. Dey, K., R. Shrivastava, and S. Kaushik. 2017. Twitter stance detection—A subjectivity and
sentiment polarity inspired two-phase approach. In 2017 IEEE international conference on
data mining workshops (ICDMW), pp 365–372.
10. Rashid, A., N. Anwer, M. Iqbal, and M. Sher. 2013. A survey paper: areas, techniques and
challenges of opinion mining. International Journal of Computer Science Issues (IJCSI) 10
(6): 18–31.
11. Esuli, A., and F. Sebastiani. 2006. Determining term subjectivity and term orientation for opin-
ion mining. In 11th Conference of the European chapter of the association for computational
linguistics.
12. Zhuang, L., F. Jing, and X.Y. Zhu. 2006. Movie review mining and summarization. In Proceed-
ings of the 15th ACM international conference on Information and knowledge management,
43–50.
13. Kim, S.M., and E. Hovy. 2006. Automatic identification of pro and con reasons in online
reviews. In Proceedings of the COLING/ACL on main conference poster sessions. Association
for Computational Linguistics, 483–490.
14. Xuan, H.N.T., A.C. Le, and L.M. Nguyen. 2012. Linguistic features for subjectivity classifi-
cation. In 2012 IEEE international conference on asian language processing, 17–20.
15. Rustamov, S. 2018. A hybrid system for subjectivity analysis. In Advances in fuzzy systems.
Image Processing and Machine Learning
Techniques to Detect and Classify Paddy
Leaf Diseases: A Review
Abstract In Asian countries, paddy is one of the major staple foods, and the agricultural sector is the largest employer in India's economy. Hence, an effective mechanism should be adopted for food security and production. Due to infections caused by pests such as viruses, fungi and bacteria, there is a huge loss in the quality and quantity of rice, which results in heavy losses to farmers. This survey presents the different image processing techniques for paddy disease identification and further classification. The challenges involved in each step of disease detection and classification are analyzed and discussed. Image preprocessing, segmentation, feature extraction and classification are the main stages involved in paddy disease identification and classification, and all four stages are well addressed here. This survey gives a complete overview along with the challenges of each stage and also focuses on the research challenges and research gaps. It concludes that automatic paddy leaf disease detection and classification require much more advancement, and that more advanced techniques are needed to automate the system effectively and efficiently.
1 Introduction
Agriculture has always played a vital role in the economy of most developing countries, particularly in South Asia. Farming is not only meant to feed the increasing population; at the same time, it helps to handle global warming problems. Agricultural production is greatly affected by crop diseases, and the amount of crops damaged by adverse climatic conditions and the invasion of pathogens can never be neglected. Many plant diseases appear in the crop field due to soil fertilizer, mineral deficiency, environmental agents and various other factors. It is therefore very important to monitor plants/crops from an early stage in order to detect and avoid diseases. Farming involves many tasks such as soil preparation, seeding, applying manure/fertilizers, irrigation, using the required amount of pesticides, timely harvesting and storage. Improvement in crop quality can only be achieved by proper automation of disease detection [1]. The majority of the population depends on rice as their staple food, and a vast land area in India is used mainly to cultivate rice crops. Farmers incur huge losses in rice production due to the infection of rice crops by diseases caused by various pathogens such as fungi, bacteria and viruses, so it is very important to detect and classify such diseases at an early stage so that proper measures can be taken. The four main paddy crop diseases are rice blast disease (RBD), brown spot disease (BSD), narrow brown spot disease (NBSD) and bacterial leaf blight disease (BLBD). A normal leaf image is shown in Fig. 1a.
Fig. 1 a Image of a healthy leaf; b–e diseased leaves with rice blast, brown spot, narrow brown spot and bacterial leaf blight, respectively
The rest of this paper is organized into three further sections: Sect. 2 presents a review of different approaches to paddy disease detection and the further classification of the diseases; Sect. 3 presents a discussion of the research gap and focuses on relevant techniques that are feasible for the existing problems; and Sect. 4 gives the conclusion of the survey.
A complete overview of the image processing techniques used for paddy disease detection and further classification is presented in this section. Image processing technology has been widely used and studied in agriculture to detect and classify leaf diseases, and it has shown good potential for identifying most of the diseases that invade paddy crops, despite some serious challenges that are discussed below. The four main steps of image processing (preprocessing, segmentation, feature extraction and classification) each have their own challenges. The block diagram of our survey is presented in Fig. 3. In this survey, our main point of focus is the limitations of each step along with the existing methods and developments in recent approaches, as well as the advantages and disadvantages of these approaches.
2.1.1 Challenges
The preprocessing step makes it easier to detect the diseased spot on the leaf, as the diseased portion becomes more visible. The main challenges at the preprocessing step are: (a) a noise-affected background; (b) variation in lighting; and (c) low-intensity images. These directly influence the accuracy of the segmentation process. There are various preprocessing methods such as median filtering, green plane extraction, image binarization and color space conversion.
The process of dividing an image into several segments based on the similarity of pixels is image segmentation. The main motive behind it is to find the infected region in the image. Various segmentation approaches are available for detecting the infected portion in the image, and there are certain challenges associated with segmentation.
2.2.1 Challenges
Segmentation is one of the most challenging steps. In paddy leaf disease detection, its major aim is to separate the diseased part and the background in the images. The challenges of this stage are (a) color change of the disease, (b) illumination problems, (c) morphological changes in the diseased portion and (d) the texture of the affected portion. These affect the accuracy of disease detection and degrade performance. Some of the segmentation techniques are: K-means, 8-connected component analysis, region growing, color-based segmentation, thresholding and edge detection. A summary of segmentation-based techniques is presented in Table 1.
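As a hedged illustration of the K-means option (not taken from any of the surveyed papers), pixels can be clustered in a colour space with OpenCV and the cluster corresponding to the diseased region picked afterwards, for example by its mean colour; the number of clusters and the colour space are assumptions.

import numpy as np
import cv2

def kmeans_segment(bgr_image, k=3):
    # Cluster pixels in Lab colour space
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)
    pixels = lab.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(pixels, k, None, criteria, 5, cv2.KMEANS_PP_CENTERS)
    # Return a per-pixel cluster map plus the cluster centres for later inspection
    return labels.reshape(lab.shape[:2]), centers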
2.3.1 Challenges
The leaf features (color, shape, size and texture) are extracted from the paddy leaf to detect the diseases. Each feature type has its own advantages and disadvantages: the color features are affected by outside lighting, and the shape and size of different diseases may be similar. Some of the feature extraction techniques are: principal component analysis, color intensity methods, wavelet transform and color space transformation.
Based on the extracted features, classifiers are utilized for the classification of the various diseases of the paddy leaf. There are various classification techniques such as SVM [4], ANN [7], PNN [8], BPN [9], KNN [10] and MDC [11]. Each one has its specific benefits and drawbacks. The different classification techniques are summarized in Table 2 along with their benefits and drawbacks.
The techniques involved in image preprocessing are mainly image enhancement, color-based transformation, noise reduction and resizing. A green plane extraction approach based on median filtering and binarization was utilized by researchers in [12]. This approach is used in the image preprocessing phase: the green color components are extracted in order to highlight the affected area of the leaf, and the intensity difference between the original gray scale image and the green value is calculated in order to detect the infected area irrespective of brightness and leaf age. They used a median filter to preserve edges, remove unnecessary spots and provide proper smoothing.
Devi et al. [13] used the CIELUV color space and a median filter for the preprocessing of paddy leaves to detect leaf diseases. The color space is applied to manage the color specification in the specified standard; in this work, they used the linear mapping property of the LUV model.
Narmadha et al. [14] used histogram equalization and contrast adjustment techniques for image enhancement. From the input image, an occurrence matrix is generated based on the probability distribution, and the K-means technique is applied to remove noise and unwanted spots.
Another technique is based on image normalization. First, the R, G and B components are extracted, and the mean of each component and the maximum of these means are calculated. The scaling factor for each color component is then obtained by dividing its extracted mean by the maximum value, and the normalized image is obtained by multiplying the R, G and B component values by the corresponding scaling factors. For segmentation, the normalized RGB image is converted to the YCbCr space [15].
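A literal sketch of this normalization, assuming the image is an RGB NumPy array and using OpenCV's YCrCb conversion to stand in for the YCbCr space (this is an illustration, not the cited authors' code):

import numpy as np
import cv2

def normalize_rgb(img):
    # img: H x W x 3 RGB array
    img = img.astype(np.float32)
    channel_means = img.reshape(-1, 3).mean(axis=0)       # mean of R, G, B
    scale = channel_means / channel_means.max()           # each mean divided by max(mean)
    normalized = np.clip(img * scale, 0, 255).astype(np.uint8)
    return cv2.cvtColor(normalized, cv2.COLOR_RGB2YCrCb)  # converted for segmentation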
Skeletonization technique is used to extract the shape of the infected spot. For
classification of diseases, MDC and KNN classifiers are used.
Kurniawati et al. [16] used the local entropy (H) threshold method and the Otsu threshold
method to convert the RGB image to a binary image. An occurrence matrix was
generated from the input image based on the probability distribution (p_i) required for
the entropy measure. The local entropy is defined as:
H = -\sum_{i=1}^{n} p_i \log p_i    (1)

p_i = \frac{a_i}{a_1 + a_2 + \cdots + a_n}    (2)
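A small sketch of the local entropy of Eqs. (1) and (2) for a gray-scale region is shown below; the 256-bin histogram and the random example window are assumptions, and the choice of threshold is left open.

```python
import numpy as np

def local_entropy(region):
    """H = -sum(p_i * log p_i) over the gray-level histogram of a region,
    with p_i = a_i / (a_1 + ... + a_n)."""
    counts, _ = np.histogram(region, bins=256, range=(0, 256))
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

window = np.random.randint(0, 256, size=(16, 16))       # stand-in for an image window
print(local_entropy(window))                            # compare with a chosen threshold to binarize
```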
3 Comparative Analysis
In this chapter, we discussed various approaches to detect paddy leaf diseases; based on
the symptoms/features, these diseases are classified by using efficient classifiers.
The preprocessing step is one of the most important steps in disease detection.
Filtering, image enhancement, color space transformation and image cropping are the
main components of this step. Various segmentation techniques like K-means clustering,
thresholding and edge detection are used for the segmentation process. Segmentation
errors generally occur due to uneven illumination of the captured images, and the Otsu
method is often affected by this problem. The main issue with feature extraction is that
it often identifies the type of lesion inaccurately. Based on the features of the diseases,
they are classified using classifiers such as SVM, ANN, KNN and PNN.
The comparison of various approaches is presented in Table 3, based on which
we can conclude that histogram equalization, median filtering, Euclidean distance-based
approaches, color-based approaches, green plane extraction-based techniques
and gray scaling-based approaches perform well on all major paddy diseases. It has
been found that K-means clustering and area thresholding perform well if the input
image is not complex and the affected region is clearly visible. Feature extraction is
an important factor based on which diseases can be classified. It has been observed
that the support vector machine (SVM) and neural networks play an important role in
disease classification with maximum accuracy. It has also been observed that
feature reduction and feature selection methods are not utilized, although they are very
important for the performance of the classification techniques.
In order to minimize the size of the extracted feature set, there is a need for effective
feature reduction and feature selection techniques. Also, in the segmentation step, simple
techniques like K-means and thresholding are used to identify infected portions, and these
cannot cope when an image is complex or has a high level of brightness.
4 Conclusion
Through this survey, we have discussed various image processing and machine learning
techniques to detect and classify paddy diseases. All four steps involved in detecting
and classifying the diseases are addressed along with the challenges associated with each
step. The comparison of the techniques is done on the basis of accuracy. We conclude
that the accuracy of the segmentation stage is directly influenced by the preprocessing
stage. It has also been found that the most suitable technique for segmenting infected
leaves is K-means, and that SVM and NN, which utilize texture features, are very
effective for the detection and classification of paddy leaf diseases. The above-mentioned
techniques need to be implemented and validated with existing datasets.
Table 3 (continued)
References | Year | Diseases | Techniques | Efficiency measures | Results (%)
Ghosh and Dubey [16] | 2013 | Anthracnose | Color space conversion, clustering (K-means), gray-level co-occurrence matrix, SVM | Accuracy | 96
Singh et al. [17] | 2015 | – | Leaf color chart technique, color space conversion, hybrid model of cropping, histogram, Chi-square goodness of fit test | Accuracy | 100
Chaudhary et al. [19] | 2012 | Citrus canker | Color transformation, edge detection, ANN | – | –
Al Bashish et al. [20] | 2011 | Frogeye leaf spot, brown spot, blueberry leaf spot | Median filter, color space conversion, Otsu method | – | –
A new paddy leaf disease detection technique can be proposed that includes effective
feature reduction and feature selection techniques.
References
1. Gutte, V.S., and M.A. Gitte. 2016. A survey on recognition of plant disease with help of
algorithm. International Journal of Engineering Science 7100.
2. Scardaci, S.C., R.K. Webster, C.A. Greer, J.E. Hill, J.F. Williams, R.G. Mutters, and J.J. Oster.
1997. Rice blast: A new disease in California. Agronomy Fact Sheet Series 2: 1–2.
3. Kobayashi, T., E. Kanda, K. Kitada, K. Ishiguro, and Y. Torigoe. 2001. Detection of rice panicle
blast with multispectral radiometer and the potential of using airborne multispectral scanners.
Phytopathology 91 (3): 316–323.
4. Hearst, M.A., S.T. Dumais, E. Osuna, J. Platt, and B. Scholkopf. 1998. Support vector machines.
IEEE Intelligent Systems and their applications 13 (4): 18–28.
5. Gonzalez, C.R., E.R. Woods, and L.S. Eddins (eds.). 2001. Digital image processing. Pearson
Education.
6. Zhang, M. 2009. Bilateral filter in image processing. Master’s Thesis, Louisiana State
University, USA.
7. Schalkoff, R.J. 1997. Artificial neural networks, vol. 1. New York: McGraw-Hill.
8. Specht, D.F. 1990. Probabilistic neural networks. Neural networks 3 (1): 109–118.
9. Buscema, M. 1998. Back propagation neural networks. Substance Use and Misuse 33 (2):
233–270.
10. Mucherino, A., P.J. Papajorgji, and P.M. Pardalos. 2009. K-nearest neighbor classification.
Data Mining in Agriculture, 83–106. New York: Springer.
11. Hodgson, M.E. 1988. Reducing the computational requirements of the minimum-distance
classifier. Remote Sensing of Environment 25 (1): 117–128.
12. Suman, T., and T. Dhruvakumar. 2015. Classification of paddy leaf diseases using shape and
color features. International Journal of Electronics and Electrical Engineering 7 (01): 239–250.
13. Devi, D.A., and K. Muthukannan. 2014. Analysis of segmentation scheme for diseased
rice leaves. In International conference on advanced communication control and computing
technologies (ICACCCT), May 2014, 1374–1378. IEEE.
14. Narmadha, R.P., and G. Arulvadivu. 2017. Detection and measurement of paddy leaf disease
symptoms using image processing. In International conference on computer communication
and informatics (ICCCI), January 2017, 1–4. IEEE.
15. Joshi, A.A., and B.D. Jadhav. 2016. Monitoring and controlling rice diseases using image
processing techniques. In International conference on computing, analytics and security trends
(CAST), December 2016, 471–476. IEEE.
16. Ghosh, S., and S.K. Dubey. 2013. Comparative analysis of k-means and fuzzy c-means
algorithms. International Journal of Advanced Computer Science and Applications 4(4).
17. Singh, A., and M.L. Singh. 2015. Automated color prediction of paddy crop leaf using image
processing. In International conference on technological innovation in ICT for Agriculture and
Rural Development (TIAR), July 2015, 24–32. IEEE.
18. Kular, J.S. 2014. Package of practices for Kharif Crops of Punjab. Punjab Agriculture
University Ludhiana. https://www.pau.edu/. Accessed December 25, 2014.
19. Chaudhary, P., A.K. Chaudhari, A.N. Cheeran, and S. Godara. 2012. Color transform based
approach for disease spot detection on plant leaf. International Journal of Computer Science
and Telecommunications 3 (6): 65–70.
20. Al Bashish, D., M. Braik, and S. Bani Ahmad. 2011. Detection and classification of leaf diseases
using K-means-based segmentation and neural-networks-based classification. Information
Technology Journal 10 (2): 267–275.
A Comparative Study of Classifiers
for Extractive Text Summarization
1 Introduction
A compact version of the original text which conveys the same concept as the
original document is known as a summary. In 1958, Luhn introduced the concept of
abstract generation from text data, which gave a boost to the idea of automatic text
summarization (ATS) [1, 2]. In recent years, the high availability of text data has helped
the growth of natural language processing, especially in the field of text summarization.
Summarization falls into two categories, abstractive and extractive. Extractive
text summarization is one of the oldest and most widely used approaches among
researchers. The idea behind extractive summarization is to extract sentences
from the document in such a way that those sentences represent the core idea of the
document [2]. Extractive summarization is the basic summarization technique: sentences
are selected on the basis of scores and ranks, and the scoring and ranking of sentences
are done by feature mapping and selection. Features can be of different types, such
as frequency-based and prediction-based; frequency-based features are more widely
used in extractive summarization. Summarized data can be helpful in various fields.
A summarized document helps in understanding the whole document in less
time. One can find a relevant document from a query search much faster by going
through summaries. In the case of scientific or medical data, one can easily produce
a report from the summaries. Summaries can also help in creating proper indexing of
multiple documents in much less time. Several methods have been implemented
over the years for creating improved summaries. Machine learning is nowadays one of the best-suited
methods for solving classification- and clustering-based problems, according to
state-of-the-art approaches in different fields. This work presents
a technique that treats extractive summarization as a classification problem instead of an
information extraction problem. The extractive summary has been considered as a two-class
problem where each sentence is labeled either 0 or 1: if the sentence is not selected for
the summary, it is classified as 0, otherwise as 1. Different machine learning techniques
have been implemented on the same dataset, and comparisons have been made.
The rest of the paper is organized as follows: Sect. 2 presents related works, Sect. 3
explains the proposed method, and Sect. 4 presents the experimental setup and result analysis.
Finally, we conclude our work in Sect. 5.
2 Related Works
3 Proposed Method
where tf is the term frequency, N is the total number of documents, and df stands for
document frequency. The higher the tf-idf value, the rarer the term in the document
collection and the more weight the term carries for inclusion in the summary. In the ML
modeling phase, every sentence of the document is first used to train the model. The
labeled data, i.e., the human-generated extractive summaries, are the sentences extracted
from the original text. The task can be considered a two-class problem with class values
0 and 1: the value 0 refers to sentences that are not in the summary, whereas the value 1
denotes sentences present in the human-generated summary.
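A minimal sketch of this feature and label construction is shown below, using scikit-learn's TfidfVectorizer; the example sentences and variable names are illustrative assumptions rather than the authors' code.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative feature/label construction: TF-IDF per sentence plus 0/1 labels
# derived from a human-generated extractive summary. Sentences are made up.
document_sentences = [
    "the first sentence of the article introduces the topic",
    "the second sentence carries the core finding of the article",
    "the third sentence adds background detail",
]
summary_sentences = {"the second sentence carries the core finding of the article"}

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(document_sentences)        # tf-idf weight of each term per sentence
y = np.array([1 if s in summary_sentences else 0 for s in document_sentences])
print(X.shape, y)                                       # (3, vocab_size) [0 1 0]
```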
For the machine learning modeling, K-nearest neighbor (KNN), random forest,
support vector machine, multilayer perceptron, decision tree and logistic regression
models have been implemented. The KNN algorithm works on a grouping approach by
considering the majority vote of the k nearest neighbors. Support vector machines
handle large dimensions precisely and also remove irrelevant features easily,
as explained by Joachims [5] in 1998. Decision trees classify categorical values
accurately. The multilayer perceptron is a feed-forward network which is trained through
back-propagation.
Generating an extractive summary from text data is a computationally intensive task when it
involves machine learning algorithms. Table 1 elaborates every specification
that the authors have taken into consideration. Newsroom [9] is one of the most widely used
datasets for text summarization. It consists of 1.3 million news articles and
associated summaries by various authors and editors. Extractive summary articles
are taken into account, and the experiment has been carried out on these data as explained in
Sect. 3. In this dataset, extractive, abstractive and mixed categories of summaries
are available in different volumes; in this work, only extractive summaries are
considered. K-nearest neighbor, random forest, support vector machine, multilayer
perceptron, decision tree and logistic regression classifiers have been implemented.
In Table 1, the experimental setup, software used and models used are given for a better
understanding of the work.
In K-nearest neighbor, ten neighbors are taken into consideration with uniform
weights. In random forest, 1000 and 10,000 n_estimators are considered, and the
'gini' criterion is used to measure the quality of the splits. In the decision tree,
'entropy' is used as the criterion function. In the support vector machine, the radial basis
function is used as the kernel, which focuses on the worst-case class separation. The
multilayer perceptron classifier is used with 100 hidden layers, a ReLU activation function
and a stochastic gradient-based optimizer as the solver for back-propagation and weight
updating. A classification problem is analyzed through its classification report [10], which
includes precision, recall and F1-score values. Precision indicates what fraction of the
positive predictions were correct. Recall indicates what fraction of the actual positive
instances were predicted by the classifier. The F1-score is calculated as the harmonic mean
of recall and precision. Table 2 illustrates the classification scores for each class;
0 indicates that a sentence is not in the summary, and 1 indicates that it is.
True Negative (TN): originally negative and predicted negative
True Positive (TP): originally positive and predicted positive
False Negative (FN): originally positive but predicted negative
False Positive (FP): originally negative but predicted positive
\text{Precision} = \frac{TP}{TP + FP}    (2)

\text{Recall} = \frac{TP}{TP + FN}    (3)

\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}    (4)
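Putting the reported settings together, a possible scikit-learn sketch of the six classifiers and their evaluation via a classification report is shown below; scikit-learn itself and the reading of "100 hidden layers" as a single 100-unit hidden layer are assumptions, not details confirmed by the paper.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Sketch of the six classifiers with the settings reported above.
models = {
    "knn": KNeighborsClassifier(n_neighbors=10, weights="uniform"),
    "random_forest": RandomForestClassifier(n_estimators=1000, criterion="gini"),
    "decision_tree": DecisionTreeClassifier(criterion="entropy"),
    "svm": SVC(kernel="rbf"),
    "mlp": MLPClassifier(hidden_layer_sizes=(100,), activation="relu", solver="adam"),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# X_train, y_train, X_test, y_test come from the TF-IDF features and 0/1 labels above.
# for name, model in models.items():
#     model.fit(X_train, y_train)
#     print(name)
#     print(classification_report(y_test, model.predict(X_test)))  # precision, recall, F1
```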
In this paper, six different types of classifiers were examined. The multilayer
perceptron classifier seems to perform better in terms of precision, recall and F1-score
for class 1. The multilayer perceptron has the advantage of hidden layers, which provide
some meta-information to the classifier, and the back-propagation mechanism keeps the
classifier more accurate. Tables 3, 4, 5, 6, 7 and 8 present the confusion matrix values
for all six classifiers. The confusion matrix gives a brief idea of how much of the data is
predicted correctly.
The receiver operating characteristic (ROC) [11] curve is widely used to analyze the
performance of classifiers. ROC is a curve of the true positive rate against the false
positive rate of a classifier, and the curve having more area under the curve (AUC) is
better than the others. In Fig. 2, the ROC curves of the different classifiers are given.
The multilayer perceptron classifier covers more area under the curve compared to the
other classifiers, while KNN, random forest, logistic regression and support vector machine
have the same AUC value.
In this paper, the authors considered the extractive text summarization problem as a
classification problem. A document is classified under a two-class problem: class 0
indicates that a sentence in the document is not considered a summary sentence, and
class 1 is the class of summary sentences. TF-IDF, keywords and sentence length are
taken into consideration for evaluation. K-nearest neighbor, random forest, support
vector machine, multilayer perceptron, decision tree and logistic regression classifiers
have been implemented on the Newsroom dataset. The experimental analysis of the
algorithms, with precision, recall, F1-score and confusion matrix of all the classifiers,
is given in Tables 2, 3, 4, 5, 6, 7 and 8. The confusion matrix is calculated over
the test data, which indicates the polarity of the data according to the classifiers. The
ROC curve is also plotted. All the analyses indicate that although the above classifiers do
not meet the goal with very high accuracy, the multilayer perceptron classifier
gives the best result among them. MLP has hidden layers and the back-propagation principle, which
enhance the quality of the classifier. In this work, MLP provides 72% accuracy.
This work can be extended in the future by modifying different parameters and
adding more features to the input vectors.
References
1. Luhn, H.P. 1958. The automatic creation of literature abstracts. IBM Journal of Research and
Development 2 (2): 159–165.
2. Gambhir, M., and V. Gupta. 2017. Recent automatic text summarization techniques: A survey.
Artificial Intelligence Review 47 (1): 1–66.
3. Meena, Y.K., and D. Gopalani. 2014. Analysis of sentence scoring methods for extractive auto-
matic text summarization. In Proceedings of the 2014 international conference on information
and communication technology for competitive strategies, November 2014, 53. ACM.
4. Pattanaik, A., S. Sagnika, M. Das, and B.S.P. Mishra. 2019. Extractive summary: An optimiza-
tion approach using bat algorithm. Ambient communications and computer systems, 175–186.
Singapore: Springer.
5. Joachims, T. 1998. Text categorization with support vector machines: Learning with many
relevant features. In European conference on machine learning, April 1998, 137–142. Springer,
Berlin, Heidelberg.
6. Nobata, C., S. Sekine, M. Murata, K. Uchimoto, M. Utiyama, and H. Isahara. 2001. Sentence
extraction system assembling multiple evidence. In NTCIR.
7. Jafari, M., J. Wang, Y. Qin, M. Gheisari, A.S. Shahabi, and X. Tao. 2016. Automatic text
summarization using fuzzy inference. In 22nd International conference on automation and
computing (ICAC), September 2016, 256–260. IEEE.
8. Matsuo, Y., and M. Ishizuka. 2004. Keyword extraction from a single document using word
co-occurrence statistical information. International Journal on Artificial Intelligence Tools 13
(01): 157–169.
9. Cornell Newsroom Dataset. 2017. https://summari.es.
10. Powers, D.M. 2011. Evaluation: From precision, recall and F-measure to ROC, informedness,
markedness and correlation.
11. Davis, J., and M. Goadrich. 2006. The relationship between precision-recall and ROC curves.
In Proceedings of the 23rd international conference on machine learning, June 2006, 233–240.
ACM.
Capabilities of Chatbots and Its
Performance Enhancements in Machine
Learning
1 Introduction
An interested customer wants to know more about a certain company's products. He will
call customer care or surf a few sites to get the information, which is very tedious, and
he may still not get all the information he is looking for. Another scenario is a situation
where an internal business person is trying to reach IT through some channel, for example,
by raising an incident/ticket for a technical issue that he might be facing. Since all this
is manual
i. Messenger (Client)
ii. API.AI (NLP)
iii. Java Service
iv. APIs from respective applications.
In the above technical architecture diagram, vAssist acts as a service registry. A service
registry simply works like a phonebook: it lets clients/end users search for different
services using their logical names. There are a number of service registries available
these days, of which Netflix's Eureka, HashiCorp Consul and Apache Zookeeper are the
more popular. Spring Cloud's DiscoveryClient abstraction offers a handy client-side API
to work with service registries. A major benefit of using a service registry is load
balancing on the client side, which allows clients to select from among the registered
instances of a service. Ribbon, by Netflix, is an efficient client-side load balancer.
Spring Cloud integrates Ribbon and applies it automatically at various layers of the
framework, be it the RestTemplate, declarative REST clients powered by Netflix's Feign,
or the Zuul microproxy.
Messenger (client):
Messenger can be any company wide messenger or a social messenger like Skype
for Business, Facebook, Slack, etc. [4, 5].
Components of Messenger (Skype for Business):
We are using Skype for Business as messenger, and below is the high-level
architecture of the same.
i. Skype server
ii. Application server
iii. Windows service (Fig. 2).
Skype server:
A trusted end point has to be created on Skype server, and it can reside on multiple
nodes.
Application server:
A trusted application has to be created and mounted on application server. A
trusted channel is to be established between end point and application to keep the
data in sync.
Windows service:
Windows service comprises UCMA and collaboration platform.
1. UCMA is a Skype plug-in to enable chatbot.
2. Collaboration platform is used to maintain the status of the application user as
online always.
Links given in references section can be used for Skype for Business implemen-
tation.
API.AI (Dialogflow.com)
API.AI is a platform to build interfaces that facilitate conversation between bots,
devices, and applications. It provides input and output methods and can react to
actionable data. An optional webhook can be implemented for connection to web
services; the web service can then call external APIs, perform business logic, or access
data stores [3, 4].
API.AI accepts input data in the form of a query. A query is either an event name or
text in natural language. API.AI converts the query text into actionable data and then
returns a JSON response object as output data. This process of transforming natural
language into actionable data is called natural language understanding (NLU).
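As a rough illustration of how a backend can consume such a JSON exchange, a minimal Flask webhook sketch is shown below; the field names ("result", "action", "parameters"), the action name and the response keys are assumptions about the API.AI request/response format and are not taken from this work.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Minimal webhook sketch: API.AI posts a JSON request after NLU. The field names
# used below and the action name are assumptions and may differ between
# API.AI/Dialogflow versions.
@app.route("/webhook", methods=["POST"])
def webhook():
    req = request.get_json(force=True)
    action = req.get("result", {}).get("action", "")
    params = req.get("result", {}).get("parameters", {})

    if action == "raise_ticket":                        # hypothetical action name
        reply = "Ticket raised for issue: {}".format(params.get("issue", "unknown"))
    else:
        reply = "Sorry, I did not understand that."

    # "speech"/"displayText" are also assumptions about the expected response shape.
    return jsonify({"speech": reply, "displayText": reply})

if __name__ == "__main__":
    app.run(port=5000)
```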
Terminology in API.AI (Dialogflow.com)
a. Intents:
An intent is a representation of a mapping between a user's statement and the action
to be taken by the software. A standard intent interface consists of the following
sections:
• User says
• Action
• Response
• Contexts.
b. Action:
An action refers to what the application will do after identifying the intent based
on user input.
c. Parameters:
Parameters can be identified as the input values extracted from the user text, for
the action.
d. Context:
Context is used to make system remember topic of conversation in order to link
multiple intents. Each intent can be linked with other using input and output contexts.
e. Agents:
An agent is a grouping of different contexts and intents. It allows you to set the language,
time zone, theme, and other settings for your bot.
f. Talent Manager (vAssist):
The Talent Manager talks to API.AI to get the output JSON, which is generated with
all the information related to the question the user has asked along with the intent, context
and parameters, and transforms it into a more readable format. It maintains a registry with
a list of Java Services for the various applications. After reading the intent, it calls the
appropriate Java Service according to the mapping for further action. It also
receives the output from the Java Service and passes it to the messenger as a response
to the user via API.AI. The Talent Manager setup can be skipped, and the messenger can
interact directly with API.AI to call the Java Service; however, having the repository
of Java Services in an internal layer ensures data integrity.
g. Java Service:
The Java Service determines which application to call based on the intent and
the inputs that are received from API.AI. Each application can have a different JS
and can be configured to be called accordingly. It reads the received JSON file and
parses the values to route it to the appropriate API. This also reads the output that
the application API sends and returns it to the messenger.
Application Layer:
Code residing on the application layer performs the task and returns a response
to the java service (Fig. 3).
These days, one is likely to hear of a new development in the technical field on a regular
basis. Most of this innovation revolves around data. It is an extensive field.
However, this framework is only one part of the model interface; there is also a model
planning part that is used to choose every action. The model should simply produce the
right word, i.e., y_t. The basic focus is on generating the best next word each time.
We can even generate the next word by giving a prefix. That said, nothing is perfect,
even in the development world. The approach has some limitations that force humans to
keep refreshing and tuning it. The basic issues in generative models are as mentioned below:
• It gives generic responses. For example, the likelihood of getting "yes", "no",
"okay", and so on is high at a broad level. However, there are people attempting to change
this. The fundamental remedy is to change the objective function of the model interface.
It is also essential to design an artificial metric that can be used as a reward.
• Another issue that can be seen is an inconsistent response with respect to a fixed
persona. The designers are attempting to upgrade the model with a richer encoding state
that conditions the answer and that can indeed use speaker embeddings.
After we are finished with the generative model, we now have to understand what a
selective neural conversational model is and in what manner it works. The selective
model works on clear inputs, for example, the answer, the context, and the word (w).
There are, at a very fundamental level, two towers, one dedicated to the "context" and
one to the "answer." These two have their own structures. The working is simple: each
tower takes in its input, and the computation is done in the vector space. The response
that fits best according to the context is the picked answer.
This all functions with a triplet loss over, for example, the context, reply_correct and
reply_wrong. The wrong reply is basically used as the negative sample, which is simply
some answer drawn from among the high-likelihood candidate answers. Through this, we
obtain a ranking of the candidates. The response with the best score will be picked as
the final answer.
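A minimal sketch of such a two-tower selective model with a triplet margin loss is given below, written with tf.keras; the vocabulary size, embedding size, margin and pooling choice are illustrative assumptions, not details from this work.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Minimal sketch of a selective (retrieval-based) two-tower model trained with a
# triplet loss over (context, reply_correct, reply_wrong). Vocabulary size,
# embedding size and margin are illustrative assumptions.
VOCAB, EMB, MARGIN = 20000, 128, 0.2

def tower(name):
    """One 'tower': embeds a padded token sequence into a single vector."""
    inp = layers.Input(shape=(None,), dtype="int32", name=name)
    x = layers.Embedding(VOCAB, EMB, mask_zero=True)(inp)
    return inp, layers.GlobalAveragePooling1D()(x)

ctx_in, ctx_vec = tower("context")
pos_in, pos_vec = tower("reply_correct")
neg_in, neg_vec = tower("reply_wrong")

pos_score = layers.Dot(axes=1)([ctx_vec, pos_vec])      # similarity in vector space
neg_score = layers.Dot(axes=1)([ctx_vec, neg_vec])

class TripletMargin(layers.Layer):
    """Hinge loss: the correct reply must outscore the wrong reply by MARGIN."""
    def call(self, scores):
        pos, neg = scores
        self.add_loss(tf.reduce_mean(tf.maximum(0.0, MARGIN - pos + neg)))
        return pos - neg                                # kept only as a nominal output

model = Model([ctx_in, pos_in, neg_in], TripletMargin()([pos_score, neg_score]))
model.compile(optimizer="adam")
# model.fit([context_ids, good_reply_ids, bad_reply_ids], epochs=5)
# At inference time, score every candidate reply against the context and pick the best.
```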
Intent- versus Flow-Based Chatbot Communication
Intent-Based Bot
An intent-based communication implies that the bot will understand client queries by
combining two information sources, state and context. State refers to assessing the chat
history, while context refers to analyzing inputs from external data points. For example,
if a client says "Book a doctor's appointment," the bot can take contextual data like
location and state data like chat history regarding possible illness conditions to recommend
suitable specialists.
Intent-based bots resolve client queries on a case-by-case basis. With each answered query,
the bot adapts to the user's behavior, and the degree of intelligence increases as these
bots get more data. Mainstream examples of intent-based bots are Google Assistant, Siri,
Amazon Alexa, and so forth [7].
Flow-Based Bot
A flow-based bot focuses on resolving client queries in such a way that it moves
them step by step toward an answer. The flow is clearly defined keeping in mind the
4 Benefits of Chatbots
5 Drawbacks of Chatbots
1. Chatbots are not suitable for every business. They are suitable for predictable
and stereotyped conversations, such as pizza delivery services, taxi bookings and
travel companies. But in Business-to-Business (B2B) companies, direct client
interaction is needed, and there chatbots do not work with 100% efficiency.
2. Chatbots have a limited audience.
3. Sometimes chatbots are confused by a user's request. As they are not human
beings, they cannot improvise with information when it is needed.
6 Conclusion
Artificial intelligence and machine learning are two often-confused terms. Artificial
intelligence is the study of training a machine to imitate or recreate human tasks. A
researcher can utilize different techniques to train a machine. At the beginning of the
AI era, software engineers wrote hardcoded programs, that is, they typed out every logical
possibility the machine could face and how to react. When a system grows complex, it
becomes hard to manage the rules. To overcome this problem, the machine can use data to
figure out how to handle all the circumstances of a given situation. The most important
requirement for good AI is to have enough data with extensive heterogeneity. For instance,
a machine can learn different languages as long as it has enough words to learn from.
References
Abstract The H-1B is a visa that allows US employers to employ foreign workers
in specialty occupations. The number of H-1B visa applicants is growing drastically.
Due to a heavy increment in the number of applications, the lottery system has been
introduced, since only a certain number of visas can be issued every year. But, before
a Labor Condition Application (LCA) enters the lottery pool, it has to be approved by
the US Department of Labor (DOL). The approval or denial of this visa application
depends on a number of factors such as salary, work location, full-time employment,
etc. The purpose of this research is to predict the outcome of an applicant’s H-1B
visa application using artificial neural networks and to compare the results with other
machine learning approaches.
1 Introduction
The H-1B is a work visa granted by the USA which allows an individual to temporar-
ily work at an employer in a specialty occupation. According to the Immigration Act
of 1990, a total of 65,000 foreign nationals may be issued a visa each fiscal year.
However, an additional 20,000 H-1Bs are available to foreign nationals holding a
master’s or a higher degree from US universities. An individual can stay in the USA
on an H-1B visa for a period of 3 years which is extendable up to six years. Although
the final outcome of the visa status (approved or denied) is based on the lottery
system, the LCA is scrutinized by the US DOL prior to entering the lottery pool.
A number of factors such as the salary, job location, and job designation are taken
into account by the DOL to determine the approval or denial of the application. The
application is forwarded to the lottery pool once it has been approved by the DOL.
If denied, the applicant has to reapply before his/her current H-1B visa term expires.
At present, there is no such model which predicts the outcome of H-1B visa appli-
cation. The applicant applies through a standard process and waits for the decision.
If the visa is denied, then the applicant has to reapply, which is in a way inefficient
for the applicant. It is therefore essential to have an automated tool that could give
insights to the applicant prior to applying.
Machine learning (ML)-based approaches are becoming popular for prediction
problems in different domains including health care, banking, security, computer
vision, and natural language processing. Artificial neural networks (ANN), which
are inspired by human brain, are a state-of-the-art ML technique to find patterns in
data and make predictions. Here, an automated system is proposed using ANN to
predict the outcome of H-1B visa application by taking into account multiple factors
such as salary, work location, and job designation. This will help the applicant to
determine his/her chances of approval. This model has been compared with other
ML algorithms to understand the overall efficiency.
2 Literature Review
A large number of changes are taking place while recruiting new employees due to
the restrictions imposed by H-1B visa. For instance, Monica [1] has presented the
changes in recruiting tracts because of the changes in H-1B policies. H-1B visa has
played a key role in hiring international talent. Dreher et al. [2] have shown how
the USA has managed to improve its post-secondary education by hiring top faculty
from overseas. This shows how important an H-1B visa is to the employers.
The approval of H-1B visa application is decided on several parameters. In the
current scenario, the salary parameter is extremely important due to some of the
changes, which have been proposed by the administration, and as a result, it is crucial
to understand the salary trends of the H-1B workers. Interesting salary versus case
status trends have been presented by Lin et al. [3]. Lin et al. have used decision tree
and K-means clustering to analyze the attributes of foreign workers such as salary
and work location. They have presented a model which shows a picture of different
approval rates compared with different conditions based on the data. It is evident
how salary changes from one state to another from the salary versus job location
trends presented by Renchi Liu et al. These trends can be useful not just from the
application point of view but also prior to finding a job. For example, a trend shows
that people working in California are earning quite higher than any other state. This
trend can be helpful for people who are looking for the highest paid job. Interesting
trends presented by Lin et al. [3] helped us in identifying the effect of parameters
across the dataset.
3 Problem
Given the attributes of an applicant as input, determine the likelihood of H-1B visa
approval.
3.2 Approach
The dataset was acquired through Kaggle [4]. The dataset has about 3 million entries.
However, we truncated this dataset to a size of 1 million for testing our prototype.
The name and description of each column are given below:
The dataset we have used had data for the year 2016 only. Therefore, we decided to
remove the YEAR attribute. We are trying to predict the likelihood of approval based
on the geographical area the individual works in. The exact location (longitude and
latitude) may adversely affect the generality of the algorithm. Therefore, we decided
to remove the same and keep WORKSITE only in the dataset. The resulting dataset
used for training and testing had the following attributes: EMPLOYER_NAME,
SOC_NAME, JOB_TITLE, FULL_TIME_POSITION, PREVAILING_WAGE,
WORKSITE, and CASE_STATUS.
The only numeric attribute in the dataset was PREVAILING WAGE. We used box-
plot to detect the outliers and removed the same using Tukey’s 1.5*IQR method
[5].
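A small pandas sketch of Tukey's 1.5*IQR rule applied to the wage column is shown below; the file name is a placeholder.

```python
import pandas as pd

# Tukey's 1.5*IQR rule on the prevailing wage; "h1b_2016.csv" is a placeholder.
df = pd.read_csv("h1b_2016.csv")
q1 = df["PREVAILING_WAGE"].quantile(0.25)
q3 = df["PREVAILING_WAGE"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df = df[df["PREVAILING_WAGE"].between(lower, upper)]    # drop rows outside the fences
```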
The original dataset had four possible values for the attribute CASE_STATUS:
CERTIFIED, DENIED, WITHDRAWN, and CERTIFIED-WITHDRAWN.
We removed the WITHDRAWN and CERTIFIED-WITHDRAWN rows from the
dataset because those outcomes depend solely on the applicant. In addition, the original
dataset was highly skewed, with the two classes present in very unequal numbers.
Therefore, we randomly selected rows from the dataset such that the resulting
dataset had an equal number of certified and denied rows to avoid bias.
Except PREVAILING WAGE, all other attributes were in textual format. To feed
them to the neural networks, it was necessary to convert them into numeric for-
mat. FULL_TIME_POSITION column had only two possible values: Y and N.
We replaced Y with 1 and N with 0. CASE_STATUS column also had only
two possible values after removing withdrawn cases in the previous step. So we
replaced CERTIFIED with 1 and DENIED with 0. The remaining four columns
EMPLOYER_NAME, SOC_NAME, JOB_TITLE, and WORKSITE were textual.
To get the numeric representation of these columns, we used one-hot encoding tech-
nique implemented in the Keras library. The pictorial representation of the one-hot
encoding is shown in Fig. 1.
Fig. 1 Attributes chosen as input are on the left side of the figure. Each of these attributes is
preprocessed separately. Some of them are one-hot encoded and others are kept as they were. Finally,
these are clubbed together to get the final attributes which will be fed to the model
The left side of the figure shows six columns. Four out
of the six columns are then one-hot encoded individually. All the resultant one-hot
encoded columns are merged to produce the final feature set of size 16,978.
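The preprocessing described above might look roughly as follows; pandas' get_dummies is used here as an equivalent stand-in for the Keras one-hot utilities mentioned in the text, and the file name is a placeholder.

```python
import pandas as pd

# Sketch of the encoding described above. The paper uses Keras utilities for
# one-hot encoding; pandas.get_dummies is shown here as an equivalent stand-in.
df = pd.read_csv("h1b_2016.csv")                        # placeholder file name

df["FULL_TIME_POSITION"] = df["FULL_TIME_POSITION"].map({"Y": 1, "N": 0})
df["CASE_STATUS"] = df["CASE_STATUS"].map({"CERTIFIED": 1, "DENIED": 0})

categorical = ["EMPLOYER_NAME", "SOC_NAME", "JOB_TITLE", "WORKSITE"]
features = pd.get_dummies(df[categorical])              # one indicator column per distinct value
features["FULL_TIME_POSITION"] = df["FULL_TIME_POSITION"]
features["PREVAILING_WAGE"] = df["PREVAILING_WAGE"]
labels = df["CASE_STATUS"]
```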
4.5 Normalization
After one-hot encoding, all attributes, except prevailing wage, had values 0 or 1.
Therefore, we normalized prevailing wage using min-max normalization to scale its
value in the range [0, 1]
y = \frac{x - \min\{X\}}{\max\{X\} - \min\{X\}}

where x is an attribute value, X is the array of all possible values for attribute x, and
y is the normalized value of attribute x.
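A tiny NumPy sketch of this min-max scaling (the example wage values are made up):

```python
import numpy as np

# Min-max normalization as in the formula above: scales an attribute into [0, 1].
def min_max_normalize(x):
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

print(min_max_normalize([40000, 65000, 120000]))        # -> [0.     0.3125 1.    ]
```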
5 Experiments
train. The fact that it outputs 0 for negative values leads to sparsity, which is often
desirable in representation learning. Because of the piecewise linearity, there is no
gradient vanishing effect [6].
We applied multi-layer perceptrons (MLPs) to our problem to predict the case
status of a visa application. The ANN architecture we used is shown in Fig. 3.
The input to the neural network is the one-hot encoded data generated after pre-
processing. The input layer is of size 16,978. It has four hidden layers having neurons
512, 128, 64, and 32, respectively, of ReLU type and an output layer containing single
neuron of sigmoid type. Visa status prediction being a binary classification problem,
sigmoid was the suitable choice for the output layer. The output neuron gives a
probability value of the case status. The output probability is then thresholded to get
the final output: if the probability is less than or equal to 0.5, the output becomes 0, which means
DENIED, and if the probability is greater than 0.5, it becomes 1, which means CERTIFIED.
Since this is a binary classification problem, we used the binary cross-entropy loss function.
Fig. 3 Architecture of the neural network used. The one-hot encoded input attributes are shown
on the left side of the figure. Hidden layers of 512, 128, 64, and 32 neurons are used followed by
an output layer consisting of single neuron which gives the prediction
For training the ANN model, we experimented with Stochastic Gradient Descent
and Adam [7] optimizers implemented in Keras library and chose Adam since it was
converging faster.
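A sketch of this architecture in Keras is shown below; the layer sizes, activations, loss and optimizer follow the description above, while the batch size and number of epochs in the commented call are assumptions.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Architecture as described: 16,978 inputs, four ReLU hidden layers
# (512, 128, 64, 32) and a single sigmoid output, with binary cross-entropy
# and the Adam optimizer. Epochs and batch size below are assumptions.
model = Sequential([
    Dense(512, activation="relu", input_shape=(16978,)),
    Dense(128, activation="relu"),
    Dense(64, activation="relu"),
    Dense(32, activation="relu"),
    Dense(1, activation="sigmoid"),                     # probability that CASE_STATUS = CERTIFIED
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, validation_split=0.2, epochs=20, batch_size=256)
# predictions = (model.predict(X_test) > 0.5).astype(int)   # 1 = CERTIFIED, 0 = DENIED
```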
Other ML Algorithms:
We compared the outcome of ANN with other standard ML algorithms like logistic
regression, Gaussian Naïve Bayes, and random forest and found the prediction from
ANN to be more accurate. The results are shown in Fig. 4 and Table 1.
The entire dataset was split into training and testing in the ratio 80:20. For ANN,
20% of the training dataset was separated and used as validation data during training.
The ANN model was trained on the training dataset for multiple epochs. After each
epoch, the validation dataset was fed to the model to validate if the model is being
trained properly or not. The training was stopped when the accuracy on the validation
dataset stopped improving considerably. The trained ANN model was then used on
the testing dataset for verification.
Fig. 5 ROC curve for ANN (on left) and for logistic regression (on right)
Fig. 6 Confusion matrix and F1 score for ANN (on left) and for logistic regression (on right)
5.2 Results
Figure 4 shows accuracies obtained for different algorithms. The ROC plots for the
logistic regression and ANN models are shown in Fig. 5. The area under the curve
(AUC) value is 0.9629 for ANN and 0.9611 for logistic regression model. Figure 6
shows the confusion matrix, precision–recall score, and F1 score. It is clear from
Fig. 6 that the false-positive value is 16 for ANN and 68 for logistic regression
model.
6 Future Scope
In this paper, we are just predicting whether the applicant’s H-1B visa application
will be approved or not. Also, we trained the model on a truncated dataset to test
our prototype. However, in future we will be using the entire dataset and will be
creating a recommendation system that will guide the applicant in the right direction.
The recommendation system will generate recommendations if the predicted outcome of the
H-1B visa for an applicant is DENIED and will suggest improvements to the applicant's
profile to increase the chances of H-1B visa approval.
References
1. Boyd, Monica. 2014. Recruiting high skill labour in North America: Policies, outcomes and
futures. International Migration 52 (3): 40–54.
2. Dreher, Axel, and Panu Poutvaara. 2005. Student flows and migration: An empirical analysis.
3. Jing-Lin. H-1B Visa Data Analysis and Prediction by using K-means Clustering and Decision
Tree Algorithms. [Online] Available: https://github.com/Jinglin-LI/H1B-Visa-Prediction-by-
Machine-Learning-Algorithm/blob/master/H1B%20Prediction%20Research%20Report.pdf.
4. H-1B Visa Petitions 2011–2016—Kaggle. [Online]. Available: https://www.kaggle.com/
nsharan/h-1b-visa/data. Accessed October 20, 2017.
5. Seo, Songwon. 2006. A review and comparison of methods for detecting outliers in univariate
data sets. Master’s Thesis, University of Pittsburgh (Unpublished).
6. Glorot, Xavier, Antoine Bordes and Y. Bengio. 2010. Deep sparse rectifier neural networks.
Journal of Machine Learning Research 15.
7. Kingma, Diederik P., and Jimmy Ba. Adam: A method for stochastic optimization.
Data Leakage Detection and Prevention:
Review and Research Directions
1 Introduction
In the present digital era, organizations are becoming more and more data-driven.
Data and information are one of the most important assets for any data-driven enter-
prise. Data-driven decision making helps organizations manage the business oper-
ations well while keeping the customers and other stakeholders satisfied. Effective
management practices in terms of data capture, processing, storage, usage and pro-
tection of data are indispensable for such organizations. In particular, the protection
of data from loss and leakage is paramount to organizational policies and practices
in order to stay ahead of business competition.
Data leakage is defined as the accidental or intentional distribution and disclosure
of confidential or sensitive data to an unauthorized party. It may be caused by internal
or external entities to the organization. Sensitive data of an organization may include
financial data, corporate strategy information, trade secrets, intellectual property,
data about future projects, personal data of customers and employees such as patient
records, credit card data, biometric data and many such data depending upon the
business and industry. Similarly, sensitive data for a government may involve data
about internal security and law enforcement, military and defense secrets, relation-
ships and transactions with political parties, confidential diplomatic engagements,
etc.
While distribution and sharing of data are a necessary requirement for business
operations, leakage of sensitive or confidential data results in serious consequences such
as heavy financial loss, damage to reputation and credibility, regulatory penalties,
decrease of company share price and the like. According to the Data Breach QuickView
Report [1], the year 2015 reported an all-time high of 3930 data breach incidents exposing
736 million records. The business sector accounted for the highest share of incidents at
47.2%, followed by education (13.9%), government (12.2%) and medical (6.8%). All other
sectors combined made up only 19.9% of the reported incidents. Sev-
eral other reports and studies reveal that incidence of data leakage is frequent and
significant in several organizations [2]. As per IBM Security’s 2019 Data Breach
Report [3], the global average cost of a data breach is estimated at USD 3.92 million,
and the most expensive sector, health care, accounts for USD 6.45 million. India
accounts for USD 1.8 million as the total average cost of a data breach, with the highest
cost in the industrial sector estimated at USD 2.7 million.
Data leakage is a serious and growing security concern for every organization
which creates a pressing need for leakage detection and prevention of sensitive data
and information. Consequently, it drives ample research attention toward the development
of effective solutions from both academia and industry. Although a plethora of
research proposals and data leakage prevention systems are available in the literature,
there is a pressing need to find an effective approach to this problem of security and
privacy of sensitive data and information [4, 5]. Thus, it remains an active field of
research.
This survey paper provides a comprehensive understanding of the field of study
and presents state-of-the-art approaches for data leakage detection and prevention
problem. The paper is structured as follows. Section 2 discusses on phases of a data
leakage detection and prevention system and presents a taxonomy to characterize
data leakage detection and prevention systems. Section 3 provides a review of select
research proposals in the field of study. Section 4 lists an array of research challenges
and provides pointers for future research. Section 5 concludes the paper.
Although a number of data leakage prevention solutions are available from software
vendors, there is less clarity on the exact definition of a data leakage prevention
system. While there is no commonly agreed definition of what exactly a data leak-
age detection and prevention (DLDP) system should be, an attempt is made here
to avoid the ambiguity. It provides an understanding of the DLDP system and its
characterization.
Data leakage detection and prevention systems aim at identifying sensitive data and
information, monitoring its usages and movement inside and out of the organization
and taking action to prevent unintentional or deliberate disclosure of it. As shown in
Fig. 1, identifying the sensitive data is the first important step in the DLDP systems.
What to be included and what not to be included within the scope of sensitive data
are a very crucial decision to be taken by the organization. The DLDP system must
be configured correctly; otherwise, the system may result in false negative, and the
leakage of potentially sensitive data will go unnoticed. The sensitive data must be
kept in a readable digital format so that the DLDP system can monitor its usage and
movement through multiple channels such as email, instant messaging, file transfer,
access using HTTP, blogs and copying to USB. While it is rather challenging to
develop a system to monitor all possible channels of data leakage, most systems focus
on critical channels such as email, instant messaging, file transfer, HTTP and USB
copy. Several techniques such as content matching, image recognition, fingerprinting,
and statistical analysis can be used by DLDP systems to detect sensitive data leakage
during channel monitoring. Once the leakage is detected, the system may perform
one or more of the following actions. It may simply log the incident for later analysis
and investigation, notify the risky behavior of the individual involved in the leakage
to designated authorities and block the data being transmitted.
DLDP systems can be characterized mainly based on which data state they handle,
where they are deployed and what approach they employ to detect leakage. The
classification is shown in Fig. 2. Since techniques for monitoring data are different
for different data states, DLDP systems may be characterized based on three states
of data such as Data-at-Rest, Data-in-Motion and Data-in-Use. Data-at-Rest can be
defined as data in storage such as in file systems, databases, servers, desktop and
laptops. Data-at-Rest can be local or remote to the DLDP system. A DLDP system
may monitor Data-at-Rest at regular intervals or on demand. It checks if the sensitive
data is encrypted or not. It checks if the access control policies are violated or not.
Data-in-Motion can be defined as data that is passing through a network such as the
Internet or local area network. A DLDP system usually monitors the network traffic
by inspecting data packets to identify sensitive data traveling over the network using
a variety of protocols such as SMTP, FTP and HTTP. Data-in-Use can be defined as data
that is being processed on endpoint devices. A DLDP system monitors the sensitive
data and checks if it is being transmitted from one endpoint to another. Usually,
the system monitors operations such as copy and paste between applications, screen
capture, print, download to portable storage device such as USB drives and CD/DVD.
DLDP systems can be characterized based on their deployment location. A DLDP
system can be deployed directly on endpoint devices or on network level. The DLDP
system that is deployed on an endpoint device monitors and controls access to sensi-
tive data, while a central server is responsible for policy creation and management,
policy distribution and violation management, system administration and creation
logs for incidents. On the other hand, a DLDP system deployed at network level
inspects traffic based on leakage detection policy at multiple monitoring points in the
network and blocks suspicious packets. DLDP systems can be further characterized
based on the approaches employed to detect leakage of data. They can be context-
based, content-based and hybrid. In a context-based approach, a DLDP system ana-
lyzes contextual metadata such as source, destination, sender, recipients, header, file
type and format. A content-based system analyzes the data content using several
techniques such as natural language processing, statistical analysis, fingerprint and
keyword matching. A DLDP system using both content-based and context-based
analysis is a hybrid one.
3 Literature Review
Several research efforts can be found in the academic literature addressing
the data leakage problem. Most of the works aim to detect and prevent
data leakage considering different states of data. These proposals mostly used either
content analysis or context analysis for leakage detection. A few of them used the
hybrid approach. The analysis techniques used in the works differ from one to another.
This paper considers select research works carried out in the past decade based on
their relevance. These are summarized below and their characterization is shown in
Table 1.
Zilberman et al. [6] proposed an approach to detect email leakage based on topic
identification and sender-recipient analysis in a group communication. A person
in an organization belongs to several topic groups. The recipient of an email is
classified as legal or illegal based on his/her association with the topic of the email
being exchanged. The email content is analyzed to establish its topic using k-means
clustering. The proposal was evaluated on a standard data set and provides 85%
accuracy. However, it suffers from a serious problem when the trained model does
not have any information about a recipient.
Shapira et al. [7] proposed a content-based approach that builds on well-known
fingerprinting methods. The proposal uses a k-skip n-gram technique and extracts
fingerprints from confidential contents while skipping nonconfidential contents of a
document, addressing a shortcoming of traditional fingerprinting methods. It also accounts
for rephrased content and makes it possible to detect intentional leakage incidents.
While it outperforms traditional fingerprinting methods, it requires extensive data
indexing when it comes to implementation.
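As a rough illustration of the idea (not the authors' exact algorithm), a minimal skip-gram fingerprinting sketch is shown below; the hash function, the values of n and k and the example strings are all assumptions.

```python
import hashlib
from itertools import combinations

# Illustration of skip-gram fingerprinting: n-grams are extracted while allowing
# up to k skipped tokens, hashed, and the overlap between fingerprint sets
# signals potential leakage.
def skip_ngrams(tokens, n=3, k=1):
    """Yield n-grams that may skip up to k tokens inside the window."""
    window = n + k
    for i in range(len(tokens) - n + 1):
        tail = range(i + 1, min(i + window, len(tokens)))
        for combo in combinations(tail, n - 1):
            yield (tokens[i],) + tuple(tokens[j] for j in combo)

def fingerprints(text, n=3, k=1):
    tokens = text.lower().split()
    return {hashlib.md5(" ".join(g).encode()).hexdigest() for g in skip_ngrams(tokens, n, k)}

confidential = fingerprints("quarterly revenue forecast for project falcon")
outgoing = fingerprints("the revenue forecast for falcon is attached")
print(len(confidential & outgoing))                     # non-zero overlap -> suspicious
```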
Shu et al. [8] proposed fuzzy fingerprint technique to enhance data privacy during
leakage detection. The approach enables the data owner to safely delegate the content
inspection task to a leak detection provider without disclosing the content. Although
the network-based approach is efficient and provides satisfactory results under var-
ious data-leak scenarios, it suffers from computational complexity and realization
difficulty.
Costante et al. [9] proposed a hybrid approach that combines signature-based and
anomaly-based techniques to detect and prevent data leakage. It identifies insider
threats by monitoring the activities of users and detecting anomalous behavior. Once
the malicious behavior is detected, it is flagged up and the transaction is blocked. An
attack signature is created and recorded to prevent such type of activities in the future.
It uses a rule base which is updated automatically when a new anomaly is detected.
While it attempts to combine both detection and prevention of leakage attacks using
a rule-based technique, the system is flooded with false-positive alerts in the initial
stage of system operation.
Papadimitriou et al. [10] studied guilty agent identification. The authors proposed
data allocation strategies using which a distributer discloses sensitive data to a set of
supposedly trusted agents. The distributer then assesses the likelihood that an agent
is a guilty agent and responsible for the leakage when data is leaked. The approach
is robust in case the released data is altered. However, the study does not capture
adequate leakage scenarios.
Alneyadi et al. [11] proposed a hybrid method to detect potential data leakage
in email communications of an organization. The method first performs context
analysis using five contextual parameters in an email to measure RAI. The mail that
scores high RAI is subjected to semantic analysis of its content in order to detect
data leakage. The proposal showed encouraging results in detecting data leakage
in emails. However, it suffers from implementation issues and consequently poor
performance when it is not possible to capture all five contextual parameters in an
organization.
Shu et al. [12] proposed a content-based approach that detects data leakage in
transformed content. When the leaked data is modified, it becomes difficult to detect
using usual n-gram technique. The problem is addressed using an alignment method
in which a pair of algorithms, one sampling algorithm and another alignment algo-
rithm, is used to compute a similarity score between the sensitive data sequence and
the content sequence under inspection. The approach is efficient and results in high
specificity; i.e., the percentage of true positives is much higher than that of false positives.
However, it suffers from computational complexity.
Katz et al. [13] proposed a context-based model called CoBAn to detect data leak-
age in a controlled communication channel of an organization. During training phase,
the model considers both types of documents, confidential and nonconfidential, and
identifies various subjects dealt with in these documents. Using k-means clustering, it
develops clusters of documents representing subject or context approximations. The
model generates a context-based confidential term graph for each cluster. In detection
phase, documents are analyzed and matched with one or more term graphs to estimate
their confidentiality score. A document is detected confidential if its score crosses a
predefined threshold. Although the model is able to find confidential data hidden in a
document, the time complexity of the model is very high. Further, the model results
in high rate of false positive which may not be acceptable in a real-world system.
Gomez-Hidalgo et al. [14] proposed a context-based approach that uses named
entity recognition (NER) technique to detect data leakage of individuals and com-
panies. Using Twitter data, experiments were conducted on a developed prototype
which demonstrated an encouraging accuracy level. However, the attributes considered
in the NER technique are mostly homogeneous.
Trieu et al. [15] proposed a method that used semantic and content analysis of
documents to detect sensitive data leakage. The model uses document embedding
to generate vector representation of a document or a text fragment. This vector
representation is evaluated using a sensitivity corpus to find the sensitivity score
of the document or text fragment. Experimental results show very high detection
accuracy.
Lu et al. [16] proposed an approach for collaborative data leakage detection over
distributed big data sets. The approach performs privacy-preserving collaborative
training on each owner's data, which eventually trains and optimizes a global model.
It uses a graph-masking technique on the local weighted graphs representing local
data and develops a trained template graph that represents the global graph space. Then,
the document to be tested is converted to a graph and matched with the trained template
graph to compute its sensitivity score. Although the approach can handle leakage
detection efficiently, its computational complexity remains high.
Chhabra et al. [17] studied data leakage detection in MapReduce computation in a
cloud computing environment. They applied the s-max algorithm to the reduced data in
order to identify the guilty agent when any data leakage happens. They conducted a
simulation for parallel processing of weather forecasting data using the Hadoop
framework and a cloud analyst tool. However, the probability of identifying the guilty
agent is not very significant, and it decreases as the number of agents increases.
DLDP systems face several challenges while preventing leakage of sensitive data [5].
One of them is the abundance of leakage channels, such as email, social media, USB,
printers, fax, smartphones and laptops. It is difficult to manage and secure all of these
channels. It is also very complex to model all possible activities of a person with
sensitive data, as well as all leakage channels, while developing a DLDP system. Another
major challenge is transformed data, which is very hard to detect: when sensitive data
is modified, its identity and patterns change, making leakage detection very challenging.
The advent of big data and cloud computing has amplified the challenges of a
DLDP system [18]. The system should be scalable enough to process massive data using
parallel processing in a distributed environment. Real-time leakage detection is a
requirement but a huge challenge for DLDP systems dealing with big data.
Anonymization and privacy preservation in big data make it even harder to protect
sensitive information [19]. Since data is kept in a remote location away from the data
owner in a cloud environment, security and privacy remain major concerns, which
add to the leakage detection problem. The multi-tenancy model in cloud computing
poses threats of data leakage due to vulnerabilities in inter-tenant isolation [20].
Successful DLDP systems must be able to deal with the aforementioned challenges in
order to address the growing concern of data leakage.
Additionally, there exist several areas with research opportunities which require
further effort from the research community. Deep learning has been successfully
applied in various domains. The efficacy of deep learning can be exploited in both
context and content analysis to detect data leakage and identify insider threats
with higher accuracy while achieving timely protection of sensitive data. In particular,
deep learning-based leakage detection may be investigated for transformed data,
wherein sensitive information is hidden in the exposed content. Cloud computing
offers a new avenue for DLDP systems: leakage detection and prevention can be
offered as Software-as-a-Service (SaaS). Consequently, privacy preservation of the
sensitive data becomes a major concern if a DLDP system is offered as SaaS. Further,
data leakage detection in MapReduce computation is a key research direction in
cloud computing.
5 Conclusions
In particular, the new challenges posed by big data and cloud computing invite
further investigation. Nevertheless, organizational policies play a very effective role
in curbing the menace of data leakage. Organizations should have concise policies to
identify their critical data and its handling, since successful DLDP solutions heavily
rely on classified information. Organizations should have policies to monitor access
to and activities on sensitive data at the network level as well as on various endpoints.
Additionally, a bare-minimum policy of encrypting sensitive data should be universally
implemented across the organization.
References
1. Data breach quick view. 2015. Data breach trends. Available at https://www.riskbasedsecurity.
com/2015-data-breach-quickview/. Accessed on 5 Sept 2019.
2. Data leakage news. Available at https://infowatch.com/analytics/leaks_monitoring. Accessed
on 5 Sept 2019.
3. IBM security’s cost of a data breach report 2019. Available at https://www.ibm.com/security/
data-breach. Accessed on 5 Sept 2019.
4. Shabtai, Asaf, Yuval Elovici, and Lior Rokach. 2012. A survey of data leakage detection and
prevention solutions, 1st ed. New York: Springer. https://doi.org/10.1007/978-1-4614-2053-8.
5. Alneyadi, S., E. Sithirasenan, and V. Muthukkumarasamy. 2016. A survey on data leakage
prevention systems. Journal of Network and Computer Applications 62: 137–152.
6. Zilberman, P., S. Dolev, G. Katz, Y. Elovici, and A. Shabtai. 2011. Analyzing group com-
munication for preventing data leakage via email. In Proceedings of 2011 IEEE international
conference on intelligence and security informatics, 37-41, 10–12 July 2011. Beijing, China:
IEEE.
7. Shapira, Y., B. Shapira, and A. Shabtai. 2013. Content-based data leakage detection using
extended fingerprinting. arXiv preprint arXiv:1302.2028.
8. Shu, Xiaokui, Danfeng Yao, and Elisa Bertino. 2015. Privacy-preserving detection of sensitive
data exposure. IEEE Transactions on Information Forensics and Security 10 (5): 1092–1103.
9. Costante, E., D. Fauri, S. Etalle, J.D. Hartog, and N. Zannone. 2016. A hybrid framework
for data loss prevention and detection. In Proceedings of 2016 IEEE security and privacy
workshops, 324–333. IEEE Computer Society.
10. Papadimitriou, P., and H. Garcia-Molina. 2011. Data leakage detection. IEEE Transactions on
Knowledge and Data Engineering 23 (1): 51–63.
11. Alneyadi, S., E. Sithirasenan, and V. Muthukkumarasamy. 2016. Discovery of potential data
leaks in email communications. In Proceedings of the 10th international conference on signal
processing and communication systems (ICSPCS), 1–10. Gold Coast, Australia: IEEE.
12. Shu, Xiaokui, Jing Zhang, Danfeng Daphne Yao, and Wu-chun Feng. 2016. Fast detection of
transformed data leaks. IEEE Transactions on Information Forensics and Security 11 (3): 1–16.
13. Katz, G., Y. Elovici, and B. Shapira. 2014. CoBAn: A context based model for data leakage
prevention. Information Sciences 262: 137–158.
14. Gomez-Hidalgo, J.M., J.M. Martin-Abreu, J. Nieves, I. Santos, F. Brezo, and P.G. Bringas.
2010. Data leak prevention through named entity recognition. In Proceedings of IEEE 2nd
international conference on social computing, 29–34, Minneapolis, USA.
15. Trieu, Lap Q., Trung-Nguyen Tran, Mai-Khiem Tran, and Minh-Triet Tran. 2010. Document
sensitivity classification for data leakage prevention with twitter-based document embedding
and query expansion. In Proceedings of 13th International Conference on Computational
Intelligence and Security, 537–543, 15–18 Dec 2017. Hong Kong, China: IEEE.
16. Lu, Yunlong, Xiaohong Huang, Dandan Li, and Yan Zhang. 2018. Collaborative graph-based
mechanism for distributed big data leakage prevention. In 2018 IEEE Global Communications
Conference (GLOBECOM), 9–13, Abu Dhabi, UAE.
17. Chhabra, S., and A.K. Singh. 2016. Dynamic data leakage detection model based approach for
MapReduce computational security in cloud. In Proceedings of fifth international conference
on eco-friendly computing and communication systems (ICECCS-2016), 13–19. IEEE.
18. Cheng, L., F. Liu, and D. Yao. 2017. Enterprise data breach: causes, challenges, prevention,
and future directions. WIREs Data Mining and Knowledge Discovery 7: e1211. https://doi.org/
10.1002/widm.1211. Wiley & Sons, pp. 1–14.
19. Basso, T., R. Matsunaga, R. Moraes, and N. Antunes. 2016. Challenges on anonymity, privacy
and big data. In Proceedings of seventh Latin-American symposium on dependable computing,
164–171, Cali, Colombia, 19–21 October. IEEE Computer Society.
20. Priebe, C., D. Muthukumaran, D. O’Keeffe, D. Eyers, B. Shand, R. Kapitza, and P. Pietzuch.
2014. CloudSafetyNet: detecting data leakage between cloud tenants. In Proceedings of the
6th edition of the ACM workshop on cloud computing security, 117–128, Scottsdale, Arizona,
USA, November 7–7, 2014. ACM.
21. Data Leakage Prevention (DLP)-ISF Briefing Paper. Available at https://www.securityforum.
org/research/data-leakage-prevention-briefing-paper/. Accessed on 5 Sept 2019.
A Blockchain Based Model to Eliminate
Drug Counterfeiting
Abstract The drug counterfeiting problem has become global and so huge that it has
drawn significant attention from all quarters. Surveys estimate that the fake drug
industry is worth $10 billion per year, and the WHO estimates that about a million
people die each year due to fake drugs. The distribution of fake drugs is a crucial issue.
One of the main reasons behind drug counterfeiting is an imperfect supply chain: there
are many loopholes in our drug supply chain. In the present supply chain scenario,
either no information is shared between the parties during the hand-off process, or
only little or irrelevant information is shared, which has led to counterfeiting.
Counterfeit drugs not only affect the health of patients but also result in financial
loss for genuine manufacturers. In this paper, the details of drug counterfeiting have
been explained along with its impact. Also, various measures to fight counterfeits
using blockchain technology have been discussed. Using blockchain technology,
traceability, visibility and security can be incorporated into the drug supply
chain. The proposed system will track drugs from their origin, the manufacturer, to
the end, the consumer.
M. Sahoo · S. S. Singhar
Department of Computer Science, Utkal University, Bhubaneswar, India
e-mail: monalisasahoo.ln@gmail.com
S. S. Singhar
e-mail: sunilsinghar@gmail.com
S. S. Sahoo (B)
DDCE, Utkal University, Bhubaneswar, India
e-mail: sony28788@gmail.com
1 Introduction
Imagine a situation where someone dies because of fake medicine. This is not a distant
possibility; it has become a reality for most developing countries and even for some
developed countries. Fake medicines have turned into a multi-billion-dollar problem
at the global level. The shape, size and color of the pharmaceuticals, and even the
packaging, look exactly like the original. These bogus products may contain small
amounts of the active ingredients, sometimes none at all, or, even worse, fatal
ingredients. It is upsetting, but the fact is that India is the epicenter of the manufacturing
of fake medicines. It is beyond doubt that fake medicines are an increasing threat to
consumers and the pharmacy industry. So, real-time visibility of drug production
and management is necessary. Blockchain is the answer for dealing with counterfeit
drugs. It can be accessed freely because the transactions are stored in a digital ledger
format without compromising the security and privacy of the users [1].
The blockchain concept came from the Bitcoin cryptocurrency [2], where no
third party is required for a transaction. In a blockchain, transactions are carried out in
a distributed peer-to-peer system. It has been a common thought that blockchain can
only be used for the financial sector, as it is based on the idea of Bitcoin. Only recently
has the true potential of blockchain been realized by the research community.
This decentralized technology has many useful applications [3], such as health care,
logistics, the Internet of Things (IoT), reputation systems, public services and supply chains,
which are beyond the financial sector. If only the health care sector [4] is considered,
it may further be divided into many parts, like medical record management, health
care data access control, clinical trials, drug supply management, insurance claim
adjudication, etc. (Fig. 1).
The remainder of the paper is organized as follows. Section 2 provides a detailed
background study. Section 3 explains fake drugs and how they enter the
supply chain. Sections 3.1 and 3.2 list the challenges posed by fake drugs and
by the drug supply chain, respectively. Section 4 describes the blockchain solution, and
Sect. 5 demonstrates the proposed solution. Finally, Sect. 6 concludes this research
work.
2 Background
market, down from 70% thirty years ago. Exports of pharmaceutical products from
India have also risen, from US$6.23 billion in 2006–07 to US$8.7 billion
in 2008–09. India sold US$11.7 billion worth of pharmaceutical products in
2014. Pharmaceutical exports from India stood at US$17.27 billion in 2017–18
and are estimated to increase by 30% to reach US$20 billion by the year 2020
(Table 1).
A counterfeit drug is one which is composed of the wrong active ingredients, the
proper active constituents at the wrong dose, or none of the prescribed active
ingredients. Counterfeit drugs have found a place in the mainstream drug supply chain.
More than one third of the spurious drugs produced globally originate from
India, which erodes the reputation of our country in the international market. How do
fake drugs capture such a big share of the market? There are multiple points
of contact which are neither secure nor verified, any of which could be
an entry point for counterfeit drugs.
A number of probable reasons are listed below:
– The global drug supply chain is so complex that it is very difficult to
decide on the drugs or drug ingredients.
– Fragmented supply chain. (Before arriving at their final destination, the drugs
sometimes pass through various trading companies that usually do not check for
quality.)
– One or several members of the supply chain may be corrupt, adulterating,
replacing or mislabeling products in the supply chain.
– Uncertified sources of raw material. (A trusted manufacturer might be unaware that
the ingredients came from an unverified source.)
– Fake drug manufacturers or fake drug brands successfully pose as legitimate
pharmaceutical companies and reputed drug brands with the help of false
paperwork.
– Lack of consumer awareness and the high price of genuine medicine.
– Weak enforcement of legislation against corruption and the advanced methods used
in counterfeiting.
In the drug supply chain system, many challenges have to be faced to overcome the
supply of fake drugs, such as:
– Loss of Business Possibilities: The existence of counterfeit drugs can cause loss
of market share and business possibilities for producers of authentic pharmaceutical
products. It has been estimated that fake drugs cause a loss of 46 billion dollars
yearly to pharmaceutical industries globally.
– Affecting the Authorization of Generics: It is estimated that 90% of the value of
India's medicine market is accounted for by generic trademarks. In order to lessen
health care overheads, many governments are recommending the use of low-cost
generic medications, but the presence of counterfeits is serving as a barrier.
– Accumulating Financial and Societal Difficulties: The consumption of spurious
drugs has caused a significant rise in the expense of the health care system due to
the requirement for further interventions for undesired reactions and/or advanced
disease progression. This is a particular issue for developing countries like India,
where drug spending is already high at almost 70% of income, and affordability
levels are low.
– Resourcing: To deal with the problem of counterfeits, the Government of India
has employed various anti-counterfeiting programs, but with inadequate effect,
largely because India's Central Drugs Standard Control Organization, the country's
drug regulator, had only 323 employees in 2014, about two percent of the size
of the FDA of the USA. This under-resourcing is likely to influence the success of any
subsequent approaches.
With multiple entry points, so many loopholes, and the absence of a transparent and
robust security system, tracking the drug supply chain has become a herculean task.
The hindrances to the present supply chain system are as follows:
– Absence of a unified and interoperable labeling and identification standard.
– Siloed, fractured and opaque supply chain infrastructure. There is no easy and
clear way to track the product's journey through the supply chain that can unveil
its real origin and hand-off points.
– Improper monitoring and management of the cold chain supply.
– Hand-offs between various parties, including packagers and transport agencies, as
the drugs are transferred between multiple stages, which could be possible
points of entry for spurious drugs.
– Different systems have been adopted by different manufacturers. This leads to a
compatibility problem, where distributors and transport agencies have to keep different
types of solutions along their supply chain systems, which can cause confusion
in the information or delivery of the drugs and, eventually, risk to the consumer.
In this context [6], the issue of counterfeit medicine is an increasingly acute worldwide
problem that demands immediate attention. With $10 billion lost annually in the war
against counterfeit drugs and patient safety crumbling, a chain-of-custody log
that blockchain could potentially implement holds major promise. This type of
blockchain [7] system could ensure a standard log of tracking at each step in the
supply chain, at the specific drug or product classification level.
Blockchain technology [8] allows large amounts of correlated data to be
distributed but not copied. Although these types of digitally distributed ledgers have
been around for years, blockchain uses new technology, such as specialized algorithms,
to validate and authenticate transactions within a decentralized architecture while
progressing in real time through protected layers of digital encryption.
The decentralized framework of blockchain [9], together with a distributed digital
ledger that records and transfers data in a secure, transparent and fast manner, may be
put to use here. A drug supply chain based on blockchain depends on a trusted network
equipped with blockchain and IoT. The required entities or nodes of the blockchain
network are the manufacturer, the distributors, the transport agencies and the pharmacists
or hospitals. A supply chain is to be created including all the stakeholders, such as
suppliers, vendors, distributors, pharmacies and hospitals. Drug bottles are to be
labeled with serial numbers and the unique fingerprint of the manufacturer. The drug
bottles are packaged for transportation, and the packages must also contain the serial
numbers of the drugs inside. The serial numbers are to be scanned at every point
of the journey from the manufacturer to the consumer, and the information is recorded in
the blockchain. The whole journey of the drug is thus made visible to all entities in
the blockchain [10]. The manufacturing inputs, like chemical ingredients and other
parameters, can be updated and linked to the serial number of the products. So, when the
consumer finally buys the drug, by just scanning the label with a smartphone, he or she
will be able to know everything about the drug.
5 Proposed Model
Basically, we can divide the whole drug supply chain from the manufacturers to the
consumers into six parties.
1. Manufacturer
2. Distributor
3. Transporter
4. Pharmacy
5. Hospital
6. Consumer.
The model can be described as follows. The manufacturer sends the drugs to the
distributor or to the transport agency according to their requirements. The distributor
can collect drugs directly from the manufacturer or receive them through a transport agency.
The distributor sends drugs to hospitals and pharmacies, again through the transport
agencies. If a hospital's requirement is very large, it can order directly from the
manufacturer through a transport agency, or it can collect drugs from the distributor
directly. Smaller hospitals can order drugs from their nearby pharmacies. Finally,
the consumer or patient will get the drugs either from the hospital or from the
pharmacy store.
Excluding the last party (the consumer), a blockchain framework can be created
among the first five parties. An obvious question is why the last party, the
consumer, has to be kept out of the blockchain. This is because every node in
the blockchain framework participates in the transaction verification process. The
consumer community is very diverse, from illiterate to highly educated, from high school
kids to senior citizens, from people who live in remote areas to people who
live in cities, so they cannot all participate in the transaction verification procedure.
Moreover, if every consumer were added to the blockchain framework, the system
would be heavily loaded, as there are billions of consumers in our nation. Managing
such huge data would be expensive, and the complexity would also increase. However,
there is a platform where every consumer can check the authenticity of the drug,
which is discussed later in this section.
As our traditional system contains some loopholes, a decentralized blockchain
framework is to be created with a shared distributed ledger among the five entities.
Each entity should be verified before being added to the blockchain framework
(Fig. 2).
First, all drug manufacturers must be certified by the Central Drugs Standard
Control Organization (CDSCO), India. Each manufacturer should have a unique id.
The drug manufacturer should be a trusted entity; it has to be made sure that all
registered manufacturers produce genuine drugs, and the Government of India must
take responsibility for their genuineness. Once a drug is produced, it has to be
labeled with the unique id of the manufacturer, the drug name, the composition of
the drug and other details. The drug information is added to the shared ledger
with all the details of the drug. Now, the manufacturer will transfer the drugs to the
distributor.
All the distributors registered in the blockchain should also be assigned a
unique id. During the hand-off from manufacturer to distributor, both parties
should digitally sign in the distributed ledger using their private keys, and the
transaction is added to the block. All the entities in the blockchain should verify the
transaction before another transaction is added, so that no one can deny or tamper
with this transaction in the future. Now the only objective of the distributor is to
distribute the drugs according to the requirements of hospitals and pharmacy stores
through the transport agencies (Table 2).
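The hand-off record described above can be pictured as a minimal hash-chained ledger entry that both parties endorse. The sketch below is only an illustration of the idea, not a specification of the proposed system: real digital signatures with the parties' private keys are abstracted away, and all field names are illustrative.

# Minimal hash-chained ledger of drug hand-offs (illustrative sketch).
# Real signatures with the parties' private keys are abstracted away.
import hashlib, json, time

class Ledger:
    def __init__(self):
        self.chain = []

    def _hash(self, record):
        return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

    def add_handoff(self, drug_serial, sender_id, receiver_id):
        record = {
            "drug_serial": drug_serial,
            "sender": sender_id,       # would be signed by the sender's key
            "receiver": receiver_id,   # would be signed by the receiver's key
            "timestamp": time.time(),
            "prev_hash": self.chain[-1]["hash"] if self.chain else "0" * 64,
        }
        record["hash"] = self._hash(record)
        self.chain.append(record)
        return record

    def verify(self):
        """Every node would re-run this check before accepting a new block."""
        for i, rec in enumerate(self.chain):
            body = {k: v for k, v in rec.items() if k != "hash"}
            if rec["hash"] != self._hash(body):
                return False
            if i and rec["prev_hash"] != self.chain[i - 1]["hash"]:
                return False
        return True

ledger = Ledger()
ledger.add_handoff("DRUG-0001", "MANUF-17", "DIST-03")
ledger.add_handoff("DRUG-0001", "DIST-03", "TRANS-11")
print(ledger.verify())  # True; tampering with any entry breaks the chain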
Once the distributor receives the drugs from the manufacturer, all the parties in the
blockchain will know that the distributor has received the drugs. The transport agency
will now take the drugs from the distributor to pharmacy stores and hospitals. The
hand-off between the distributor and the transport agencies will happen after they both
digitally sign in the shared ledger using their private keys. The transport phase is crucial
because, during transportation, the probability of counterfeiting is at its maximum.
Here, we can take the help of IoT and wireless sensor devices to track
and check. A GPS device can be attached to the drug package so that it can be
monitored constantly before it reaches its next destination. A wireless sensor can
also be placed inside the package to constantly monitor temperature, humidity
and other parameters, because some drugs are sensitive to temperature, and higher
temperatures will spoil the medicine (Table 3).
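A tiny sketch of the in-transit monitoring idea: sensor readings are checked against drug-specific limits, and any violation becomes an alert record that would be written to the shared ledger so every party can see it. The thresholds and field names are assumptions made purely for illustration.

# Illustrative cold-chain check during transport; out-of-range readings
# become alert records that would be appended to the shared ledger.
TEMP_RANGE_C = (2.0, 8.0)     # assumed storage limits for a sensitive drug
HUMIDITY_MAX = 60.0           # assumed maximum relative humidity (%)

def check_reading(drug_serial, temp_c, humidity):
    violations = []
    if not (TEMP_RANGE_C[0] <= temp_c <= TEMP_RANGE_C[1]):
        violations.append(f"temperature {temp_c} C out of range")
    if humidity > HUMIDITY_MAX:
        violations.append(f"humidity {humidity}% too high")
    # In the proposed framework each violation would be recorded in the
    # distributed ledger so that all parties in the chain can see it.
    return {"drug_serial": drug_serial, "violations": violations}

print(check_reading("DRUG-0001", temp_c=11.4, humidity=48.0))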
In a similar fashion, the hand-off of drugs between transport agency and hospital
or between transport agency and pharmacy is carried out. Now, the drug is available for
its final usage: it can be administered to patients, or consumers can buy it from the
pharmacy stores.
6 Conclusion
With the recent advancements in network and Internet technologies, there is an urgent
need for improvement in medical health care services. In this paper, various issues
related to drug supply chain management have been discussed. This work also discusses
how blockchain can address these issues in a transparent and secure manner.
Blockchain can be used to add traceability and visibility to the drug supply chain
to overcome the problem of drug counterfeiting. A model of a decentralized blockchain
architecture has also been presented with a shared ledger system, which will not only
prevent drug counterfeiting but also make the drug supply more robust, transparent
and trustworthy. Using the proposed model, loopholes in our current drug supply
chain can be closed. Moreover, the proposed blockchain framework is not
only limited to countering the counterfeiting of drugs but can also be useful for real-time
tracking, such as scheduling the delivery of products.
References
1. Salman, T., M. Zolanvari, A. Erbad, R. Jain, and M. Samaka. 2018. Security services using
blockchains: A state of the art survey. IEEE Communications Surveys and Tutorials 21 (1):
858–880.
2. Nakamoto, S. 2008. Bitcoin: A peer-to-peer electronic cash system.
3. Zheng, Z., S. Xie, H.-N. Dai, and H. Wang. 2016. Blockchain challenges and opportunities: A
survey. International Journal of Web and Grid Services 1: 1–25.
4. Roehrs, A., C.A. da Costa, and R. da Rosa Righi. 2017. Omniphr: A distributed architecture
model to integrate personal health records. Journal of Biomedical Informatics 71: 70–81.
5. Pharmaceutical industry in India. https://en.wikipedia.org/wiki/Pharmaceutical.
6. Mettler, M. 2016. Blockchain technology in healthcare: The revolution starts here. In 2016 IEEE
18th International conference on e-Health networking, applications and services (Healthcom),
1–3. IEEE.
7. Azaria, A., A. Ekblaw, T. Vieira, and A. Lippman. 2016. Medrec: Using blockchain for medical
data access and permission management. In 2016 2nd International conference on open and
big data (OBD), 25–30. IEEE.
8. Zhao, H., Y. Zhang, Y. Peng, and R. Xu. 2017. Lightweight backup and efficient recovery
scheme for health blockchain keys. In 2017 IEEE 13th international symposium on autonomous
decentralized system (ISADS), 229–234. IEEE.
9. Liang, X., J. Zhao, S. Shetty, J. Liu, and D. Li. 2017. Integrating blockchain for data sharing
and collaboration in mobile healthcare applications. In 2017 IEEE 28th annual international
symposium on personal, indoor, and mobile radio communications (PIMRC), 1–5. IEEE.
10. Dubovitskaya, A., Z. Xu, S. Ryu, M. Schumacher, and F. Wang. 2017. How Blockchain could
empower eHealth: An application for radiation oncology. In Data management and analytics
for medicine and healthcare. DMAH 2017, ed. E. Begoli, F. Wang, G. Luo, Lecture Notes in
Computer Science, vol. 10494. Cham: Springer.
A Comparative Analysis of Predictive
Modeling Techniques: A Case Study
of Device Failure
Abstract In the modern lifestyle, a person's life is full of gadgets and appliances.
At some period of time, every device will have downtime, during which its performance
keeps degrading. Only preventive maintenance at regular intervals can save a machine
from performance degradation or, sometimes, from an accidental hazard. Hence, if
device downtime can be detected automatically, then preventive measures can be taken
to recover the device from its degraded state. Here, a machine learning-based approach
is proposed to detect the state of any device. A Kaggle dataset is used for experimentation.
The whole system is divided into three phases. In the first phase, the data is
collected and cleaned using various statistical methods. In the next phase, the model
is trained using the training dataset. Finally, the trained model is evaluated using the
testing dataset. In this work, various machine learning algorithms are compared to find
the one showing the best performance on the given dataset.
1 Introduction
J. Santhappan (B)
NorthBay Solutions, Houston, TX, USA
e-mail: jstudent921@gmail.com
P. Chokkalingam
CFA Level 3—CFA Institute, Houston, TX, USA
state of any device can be detected, then it becomes very convenient to recover it
through maintenance. Through predictive maintenance, accidental failures can be
avoided and productivity increases [2].
Presently, banking organizations are very concerned about providing quality service
to their customers. For that reason, they try to maintain good quality in their ATMs,
online transaction platforms, and other web-related applications. While improving their
services, older versions sometimes need enhancement in terms of their functionalities.
During such service enhancement, some older components need to interact with new
components. Due to this, a service may encounter an error at a particular point of time
because of incompatibility between different components. As a solution to this problem,
nowadays the technical support team keeps track of the technical maintenance activity
of the different components [3].
In some cases, for a particular device, past data related to its performance at
different instances of time is collected. This data should be concise in nature. It
can give information about the characteristics of the device, explaining its healthy
and degraded performance conditions. This data can be useful for making predictions
about the downtime of the device and also for suggesting the time when maintenance
is required. But while collecting the data, its quality needs to be checked, because
sometimes exceptional data gets introduced during data collection. Exceptional data
creates special cases which introduce errors during predictive maintenance. Hence,
precautions need to be taken during data collection [4].
Failures always cause interruptions in different applications, due to which a company
generally faces significant losses. In industries, various transmitters are used for
different purposes, such as temperature, pressure, and level measurement. These are
crucial parameters for any production-based industry. Sometimes, intelligent agents
like intelligent field devices are used for the detection of possible failures before
they occur. This detection helps in avoiding an emergency trip [5].
Predictive maintenance plays a major role in the medical industry as well. For the
detection of diseases, different methods are used, such as MRI, CT scan, and
X-rays. At some period of time, the devices used in these methods undergo downtime,
during which their performance gets affected. Because of this, the diagnosis process
is delayed, and sometimes errors are also observed in the reports generated by
the devices. In such applications, sensor-based systems are sometimes used for fault
detection in any part of the device; in other cases, device log data is used for fault
detection [6].
In telecommunication systems, different devices like AD converters, DA converters,
and signal conditioning systems are used. The performance of these systems is generally
affected by the limited capacity of the capacitors used in them. Sometimes, the lives
of the equipment are affected by temperature, humidity, and other environmental factors.
For such applications, equipment monitoring programs are used to collect equipment
data at different instances of time. Hence, if any faulty status data is recorded, then
maintenance activity is immediately carried out to avoid any kind of failure [7].
2 Literature
Susto et al. [1] proposed an effective predictive maintenance method using the
support-vector machine (SVM) and K-nearest neighbor (KNN) algorithms. Here, the
predictive maintenance module first receives physical variable information of different
processes. Through this data, the faulty status of the different processes is known. After
that, the data is given to the classifiers to train the model. Then the inputs about the
different processes are given to the classifier to predict the future status of the
processes. The SVM classifier creates a hyperplane to differentiate between the
different classes; if the hyperplane is not able to linearly separate the instances, then
an additional kernel is used for proper separation of the instances. In the KNN
algorithm, the classes of the k nearest neighbors of a test instance are determined first,
and the majority class is assigned to the test record.
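The two classifiers described in [1] can be reproduced in outline with scikit-learn. The synthetic data below merely stands in for the process variables, so nothing here reflects the authors' actual dataset or tuning.

# Outline of the SVM and KNN predictive-maintenance classifiers of [1],
# trained on synthetic stand-in data (illustrative only).
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                      # six process variables
y = (X[:, 0] + 0.5 * X[:, 3] > 1.0).astype(int)    # 1 = faulty, 0 = healthy

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

svm = SVC(kernel="rbf").fit(X_tr, y_tr)            # kernel handles non-linear separation
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)

print("SVM accuracy:", svm.score(X_te, y_te))
print("KNN accuracy:", knn.score(X_te, y_te))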
Jose et al. [2] proposed an IoT-based predictive maintenance system. In this work,
misalignment of bearings and machinery operational failures due to metal fatigue
are analyzed. In the proposed system, different sensors are used to monitor the
system, which performs vibration analysis, oil level analysis, and electrical
parameter analysis. The captured data is then given to the machine learning
system to carry out predictive maintenance.
Mishra et al. [3] developed a useful predictive maintenance method for financial
organizations. The system is divided into three layouts. Component-level failure
probability is determined at regular time intervals using a machine learning method.
In the data flow layout, the parameters related to failure are collected. In the data
preparation approach, the error event is taken as input and a prediction of each device-level
failure is made. For every device, the component-level errors are determined.
Here, the observation, transition, and prediction window methods are used to prepare
the failure features. The final architecture layout describes the high-level architecture
of the whole system; various services are associated with the different layers of
the main architecture. Hence, all the parameters related to failure are analyzed and
then given to the machine learning model for failure prediction.
Chen et al. [4] used a virtual metrology-based predictive maintenance scheme
for the minimization of accidental failures. Through the BPM scheme, faulty system
prediction can be done. The quality of prediction depends upon the past failure data.
The data sample consists of concise and good data collected before and after maintenance
is done. If some exceptional data gets introduced into the data set, it affects
the prediction process of the BPM scheme. Hence, for the data selection process,
an ABSS scheme was used that selects healthy and proper data by removing the
exceptional data samples.
Fornaro et al. [5] developed a device diagnosis algorithm to predict future
device failures and future abnormal behavior of different transmitters.
The failure and mal-operation data of different pressure transmitters and echo
transmitters are first collected; using this data, a rule base for the prediction is
developed.
Patil et al. [6] implemented a machine learning-based method for failure
forecasting of different imaging devices used in the medical field. The behavior of
these devices is observed remotely by the original equipment manufacturer to collect
different features like downtime and accidental failures. After that, the data is used to
train the machine learning model and predict the failure of any device. Here, the
authors have done a case study on the PHILIPS iXR system.
Sisman et al. [7] developed a predictive maintenance system for different
electronic devices used in telecommunication systems. Statistical analysis is carried
out on failure data to identify the critical factors responsible for the malfunctions
that cause the system to stop working. The initial analysis identified the
capacity of the capacitors as a crucial factor in the failure of the devices; an analysis
using fuzzy graphs also identified the same capacitor capacity as the crucial
parameter. Hence, predictive maintenance of the capacitors is suggested to avoid
any kind of failure.
Susto et al. [8] proposed an effective maintenance technique for the fabrication
of semiconductors used in the electronics industry. In this predictive maintenance
technique, historical data is used for predicting the capability of the device, and
a support-vector machine algorithm is used to forecast the life of the filament used
in semiconductor fabrication.
3 Dataset
The dataset is collected from the Kaggle repository. It originally consists of a total of
575 records and 11 attributes. Attribute1 and attribute2 are the device id and the date
of failure; the Karl Pearson correlation method shows that these two attributes have no
correlation with the target, so they are removed from the dataset, leaving nine
attributes. Out of the nine attributes, eight are independent and one is dependent.
4 Proposed System
Data Collection
In this step, the data is collected from the device failure data set available in Kaggle.
Data Cleaning
The dataset consists of a total of 575 records and nine features. Out of the nine
features, eight are independent, whereas one is dependent. Feature1, feature2,
and feature3 contain 90, 75, and 62 missing values, respectively. The number
of null values is determined using the isnull().count() function. The null values
in feature1 are imputed with the mean, whereas the missing values in feature2 and
feature3 are imputed with their modes. Forty-five records contain too many missing
values; hence, those records are removed from the dataset. Finally, the total number of
cleaned records in the dataset is 530. For feature selection, the forward feature selection
method is used here; it is found that all nine features are important, so none of them
is dropped from the dataset.
Data Splitting
The whole dataset with 530 records is divided into two parts. 90% (477) of the records
are used for training, whereas 10% (53) of the records are used for testing.
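A minimal sketch of the cleaning and 90/10 splitting steps with pandas and scikit-learn. The column names (feature1–feature3, failure) and the CSV file name are placeholders, since the exact schema of the Kaggle device-failure dataset is not reproduced in the paper.

# Sketch of the data-cleaning and 90/10 splitting steps (column and file
# names are placeholders for the Kaggle device-failure dataset).
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("device_failure.csv")          # assumed file name
df = df.drop(columns=["device_id", "date"])     # attributes uncorrelated with the target

df["feature1"] = df["feature1"].fillna(df["feature1"].mean())
df["feature2"] = df["feature2"].fillna(df["feature2"].mode()[0])
df["feature3"] = df["feature3"].fillna(df["feature3"].mode()[0])
df = df.dropna()                                # drop records with too many gaps

X = df.drop(columns=["failure"])                # eight independent features
y = df["failure"]                               # dependent feature (target)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42, stratify=y)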
Model Training
In this step, model training is carried out using the training data. During training,
the best values of the parameters related to the model are found; tuning these
model-related parameters is known as hyperparameter tuning. The actual knowledge base
of any model is created during its training, and the model uses this knowledge base to
make accurate predictions during testing. The issues of overfitting and underfitting are
handled properly during the training process.
Model Testing
In this step, the trained model is evaluated using the test data, which is generally
unseen by the model. Here, the model categorizes the test data into two classes:
(1) the failure class and (2) the non-failure or healthy class.
In this step, the accuracy, specificity, and sensitivity are calculated; through these
scores, the effectiveness of the model can be known.
In the proposed system, the different classification models used are the decision tree,
random forest, and Naïve Bayes.
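A minimal sketch of training and testing the three classifiers and reporting the metrics used in Sect. 5. It continues from the splitting sketch above (X_train, X_test, y_train, y_test) and does not reproduce the paper's exact hyperparameter settings.

# Train the three classifiers on the 90% split and report accuracy,
# precision, and recall on the 10% test split (continues the sketch above).
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score

models = {
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name,
          "acc=%.3f" % accuracy_score(y_test, pred),
          "prec=%.3f" % precision_score(y_test, pred),
          "rec=%.3f" % recall_score(y_test, pred))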
Entropy:

E = -\sum_{j=1}^{n} q_j \log q_j    (1)

where E is the entropy of the group for which it is calculated, and q_j is the probability
of the group being in the jth state.

Information gain:

I = E - \sum_{i} \frac{N_i}{N} E_i    (2)

where E is the entropy of all attributes at the root node, N_i is the number of elements
at the ith level of the group, N is the total number of elements in the group, and E_i
is the entropy at the ith level.

For the Naïve Bayes classifier, the posterior probability is given by Bayes' rule:

P(c \mid X) = \frac{P(x \mid c)\, P(c)}{P(x)}    (3)
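A small numerical illustration of Eqs. (1)–(3): entropy and information gain as used by the decision tree, and the Bayes posterior used by the Naïve Bayes classifier. The numbers are made up purely for demonstration.

# Numeric illustration of Eqs. (1)-(3) with made-up values.
from math import log2

def entropy(probs):                        # Eq. (1)
    return -sum(q * log2(q) for q in probs if q > 0)

# Parent node: 10 failure / 10 healthy records, split into two children.
E_root = entropy([0.5, 0.5])               # = 1.0 bit
E_left = entropy([0.8, 0.2])               # child with 10 records
E_right = entropy([0.2, 0.8])              # child with 10 records
info_gain = E_root - (10/20) * E_left - (10/20) * E_right   # Eq. (2)
print(round(info_gain, 3))                 # about 0.278

# Bayes rule, Eq. (3): P(c|x) = P(x|c) * P(c) / P(x)
p_x_given_c, p_c, p_x = 0.6, 0.1, 0.15
print(p_x_given_c * p_c / p_x)             # 0.4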
5 Results
The efficiency of any machine learning algorithm is evaluated using different
performance metrics. In the proposed work, the metrics considered are accuracy,
recall, and precision.
Accuracy is the ratio of the total number of correct predictions to the total number
of predictions made [9]. Recall is the ratio of true positives to the total number of
positive cases. Precision is the ratio of true positives to the sum of true positives and
false positives [10]. A device which is detected to have a failure is referred to as a
positive case, and a device which is detected as a non-failure case is referred to as a
negative case. Table 1 and Fig. 2 show the details of the different performance-related
factors of all the above-discussed methods.
In the above table, the TP, FP, TN, FN, accuracy, precision, and recall scores are given,
and Fig. 2 compares the accuracy, precision, and recall of the Naïve Bayes, decision tree,
and random forest classifiers as a bar chart. The highest accuracy and recall are obtained
with the random forest classifier, whereas the precision score is highest for the Naïve
Bayes classifier. For all of the models, the false-positive value is low. Here, a false
positive is a device that is predicted as non-failing but will actually fail. This creates
a problem because, as the device is predicted not to fail, no preventive maintenance
activity will be carried out. Considering all these factors, random forest can be regarded
as the most efficient algorithm. In the random forest algorithm, the overfitting and
underfitting issues are properly handled through model-related parameters such as the
number of splits and the number of subtrees used.
ROC Curve. An ROC curve is created by plotting the true-positive rate against the
false-positive rate of the classifier. The area under the curve is called the AUC, and its
value generally ranges between 0.5 and 1; the closer it is to 1, the better the classifier's
predictions. For the random forest and decision tree, the AUC value obtained is 0.89,
whereas for Naïve Bayes the AUC value obtained is 0.91. These ROC curves are shown
in Figs. 3, 4 and 5.
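ROC curves and AUC values like those in Figs. 3–5 can be computed with scikit-learn from the classifiers' probability outputs. The snippet continues the training sketch above (the models dictionary and the test split) and is not the authors' plotting code.

# Sketch of computing an ROC curve and its AUC from predicted probabilities
# (continues the training sketch above; not the authors' plotting code).
from sklearn.metrics import roc_curve, auc

probs = models["Random Forest"].predict_proba(X_test)[:, 1]   # P(failure)
fpr, tpr, _ = roc_curve(y_test, probs)
print("AUC =", round(auc(fpr, tpr), 2))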
6 Conclusion
In this study, a number of failure detection methods have been discussed. In the proposed
system, a machine learning-based predictive model is applied. Here, three different
models are considered: Naïve Bayes, decision tree, and random forest. Among them,
the random forest method has shown the best results in terms of accuracy and
recall. Hence, it is suggested to use a machine learning-based algorithm for
the detection of failure. Once a failing device is detected, predictive maintenance can
be planned and the device can be saved from failure. Through machine learning
algorithms, failure can be detected in less time and with high accuracy. Finally, it is
concluded that the random forest algorithm should be used for prediction in real-time
applications where failure causes huge losses.
References
1. Susto, Gian Antonio, Andrea Schirru, Simone Pampuri, and Seán McLoone. 2013. Machine
learning for predictive maintenance: A multiple classifier approach. IEEE Transactions on
Industrial Informatics. https://doi.org/10.1109/tii.2014.2349359.
2. Jose, Tinku Malayil, and Roshan Zameer. 2018. A novel sensor based approach to predictive
maintenance of machines by leveraging heterogeneous computing. IEEE Sensors 2018. https://
doi.org/10.1109/ICSENS.2018.8589620.
3. Mishra, Kamala Kanta, and Sachin Kumar Manjhi. 2018. Failure prediction model for predic-
tive maintenance. In: 2018 IEEE International Conference on Cloud Computing in Emerging
Markets (CCEM).
4. Chen, Chun-Fang, and Yao-Sheng Hsieh. 2013. Automatic baseline-sample-selection scheme
for baseline predictive maintenance. In: 2013 IEEE International Conference on Automation
Science and Engineering (CASE).
5. Fornaro, Oscar, Micaela Cascrza Magrot, and Paolo Pinceti. 2004. Diagnostics for measure
transmitters. In: IMTC 2004 Instrumentation and Measurement Technology Conference Como,
Italy, 18–20 May 2004.
6. Patil, Ravindra B., Meru A Patil, Vidya Ravi, and Sarif Naik. 2017. Predictive modeling for
corrective maintenance of imaging devices from machine logs. In: 2017 39th Annual Interna-
tional Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). https://
doi.org/10.1109/embc.2017.8037163.
7. Sisman, George Roberti, and Oproescu Mihai. 2017. Monitoring the parameters of the electron-
ics devices to assure the predictive maintenance of equipment. In: 10th International Symposium
on Advanced Topics in Electrical Engineering, Mar 23–25, Bucharest, Romania.
8. Susto, Gian Antonio, Andrea Schirru, Simone Pampuri, Daniele Pagano, Seán McLoone, and
Alessandro Beghi. 2013. A predictive maintenance system for integral type faults based on
support vector machines: An application to Ion implantation. In: 2013 IEEE International
Conference on Automation Science and Engineering (CASE).
9. Swain, Debabrata, Santosh Pani, and Debabala Swain. 2019. An efficient system for the pre-
diction of coronary artery disease using dense neural network with hyper parameter tuning.
International Journal of Innovative Technology and Exploring Engineering (IJITEE), 8: (6S).
ISSN: 2278-3075.
10. Swain, Debabrata, Santosh Pani, and Debabala Swain. 2019. Diagnosis of coronary artery
disease using 1-D convolutional neural network. International Journal of Recent Technology
and Engineering (IJRTE) 8: (2). ISSN: 2277-3878.
Python and OpenCV in Automation
of Live Surveillance
Abstract There have been many uses of image recognition and feature detection
in recent times. The growth in the popularity of Python and the simplification of
automation allow us to bring live surveillance into this domain. Nowadays, sectors
like banking need a high level of security due to their increasing importance.
Especially at remote locations where services like ATMs are provided, security
becomes the main concern. Automating surveillance offers an efficient way to reduce
the burden on security.
Keywords Image and video processing · Haar Cascades [1] · Haar-like feature
[1] · Integral images · OpenCV–Python
1 Introduction
The main objective of our work is to detect suspicious activities in public places.
Currently, we are working on the object detection process by creating Haar cascade
file for specific object like the helmet. The Harr cascade file which was created is
XML classifier which is used in various detection processes like detection of helmet
wearing persons in front of ATM’s, in traffic monitoring, etc.
This requires the collection of datasets of multiple images of the objects for the
detection. It basically uses the images as the reference to detect alike objects. It
requires different datasets such as dataset of positive (to be detected object) images
and negative (not to be detected) images, etc. To achieve high accuracy, dataset must
be large enough so that it can be trained at maximum level.
The main motive is to create efficient Harr cascade file for detection of person
wearing helmet. The surveillance system will be further enhanced to increase the
accuracy and efficiency by training the object dataset model to further stages.
2 Object Detection
Today, pictures and video are available in plenty. The field of vision research has
been dominated by AI, measurement, and statistics; pictures are used to recognize,
characterize and track objects or events so as to understand a real-world scene.
We consider a picture as a scene comprising objects of interest and the background,
represented by everything else in the image. Object identification determines
the presence of the object, its extent and its location in the picture.
Object recognition is commonly carried out by searching through each section
of a picture to localize parts whose various properties, such as geometric, photometric, etc.,
match those of the target object in the training database. This can be
accomplished by scanning an object template over a picture at various locations, scales,
and rotations, and a detection is declared if the similarity between the template and the
image is sufficiently high.
3 Image Processing
• Output is the last stage, where the result can be a modified image or a report based
on the image examination (Fig. 1).
The use of Haar-like features for detection was first proposed by Paul Viola and
Michael Jones in 2001. The idea behind the Haar detection cascade is to discard
negative images with very little computation [3, 4]. A series of classifiers is
applied to each subsection of the image. If a subsection fails to pass any
classifier, it is discarded and no further calculations are done on it. Once a
subsection passes the early stages, the amount of computation required increases; if the
subsection passes a classifier, it is passed on to the next stage. The classifiers' accuracy
can be increased by more training.
Haar-like features are an overcomplete set of two-dimensional
functions which can be used to encode the local appearance of an object. They consist of
at least two rectangular areas enclosed in a template.
A Haar classifier is a binary classifier; it simply detects whether the object is present
in a given image or not (Fig. 2).
As shown in Fig. 3, the feature value Δ is calculated as the difference between the mean
pixel intensities of the dark (black) and white regions:

\Delta = \frac{1}{n} \sum_{\text{DARK}} I(x) - \frac{1}{n} \sum_{\text{WHITE}} I(x)

The ideal Δ is 1, whereas for the real image in the figure we get Δ = 0.74 − 0.18 = 0.56.
The closer the value is to 1, the more likely it is that a Haar feature has been found.
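The feature value Δ can be computed quickly with an integral image, which gives the sum of any rectangle in four look-ups. The sketch below is a generic illustration of that idea on a toy patch with a two-rectangle feature; it is not the exact routine used during training.

# Two-rectangle Haar-like feature computed with an integral image
# (generic illustration on a toy image patch).
import numpy as np

def integral_image(img):
    # integral image padded with a zero row/column so rectangle sums
    # need no edge checks
    return np.pad(img.cumsum(0).cumsum(1), ((1, 0), (1, 0)))

def rect_sum(ii, x, y, w, h):
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect(ii, x, y, w, h):
    """Mean of the dark (left) half minus mean of the white (right) half."""
    half = w // 2
    n = half * h
    dark = rect_sum(ii, x, y, half, h)
    white = rect_sum(ii, x + half, y, half, h)
    return dark / n - white / n

patch = np.array([[200, 190, 40, 30],
                  [210, 185, 35, 25],
                  [205, 195, 45, 20]], dtype=float)
ii = integral_image(patch)
print(haar_two_rect(ii, 0, 0, 4, 3))   # large positive value: edge-like feature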
5 Literature Review
• In 1984, Franklin C. Crow proposed summed-area tables for texture mapping [6].
The paper gives a method for antialiased texture mapping of egg-shaped surfaces.
• The design and use of steerable filters was presented in 1991. That paper describes
the analysis of orientation and phase, angularly adaptive filtering, edge
detection, and shape from shading.
• Coarse-to-Fine Face Detection (1999) describes how to detect and
roughly locate a generic object class. The detection is rapid, but a high rate of
false alarms is raised.
• In 2001, Viola, P. and Jones, M. presented a rapid object detection technique using a
boosted cascade of simple features [1]. The paper introduces the new concept of
integral images, which results in less computation time. The AdaBoost algorithm reduces
the number of features from a larger set to give an efficient classifier.
6 Methodology
We collected positive images of the object to be detected (here, images of helmets) [7],
with resolutions between 300 * 300 and 400 * 400. We kept the resolution of the
positive images lower than that of the negative images because, at the time of sample
creation, the positive image is superimposed on a negative image [8].
We have used 30 positive images, stored in the raw data directory, and nearly
400 negative images, stored in the …\negative directory. There is also one
descriptor file for the negative images, named bg.txt, which can be created by running
the create_list.bat file.
We have taken care that the positive images cover the object to be detected from all
directions and in all possible cases. For gun detection, for instance, we would collect
images such as a gun placed on a table, in a pocket, in the hand, etc. [9].
Similarly, we have taken all possible kinds of images of helmets (positive images),
i.e., helmets placed at different angles, worn by different people, etc. [10].
Before training, we need one vector file that contains the path to the positives and the
location of the object in each image [11]. To do this, run the objectmarker.exe file which
is present at …/training/positives. Draw a rectangle over the required object, press
"SPACE" to add the object, and press "ENTER" to save it.
Run the samples_creation.bat file present at …/training, which will create a vector of all
positives [12]. The content of this file is as follows.
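The original listing of samples_creation.bat is not reproduced in this extraction; a typical invocation of the OpenCV sample-creation utility would look roughly like the line below, where the info-file path and the counts are placeholders rather than the exact values used in this work.

createsamples.exe -info positives/info.txt -vec vector/vector.vec -num 30 -w 24 -h 24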
6.4 Training
The training command has parameters such as the location of the info file and the vector
file, the numbers of positive and negative images, the memory to be used, the minimum
height and width of the detection window, and the number of stages up to which we
train the classifier [13]. The more stages, the better the detection. The number of stages
also depends upon the size of the database: more stages mean more features to extract
and thus more example images are required. Run the haartraining.bat file present at
…/training [4].
haartraining.exe -data cascades -vec vector/vector.vec -bg negative/bg.txt -npos 30
-nneg 400 -nstages 9 -mem 1024 -mode ALL -w 24 -h 24 -nonsym
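Once training finishes and the stages are packed into an XML classifier, it can be used from Python with OpenCV for live detection. The sketch below shows the standard OpenCV–Python usage; the file name and detection parameters are placeholders, not the exact values used in this work.

# Using the trained Haar cascade XML classifier for live helmet detection
# with OpenCV-Python (file name and parameters are placeholders).
import cv2

cascade = cv2.CascadeClassifier("helmet_cascade.xml")
cap = cv2.VideoCapture(0)                      # surveillance camera / webcam

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    helmets = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in helmets:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
    cv2.imshow("surveillance", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()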
7 Experimental Setup
Dataset: Images (positive, negative)
Algorithm: Rapid object detection using a boosted cascade of simple features

Level no   Background processing time (ms)   Precalculation time (ms)   Stage training time (ms)
0          0.01                              7.43                       2.96
1          0.03                              7.03                       2.83
2          0.07                              6.65                       2.72
3          0.10                              6.61                       3.09
4          0.43                              6.99                       3.27
5          0.83                              6.92                       3.69
6          1.48                              6.74                       4.73
7          7.84                              6.93                       5.72
8          27.05                             6.44                       8.83
The above table shows the training of the helmet-detection Haar cascade XML classifier,
with the experimental readings listed stage by stage. In this experiment, we have
successfully created the Haar cascade XML classifier up to the ninth stage [1]. From the
readings, we can clearly see that, as the level increases during training of the Haar
cascade file, the background processing time increases exponentially while the
precalculation time remains nearly constant, varying only between 6.5 and 7.5 ms.
Along with this, the stage training time increases gradually.
NOTE: For the above readings, we have used an Intel i5 8th-generation processor.
The graphs below show how the background processing time and the stage training
time increase.
9 Future Scope
Image processing has a wide range of applications in today's developing industries and
in day-to-day life [1]. In order to do image processing, we need to have an XML
classifier. In this experiment, we have created a Haar cascade file for helmet detection,
and this XML classifier can be used for surveillance in the banking industry to detect
helmet-wearing persons in the premises of ATM machines.
Through this process, we can create different Haar cascade XML classifiers as
per the needs of industries. For example, on a product line, this approach can be used
to detect any object other than the product, and many similar issues can be solved in
the same way.
References
1. Viola, Paul, and Michael Jones. 2001. Rapid object detection using a boosted cascade of simple
features. CVPR (1): 511–518.
2. Amit, Yali, Donald Geman, and Kenneth Wilder. 1997. Joint induction of shape features and
tree classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence 11: 1300–
1305.
3. Rezaei, Mahdi. 2013. Creating a cascade of haar-like classifiers: Step by step. Aplikasi
Pendeteksian Ras kucing dengan mendeteksi wajah kucing dengan metode viola jones.
4. Rezaei, Mahdi. Creating a cascade of Haar-like classifiers. Department of computer science,
Auckland.
5. Sung, Kah K., and Tomaso Poggio. 1994. Example based learning for view-based human face
detection. Massachusetts Institute of Technology Cambridge Artificial Intelligence Lab, No.
AI-M-1521.
6. ElSaid, W.K. A System for managing attendance of academic staff members in university
development programs using face recognition.
7. Greenspan, Hayit et al. 1994. Overcomplete steerable pyramid filters and rotation invariance,
222–228.
8. Schneiderman, Henry, Takeo Kanade. 2000. A statistical approach to 3D object detection
applied to faces and cars. Carnegie Mellon University, the Robotics Institute.
9. Tsotsos, John K., et al. 1995. Modeling visual attention via selective tuning. Artificial
Intelligence 78 (1-2): 507–545.
10. Itti, Laurent, Christof Koch, and Ernst Niebur. 1998. A model of saliency-based visual attention
for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 11:
1254–1259.
11. Fleuret, Francois, and Donald Geman. 2001. Coarse-to-fine face detection. International
Journal of computer vision. 41 (1–2): 85–107.
12. Schapire, Robert E., et al. 1998. Boosting the margin: a new explanation for the effectiveness
of voting methods. The Annals of Statistics 26 (5): 1651–1686.
13. Osuna, Edgar, Robert Freund, and Federico Girosi. 1997. Training support vector machines:
an application to face detection. cvpr. 97: 130–136.
Privacy Preservation in ROI of Medical
Images Using LSB Manipulation
Abstract In the current era of digitization, medical images are transmitted over
communication channels for diagnosis. During this process, they can be modified
accidentally or intentionally. This occurs because of the insecure Internet network,
which ultimately affects the transmitted image information and thus results in wrong
diagnostic decisions. Here, the basic purpose is to secure image transmission
without changing the sensitive areas of the images that carry medical information. One
of the possible watermarking methods for avoiding erroneous diagnosis and checking
the reliability of the received medical image is region-based analysis using the ROI.
The ROI of a medical image is irregularly shaped and contains important information.
This paper analyzes these issues and attempts to overcome such problems by
recovering secure medical images.
1 Introduction
A new method is proposed to recover the exact original image when needed, without
affecting the affected area. The sensitive and important areas of a medical image
are called the region of interest (ROI), which is meant for diagnostic decision
making, while the remaining part is referred to as the region of non-interest
(RONI) [2]. The specialist's diagnosis depends upon the ROI contents of the medical
image, so the ROI has to be tamper-free and should not be modified [2].
The rest of the paper is structured as follows: Sect. 2 covers image security
techniques, Sect. 3 describes different ROI techniques, Sect. 4 discusses the proposed
ROI hiding technique, and the conclusion with possible future work is described in
Sect. 5.
2.1 Watermarking
2.2 Steganography
On the other hand, steganography is a widely accepted method that combines covered
and secret writing. It is a technique of hiding messages so that only the intended
recipient knows that a message has been sent: the secret message is encoded and then
covered inside the stego message. In image steganography, lossy and lossless
compressions are used to save storage space, but they have different effects on any
hidden data in the decompressed image. As the name suggests, lossy compression
provides a high compression ratio but does not properly maintain the integrity of the
original image, whereas lossless compression works on the opposite principle. The
best part of this method is that it provides better security for sharing data quickly
with a wide range of software, and the hidden content is difficult to detect by anyone
except the receiver.
The main purpose behind hiding information in an image is to protect the original
image pixels from permanent distortion after that information is extracted. This
approach is commonly called reversible data hiding. It recovers the original state
of the image at the receiver end from the cover image and allows the embedding of
messages in a host image without any loss of host content. So this technique is used
not only for data hiding but also for the complete recovery of the original image
from the encrypted image. It is most desirable in applications where degraded
restoration is not allowed, such as medical images, forensic images, military maps,
etc.
ROI stands for region of interest, i.e., a defined area or region of an image that is
very important. The region can be a polygon, a circle, etc. Selecting the ROI of an
image means selecting the area that must not suffer any loss of pixels or any kind of
distortion. RONI stands for region of non-interest. Considering the important and
non-important areas, an image is divided into these two parts. While the ROI is the
important part of the image, the RONI is the area that is left after selecting the
ROI. In the case of the RONI, the pixels of that area may be compressed or their data
may be lost.
In 2016, Deepa S. et al. proposed an ROI lossless colored medical image watermarking
scheme with secure embedding of patient data [3]. The proposed system uses two
watermarks at two different LSB planes, BP and BP+1. For creating and embedding the
first watermark in bit plane BP+1, the patient health record and the color medical
image are provided as input [3].
As a result, a color medical image with watermark1 is produced. Secondly, for
generating the watermark in bit plane BP, the color medical image with watermark1
is provided as input, and the resulting output is ready for transmission. At the
receiver side, the same process occurs in a reverse
manner. A similar process is followed in a noisy environment, except that noise is
added at the transmitter side and removed by a filter at the receiver side. The
complete block diagram of a colored image environment with noise addition and
removal is shown in Fig. 1.
Fig. 1 Block diagram for a colored image with noise addition [3]
As a result, the accurate health record of the patient, together with the marked ROI,
can be accurately recovered at the receiver side for diagnosis [3].
an image. The hash of the ROI of the decrypted image is calculated, and the hash of
the received ROI is also calculated; the calculated value is compared with the
extracted value. If both values are the same, the image is accepted as the original
retrieved image; otherwise, the image is discarded. The mean squared error (MSE) is
defined as
MSE = (1/(m · n)) · Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} (I(i, j) − I_w(i, j))²   (1)
where I(i, j) and I_w(i, j) are the pixel values of the original and the decrypted
(watermarked) images of size m × n, respectively. The PSNR is used to assess the
lossy and lossless compression of the image; the higher the PSNR, the lower the MSE.
The PSNR can be measured from the MSE as follows (Fig. 2):
PSNR (in dB) = 10 · log10(255² / MSE)   (2)
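For reference, a minimal Python/NumPy sketch of Eqs. (1) and (2), assuming two 8-bit grayscale images of equal size, might look as follows (the function and variable names are illustrative, not taken from the paper):

import numpy as np

def mse(original: np.ndarray, recovered: np.ndarray) -> float:
    # Eq. (1): mean of the squared pixel differences over the m x n image.
    diff = original.astype(np.float64) - recovered.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(original: np.ndarray, recovered: np.ndarray) -> float:
    # Eq. (2): PSNR in dB for 8-bit images (peak value 255).
    err = mse(original, recovered)
    if err == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((255.0 ** 2) / err)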
value of each segment will be subtracted from or added to the center, depending on
whether the first bit of the segment is zero or one.
After preparing the recovery data of the ROI, it is embedded within the RONI. The
RONI area of the medical image is divided into blocks of size 8 × 8, and the
following steps are repeated for each block B in the RONI until all the bits of the
ROI recovery data are embedded: a CDF (2, 2) integer wavelet transform up to the
second level is applied on B, two bits of recovery data are embedded in every
coefficient of the middle-frequency sub-bands LH1, HL1, LH2, and HL2 of B, and
finally the inverse integer wavelet transform is applied on B [2]. To avoid the
overflow/underflow problem, a histogram shifting process is used:
B(x, y) = B(x, y) + k if B(x, y) ≤ k;  B(x, y) − k if B(x, y) ≥ 255 − k   (3)
where k is a threshold value and B(x, y) is the pixel value of block B. To avoid
overflow and underflow, the value chosen for k is always greater than or equal to 15.
After conducting experiments on medical images such as MRI scans, ultrasound images,
and CT scans, as shown in Fig. 3 below, this method properly recognizes tampered
areas within the ROI and recovers the original ROI when necessary. No embedding
distortion remains after generating the watermarked medical image. The method also
locates tampering within the ROI, recovers the original ROI without any loss when
tampering has occurred, and provides robustness.
Here, the original grayscale image is I_{M×N}, whose pixel values range from 0 to 255.
Step-1: Let the ROI image be I_ROI = (I)_{n×n}, where the side of the square is n.
Step-2: Convert each pixel of I_ROI to 8-bit ASCII format; let it be I_ROI-8.
Step-3: Extract the 2 LSBs of each pixel of I_ROI-8, store them in a matrix I_LSB-2 of order [n² × 2], circularly shift its rows, and store the result in I_LSB-2R.
Step-4: Replace the matrix I_LSB-2 with I_LSB-2R in I_ROI-8.
Step-5: Now, convert the I_ROI-8 matrix back to I_ROI by 8-bit ASCII-to-decimal conversion.
Step-6: Finally, place the modified I_ROI back into I_{M×N} and let the result be I'_{M×N}.
After hiding the data in the ROI segment, the image is sent to the receiver.
At the receiver end, the ROI region is again selected and recovered.
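A minimal NumPy sketch of the sender-side Steps 1–6, assuming a square n × n ROI cropped from an 8-bit grayscale image (the array names, ROI coordinates, and the fixed shift amount are illustrative assumptions, not values given in the paper):

import numpy as np

def embed_roi_lsb(image: np.ndarray, top: int, left: int, n: int, shift: int = 1) -> np.ndarray:
    # Step-1: select the square n x n ROI from the M x N grayscale image.
    out = image.copy()
    flat = out[top:top + n, left:left + n].reshape(-1).astype(np.uint8)

    # Step-2/3: collect the 2 LSBs of every pixel into an [n^2 x 2] matrix
    # and apply a circular row shift to it.
    lsb2 = np.stack([(flat >> 1) & 1, flat & 1], axis=1)
    lsb2_shifted = np.roll(lsb2, shift, axis=0)

    # Step-4/5: write the shifted 2 LSBs back into the pixel values.
    flat = (flat & 0b11111100) | (lsb2_shifted[:, 0] << 1) | lsb2_shifted[:, 1]

    # Step-6: place the modified ROI back into the full image.
    out[top:top + n, left:left + n] = flat.reshape(n, n)
    return out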
Fig. 3 Actual, watermarked, and reconstructed medical images. From top to bottom:
CT scan, MRI scan, and ultrasound images [2]
Here, the received grayscale image is I'_{M×N}, whose pixel values range from 0 to 255.
Step-1: Now, let the ROI image selected from I'_{M×N} be I'_ROI = (I'_{M×N})_{n×n}, where the side of the square is n.
Step-2: Convert each pixel of I'_ROI to 8-bit ASCII format; let it be I'_ROI-8.
Step-3: Extract the 2 LSBs of each pixel of I'_ROI-8 and store them in a matrix I'_LSB-2 of order [n² × 2], then reverse the circular row shift and store the result in I'_LSB-2L.
Step-4: Replace the matrix I'_LSB-2 with I'_LSB-2L in I'_ROI-8.
Step-5: Now, place the recovered I'_ROI back into I'_{M×N}.
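Under the same assumptions, the receiver-side recovery simply reverses the circular row shift of the two LSBs; a sketch:

import numpy as np

def recover_roi_lsb(image: np.ndarray, top: int, left: int, n: int, shift: int = 1) -> np.ndarray:
    # Reverse of the sender-side procedure: undo the circular row shift of the 2 LSBs.
    out = image.copy()
    flat = out[top:top + n, left:left + n].reshape(-1).astype(np.uint8)
    lsb2 = np.stack([(flat >> 1) & 1, flat & 1], axis=1)
    lsb2_restored = np.roll(lsb2, -shift, axis=0)  # reverse circular shift
    flat = (flat & 0b11111100) | (lsb2_restored[:, 0] << 1) | lsb2_restored[:, 1]
    out[top:top + n, left:left + n] = flat.reshape(n, n)
    return out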
5 Experimental Results
To demonstrate the proposed method, experiments were carried out on standard
512 × 512 test X-ray images. Figure 4a depicts an original X-ray image, and Fig. 4b
shows the extracted ROI of the X-ray image. The extracted ROI image is then marked
using the embedding procedure to generate the embedded image shown in Fig. 4c.
Figure 5a depicts the embedded X-ray image, which is visually similar to the original
image. Finally, the recovered image is generated by reversing the marked ROI, as
shown in Fig. 5c.
The complete experimental data of our proposed scheme are given below in Table 1,
showing the PSNR, SSIM, and MSE of the recovered image with reference to the input
image.
Fig. 4 a Original image, b extracted ROI image, and c embedded image, respectively, at the sender end
Fig. 5 a Embedded image, b extracted ROI image, and c recovered image, respectively, at the receiver end
The proposed technique is completely reversible; it preserves the ROI in medical
images, which can be recovered exactly as in the original image. Simple data hiding
capability and creation of enough embedding capacity are the two most important
areas that have scope for improvement. In the presented method, the embedding
process follows a uniform circular shift of the LSB bits of the ROI of the input
image, which largely improves the data hiding capacity; no such data hiding is done
in the RONI segment. Increasing the data preservation for sensitive images is a
concern for future work. Color images can be taken as test images, and the test
images can be further expanded to JPEG, DICOM, and grayscale images larger than
512 × 512. Other evaluation parameters can also be effectively measured.
References
1. Rocek, Ales, M. Javornik, K. Slavicek, and O. Dostal. 2017. Reversible watermarking in medical
imaging with zero distortion in ROI. In: 2017 24th IEEE International Conference on Electronics
Circuits and Systems (ICECS), 356–359. https://doi.org/10.1109/icecs.2017.8292071.
2. Rayachoti, E. and S.R. Edara. 2014. A novel medical image watermarking technique for detecting
tampers inside ROI and recovering original ROI. In: IEEE International Symposium on Signal
Processing and Information Technology (ISSPIT), 321–326. https://doi.org/10.1109/isspit.2014.
7300608.
3. Deepa, S., and A. Sandeep. 2016. ROI lossless colored medical image watermarking scheme
with secure embedding of patient data. In: 2016 International Conference on Communication
Systems and Network (ComNet), 21–23 July 2016 Trivandrum, 103–106.
4. Avcibas, I., B. Sankur, and K. Sayood. 2002. Statistical evaluation of image quality measures.
Journal of Electronic Imaging 11 (2): 206–224. https://doi.org/10.1117/1.1455011.
Novel Approach for Measuring Nutrition
Values Using Smartphone
Abstract Food is the source of energy, and it plays a vital role in human existence.
The quality of food is deteriorating day by day due to adulteration and the heavy use
of various pesticides. The traditional approach to analyzing the nutritional value of
food involves various sensors and laboratory procedures to detect quality, but such
sensors and methods take a lot of time. There is a need for a system that can quickly
evaluate the quality of food using ubiquitous means. The number of handheld devices
and their processing capabilities have increased manifold over the last few years.
In this paper, a novel methodology is proposed that uses a smartphone to take an
image and instantly provide its nutrition value. The proposed model helps detect the
nutritional quality of food by utilizing the various sensors present in smartphones,
such as cameras and the microphone. The model uses classifiers to detect the type of
food and runs all the algorithms in the cloud. Four datasets are used, each with
multi-class labels. The machine is trained with various algorithms such as CNN and
RNN, and transfer learning has been used. The whole system has been implemented
successfully, and an accuracy of 82% has been achieved.
1 Introduction
In 2016, more than 1.9 billion adults aged 18 years and older were overweight; of
these, over 650 million adults were obese [1]. Similarly, about 13% of the world's
adult population (11% of men and 15% of women) were obese in 2016 [2]. Obesity
increases the risk of various diseases and health conditions such as heart disease,
stroke, high blood pressure, diabetes, and some cancers. Similarly, more than
one-third of the world's malnourished children live in India; among these, half of
the children under three years old are underweight, and one-third of the wealthiest
children are over-nourished [3]. Malnutrition has adverse effects not just on
children but on adults as well; some of these include stunted growth, a poor immune
system, micronutrient deficiency, weak immunity, muscle inactivity, apathy,
depression, and kidney function impairment. Thus, there is a need for a method that
can quickly determine the nutrition value of food without complicated laboratory
procedures. Such methods can help users quickly determine nutrition information and
thus make better diet choices. While laboratory procedures are the most accurate and
reliable, they are often time consuming and cannot be carried out in all situations.
Smartphones are becoming ubiquitous and more powerful, with better cameras and processors. The
aim of this paper is to leverage the technologies like object detection and ML to
provide a method of detection of nutritional values on smartphones by utilizing the
cameras to identify the food and subsequently provide these values to the user. Using
object detection, food items can now be identified easily. While annotating the
images, we create bounding boxes and label each food item in the photograph. Using
this, we can detect more than two food items from a single image. Transfer learning
is also used here to evaluate accuracy on pre-trained models like MobileNet and
Inception v3. ML is used to improve the accuracy of the system based on previous
interactions.
We aim to create a system (an Android application) that will be able to detect the
nutritional values of food. This will be done by utilizing the cameras present in
smartphones. We aim to leverage machine learning and object detection to help detect
the nutritional quality of food. The traditional approach involves the use of various
sensors and laboratory procedures to detect quality, but such sensors and methods are
not ubiquitous. The model created will utilize object detection to recognize
instances of food items and help in better classification, while machine learning
will be utilized to make sure that the system improves based on previous interactions
in order to increase the accuracy over time.
2 Approach (TensorFlow)
We are using the open-source TensorFlow software for model training and classification.
TensorFlow provides us with a trained model, which we then port into our Android
application for food detection and classification and to obtain the nutritional
values of the given food items.
Steps:
• Create a database of food images.
• Annotate or label the database.
• Create XML files using image labeling software, e.g., LabelImg.
• Create a CSV file and then convert it into a TFRecord file (a sketch of this step is given after this list).
• Create a pipeline, i.e., build a .config file.
• Use cloud services to train the classifier using the images, the TFRecord file, and some configuration steps.
• Once the model is created, port it into the Android application, i.e., copy the .pb file, the labels.txt file, and the assets into the Android project.
• Create activities in Android to exploit the model and obtain nutritional value results.
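As an illustration of the CSV-to-TFRecord step, the following Python sketch serializes one labeled image per record using the standard tf.train.Example protocol buffer. The CSV column names and the feature keys are assumptions made for illustration; the keys expected by a particular object detection pipeline may differ.

import pandas as pd
import tensorflow as tf

def create_example(image_path: str, label: str) -> tf.train.Example:
    # Read the raw encoded image bytes and pair them with the class label.
    with open(image_path, "rb") as f:
        encoded = f.read()
    feature = {
        "image/encoded": tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded])),
        "image/class/text": tf.train.Feature(bytes_list=tf.train.BytesList(value=[label.encode("utf-8")])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# annotations.csv is assumed to have the columns: filename, label.
df = pd.read_csv("annotations.csv")
with tf.io.TFRecordWriter("food_train.tfrecord") as writer:
    for _, row in df.iterrows():
        writer.write(create_example(row["filename"], row["label"]).SerializeToString())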
Dataset Information
See Table 1.
So, what we did here is train the datasets on various MobileNet models and Inception
v3 and then analyze the results. MobileNets are suitable for devices with low compute
power, whereas Inception requires more computation power (Fig. 3).
Analysis
MobileNet provides sufficient accuracy compared to Inception, considering its low
computation power and runtime. Inception provides more accuracy but is not very
suitable for mobile devices.
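A minimal transfer learning sketch in the spirit of this comparison, assuming TensorFlow 2.x, a directory of food images organized into one sub-folder per class, and a pre-trained MobileNet backbone from tf.keras (the paths, image size, batch size, and number of classes are placeholder assumptions):

import tensorflow as tf

NUM_CLASSES = 20  # e.g., UPMC_Food20; adjust per dataset

train_ds = tf.keras.utils.image_dataset_from_directory(
    "food_images/train", image_size=(224, 224), batch_size=32)

base = tf.keras.applications.MobileNet(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # transfer learning: reuse the ImageNet features

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNet expects inputs in [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)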
(1) UPMC_Food20
This dataset has 20 categories of food with 100 images per category. MobileNet
1.0 gives the maximum accuracy here when we consider dimension = 224.
4 Implementation
The application uses the smartphone camera to capture the input, which is provided to
the object detection algorithm that identifies the respective food item. To improve
the accuracy over time, we have used machine learning algorithms. The user has
various options: the user can use the smartphone camera, choose the desired food item
from the list of options provided in-app, or choose from the recently searched or
frequently searched items. Everyday intake can also be monitored using the
application. The application has a very user-friendly GUI and can function online as
well as offline.
The nutritional value system can be divided into the following components:
• Object Detection
• User Diary
A daily diary or logbook can be maintained by the user, where he/she can see the
breakfast, lunch, snacks, and dinner intake: which foods have been eaten, how many
calories were consumed, and which food items are present. The user can see up to one
week's previous entries and can plan up to two days ahead of time. Using this diary,
the user can find out trends in calorie consumption and his/her day-to-day food
habits (Fig. 5).
5 Conclusion
The problem of increasing food adulteration poses serious harm to society. While
laboratory procedures are the ultimate test for checking quality, it is not feasible
to use them every time. The final system, if properly implemented, will be of huge
benefit in day-to-day life, as we will be able to quickly decide whether the food is
fit for consumption and what its nutritional value is. The main advantage of such a
system is that no extra investment is required, as smartphones are widely available.
References
1. World Health Organization. 2018. Obesity and overweight. Available at: http://www.who.int/
news-room/fact-sheets/detail/obesity-and-overweight. Accessed May 29, 2018.
2. The Economist. 2018. Putting the smallest first. Available at: https://www.economist.com/
node/17090948. Accessed May 29, 2018.
3. The Economist. 2018. The Indian exception. Available at: https://www.economist.com/node/
18485871. Accessed May 29, 2018.
4. Singh, D.P., P.J. Gadakh, P.M. Dhanrao, S. Mohanty, D. Swain, and D. Swain. 2017. An appli-
cation of NGBM for forecasting indian electricity power generation. Advances in Intelligent
Systems and Computing 556: 203–214.
5. Pathan, A., R. Kokate, A. Mutha, P. Pingale, and P. Gadakh. 2016. Digital India: IoT based
intelligent interactive super market framework for shopping mall. Engineering Science 1(1):
1–5.
6. Singh, V., and S. Dwivedi. Ultrasonic detection of adulteration in fluid foods. In: Proceedings
of the First Regional Conference, IEEE Engineering in Medicine and Biology Society and 14th
Conference of the Biomedical Engineering Society of India. An International Meet.
7. Pouladzadeh, P., S. Shirmohammadi, and R. Al-Maghrabi. 2014. Measuring calorie and nutri-
tion from food image. IEEE Transactions on Instrumentation and Measurement 63 (8):
1947–1956.
8. He, H., F. Kong, and J. Tan. 2016. DietCam: Multiview food recognition using a multikernel
SVM. IEEE Journal of Biomedical and Health Informatics 20 (3): 848–855.
9. Villalobos, G., R. Almaghrabi, P. Pouladzadeh, and S. Shirmohammadi. 2012. An image pro-
cessing approach for calorie intake measurement. In Proceedings of IEEE Symposium on
Medical Measurement Applications, 1–5, Budapest, Hungary.
10. Villalobos, G., R. Almaghrabi, B. Hariri, and S. Shirmohammadi. 2011. A personal assistive
system for nutrient intake monitoring. In: Proceedings of ACM Workshop Ubiquitous Meta
User Inter, 17–22, Scottsdale, AZ, USA.
11. Ginesu, G., D. Giusto, V. Margner, and P. Meinlschmidt. 2004. Detection of foreign bodies
in food by thermal image processing. IEEE Transactions on Industrial Electronics 51 (2):
480–490.
12. Anthimopoulos, M., L. Gianola, L. Scarnato, P. Diem, and S. Mougiakakou. 2014. A food
recognition system for diabetic patients based on an optimized bag-of-features model. IEEE
Journal of Biomedical and Health Informatics 18 (4): 1261–1271.
13. Karuppuswami, S., A. Kaur, H. Arangali, and P. Chahal. 2017. A hybrid magnetoelastic wireless
sensor for detection of food adulteration. IEEE Sensors Journal 17 (6): 1706–1714.
14. Dave, A., D. Banwari, S. Srivastava, and S. Sadistap. 2016. Optical sensing system for detect-
ing water adulteration in milk. In: 2016 IEEE Global Humanitarian Technology Conference
(GHTC).
15. Nandi, C., B. Tudu, and C. Koley. 2016. A Machine vision technique for grading of harvested
mangoes based on maturity and quality. IEEE Sensors Journal 16 (16): 6387–6396.
16. Kong, F. and J. Tan. 2011. Dietcam: Regular shape food recognition with a camera phone. In:
2011 International Conference on Body Sensor Networks (BSN) 127–132.
17. Casasent D., M.A. Sipe, T.F. Schatzki, P.M. Keagy, and L.L. Lee. 1998. Neural net classification
of X-ray pistachio nut data. Lebensmittel Wissenschaft and Technologie 31: 122–128.
18. Davidson, V., J. Ryks, and T. Chu. 2001. Fuzzy models to predict consumer ratings for biscuits
based on digital image features. IEEE Transactions on Fuzzy Systems 9 (1): 62–67.
19. Neelamegam, P., S. Abirami, K. Vishnu Priya, and S. Valantina. 2013. Analysis of rice granules
using image processing and neural network. In: 2013 IEEE Conference on Information and
Communication Technologies.
20. Anami, B.S., V. Burkpalli, S.A. Angadi, and N.M. Patil. 2003. Neural network approach
for grain classification and gradation. In: Proceedings of the Second National Conference
on Document Analysis and Recognition, 394–408.
21. O’Farrell, M., C. Sheridan, E. Lewis, C. Flanagan, J. Kerry, and N. Jackman. 2007. Online
optical fiber sensor for detecting premature browning in ground beef using pattern recognition
techniques and reflection spectroscopy. IEEE Sensors Journal 7 (12): 1685–1692.
22. Khosa, I. and E. Pasero. 2014. Artificial neural network classifier for quality inspection of
nuts. In: 2014 International Conference on Robotics and Emerging Allied Technologies in
Engineering (iCREATE).
23. Sasano, S., X. Han, and Y. Chen. 2016. Food recognition by combined bags of color features
and texture features. In: 2016 9th International Congress on Image and Signal Processing,
Bio-Medical Engineering and Informatics (CISP-BMEI).
24. Mueen, A., M.S. Baba, and R. Zainuddin. 2007. Multilevel feature extraction and X-ray image
classification. Applied Sciences 7 (8): 1224–1229.
25. Lee, K., Q. Li, and W. Daley. 2007. Effects of classification methods on color-based feature
detection with food processing applications. IEEE Transactions on Automation Science and
Engineering 4 (1): 40–51.
26. Singh, V., and S. Dwivedi. 1995. Ultrasonic detection of adulteration in fluid foods. In: Pro-
ceedings of the First Regional Conference, IEEE Engineering in Medicine and Biology Society
and 14th Conference of the Biomedical Engineering Society of India. An International Meet.
27. Howard, Andrew G., Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias
Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient convolutional
neural networks for mobile vision applications. CoRR, abs/1704.04861.
Detection of Hate Speech in Hinglish
Language
Abstract As mobile phones and the Internet become more and more popular, the number
of social media users in India continues to go up. A majority of Indian social
media users use Hinglish as their medium of communication. The Hinglish language
is a mixture of Hindi words (typed in English) and English words. However, with
increasing numbers, there is also an increase in the amount of hate-filled messages,
posts, and comments put up on social media platforms. Hate speech usually targets
an individual or a group of individuals on the basis of caste, community,
ethnicity, religion, gender, or any other discriminating factor. It can have negative
impacts on the individuals facing it and consequently on society as well. As such
content is generated in huge volumes, it becomes necessary to
automatically detect hate speech so that preventive measures can be taken to control
it. Although there has been quite a lot of research on hate speech detection in English
texts, not much work can be found on hate speech detection in Hinglish language.
This paper presents an approach of detecting hate speech in Hinglish texts using
long short-term memory (LSTM), which works on word embeddings generated by
gensim’s word2vec model.
1 Introduction
The online space is rapidly growing by leaps and bounds year after year. The Indian
social media space is a highly diverse community as people hailing from different
ethnicities and religions are a part of it. As of 2018, there were 326.1 million social
media users in India. It is estimated that it will cross 400 million by 2021 [1]. Lowering
of mobile data rates resulting in cheaper data plans has largely contributed to this
increase in numbers. Facebook, Twitter, and Instagram are few of the most widely
used social media applications in India. When people from such a wide variety of
backgrounds come together, discussions and disagreements are inevitable. However,
many a time it is observed that these disagreements transform into heated debates
where people start using harsh and abusive words. Such situations become the source
of hate speech.
In Indian social media, hate speech can be mostly observed in forums or threads
related to politics and religion. People who are politically or communally polarized
often resort to using hate speech. Statements like ‘ise pakistan bhej dena chahie’ or
‘aise deshdrohiyo ko bharat mein rehne ka koi haq nahi hai’ are very common. Such
statements can create social unrest as they can easily offend someone and can even
lead to violent incidents like riots. In the recent past, there was a rise in the
number of mob lynching and communal violence incidents in India, fueled by
propaganda-driven hate messages that would usually go viral through WhatsApp groups
[2]. Social scientists have found a correlation between online hate speech and hate
crimes [3].
Thus, the problem of hate speech is worrying and needs to be kept in check.
Facebook, Instagram, and Twitter have taken measures from their own end by
letting users report offensive posts, comments, or tweets. A team of experts then
reviews such reports and takes appropriate actions. However, all these measures are
reactionary in nature and not proactive. Besides, almost all sentiment analysis tools
are for English texts, whereas very few works of sentiment analysis for Hinglish texts
can be found on the Internet. A Hinglish word is a Hindi word spelled in Roman
script. Thus, the same Hindi word is bound to suffer from spelling variations. For
example, the same Hindi word could be spelled in different ways, such as pyaar, pyar,
or pyara. Thus, a lot depends on phonetics. Due to all these reasons, there arises
a need for making an effort in the direction of automatic hate speech detection in
Hinglish language.
In this paper, a variant of recurrent neural network called long short-term memory
(LSTM) is used, which takes word embedding values as input for training and then
classifies Hinglish documents into two categories—(1) hate and (2) not hate. LSTM is
a deep neural network which captures long-term dependencies between words of the
same text document. In the following sections, the paper describes the related work,
the process of collecting and constructing the dataset, the proposed methodology
which is followed by performance analysis. The paper concludes by mentioning the
scope of future works that can be pursued for the same problem statement.
The main goals of this work are as follows:
• Classification of Hinglish tweets as hate or not hate and
• Compilation of a dictionary of over 11k Hindi words and phrases with their
corresponding English translations.
2 Related Work
Mathur et al. [4] used multi-input multi-channel transfer learning-based model for
detecting offensive Hinglish tweets. They used a hybrid CNN-LSTM architecture
which was pre-trained on English tweets and then used for transfer learning. Ravi
and Ravi [5] concluded that a combination of TF-IDF features with gain ratio-based
feature selection when passed to radial basis function neural network gives best
results for sentiment classification in Hinglish text. Bohra et al. [6] did supervised
classification on Hinglish tweets using support vector machine and random forests
on character-level, word-level, and lexicon-based features. Each word of every tweet
was labeled with its language, i.e., English or Hindi, and all the tweets were classified
into two classes—hate speech and normal speech.
Kunchukuttan et al. [7] developed Brahmi-Net, a publicly available translitera-
tion tool on the Internet for interconversion of words in scripts of different Indian
languages as well as English. The services of Brahmi-Net were made available via
a REST API.
Pan et al. [8] suggested how transfer learning can be used to solve various text
classification problems. Bali et al. [9] analyzed various Facebook posts generated by
multilingual Hinglish users. They concluded that many posts contain a significant
amount of code-mixing.
3 Dataset Collection
Table 1 illustrates the detailed dataset distribution. There are a total of 6431 text
documents out of which 2211 are categorized as hate speech and remaining 4431
are categorized as non-hate speech. The ratio of hate text documents to non-hate text
documents is approximately 1:2, since in real life as well the proportion of
non-offensive language usage is higher than that of offensive language.
Table 2 illustrates how multiple documents from various sources are combined to
create the dataset. Hate tweets are gathered from HOT dataset [4] and Aggression
dataset [10]. HOT dataset consists of manually annotated Hinglish tweets (Hindi
tweets written in English script), primarily classified into two categories: ‘1’ for
tweets which do not express any kind of hate or offense and ‘0’ for tweets which
express hate or offense. Similarly, aggression dataset consists of three distinct cat-
egories of tweets which are listed as follows: covertly aggressive (CAG), overtly
aggressive (OAG), and non-aggressive (NAG). In the present paper, tweets which
are having label ‘0’ in HOT dataset and tweets having labels ‘CAG’ or ‘OAG’ are
considered as hate tweets.
4 Proposed Methodology
This section gives a brief overview of the system workflow, describes the LSTM
architecture, and then presents the system workflow in detail.
Figure 1 gives an overview of the workflow of the present system. A raw Hinglish
dataset annotated with ‘1’ (hate) and ‘0’ (non-hate) labels is taken and made to go
through some cleaning steps like removal of URLs, usernames, special symbols, etc.
Then, two separate dictionaries—profanity list and Hindi–English dictionary—
are used to replace all the Hindi words in the dataset with their appropriate English
translations. Both the dictionaries are self-constructed and will be discussed later
in detail in Sect. 4.3.1. This is followed by stop words removal, lemmatization,
and generation of word embeddings using gensim’s word2vec model. These word
embeddings are used later for creating an embedding layer for the LSTM network.
The word2vec model also generates a vocabulary list from the dataset, which is
used to replace each word in the dataset by its corresponding integer index in the
vocabulary list. A 75:25 train–test split is performed on the resultant dataset. The
training data along with the word embedding layer is used to create and train the
LSTM network, followed by making predictions using testing data. All these steps
are discussed in detail in Sect. 4.3.
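A compact sketch of this preprocessing and embedding stage, assuming gensim 4.x (older versions use size instead of vector_size) and purely illustrative corpus and dictionary contents:

import re
from gensim.models import Word2Vec

# Illustrative placeholders; in the actual system these come from the collected
# dataset and the two self-constructed dictionaries described in Sect. 4.3.1.
corpus = ["ise report karo http://t.co/x", "ye bahut accha post hai @user"]
profanity_list = {}                                   # profane Hindi word -> English
hindi_english_dict = {"accha": "good", "bahut": "very", "hai": "is"}

def clean(text: str) -> str:
    # Remove URLs, @usernames, and special symbols, then lowercase.
    text = re.sub(r"http\S+|@\w+", " ", text)
    text = re.sub(r"[^a-zA-Z\s]", " ", text)
    return text.lower()

def translate(tokens):
    # Replace each Hindi token with its English translation, if available.
    return [profanity_list.get(t, hindi_english_dict.get(t, t)) for t in tokens]

sentences = [translate(clean(doc).split()) for doc in corpus]

# Train word2vec embeddings; vector_size=300 matches N_i = 300 used later.
w2v = Word2Vec(sentences, vector_size=300, window=5, min_count=1)
vocab_index = w2v.wv.key_to_index  # word -> integer index fed to the LSTM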
gates. Unlike RNNs, LSTM does not fail to retain information from many time steps
ago. Rather, LSTM is made to forget irrelevant old information.
Steps of LSTM
In every time step, the new data (x_t) and the output from the previous time step
(h_{t−1}) are combined to form the input for the current time step.
To decide which information is irrelevant and should be thrown out, the forget gate
layer is used. The forget gate layer uses the sigmoid activation function, which
takes as input a combination of the new data (x_t) occurring at the new time step and
the output of the hidden layer from the previous time step (h_{t−1}). The sigmoid
activation in the forget gate layer outputs a vector of values ranging from 0 to 1,
where each value corresponds to a value stored in the cell state vector C_{t−1}.
Here, 0 means completely discarding the data, while 1 means completely keeping the
data in the cell state. Only the data for which f_t · C_{t−1} is nonzero will be kept
in the cell state, as demonstrated in Fig. 3.
For deciding and adding new information to the cell state, LSTM uses two different
neural network layers: one with sigmoid activation and the other with tanh
activation. The input gate layer (with sigmoid activation) decides which values
should be updated. The next layer, with tanh activation, generates a vector of
candidate values C̃_t that could be added to the cell state. To add new information
to the cell state, i_t ∗ C̃_t is added to the current cell state, as shown in Fig. 4.
The current cell state is filtered through tanh, and the resulting vector is
multiplied element-wise with the result obtained by applying a sigmoid layer to the
combination of x_t and h_{t−1}. Tanh outputs values between −1 and 1. The resulting
product gives the output of the current time step (h_t), which is passed as h_{t−1}
to the next time step, as demonstrated in Fig. 5.
4.3.1 Preprocessing
After the above two dictionaries are ready, the entire corpus is scanned document
by document, which is in turn scanned word by word. If any of the words is found in
either of the two dictionaries, it is replaced by its corresponding English translation.
A Keras Sequential model is created by adding the embedding layer as the first layer.
Then, it is followed by adding an LSTM layer of seven units, which implies that
each of the hidden layers (input gate layer, forget gate layer, etc., in the LSTM cell)
will have 7 activation units. The number of LSTM units was determined with the
following formula:
N_h = N_s / (α · (N_i + N_o))
where N_h is the number of LSTM units, N_i is the number of activation units in the
input layer, N_o is the number of activation units in the output layer, N_s is the
number of samples in the training dataset, and α is an arbitrary number between 2
and 10.
Here, the values taken are N_i = 300, N_o = 1, N_s = 4823, and α = 2.3. Therefore,
N_h = 4823 / (2.3 × 301) ≈ 7.
Then, a dropout layer with a dropout rate of 0.4 is added, which means that 40% of
the input units will be set to 0 at each update during training, to prevent
overfitting. Finally, a dense layer with 1 activation unit and sigmoid activation is
added to the network. The model is then compiled using the ‘binary cross entropy’
loss function, the ‘adam’ optimizer, and ‘accuracy’ as the evaluation metric.
The ‘binary cross entropy’ loss function combines the sigmoid activation function
with the cross-entropy loss and is used for binary classification, as it treats each
decision independently of the other classes. The ‘adam’ optimizer is an enhanced
version of the stochastic gradient optimizer; it is used with its default parameter
values so that it can optimize the parameters to reduce the loss. The ‘accuracy’
metric is used to judge the performance of the model once it is compiled.
A 75:25 train–test split is performed. The training data is used for training the
above-designed deep neural network in a single epoch. The test data is used to
evaluate the trained model using the accuracy metric.
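A sketch of the described network in Keras, with randomly generated placeholder data standing in for the word2vec embedding matrix and the padded, integer-encoded sequences produced by the preprocessing stage (all sizes other than the 300-dimensional embeddings, the 7 LSTM units, and the 0.4 dropout rate are illustrative):

import numpy as np
import tensorflow as tf

# Placeholder data so that the sketch runs standalone; in the real system these
# come from the word2vec model and the 75:25 train-test split described above.
vocab_size, embed_dim, max_len = 5000, 300, 40
embedding_matrix = np.random.rand(vocab_size, embed_dim).astype("float32")
X_train = np.random.randint(0, vocab_size, size=(100, max_len))
y_train = np.random.randint(0, 2, size=(100,))

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        vocab_size, embed_dim,
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
        trainable=False),                     # embedding layer built from word2vec
    tf.keras.layers.LSTM(7),                  # N_h = 7 units, as derived above
    tf.keras.layers.Dropout(0.4),             # 40% dropout to prevent overfitting
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=1)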
5 Performance Analysis
From the above tables, it can be seen that a maximum testing accuracy of 85.07% was
achieved for the following combination of hyperparameter values:
• Number of LSTM units—9 and
• Dropout rate—0.4.
Besides, other important evaluation metrics like precision and recall were also
computed for dropout rates at 0.2 and 0.4. Table 5 demonstrates precision and recall
values for dropout rate at 0.2, and Table 6 demonstrates precision and recall values
for dropout rate at 0.4. The number of training epochs was kept at 1 in both the cases.
Maximum precision of 0.7912 was achieved for 15 LSTM units and 0.2 dropout rate.
Maximum recall of 0.7504 was achieved for 7 LSTM units and 0.4 dropout rate.
It can be seen that best results for accuracy, precision, and recall were obtained for
different combinations of hyperparameter values. Precision represents the number of
correct positive predictions out of total positive predictions. Here, positive prediction
corresponds to the class with label 1, i.e., hate class. Out of total predictions for hate
class, 79.12% were actually from hate class. Likewise, recall represents the fraction of
actual positives that were correctly identified. Out of all the actual hate class samples,
75.04% of the samples were identified as belonging to the hate class. Misclassifying
non-hate samples as hate is always less risky than misclassifying hate samples as
non-hate; hence, correctly identifying all the hate samples is more important. Thus,
out of all the evaluation metrics, recall is chosen for selecting the best combination
of hyperparameter values.
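For completeness, precision and recall for the hate class (label 1) can be computed from the thresholded sigmoid outputs, for example with scikit-learn; the arrays below are placeholder values, and the 0.5 threshold is an assumption:

import numpy as np
from sklearn.metrics import precision_score, recall_score

# Placeholder ground truth and predicted probabilities; in the real system these
# come from the 25% test split and the trained LSTM model.
y_test = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3])

y_pred = (y_prob >= 0.5).astype(int)      # threshold the sigmoid outputs
print(precision_score(y_test, y_pred))    # correct hate predictions / predicted hate
print(recall_score(y_test, y_pred))       # correctly found hate / all actual hate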
In this paper, an approach to detect hate speech in Hinglish texts using LSTM was
proposed. The dataset used comprised hate and non-hate Hinglish texts, collected
from various sources. The LSTM network was trained on word embeddings gener-
from various sources. The LSTM network was trained on word embeddings gener-
ated by gensim’s word2vec model. A maximum recall value of 0.7504 was achieved
for a specific set of hyperparameters of the LSTM network. Besides, two dictionar-
ies namely—profanity list dictionary and Hindi-to-English dictionary—were also
constructed. The profanity list dictionary contains English translations of 224 pro-
fane Hindi words, and the Hindi-to-English dictionary contains English translations
of 11662 Hindi words or phrases. These dictionaries were used in the system to
translate the Hinglish documents in the dataset into English, word by word.
It can be noted that this paper presents an approach in which the Hinglish sequence
of words in the dataset was translated word by word, resulting in loss of proper sen-
tence structure and thereby making the sentence meaningless. To overcome the prob-
lem of losing structure and sense from the sentence, a proper language translation
model needs to be developed that can translate a Hinglish sentence into meaning-
ful English sentence. Once the translation is done, various sentiment analysis and
NLP tasks can be conveniently conducted on such data. Apart from using only word
embeddings as input features, other semantic and sentimental features can also be
included in the feature set. Those semantic features may include number of differ-
ent types of punctuations, different types of emoticons among others. Sentimental
features like sentiment score of text document and score of profane words in the
text document can be included. Thus, by creating a diverse set of features, one can
develop an even more robust hate speech detection model.
References
1. https://www.statista.com/statistics/278407/number-of-social-network-users-in-india/.
2. https://www.washingtonpost.com/graphics/2018/world/reports-of-hate-crime-cases-have-
spiked-in-india/?noredirect=on&utm_term=.cfbf7bbb0432.
3. https://www.cfr.org/backgrounder/hate-speech-social-media-global-comparisons.
4. Mathur, Puneet, Ramit Sawhney, Meghna Ayyar, and Rajiv Shah. 2018. Did you offend me?
Classification of offensive tweets in Hinglish language. In: Proceedings of the 2nd Workshop
on Abusive Language Online (ALW2).
5. Ravi, Kumar, and Vadlamani Ravi. 2016. Sentiment classification of Hinglish text. In: 3rd
International Conference on Recent Advances in Information Technology (RAIT).
6. Bohra, Aditya, Deepanshu Vijay, Vinay Singh, Syed S. Akhtar, and Manish Shrivastava. 2018. A
dataset of hindi-english code-mixed social media text for hate speech detection. In: Proceedings
of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and
Emotions in Social Media.
7. Kunchukuttan, Anoop, Ratish Puduppully, and Pushpak Bhattacharyya. 2015. Brahmi-Net:
A transliteration and script conversion system for languages of the Indian subcontinent. In:
Conference of the North American Chapter of the Association for Computational Linguistics—
Human Language Technologies: System Demonstrations.
8. Pan, Weike, Erheng Zhong, and Qiang Yang. 2012. Transfer learning for text mining. Mining
text data. Boston: Springer.
9. Vyas, Yogarshi, Spandana Gella, Jatin Sharma, Kalika Bali, and Monojit Choudhary. 2014.
Pos tagging of english-hindi code-mixed social media content. In: Conference on Emperical
Methods in Natural Language Processing (EMNLP), 974–979.
10. Kumar, Ritesh, Aishwarya N. Reganti, Akshit Bhatia, and Tushar Maheshwari. 2018.
Aggression-annotated corpus of hindi-english code-mixed. In: Proceedings of the 11th
Language Resources and Evaluation Conference (LREC).
Enhancing Multi-level Cache
Performance Using Dynamic R-F
Characteristics
Akshay Motwani, Debabrata Swain, Nikhil Motwani, Vinit Vijan,
Akshar Awari and Banchhanidhi Dash
Keywords Clock with adaptive replacement (CAR) · Least recently used (LRU) ·
First in, first out (FIFO) · Classical weight ranking policy (CWRP)
1 Introduction
The easiest way to enhance cache performance is to increase its size, but that is not
an inexpensive solution; this is why the cache size is kept very small compared to
that of the main memory. The working of the cache is as follows: if the accessed data
is present in the cache, the data is found in the cache and this is called a ‘cache
hit’; otherwise, if the data is not present, a ‘cache miss’ takes place [1]. The
percentage of cache hits among all accesses is called the ‘hit ratio’ or ‘hit rate.’
The dominant variables that determine cache utilization are access time and hit
rate [1].
Various replacement techniques are applied to improve cache efficiency. The cache
does not contain any pages or blocks at the beginning, so the initial accesses are
misses, and this continues until the cache is filled with pages or blocks, where
pages or blocks are uniformly sized units in the cache memory. After that, whenever a
page or block that is not present in the cache has to be brought in, i.e., when a
cache miss takes place, an existing page or block in the cache must be replaced with
the new one.
This replacement of a page is carried out by the replacement technique, whose vital
task is to choose a page with a low probability of being requested in the near future
and to replace that particular page with the new page.
There are three categories of mapping methods used with the cache: (1) Direct
mapping, in which each block of main memory is mapped to a single block of the cache.
(2) Fully associative mapping, which, rather than mapping to a particular block of
the cache, permits the data to be stored in any block. (3) Set-associative mapping,
in which the overall cache is divided into sets; it is a combination of fully
associative mapping and direct mapping, where main memory blocks are directly mapped
to a particular set and fully associative mapping is used within each set [2–8].
An optimal replacement technique should be easy to implement and should reduce the
overhead on the central processing unit. The use of multi-level cache has become per-
vasive in computing systems. Increasing the cache hit ratio would greatly enhance
the performance of the system. Our observation of cache memory tells us that select-
ing the victim page for page replacement plays an important role in determining
the cache hit ratio. The recency and frequency of a page play a critical role in
predicting whether that particular page will be referenced in the ensuing future
[9]. We have taken this into consideration while formulating a method that would
enhance the performance of multi-level cache. We have considered a cache having
three levels, namely L1, L2, and L3. L1 cache is near to the processor and having
less access time, while L3 is far from the processor and is having more access time
compared to L1 and L2. L2 level cache is in between L1 and L3 having moderate
access time. L2 and L3 level caches are exclusive to L1 cache. The pages evicted
from L2 are placed into L3 cache rather than going to main memory.
Our method uses a combination of cache replacement algorithms to increase
the cache hit ratio of the overall multi-level cache. In case of L1 cache, frequency,
recency, and freshness of the page would be considered and each page in L1 would
be given a R-F score accordingly. The one having least R-F score would be selected
for replacement in case of L1 cache. For L2 cache, least frequently used page would
be selected as a victim for replacement, and in case of L3 cache, least recently used
page would be selected for replacement.
The main thing to note about our proposed method is that blocks from the main
memory are brought into L1 cache rather than placing them in L3 cache. The method
formulated has a better performance than that of FIFO, CAR, LRU, and CWRP [9].
2 Background
The least frequently used (LFU) algorithm works on the assumption that if a page
has been used least frequently, it might not be referred to in the near future, and
hence it suggests removing that page. As different fragments of the system memory
have varying temporal behavior, LFU can be inappropriate.
Downsides of the ‘least frequently used’ (LFU) algorithm [2–8] are as follows:
1. This method is not fruitful when the page access pattern changes, as the cache
remains populated with the previously most frequent pages, which are no longer
useful.
2. The method requires logarithmic implementation complexity for managing the cache.
3. The method ignores recency.
Adaptive replacement cache (ARC) uses both the frequency and recency characteristics
of the workload, balancing the two effectively and continuously, and hence is
inexpensive and self-adjusting. This method also avoids continuous monitoring, so
when a particular pattern of page requests takes place, the cache memory remains
unshaken [14].
Following is how ARC is implemented: Let ‘x’ be the size of the cache memory.
Then, according to ARC, a cache directory of two times ‘x’ pages, i.e., ‘2x’, is
supported, namely ‘x’ for the pages as record to store history and remaining ‘x’ for
the pages in cache. For dealing with the ‘x’ pages pertaining to the cache, ARC
maintains two lists, that is, L1 and L2. L1 stores the pages that have been called very
recently. L2 stores those pages which have been called with high frequency. Those
pages are defined as having high frequency that have been called multiple times in
a short interval of time. Hence, pages in both L1 and L2 when summed up give the
total pages in the cache. The number of pages in the individual lists L1 and L2 varies
and is determined by the other ‘x’ of the record pages stored in the cache directory
of ARC. The ‘x’ pages in the directory that store the history are classified into two
lists, namely B1 and B2. B1 stores those pages that were removed from L1, whereas
B2 stores those pages that were removed from L2. In the case where any of the two
lists L1 and L2 are full, least recently used (LRU) algorithm of page replacement is
implemented to move the removed page into B1 or B2, respectively. If a page being
called is present in L1, it is popped and is pushed into L2. When the page being
called is present in B1 (also called as Phantom hit), then size of L1 is incremented
by 1 and L2’s size is reduced by 1, but if L2 was full before the operation, a page
is then moved from L2 to B2, as per LRU. Contrarily, when a page being called is
present in B2, then size of L2 is incremented by 1 and L1’s size is reduced by 1,
but if L1 was full before the operation of reduction, a page is then moved from L1
to B1, as per LRU. Hence, allowing the cache to tune as per high periodicity or the
recentness characteristic of the stack [14].
Adaptive replacement cache (ARC) has the disadvantage that the most recently used
(MRU) position becomes heavily loaded, as all four lists L1, L2, B1, and B2 use the
LRU method of page replacement when any of them becomes full.
Another cache replacement policy, known as clock with adaptive replacement (CAR), is
also widely used because it is not only similar to ARC but also retains all of its
benefits. It performs well on workloads that iterate over a working set larger than
the size of the cache and is also less expensive to implement.
Clock with adaptive replacement (CAR) maintains a cache directory with a count
of ‘2x’ pages, where first half ‘x’ denotes pages as record to store history and remain-
ing ‘x’ for the pages in cache, just like the adaptive replacement cache (ARC) does.
The ‘x’ pages in the directory that store the history are classified into two LRU lists,
B1 and B2, similar to ARC. For dealing with the ‘x’ pages pertaining to the cache,
CAR maintains two circular buffers or CLOCKs that are L1 and L2, with every page
having a reference bit. Just like ARC, L1 stores the pages that have been called very
recently and L2 stores the pages which have been called with high frequency. B1
stores those pages that were removed from L1, whereas B2 stores those pages that
were removed from L2. When the page being called is present in B1, L1’s size is
increased by 1 and L2’s size is reduced by 1, so as to maintain the cache’s size upon
their addition. Contrarily, when a page being called is present in B2, L2’s size is
increased by 1 and L1’s size is reduced by 1. In such a manner, CLOCKs manage as
per the high requests. This policy also prevents the addition of CLOCK and history
list to overshoot the size of the cache. Insertion of latest pages is done behind clock
hands for both the CLOCKs and their respective reference bit being set to 0. The
reference bit of a page is changed to 1 in the event of a cache hit, whereas in the event
of a page fault, the clock hand searches in clockwise direction, and when it finds a
page with reference bit 0, it replaces it with the one referenced page and moves the
one replaced in the MRU location of either B1 or B2, whichever applies. In the event
of cache miss, when the clock hand of L1 encounters a page with reference bit 1,
it moves it to L2 and modifies its value to 0, and thereon continues in clockwise
direction until it locates a page with reference bit 0. The same is done with
L2, when there is a cache miss; when the clock hand of L2 encounters a page with
reference bit 1, it resets its reference bit to 0 and keeps on searching in the
clockwise direction until it locates a page with reference bit 0 [12].
Both clock with adaptive replacement (CAR) and adaptive replacement cache (ARC) fail
to perform well in the case of two consecutive hits, as they both move the page from
the recency list to the frequency list; however, quick and continuous hits are
recognized as ‘correlated references’ and do not tend to recur in the near future,
hence decreasing the performance of the cache and making it more expensive in this
case [12].
3 Proposed Work
make this multi-level cache functional, the following assumptions need to be
considered for our proposed algorithm to work successfully.
1. The size of L3 is triple as compared to size of L1.
2. The size of L2 is double as compared to that of L1.
3. All blocks are of same size.
4. When a memory block is needed by the system or the application, the processor
first looks it into the cache.
5. Finite amount of memory is available in the system having specific number of
blocks.
We have considered the recency and frequency values commonly known as R-F
score of the pages [9]. Based on our analysis, we have used different tuning algorithms
for different levels of cache that would increase the hit ratio of the overall multi-level
cache and thus would increase the overall system performance.
3.1 Working
Whenever a block (say b) is needed by the system, the processor will first look into
the L1 cache.
1. If it is not present in L1 level cache, then the block b is searched in the L2 level
cache.
As L2 is exclusive of L1, if the block b is present in the L2 cache, then the block
is moved from L2 to L1. This movement of a block from L2 to L1 may cause an eviction
from the L1 cache. The replacement block for the L1 cache is selected based on the
dynamic weight ranking policy, in which weights are assigned to the blocks based on
their R-F scores. The following formula is used to assign a weight, or R-F score, to
each block.
W_i = f_i / (R_h − R_i + 1)   (1)
where R_h is the latest recorded recency value, R_i is the recency value of the last
access request made for the ith block, and f_i is the frequency value of the ith
block. Initially, W_i, f_i, and R_i are set to 0.
The block having minimum weight value from the L1 cache is chosen for
replacement with block b. The evicted block from L1 is then placed into L2 cache.
2. If block is not present in L2 cache, then L3 cache is searched. If the block
is present in L3 cache, similar action is taken and the block b in L3 cache is
swapped with the block having minimum R-F score from L1 level cache.
3. If the block b is not found in either of the caches, then it is a cache miss. The
block b, then, has to be fetched from the main memory and it is placed in L1
level cache. This may cause removal of existing block from L1 level cache, given
that it is full. The block having minimum R-F score is removed, and block b is
placed in its position. The removed block from L1 is then placed into L2 cache.
If the L2 cache is full, the replacement block in L2 is selected based on the LFU
cache replacement policy: the block evicted from the L1 cache replaces the least
frequently used block in the L2 cache, and the removed block from L2 is then placed
into the L3 cache.
If the L3 cache is full, the block evicted from the L2 cache replaces the least
recently used block in the L3 cache, and the expelled block from L3 is brought back
to the main memory.
4. If the block is found in any of the caches, then there is a cache hit. The
frequency of block b is incremented by one, and the recency of that block is updated
to the latest clock access value.
Our proposed algorithm takes into account the combination of recency and fre-
quency [9], only frequency and only recency of the blocks present in different levels
of cache. For L1 level cache, both recency and frequency are analyzed and R-F scores
of the block are then calculated accordingly. L2 level cache takes care of frequency
feature of blocks, whereas the L3 cache takes into account the recency feature of blocks.
In this way, our multi-cache model takes the holistic view of capturing the recency
and frequency features of the blocks and also considering them individually, as they
play a decisive role in determining the cache hit ratio.
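To make the per-level policies above concrete, the following is a minimal Python sketch of the victim-selection rules; the dictionary-based block metadata, the global clock counter and all function names are illustrative assumptions, not the authors' implementation.

# Minimal sketch of the per-level victim selection described above.
# Assumption: each cache level is a dict {block_id: {"f": frequency, "R": recency}},
# and `clock` is a global access counter used as the recency value.

def rf_weight(meta, latest_recency):
    # Eq. (1): Wi = fi / (Rh - Ri + 1)
    return meta["f"] / (latest_recency - meta["R"] + 1)

def touch(meta, clock):
    # Cache hit: increment frequency and refresh recency with the latest clock value.
    meta["f"] += 1
    meta["R"] = clock

def l1_victim(l1, clock):
    # L1: evict the block with the minimum R-F weight.
    return min(l1, key=lambda b: rf_weight(l1[b], clock))

def l2_victim(l2):
    # L2: evict the least frequently used block.
    return min(l2, key=lambda b: l2[b]["f"])

def l3_victim(l3):
    # L3: evict the least recently used block.
    return min(l3, key=lambda b: l3[b]["R"])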
4 Performance Analysis
We have simulated our proposed algorithm with other traditional cache replacement
algorithms such as CAR, FIFO, and LRU. The simulation results were calculated
based on hit ratio, which takes into consideration the size of cache, locality of refer-
ence, and replacement algorithm being used during cache miss. Our algorithm gave
better results than other algorithms.
For analyzing the performance, the address trace (dataset) used had one thousand
memory addresses formed by the actual execution of the program in CPU. Usually,
an address trace has data addresses (store and load) and instruction fetch addresses,
but as the simulation is performed on the data cache only, traces having only data addresses are taken into consideration. The mapping scheme considered for the simulation is set-associative mapping. Based on the traces being used, the simulation was executed with seven
different sizes of cache and results obtained from the simulation were analyzed. As
our proposed algorithm works for multi-level cache, the cache size was divided in a
ratio of 1:2:3 among L1, L2, and L3 level caches, respectively.
Results were then compared with CAR, FIFO, LRU, and CWRP [1, 9]. Figure 2
provides the comparison summary of all results obtained by using different cache
replacement algorithms. It is clearly observable from Fig. 2 that our proposed algo-
rithm performs better than CAR, FIFO, LRU, and CWRP. The average gain compared
to CAR is 5.3%, whereas the average gain compared to LRU, FIFO, and CWRP is
10.33%, 8.47%, and 4.35% respectively [9]. Table 1 provides the hit ratio of all the
block accesses implemented by using different cache replacement algorithms.
The maximum hit ratio obtained by our proposed algorithm is 76.12% for a cache
size of 210 blocks, and the minimum hit ratio is 41.23% for a cache size of 30 blocks.
In comparison with CAR, minimum gain obtained by our proposed work is 0.93%
for a cache size of 210 blocks, and the maximum gain is 11.38% for a cache size of 60 blocks.
Fig. 2 Hit ratio comparison of the cache replacement algorithms versus cache size (in number of blocks)
Table 1 An assessment between hit ratio of CAR, LRU, FIFO, CWRP, and our proposed work
Cache size (in number of blocks) CAR LRU FIFO CWRP Proposed work
30 40.24 41.6 40.93 42.51 41.23
60 49.65 48.6 49.26 51.83 55.30
90 59.27 54.5 57.48 59.36 64.82
120 66.20 60.81 62.14 66.40 70.96
150 70.96 65.21 66.30 70.06 74.43
180 75.22 72.30 72.84 73.63 75.92
210 75.42 72.70 74.03 75.12 76.12
The minimum gain obtained from FIFO is 0.73% for a cache size of 30 blocks,
and maximum gain is 14.19% for cache size of 120 blocks. Minimum gain obtained
with respect to LRU is 4.70% for a cache size of 210 blocks, and maximum gain
is 18.94% for a cache size of 90 blocks. From CWRP, minimum gain obtained is
1.33% for cache size of 210 blocks and maximum gain is 9.2% for cache containing
90 blocks. There is one exceptional case for cache size of 30 blocks in which LRU
performs 0.9% better than our proposed method and CWRP performs 3.1% better
than our proposed work.
We can observe from Fig. 2 that as the cache size increases, the performance of our algorithm also improves significantly compared with the other cache replacement algorithms. The reason for this behavior is that we consider the combination of recency and frequency, which helps in accurately deciding which page will not be used in the near future; this increases the overall hit ratio, as there are fewer cache misses.
5 Summary
This paper introduces a new algorithm which enhances the performance of a multi-
level cache. It gave better results than CAR, FIFO, LRU, and CWRP [9]. We observed
that reference rate of a block depends on the freshness of the block present in the
cache along with recency and frequency features of the block. As multi-level cache
has become an inevitable part of modern computing systems, we have devised a
method that uses a combination of algorithms that takes into account the R-F score
of the block along with the level in which the cache is present. Our proposed method
would work very well in case of multi-level cache. We have used L1 cache to capture
the R-F score of the block along with its freshness. L2 cache is used for capturing
only frequency, and the L3 cache is used for capturing only recency. Our assumption was that considering not only the R-F score and freshness of a block but also its recency and frequency individually would increase cache performance, and that a multi-level cache design model would be the best fit for this assumption to hold true. Simulation results validated our assumptions, and we got
better results than CAR, FIFO, LRU, and CWRP [9]. Our proposed method can be
used for different address traces, and results can be used for comparison with other
cache replacement algorithms. It is worth mentioning that if the reference rate depends on other features of the cache, then these features can be added in a multi-dimensional cache and tuned according to the application domain. Our proposed work would perform even better in such application-specific domains.
References
1. Tanenbaum, A., and A. Woodhull. 1997. Operating systems: Design and implementation.
Prentice Hall 3: 373–410.
2. Yang, Q., H. Zhang, and H. Zhang. 2001. Taylor series prediction: A cache replacement policy
based on second-order trend analysis. In: Proceedings of 34th Hawaii Conference on System
Science 5: 5023.
3. Belady, A. 1966. A study of replacement algorithms for a virtual storage computer. IBM Systems Journal 5 (2): 78–101.
4. O’Neil, E., P. O’Neil, and G. Weikum. 1999. An optimality proof of the lru-k page replacement
algorithm. Journal of the ACM 46 (1): 92–112.
5. Jiang, S., and X. Zhang. 2002. LIRS: An efficient low inter-reference recency set replacement
policy to improve buffer cache performance. In: Proceedings of ACM Sigmetrics Conference
on ACM Presentation 2: 31–42.
6. Hosseini-khayat, S. 2000. On optimal replacement of nonuniform cache objects. IEEE
Transactions on Computers 49: (8).
7. Glass, G., and P. Cao. 2003. Adaptive page replacement based on memory reference behav-
ior. In: Proceedings of ACM SIGMETRICS Conference on Overhead Replacement Cache.
Proceedings of Usenix Conference on File and Storage Technologies (FAST 2003), Usenix,
115–130.
8. Irani, S. 1997. Page replacement with multi-size pages and applications to WebCaching. In:
Proceedings of 29th Annual ACM Symposium of Theory of Computing 50: 701–710.
9. Swain. D., S. Marar, N. Motwani, V. Hiwarkar, and N. Valakunde. 2017. CWRP: An effi-
cient and classical weight ranking policy for enhancing cache performance. In: IEEE Fourth
International Conference on Image Information Processing.
10. Swain, Debabala, Bijay Paikaray, and Swain Debabrata. 2011. AWRP: Adaptive weight ranking
policy for improving cache performance. Journal of Computing 3: (2).
11. Dash, B., D. Swain, BK. Paikaray. (2017) International Journal of Computational Systems
Engineering (IJCSYSE) 3: (1/2).
12. Bansal, Sorav, and Dharmendra Modha. 2004. CAR: Clock with adaptive replacement. In:
USENIX File and Storage Technologies (FAST), Mar 31–Apr 2, San Francisco, CA.
13. Swain, Debabrata, Bancha Nidhi Dash, Debendra O Shamkuwar, Debabala Swain. 2012.
Analysis and predictability of page replacement techniques towards optimized performance.
International Journal of Computer Application Proceedings on ICRTITCS-2011 12–16.
14. Megiddo, N., and D. Modha. 2003. ARC: A self-tuning, low overhead replacement cache. In:
Proceedings of Usenix Conference on File and Storage Technologies (FAST 2003). Usenix, 2:
115–130.
Denoising Documents Using Image
Processing for Digital Restoration
Abstract This paper develops an algorithm that will help decipher the text in
decrepit documents, which if put in simpler terms aims at converting stained, blot-
ted, creased, and faded documents into a cleaner and legible format. Handwritten
or printed records are carelessly stacked away without undertaking measures for
preserving them. They are subjected to degradation because of mishandling and
improper storage conditions. This ultimately results in the loss of important docu-
mentation owing to the inability of reproduction or recovery of original data. Digital
image preprocessing techniques are used to convert a color (RGB) image into a
grayscale image for further processing. Image denoising is one of the most sought-after research areas in image processing, and in this paper, we use image segmentation and a median filter to achieve it. We have attempted to come up with
an approach to remove noise from the image by applying image segmentation and
thresholding, histogram, and median filter.
1 Introduction
Visual distortion is usually described as the ‘noise’ in the image. It is similar to the
grains we see in photographs when the image is captured in low lights or when there
is some arbitrary variation in the color or brightness in the image [1]. Improving the
image quality is mainly concerned with image restoration by eliminating unwanted pixels. Noise filtering has numerous approaches, the most popular being replacing
the noisy pixel by a befitting one often dependent on the values of the neighbor-
hood pixels. This paper proposes a method which focuses on restoring the image by
detecting noise. This is done using median filters, averaging, Gaussian, or bilateral
techniques.
Averaging is a technique where the center pixel is replaced by the average value
of all the pixels surrounding it. Convolution of the image with a box filter can help us
achieve this. Gaussian operator uses the weighted mean; pixels closer to the subject
pixel contribute more weight. Here, the kernel size needs to be specified and must
necessarily be an odd positive integer. Median filtering replaces the central pixel
with the median value of the surrounding pixels, and bilateral technique uses two
Gaussian distributions and is a function of space [2]. The texture is not preserved,
but the edges are.
The median filter is a nonlinear digital image filtering technique which is very
effective in removing the salt-and-pepper noise as well as the impulse noise with-
out harming the details of the image. The edges are preserved as this filter only
removes the outlier points. The only drawback of median filtering is its computational
complexity.
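As an illustration of the averaging, Gaussian, median and bilateral filters described above, the following is a minimal OpenCV sketch; the file name and the kernel sizes are placeholder assumptions rather than values taken from the paper.

import cv2

# Read the input document image in grayscale (file name is a placeholder).
img = cv2.imread("document.png", cv2.IMREAD_GRAYSCALE)

# Averaging: each pixel is replaced by the mean of its 5 x 5 neighbourhood.
avg = cv2.blur(img, (5, 5))

# Gaussian: weighted mean, closer pixels contribute more; kernel size must be odd.
gauss = cv2.GaussianBlur(img, (5, 5), 0)

# Median: each pixel is replaced by the median of its 5 x 5 neighbourhood.
med = cv2.medianBlur(img, 5)

# Bilateral: edge-preserving smoothing using two Gaussian weights
# (spatial distance and intensity difference).
bilat = cv2.bilateralFilter(img, 9, 75, 75)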
2 Literature Review
Image inpainting is a method by which we can recreate the spoiled part of the image
by recognizing the patterns and curvatures present in the other parts of the image.
Image inpainting can be used to denoise the stained part of documents by analyzing
the patterns present between the letters and white spaces in the text [3].
After image inpainting, the image can be restored to look real and accord with human visual perception. Image inpainting has a broad application scope, such as the protection of valuable authentic artifacts and old photographs, and the correction of errors in images.
In our case, images are considered to have coffee or tea patches across the doc-
ument. We used image inpainting because the portion, which was filled by these
specks of dirt, required to be restored by the original pure data.
Optical character recognition identifies, generates, and interprets the individual char-
acter in a scanned copy of the document. This method is very complicated and requires
that the OCR program maps each letter present in the image to a computer-generated
transcription that resembles it [4].
We tried using OCR to extract text from our test document. As most of the text in the document was covered with coffee stains, OCR misattributed the characteristics of individual letters and gave unreliable results.
3 Proposed Methodology
The median filter is a nonlinear robust filter, which is used to remove noise. This
filter is usually used as a preprocessing step which helps the results of later steps [5].
This method does not harm the edges which makes this filter to be used widely. The
main concept of the median filter in image processing is to run through the image
matrix, pixel by pixel, and to replace the entries with the median of neighboring
pixels. The median is simply the center value of the entries when they are sorted numerically. If the number of entries is even, there can be more than one median [6, 7]. In
this case, cv2.medianBlur function from OpenCV was used to blur the image. This
removed the text from the image leaving behind only the noise [8].
An approximate threshold constant value is subtracted from the output of the
median filter for differentiating between the text and noise. The later output is then
compared with the original image to recover the original text in the output. If the
pixel value of the original image is less than the subtracted value, then we restore the
pixels from the original image in the output. Remaining everything is made equal to
the pixel value of white color, i.e., 255 [9].
Application of the median filter is the first step in this technique to overcome
noise in the image. The median filter is used to detect noisy background. The filter
is applied twice on decrepit images. This smoothens the image background while
keeping the edges near the letters sharp. The step is repeated to get pure noise (Fig. 1).
Output of median filter iterated twice is shown in Fig. 2.
In noise subtraction, the noise detected in the first step is subtracted from the original
image to get a grayscale foreground image.
The noise detected in the first step is considered as the background, which contains
noise such as coffee stains. By performing matrix subtraction, the noisy background
is subtracted from the original image pixel by pixel to get a foreground image which
contains only the text (Fig. 3).
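The background-estimation and subtraction steps described above can be sketched roughly as follows. This is only an interpretation under stated assumptions: the grayscale input, the 21-pixel median kernel, the threshold offset of 10 and the file names are illustrative and not taken from the paper.

import cv2
import numpy as np

img = cv2.imread("stained_page.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name

# Step 1: estimate the noisy background by applying the median filter twice.
# A large kernel blurs the thin text strokes away, leaving mostly the stains.
background = cv2.medianBlur(cv2.medianBlur(img, 21), 21)

# Step 2: subtract an approximate threshold constant from the filtered output to
# differentiate text from noise, then compare it with the original image.
offset = 10  # illustrative threshold constant
text_mask = img < (background.astype(np.int16) - offset)

# Step 3: restore original pixels where text is detected; set everything else to white (255).
clean = np.full_like(img, 255)
clean[text_mask] = img[text_mask]

cv2.imwrite("clean_page.png", clean)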
A histogram is a graph or plot, which gives an overall idea about the intensity distri-
bution of an image. It is a plot with pixel values on the X-axis and the corresponding
number of pixels in the image on the Y-axis [10]. Histogram analysis is just another
way of understanding the image graphically. The histogram gives us an overall idea
of all the pixel values present in the image. Information about contrast, brightness,
and intensity distribution can be obtained from the histogram of any image [11].
In this step, histogram of the original image was plotted which looked as shown
in Fig. 4.
By analyzing the histogram, it can be seen that all the noise lies between pixel
values 50–225. Because our image is grayscale, every pixel is either text in black (pixel value between 0 and 40) or white background (pixel value between 225 and 255). We can remove the noisy pixels with intensity
varying from 50 to 225 and get a clean image.
After removing the corresponding noisy pixels with image segmentation (next
step), the output image and its histogram are as shown in Figs. 5 and 6, respectively.
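A minimal sketch of the histogram analysis and of the segmentation of the noisy 50–225 band follows; the file names are placeholders, and the band limits are the ones identified above.

import cv2
from matplotlib import pyplot as plt

img = cv2.imread("foreground.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name

# Plot the intensity histogram (pixel value on the X-axis, pixel count on the Y-axis).
hist = cv2.calcHist([img], [0], None, [256], [0, 256])
plt.plot(hist)
plt.xlabel("pixel value")
plt.ylabel("number of pixels")
plt.show()

# Pixels in the 50-225 band are treated as noise and pushed to white (255);
# dark text (0-40) and the white background (225-255) are left unchanged.
segmented = img.copy()
segmented[(img >= 50) & (img <= 225)] = 255
cv2.imwrite("segmented.png", segmented)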
This kernel considers the pixel as black and measures the number of black pixels
surrounding the pixel. If a pixel in the text is black, we assume that six or more
pixels surrounding it are black. If the pixel is a part of the salt-and-pepper noise, the
number of pixels in the surrounding must be less than six. The salt-and-pepper noise
is reduced considering this assumption. It can be seen that significant salt-and-pepper
noise has been reduced as shown in Fig. 7.
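One possible realization of this neighbour-counting rule is the convolution-based sketch below; the binarization threshold of 128 and the file names are assumptions, while the 3 x 3 neighbourhood and the six-neighbour criterion follow the text.

import cv2
import numpy as np

img = cv2.imread("segmented.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name

# Binary map of black (text) pixels.
black = (img < 128).astype(np.uint8)

# Count the black pixels in the 8-neighbourhood of every pixel.
kernel = np.array([[1, 1, 1],
                   [1, 0, 1],
                   [1, 1, 1]], dtype=np.float32)
neighbour_count = cv2.filter2D(black, -1, kernel)

# A black pixel with fewer than six black neighbours is assumed to be
# salt-and-pepper noise and is flipped to white.
noise = (black == 1) & (neighbour_count < 6)
cleaned = img.copy()
cleaned[noise] = 255
cv2.imwrite("denoised.png", cleaned)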
4 Conclusion
A new and easy method to suppress noise from distorted images has been presented.
In this work, four different techniques, i.e., median filter, noise subtraction, histogram
analysis, and image segmentation, were applied on coffee-stained, noisy, as well as
dog-eared document which reduced the noise. It can be concluded that the median
filter gives better results for compound images after the performance analysis of dif-
ferent filters. The median filter requires less computational power when compared to
image inpainting as image inpainting uses deep learning in the backend. Furthermore,
to get better results, curvelet transform [13] and sparse modeling [14] techniques can
be used.
5 Future Scope
This project can be improved using many machine learning algorithms as well as correlation. Correlation is a technique by which we can extract characters from the digital document. Training and testing data can be created and used to train our
machine learning model which can be used to denoise the document quite efficiently
[15]. Sparse modeling [14] and curvelet transform [13] are also some of the methods
by which background subtraction is possible. Recent techniques in machine learning
like convolutional neural networks can also be used to train a model for a particular
use case [16]. The model can automatically detect noise based on the learnings and
denoise the required documents for us. Furthermore, traditional machine learning
techniques can also be used.
Acknowledgements The authors feel grateful and wish their profound indebtedness to their guide
Prof. Milind Kamble, Department of Electronics and Telecommunication, Vishwakarma Institute
of Technology, Pune. The authors also express their gratitude to Prof. Dr. R. M. Jalnekar, Director,
and Prof. Dr. Shripad Bhatlawande, Head, Department of Electronics and Telecommunication,
for their help in completion of the project. The authors also thank all the anonymous reviewers of
this paper whose comments helped to improve the paper.
References
1. Reka, Durai, and V. Thiagarasu. 2014. A study and analysis on image processing techniques for
historical document preservation. International Journal of Innovative Research in Computer
and Communication 2 (7): 5195–5200.
2. Afrose, Zinat. 2012. A comparative study on noise removal of compound images using different
types of filters. International Journal of Computer Applications (0975–888) 47 (14): 45–47.
3. Mallick, Satya. 2019. Image Inpainting with OpenCV. Available via https://www.learnopencv.
com/image-inpainting-with-opencv-c-python/. Accessed 13 Apr 2019.
4. Vamvakas, G., B. Gatos, N. Stamatopoulos, S.J. Perantonis. 2008. A complete optical char-
acter recognition methodology for historical documents. Document Analysis Systems, IAPR
International Workshop, 525–532. doi:10.1109/DAS.2008.73
5. Rajasekaran, Angalaparameswari, and P. Senthilkumar. 2014. Image denoising using median
filter with edge detection using canny operator. International Journal of Science and Research
3 (2): 30–33.
6. Malothu, Nagu, and Shanker N. Vijay. 2014. Image de-noising by using median filter and
Wiener filter. International Journal of Innovative Research in Computer and Communication
Engineering 2 (9): 5641–5645.
7. Sandeep, Kumar, Kumar Munish, and Agrawal Neha Rashid. 2017. A comparative analysis on
image denoising using different median filter methods. International Journal for Research in
Applied Science & Engineering Technology 5 (7): 231–238.
8. OpenCV Documentation, Available via http://docs.opencv.org/. Accessed 7 March 2017.
9. Suman, Shrestha. 2014. Image denoising using new adaptive based median filter. Signal &
Image Processing: An International Journal (SIPIJ) 5 (4): 1–12.
10. Govindaraj, V., and G. Sengottaiyan. 2013. Survey of image denoising using different filters.
International Journal of Science Engineering and Technology Research (IJSETR) 2 (2): 344–
350.
11. Senthilkumaran, N., and S. Vaithegi. 2016. Image segmentation by using thresholding
techniques for medical images. Computer Science & Engineering 6 (1): 1–6.
12. Sujata, Saini, and Arora Komal. 2014. A study analysis on the different image segmentation
techniques. International Journal of Information & Computation Technology 4 (14): 1445–
1452.
13. Jean-Luc, Starck, J. Candès Emmanuel, and Donoho David. 2002. The curvelet transform for
image denoising. IEEE Transactions on Image Processing 11 (6): 670–684.
14. Julien, Mairal, Bach, Francis, Ponce, Jean. 2014. Sparse modeling for image and vision
processing. arXiv 1411.3230: 76–97.
15. Priest, Colin. 2015. Denoising Dirty Documents. Available via https://colinpriest.com/2015/
08/01/denoising-dirty-documents-part-1/. Accessed 27 Jan 2017.
16. Kaggle. 2015. Denoising Dirty Documents. Available via https://www.kaggle.com/c/
denoising-dirty-documents. Accessed 13 April 2017.
Humming-Based Song Recognition
Abstract Today, the Internet is the only medium through which songs are accessed
by the majority, resulting in a myriad of songs and artists on the Internet. The only
effective way to search music on the Internet still requires at least some of the
details about a song or its artist. But, what if we don’t know any such detail about
the music? What if we heard it somewhere and know only how to sing or hum?
Humming-based song recognition (HBSR) addresses this conundrum. It is a system
which takes the audio file or audio recording as input and predicts the song present in
the database based on the humming sound analysed. The fundamental ideas of HBSR,
its implementation and some techniques to improve its performance are introduced
in this paper.
1 Introduction
Humming-based song recognition (HBSR) is a system that takes hum audio as input
and analyses the audio to predict the song that rhymes with the input audio. Music
retrieval techniques have been developed in recent years since signals have been
digitized. Searching by song name or singer’s name is often considered as an easy
task to find the song. However, many times it happens that we cannot recall the
wordings of the song and all we can remember is the tune of the song. We can use
HBSR in this case. We can easily create a hum sound for the song we want to search
for. HBSR will analyse the hum sound and output the song name.
Various approaches are made in the field of song recognition by tune. Shazam [1]
is a well-known system that searches for a song by listening to the music. Shazam [1]
uses fingerprinting algorithm to generate a unique hash for each song in its database
usually known as fingerprint. This fingerprint is unique in terms of songs similar to
unique human fingerprint in terms of people. Whenever the input is fed to the system,
it generates the fingerprint of that music using the same algorithm used to generate
fingerprints of the songs in the database. This fingerprint is then matched with the
stored fingerprints, and the most similar song is given as output.
The HBSR approach is different as compared to the fingerprinting. HBSR uses
neural network [2] approach to achieve the results. HBSR system has multiple stored
images of many songs in its database. These images are the spectrograms of the audio
samples generated using the humming audio samples. These images are fed to the
CNN [3] model, and it is trained on these images. A new hum audio is then fed to
HBSR system which first creates a spectrogram of the audio which is then fed to the
model, which predicts the song that is most similar to the humming sound. The detailed working of HBSR is discussed in this paper.
2 Background
A hum is a sound arising forcefully from the nose with the mouth either opened
or closed. The hum by a human most probably is based on some already existing
song or music, and hence, it is melodious. It is common to hum when it is hard to
recall lyrics of a particular song. Considering the fact that hum of a particular song
is as per its melody, a song can be identified based on its hum. Based on this finding,
there have been numerous approaches proposed in the past for identification of songs
based on hum input, each with their pros and cons.
Over the past decade, the Internet has expanded exponentially, resulting in colossal
amount of data being generated every day. As a result, some of the recent successful
approaches for identification of songs by hum heavily rely on song data and machine
learning [5] algorithms. Machine learning [5], as the name suggests, allows a machine
to learn. It is a field of study involving statistics and algorithms which enables a
machine to carry out a particular task without explicit instructions, relying on patterns
in data. The larger the amount of data, with variations in the records, the better the accuracy (in most cases) of machine learning [5] algorithms when predicting the output. Prediction
by machine learning [5] algorithms can either be a discrete value or continuous value.
Song identification by hum is better viewed as a classification problem as songs can
be viewed as discrete values to be predicted.
In the field of machine learning [5], classification problem is a problem of identify-
ing set of categories to which a new observation belongs to. It is used for predicting
discrete responses. While viewing song identification problem as a classification
problem, each song is viewed as a category and then prediction involves classifying
the hum input to one of the categories. Over the years, solutions to classification
problems have evolved, and some of the recent approaches perform much better than
before.
3 System Overview
Figure 1 depicts the overview of the HBSR system and the flow of the system. The
working of the system and how the input song is processed till the final prediction
state are shown in Fig. 1. The usage of external libraries like Librosa [4] and CNN [3]
model is shown. K-samples of audio data belonging to each song are collected and
converted to spectrogram images using Librosa [4]. Further, these images are trained
using CNN [3]. The trained model can be used to predict the new spectrogram images.
The entire system was implemented and tested against six songs with 42 instances
(hum audio) belonging to each song.
4 Implementation Phases
To train a CNN [3] model for HBSR, a large amount of data is required to cover
as much songs as possible. Hence to collect samples, audio samples from different
people were collected. The age group ranged from 9 to 60, with the majority of voices belonging to the 20–24 group, including both male and female voices. The variation
in voice of different people helped to generate variety in audio files. Although the
voice data collected was not enough to train the model, we used Audacity [5] to
tune the pitch of existing audio files and generated multiple samples per audio file.
These changes in pitch ensured that the audio samples are different from the existing
audio samples and also helped to increase the number of samples. A total of 252 samples
belonging to six different songs (42 each) were collected for experimentation.
The first phase through which the sound input goes through in the HBSR system is
the data conversion phase. The data conversion phase, as the name suggests, carries
out the data conversion task, converting sound input data to images. As CNN [3] does
not work on audio, the audio data must first be converted to some suitable form. The
conversion of audio inputs to images boils down the sound identification problem
to image identification problem. The conversion phase comprises two steps—audio
sampling and audio to mel spectrogram images.
The uniqueness of a music note lies within its frequency. Hence, the audio files
are first passed through sampling. Audio sampling involves mapping of audio file
into set of consecutive numbers. The size of set depends on the duration of audio in
audio file. The generated set is stored in the form of an array, which is then converted
into an image. Audio to mel spectrogram involves converting sampled data to mel
spectrogram images which will further be used as an input to the CNN [3] model.
Frequency data of each audio sample is plotted in the form of a spectrogram to
generate images.
Both processes, audio sampling and audio to mel spectrogram conversion, are
carried out with the help of an external Python library for music and audio analysis
called Librosa [4]. Librosa [4] is a musical information retrieval library that provides
all the necessary building blocks required for data sampling and plotting the sampled
data in amplitude time domain or frequency domain or other forms of graphical
representation.
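The two conversion steps can be sketched with Librosa [4] roughly as follows; the file names, sampling rate and figure size are illustrative assumptions.

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Audio sampling: load the hum recording as an array of consecutive sample values.
y, sr = librosa.load("hum_sample.wav", sr=22050)  # placeholder file name

# Convert the sampled signal into a mel spectrogram and map its power to decibels.
mel = librosa.feature.melspectrogram(y=y, sr=sr)
mel_db = librosa.power_to_db(mel, ref=np.max)

# Save the spectrogram as an image to be used later as CNN input.
plt.figure(figsize=(3, 3))
librosa.display.specshow(mel_db, sr=sr)
plt.axis("off")
plt.savefig("hum_sample.png", bbox_inches="tight", pad_inches=0)
plt.close()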
The generated spectrogram images are then used to create a data set which will
further be used to train the CNN [3] model. Input audio is also converted to spec-
trogram and fed to the model for the prediction of a similar song. As discussed in
Sect. 1, Shazam’s [1] fingerprinting algorithm used spectrogram data to predict the
song. This was the motivation behind the approach to train the model on spectrogram
images. Figure 2 shows the spectrogram of humming audio files belonging to four
songs.
Model training phase, as the name suggests, involves the training of a machine
learning [5] model. The sole purpose of training a model is to make the model
capable of predicting songs based on the user’s audio input. HBSR uses CNN [3]
algorithm to train the model. The spectrogram images of the audio files (generated
during the conversion phase) are used to train the CNN [3] model. The trained CNN
[3] model is a classification model which treats each song as a separate class, i.e.
each class represents one song.
The image data set is divided into two parts, testing data and training data. 20%
of the data is used as testing data, and 80% of the data is used as training data. The
CNN [3] model comprises the following layers:
• Two convolutional layers for finding patterns in the images. The convolutional layer consisted of 32 filters of size 3 × 3 (more specifically known as feature detectors), applied to an input shape of 64 × 64 with RGB colour code. The activation function used for the convolutional layers is ‘relu’.
• One max pooling layer of size 2 * 2 for improving the performance by progressively
reducing the spatial size of the representation, resulting in reduction in the amount
of parameters and computations in the network.
• One flatten layer for converting the output of the convolutional network part of the
CNN [3] into a 1D feature vector to be used by the ANN part of it.
• Two dense layers for full connection in which each input node is connected to each
output node, so all the neurons in one layer are connected to all the neurons in the
next layer. A total of 128 nodes were used in the hidden layer with ‘relu’ activa-
tion function. The second dense layer had an output dimension of 6. ‘Softmax’
activation function was used as it was more suitable for categorical classification
rather than ‘sigmoid’ which is suitable for binary classification.
• One dropout layer in which randomly selected neurons are ignored during training.
It results in their contribution to the downstream neurons to be removed temporarily
on the forward pass, and any weight updates are not applied to the neurons on the
backward pass. Dropout percentage used was 20%.
• One fully connected layer in which each neuron receives input from every element
of the previous layer.
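The layer list above corresponds roughly to the Keras sketch below. It is a plausible reading of the description, not the authors' exact code; the optimizer, the loss function and the treatment of the "fully connected layer" as one of the dense layers are assumptions.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    # Two convolutional layers with 32 filters of size 3 x 3 and 'relu' activation,
    # operating on 64 x 64 RGB spectrogram images.
    Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    Conv2D(32, (3, 3), activation="relu"),
    # One 2 x 2 max pooling layer to reduce the spatial size of the representation.
    MaxPooling2D(pool_size=(2, 2)),
    # Flatten the feature maps into a 1-D vector for the dense (ANN) part.
    Flatten(),
    # Fully connected hidden layer with 128 nodes and 'relu' activation.
    Dense(128, activation="relu"),
    # Dropout of 20% so that randomly selected neurons are ignored during training.
    Dropout(0.2),
    # Output layer with 6 nodes (one per song) and 'softmax' activation.
    Dense(6, activation="softmax"),
])

# Loss and optimizer are assumptions; categorical cross-entropy suits a softmax output.
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])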
Once the CNN [3] model has finished training, the HBSR system is then ready
for taking hum audio input from the user and predicting the song. The user’s audio
input is fed to the HBSR system. It then goes through data conversion phase, which
takes the user’s audio input and converts it into respective spectrogram image. The
302 S. Marar et al.
Happy Birthday
Twinkle
Anthem
Jingle bell
spectrogram image is then passed to the trained CNN [3] model, which then outputs
its prediction. Figure 3 depicts the overview of the song prediction.
5 Performance Analysis
CNN [3] was trained against 234 spectrogram images belonging to six songs. Training
set consisted of 186 spectrograms, and test set consisted of 48 spectrograms. HBSR
system achieved accuracy of 97.92% in predicting the songs correctly on a data
set consisting of six songs. This implies that out of 48 test set images, only one
spectrogram image was predicted incorrectly (Fig. 4).
6 Experimental Results
A total of 186 humming data collected by us were used for training the CNN [3]
model. The trained model was then saved to memory so that it can be used later. For
the saving of model, an external Python library called joblib [6] was used. By saving
the model on the disc, we eliminated redundant training of model on the same data
set.
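A minimal sketch of this persistence step with joblib [6] is shown below; the file name is a placeholder, and the trained model object from the previous phase is assumed to be picklable.

import joblib

# Save the trained model once so it does not have to be retrained on the same data set.
joblib.dump(model, "hbsr_model.joblib")  # `model` is the classifier trained earlier

# Reload it from disk later for prediction without retraining.
model = joblib.load("hbsr_model.joblib")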
A total of 18 new humming data were then collected by us and were provided as
input to the trained model for its evaluation. This new humming data was not a part
of train or test set. Out of the 18 new humming data inputs which belonged to six
songs, the model was successful in predicting 17 of them correctly, i.e. the model
predicted the correct song names for 17 out of 18 given inputs.
7 Future Scope
8 Conclusion
This approach of using machine learning [5] and neural networks [2] to predict
songs can be used in real time. Mel spectrogram representation of audio uniquely
differentiates one song from another. The system thus has the potential to predict the
songs based on humming input. It predicted correctly for six songs in the database.
It can be expanded to predict more songs by adding data for more songs and then
training the model on this data. The system can become more robust by training it
against tons of audio samples representing a huge category of songs.
References
1. Wang, Avery Li-Chun. 2003. An industrial-strength audio search algorithm. In Proceedings 4th
International Conference on Music Information Retrieval, pp. 7–13, 2003-Oct.
2. Bishop, J.M., R.J. Mitchell. 1991, January 25–25. Neural networks-an introduction. IEE
Colloquium on Neural Networks for Systems: Principles and Applications.
3. Al-Zawi, S., T. Mohammed, S. Albawi. 2017. Understanding of a convolutional neural network.
In Proceedings of ICET, Aug 2017.
4. McFee, Brian, Colin Raffel, Dawen Liang, Daniel P.W. Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. 2015. librosa: Audio and music signal analysis in Python. In Proceedings of the 14th Python in Science Conference.
5. Mitchell, Thomas M. 1997. Machine Learning. New York, NY: McGraw-Hill Inc.
6. Joblib: For running Python functions as pipeline jobs, https://joblib.readthedocs.io/en/latest/.
A New Data Structure for Representation
of Relational Databases for Application
in the Normalization Process
Abstract In this paper, a new data structure named relational tree is proposed for the
representation of the relational database in computer memory. Relational database
schema represented using relational tree(s) appears to be more promising for semi-
automating the process of normalization in a very efficient manner, which is prime
motive of this paper. This paper provides all the fundamental concepts required for
the understanding of the representation of relational schema using relational tree so
that efficient algorithms of various normal forms can be designed by using this repre-
sentation. This automation will considerably reduce manual efforts and errors in the
process of normalization in software industries. Space requirements are also improved compared to previously proposed approaches. It is expected that the application of vari-
ous normalization algorithms on this way of representation is very efficient since the
relational tree can be easily manipulated.
1 Introduction
Increment in the profit of any commercial organization can be seen by increasing the
productivity and quality of the product. We have to maintain the quality of the product
while increasing its productivity. To achieve this kind of production, automation of
the tasks becomes necessary. Organizations can involve automation in their design
and development sectors to gain the required amount of products with maintained
quality criteria.
Normalization plays a vital role in database management system, but before nor-
malizing any relational schema, its proper representation is very essential. Appropri-
ate representation of attributes and functional dependencies provides easier way to
apply the normalization algorithms on the given relational schema. Representation
of the relational schema provides better understanding of the database and thereby
simplifies the process of normalization.
In the proposed approach for representing a database, given its attributes and the relations these attributes share, a new data structure is suggested and named the “relational tree”. This new data structure is an alternative to the tool proposed in [1].
Using a tree as the data structure is a better option than a linked list because, as we know, we do not have to traverse the whole structure to reach a particular node. Applying the normalization algorithms to a relational schema represented by the proposed tool would be quite efficient and simpler to understand compared with the tool proposed in [1].
The widely used relational databases of recent times were proposed by Dr. Codd [2]. Major organizations apply them to store and manipulate bulk data for decision making.
They are capable of managing an enterprise in an efficient manner which is also
simple and trusted. Their capability in these areas has really increased their scope
in software industries, which are trying to develop relational databases according to
the requirements of their clients.
Design of relational schema is one of the important factors on which success of
the relational database model depends. “Normalization” is a key step in the design of
relational database. This process takes a relation as input and decomposes the bigger
relation into smaller ones which are free from redundant data along with anomalies
like deletion and insertion [2]. Each step of normalization has a name, first, second
and third normal forms which are represented as 1NF, 2NF, 3NF, respectively.
Normalization is not limited till the third normal form, and there are further normal
forms of normalization, 3.5NF also known as Boyce–Codd Normal form (BCNF)
and 4NF. 3.5NF, as the name suggests, it is stricter than 3NF but not as strict as 4NF.
4NF has its own rules, and it is even stricter than 3.5NF.
The paper has been organized in the following way. Section 2 describes the work
that has been previously done in this domain. We also found some useful resources
which would help students and researchers understand normalization along with
basic concepts required for relational trees. Section 3 gives detailed explanation of
node structure which has been used in this approach for representation of relational
databases in computer memory. Section 4 explicitly explains advantage of using
regular trees as a basic data structure in comparison with previous approaches which
used linked list. Section 5 explains creation of relational tree along with a real-world
example. Section 6 compares ideas and methods in previous approach and current
approach along with a comparison between the node structures. Conclusions drawn
are mentioned in Sect. 7. Future scope is mentioned in Sect. 8, and references are
cited at the end.
2 Related Work
Software industries mostly carry out normalization manually, which requires employ-
ees with proficiency in normalization. In order to satisfy the demands of today's business enterprises, a large number of relations and functional dependencies are required.
So, the process of normalization requires more than one person when it is carried
out manually. The drawbacks of carrying out normalization manually are as follows.
1. It makes the process slower and thus less productive: many relations containing many attributes are required to model an enterprise, which makes the manual process slow.
2. It is prone to errors, for the reasons stated in 1.
3. It requires skilled people; however, since the process can be automated, this requirement can be removed.
Several researchers have tried to eliminate these drawbacks by automating the
normalization process by proposing new tools/methods. There is a patent [3] which proposes a database normalizing system. It normalizes the given database by observing records; the source input for the tool is the records which are already present in the table.
Another tool, named Micro, was proposed by Du and Wery [4], in which the representation of a relational schema is done using two linked lists. These linked lists are used to represent the attributes and functional dependencies of a relation. JMath
Norm is another tool which was made using Jlink library’s inbuilt functions given by
Mathematica. Jmath Norm was proposed by Yazici et al. [5]. Dependency graph and
dependency matrix were created by an automatic database normalization system, and
this was proposed by Bahmani et al. [6]. One can define normalization algorithms
on them now. Generation of primary keys and relational tables is also done.
It is difficult to motivate students and researchers to study normalization because the process is difficult and theoretical. Maier [7] also claimed that normalization can be hard even for average designers. LBDN [8], which stands for Learn Database Normalization, is another web-based tool we came across. The tool is very interactive since it follows a client–server model. This tool provides a platform for students to understand the normalization process by providing them lectures and hands-
on practice. This tool also provides them some assignments to solve. In this tool, the
representation is done in the form of sets and the keys, functional dependencies and
attributes get stored as array of strings. We found another similar web-based tool
which is mentioned in [9]. This tool is found useful in data management courses and
also in system design and analysis. “This tool has a positive effect on students” claim
the authors of the tool.
Representation of a relational schema is done using only a singly linked list in
the tool RDBNorma [10]. Since the requirement for representation is only one single
linked list, it is really a good effort which does the job using less time and space in
comparison with Micro. This tool operates at schema level.
RDBNorma uses a single linked list, and thus, it requires less memory and time
to process. It is an honest attempt for the development of a new way to represent
relational schema. Since RDBNorma works on schema level, it really helps to elim-
inate the manual process of normalization, in a very efficient and speedy way. The
work behind RDBNorma really increases the automation of normalization and helps
to eliminate the drawbacks of manual process of normalization.
In this paper, we are proposing an alternate method to implement RDBNorma. We
are well aware that RDBNorma uses single linked list as data structure, and we are
proposing that if we use trees as the data structure instead of a linked list, the results will be better. The reason is that although a tree is a more complex structure than a linked list, searching for a node in it is simpler and less time-consuming. The trees that store the functional dependencies are called relational trees.
2. AttributeType: This field stores the type of the attribute. It is set to “*” for multi-valued attributes and “1” when the attribute is of atomic type.
3. Determiner: This field holds a binary value; it stores 1 if the given attribute is a determiner in the functional dependencies (FDs) and 0 if it is not.
For example, consider the following FDs
B→A
B→C
E→D
In these FDs, B and E are the determiners, while A, C and D are not, so the value 1 will be stored in the determiner field of attributes B and E, and 0 in that of the A, C and D nodes.
4. childofnode [0], childofnode [1], childofnode [2] and childofnode [3]: These fields hold pointers to all the children of the attribute. It is assumed that any given attribute can have at most four children.
For example,
A→D
B→E
A → BC
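A minimal Python sketch of this node structure is given below. The attribute_type, determiner and childofnode fields follow the description above; the attribute-name field and the helper method are assumptions added so that the sketch is self-contained.

class RelationalTreeNode:
    """Node of a relational tree built from attributes and functional dependencies."""

    MAX_CHILDREN = 4  # at most four children per attribute, as assumed above

    def __init__(self, name, attribute_type="1"):
        self.name = name                        # attribute name (assumed field)
        self.attribute_type = attribute_type    # "*" for multi-valued, "1" for atomic
        self.determiner = 0                     # 1 if the attribute is a determiner in some FD
        self.childofnode = [None] * self.MAX_CHILDREN  # pointers to child nodes

    def add_child(self, child):
        # Attach the child in the first free child pointer, if any slot is left.
        for i in range(self.MAX_CHILDREN):
            if self.childofnode[i] is None:
                self.childofnode[i] = child
                return True
        return False  # all four child pointers are already occupied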
Attributes names and functional dependencies will be taken as input, and it will get
temporarily stored in different arrays separately. We have used two different 2-D
arrays, one for attributes names and other for functional dependencies.
By using the given FDs and the attributes, automatic key detection will be done for
each relation schema. The key which will get detected will act as a primary key for
that relation schema. The program will go through all the attributes and FDs, and the
determiners which will determine all the attributes present in relational schema will
be selected as key.
The key detected in automatic detection can be of the composite form or non-
composite form. Either way we have to store the key in linked list to create the
relational tree. If the key is not composite in that case, only head of the linked list
will store the key for the given relational schema. In case of composite key, each
attribute in the key will get stored in linked list as a different node.
This step is the first step towards getting minimal cover of FDs, and by doing
automatic key detection, we are going to calculate minimal cover of the given relation
schema.
The key which gets detected automatically will get stored in the linked list, and by
using given FDs, the child of key node in the linked list will get detected. The child
pointer of a particular node in linked list will point to the child of that node. There
will be maximum four child node pointers for each node.
After allocating child to the node, program will traverse the FDs for the child
considering it as a parent, and if there is child present for that node, then child
pointer of that node will point to the child, and if child is not present, then child
pointer will point to NULL. Recursively, program will search the FDs for each node
by traversing linked list and FDs.
Step 1: In this first step, the algorithm will traverse the linked list of keys, and after taking each node, it will search for that node in the list of FDs.
Step 2: After getting that node, it will traverse that FD and will look for the child
of that node, i.e. RHS side of the FD. Algorithm will detect the right side
of FD, and it can be of composite form or non-composite. If the right side
is of composite form, then the algorithm will detect each attribute from
composite form and make each different node as a child from composite. If
it is not composite, then it will directly make a new node for that attribute
and will make that node as a child of left side of FD.
Step 3: Then, algorithm will recursively check for the child node as a parent, it will
traverse the FDs for that node and detect its child, and its child pointer will
point to that child.
Step 4: Before Steps 2 and 3, the algorithm will check whether all child node pointers of the parent are occupied. A new node is attached to a child pointer only if some child pointer is still null.
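The four steps can be sketched as follows, reusing the node class shown earlier. The representation of FDs as (determiner, dependents) pairs, the cycle guard and the function name are assumptions made for illustration; composite determiners are omitted for brevity.

def build_relational_tree(key_attributes, fds):
    """key_attributes: attribute names forming the primary key (the linked list of keys).
    fds: list of (lhs, rhs_list) pairs, e.g. ("supp_id", ["supp_name", "street"])."""
    nodes = {}
    processed = set()  # guards against revisiting a determiner (and against FD cycles)

    def get_node(name):
        if name not in nodes:
            nodes[name] = RelationalTreeNode(name)
        return nodes[name]

    def attach_children(parent_name):
        if parent_name in processed:
            return
        processed.add(parent_name)
        parent = get_node(parent_name)
        for lhs, rhs_list in fds:
            if lhs != parent_name:
                continue
            parent.determiner = 1
            for rhs in rhs_list:
                parent.add_child(get_node(rhs))  # Steps 2 and 4: attach RHS attributes as children
                attach_children(rhs)             # Step 3: recursively look for children of the child

    for key in key_attributes:                   # Step 1: traverse the key attributes
        attach_children(key)
    return [nodes[k] for k in key_attributes]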
The following section explains an example of representing a real-world relation and its FDs using relational trees, which would improve understanding of the algorithm discussed above.
Consider a relational schema
EMPLOYEE = (supp_id, supp_name, street, supp_phone, produc_id, produc_name,
produc_price, produc_quantity)
Functional Dependencies
supp_id→supp_name, street, supp_phone
produc_id→produc_name, produc_price
supp_id, produc_id→produc_quantity
Process of Formation of Relational Tree
Step 1: In the above given relational schema and FDs, all the attribute names and
FDs will get temporarily stored in the array. Here, for “node [] []”, one 2D-array will
get created which will store the attribute names, and for “fd [] []”, another 2D-array
will get created which will store the given FDs temporarily.
Detailed information of the attributes is shown in Table 2.
Step 2: By using the above two arrays, the algorithm will detect the primary key
of the relational schema. In the above example, the primary key will be composite
key consisting supp_id and produc_id. Through primary key, we can access all the
attributes nodes of relational schema.
Table 2 Detailed information
Attribute name KeyAttribute NodeType
supp_id 1 Atomic
supp_name 0 Atomic
Street 0 Composite
Supp_phone 0 Multi-valued
Product_id 1 Atomic
Product_name 0 Atomic
Product_price 0 Atomic
Product_quantity 0 Atomic
Step 3: Primary key will get stored in the linked list of relational tree.
In the FD: supp_id, produc_id→produc_quantity, both (supp_id, produc_id) will
get stored in the one node of linked list for the representation.
The other two keys will get stored after that in the linked list, and supp_id will be
the head of linked list (Fig. 1).
Step 4: Now, while traversing the linked list, first node will be supp_id, and by
traversing the FD’s array, program will get the child nodes of the attribute supp_id.
There are three child nodes for supp_id which are supp_name, supp_phone and
street. Child field’s pointer of supp_id will point to these three child nodes.
Therefore, relational tree will be as shown in Fig. 2.
Step 5: Now, program will recursively search for all child nodes for each node in the
linked list, i.e. primary key. Step by step, child nodes of each primary node will get
attached to its parent node in the linked list (Fig. 3).
Final Relational Tree
See Fig. 4.
Memory Representation for EMPLOYEE Schema.
Memory representation for EMPLOYEE schema is shown in Fig. 5.
Fig. 2 Stage 1
Fig. 3 Stage 2
The basic difference between relational trees and the previous tool lies in the idea behind using trees and a linked list as the underlying data structure, respectively. The technique proposed in [1] uses a single linked list to store attributes and FDs. Micro uses two linked lists,
one of which is used to store attributes, and the other is used to store FDs. However,
when we use single linked list to store relational schema and FDs, accessing the
required node becomes way faster than expected. Using a singly linked list probably
uses a list to store all the presently created nodes along with a pointer pointing to
the address of the node, or it travels the whole linked list to find a particular node.
But the basic problem with this technique lies in the structure of the node: a node contains pointers to the determiner of that node. However, this is not how the logical flow takes place; the basic flow should be from the determiner to its children.
In relational trees, linking is from determiner to its children. Any given node of
relational tree stores pointers to its children. Also, this allows us to create a structure
which can be simply understood as a linked list of roots of trees, where all the roots
together act as primary key.
Along with this, relational trees take advantage of nonlinear structure (trees) while
maintaining the general idea of keeping attributes that contribute to primary key
together (linked list).
Consider the following example
X → Q, A
In previous method, a linked list of these three nodes is created. Nodes containing
Q and A store address of X, but in relational trees, a linked list of primary key (i.e.
X) is created. X has a pointer pointing to Q and a pointer pointing to A.
Following is the comparison between the node structures of the current approach (Table 3) and the previous approach (Table 4).
7 Conclusion
In this paper, the relational tree is proposed as an alternative to representing a relational schema using a singly linked list [1]. Using a relational tree to represent a relation with its functional dependencies overcomes the flaws of the representation done using a linked list. It addresses the children-to-determiner linking of the previous approach by providing a new node structure for representation. Normalization algorithms can then be applied to normalize the given schema.
Thus, it is concluded with the expectation that relational tree performs better than
tool proposed in [1] and Micro [4].
8 Future Scope
References
1. Dhabe, P.S., Y.V. Dongare, S.V. Deshmukh. 2010. Representation of a database relation and its
functional dependencies using singly linked list data structure for automation of normalization.
IJERIA 3 (No. II): 179–198.
2. Codd, E.F. 1970. A relational model of data for large shared data banks. Communications of
the ACM 13 (6): 377–387.
3. Hetch, C. Stephen. 1998. US Patent 5778375—Database normalizing system.
4. Du, H., L. Wery. 1999. Micro: a normalization tool for relational database designers. Journal
of Network and Computer Application 22: 215–232.
5. Yazici, A., Ziya, K. 2007. JMathNorm: a database normalization tool using mathematica. In
Proceedings of International Conference on Computational Science, 186–193.
6. Bahmani, A., M. Naghibzadeh, B. Bahmani. 2008. Automatic database normalization and
primary key generation. Niagara Falls Canada IEEE.
7. Maier, D. 1988. The Theory of Relational Databases. Rockville, MD: Computer Science Press.
Aseem Patil
A. Patil (B)
Department of Electronics Engineering, Vishwakarma Institute of Technology, Pune, India
e-mail: aseem.patil16@vit.edu
1 Introduction
will be implemented. Each stage provides a response, and the picture is acknowledged only if all classifiers offer a favorable response; otherwise, the next filter from the set of N filters is run. The Haar feature extractors (weak classifiers) are built into each of the classifiers already present in the set. Each Haar feature is a weighted sum of 2-D integrals over small rectangular zones connected to each other. The weights can be ±1.
The sleepiness/drowsiness of the driver is one of the main factors that cause injuries, according to WHASTA operational statistics. However, over the years, India has faced serious repercussions for road safety, with the growing pace at which every individual tries to own a personal vehicle. In India, there has been a 5% increase in the total number of road fatalities, from 580,400 in 2013 to 611,414 in 2018.
If vehicle technology is to prevent, or at least warn of, driver fatigue, what are the signs and symptoms that can be detected from the driver? According to research, there are numerous classes of technologies that can detect driver fatigue. The primary one is the use of cameras to monitor the behavior of a person. This process includes keeping track of the eye aspect ratio, yawning, the head position and a variety of other things. The next of these technologies is voice recognition: the voice of a person can often provide clues as to how tired they are. A specific explanation of the underlying techniques used for the detection of drowsiness is as follows:
• Electrocardiogram and electroencephalogram
• Local binary patterns
• Steering wheel motion
• Visual detection
• Eye blinking-based approach.
Fig. 1 Schematic of sensing system integration for driver drowsiness and alertness system using
EEG
Interest in local binary patterns (LBP) has grown in machine vision and image processing. LBP efficiently summarizes local pixel patterns as a non-parametric technique by comparing each pixel with its neighboring pixels. The most significant characteristics of LBP are its computational simplicity and its tolerance to monotonic changes in lighting. This method is mostly used to detect emotions such as joy, melancholy and thrill. In the drowsiness assessment process, LBP is used in the driver face detection step, dividing the picture into four quadrants and separating the upper and lower parts.
steering wheel in small increments, drivers are constantly assessing the situation
ahead and applying small, smooth, steering adjustments to correct for small road
bumps and crosswinds.
Therefore, it is possible to determine the driver’s somnolence state based on small
SWMs and thus provide an alert if necessary. In a simulated environment, light side
winds pushing the car to the right side of the road were added along a curved road
to create lateral position variations and force the drivers to make corrective SWMs.
The blinking rate and the duration of eye closure are measured in this approach to detect the drowsiness of the driver, because when the driver feels sleepy, his/her eye blinking and eyelid gaze differ from normal situations, so that drowsiness can be easily detected [2]. A remotely positioned camera is used in this sort of system for collecting recordings, and machine imaging methods are applied to the frames to evaluate the position of the head, nose and ears [3]. Drowsiness of the driver can be identified from these visual cues and the blinking rate. Any indications such as swaying of the face or continuous nodding can be monitored by the device installed at a suitable spot in the vehicle.
In many applications such as eye gaze tracking, iris detection, video conferenc-
ing, auto-stereoscopic displays, face detection and face recognition, eye detection is
required. This paper uses color and morphological image processing to propose a
novel technique for eye detection.
To understand the system more effectively, we shall have a look at the block diagram
we have made for the design of the system (Fig. 2).
In the proposed system, we shall capture the image using an 8 MP USB camera mounted at the front of the vehicle, facing the driver. We shall use facial detection to crop only the face and remove any other unnecessary details from the captured photo. We shall use the Viola–Jones algorithm to detect the eye regions within the facial image. Using the eye aspect ratio, the system will be able to determine the strong homogeneous coordinates and will store them in a basic Excel file. From there, we shall implement the process and get the required result.
if ear > Threshold:   # EAR is above the threshold: the drowsiness indicator DBAR is reset
    DBAR = 0
If EAR drops below the set threshold and remains there for at least 1 s, it will be detected as a blink, and COUNTER will store the blink count [2]. If EAR further remains below the threshold for more than 3 s, it is considered to be sleep, this is displayed on the somnolence scale, and the number of drowsiness events is stored in the DEVENT variable. This is the logic we are going to use to implement the above-proposed system, and it can only be done with the Viola–Jones algorithm.
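A minimal Python sketch of this counting logic is given below. The specific threshold value and the assumed frame rate are illustrative assumptions and not taken from the authors' implementation; only the 1 s blink and 3 s sleep criteria and the COUNTER/DEVENT variables come from the text.

# Sketch of the EAR-based blink/drowsiness counting logic (assumptions noted above).
EAR_THRESHOLD = 0.25            # assumed EAR cut-off separating open/closed eyes
FPS = 17                        # assumed camera frame rate
BLINK_FRAMES = 1 * FPS          # EAR below threshold for >= 1 s counts as a blink
SLEEP_FRAMES = 3 * FPS          # EAR below threshold for >= 3 s counts as drowsiness

COUNTER = 0                     # number of blinks detected
DEVENT = 0                      # number of drowsiness events detected
frames_below = 0                # consecutive frames with EAR below the threshold

def update(ear):
    """Update blink and drowsiness counters for one video frame."""
    global COUNTER, DEVENT, frames_below
    if ear < EAR_THRESHOLD:
        frames_below += 1
        if frames_below == BLINK_FRAMES:
            COUNTER += 1        # eye closure has lasted at least 1 s: a blink
        elif frames_below == SLEEP_FRAMES:
            DEVENT += 1         # closure has lasted at least 3 s: drowsiness event
    else:
        frames_below = 0        # eye re-opened, reset the run length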
This is the system's initial phase. The configuration is done and optimized for the current driver. The driver's head position is the primary tracking target. When the driver's face is correctly located, processing the captured image is straightforward, and the driver's actual mental status can also be recognized [4]. The camera used has a resolution of 8 MP. We implement face detection using the proposed method, the Viola–Jones algorithm. The steps for doing so are as follows (a code sketch is given after the list):
(1) To build a simple prescreening filter with template features for the first stage in
the cascade.
(2) Encode the most basic differences between the template-based faces.
(3) Match the training set and the training/demonstration test set.
(4) Draw the final decision on the positive training set in the first place and the
negative training set in the second place.
(5) Read the image source and remove the ROI from the image face source. ROI is
the image of the sub and must be smaller than the image of the target.
(6) Do standardized cross-correlation and find ROI and target image peak coordi-
nates. Calculate and display the standardized cross-correlation as a surface plot.
Here, Viola–Jones algorithm plays a vital role.
(7) Find the full inverted images. The complete distortion or transfer of pictures
relies on how the top is located in the cross-correlation matrix and on whether
the pictures are sized and positioned.
(8) Check that the face of the target image is extracted; see Fig. 3, where the face matches exactly within the target image.
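The steps above can be approximated with OpenCV's pretrained frontal-face Haar cascade and normalized cross-correlation. The following is a rough sketch under the assumptions that the stock haarcascade_frontalface_default.xml model stands in for the cascade built in steps 1–4 and that at least one face is present in the frame; it is not the authors' trained detector.

import cv2

frame = cv2.imread("frame.jpg")                      # a frame from the USB camera
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# Step 5: detect the face and crop it as the ROI (a sub-image of the target).
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
x, y, w, h = faces[0]                                # assumes a face was found
roi = gray[y:y + h, x:x + w]

# Step 6: normalized cross-correlation between the ROI and the target image,
# then locate the correlation peak.
ncc = cv2.matchTemplate(gray, roi, cv2.TM_CCORR_NORMED)
_, peak_val, _, peak_loc = cv2.minMaxLoc(ncc)

# Steps 7-8: the peak coordinates give the offset at which the extracted face
# matches within the target image.
print("correlation peak", peak_val, "at", peak_loc)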
The setup of the system includes two major processes: (i) extracting the driver’s
facial expressions and detecting their head movement and (ii) collecting different
samples of eyes open or closed. This information gained is further processed to
analyze the current state of the driver (Fig. 4).
From the 68 facial landmark points, we can compute the distances between the landmarks of each eyelid and derive the eye aspect ratio from these facial landmark points [5]. After applying the Viola–Jones algorithm, the image is transformed into a spider web-like structure that connects each point with the others (Fig. 6).
Imagine we are interested in training a model that is able to localize only the landmarks of the left and right eye [6]. To do this, we have to edit the image and landmark annotations by choosing only the relevant points. This can be done by calling the slice_xml() function, which creates a new XML file with only the chosen landmark points of the respective person.
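slice_xml() belongs to the authors' annotation-editing scripts and is not reproduced here. As an equivalent illustration, the sketch below simply picks out the left- and right-eye landmarks (indices 36–47 of the standard 68-point layout) from a dlib shape predictor; the publicly available shape_predictor_68_face_landmarks.dat model is an assumption.

import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

LEFT_EYE = range(36, 42)     # landmark indices of the left eye
RIGHT_EYE = range(42, 48)    # landmark indices of the right eye

def eye_landmarks(image_path):
    """Return the (x, y) eye points of the first face found in the image."""
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    rect = detector(gray, 1)[0]              # assumes at least one face is found
    shape = predictor(gray, rect)
    left = [(shape.part(i).x, shape.part(i).y) for i in LEFT_EYE]
    right = [(shape.part(i).x, shape.part(i).y) for i in RIGHT_EYE]
    return left, right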
The final step comes when the eye aspect ratio is measured for each eye, which helps us determine the accuracy and precision of the measurement taken when the person was drowsy. We shall plot a graph of these measures before predicting the number of hits and the percentage of hits for each person used as a data point in the data set.
In the given figure (Fig. 5), the points shown in red represent the 68 facial landmark points situated on the face of a human being. The facial landmark points are essentially the strong points among the points shown in white. We extract the features based on the highest EAR values; if the EAR for a certain point is greater than that of another facial landmark point, we neglect such points. In the given figure, we could identify a total of 68 facial landmark points, out of which 42 stood out. Using these 42 points, we found the average of the distances between the ear and the eye of the individual in his normal state and compared this average to the average for the same individual's fatigued state. This helped us in getting progressive and realistic readings.
If the driver is found to be in an abnormal state, which could be either a trance state or drunkenness, the system activates the alarm. The alarm can take the form of music, a vibrator placed under the seat or a normal alarm noise.
Fig. 4 The data set visualizes the 68 facial landmark points at a higher resolution, as related to the HOG descriptor
This is measured using the steering angle sensor and is a widely used vehicle-based measurement for detecting the level of drowsiness in the driver. The driving activity of the driver is evaluated using an angle sensor attached to the steering wheel. When drowsy, a driver makes fewer micro-adjustments to the steering than in ordinary steering behavior [3]. We found that sleep-deprived drivers made smaller turns of the steering wheel than ordinary drivers. To eliminate the effect of lane changes, we considered only the small steering wheel movements (between 0.4° and 6°) needed for adjusting the lateral position within the lane. In particular, steering behavior is then characteristic of routine driving practice, and drivers usually evaluate and apply simple, smooth corrections to compensate for small obstacles or crosswinds by rotating the steering wheel in slight increments depending on the situation. In addition, overall driving performance is largely determined by the driver.
Paul Viola and Michael Jones presented a fast and robust face detection method that, at the time of its release, was fifteen times faster than any comparable technique, with 95% accuracy at about 17 fps. The Viola–Jones detection algorithm is used as the basis of our design. Since all human faces share some similarities, we use this concept in the form of Haar features to detect faces in an image. The algorithm looks for a face's specific Haar features and, if they are found, passes the candidate to the next stage.
The candidate here is not the complete image but only a rectangular part of it, known as a sub-window, 24 × 24 pixels in size. The algorithm checks the entire image with this window. In this method, we will be using the cascading approach from the Viola–Jones algorithm.
Using stage cascading, false candidates can be eliminated quickly. If the first stage is not passed, the cascade eliminates the candidate; if it is passed, the candidate is sent to the next, more complicated stage. If a candidate passes every stage, a face is detected. This makes the Viola–Jones algorithm a discrete and distinct way of separating good candidates from bad ones and helps in maintaining the accuracy of the system (Fig. 6).
Each eye is represented by 6 (x, y) points, starting at the left corner of the eye (as if you were looking at the individual) and then working clockwise around the rest of the eye (Fig. 7).
In the case of our drowsiness detector, we will monitor the eye aspect ratio to check whether the value falls without rising again, implying that the person has closed their eyes. The following formula is used for the calculation of EAR.
EAR = (‖p2 − p6‖ + ‖p3 − p5‖) / (2 ‖p1 − p4‖)    (1)
In Eq. (1), the numerator computes the distance between the vertical eye landmarks, while the denominator computes the distance between the horizontal eye landmarks; the denominator is weighted by two because there is only one set of horizontal points but two sets of vertical points. While the eye is open, the aspect ratio stays almost constant, but it drops rapidly toward zero when the eye blinks as an involuntary action.
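A minimal helper that directly transcribes Eq. (1), assuming each eye is supplied as its six landmark points p1…p6 in the order described above:

from math import dist   # Euclidean distance between two points (Python 3.8+)

def eye_aspect_ratio(eye):
    """Compute EAR from the six (x, y) eye landmarks p1..p6, as in Eq. (1)."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = dist(p2, p6) + dist(p3, p5)   # the two vertical distances
    horizontal = dist(p1, p4)                # the single horizontal distance
    return vertical / (2.0 * horizontal)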
5 Results
Under normal and regular conditions, the mean of the percentage of hits gives us the accuracy of the system, which turns out to be 93.37%. From Table 1, we can conclude that the accuracy achieved is higher than required.
Table 2 shows the hits observed under real-life circumstances with a cab driver and with a person wearing spectacles. It is quite hard to determine whether a person is drowsy when he/she is wearing spectacles; that is why the accuracy for the driver with glasses is lower than average.
Table 1 Hits observed under normal and regular conditions. The mean of the percentage of hits gives the accuracy of the system, which turns out to be 93.37%

Test                              Number of observations   Number of hits   Percentage of hits
Yawn detection                    170                      143              84.11
Front nodding                     200                      184              92.0
Assent of the head to the right   200                      190              95.0
Assent of the head to the left    200                      191              95.5
Distraction to the right          200                      184              92.0
Distraction to the left           200                      193              96.5
Blink detection                   200                      197              98.5
Table 2 Hits observed under real-life circumstances with a cab driver and with a person wearing spectacles. It is hard to determine drowsiness when the person is wearing spectacles, which is why the accuracy for the driver with glasses is lower than average

Test                  Number of observations   Number of hits   Percentage of hits
Driver with a cab     1400                     1295             92.5
Driver with glasses   1400                     1183             85.5
From the tables obtained, we plotted four sample graphs at random. We find that the accuracy stays in its nominal range and that there may be variation in steering angle when a person is drowsy, as every person has his/her own energy level until reaching an extreme level of drowsiness (Fig. 8).
6 Conclusions
The paper has suggested a scheme to help a driver by assessing his/her condition and thereby avoid serious collisions usually provoked by drowsiness. The driver's status is determined using image processing algorithms. A buzzer and a horn warn the user when they are in a drowsy state. The location of the driver's face is determined with regard to the center of propulsion, and the present driver position is determined. A high-focal-length camera of good precision captures the motion of the body. The scheme provides an additional tracking function: if the driver yawns frequently, the alarm is also triggered. The detector determines whether or not the driver is drunk or drowsy. For correct precision, the sensor should be kept at a suitable distance from the driver. The warning, which can be in audio or vibration form, can encourage the driver to safely regain the desired state of mind. The scheme can contribute to the detection of the actual driver's condition and to reducing the frequency of road accidents, although further advanced research is still required.

Fig. 8 Output results obtained by testing the code with four drivers. The result we got was 93.37% accurate, and this was only possible because we took real drowsy people as our test samples. The red line shows the drowsiness condition, and the blue line shows the alertness condition for that person
The advanced driver assistance system that has been designed is used to observe the temporary state of the driver. It may also be used to check whether the driver is drunk. This technique can be deployed in real-world vehicles to provide a live video feed of the driver when he/she is on the road, either driving or stuck in traffic.
References
1. Saito, Yuichi, Makoto Itoh, and Toshiyuki Inagaki. 2016. Driver assistance system with a dual control scheme: effectiveness of identifying driver drowsiness and preventing lane departure accidents. IEEE Transactions on Human-Machine Systems, 4–8.
2. Sari, Nila Novita, and Yo-Ping Huang. 2016. A two-stage intelligent model to extract features from PPG for drowsiness detection. In 2016 International Conference on System Science and Engineering (ICSSE), National Chi Nan University, July 7–9, 2016.
3. Tran, D., E. Tadesse, W. Sheng, Y. Sun, M. Liu, and S. Zhang. 2016. A driver assistance framework based on driver drowsiness detection. In The Sixth Annual IEEE International Conference on Cyber Technology in Automation, Control and Intelligent Systems, June 19–22, 2016, 7–9.
4. Aleksandar, Oge Marques, and Borko Furht. Design and Implementation of a Driver Drowsiness Detection System: A Practical Approach.
5. Anjali, K.U., Athiramol K. Thampi, Athira Vijayaraman, Franiya M. Francis, Jeffy N. James, and Bindhu K. Rajan. 2016. Real-time nonintrusive observation and detection of eye blinking in view of accident prevention due to drowsiness. In 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT).
6. Ahmed, J., Jian-Ping Li, S. Ahmed Khan, and R. Ahmed Shaikh. Eye Behavior Based Drowsiness Detection System, 6–8.
Smart Sound System Controller
for Restricted Zones
Abstract Noise pollution has become a serious issue, and its scale is increasing rapidly day by day. According to statistical data, the acceptable range of noise is 40–45 dB, but during festivals like Diwali and the Ganesh festival, it is observed to be around 90–95 dB. This rate of growth of pollution is very high, which leads to environmental problems and disturbance to the ecosystem. To bring it under control, it is necessary to take appropriate action against the causes of noise pollution. Existing systems perform only basic monitoring of noise, which is not sufficient to control and reduce the problem of noise pollution. Hence, we introduce a system which overcomes this issue and not only helps us monitor the level of sound but also provides a way to control the sound-emitting device using IoT. This system provides a fast and automated way of taking action against major sound-polluting devices in specific areas. Thus, with the help of our system, noise pollution can be detected as well as controlled to remain within specified limits.
1 Introduction
Nowadays, the entire world is moving toward the era of technology and smart devices.
These upcoming and rapidly developing technologies are making human life more
and more comfortable [1]. All the systems around us are becoming smart and auto-
mated. The Internet of Things is the main backbone of the majority of smart systems. The extensive use of the Internet and the growing interaction between humans and machines have increased the requirements for, and the importance of, IoT on a large scale. IoT has contributed in many ways to the development of smart and automated devices and systems. The availability of low-cost sensors that can share information easily is a major factor in the success and popularity of IoT [2].
Along with the developments in the technologies and rise in population, environ-
mental problems have also been increasing on a large scale. Pollution is one of the
leading problems among all. Any type of pollution is not only harmful for the nature
but also very much harmful for humans. Sound pollution is one such type of pollution
[3]. Much research has been done on the problem of sound pollution detection, and various rules have been imposed for controlling sound pollution. But the existing model requires a large amount of manual work in collecting data from the pollution spot and then imposing rules for its control [4]. Our proposed model, the "Smart Sound System Controller," is a smart system which not only detects sound pollution but also controls it in a completely automated manner. The "Smart Sound Controller" makes use of sound sensors to detect the sound system causing sound pollution in the restricted zones and stops it at that particular instance. This paper describes the design and working of our smart sound system controller using IoT [5].
2 Motivation
The main motivation for the system comes from festivals like Diwali, Shiv Jayanti,
Ambedkar Jayanti, Moharam, Navratri, Ganesh Utsav, etc. [6]. The noise level
increases up to 80–100 dB during evening hours against the normal level of 75 dB.
Due to this unwanted sound, not only do small children and older people get irritated, but young people are also troubled, which causes many problems [7]. This system will help to control that unwanted sound by using an RF transmitter and receiver together with IoT concepts: as noise increases in the restricted zones, the sound sensor senses it and sends a signal to the controller board, which keeps track of that data and sends RF signals to the RF receiver [8]. All the sensor data will be uploaded to the web server for record keeping.
3 Literature Survey
The smart sound control system is very beneficial for India when used in restricted areas such as hospitals, schools, and corporate areas. Our system provides the means to overcome the disadvantages of the existing systems.
A sound pollution monitoring system includes a device like the Arduino UNO, which is based on the ATmega328 microcontroller and helps analyze the sound and air pollution in a particular region. If there is any increase in pollution, it notifies people through an Android application so that appropriate action can be taken [9].
Air and sound pollution monitoring system based on IoT consisting of air and
sound level sensor is used to sense the live pollution of air and noise in that region
and transmit that data through Raspberry Pi over an application. That allows the
authorities to take required action against this noise and air pollution [10].
Air and sound pollution monitoring system using technology of IoT which
includes some sensors, namely MQ7, MQ6, MQ135, and LM35, to measure the
amount of carbon monoxide, cooking fumes, carbon dioxide, smoke, and temper-
ature, respectively, in the environment. And it also uses sound level sensor just to
detect the sound and air pollution. The output of the system obtained is in digital
form so that the authorities can take appropriate action against it [11].
IoT-based air and sound pollution monitoring system uses the sound level sensor
and gas sensor to detect the noise in restricted areas and measure the amount of toxic
gases in the air to report this data to authorities to take appropriate action against
the pollution [12]. The existing model is an embedded device which monitors the
levels of noise pollution and carbon dioxide in the atmosphere and helps in making
the environment intelligent as well as interactive with the objects using wireless
communication. The existing system model (Fig. 1) is more adaptive and distributive
in nature in monitoring the environmental parameters. The proposed architecture is a
4-tier model in which each module functions for noise and air pollution monitoring.
The 4 tiers of proposed model consist of the environment in tier 1, sensor devices
in tier 2, tier 3 for sensor data acquisition, and decision making and intelligent
environment in tier 4 [13].
4 Proposed System
Most silent/restricted zones, while ensuring that the necessary facilities are available to their users, also seek to provide an atmosphere conducive to research, study, and assimilation. However, noise is a major hindrance to achieving such conducive arrangements in our silent/restricted zones. Therefore, measures need to be taken to eliminate this problem. The proposed device is divided into two parts: one for the restricted zone, shown in Fig. 2, and a second for the sound system, connected between the sound system and the amplifier, shown in Fig. 3.
The proposed system monitors the sound and noise signals using the noise sensor and sends the data to the Arduino UNO board, after which the data can be tracked by the Arduino. As long as the sound stays in the range of 0–75 dB, no action is taken. But when the sound level rises above 75 dB, the device sends RF signals using the RF transmitter. The data monitored by the sound system is also sent to the web server using the Wi-Fi module.
The proposed system model is mainly used to control the noise pollution created by the sound/DJ system. To control the sound level in DJ sound systems, we have used two units. One unit is installed at the hospital (or at any restricted zone), and the other is connected to the sound-emitting device, such as a Dolby system. In the first unit, installed at the hospital, we have used a NodeMCU, an RF transmitter, and a sound sensor (shown in Fig. 2). In the second unit, connected to the sound-emitting device, we have mainly used an Arduino UNO module and an RF receiver (shown in Fig. 3). We have used an SD card adapter as the sound-emitting device.
When the sound-emitting device comes within a 1 km range of the hospital, the sound sensor of the hospital-side unit will start sensing the sound and will transfer those signals to the Arduino. The Arduino will then send the signals, in decibels, to the NodeMCU. When the signals are received, the level of the sensed sound is checked by the NodeMCU. If the level of the sensed sound goes above 75 dB, then with the help of the RF transmitter it sends radio frequency signals to all the units connected to the sound-emitting devices within the 1 km range of the hospital, so that the sound level of all the sound-emitting devices can be reduced by 50–80%. Alternatively, if you do not want to reduce the sound level of all the sound-emitting devices, you can also disable them by sending stop signals, which switch off the power supply of the sound-emitting device. The sound-emitting device will then remain disabled until it goes out of the restricted zone [1].
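A minimal Python-style sketch of the hospital-side decision logic is shown below. The 75 dB threshold and the reduce/stop behavior come from the text, while the function signature and the rf_send callable are illustrative assumptions, since the actual firmware runs on the NodeMCU/Arduino hardware.

NOISE_LIMIT_DB = 75  # permitted level in the restricted zone

def control_step(level_db, receiver_ids, rf_send):
    """Decide what to transmit for one sound-sensor reading.

    level_db     -- sound level sensed near the hospital, in decibels
    receiver_ids -- IDs of controller devices on sound systems within ~1 km
    rf_send      -- callable wrapping the 433 MHz transmitter (hypothetical)
    """
    if level_db <= NOISE_LIMIT_DB:
        return                          # 0-75 dB: no action is taken
    for rid in receiver_ids:
        rf_send(rid, "REDUCE")          # cut output by 50-80 %, or alternatively
        # rf_send(rid, "STOP")          # switch the sound system off entirely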
The most important requirement before playing a DJ/sound system on roads or at any public place is to obtain a No Objection Certificate (NOC) from the police inspector of that area. Without the NOC from the police, it is illegal to play the DJ/sound system. Hence, when the sound system owner or the person who wishes to play the DJ comes to the police station, the police officer completes the registration process with his approval and issues the NOC along with the controller device shown in Fig. 3. Only if the controller device is connected to the DJ/sound system will the owner be granted the NOC; if not, he has no legal permission to play his sound system.
Once the sound system has been played and the allotted time of permission for that registration expires, the sound system owner has to return the controller device to the police station. To ensure the security of the controller device, the police inspector can collect a deposit from the sound system owner while issuing it; when the owner comes back to return the device, the deposit can be refunded after verifying its condition (Fig. 4).
The communication between the transmitter and receiver is done using a simple transmission protocol, which sends messages in the form of packets from the transmitter to the desired receiver. This protocol is mainly useful when there are multiple receivers for one transmitter (Fig. 5). Hence, the data packets sent by the transmitter have a fixed format, which helps in identifying the desired receiver to which the data has to be sent.
Each receiver has a specific ID by which it is recognized. When the message is
sent by the transmitter, it has the receiver ID and then the corresponding message in
the packet. Receiver ID is matched with all receivers in the range, and then, the one
whose ID matches accepts the packet.
As shown in Fig. 6, the first 8 bits (1 byte) are reserved for the receiver ID and the
next 8 bits (1 byte) are reserved for the message. So according to our working, if any
system in the range is detected with sound signals greater than threshold value, the
transmitter transmits the packet with the receiver ID of that specific receiver and the
message to stop the sound system. On receiving the packet, corresponding action is
taken by the receiver-side device.
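The two-byte packet layout can be illustrated with a short Python sketch; only the one-byte-ID plus one-byte-message structure comes from the text, while the numeric command code and function names are assumptions.

import struct

STOP_CMD = 0x01                          # assumed code for "stop the sound system"

def pack_packet(receiver_id, message):
    """First byte: receiver ID, second byte: message (the Fig. 6 layout)."""
    return struct.pack("BB", receiver_id, message)

def handle_packet(packet, my_id):
    """Receiver side: act only if the packet's ID matches this device."""
    receiver_id, message = struct.unpack("BB", packet)
    if receiver_id == my_id and message == STOP_CMD:
        return "stop"                    # cut power to the sound system
    return "ignore"                      # packet addressed to another receiver

frame = pack_packet(0x2A, STOP_CMD)      # transmitter addresses receiver 0x2A
assert handle_packet(frame, my_id=0x2A) == "stop"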
xcluma Sound Sensor/Detection Module: This sound detection module is built around the LM393 comparator chip and an electret microphone, and comes with three sets of servo horns and fittings. It can be combined with photosensitive sensors to act as a sound-and-light alarm, and it can also be used for voice control and sound detection circuits through its switched output.
RF Module ASK Wireless Transmitter–Receiver Pair, 433/434 MHz, Compatible with Arduino and Raspberry Pi: It has an operating voltage of 3.6–5.5 V (5 V recommended) and a receiver sensitivity as high as −110 dBm. It is compatible with the HT12E/HT12D encoder–decoder pair, is packed in an anti-static packet to protect it during handling and shipping, and offers low power consumption with a good range (200–300 m).
ESP8266: The ESP8266 is a low-cost Wi-Fi microchip with a full TCP/IP stack and microcontroller capability, produced by the Shanghai-based manufacturer Espressif Systems. Its memory comprises 32 KiB of instruction RAM and 80 KiB of user-data RAM, and its default CPU speed is 80 MHz (up to 160 MHz).
3–24 V Piezo Electronic Tone Buzzer Alarm, 95 dB Continuous Sound: Its alarm diameter is 22 mm (0.86 in.) and its height is 10 mm (0.39 in.). It has two mounting holes spaced 30 mm (1.18 in.) apart and two wires of length 90 mm (3.54 in.). The buzzer type is piezoelectric, the sound pressure level is 95 dB, the rated voltage is 12 V DC, the operating voltage is 3–24 V, the maximum current rating is 10 mA, and the frequency is 3900 ± 500 Hz.
Arduino UNO: The Arduino UNO is an open-source microcontroller board based on the Microchip ATmega328P microcontroller and developed by Arduino.cc. The board is equipped with sets of digital and analog input/output pins that may be interfaced to various expansion boards and other circuits. Its CPU is an 8-bit Microchip AVR; it has SRAM for working memory, and flash and EEPROM for storage.
Micro TF Card Memory Shield Module (SPI Micro Storage Card Adapter) for Arduino: The module is a micro-SD card reader; through the SPI interface and a file system driver, the microcontroller can read and write files on the micro-SD card. Arduino users can directly use the SD library that comes with the Arduino IDE to initialize the card and perform read and write operations.
4.5 Experimentation
(a) This unit is placed in the restricted zone for detecting sound (Fig. 7).
The sound level sensor detects loud noise and sends a signal from the transmitter to the receiver at the sound system to control the sound level.
(b) This circuit is placed on the sound system for controlling the sound after the loud noise has been monitored (Fig. 8).
The receiver at the sound system gets the signal from the transmitter end and controls the sound if it crosses the limit.
(c) The data is monitored and sent to the server.
The sound signals detected by the sound sensor, together with the corresponding time instants, are shown in Table 1. The level of the sound coming from the sound system, detected by the sound sensor in analog form, is converted into digital form, i.e., into decibels (dB), and displayed at the wireless module through the NodeMCU.
Graph 9 gives a clear graphical representation of exactly how the system works. If the detected sound level is less than 75 dB, no action is taken by the system, but once the sound signal exceeds the threshold value of 75 dB, the sound system is stopped for some time. In the graph, we can see that at the time instance of 9 s the sound level reached 95 dB; here the entire working of the system takes place, and the sound system is stopped. While the sound system is in the stopped state, the sound sensor senses only the default values. When the system starts again, the same procedure is repeated. The limit is next exceeded at the time instance of 17 s with a sound level of 83 dB; hence, the sound system is stopped again.
5 Result
6 Conclusion
With the help of this smart sound control system, we can automatically control the level of sound when a high-noise-making system like a loudspeaker enters restricted zones such as hospitals and schools. After installation, the system can continuously track and control the sound level within the given range around restricted zones. Additionally, this system has advantages such as ease of use and lower cost in comparison with other systems developed to date for controlling sound pollution in restricted zones. This helps our system contribute toward Digital India and Skill India.
References
1. Chaitanya, Kulkarni, Kulkarni, Shruti, Bhopale, Siddhi, M.M. Raste. 2017 February. Sound and
air pollution monitoring system. International Journal of Scientific & Engineering Research 8
(Issue 2).
2. Joshi, Lalit Mohan. 2017 November. Research paper on IOT based air and sound pollution
monitoring system. International Journal of Computer Applications 178 (No. 7), (0975 – 8887).
3. Sai, Palaghat Yaswanth. 2017 March. An IoT based automated noise and air pollution moni-
toring system. International Journal of Advanced Research in Computer and Communication
Engineering (IJARCCE) 6 (Issue 3).
4. Sumithra, A., J. Jane Ida, K. Karthika, S. Gavaskar. 2016 March. A smart environmental
monitoring system using internet of things. International Journal of Scientific Engineering
and Applied Science (IJSEAS) 2 (Issue-3).
5. Guthi, Anjaiah. 2016 July. Implementation of an efficient noise and air pollution monitoring
system using internet of things. International Journal of Advanced Research in Computer and
Communication Engineering 5 (Issue 7).
6. Kaur, Navreetinder, Rita, Mahajan, Deepak, Bagai. 2016 June. Air quality monitoring sys-
tem based on Arduino microcontroller. International Journal Innovative Research in Science,
Engineering and Technology (IJIRSET) 5 (Issue 6).
7. Sai Chandana, P., K. Sreelekha, A. Muni Likith Reddy, M. Anil Kumar Reddy, R. Senthamil-
selvan. 2017 March. IOT air and sound pollution monitoring system. International Journal on
Applications in Engineering and Technology 3 (Issue 1).
8. Al-Ali, A.R., Imran, Zualkernan, Fadi, Aloul. 2010 October. A mobile GPRS-sensors array for
air pollution monitoring. IEEE Sensors Journal 10 (10).
9. Sharma, Anushka, Vaishnavi, Varshney, Roopank, Maheshwari, Upasana, Pandey. 2018 March.
IOT based air and sound pollution monitoring system. International Research Journal of
Engineering and Technology 5 (Issue 3).
10. Pan, Meng-Shiuan, Yu-Chee, Tseng. 2013. ZigBee Wireless Sensor Networks and Their Appli-
cations, vol. 12, issue 3.. Department of Computer Science, National Chiao Tung University
Hsin-Chu, 30010, Taiwan
11. Singh, Arushi, Divya, Pathak, Prachi, Pandit, Shruti, Patil, Priti. C. Golar. 2018. IOT based
air and sound pollution monitoring system. International Journal of Advanced Research in
Electrical, Electronics and Instrumentation Engineering 4 (Issue-3).
12. Deshmukh, Sarika, Saurabh, Surendran, M.P. Sardey. 2019 April. Air and sound pollution mon-
itoring system using IoT. International Journal on Recent and Innovation Trends in Computing
and Communication 7 (Issue IV).
13. Akinkaude, Shadrach Tunde, Kowawole, Peter Fasae. 2018 March. A Survey of Noise Pollution
in Ado-Ekiti Metropolis Using Mobile Phone, vol. 4, Issue 11. Science Technology Department,
Science Research Publishing.
Predictive Analysis of Co-seismic Rock
Fall Hazard in Hualien County Taiwan
Abstract Rock fall hazards pose a significant danger to human lives. Being the
most abundant among the slope failures in an earthquake event, rock falls are one
of the most destructive co-seismic events. A recent earthquake in Taiwan (Mw 6.1)
on April 18, 2019, has been analyzed, and artificial accelerograms were generated
using SeismoArtif software. In preserving the site properties, the Chi-Chi earthquake
accelerogram was used to match the spectral envelope. Data of rock fall during
earthquake in the Zhongbu cross island highway in the Hualien County was collected
from a dash-cam recording of the event. This rock fall was modeled in 2-D using
the ‘Rockfall’ software by Rocscience, and the number of rocks with respect to the
rotational energy of the modeled rock was studied. The artificial accelerogram was
used as an input to the predictive model, and the results predicted the Newmark’s
displacements. It was found that the predicted displacement values were significant
enough to trigger the rock fall but the topography as observed by simulation has
aided in the propagation of the rock fall.
1 Introduction
Co-seismic rock falls are the most common type of landslides that affect large com-
munities in mountainous regions in the event of an earthquake. Apart from posing a
threat to human lives, they are capable of instant property damage. There have been
incidents of houses being crushed by dislodged boulders during earthquake events
A. Sridharan (B)
Department of Physics, Amrita Vishwa Vidyapeetham, Amritapuri, India
e-mail: aadityans@am.amrita.edu
S. Gopalan
Department of Electronics and Communication Engineering, Amrita Vishwa Vidyapeetham,
Amritapuri, India
e-mail: sundar@am.amrita.edu
[1]. Predictive models that can identify regions prone to rock fall could reduce the
societal risk for communities in mountainous regions that are tectonically active
[2–4]. With the advancement in computational techniques, predictive analyses have
become more reliable for hazard prediction and modeling [5].
Many rock fall hazard modeling techniques have been proposed in the literature
[6, 7]. Generally, initial field conditions that cause rock fall such as the exact topogra-
phy are difficult to be assessed. Therefore, reliable predictive models have to employ
high resolution digital elevation model (DEM) and probabilistic methods in their
analysis [8]. Some of the recent models employ the values of acceleration obtained
from accelerogram to determine and predict earthquake-induced hazard [9]. The arti-
cle by Jibson in 2007 discusses models that can be used to assess seismic landslide
hazard [10]. The models were based on 2270 individual acceleration records from
30 worldwide earthquakes. While the proposed models give a global perspective to
assess earthquake-induced slope failures, region-specific prediction requires careful
selection of accelerogram records. The nature of seismic coefficient required for pre-
diction varies among the models, and some usual terms are peak ground acceleration
(PGA), static horizontal coefficient, peak ground velocity (PGV) and others [11, 12].
Introducing terms that are closely correlated with the dependent terms in the model
has been carried out very carefully by various models [13].
In this study, we present a two-phase analysis of a very recent rock fall event in
Taiwan. The event was reported on April 18, 2019, simultaneously along an earth-
quake of magnitude (MW 6.1). The field observations were recorded in a dash-cam
video of a car that nearly missed a massive rock fall. Analysis presented here explains
the possible reason for such a rock fall event triggered by the earthquake event.
Taiwan is situated in the Pacific Ring of Fire on convergent boundary of the ‘Eurasian
plate’ and the ‘Philippines sea plate.’ Devastating earthquakes such as the 1999 Chi-
Chi earthquake of moment magnitude MW 7.6 have occurred in this country [14,
15]. Recently, on April 18, 2019, there was a strong tremor due to an earthquake of
moment magnitude MW 6.1 with epicenter at 23° 59′ 20.38″ N, 121° 41′ 34.81″ E
off the eastern coast of Taiwan. An intensity map of the earthquake generated by
United States Geological Survey (USGS) is shown in Fig. 1. There were many rock
falls reported all over the Hualien County on the eastern coast of Taiwan. The event
considered for the analysis here was along the Zhongbu cross island highway situated
at 24° 12′ 12.77″ N, 121° 26′ 57.19″ E. The rock fall has been simulated, and the
results were correlated with Newmark’s displacement model.
Topography of the Hualien County of Taiwan comprises steep slopes generally
varying from 45° to 85° [17]. The highways in this region are covered by tunnels
at various stretches. Due to the steep topography and slope cuts along the highway,
this region is among the most vulnerable areas prone to rock fall in the country of
Taiwan.
2 Methodology
There are two main stages to the analysis presented here; initially the rock fall location
information was collected from the dash-cam video posted after the event shown in
Fig. 2. As the quality of dash-cam video was not very good, the time frame was not
easy to extract. But the video does show that within 3 s of earthquake the rocks reach
the highway. The first stage of the analysis involves simulation of rock fall based on
slope design and other parameters using the ‘Rockfall’ software [18]. The second
stage involves the predictive model that is based on strong motion records generated
for this particular earthquake using SeismoArtif software [19]. SeismoArtif generates
artificial accelerograms based on magnitude, location and hypocentral distance of
an earthquake. Artificial accelerograms were processed to calculate the probability
of failure for the rock slope using predictive models. Finally, the simulation results
were validated by the predicted values.
The topography of the slope was imported to the software from Google Earth as
a polyline that represents the slope. To make the simulation more realistic, a factor
called coefficient of restitution is input into the software, and coefficient of restitution
is the measure that can predict the impact of a rock fall [20]. The coefficient of restitution (Rc) can be defined as the ratio of the rebound velocity of the rock just after impact to its incident velocity just before impact.
Fig. 2 Pre-event and event images of the Hualien County rock fall, Taiwan. The top image (pre-event) was adapted from Google Street View; the bottom image (event) is a still from the dash-cam video recorded during the event
From the literature, it was found that a certain range of restitution values has been used in rock fall analyses for various cases in Taiwan [8, 21]. The coefficient of restitution has two components: normal (Rn) and tangential (Rt). The definition above changes accordingly for the two components: Rn uses the normal components of the velocities, and Rt uses the tangential components. Typical values of the two components were carefully studied, and values that closely represent the characteristics of the study area were selected for the analysis (Rn = 0.4–0.7, Rt = 0.5–0.9) [21].
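Since the defining equation itself was lost to the page layout, the sketch below restates the standard rebound-over-impact definition used here, split into its normal and tangential components; the variable names are illustrative, and the typical value ranges in the comments are those quoted from [21].

def restitution_components(v_in_n, v_in_t, v_out_n, v_out_t):
    """Normal and tangential coefficients of restitution for one impact.

    v_in_n, v_in_t   -- normal/tangential velocity components just before impact
    v_out_n, v_out_t -- normal/tangential velocity components just after impact
    """
    Rn = abs(v_out_n) / abs(v_in_n)   # normal component, typically 0.4-0.7 here
    Rt = abs(v_out_t) / abs(v_in_t)   # tangential component, typically 0.5-0.9
    return Rn, Rt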
Around 53 rock fall paths were simulated by the software and mapped on a 2-D slope model. Along with the coefficient of restitution, the evolution of the kinetic energy of the rock mass is another factor that affects the impact. The variation of the rotational energy of the rocks in the simulation was calculated, and the number of rocks corresponding to each value of measured angular velocity was plotted.
The distance of rock fall event from epicenter was calculated from Google Earth by
drawing a polyline, and the depth of the earthquake was reported to be at 20 km by
USGS [16]. Hence, the hypocentral distance was estimated to be 37.73 km. To gen-
erate the synthetic accelerogram in the SeismoArtif software, the site classification
information is important, and the classification is based on the shear wave velocity averaged over the top 30 m (VS30) at the site. Based on VS30 values reported for Taiwan, the value in the region of the rock fall is more than 760 m/s [22].
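As a quick check of the quoted figure, the hypocentral distance R follows from the epicentral distance D and the focal depth h as R = √(D² + h²); with R = 37.73 km and h = 20 km, the implied epicentral distance is D = √(37.73² − 20²) ≈ 32 km. The 32 km value is inferred here for illustration and is not stated in the text.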
Generating the synthetic accelerogram requires preserving site properties, and
variations in spectral accelerations are different for different sites. The natural period
of the accelerogram changes with different site classes. SeismoArtif considers the
site classification to produce the synthetic accelerogram; in our case, the site class
was B, and corresponding characteristics of study area were used as inputs. Other
inputs include the moment magnitude and the location of site with respect to the
plate boundaries.
The tectonic location of the study area is modeled in the software through one of several options: 'Inter-plate regimes,' 'Intra-plate regimes' or 'Regimes of active tectonic extension.' As mentioned earlier, Taiwan is an active tectonic region near a converging plate boundary, so the 'Regimes of active tectonic extension' option was used to generate the accelerogram; the 'Inter-plate regimes' option was another that could represent Taiwan. Chi-Chi earthquake data was used for matching the spectral envelope. Figure 3 shows one of the generated accelerogram plots.
Fig. 3 Simulated synthetic accelerogram generated by SeismoArtif software [19]. Due to space constraints, only part of the accelerogram has been displayed

The Newmark model is an intermediate model for analyzing earthquake-induced slope displacements; the conventional pseudo-static models and the finite element-based detailed models are less and more sophisticated, respectively.
tively. Also termed as rigid block analysis based on the permanent displacement of
landslide block, it has been one of the most successful models to estimate co-seismic
landslides [13]. Albeit being simplistic and with many approximations, this model
has been used in seismic landslide hazard analysis on a region level [13]. Various
statistical models have employed this analysis for predicting seismic slope failures
[2, 10, 23]. Hsieh and Lee in 2011 modified the earlier models for Taiwan based on Chi-Chi earthquake data; the resulting predictive equations are given in [24].
It was found that the average displacement values were 3.96 cm and 7.64 cm for models 1 and 2, respectively. In the field, a predicted Newmark displacement of a few tens of centimeters usually corresponds to major failure of a slope. The average values obtained here are significant enough to trigger the rock fall; the complete failure of the boulder, however, must have been due to the steep topography. The maximum slope in the area of the rock fall, calculated from the elevation profile, was found to be 60.8°.
Variation of the slope was found to be in the range 41°–60.8°; this shows that the trigger caused by the earthquake and the steep topography together aided the boulder in continuing its roll. The rotational velocity distribution is shown in Fig. 5, and increased values are observed upon impact near the highway. From the plot, out of the 53 simulated rock paths, on average every fifth rock fall path has an angular velocity of more than 10 rad/s. Given the approximate size of the boulder in the video still, it is safe to assume that this value could cause serious damage on impact.

Fig. 4 Predicted displacement values in cm plotted against critical acceleration values: a the predicted values for Eq. (1) and b the predicted values for Eq. (2)
Fig. 5 Rotational/angular velocity along the simulated rock paths for 53 trials. Y-axis shows the
number of rock paths that have corresponding angular velocity (rad/s)
Figure 6 shows the bounce height on the slope for the 95th percentile of rock paths (the simulations that are closest to the field observations). It can be seen that the bounce height peaks suddenly on reaching a distance of 10 m from the fall. In the video still in Fig. 2, the boulder is approximately 3–4 m above the ground, which is broadly consistent with the bounce height plot, which shows values around 8 m in the simulation.
From the simulated accelerogram, the Ia (Arias intensity) values were found to be large, which suggests a moderate to high ground shaking scenario [13]. Typical values of Ia are consistent with the literature [4, 25] and suggest that these values can be more reliable for assessing the rock fall than the displacements alone. While the magnitude of the earthquake is above 6, there can be attenuation of the kinetic energy of the ground motion over the 37.73 km separating the rock fall from the source. Higher Ia values, on the other hand, signify increased ground shaking; for boulders that are already poised to roll, this increased shaking could have provided the initial trigger. The 95th percentile of simulations shows close resemblance to the field observations, and these trials were used to assess the bounce height as well as the angular velocity.
It has to be kept in mind that the simulated rock fall paths need not fully converge
to the exact path that might be observed in the field. While most of the rock fall
paths are showing similar characteristics as the available field observations, it is
possible that the bounce height as mentioned earlier could be an over estimation.
Although the simulated values closely resemble the field observations, there may always be slight deviations between the calculated values and the actual field conditions. The synthetic accelerograms generated are also indicative of higher values of acceleration, as expected in a near-field condition (<100 km from the epicenter). Though the Chi-Chi earthquake accelerograms were used mainly for their site amplification properties, there might be certain local site effects during the tremor that have not been accounted for.
Recent techniques involve combining the Newmark’s algorithm with rock fall
simulations using Geographical Information System (GIS)-based analysis that have
been proved reliable for rock fall analysis [26]. Future work will involve using the
regional predictive capability of the Newmark’s rigid block model to enhance the
prediction to warn communities vulnerable to such rock fall hazards. Utilizing newer
physical models such as Newmark’s multi block analysis might aid in precision of
the predicted displacement [27]. Early warning systems based on such new models
combined with seismic hazard analysis [28, 29] and other parameters can improve
the safety of travelers along these highways [30].
4 Conclusion
A combination of predictive analysis and simulation for rock fall hazard has been
presented here. Moderate agreement between the two models shows the capabil-
ity of extending such work for wider range of analysis. The resultant analysis has
successfully correlated with the predictive algorithm to generate significant results.
Field observations based on the video stills have provided crucial information that
has aided in fruitful analysis of the rock fall. Highways along the Hualien County
are prone to such rock fall risks, and signboards have been placed at various places along the highway to warn travelers of imminent danger. Models such as the one presented above can be deployed in combination with earthquake warning systems to warn travelers early in real-time scenarios.
References
1. Jacklitch, C.J. 2016. A geotechnical investigation of the 2013 fatal rockfall in Rockville, Utah.
https://etd.ohiolink.edu/!etd.send_file?accession=kent1464978379&disposition=inline.
2. Ma, S., C. Xu. 2018. Assessment of co-seismic landslide hazard using the Newmark model
and statistical analyses: a case study of the 2013 Lushan, China, Mw6.6 earthquake. Natural
Hazards. https://doi.org/10.1007/s11069-018-3548-9.
3. Saygili, G., and E.M. Rathje. 2008. Empirical predictive models for earthquake-induced sliding
displacements of slopes. Journal of Geotechnical and Geoenvironmental Engineering 134:
790–803. https://doi.org/10.1061/(ASCE)1090-0241(2008)134:6(790).
4. Saygili, G., and E.M. Rathje. 2009. Probabilistically based seismic landslide hazard maps:
an application in Southern California. Engineering Geology 109: 183–194. https://doi.org/10.
1016/j.enggeo.2009.08.004.
5. Fan, X., G. Scaringi, Q. Xu, W. Zhan, L. Dai, Y. Li, X. Pei, Q. Yang, R. Huang. 2018. Coseismic
landslides triggered by the 8th August 2017 Ms 7.0 Jiuzhaigou earthquake (Sichuan, China):
factors controlling their spatial distribution and implications for the seismogenic blind fault
identification. Landslides 15, 967–983. https://doi.org/10.1007/s10346-018-0960-x.
6. Ii, F., P.D. Calcaterra, G. Pappalardo, P. Sebastiano, P. Zampelli, P.M. Fedi, S. Mineo. 2017.
Analysis of rock masses belonging to the Apennine-Maghrebide Orogen by means of in situ
and remote methodologies applied to rockfall risk assessment. www.fedoa.unina.it/11456/1/
Mineo_Simone_XXIX.pdf.
7. Harp, E.L. 2002. Anomalous concentrations of seismically triggered rock falls in Pacoima
Canyon: are they caused by highly susceptible slopes or local amplification of seismic shak-
ing? Bulletin of the Seismological Society of America 92: 3180–3189. https://doi.org/10.1785/
0120010171.
8. Ku, C.Y. 2014. A 3-D numerical model for assessing rockfall hazard. Disaster Advances 7:
73–77.
9. Du, W., and G. Wang. 2016. A one-step Newmark displacement model for probabilistic seismic
slope displacement hazard analysis. Engineering Geology 205: 12–23. https://doi.org/10.1016/
j.enggeo.2016.02.011.
10. Jibson, R.W. 2007. Regression models for estimating coseismic landslide displacement.
Engineering Geology 91: 209–218. https://doi.org/10.1016/j.enggeo.2007.01.013.
11. Sepúlveda, S.A., W. Murphy, R.W. Jibson, and D.N. Petley. 2005. Seismically induced rock
slope failures resulting from topographic amplification of strong ground motions: the case
of Pacoima Canyon, California. Engineering Geology 80: 336–348. https://doi.org/10.1016/j.
enggeo.2005.07.004.
12. Jibson, R.W. 2011. Methods for assessing the stability of slopes during earthquakes—a
retrospective. Engineering Geology 122, 43–50.
13. Jibson, R.W., E.L. Harp, and J.A. Michael. 2000. A method for producing digital proba-
bilistic seismic landslide hazard maps: an example from the Los Angeles, California, USA.
Engineering Geology 58: 271–289.
14. Wang, G., A. Suemine, F. Zhang, Y. Hata, H. Fukuoka, T. Kamai. 2014. Some fluidized
landslides triggered by the 2011 Tohoku Earthquake (Mw 9.0), Japan. Geomorphology 208,
11–21.
15. Wang, K.-L., and M.-L. Lin. 2010. Development of shallow seismic landslide potential
map based on Newmark’s displacement: the case study of Chi-Chi earthquake, Taiwan.
Environmental Earth Sciences 60: 775–785. https://doi.org/10.1007/s12665-009-0215-.
16. USGS. 2019. Earthquake overview—United States Geological Survey. https://earthquake.usgs.
gov/earthquakes/eventpage/us700038c1/executive.
17. Khazai, B., and N. Sitar. 2004. Evaluation of factors controlling earthquake-induced landslides
caused by Chi-Chi earthquake and comparison with the Northridge and Loma Prieta events.
Engineering Geology 71: 79–95.
18. https://www.rocscience.com/: Rockfall 2019—Rocscience. https://www.rocscience.com/help/
rocfall/#t=rocfall%2FGetting_Started.htm.
19. Seismosoft. 2018. SeismoArtif, https://www.seismosoft.com/seismoartif.
20. Li, L.Ping., S. qu. Sun, S. cai. Li, Q. qing. Zhang, C. Hu, S. shuai. Shi. 2016. Coefficient of
restitution and kinetic energy loss of rockfall impacts. KSCE Journal of Civil Engineering 20,
2297–2307. https://doi.org/10.1007/s12205-015-0221-7.
21. Wei, L.W., H. Chen, C.F. Lee, W.K. Huang, M.L. Lin, C.C. Chi, and H.H. Lin. 2014. The
mechanism of rockfall disaster: a case study from Badouzih, Keelung, in northern Taiwan.
Engineering Geology 183: 116–126. https://doi.org/10.1016/j.enggeo.2014.10.008.
22. Lee, C.T., and B.R. Tsai. 2008. Mapping Vs30 in Taiwan. Terrestrial, Atmospheric and Oceanic
Sciences 19: 671–682. https://doi.org/10.3319/TAO.2008.19.6.671(PT).
23. Saade, A., G. Abou-Jaoude, and J. Wartman. 2016. Regional-scale co-seismic landslide assess-
ment using limit equilibrium analysis. Engineering Geology 204: 53–64. https://doi.org/10.
1016/j.enggeo.2016.02.004.
24. Hsieh, S.-Y., and C.-T. Lee. 2011. Empirical estimation of the Newmark displacement from
the Arias intensity and critical acceleration. Engineering Geology 122: 34–42. https://doi.org/
10.1016/j.enggeo.2010.12.006.
25. Chousianitis, K., V. Del Gaudio, N. Sabatakakis, K. Kavoura, G. Drakatos, G.D. Bathrellos, and
H.D. Skilodimou. 2016. Assessment of earthquake-induced landslide hazard in Greece: from
arias intensity to spatial distribution of slope resistance demand. Bulletin of the Seismological
Society of America 106: 174–188. https://doi.org/10.1785/0120150172.
26. Yue, X., S. Wu, Y. Yin, J. Gao, and J. Zheng. 2018. Risk identification of seismic landslides by
joint Newmark and RockFall analyst models: a case study of roads affected by the Jiuzhaigou
earthquake. International Journal of Disaster Risk Science 9: 392–406. https://doi.org/10.1007/
s13753-018-0182-9.
27. Song, J., Q. Fan, T. Feng, Z. Chen, J. Chen, and Y. Gao. 2019. A multi-block sliding approach
to calculate the permanent seismic displacement of slopes. Engineering Geology 255: 48–58.
https://doi.org/10.1016/j.enggeo.2019.04.012.
28. Sitaram, T.G., N. James, and S. Kolathayar. 2018. Comprehensive seismic zonation schemes
for regions at different scales. Springer. https://doi.org/10.1007/978-3-319-89659-5.
29. Nirmala, V., K. Sreevalsa, S. Aadityan, R. Kaushik. 2016. An investigative study of seismic
landslide hazards. https://doi.org/10.2991/rare-16.2016.60.
30. Ramesh, M.V., Vasudevan, N. 2012. The deployment of deep-earth sensor probes for landslide
detection. Landslides 9, 457–474.
Abstractive Text Summarization
and Unsupervised Text Classifier
Abstract In this day and age, as the Internet gets increasingly cluttered with content, comprehending the huge texts generated is becoming a major inconvenience for a normal viewer, and classifying them is another tedious task. Text summarization and classification prove to be a boon, as they abridge and categorize massive text into a manageable length without removing the key information within it and assign it to its genre. In this paper, the authors have tried to implement the abstractive type of summarization using a sequence-to-sequence RNN. Using this model, the output obtained is a summary which is short, lossy, and whose length does not necessarily depend on the source text length; the authors have then used a bidirectional LSTM to assign a particular genre.
1 Introduction
Text summarization is the process of abridging very large texts into a concise and
fluent summary while keeping the key information and overall meaning intact [1].
The area of text summarization is becoming increasingly popular because, be it a small institution, a large Web site or a business firm, all require shortening of vast texts into readable summaries [2, 3]. Applications range from machine translation to speech recognition, and from image captioning to language identification [4].
2 Related Works
The authors are proposing a model for text classification and summarization of huge
text using a sequence-to-sequence RNN and a bidirectional LSTM. Much research has already been done in this field, but the task here was to merge both broad scenarios and provide results in one place.
The paper also elucidates several explanations about text summarization. The
work by Gupta, et al. is an overview in which they have clarified the facts about
abstractive text summarization [1]. They have thoroughly demonstrated the broad
categories of techniques of text summarization and have explained them in depth
[2, 3]. The paper also infers similar work done by Islam et al. in which they used
sequence-to-sequence RNN to translate Bangla text [4]. They have furthered their
study LSTM and have worked to implement this technique in the Python framework
[4]. The work in this paper is a largely inspired work from the paper of Nallap-
ati et al. where they proposed the system to attention encoder–decoder RNN and
achieved state-of-the-art performance and were able to address many critical prob-
lems in summarization [5]. Azmi et al. have implemented this to convert Arabic
text using user granularity [8]. The text is segmented text topic wise and applied
to an enhanced extractive summarizer inferred to as rules-based sentence reduc-
tion technique or RST-based extractive summarizer, and the length of the extracted
summary was modified to control the size of predicted summary. The results were
better in comparison with an abstractive model. In the work of Bhargava et al., the
authors have implemented the sentiment infusion to implement abstractive text sum-
marization [9]. The work done by Sahoo et al. implements sentence clustering using
Markov clustering principle followed by sentence ranking [10]. The then top-ranked
sentences from the clusters are worked through linguistic rules to form the summary.
Text summarization was also implemented in the work of Negi et al. in which the
authors have used NLP and ML to implement a mix of syntactic featuring to compre-
hend the writing style of incoming reports for supporting timely report monitoring
and perceive authoritative statements [11].
On the other hand, the work by Mirończuk et al. demonstrates the types of text classification techniques available and how they can be used [6]. Liu et al. implemented a bidirectional LSTM with an attention mechanism and a convolutional layer to classify text [7]; this approach provided better results than standard supervised techniques. The semantic text classification survey by Altinel et al. demonstrates the advantages of semantic text classification over conventional text classification, which include extracting and using latent in-word relationships and textual semantic understanding [12]. The work also compares the performance of knowledge-based approaches [13].
3 Summarization Algorithm
5. Steps 3–5 are iterated until the <end> token is generated or the maximum length of the target sequence is reached.
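The following is a self-contained toy sketch of this greedy decoding loop, using tiny random-weight encoder and decoder stand-ins (the attention layer is omitted). The layer sizes, vocabulary, token ids and model names are illustrative assumptions, not the authors' actual configuration.

```python
# Toy greedy decoding loop for a seq2seq summarizer (illustrative only).
import numpy as np
import tensorflow as tf

VOCAB, MAX_TEXT_LEN, MAX_SUM_LEN, LATENT = 50, 30, 8, 16
START, END = 1, 2  # assumed ids of the <start> and <end> tokens

# Stand-in encoder: source token ids -> final hidden and cell state.
enc_in = tf.keras.Input(shape=(MAX_TEXT_LEN,))
enc_emb = tf.keras.layers.Embedding(VOCAB, LATENT)(enc_in)
_, eh, ec = tf.keras.layers.LSTM(LATENT, return_state=True)(enc_emb)
encoder = tf.keras.Model(enc_in, [eh, ec])

# Stand-in decoder: previous token + states -> next-token distribution + states.
dec_in = tf.keras.Input(shape=(1,))
sh_in, sc_in = tf.keras.Input(shape=(LATENT,)), tf.keras.Input(shape=(LATENT,))
dec_emb = tf.keras.layers.Embedding(VOCAB, LATENT)(dec_in)
dec_out, dh, dc = tf.keras.layers.LSTM(LATENT, return_state=True,
                                       return_sequences=True)(
    dec_emb, initial_state=[sh_in, sc_in])
probs = tf.keras.layers.Dense(VOCAB, activation="softmax")(dec_out)
decoder = tf.keras.Model([dec_in, sh_in, sc_in], [probs, dh, dc])

def greedy_decode(source_ids):
    h, c = encoder.predict(source_ids, verbose=0)
    token, summary = np.array([[START]]), []
    while True:
        p, h, c = decoder.predict([token, h, c], verbose=0)
        next_id = int(np.argmax(p[0, -1, :]))
        # Step 5: stop on <end> or when the maximum summary length is reached.
        if next_id == END or len(summary) >= MAX_SUM_LEN:
            break
        summary.append(next_id)
        token = np.array([[next_id]])
    return summary

print(greedy_decode(np.zeros((1, MAX_TEXT_LEN))))
```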
4 Classifier Algorithm
5 Experimental Studies
Fig. 3 Diagnostic plot to observe the distribution of length over the text
Layer (type)                        Output Shape               Param #    Connected to
input_1 (InputLayer)                (None, 30)                 0
embedding (Embedding)               (None, 30, 100)            844000     input_1[0][0]
lstm (LSTM)                         [(None, 30, 300), (N…      481200     embedding[0][0]
input_2 (InputLayer)                (None, None)               0
lstm_1 (LSTM)                       [(None, 30, 300), (N…      721200     lstm[0][0]
embedding_1 (Embedding)             (None, None, 100)          198900     input_2[0][0]
lstm_2 (LSTM)                       [(None, 30, 300), (N…      721200     lstm_1[0][0]
lstm_3 (LSTM)                       [(None, None, 300), …      481200     embedding_1[0][0], lstm_2[0][1], lstm_2[0][2]
attention_layer (AttentionLayer)    [(None, None, 300), …      180300     lstm_2[0][0], lstm_3[0][0]
concat_layer (Concatenate)          (None, None, 600)          0          lstm_3[0][0], attention_layer[0][0]
time_distributed (TimeDistributed)  (None, None, 1989)         1195389    concat_layer[0][0]

Total params: 4,823,389
Trainable params: 4,823,389
Non-trainable params: 0

Train on 41346 samples, validate on 4588 samples
Epoch 1/50  – 85s 2ms/sample – loss: 2.8152 – val_loss: 2.5780
Epoch 2/50  – 79s 2ms/sample – loss: 2.4859 – val_loss: 2.4072
…
Epoch 19/50 – 80s 2ms/sample – loss: 1.7070 – val_loss: 2.0398
Epoch 00019: early stopping
In the plot (Fig. 4), we can see that the validation loss started to increase after the 17th epoch and kept increasing for two epochs, so early stopping was applied at the 19th epoch.
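Below is a self-contained toy illustration of the early-stopping setup reflected in the log above (monitor val_loss and stop once it keeps rising); the small dense model and random data are placeholders for the authors' sequence-to-sequence model, not the actual setup.

```python
# Early stopping on validation loss with Keras (illustrative toy example).
import numpy as np
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 20)).astype("float32")
y = rng.integers(0, 2, size=(1000, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# patience=2 mirrors the log: training stops two epochs after the best
# validation loss was observed.
es = EarlyStopping(monitor="val_loss", mode="min", patience=2,
                   restore_best_weights=True, verbose=1)
model.fit(x, y, validation_split=0.1, epochs=50,
          batch_size=64, callbacks=[es], verbose=0)
```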
Here are a few summaries generated by the model:
Review: gave caffeine shakes heart anxiety attack plus
tastes unbelievably bad stick coffee tea soda thanks
Original summary: hour
Predicted summary: not worth the money
In this work, the authors applied the attention mechanism on an RNN to find the best possible summary. The model's output was then transferred to the bidirectional LSTM classifier model to assign a category. The former model could be further improved by using a beam search strategy for decoding the test sequence. The accuracy could also be measured using the BLEU score, and pointer-generator networks and coverage mechanisms could be used to handle unique words. The classifier model can also be improved by using hierarchical attention networks or by using CNNs and RNNs with a greater number of layers.
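As one possible way to measure the accuracy mentioned above, the following hedged sketch scores a predicted summary against a reference with sentence-level BLEU using NLTK; the two example strings are purely illustrative.

```python
# Sentence-level BLEU between a reference and a predicted summary.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "not worth the money".split()
candidate = "not worth the price".split()   # hypothetical model output

smooth = SmoothingFunction().method1        # avoid zero scores on short texts
score = sentence_bleu([reference], candidate,
                      weights=(0.5, 0.5),   # unigram + bigram BLEU
                      smoothing_function=smooth)
print(round(score, 3))
```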
References
1. Gupta, S., and S.K. Gupta. 2019. Abstractive summarization—an overview of the state of the
art. Expert Systems with Applications, Elsevier 121 (1): 49–65.
2. Mahajani, A., V. Pandya, I. Maria, D. Sharma. 2019. A comprehensive survey on extractive
and abstractive techniques for text summarization. In Ambient Communications and Computer
Systems (Part of the Advances in Intelligent Systems and Computing), vol. 904(1), 339–351.
Berlin: Springer.
3. Gambhir, M., and V. Gupta. 2017. Recent automatic text summarization techniques: a survey.
Artificial Intelligence Review 47 (1): 1–66.
4. Islam, S., S.S.S. Mousumi, S. Abujar, S.A. Hossain. 2019. Sequence-to-sequence Bangla sen-
tence generation with LSTM recurrent neural networks. In: International Conference on Per-
vasive Computing Advances and Applications—PerCAA 2019, Procedia Computer Science,
152(1), 51–58. Amsterdam: Elsevier.
5. Nallapati, R., B. Zhou, C. Santos, Ç. Gulçehre, B. Xiang. 2011. Abstractive text summarization
using sequence-to-sequence RNNs and beyond. In The SIGNLL Conference on Computational
Natural Language Learning (CoNLL), 1–12.
6. Mirończuk, M.M., and J. Protasiewicz. 2018. A recent overview of the state-of-the-art elements
of text classification. Expert Systems with Applications 106 (1): 36–54.
7. Liu, G., and J. Guo. 2019. Bidirectional LSTM with attention mechanism and convolutional
layer for text classification. Neurocomputing 337 (1): 325–338.
8. Azmi, A.M., and N.L. Altmani. 2018. An abstractive Arabic text summarizer with user
controlled granularity. Information Processing & Management 54 (6): 903–921.
9. Bhargava, R., Y. Sharma, G. Sharma. 2016. ATSSI: abstractive text summarization using
sentiment infusion. In Twelfth International Multi-Conference on Information Processing, 1–8.
10. Sahoo, D., A. Bhoi, R.C. Balabantaray. 2018. Hybrid approach to abstractive summarization.
In International Conference on Computational Intelligence and Data Science (ICCIDS 2018),
1–10.
11. Negi, K., A. Pavuri, L. Patel, C. Jain. 2019. A novel method for drug-adverse event extraction
using machine learning. In Informatics in Medicine Unlocked, 1–6. Elsevier, In Press, Corrected
Proof.
12. Altinel, B., and M.C. Ganiz. 2018. Semantic text classification—a survey of past and recent
advances. Information Processing and Management 54 (6): 1129–1153.
13. Harish, B.S., B. Udayasri. 2014. Document classification: an approach using feature clustering.
In Recent Advances in Intelligent Informatics (Part of the Advances in Intelligent Systems and
Computing book series (AISC)), 235 (1), 163–173.
Image Classification Using Convolutional
Neural Network (CNN) and Recurrent
Neural Network (RNN): A Review
Abstract With the advent of new technologies, real-time data is essential for future development. Every day, a huge amount of visual data is collected, but to use it efficiently we need to recognize, understand and arrange this visual data properly. Neural networks were therefore introduced to find patterns in images, a form of visual data, much as neurons function in the human brain. It is a biologically inspired programming approach that allows the machine to learn from observational data. Neural networks have provided solutions to several problems in image recognition and are actively utilized in the medical field due to their efficiency. This paper concentrates on the use of RNNs and CNNs in feature extraction from images and the associated challenges. The paper also presents a brief literature review of neural networks such as CNN and RNN.
1 Introduction
With more and more visual data accumulating every day, automated image classification is increasingly required. Automated image classification can make image search procedures more efficient, and the whole process is beneficial for visual understanding in medical, security and military applications [1]. Here is an example illustrating the above context: an image can be assigned different tags such as 'temple,' 'river,' 'boat' and 'person.' These reasoning results may vary between people, since we all have different opinions. As shown in Fig. 1, there are several different aspects by which images are classified according to the concept of the image.
One of the fundamental tasks of visual data analysis, image classification came into existence to close the gap between computer and human vision. It places an image into a category based on the content and type of perception. Image classification uses pixel-wise segmentation, which identifies parts of the image and tries to find out where they belong, and it relies on object detection and classification. Image classification plays an important role in medical operations, therapy estimation, surgical simulation, anatomical structure review and treatment planning. Image classification can also be done manually, by pointing out the boundaries of the related regions of interest (ROI), which requires the intervention of experts. Classification can be done in two ways.
1. Semantic classification—It studies the image and classifies its pixels into meaningful objects which correspond to real-world categories. The pixels belonging to different objects need to be treated and grouped separately; hence, it performs pixel-wise classification.
2. Instance classification—It re-detects the same instance of an object. In this process, the system focuses specifically on look-alikes of the object, and if objects belonging to the same class look different, those objects will be ignored, since the process is based on instances.
Effects of clusters on image segmentation: Expert intervention is required to determine the optimal number of image clusters in each image, since image segmentation using clustering requires prior knowledge to determine the cluster centroids. Utilization of a metaheuristic optimization search algorithm can, however, be a probable solution for 'fuzzy clustering.' Fuzzy clustering is a type of clustering in which the data points can belong to more than one cluster, and owing to this robustness to ambiguity it can retain more information than other segmentation methods. Fuzzy c-means clustering (FCM) is one of the most popular fuzzy clustering methods. FCM based on artificial bee colony [6], FCM based on differential evolution [7], FCM based on ant colony optimization [8], FCM based on genetic algorithms, FCM based on the harmony search algorithm, FCM based on the firefly algorithm and FCM-based image segmentation all suffer from noise. The challenges related to FCM are as follows: FCM is sensitive to the initialization of the cluster centers and to outlier noise, and it requires the number of clusters to be chosen in advance. Hence, noise-sensitive images, the sensitivity to initializing cluster centers and the unpredictable number of clusters present in the dataset have been observed as the main challenges of fuzzy clustering.
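Below is a minimal sketch of the plain FCM update steps (not of the metaheuristic variants cited above); the data array, cluster count and fuzzifier value are illustrative assumptions.

```python
# Plain fuzzy c-means: alternating center and membership updates.
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial membership matrix U (n x c); rows sum to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Cluster centers as membership-weighted means.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distance from every sample to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
    return centers, U

X = np.random.default_rng(1).standard_normal((300, 2))  # toy 2-D data
centers, memberships = fuzzy_c_means(X, c=3)
print(centers)
```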
3 Overview of CNN
In deep learning, the convolutional neural network (CNN) stands out as one of the significant topics that help us in the recognition of faces, detection of objects and classification of images [9]. A CNN consists of many layers. These layers are connected like neurons in a biological brain: neurons carry 'messages' from one cell to another, and one neuron is surrounded by many other neurons. A CNN works on the same approach—one layer is connected with many other layers, so the output of one layer becomes an input for the layers directly connected to it [10]. A CNN takes input in the form of audio signals, images or other structured samples [11].
Working of CNN: As in Fig. 2, CNNs consist of three kinds of layer: the input layer, the hidden (fully connected) layers and the output layer. Each neuron of a given layer is connected to every single neuron of the subsequent layer [12]. The input layer takes 2D images as an array of pixels; to do this, the input images are represented in the form of pixel values. The main goal is to extract features from the image, which is done by analyzing the weights of the filters, and this helps to find the connections between patches and neurons [12].
Fig. 3 Calculate 'feature matrix' using image matrix and filter matrix
As in Fig. 3, CNNs extract features in three processes:
(1) To obtain local features, apply a set of weights.
(2) Apply multiple filters to obtain further characteristics.
(3) Spatially share the parameters of each filter.
These processes help us extract features from the image; e.g., as illustrated in Fig. 3, applying a 3 × 3 filter matrix to a 5 × 5 image matrix with a stride of 1 gives a convolved feature map. This operation is known as 'convolution.' The convolution layer uses different filters and performs many operations on the image, such as detecting edges, sharpening and blurring [9].
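The following small sketch makes this concrete: a 3 × 3 filter applied to a 5 × 5 image with stride 1, producing a 3 × 3 convolved feature map. The image and filter values are example values, not taken from the paper.

```python
# 'Valid' convolution (really cross-correlation, as in most CNN libraries)
# of a 3x3 filter over a 5x5 image with stride 1.
import numpy as np

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

def convolve2d(img, k, stride=1):
    kh, kw = k.shape
    oh = (img.shape[0] - kh) // stride + 1
    ow = (img.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = img[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * k)  # element-wise multiply and sum
    return out

print(convolve2d(image, kernel))  # 3x3 convolved feature map
```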
One can control the behavior of the convolution by changing the number and size of the filters. There are also several other parameters, as in Fig. 4 (a short configuration sketch follows this list):
(1) Stride: The number of pixels by which the filter matrix shifts over the input matrix is called the 'stride.' If the stride is two, the filter matrix shifts two pixels over the input matrix [9], and we get a smaller convolution matrix than before. The stride value should generally be kept low, because a low stride gives better resolution per filtered image due to greater overlap within the receptive field; on the contrary, a higher stride gives lower resolution per filtered image and less overlap in the receptive field (Fig. 5).
(2) Padding: Padding is useful when the filter does not fit the image exactly. There are two options: zero padding, which pads the picture with zeros, and valid padding, which drops the part of the image where the filter does not fit.
(3) Receptive field: The region of the input layer to which a neuron in a downstream layer is connected is termed its receptive field [12].
(4) Non-linearity (ReLU): ReLU stands for rectified linear unit; it is a nonlinear function which produces the output f(y) = max(0, y). Nonlinearity is introduced into the convolution network by ReLU.
(5) Pooling: Pooling is used to reduce dimensionality and preserve spatial invariance when the image is too big, and it reduces computational complexity. There are three different types of pooling:
1. Max pooling: takes the maximum value from a patch of the feature map.
2. Average pooling: takes the average of all values in the patch.
3. Sum pooling: takes the sum of all values in the patch.
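A hedged tf.keras sketch of how these parameters (stride, padding, ReLU and pooling) are typically specified is given below; the layer sizes and input shape are illustrative assumptions, not the configuration used by the authors.

```python
# Declaring stride, padding, ReLU and pooling in a small CNN.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    # 3x3 filters, stride 1, 'same' (zero) padding, ReLU non-linearity.
    tf.keras.layers.Conv2D(32, kernel_size=3, strides=1,
                           padding='same', activation='relu'),
    # Max pooling halves the spatial dimensions, keeping the strongest response.
    tf.keras.layers.MaxPooling2D(pool_size=2),
    # 'valid' padding drops border positions where the filter does not fit.
    tf.keras.layers.Conv2D(64, kernel_size=3, strides=2,
                           padding='valid', activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),  # average pooling over the map
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.summary()
```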
4 Overview of RNN
We human beings have the ability to think and draw conclusions on any topic. However, to infer anything, we need to know the context beforehand. If the context is known, we do not need to start from scratch every time we try to reach a conclusion. This cannot be done by traditional neural networks, and it is their main problem: they cannot remember past information, so they are unable to perform tasks like text prediction and language translation, which depend heavily on previous context. A few major problems often occur while implementing a traditional neural network: it does not work for long-term dependencies, it does not preserve order, and no parameter sharing occurs.
h_t = f(h_{t-1}, x_t)                                   (2)
Now, we have the previous state in addition to the current input, because the transformation is applied to the previous state together with the input. Each of these inputs corresponds to a time step. Taking tanh as the activation function, W_hh as the weight of the recurrent neuron and W_xh as the weight of the input neuron, the equation for the state at time t can be written as
h_t = tanh(W_hh h_{t-1} + W_xh x_t)                     (3)
and the output is
y_t = W_hy h_t                                          (4)
x <2> , x <3> , etc., which is not desired and gives wrong output [16]. LSTM is used to
solve this issue of long-term dependency.
Working principle and basic calculations of a neural network: The output of the hidden neurons forms part of the next step of the network: the network receives the output of the previous hidden state as an input, passes the hidden-state information back to itself, and then a new time step is processed. An 'activation function' is added to introduce nonlinearity. The sigmoid function transforms a linear output into a value between 0 and 1. With four layers each contributing a factor of 0.2, the result is 0.2^4 ≈ 0.0016, and most of the layers will perish with values nearing 0. This results in 'gradient vanishing.' A network whose gradients become 0 will not be able to learn and will no longer be functional. As a result of this problem, the low-level layers stop learning. But the low-level features are the fundamentals of the high-level features, so the vanished gradients do not only affect the output; they are a serious issue. The lower layers are important because they learn the simple patterns used for building up the network. The ReLU activation function has been introduced to avoid the vanishing gradient problem. In ReLU, only negative inputs end up being 0, so the backpropagated error is zeroed out only when the input is negative. In leaky ReLU, if the input is smaller than 0, the output is the input scaled by a small factor such as 0.01. Here, even if the activation input is negative, the output will not be exactly 0; hence, the neuron can be reactivated and reused, thereby improving the overall performance (Fig. 9).
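The following tiny numerical sketch illustrates the two points above: repeated multiplication by a small factor such as 0.2 shrinks a gradient towards zero, while ReLU and leaky ReLU behave as described. The numbers are illustrative only.

```python
# Vanishing gradients and ReLU / leaky ReLU behaviour, in miniature.
import numpy as np

grad = 1.0
for _ in range(4):
    grad *= 0.2          # e.g. a small derivative contributed by each layer
print(grad)               # 0.2 ** 4 = 0.0016 -> the gradient 'vanishes'

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)  # small negative slope instead of 0

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # negative inputs become exactly 0
print(leaky_relu(x))  # negative inputs are scaled by 0.01, not zeroed
```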
x_t = input, h_{t-1} = previous hidden state, c_{t-1} = previous cell state, c_t = output of the memory pipeline, h_t = hidden state output.
The layers are concatenated and then passed through a linear layer. The output dimension equals the hidden size, i.e., the number of LSTM units present. In Fig. 10, the output enters the sigmoid function, which translates the output to a value between 0 and 1.
f_t = forget gate.
Box 1 is responsible for how much to keep from the old memory:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)                     (5)
Box 4 determines what is required and what needs to be deleted from the old memory.
c_t = f_t ∗ c_{t-1} + i_t ∗ c̃_t                         (8)
Now, the new memory is constructed; Box 5 is responsible for this purpose.
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)                     (9)
The output is calculated based on the memory, and the gate calculated previously controls it:
h_t = o_t ∗ tanh(c_t)                                   (10)
The memory state is passed through a tanh activation function to generate a new hidden state in Box 5. The sigmoid function located before Box 5 controls how much to output [17].
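Below is a minimal numpy sketch of one LSTM step consistent with Eqs. (5)–(10). The input-gate and candidate-memory equations, which are not displayed above, are written in their standard form; all weight shapes and values are illustrative assumptions.

```python
# One LSTM cell step in plain numpy.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)               # Eq. (5): forget gate
    i_t = sigmoid(W_i @ z + b_i)               # input gate (standard form)
    c_tilde = np.tanh(W_c @ z + b_c)           # candidate memory (standard form)
    c_t = f_t * c_prev + i_t * c_tilde         # Eq. (8): new cell state
    o_t = sigmoid(W_o @ z + b_o)               # Eq. (9): output gate
    h_t = o_t * np.tanh(c_t)                   # Eq. (10): new hidden state
    return h_t, c_t

# Example with hidden size 3 and input size 2 (random illustrative weights).
rng = np.random.default_rng(0)
H, D = 3, 2
W = lambda: rng.standard_normal((H, H + D)) * 0.1
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.standard_normal(D), h, c,
                 W(), np.zeros(H), W(), np.zeros(H),
                 W(), np.zeros(H), W(), np.zeros(H))
print(h, c)
```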
5 Literature Survey
This is a literature survey of RNN and CNN. A detailed review of the architectures
of the different neural networks and their function is given here.
1. LeNet
This was recognized as the first successful application of a convolutional neural network. The work was done in 1990 by LeCun et al. [18] and later improved in 1998 [2]. They worked on 'handwritten digit recognition' using 'ConvNets,' applied to reading zip codes, house numbers, digits, etc. Due to the unavailability of machines with high computational power during that period, it took too much time to train CNNs.
2. AlexNet
Krizhevsky et al. [19] introduced this architecture. They are famous for making convolutional networks more popular in the field of computer vision. AlexNet is quite similar to LeNet; the major difference is that in LeNet there is alternation between convolution and pooling layers, while in AlexNet all convolution layers are put together. AlexNet is much bigger and deeper than LeNet. It won the ILSVRC (2012) competition [20].
3. GoogleNet
Szegedy et al. developed this architecture [21]. It earned its reputation at the ILSVRC competition (2014) with a newly developed architecture called Inception (v1). By using this architecture, they could improve the utilization of the computing resources in the network. GoogLeNet comprises 22 layers of Inception modules but with fewer parameters compared to AlexNet. Many improvements have since been made to Inception-v1. The most significant addition was the introduction of batch normalization, which formed Inception-v2, by Ioffe et al. [22]. With some further additions, the architecture is called Inception-v3 [23].
4. VGGNet
Andrew Zisserman and Karen Simonyan introduced VGGNet by analyzing the depth factor in a ConvNet, while the remaining parameters were held the same. By doing this, they ended up working with a large number of parameters in the network. The layers still consisted of '3 × 3' convolution filters. The innovation gave the community a new, more accurate version of the architecture that is better than the earlier ConvNets.
5. Inception V4
Szegedy et al. introduced this architecture [24] in 2015 as an extended version of GoogleNet. When the Inception modules (by Szegedy et al. [21]) were trained with residual connections (introduced by He et al. [25]), the training process was accelerated. The network produced state-of-the-art performance in the ILSVRC challenge and won the contest in 2015.
6. ResNet
The 'residual learning framework' was introduced by Kaiming He et al. The layers learn residual functions with reference to the layer inputs and avoid unreferenced functions. The ease of optimizing residual networks made ResNet superior, adding more accuracy. This network is expensive to evaluate because a large number of parameters are present. The removal of the first fully connected layers, however, did not affect the performance [25].
AlexNet: The first widely known convolutional neural network (CNN) was AlexNet. In AlexNet, convolutional layers are put together, with max pooling performed at each stage and dense layers at the end. It is deeper than LeNet.
GoogleNet: GoogleNet comprises Inception modules, which give a more efficient result when used within a convolutional neural network. It is very helpful in reducing the computational cost. GoogleNet comprises 22 layers of Inception modules.
VGGNet: VGGNet was introduced by analyzing the depth factor in a ConvNet. Each filter has size '3 × 3.' With the help of this network, one can work with a very large number of network parameters. It is deeper than AlexNet and consists of 16 convolutional layers.
Inception V4: This architecture is an extended version of GoogleNet, in which Inception modules are trained with residual connections to increase training speed.
ResNet: This architecture is similar to Inception V4. Layers learn residual functions from the inputs and avoid unreferenced functions. This network is expensive to evaluate because a large number of parameters are present.
Nowadays, RNNs are used in many different applications, and natural language processing with RNNs has recently been applied in many of them.
1. RNN for voice activity detection:
At the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Thad Hughes and Keir Mierle presented an RNN for VAD. They used Gaussian mixture models (GMM) and hand-tuned state machines (SM) for temporal smoothing [26].
2. Multi-lingual acoustic models using distributed deep neural networks:
At ICASSP 2013, a multi-lingual model was proposed by Heigold, Vanhoucke et al. using 'deep learning.' It was found to be useful in overcoming data difficulties and decreasing the performance gap between resource-rich and resource-scarce languages [27].
3. Multi-frame deep neural networks for acoustic modeling:
Multi-frame deep neural networks for acoustic modeling were introduced by Vincent Vanhoucke, Matthieu Devin and Georg Heigold at ICASSP 2013. Compared with Gaussian mixture (GM) models, the baseline deep model was costlier for real-time applications; the multi-frame model performs the same as the frame-synchronous model while reducing the computational cost of the neural network by up to 4× [28].
4. ASR error detection using a recurrent neural network language model and complementary ASR:
At ICASSP 2014, an approach to automatic speech recognition (ASR) error detection was proposed by Yun Lei, Wen Wang, Jing Zheng and Yik-Cheung Tam. This model is very helpful for the computer to understand human spoken language: it locates problems occurring in an utterance and then manages them [29] (Tables 1 and 2).
6 Conclusion
We discussed the process of image classification using CNNs and RNNs, their basic working principles and terminology in this paper. A brief review of the architecture of some early neural network models and their corresponding differences has also been presented in the literature survey. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are gradually occupying an important position in the fields of image classification and text prediction and are extensively used by data scientists for their ease of use and satisfactory outcomes. Neural networks can further be developed to provide better architectures for efficient memory management and to reduce exploding-gradient problems.
References
26. Thad Hughes, and Keir Mierle. 2013. RNN for voice activity detection. In IEEE International Conference on Acoustics, Speech and Signal Processing.
27. Heigold, G.V., A. Vanhoucke, P. Senior, M. Nguyen, M. Ranzato, and Devin J. Dean. 2013.
Multilingual acoustic models using distributed deep neural networks. In IEEE International
Conference on Acoustics, Speech and Signal Processing.
28. Vanhoucke, Vincent, Matthieu Devi, and Georg Heigold. 2013. Multiframe deep neural net-
works for acoustic modeling. In IEEE International Conference on Acoustics, Speech and
Signal Processing.
29. Yik-Cheung Tam, Yun Lei, Jing Zheng, and Wen Wang. 2014. ASR error detection using
recurrent neural network language model and complementary ASR. In IEEE (ICASSP).
Analysis of Steganography Using
Patch-Based Texture Synthesis
1 Introduction
insertion capacity relates to the size of the texture image. For a fixed-size image, inserting more data degrades the image appearance, which can be easily identified by an attacker. So, in order to avoid such a problem, the patch-based algorithm first resizes the existing image as per the requirements, and then the message is inserted into the image. Insertion of the message is achieved by a texture synthesis process. Texture synthesis can be done by two approaches, pixel-based texture synthesis or patch-based texture synthesis; we make use of patch-based texture synthesis. A patch is a small unit of an image. The image considered for texture synthesis is any digital image taken by a camera or drawn pictorially [1–3].
Specifically, this algorithm conceals the message in a source texture picture, as opposed to using an existing cover picture, and inserts the hidden message through a patch-based texture synthesis process. The patch-based texture synthesis process allows the hidden message and the source texture to be extracted from the synthetic texture. To the best of our understanding, steganography using a patch-based algorithm has not previously been reported in the texture synthesis literature. Test results have confirmed that this method supports different embedding capacities and produces an embedded image which is not distorted. It is also observed that there is very little chance of breaking this steganographic approach; the method can withstand an RS steganalysis attack.
2 Related Work
H. Otori and S. Kuriyama have given one approach for data-embedded texture synthesis. This technique has a major drawback in that it produces some errors during recovery of the message. The method joins pixel-based texture synthesis with data coding: the data is embedded in the form of dot patterns, and the embedding capacity depends on the number of data patterns used [4, 5]. K. Xu has given an approach for feature-aligned shape texturing. The approach pastes the source texture to form the whole image. The image formed is of uniform source texture, so the quality of the image is enhanced. One of the drawbacks is that, while pasting the texture, the neighbouring pixels need to be checked from time to time [6].
Efros and Freeman have given the image quilting methodology. In image quilting, patch stitching is performed by overlapping the neighbouring patches; due to this, the similarity between adjacent patches at the neighbouring side is increased, so minimum error is obtained through the overlapped region. Optimum boundary utilization is obtained by image quilting [7]. Z. Ni et al. have given an algorithm for recovery of a secret message hidden in an image texture. Two types of algorithms are present: the first is forward data hiding, and the second is the reversible data hiding algorithm. Generally, the reversible data hiding algorithm is used, in which distortion during recovery is less [8].
3 Methodology
Fig. 2 Elements of steganography process. a patch structure; b source texture kernel structure;
c overlapping process; d expansion by mirroring source patch
At the receiver side, an index table is generated which is exactly the same as that generated at the transmitter side. Each kernel is retrieved by referring to the index table and used to organize the source texture in order, hence recovering the recaptured input texture, which will be a precise copy of the source texture. In the third process, as per the index table, the composite image is formed. Finally, message extraction and authentication are done (Fig. 4).
NetBeans IDE 8.0.2 and JDK 1.8 were used to implement the scheme. The effect of variations of the steganography was found out by using a portable desktop computer system. The system outcome shows improvement in the embedding capacity as BPP reaches its optimum value. Table 1 shows results on different embedding capacities for different-size source textures with Ph * Pw = 58 * 58, Pd = 8, Th * Tw = 500 * 500. The embedding capacity is inversely related to the size of the source texture: a large source texture has less embedding capacity, because with a larger source texture there are more source patches, which ultimately reduces the number of embeddable patches and hence the total embedding capacity, given by
Total Embedding Capacity = Bits per patch ∗ Total embeddable patches.   (3)
where
BPPmax — bits per patch at maximum embedding capacity
Th — height of synthetic texture
Tw — width of synthetic texture
Ph — height of patch
Pw — width of patch
Sh — height of source texture
Sw — width of source texture
Pd — depth of patch (Table 1).
The variation and robustness of the steganographic outcome with optimum values are as per Figs. 5 and 6 and Table 2.
The accuracy of the image is defined using the mean squared error of the overlapped area (MSEO). This is the analysis in which the overlap between source patches is checked. The MSEO is always a positive quantity, and its value is inversely proportional to image quality: a low MSEO value indicates good image quality. The texture synthesis procedure yields the MSEO. Table 3 specifies the MSEO along with the embedding capacity for the source textures. This has been calculated for a patch size having equal height and width of 58 and a patch depth of 8. Image quality is compared using the mean squared error of the overlapped area. For the peanuts source texture, the MSEO is 2837 at 5 BPP and 2913 at 10 BPP, which indicates slightly lower image quality at the higher embedding capacity (Table 3).
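The following is an illustrative sketch of computing an MSEO-style value over the overlapped region between two patches; the way patches and overlaps are extracted here is an assumption for illustration, not the authors' exact procedure.

```python
# Mean squared error over the overlapped area between two adjacent patches.
import numpy as np

def mseo(patch_a: np.ndarray, patch_b: np.ndarray, overlap: int) -> float:
    """MSE computed only over the 'overlap'-pixel-wide region where
    patch_a's right edge meets patch_b's left edge."""
    region_a = patch_a[:, -overlap:].astype(float)
    region_b = patch_b[:, :overlap].astype(float)
    return float(np.mean((region_a - region_b) ** 2))

# Example with two random 58x58 patches and an 8-pixel overlap (Pd = 8).
rng = np.random.default_rng(1)
a = rng.integers(0, 256, size=(58, 58))
b = rng.integers(0, 256, size=(58, 58))
print(mseo(a, b, overlap=8))
```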
Table 3 Performance of MSEO
Source texture   Original   5 BPP   10 BPP
Ganache          1116       1125    1204
Peanuts          2832       2837    2913
Metal            8763       8768    8919
Rope net         6842       6846    6790
References
1. Johnson, N.F., and S. Jajodia. 1998. Exploring steganography: seeing the unseen. Computer 31:
26–34.
2. Provos, N., and P. Honeyman. 2003. Hide and seek: an introduction to steganography. IEEE
Security and Privacy 1: 32–44.
3. Akhade, K.O., S.V. Pingle, and A.D. Vidhate. 2019. Steganography using texture synthesis—a
review. IJIRCCE 7: 1798–1800.
4. Otori, H., and S. Kuriyama. 2007. Data-embeddable texture synthesis. In Proceedings of the 8th
International Symposium on Smart Graphics, Kyoto, Japan.
5. Otori, H., and S. Kuriyama. 2009. Texture synthesis for mobile data communications. IEEE
Computer Graphics and Applications 29.
6. Xu, K., D. Cohen-Or, L. Liu, H. Zhang, S. Zhou, and Y. Xiong. 2009. Feature-aligned shape texturing. ACM Transactions on Graphics 28.
7. Freeman, W.T., and A.A. Efros. 2001. Image quilting for texture synthesis and transfer. In
Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques.
8. Ni, Z., Y.-Q. Shi, N. Ansari, and W. Su. 2001. Reversible data hiding. IEEE Transactions on
Circuits and Systems 16.
9. Wu, Kuo-Chen, and Chung-Ming Wang. 2015. Steganography using reversible texture synthesis. IEEE Transactions on Image Processing 24.
Non-invasive Approach for Disease
Diagnosis
1 Introduction
and piezoelectric sensors had been explored. The main focus of researchers is the sensing logic and fabrication methodologies needed to detect a variety of gas concentration ranges, in order to achieve user-friendly sensors with high sensitivity and low cost. In optical-transduction gas sensing, the detection of gases is based on changes in absorption properties, refractive index, luminescence, scattering, reflectivity and optical path length. The optical absorption and luminescence techniques are based on changes in light intensity and wavelength once light has passed through the gases. Surface plasmon resonance (SPR) is the most effective technique in gas sensing. Even though optical gas sensors are excellent in sensitivity and cost effectiveness, their size and signal-conditioning requirements make them less attractive for fabrication in an E-nose. The piezoelectric gas sensors used in some E-nose systems are the surface acoustic wave (SAW) device and the quartz crystal microbalance (QCM). In the QCM, a piezoelectric disk is coated with a semiconductor metal oxide or a gas-sensitive polymer, and the travelling-wave resonant frequency is captured. Along the same lines, the SAW gas sensor captures the changes in propagation frequency through the sensing material coated on a piezoelectric substrate due to the gas. The salient features of piezoelectric gas sensors include compactness, low power consumption, robustness, quick response time and higher sensitivity. Due to the need to operate at higher frequencies and the complexity of the circuits, the signal-to-noise ratio for piezoelectric sensors is lower, making them a less effective element for an electronic nose system. Commercial electronic nose systems consist of polymer composites with a number of reversible physical and chemical properties; these composites can operate at room temperature with high sensitivity, accuracy and selectivity (Fig. 1).
2 Breath Analysis
The VOCs in an individual's breath will fluctuate; still, the basic core of the breath which is present in each individual remains the same. The basic breath-analysis techniques for separating inhaled or exhaled breath biomarkers are:
• Gas chromatography-based techniques, combined with a type of mass spectroscopy.
3.1 Diabetes
There are four fundamental diagnostic techniques for monitoring and diagnosing diabetes mellitus:
• Glycosylated haemoglobin test,
• Oral glucose tolerance test,
• VOC comprehensive test,
• Gene diagnosis.
The blood glucose detection techniques available in the market depend on laboratories and hand-held glucometer devices for sugar-level analysis [7]. These testing methods involve needle pricking for a blood sample, a complex operation, along with high cost. The exhaled breath contains a mixture of different natural gases among the VOC biomarkers, but the acetone ((CH3)2CO) concentration is viewed as a reliable indicator of diabetes, since it is discharged from the skin and its concentration correlates with β-hydroxybutyric acid [8]. Analysis of (CH3)2CO from an individual's skin is foreseen as a screening test for ketoacidosis and can also be utilized for diabetes control.
The exhaled breath biomarkers associated with lung disease, and their analysis by various research groups, are still not at a decisive and conclusive diagnostic stage due to the unstable mixture of natural gases in the captured samples [9].
With the era of personalized medicine demanding rapid bedside analysis and treatment, this area of molecular diagnostics is beginning to see an upsurge in biotechnological advancement. Here, we discuss recent improvements and directions in the development of breath VOC analysis and diagnostic platforms that offer the potential for disease biomarker discovery and disease prognosis. In practice, in order to diagnose lung malignancy, a low-dose chest computed tomography scan is an effective method.
There are presently around 7 million children around the world suffering from asthma; the WHO estimates that asthma is the most widely recognized non-communicable disease in children. This rapid rise has been seen in many different populations; however, asthma prevalence is not falling everywhere [10]. While over-diagnosis of childhood asthma undoubtedly added to the 'asthma epidemic', there was almost certainly a genuine rise in asthma prevalence associated with a move towards 'westernized living'. The important factors to which this is mostly attributed are changes in microbial exposures, air quality and activity levels.
4 Literature Survey
The human nose and brain are systematically interconnected for detecting and distinguishing numerous chemicals, leading to the recognition of food freshness, perfumes, strong odours and leakage of gases [11]. The complex volatile organic compounds, with the help of neuron receptors, send signals for pre-processing to the brain through nerve impulses. The hypothalamus and olfactory cortex store the signals and analyse them in the brain in the concluding steps. Along similar lines, the electronic nose technique contains a sensor array, a data acquisition system, a signal processing unit, a data storage unit and an AI neural network for the detection and screening of chemical vapours. Until 1980, only a limited variety of chemical sensors using electrical, thermal, mass and optical transducer principles was available, due to the scarcity of advanced electronic components and advanced computational tools. The world's first E-nose was developed by Persaud and Dodd at Warwick University, United Kingdom, in 1982, using a chemical sensor array and a pattern recognition system for gas discrimination; similar research was continued at the Hitachi Research Laboratory, Japan, by Ikegami et al. The E-nose was defined by Gardner and Bartlett in 1994 as "an instrument which comprises an array of electronic chemical sensors with partial sensitivity and an appropriate pattern recognition system capable of recognizing simple or complex odours." Since the last decade, researchers working on E-noses have been focusing on developing highly sensitive and selective sensors, a compact electronic module (electronic circuits for amplifying signals, signal conditioning, and analog-to-digital (A/D) converters) for better feature extraction with reduced redundancy, as well as a classifier (primarily a neural network) for the learning and validation process as the pattern recognition algorithm to discriminate the chemical of interest. Several E-noses have been developed by combining nonspecific sensor arrays and machine learning, for different applications such as biomedical, healthcare, safety, the food industry, the chemical industry and pharmaceuticals [12].
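To illustrate the pattern-recognition stage described above, the sketch below classifies synthetic sensor-array responses with a simple scikit-learn pipeline. The data, sensor count and classifier choice are illustrative assumptions and do not correspond to any specific published E-nose.

```python
# Toy pattern recognition for an E-nose: scale, reduce redundancy, classify.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.standard_normal((200, 16))      # 200 breath samples x 16 sensors
y = rng.integers(0, 2, size=200)        # 0 = healthy, 1 = disease (synthetic)

clf = make_pipeline(StandardScaler(),     # normalise each sensor channel
                    PCA(n_components=5),  # reduce redundancy in the array
                    SVC(kernel='rbf'))    # discriminate the odour classes
print(cross_val_score(clf, X, y, cv=5).mean())
```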
Table 1 shows previous work on medical E-nose with different techniques.
The major challenges faced in electronic nose technology during disease diagnostics are briefed below.
It is reasonable to check whether the classical graphical output of the E-nose can be transformed into a graphical representation that would streamline its content and make it easily comprehensible [13]. Such simplification would make researchers more confident about the practical use of the method and, in turn, more motivated to investigate its possibilities.
As of now, the E-nose has been projected as an indicative system for authentication and diagnostic strategies based on individual symptoms.
The ongoing evaluation of the breath print (BP) of individual diseases can only partly be translated into clinical practice because of three significant limitations of chemometrics:
• Age-related normal benchmarks are not available, even though age-related differences are very likely. This limits the interpretation and routine use of the BP as a diagnostic device.
• It is unknown to what degree the measured BP changes with short-term exposure to pollution. Thus, the same subject may have a different BP depending on whether the expirate is collected after a weekend at home or after a working day.
• Both acute and chronic diseases are rarely isolated in adults and, even more so, in the elderly [14].
Table 1 (continued) Previous work on medical E-nose with different techniques
9. Giorgio et al., Elsevier, 2019 — Study: experts' opinions; technique: breath printing; design: approach; main finding: exhaled breath generates a huge, significant library of breath prints.
10. Raffaele et al., Elsevier, 2019 — Study: non-neoplastic chronic diseases; technique: breath printing; design: case study; main finding: poorly suitable for rapid and large-scale application.
11. Xidong et al., journal, 2019 — Study: acetone sniffer (diabetes detection); technique: acetone detection; design: YSZ-based mixed-potential gas sensor with Sm2-xSrxNiO4 (x = 0.4, 0.6 and 0.8) sensing electrodes synthesized by the sol–gel method; main finding: low detection limit of 300 ppb to acetone (−1.8 mV).
12. Ayten et al., IEEE conference, 2018 — Study: lung cancer; technique: breath; design: 8 metal oxide semiconductor gas sensors and 14 quartz crystal microbalance sensors.
13. Dongmin et al., IEEE Transactions, 2018 — Study: odour analysis; technique: odour sampling; design: Tedlar gas sampling bag and an airtight box; main finding: airway inflammation sensitivity 73.79%.
14. Alphus, Sensors, 2018 — Study: gastrointestinal diseases; technique: VOC; design: WOLF E-nose with eight electrochemical sensors, two non-dispersive infrared optical devices and one photoionization detector; main finding: unique VOC profile signatures are an indication of complex mixtures of specific biomarker metabolites with unique properties.
15. Jiang et al., journal, 2018 — Study: diabetes (acetone recognition); technique: acetone; design: gas chromatography (an aluminium foil bag was used to collect breath acetone samples of 25 Type 2 diabetes patients and 44 healthy people); main finding: low feasibility.
In spite of the fact that VOC analysis is still at a fundamental stage, it is without doubt one of the most creative and intriguing methodologies in heart failure investigation. With the accessibility of cutting-edge innovations such as genome sequencing to ensure and validate results, medicine is switching to a platform of precise and customized treatment based on an individual's health risk profile analysis, in which the treatment choices are tailored to the characteristics of the patient [15]. Later on, with the advancement of diagnostic equipment performing VOC analysis, we would most likely breathe into our device comfortably at home, and the results of the analysis would be conveyed to the medical specialists.
6 Conclusions
The investigation of breath is non-invasive, and test sample collection can be simple, without any harm or discomfort to the individual. As of now, exhaled breath samples can be measured predominantly by gas chromatography and an electronic nose; these are the best practices in this field, with near-optimal third-generation capability and flexibility. Gas chromatography results can offer a high level of systematized complementary capability and accuracy, but at the cost of more space and expense and with the need for verification and validation of results by experts.
References
1. Guo, Dongmin, David Zhang, Naimin Li, Lei Zhang, and Jianhua Yang. 2018. A novel breath
analysis system based on electronic olfaction. IEEE Transactions on Biomedical Engineering.
2. White, Iain R., and Stephen J. Fowler. 2019. Capturing and storing exhaled breath for offline
analysis. In Breath analysis. Amsterdam: Elsevier.
3. Pennazza, Giorgio and Marco Santonico. 2019. Breathprinting roadmap based on experts’
opinions. In Breath analysis. Amsterdam: Elsevier.
4. Gao, Fan, Xusheng Zhang, Min Wang, and Ping Wang. 2019. Breathprinting based diagnosis,
selected case study: GCMS and E-nose collaborative approach. In Breath analysis. Amsterdam:
Elsevier.
5. D’Amico, Arnaldo, Giuseppe Ferri, and Alessandro Zompanti. 2019. Sensor systems for breath-
printing: a review of the current technologies for exhaled breath analysis based on a sensor
array with the aim of integrating them in a standard and shared procedure. In Breath analysis.
Amsterdam: Elsevier.
6. Brinkman, Paul. 2019. Breathprinting based diagnosis, selected case study: U-BIOPRED
project. In Breath analysis. Amsterdam: Elsevier.
7. Jiang, Y.C., M.J. Sun, and R.J. Xiong. 2018. Design of a noninvasive diabetes detector based
on acetone recognition. In Journal of Physics.
8. Zeng, Xianglong, Haiquan Chen, Yuan Luo, and Wenbin Ye. 2016. Automated diabetic
retinopathy detection based on binocular siamese-like convolutional neural network. IEEE
Access 4.
9. Tirzite, Madara, Māris Bukovskis, Gunta Strazda, Normunds Jurka, and Immanuels Taivans.
2019. Detection of lung cancer with electronic nose and logistic regression analysis. Journal
of Breath Research.
10. La Grutta, Stefania, Giuliana Ferrante, and Steve Turner. 2019. Breathprinting in childhood
asthma. In Breath analysis. Amsterdam: Elsevier.
11. Aguilar, Venustiano Soancatl, Octavio Martinez Manzanera, Deborah A. Sival, Natasha M. Maurits, and Jos B.T.M. Roerdink. 2018. Distinguishing patients with a coordination disorder from healthy controls using local features of movement trajectories during the finger-to-nose test. IEEE Transactions on Biomedical Engineering.
12. Subramaniam, N. Siva, C.S. Bawden, H. Waldvogel, R.M.L. Faull, G.S. Howarth, and R.G.
Snell. 2018. Emergence of breath testing as a new non-invasive diagnostic modality for
neurodegenerative diseases. Brain Research. Elsevier.
13. Pako, Judit, Helga Kiss, and Ildiko Horvath. 2019. Breathprinting-based diagnosis: case study:
respiratory diseases. In Breath analysis. Amsterdam: Elsevier.
14. Le Maout, P., J-L. Wojkiewiczy, N. Redony, C. Lahuec, F. Seguin, L. Dupont, A. Pudz, and S. Mikhaylov. 2018. From drifting polyaniline sensor to accurate sensor array for breath analysis. In IEEE.
15. Zhang, Wentian, Taoping Liu, Miao Zhang, Yi Zhang, Huiqi Li, Maiken Ueland, Shari L.
Forbes, X. Rosalind Wang, and Steven W. Su. 2018. NOSE: a new fast response electronic
nose health monitoring system. In IEEE.
16. Incalzi, Raffaele Antonelli, Antonio De Vincentis, and Claudio Pedone. 2019. Breathprinting-
based diagnosis, selected case study: nonneoplastic chronic diseases. In Breath analysis.
Amsterdam: Elsevier.
Alleviation of Safety Measures
in Underground Coal Mines Using
Wireless Sensor Network: A Survey
Abstract Mining is one of the oldest endeavors of mankind, along with agriculture. To ensure the personal safety of miners and to maximize the mining process, improvements in safety measures are a must, and wireless sensor networks play a vital role in monitoring the mining environment. This paper presents a survey of the improvements in safety measures in underground coal mines using wireless sensor networks, identifying the unexplored areas which need improvement. To ensure the quick rescue of miners, the ground station plays a vital role. In view of all the safety measures taken, together with the development of technology, it is feasible to apply some of these technologies with modifications. This paper surveys the need for extended research in wireless sensor networks and their applications to monitor the UG mine environment effectively.
1 Introduction
Mining is the process of obtaining coal or other minerals from a mine. Several activities are carried out within an underground mine. The workers who are engaged in mining are exposed to several harsh situations such as roof falls, mine fires, release of toxic gases, etc. They face several conditions unsuitable for communication and for the monitoring of systems. Safe mining depends on several environmental factors and
R. Rout (B)
Department of CSE, DRIEMS Autonomous Engineering College, Cuttack, India
e-mail: raghunathrout_78@yahoo.co.in
J. Pramanik
Centurion University of Technology and Management, Bhubaneswar, Odisha, India
S. K. Das
Department of CSE, Arihant Junior College, Ganjam, India
A. K. Samal
Department of CSE, Trident Academy of Technology, Bhubaneswar, India
conditions of mines, like airflow, surrounding temperature, humidity, dust and the gases present. For safe coal mining, the basic requirement is to regularly monitor the levels of gases such as methane, oxygen, carbon dioxide and carbon monoxide, to monitor air pressure and to detect mine fires. To combat the problem of toxic gases and overcome traditional mining hazards, WSNs are widely deployed in multidisciplinary areas where human intervention is not possible.
2 UG Mines: An Insight
The Earth holds an abundant amount of coal and mineral deposits underneath. Most coal companies try to extract deep deposits of coal safely by different methods. In room-and-pillar mines, columns of coal are left to support the ground; they are then often taken out and the mine is left to collapse, which is called subsidence. In longwall mining, mechanical shearers strip the coal from the mines; the support structures that allow the shearers access to the mine are eventually removed and the mine collapses.
This causes subsidence, as the land above starts sinking and mines crumble, resulting in severe damage to buildings.
It lowers the water table, altering the flow of the groundwater and streams. Several gallons of water are pumped out each year; only a small portion of it is used by the corporation or by the nearby villages, and the rest is wasted.
Coal mining additionally produces greenhouse gas emissions.
There are several key elements that build a WSN node, shown in Fig. 1.
The job of a WSN node consists of processing both locally sensed data and information shared by other sensor nodes. Embedded processors face several challenges with regard to computational energy (e.g., several of the units presently used in research and development employ an eight-bit, 16-MHz processor).
Fig. 1 Key elements of a WSN node (including the radio transceiver and power source)
3.2 Memory
Two kinds of memory are most widely used in WSN nodes: the on-chip memory of the microcontroller and FLASH memory. Off-chip RAM is rarely used. The amounts of memory and storage on an embedded WSN device are generally determined by cost considerations and are likely to improve over time.
3.3 Radio Transceiver
The primary task of a radio transceiver is to transmit and receive data between sensor nodes. It operates in different modes, such as on/off modes. Wireless sensor nodes contain a low-cost, short-range wireless radio (<100 m). Due to their limited capability, sensor node radios undergo only gradual improvement in cost, spectral efficiency, tunability, and immunity to noise, fading and interference. Radio communication is the most energy-intensive operation in a WSN device, and hence the radio must use energy-efficient schemes such as sleep and wake-up modes.
3.4 Sensors
Due to bandwidth limitations and energy constraints, wireless sensor nodes support only low-data-rate sensing. For this reason, several sensor nodes are required on board [1]. The sensors are application dependent and may include pressure sensors, humidity sensors, light sensors, temperature sensors, accelerometers, chemical sensors, etc.
To get optimal output from sensor nodes, the sensor locations must be predefined. This is possible by preconfiguring the sensor positions for deployment; however, this may only be feasible in constrained deployments. For outdoor operations, sensor nodes are deployed in an ad hoc manner. The locations of these nodes can be obtained effortlessly via satellite-based GPS, but in most applications only a small portion of the nodes is equipped with GPS functionality because of environmental and monetary constraints. The other nodes obtain their location by using network localization algorithms.
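As a hedged illustration of such a localization step (not a protocol from the surveyed papers), the sketch below estimates a node's position by linearized trilateration from ranges to GPS-equipped anchor nodes with known coordinates.

```python
# Least-squares trilateration from ranges to known anchors.
import numpy as np

def localize(anchors: np.ndarray, ranges: np.ndarray) -> np.ndarray:
    """Subtract the last anchor's range equation from the others to obtain a
    linear system, then solve it in the least-squares sense."""
    xn, yn = anchors[-1]
    A, b = [], []
    for (xi, yi), ri in zip(anchors[:-1], ranges[:-1]):
        A.append([2 * (xn - xi), 2 * (yn - yi)])
        b.append(ri**2 - ranges[-1]**2 - xi**2 + xn**2 - yi**2 + yn**2)
    pos, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return pos

anchors = np.array([[0.0, 0.0], [50.0, 0.0], [0.0, 50.0]])  # GPS-equipped nodes
true = np.array([20.0, 30.0])
ranges = np.linalg.norm(anchors - true, axis=1)             # ideal range data
print(localize(anchors, ranges))                            # ~[20, 30]
```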
Numerous reviews relating to safety in UG mines have been carried out (Andrey Somonov et al.). To our understanding, however, no systematic literature review (SLR) has been conducted. The paper [2] studied the sensor data by measuring analog voltage values and examining the characteristic responses of the gases. It focused more on technological progress in monitoring the environment, but it did not perform an SLR and covered the material in an unplanned manner. In this paper, we present an SLR on the wide application areas of WSNs in UG mines.
The SLR was performed in four steps of identifying, selecting and verifying studies, by critically examining the abstract, the full text and the keywords and determining their relevance to the SLR (Table 1).
5.1 Energy
Table 1 (continued)
Author(s) | No. | Topic | Year | Technique | Remarks | Included in SLR
Qin Xianli et al. | 5 | Monitoring the coal mine gases | 2011 | ZigBee module | Paper needs to focus more on sensitivity and accuracy of detection systems | Yes
Ajay K. Singh and Jaywardhan Kumar | 6 | Utilization of methane emissions in coal mines for generating clean energy | 2015 | Using methane extraction technologies such as CMM and VAM | Policy initiative needed for exploitation of gas | No
Andrey Somonov et al. | 7 | Methane leak detection using WSN | 2012 | Used WSN nodes (with embedded catalytic sensors) | WSNs cannot be used for long duration due to high power consumption | No
purpose, they stay passive during the periods when they do not exchange data, i.e., they switch to sleep mode.
5.3 Security
5.4 Efficient OS
Sensor nodes in coal mining contain a very limited amount of memory and processing power, so an efficient, small-footprint OS is needed to address this issue. The operating system of a sensor node must be capable of providing basic resource management and memory management and should be less complex than a traditional OS. TinyOS, Nano-Qplus, and Mantis OS are specially designed for WSNs but still need improvement.
Owing to the shrinking size of sensor nodes, the limited amount of resources such as memory, processing speed, and energy is also an issue for WSNs. A sensor node normally comprises a microcontroller, sensors, a transceiver, and a power backup. The sensors collect or sense data and send them to the microcontroller for processing; the microcontroller performs the necessary operations and transmits the result via the transceiver to the sink node. The microcontroller is responsible for basic WSN functions and communication protocols. For data storage, flash memory is suggested for use in nodes because it is cheap and fast. In order to save power, the microcontroller should work in three states: sleep, active, and idle.
For automatic, intelligent sensing of the environment, cognitive sensor networks are used in a localized manner by deploying a large number of sensor nodes in a particular location. Managing this large number of sensor nodes is a vital task and a challenge for researchers, which is why researchers are moving towards bio-inspired sensing and networking [7].
Two widely accepted examples that best describe cognitive sensing are quorum sensing and swarm intelligence. Swarm intelligence is developed using AI to study the cooperative behavior of decentralized, self-organized systems. Quorum sensing, an example of bio-inspired sensing and networking, is the capability of bacteria to coordinate their communication and behavior via signaling molecules.
Since most applications require sensor nodes that consume minimal power, and this requirement grows day by day, we can envision a future in which wireless devices such as wireless keyboards, smartphone headsets, and condition monitoring sensors become ubiquitous. However, the popularity of these common devices leads to increased interference and congestion within as well as among networks. Among the many approaches, cognitive radios and multi-frequency MACs aim to make use of several frequencies for parallel communication. A standard solution for WSNs, self-adaptive spectrum management (SAS), is supplied by [7] and can be easily integrated with the prevailing single-frequency schemes.
delay and transmitting information to the base station. Underwater sensor networks face several challenges compared to terrestrial sensor networks. The distributed underwater clustering scheme (DUCS) is a routing protocol that reduces the proactive routing message exchange and does not use a flooding approach. DUCS additionally makes use of data aggregation to eliminate redundant data.
The principal obstacle for most sensor nodes is the power constraint when coordinating with other networks. Sensor networks are very useful in domains such as health tracking, wildlife habitat tracking, forest fire monitoring, and building control. Sensor nodes deployed in UG mines must supervise the information of other sensor nodes and must remain able to communicate. This can be achieved by linking the wireless sensor nodes with the prevailing infrastructure such as the WWW, a LAN, or any private network [4].
Sensor nodes deployed in UG mines also face research challenges related to holes: because of node failures, nodes that need to transmit data sometimes cannot do so, or may not sense the correct data. Holes normally disrupt communication among the sensor nodes.
7 Conclusion
References
1. Tiwari, A., P. Ballal, and F.L. Lewis. 2007. Energy efficient wireless sensor network design and
implementation for condition based maintenance. ACM Transactions on Sensor Networks 3 (1):
17.
2. Henriques, V., and R. Malekian. 2016. Mine safety system using wireless sensor network. IEEE Access 4: 3511–3521.
3. Anastasi, G., M. Conti, M. Di Francesco, and A. Passarella. 2009. Energy conservation in wireless sensor networks: A survey. Ad Hoc Networks 7 (3): 537–568.
4. Arms, S.W., C.P. Townsend, and M.J. Hamel. 2001. Validation of remotely powered and
interrogated sensing networks for composite cure monitoring. In International Conference on
Composite Engineering 8(8):7–11.
5. Fernandes, LL. 2007. Introduction to Wireless Sensor Networks Report, University of Trento.
6. Zia, T. 2008. A security framework for wireless sensor networks. In International Conference
on Intelligent Sensors.
7. Yang, Guang-Zhong. 2008. Cognitive sensing for body sensor networks. In IEEE Sensors, 26–29.
8. Akyildiz, I.F., D. Pompili, and T. Melodia. 2005. Underwater acoustic sensor networks research
challenges. Ad Hoc Networks 3:257–279.
Automation of Reliability Warranty
Report Using SAS Software for Data
Analysis
1 Introduction
JMP has functionality for data preparation using techniques like scripting and recoding of data. Saving an analysis task so that it can be reused on the same or different data tables is an important step in creating an analysis workflow that is efficient and repeatable. The JMP scripting language (JSL) can be used along with the drag-and-drop tools available in JMP. Dashboard and Application Builder are used to quickly build and deploy dashboards and applications. Reports with data filters and parameterized applications can be created using JMP to analyze results, including interactive HTML. Multiple dashboards can be created using JMP, which makes the analysis easier and faster. In manufacturing industries, product reliability strongly influences the business process. Reliability tools in JMP help you to prevent failures and improve warranty performance. By using the reliability tools in JMP, outliers can easily be identified and an appropriate model can be fitted. JMP helps to find important design faults and pinpoint defects in product design and processes. Reliability prediction is used to identify the system requirements and perform analysis using an appropriate statistical distribution.
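The paper does not spell out which distribution is fitted in its analysis. As a hedged illustration of the kind of life data analysis described here, the sketch below fits a Weibull distribution to hypothetical failure ages with SciPy (not with JMP) and reports the fraction of units expected to fail within a warranty period; the data and warranty length are invented.

```python
import numpy as np
from scipy import stats

# Hypothetical failure ages in months (uncensored, for illustration only).
failure_ages = np.array([3.1, 5.4, 7.8, 9.0, 11.2, 12.5, 14.3, 16.9, 18.4, 21.0])

# Fit a two-parameter Weibull distribution (location fixed at 0).
shape, loc, scale = stats.weibull_min.fit(failure_ages, floc=0)

warranty_months = 12.0
prob_fail = stats.weibull_min.cdf(warranty_months, shape, loc=loc, scale=scale)

print(f"Weibull shape = {shape:.2f}, scale = {scale:.2f} months")
print(f"Expected fraction failing within {warranty_months:.0f} months: {prob_fail:.1%}")
```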
Reliability testing is often used to make decisions about warranty returns. A warranty database is created for financial reporting purposes, but it also provides reliable information about product design. When the product is released to market, reliability engineers estimate the warranty cost of the product after a few months. There are differences between the predicted and actual product design models, and these differences reveal failure modes. Using JMP for early detection of reliability issues is implemented in software and helps the company financially. The reliability process, a recent addition to SAS software, provides reliability tools and data analysis as well as recurrence data analysis; it provides analysis of all the failure data, including uncensored data.
The purpose of the JMP application and Application Builder is to provide the following capabilities for reliability analysis: the tool does not provide analysis of variance (ANOVA) but focuses on the predicted failure rate; it considers all factors for prediction; and it helps to understand confounding factors. It is not an assignable-cause analysis tool, but it steers the direction of the investigation by identifying the characteristics that significantly influence the predicted failure rate.
Statistical models help to encapsulate trends and patterns, which helps in taking the best decisions for organizational development. Developing models in JMP involves a statistical platform for building data-driven decisions. JMP provides different ways to fit linear and non-linear models and a platform to construct model terms. JMP includes advanced statistical techniques: principal components, clustering, predictive modeling, screening, and multivariate methods. It supports exploring meaningful results from data so that missing values or outliers can be easily removed, and it provides many multivariate methodologies with automated functionality.
– The statistical tool is used to dig into an identified issue, using fields that describe the engine to determine which characteristics are significantly influencing the failure rate.
– The tool is highly dependent on what data is available; users should be aware that the most influential drivers may not be in their dataset.
– Utilizing SAS JMP and scripting.
2 Proposed Work
The proposed work is based on life data analysis using statistical drivers. By using statistical tools, we can describe the failure modes in components. Reliability analysis is used to inform product decisions through the warranty report. The warranty report is used to make financial decisions but also helps drive reliability testing. Methods that calculate early faults in products and components help to detect reliability problems and support business decisions that minimize the cost of production. As soon as reliability analysis is performed on a product, the statistical report helps to forecast the warranty cost of the product. To identify reliability data, most organizations are deploying modern statistical tools. Sensors and smart chips can be easily installed in a product to capture product rate data. By using reliability testing, we can obtain the patterns which help to figure out the problems. Statistical methods using JMP software play an important role in reliability analysis. Engineers need to deploy new statistical methods to meet customer satisfaction, and today the required statistical tools are available for implementation (Fig. 1).
The proposed process incorporates information from distinct databases: data from the different databases are mapped into a consolidated one by using JMP functionality.
1. Download data
2. Merge and recode data
3. Analyze data and create graphs
4. Export graphs and generate summary reports.
1. Download Data
To find relevant information for analysis, data need to be downloaded from multiple databases. Fetching data from different data sources is time consuming; industries often lack documentation, have insufficient data to process, and find it difficult to handle large amounts of data in a tool like Excel. So, the JMP add-in was created to download data from different data sources, as it takes less time to download the data. The JMP add-in helps to remove missing values from the data table and is able to fetch a standardized dataset into JMP. Query Builder provides an interactive user interface to pull data from ODBC databases. Databases allow an organization to store massive amounts of data, but they are organized for efficient storage and transactions. Data analysis can be performed on different datasets. To make the workflow efficient, JMP has functionality to join multiple data tables; SQL and other tools can also be used to join them (Fig. 2).
Fig. 2 Proposed system to perform the reliability data analysis on the dataset
Steps for connecting to the database to fetch data using the JMP Add-in:
Step 4: Click on finish and provide the login information and server address.
Step 5: Provide the same ODBC connection in the JMP Add-in to fetch data directly from the database without manual intervention.
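The paper performs the data pull through the JMP Add-in and Query Builder. For readers working outside JMP, the same ODBC fetch can be sketched in Python with pyodbc; the DSN, credentials, table, and column names below are placeholders, not values from the paper.

```python
import pyodbc

# Placeholder connection details; substitute your own DSN, user, and password.
conn = pyodbc.connect("DSN=warranty_db;UID=analyst;PWD=secret")

query = """
    SELECT claim_id, product_id, failure_date, claim_cost
    FROM warranty_claims
    WHERE failure_date >= ?
"""

with conn:
    cursor = conn.cursor()
    cursor.execute(query, ("2019-01-01",))
    rows = cursor.fetchall()

print(f"Fetched {len(rows)} warranty claims")
```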
2. Merge and Recode Data
Once a dataset is ready, an analyst is able to perform statistical analysis. Recoding and cleaning are performed on the dataset. Performing this process in Excel is tedious, but JMP has functionality to recode data in one click. Once the dataset is ready to use, the analyst can perform downstream analysis. Much of the data is in an unstructured format, and visualization is not possible on such data; therefore, data transformation is the foremost step before visualization. In an unstructured dataset, many of the claims are missing their suppliers, and JMP helps to link the claims with the suppliers. Another data cleaning problem is that the data we get from the database are heterogeneous; this occurs due to manual human intervention in the log file while updating the data in the database. The time required to prepare data for analysis is long, but by using JMP we can prepare the data more easily, faster, and more reliably. No matter how you approach data cleaning, JMP automates the process. Where the customer requires a visualization, the user interface can easily be created using JMP. Process to perform a recode: Select column → COL → Recode.
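Outside JMP, the same kind of one-step recode can be approximated in pandas; the column name and the mapping below are invented purely for illustration of the cleaning step described above.

```python
import pandas as pd

# Hypothetical raw claim data with inconsistent supplier spellings.
df = pd.DataFrame({"supplier": ["ACME", "Acme Corp", "acme", "Beta Ltd", "BETA"]})

# Recode: map the messy values onto standardized labels.
recode_map = {"ACME": "Acme", "Acme Corp": "Acme", "acme": "Acme",
              "Beta Ltd": "Beta", "BETA": "Beta"}
df["supplier_std"] = df["supplier"].replace(recode_map)

print(df)
```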
3. Analyze and Create Graph
JMP helps to explore visualizations of graphs interactively so that reliability engineers can easily perform analysis, which helps them to understand the problems and take appropriate decisions for improvement of the product. Graphs provide a better understanding of our datasets, and effective graph transformation is available in the JMP tool. The purpose of the analysis is not only to create interactive graphs and statistics but to provide a better understanding of the dataset so that business decisions can be taken. Graphs display the dependencies between the variables used in the datasets; they show the distribution to predict, or graphically display, significant effects. JMP helps to communicate analysis results through dashboards. Patterns and trends in datasets are not easily visible in a spreadsheet; visualization tools help to make statistical functionality effective and enable discoveries. JMP graphs can be enhanced by using Graph Builder properties.
4. Export Graph and Generate Summary Report
The report can be saved as text, image, HTML, or PowerPoint. An automation system can be created to automatically update the graphs in JMP: when the .jsl script runs, the latest graphs are updated automatically.
Communicate results: After completion of the analysis, visualization is the best way to communicate the results, and JMP functionality gives you the tools to share the graphs. The following information is displayed by the graphs (Fig. 3):
• Product failure date.
• How the number of claims increased, with the predicted RPH increasing as well, during this time.
3 Conclusion
JMP is a rapidly developing statistical and visualization tool for the reliability analyst. Through the JMP scripting language, JMP offers validated statistical analysis and an integrated tool to extend reporting on demand. This paper has demonstrated how to monitor an organizational business process, as business processes are ongoing and recurring work. The JMP scripting language is used to automate the business process, which helps to reduce analysis time. Through simple point-and-click operations, data analysts and reliability engineers can visualize product relationships. JMP automation can be used for statistical modeling, report building, and text processing. Analysis through JMP is interactive rather than static. JMP promotes continuous improvement and best practices through a custom interface and delivers standardized warranty agreements for suppliers. Visualization of the JMP report serves as a basis for competent discussion and business decisions. JSL speeds up the work to get results and makes it possible to correlate the data using scripts. JMP configures the work and provides consistent graph output, helping to stabilize and optimize the process without loss of time. The warranty report and the reliability analysis report are the two most important means to understand new and current products. By using the JSL profiler, we can tune a script's performance and iteratively optimize code for peak efficiency and speed.
Automatic exporting of graphs provides:
– Easy organization of the report
– Retained data interactivity
– Improved cycle times
– Standardized and transparent reporting
– Countless hours of manual processing saved
– Improvements in efficiency, scalability, and best practices
– Statistical and graphical elements presented using JMP reports
Save scripts to regenerate analysis reports without writing any code.
A Critical Analysis of Brokering Services
(Scheduling as Service)
Abstract Cloud of things integrates cloud computing and the Internet of things. The cloud broker acts as an interface between cloud service providers and service consumers. The objective of this paper is to provide in-depth knowledge on various aspects of a cloud broker, focusing on scheduling as a service provided as a brokering service. This paper will provide researchers with initial background knowledge on cloud brokering services and a platform to explore the open research issues related to scheduling as a service in the cloud of things.
1 Introduction
Nowadays, the rapid growth of connected devices across networks leads to the necessity of analyzing and storing huge volumes of data. Cloud of things integrates both cloud computing and the Internet of things [19]. In the recent past, the cloud of things has been gaining popularity due to its ability to handle, in the cloud and in an effective manner, the massive data that come from things (smart watches, smart phones, smart vehicles, laptops, home appliances, etc.). The cloud brokerage service is an intermediary between cloud service providers and service consumers that conducts the negotiation process, ensuring good performance for both [21]. The cloud broker creates an interface to facilitate communication between the service consumer and the service providers. A simple example of cloud brokerage (as an intermediation service) is price negotiation, which helps the cloud consumer to select the desired cloud service [19].
The paper is organized as follows. The basic concepts and background of brokering services are presented in Section 2. Task scheduling as a brokering service
This section gives an overview of brokering concept and its major components. The
necessity of brokering services, techniques, and types are presented to strengthen the
foundation concepts related to brokering services.
2.1 Brokering
The term cloud brokering was coined at the beginning of the appearance of cloud computing. According to the National Institute of Standards and Technology (NIST) [18], a cloud broker is “an entity that manages the use, performance, and delivery of cloud services, and negotiates relationships between Cloud Providers and Cloud Consumers.” In more precise terms, a cloud broker is a third-party auditor between cloud service consumers and cloud service providers. It manages multiple services such as data management, budgeting management, provisioning needs, and work processes. The cloud service broker is not a cloud service provider but a mediator that works on typical brokerage process principles [13].
Cloud brokerage is a time- and cost-saving technology that enables the management of all operations between cloud service consumers and cloud service providers. The principal components of a cloud broker are presented in Fig. 1.
The service consumer is the individual or organization who pays for the service. They use an interface to gain access and submit their requests to the broker. In addition, they are able to monitor and manage the services after deployment by use of a management console [8].
• Abstract Cloud API: This component is used to manage cloud resources and pro-
vide an API to the cloud user to interact with heterogeneous clouds and different
cloud providers.
The huge volume of data transferred in the cloud of things requires access authorization, identity management, ease of storing in the right location, and many other processes that make the data safer and accessible to the owner in an easy way with good performance and low cost. The cloud broker is a business model that acts as a third party to help customers choose the right resources and track their activities with less time and minimal effort [1, 2]. Moreover, the cloud broker facilitates the customer in choosing and contracting the services and deploys the applications onto the cloud platform. It is responsible for event or incident reporting, such as scaling requests or renewal requests. The cloud broker supervises the billing and pricing service, which is considered a serious aspect from the point of view of customers and service providers. The cloud broker performs several operations in the cloud such as policy management, security, identity management, access management, data scheduling, and data storage.
2.5 Challenges
The cloud broker provides the customer with three categories of services: service aggregation, service arbitrage, and service intermediation [10, 16].
Scheduling is one of the most important services provided by the broker, through which the broker organizes traffic between users and cloud resources. It is a set of policies that dispatch workflows to the proper node machines to improve system efficiency and reduce the total execution time and computing cost [23]. The scheduling and control of workflows by the broker in cloud of things environments is presented in Fig. 4.
The cloud broker is responsible for autonomous scheduling decisions while transferring data from users to the cloud and vice versa. The aim of scheduling is to optimize metrics such as data integrity, performance, and execution time [14]. The role of a cloud broker in a cloud of things environment [24] is shown in Fig. 5.
The following steps summarize the process of scheduling data in cloud broker.
• Step 1: Registration of resources in cloud information services (CIS).
• Step 2: User submits tasks to broker with complete specification.
• Step 3: The broker places all submitted tasks in a task set.
• Step 4: The broker queries resources from cloud information services (CIS).
• Step 5: Cloud information services (CIS) returns the attribute of resources such
as the number of processing elements, allocation policy, and number of virtual
machines.
• Step 6: Broker ensures the registered resources are available by sending a query
to resources.
• Step 7: Based on responses, the broker makes resources available to allocate
process.
• Step 8: The broker submits the task set to an available virtual machine with the aid of a specific scheduling algorithm, such as the first come first served (FCFS), Min-Min, round robin (RR), or Max-Min algorithm.
• Step 9: Tasks executed and results returned to the broker.
• Step 10: Broker returns tasks execution status and results to the user.
The sequence diagram for the process of scheduling data in cloud broker is shown
in Fig. 6.
The Min-Min algorithm works by computing the minimum completion time for each task. The task with the overall minimum completion time is executed first and submitted to the corresponding machine for processing. The task with the next minimum completion time is scheduled next for execution, and this process repeats until all tasks are mapped. The completion time CT(j, r) of a task is computed as

CT(j, r) = ET(j, r) + RT(j, r)

where ET(j, r) is the execution time of task r on resource j, and RT(j, r) is the ready time for resource j to take the job r. The Min-Min scheduling algorithm is shown in Fig. 7.
Assume that we have twenty tasks (Task0, Task1, Task2, Task3, …, Task19) submitted by different users. The service broker receives the tasks from customers with the “REQUEST” tag and schedules them by applying the Min-Min algorithm. The task with the overall minimum completion time is executed first (T4), then the task with the next minimum completion time (T5) is executed, and so on. The result of scheduling the twenty submitted tasks is shown in Fig. 8.
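The following is a minimal sketch of the Min-Min policy as described above, using CT(j, r) = ET(j, r) + RT(j, r); the execution-time matrix and the number of VMs are invented for illustration, and no claim is made that this matches the exact simulation setup behind Fig. 8.

```python
def min_min_schedule(exec_time):
    """Min-Min scheduling.

    exec_time[r][j] is the execution time of task r on resource j.
    Returns a list of (task, resource, completion_time) in scheduling order.
    """
    n_tasks = len(exec_time)
    n_res = len(exec_time[0])
    ready = [0.0] * n_res            # RT(j): when resource j becomes free
    unscheduled = set(range(n_tasks))
    schedule = []

    while unscheduled:
        best = None                  # (completion_time, task, resource)
        for r in unscheduled:
            # Minimum completion time CT(j, r) = ET(j, r) + RT(j) for this task.
            ct, j = min((exec_time[r][j] + ready[j], j) for j in range(n_res))
            if best is None or ct < best[0]:
                best = (ct, r, j)
        ct, r, j = best
        ready[j] = ct                # resource j is busy until the task finishes
        unscheduled.remove(r)
        schedule.append((r, j, ct))
    return schedule

if __name__ == "__main__":
    # Hypothetical execution times for 5 tasks on 2 VMs (arbitrary time units).
    et = [[14, 16], [11, 13], [16, 9], [7, 12], [10, 10]]
    for task, vm, ct in min_min_schedule(et):
        print(f"Task{task} -> VM{vm}, completes at {ct}")
```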
5 Conclusion
The cloud broker is considered a major part of the cloud environment. This paper focused on the concepts and requirements of a cloud broker in the cloud of things environment. It explored the challenges and open research issues associated with cloud brokers, focusing on scheduling as a service and taking Min-Min scheduling as a case study. This paper will provide researchers with initial background knowledge on cloud brokering services and a platform to explore scheduling as a service in the cloud of things.
References
1. Aazam, M., and E.N. Huh. 2017. Cloud broker service-oriented resource management model.
Transactions on Emerging Telecommunications Technologies 28 (2): e2937.
2. Alkhanak, E.N., S.P. Lee, and S.U.R. Khan. 2015. Cost-aware challenges for workflow
scheduling approaches in cloud computing environments: taxonomy and opportunities. Future
24. Singh, L., and S. Singh. 2013. A survey of workflow scheduling algorithms and research issues.
International Journal of Computer Applications 74 (15).
25. Smanchat, S., and K. Viriyapant. 2015. Taxonomies of workflow scheduling problem and
techniques in the cloud. Future Generation Computer Systems 52: 1–12.
26. Tang, S., et al. 2018. Long-term multi-resource fairness for pay-as-you use computing systems.
IEEE Transactions on Parallel and Distributed Systems 29 (5): 1147–1160.
27. Varshney, S., and S. Singh. 2018. A survey on resource scheduling algorithms in cloud
computing. International Journal of Applied Engineering Research 13 (9): 6839–6845.
28. Wadhonkar, A., and D. Theng. 2016. A survey on different scheduling algorithms in cloud
computing. In 2016 2nd International Conference on Advances in Electrical, Electronics,
Information, Communication and Bio-Informatics (AEEICB). IEEE.
29. Yu, X., and X. Yu. 2009. A new grid computation-based Min-Min algorithm. In 2009 Sixth
International Conference on Fuzzy Systems and Knowledge Discovery. IEEE.
30. Zhang, Q., and Z. Li. 2009. Design of grid resource management system based on divided Min-
Min scheduling algorithm. In 2009 First International Workshop on Education Technology and
Computer Science. IEEE.
Application of Neural Networks in Model
Updation Using Frequency Response
and Radial Basis Function Network
Tumbanath Samantara
Abstract The main objective of this paper is to construct a function from given data, which are functional values defined at some discrete points. Many traditional methods, both numerical and statistical, are available and give good approximate results to some extent. But if the data set is very large, the computational complexity rises. Again, in the case of dynamic data, the working procedure must start from the beginning every time and a new function must be constructed. In this paper, a model is developed using the neural network method, a non-traditional algorithm, which overcomes the above difficulties and gives results closer to the accurate ones. Also, when the input data change, the problem need not be solved afresh from the beginning; a good approximate solution can be achieved by updating the parameters generated from the previous data set.
1 Introduction
The model updation can be defined as modifying the mathematical model so that the relation between input and output data is in very good agreement. Many engineering and scientific problems can be modelled, and outputs very close to the accurate ones obtained, by using a neural network. The common objective of an engineering or scientific problem is the formation of a function on the basis of knowledge obtained from some examples of input–output pairs. The neural network community calls this process supervised learning [1].
This paper presents a technique that enables updating of models that are weakly non-linear in nature. Model updating of engineering problems has been an important topic of research during the last 25 years [2]. Whenever we are going to find a good
T. Samantara (B)
Centurion University of Technology and Management, Bhubaneswar, Odisha, India
e-mail: tnsamantara@gmail.com
approximation model to the existing accurate model, we should minimize not only the difference between the predicted output and the actual output but also the difference between the existing and updated parameters.
In most direct optimization problems, the problem has to be solved many times in order to obtain an optimal solution. The problem is solved for a basic feasible point in the search space and, if that point does not yield the optimal solution, the solution along with the point is discarded and another point in the search space is approached. The next time we solve the problem, the previous values are not used. In our proposed method, i.e. the neural network method, a mapping is instead created between the frequency response functions and the model parameters by using the results of the discarded search solutions. If a parameter of the model changes in the future, the earlier created mapping can be applied to update this parameter without solving the problem from the beginning. Using these updated parameters, a model can be created which is consistent with the experimental data.
Out of the different types of neural network organization, radial basis function networks are used here for the model updating problem.
2 Problem Statement
In this paper, we analyse the relation between load and frequency in a structure and construct a map by using a non-traditional algorithm, namely a neural network. The map consists of parameters that can change with time. An adaptive model [3] has been developed such that, for any change of the model parameters [4] in the future, the constructed map can be applied to update these parameters without again solving the direct problem from the beginning. By using these updated parameters, an updated model [5] can be formulated which is very consistent with the experimental data given earlier.
3 Method of Solution
$$X = \begin{bmatrix} H_1(\omega_1) & H_2(\omega_1) & \cdots & H_N(\omega_1) \\ H_1(\omega_2) & H_2(\omega_2) & \cdots & H_N(\omega_2) \\ H_1(\omega_3) & H_2(\omega_3) & \cdots & H_N(\omega_3) \end{bmatrix}, \qquad Y = \begin{bmatrix} k_1 & k_2 & \cdots & k_N \\ c_1 & c_2 & \cdots & c_N \end{bmatrix} \tag{1}$$
The numbers of rows of X and Y represent the numbers of neurons in the input layer ($N_I$) and the output layer ($N_O$), respectively. The number of columns of X and Y represents the number $N_t$ of sample pairs used to train the network.
In this paper, the centres of RBFNN [6] (matrix C) are chosen to be the input
vectors (matrix X) used to train the network.
First, we train the network and then compare its output with the exact solution of the problem. The problem is solved N times for sets of randomly generated model parameters. The outputs corresponding to these N sets of model parameters are given as input to the network, and the corresponding results are compared with the model parameters used to generate the input data. The performance of the network is then measured; if the performance is low, further training is carried out.
The network output Y is computed as
$$Y = W H \tag{2}$$
where W is the weight matrix representing the connection between the hidden layer and the output layer. Solving, we get
$$W = Y H^{-1} \tag{3}$$
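A minimal NumPy sketch of this training step follows, assuming Gaussian basis functions with the centres taken as the training inputs (as stated above) and a single spread value; the data are random placeholders, not structural FRF data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder training data: N_t samples, N_I inputs (FRF values), N_O outputs (parameters).
N_t, N_I, N_O = 20, 3, 2
X = rng.normal(size=(N_I, N_t))        # columns are training input vectors
Y = rng.normal(size=(N_O, N_t))        # columns are the corresponding parameters

C = X.copy()                           # centres = training inputs (matrix C = X)
gamma = 1.0

# Hidden-layer output: h[p, k] = exp(-gamma * ||x_k - c_p||)
D = np.linalg.norm(X[:, None, :] - C[:, :, None], axis=0)   # D[p, k] = ||x_k - c_p||
H = np.exp(-gamma * D)

# Output weights from Y = W H  =>  W = Y H^{-1}
W = Y @ np.linalg.inv(H)

# The trained network reproduces the training outputs up to numerical error.
print(np.allclose(W @ H, Y))
```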
(a) Mean absolute error:
$$e_p^{\mathrm{avg}} = \frac{1}{N_t}\sum_{k=1}^{N_t} \frac{\left|y_{p,k}^{a} - y_{p,k}^{d}\right|}{y_{p,k}^{d}}, \quad p = 1, 2, \ldots, N \tag{4}$$
Here $e_p^{\mathrm{avg}}$ is the mean absolute error for estimating the pth parameter, $y^a$ is the actual output of the network, and $y^d$ is the desired output. This formula shows, on average, how much deviation may occur in the network when computing a model parameter.
(b) Maximum error:
$$e_p^{\max} = \max_{k} \frac{\left|y_{p,k}^{a} - y_{p,k}^{d}\right|}{y_{p,k}^{d}}, \quad k = 1, 2, \ldots, N \tag{5}$$
(c) Sum-squared error:
$$\varepsilon_p = \frac{1}{2N}\sum_{k=1}^{N_t}\left(y_{p,k}^{d} - y_{p,k}^{a}\right)^2 \tag{6}$$
The distance between an input vector x and a centre c is
$$d = \left[\sum_{m=1}^{N_I}\left(x_m - c_m\right)^2\right]^{1/2} \tag{7}$$
Here $x_1$ and $x_2$ are the means of the first and second positions of x, and $\sigma_1$ and $\sigma_2$ are the expected variations of those positions. The distance is then
$$d = \left[(x_1 - c_1 \pm \sigma_1)^2 + (x_2 - c_2 \pm \sigma_2)^2\right]^{1/2} \tag{9}$$
Normalization alone cannot overcome all of these problems. Scaling is required to get better results in the distance measure between two vectors. This is accomplished by using the spread constants vector σ that is associated with each row vector of the input matrix.
It was found that for the model updating problem an appropriate way to choose
the vector σ is to make it dependent on the standard deviation [7] of the rows of the
normalized set of training vectors. The vector σ is then calculated by
$$\sigma_m = \gamma\, \mathrm{std}(x_m) \tag{11}$$
where std(·) is the standard deviation and γ is a scaling constant multiplying it; γ = 1 is a very good initial guess.
When training a neural network, one is always interested in minimizing the gener-
alization error, i.e. the resulting output error when an input vector not used to train the
network is presented to the network. Let us assume that N input vectors are presented
to the network. Each input has an output error vector associated with it, defined as
$$e(i) = y_i^{d} - y_i^{a} \tag{12}$$
where $y_i^d$ is the vector of desired outputs and $y_i^a$ is the actual output vector when the vector $x_i$ is presented to the network. It is desirable to minimize the squared sum of all error vectors; minimizing this scalar is synonymous with improving the generalization characteristics of the network. Therefore, it is desirable to find the constant γ such that the sum-squared error
$$\varepsilon = \frac{1}{2N}\sum_{i=1}^{N}\sum_{q=1}^{N_O} e_q^2(i) \tag{13}$$
is minimized. Any minimization technique can be used to find the optimal value of
γ . Here, a gradient descent formulation is presented. The gradient of ε with respect
to γ is given by
$$\frac{\partial \varepsilon}{\partial \gamma} = \frac{\partial \varepsilon}{\partial e_j(i)}\,\frac{\partial e_j(i)}{\partial y_k^a(i)}\,\frac{\partial y_k^a(i)}{\partial \gamma} \tag{14}$$
where $y_k^a = \sum_m w_{k,m}\, h_m^a$ and $h_m^a = e^{-\gamma d_m}$.
Analysing Eq. (14) term by term yields
$$\frac{\partial \varepsilon}{\partial e_j(i)} = \frac{1}{N}\, e_j(i) \tag{15}$$
$$\frac{\partial e_j(i)}{\partial y_k^a(i)} = -\delta_{jk} \tag{16}$$
The first term of the above expression can be calculated by using Eq. (3):
$$\frac{\partial W}{\partial \gamma} = \frac{\partial\,(Y H^{-1})}{\partial \gamma} = Y\,\frac{\partial H^{-1}}{\partial \gamma} = -Y H^{-1}\,\frac{\partial H}{\partial \gamma}\, H^{-1} = -W\,\frac{\partial H}{\partial \gamma}\, H^{-1} \tag{18}$$
where H is the matrix output of the hidden layer when presented with the train-
ing vectors. Writing the expression above in summation form and calculating the
derivative of H with respect to γ yields
$$\frac{\partial w_{k,l}}{\partial \gamma} = \sum_{p=1}^{N_t}\sum_{n=1}^{N_t} w_{k,p}\, d_{p,n}\, h_{p,n}\, h_{n,l}^{I}, \qquad k = 1, 2, \ldots, N_O,\;\; l = 1, 2, \ldots, N_t \tag{19}$$
where $h_{n,l}^{I}$ is the (n, l) element of the matrix $H^{-1}$. The second term of Eq. (17) is given by
$$\frac{\partial h_k^a(i)}{\partial \gamma} = -\,d_k^a(i)\, h_k^a(i) \tag{20}$$
Collecting the results from Eqs. (19) and (20), Eq. (17) can be written as:
$$\frac{\partial y_k^a(i)}{\partial \gamma} = \sum_{p=1}^{N_t} w_{k,p}\left(\sum_{n=1}^{N_t}\sum_{l=1}^{N_t} d_{p,n}\, h_{p,n}\, h_{n,l}^{I}\, h_l^a(i) - d_p^a(i)\, h_p^a(i)\right) \tag{21}$$
$$\frac{\partial y^a(i)}{\partial \gamma} = W\left(\bar{H} H^{-1} h^a(i) - \bar{h}^a(i)\right) \tag{22}$$
where $\bar{H}_{p,n} = d_{p,n}\, h_{p,n}$ and $\bar{h}_p^a(i) = d_p^a(i)\, h_p^a(i)$. Now, collecting the results obtained from Eqs. (15), (16) and (21), one can write $\partial \varepsilon / \partial \gamma$ as
$$\frac{\partial \varepsilon}{\partial \gamma} = -\frac{1}{N}\sum_{i=1}^{N} e_j(i) \sum_{p=1}^{N_t} w_{j,p}\left(\sum_{n=1}^{N_t}\sum_{l=1}^{N_t} d_{p,n}\, h_{p,n}\, h_{n,l}^{I}\, h_l^a(i) - d_p^a(i)\, h_p^a(i)\right) \tag{23}$$
$$\frac{\partial \varepsilon}{\partial \gamma} = -\frac{1}{N}\sum_{i=1}^{N} e(i)^{T}\, W\left(\bar{H} H^{-1} h^a(i) - \bar{h}^a(i)\right) \tag{24}$$
The change in the γ parameter should be in the direction opposite to the gradient $\partial \varepsilon / \partial \gamma$; therefore, the iteration scheme for minimizing ε is
$$\gamma^{k+1} = \gamma^{k} - \eta\,\frac{\partial \varepsilon}{\partial \gamma} \tag{25}$$
where η is the step size, with a value between $10^{-2}$ and $10^{-1}$. The estimation problem has to be solved at each iteration. This can be done efficiently by recognizing that
$$e^{-\gamma d} = \left(e^{-d}\right)^{\gamma} \tag{26}$$
Therefore, it is not necessary to measure the distance matrix at each iteration. One need only perform the following operation:
$$d\!\left(\gamma^{k+1}\right) = \left(d(\gamma = 1)\right)^{\gamma^{k+1}} \tag{27}$$
Equations (24), (25) and (27) represent the necessary steps to efficiently adjust the
constant γ in a gradient descent [8] fashion in order to minimize the sum-squared-
error of the network over the validation set.
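The following is a sketch, under the same assumptions as the earlier code fragment (Gaussian units, centres equal to training inputs, random placeholder data), of adjusting γ numerically. Instead of the analytic gradient of Eqs. (23)–(24) it uses a finite-difference approximation, so it only illustrates the iteration of Eq. (25), not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
N_t, N_I, N_O, N_val = 20, 3, 2, 10

X = rng.normal(size=(N_I, N_t));  Y = rng.normal(size=(N_O, N_t))        # training set
Xv = rng.normal(size=(N_I, N_val)); Yv = rng.normal(size=(N_O, N_val))   # validation set

def hidden(inputs, gamma):
    # h[p, k] = exp(-gamma * ||input_k - centre_p||), centres = training inputs X
    d = np.linalg.norm(inputs[:, None, :] - X[:, :, None], axis=0)
    return np.exp(-gamma * d)

def validation_error(gamma):
    W = Y @ np.linalg.inv(hidden(X, gamma))          # W = Y H^{-1}
    E = Yv - W @ hidden(Xv, gamma)                   # error vectors e(i)
    return 0.5 / N_val * np.sum(E ** 2)              # sum-squared error of Eq. (13)

gamma, eta, h = 1.0, 1e-2, 1e-4
for _ in range(50):
    grad = (validation_error(gamma + h) - validation_error(gamma - h)) / (2 * h)
    gamma -= eta * grad                              # iteration of Eq. (25)

print(f"adjusted gamma = {gamma:.3f}, error = {validation_error(gamma):.4f}")
```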
$$\dot{x}(t) = A x(t) + B u(t), \qquad y(t) = C x(t) + D u(t) \tag{28}$$
This expression for X(s) can be substituted into Eq. (29) to yield
$$Y(s) = \left[C(sI - A)^{-1}B + D\right]U(s) \tag{31}$$
Assuming the force vector is periodic with frequency ω, the equation above can be rewritten as
$$Y(\omega) = H(\omega)\,U(\omega) \;\Rightarrow\; Y_i(\omega) = \sum_{j=1}^{N_t} H_{i,j}(\omega)\, U_j(\omega) \tag{33}$$
We make the reasonable assumption that C and D are not functions of the model parameter α. The derivative of the inverse of a matrix [11] can then be easily calculated, yielding
$$\frac{\partial H(\omega)}{\partial \alpha} = -C(j\omega I - A)^{-1}\,\frac{\partial (j\omega I - A)}{\partial \alpha}\,(j\omega I - A)^{-1}B + C(j\omega I - A)^{-1}\,\frac{\partial B}{\partial \alpha} = C(j\omega I - A)^{-1}\left[\frac{\partial A}{\partial \alpha}\,(j\omega I - A)^{-1}B + \frac{\partial B}{\partial \alpha}\right] \tag{35}$$
Equation (35) is the sensitivity of the FRF due to model parameter changes and
now the issue of which points to select can be analysed. For values of ω close to a
natural frequency of the system, the determinant of the matrix ( jωI − A) becomes a
small number, since this expression is very similar to the eigen value problem. In fact,
it becomes zero for undamped systems at the natural frequency. In the general damped
case, this small number makes the sensitivity ∂ H∂α(ω) a large number. Therefore, the
general rule for selecting points of the frequency response function to be used in the
updating procedure is that the chosen points should be as close as possible to the
natural frequencies. In general, however, experimental frequency response functions
are not very accurate at the natural frequencies. This occurs for many reasons; among them, for lightly damped systems, the response becomes too large and a linear model is no longer valid. Because of this, the points selected for updating should be near the natural frequencies, so that a high sensitivity is obtained, but not exactly at the natural frequencies, so that inexact values are not used in the updating.
An extra guideline is that one should use the smallest number of frequency points
necessary to solve the problem. Extra points do not contain any new information,
increase the computational load, and can cause numerical instabilities.
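As a small numerical illustration of this guideline, the sketch below builds the FRF H(ω) = C(jωI − A)^(-1)B + D for a toy single-mass oscillator and evaluates the sensitivity of Eq. (35) with respect to the stiffness; the system matrices are invented, and the only point is that the sensitivity peaks near the natural frequency.

```python
import numpy as np

# Toy single-degree-of-freedom oscillator: m x'' + c x' + k x = u
m, c, k = 1.0, 0.05, 100.0                        # invented values; natural freq = 10 rad/s
A = np.array([[0.0, 1.0], [-k / m, -c / m]])
B = np.array([[0.0], [1.0 / m]])
C = np.array([[1.0, 0.0]])
D = np.zeros((1, 1))
dA_dk = np.array([[0.0, 0.0], [-1.0 / m, 0.0]])   # dA/dk
dB_dk = np.zeros((2, 1))

def frf_and_sensitivity(w):
    G = np.linalg.inv(1j * w * np.eye(2) - A)     # (jwI - A)^{-1}
    H = C @ G @ B + D                             # Eq. (31) evaluated at s = jw
    dH = C @ G @ (dA_dk @ G @ B + dB_dk)          # Eq. (35)
    return H[0, 0], dH[0, 0]

for w in (5.0, 9.0, 9.9, 10.0, 12.0):
    H, dH = frf_and_sensitivity(w)
    print(f"w = {w:5.1f}  |H| = {abs(H):8.4f}  |dH/dk| = {abs(dH):8.5f}")
```

Running it shows both |H| and the sensitivity growing sharply as ω approaches 10 rad/s, which is exactly why points near, but not exactly at, the natural frequencies are preferred.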
6 Conclusion
In this paper, the objective and the procedure followed have been discussed in detail. By using the neural network technique, a frequency-domain data set can be converted into a set of parameters which model the system accurately. Once the knowledge of this conversion is gained, it helps us in estimating the parameters being updated, and the technique becomes all the more viable for updating systems that change frequently over time.
References
1. Kumar, Satish. 2007. Neural networks. New Delhi: Tata McGraw-Hill Publishing Company Limited.
2. Friswell, M.I., and J.E. Mottershead. 1995. Finite element model updating in structural
dynamics. Kluwer Academic Publishers https://doi.org/10.1007/978-94-015-8508-8.
3. Rao, Vittal, Rajendra Damie, Chris Tebbe et al. 1994. The adaptive control of smart structures
using neural networks. Smart Materials and Structures 3:354–366.
4. Szewczyk, Z.P., and Prabhat Hajela. 1993. Neural network based selection of dynamic system
parameters. Transactions of the Canadian Society of Mechanical Engineers 17 (4A): 567–584.
5. Natke, H.G. 1988. Updating computational models in the frequency domain based on measured
data: a survey. Probabilistic Engineering Mechanics 3 (1): 28–35.
6. Park, J., and I.W. Sandberg. 1991. Universal approximation using radial-basis-function
networks. Neural Computation 3 (2): 246–257.
7. Gupta, S.C., and V.K. Kapoor. 2005. Fundamentals of Mathematical Statistics.
8. Smith, S.W., and C.A. Beattie. 1991. Secant-method adjustment to structural models. AIAA
Journal 29 (1): 119–126.
9. Sastry, Shankar, and Marc Bodson. 1989. Adaptive Control—Stability, Convergence, and
Robustness, Prentice Hall. https://doi.org/10.1121/1.399905.
10. Nayfeh, A.H. 1998. Introduction to Perturbation Techniques, Wiley.
11. Simonian, S.S. 1981. Inverse problems in structural dynamics ii-applications. International
Journal of Numerical Methods in Engineering 17 (3): 367–386.
A Review on SLA-Based Resource
Provisioning in Cloud
1 Introduction
actually need them. This can be realized because of the utility-based nature of cloud.
Users use the resources or computing infrastructure of the cloud and pay only for
those things which they use and not for the whole infrastructure. The resources of
the cloud are distributed across the globe so that individual or companies can access
and utilize the resources and services from anywhere. Due to many exciting features
such as reduced computational cost, flexibility, and very high degree of reliability,
cloud has become one of the technologies to look for.
The resource requests from users are handled by cloud service providers by creating and deploying a sufficient number of virtual machines (VMs) in which the requests are actually handled. This allocation of resources by cloud service providers is achieved by utilizing some resource allocation or provisioning technique. The two main provisioning techniques used in the cloud environment are static provisioning and dynamic provisioning. Parameters such as cost minimization, maximization of resources, and response time for each request are also to be considered while allocating resources. Resources are to be allocated in a way that follows the service level agreements (SLAs) prescribed by the cloud service provider to its users. SLAs are typically defined in terms of mean time to failure (MTTF) and mean time to recover (MTTR): the MTTR metric relates to the availability of the system, and the MTTF metric to its reliability. These two together are used to establish a contract between the service provider and the user.
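As a small worked example of how these two metrics are usually combined, steady-state availability can be estimated as MTTF/(MTTF + MTTR); the figures below are illustrative, not values taken from any cited paper.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability from mean time to failure and mean time to recover."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Illustrative SLA check: a service with MTTF = 2000 h and MTTR = 1 h.
a = availability(2000.0, 1.0)
print(f"availability = {a:.5f} ({a * 100:.3f}%)")                 # ~99.95%
print(f"expected downtime: {(1 - a) * 30 * 24 * 60:.1f} minutes per 30-day month")
```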
Cloud computing provides many services, such as infrastructure to be used by the users, a platform to develop and deploy software, and software created by providers to be used by the users. Based on these different services, different service models are available in a cloud environment, which are discussed below:
Infrastructure-as-a-Service (IaaS):
• In an IaaS environment, cloud resources such as storage, computing power,
bandwidth, and databases are provided on demand to the users.
• Many users can work simultaneously on a single hardware. Examples include
Amazon Web Service (AWS) and GoGrid.
Platform-as-a-Service (PaaS):
• In a PaaS environment, cloud service providers provide platforms for users to build
and deploy different Web-based applications.
• The constraint of software download and installation for developers is removed
while providing facilities which are required during a complete life cycle of
building and developing applications.
• Different UI scenarios are created, modified, tested, and deployed using Web-based
tools which also handles billing and subscriptions.
• Examples include Microsoft Azure and Salesforce’s Force.com.
Software-as-a-Service (SaaS):
• SaaS model distributes ready software, where applications are hosted by service
provider which is made available to customer through the Internet.
• SaaS applications are designed for users and delivered over the Internet.
• Software is managed from a central location, so there is no need of users to handle
and control infrastructures like network, operating system, servers, storage, etc.
Companies that offer SaaS are Google, Microsoft, Zoho, etc.
Some of the benefits attributed to the use of cloud computing are:
• Reduced Cost: Cloud computing provides the facility of pay as per usage, thus
reducing the initial and recurring expenses.
• Increased Storage: As cloud provides huge infrastructure, storage and mainte-
nance of data in quite a large volume becomes easy.
• Increased Flexibility: Cloud computing facilitates employees as there is no
restriction on resources and locations.
The rest of the chapter is organized as follows: In Sect. 2, we give an overview of resource provisioning policies in cloud computing. Section 3 describes the SLA-oriented system architecture for resource provisioning in cloud computing. Section 4 presents a comparison study of various resource provisioning techniques, highlighting their methodology, result analysis, advantages, and disadvantages. The comparison of the architectures used in the papers is presented in Sect. 5. Section 6 concludes the paper by mentioning the future scope of work.
The process of assigning resources over the network as needed for applications requiring different types of computational resources is called resource provisioning in cloud computing. The term resource provisioning covers a vast area, including the selection, allocation, and dynamic management of both hardware and software resources. The provisioning and allocation of resources must be done in a way that abides by the prescribed SLA between the service provider and its users and also meets Quality of Service (QoS) parameters such as availability of resources, response time of each job, throughput, and security of the data stored by the users.
Two policies for resource provisioning are:
• Static Provisioning: In static provisioning, all resources required for the completion of a job are allocated before any job is submitted by the application. The resources are released gradually as the jobs finish their executions. So, the static provisioning technique is best suited for applications which have predictable and static demands. In this type of provisioning, a cloud provider provides the users with the maximum of resources in order to avoid SLA violations. Due to this over-allocation of resources, a lot of resources are wasted, and as such, both users and providers suffer losses.
• Virtual Machines: Multiple virtual machines are created to meet service request
on the same physical machines.
• Physical Machines: Multiple servers are hosted on data centers to provide
resources to meet demand.
Many resource provisioning techniques have been examined so far. These techniques are detailed in Table 1, which describes various factors, viz. the methodologies used, the results obtained, and their advantages and disadvantages.
5 Architectures Used
Apart from the various resource provisioning techniques available, none of them is fully devoted to SLA monitoring. Some techniques require the users' resource requirements to be known in advance, whereas in other techniques time and cost optimizations are not possible. Moreover, multilayer implementation of cloud services is still in progress because of the migration cost of VMs.
Out of the discussed techniques, the Mixed Workload-Aware Policy (MWAP) using an artificial neural network (ANN) is the most suitable one, as it focuses on many factors while allocating resources, and thus the chances of SLA violation are very low. Also, the utilization of cloud data centers is highest with this technique. So, in the future, flexible algorithms can be designed by taking into consideration several aspects of the SLA.
Table 1 Comparison of various resource provisioning techniques

Paper [1]
– Description: An SLA-based resource provisioning methodology using the Aneka platform is described. The SLA-oriented dynamic provisioning algorithm in Aneka is presented, where the users' SLA is handled by Aneka.
– Methodology: An SLA-oriented dynamic provisioning algorithm is used, which is executed on the entry or exit of a job. The completion time is calculated and checked against the deadline; a request for extra resources is submitted only if the completion time is greater than the deadline, otherwise the resources are released.
– Result: The QoS requirements of applications, and hence the prescribed SLA, can be effectively met by Aneka by allocating the resources dynamically.
– Advantages: Cost optimization.
– Disadvantages: Higher cost in the case of time-based optimization.

Paper [2]
– Description: The authors present a deadline-constrained provisioning policy for resources in the cloud using the Aneka cloud platform. Resources obtained from a variety of sources such as cluster grids, desktop grids, and private grids can be effectively provisioned by Aneka.
– Methodology: The scheduler first determines the deadline of a task, the number of tasks in the queue, and a run-time estimation for each task in order to pre-decide whether the deadline can be met. The scheduler also determines the number of extra resources required by each process and submits a resource request, containing the number of resources requested, to the provisioner service.
– Result: The execution time for each job was calculated first without any deadline and then after fixing several different deadlines. The results show that the execution time of each job decreases significantly when the deadlines are made stricter.
– Advantages: Resource allocation from different sources can be done effectively while keeping a check on application and execution time.
– Disadvantages: Further improvements to the dynamic resource provisioning algorithms of Aneka are still under development.
Table 3 (continued)

Algorithm | Disadvantages | Future work
Spot-instance-aware provisioning algorithm | Not usable where the external resources required are not known; not suitable when no information about the current and predicted utilization of the local infrastructure is available | Information about when external resources are required and knowledge about the current and predicted utilization of the local infrastructure
SLA-oriented dynamic provisioning algorithm | Time optimization not possible | Effective policies for time-based optimization
Multi-parameter bin-packing problem using algorithm Priority_Fit_SA | Multilayer implementation on top of the cloud architecture is not done | Implementation of this algorithm in a multilayer system on top of the cloud environment
MWAP-based resource provisioning algorithm | Not suitable for workflows and parallel applications | Optimizing the resource provisioning technique for other types of workload like workflows and parallel applications
Use of fuzzy logic modeling | Does not consider the cost of migrating virtual machines | Consider the cost of VM migration in the proposed profit model
Job allocation with SLA (JAS) algorithm and auction model | Only the FCFS scheduling policy is used | Use of job scheduling approaches like SJF and RR to maximize available resources; adoption of a prediction model to maximize profit
RAS-M market-based strategy through GA-based automatic price adjustment algorithm | Only manages CPU resources | Implementing RAS-M in the upper-level resource management module; managing different types of resources
Resource allocation optimization using BFO | Requirements are to be known initially; cost estimation for migration is not proposed; SLA violation | Hybrid optimization technique that takes the benefits of BFO and a genetic algorithm or greedy-knapsack to give more efficient results
References
1. Buyya, R., S.K. Garg, and R.N. Calheiros. 2011. SLA-oriented resource provisioning for cloud
computing: challenges, architecture, and solutions. In International Conference on Cloud and
Service Computing, 1–10.
2. Vecchiola, C., R. Calheiros, D. Karunamoorthy, and R. Buyya. 2012. Deadline driven provi-
sioning of resources for scientific applications in hybrid cloud with Aneka. Future Generation
Systems 28: 58–65.
3. Dhingra, A., and S. Paul. 2014. Green cloud: smart resource allocation and optimization using
simulated annealing technique. International Journal of Computer Science & Engineering 5
(2): 41–49.
4. Pandit, D., S. Chattopadhya, M. Chattopadhya, and N. Chaki. 2014. Resource allocation in
cloud using simulated annealing. In Applications and Innovations in Mobile Computing, 21–27.
5. Li, C., and L. Li. 2012. Optimal resource provisioning for cloud computing. Journal of
Supercomputing 62: 989–1022.
6. Garg, S., A. Toosi, S. Gopalaiyengar, and R. Buyya. 2014. SLA based resource provisioning for
heterogeneous workload in virtualized cloud data centres. Journal of Network and Computer
Applications 45: 108–120.
7. Dhingra, M., J. Lakshmi, S. Nandy, C. Bhattacharyya, and K. Gopinath. 2013. Elastic resource
framework in IaaS, preserving performance SLA. In IEEE 6th International Conference on
Cloud Computing, 430–437.
8. Vecchiola, C., R. Calheiros, D. Karunamoorthy, and R. Buyya. 2012. The Aneka platform and
QoS driven resource provisioning for elastic applications on hybrid cloud. Future Generation
System 28: 861–870.
9. Xu, J., M. Zhao, R. Carpenter, and M. Yousif. 2007. On the use of fuzzy modelling in virtualized
data center management. In IEEE, ICAS, July 16, 2007. https://doi.org/10.1109/icac.2007.28.
10. You, X., X. Xu, J. Wan, and D. Yu. 2009. RAS-M: resource allocation strategy based on market
mechanism in cloud computing. In Fourth China Grid International Conference, 253–263.
11. Munteanu, V., T. Fortis, and V. Negru. 2013. Evolutionary approach for SLA based cloud
resource provisioning. In IEEE 27th International Conference on Advanced Information
Networking and Applications, 506–513.
12. Chang, C., K. Lai, and C. Yang. 2013. Auction based resource provisioning with SLA con-
sideration on multi-cloud systems. In IEEE 37th Annual Computer Software and Applications
Conference Workshops, 445–450.
An Extensive Study on Medical Image
Security with ROI Preservation:
Techniques, Evaluations, and Future
Directions
1 Introduction
In the present-day scenario, digital communication has a vital role. Internet tech-
nology along with cloud computing boosts the use of data communication exponen-
tially. In this regard, the security issue on data communication is a sensitive matter
of concern. Although many efficient and secure methodologies have been proposed, still more secure and robust techniques, in terms of performance, need to be developed. The information sent over the communication channel can take two different forms: (1) encrypted format and (2) hidden format [1]. In a cryptographic system, the encryption technique is used to convert the original information into an arbitrary encrypted form for transmission, and the decryption system is used to regenerate the original form from the arbitrary information; cryptographic systems are of two types, i.e., symmetric and asymmetric. Information hiding techniques involve steganography, watermarking, reversible data hiding, etc., where digital image, video, audio, and textual content can be hidden in the original file. Several techniques also use a combination of cryptography and steganography for better security and privacy preservation of sensitive data. In the data transmission process, medical image transmission plays a significant role in telemedicine applications. Various sensitive contents are present in the medical image, along with the EPR, which need to be shared between the patient and the authenticated receiver. During the transmission of medical images, security issues arise in terms of confidentiality, reliability, and integrity [2]. To preserve the patient's private data, the images need to be encrypted; also, certain information needs to be embedded in the encrypted sensitive contents. After transmission, the decryption system also plays an essential role in data recovery and integration. This paper describes the above issues in a precise way.
The rest of the paper is organized as follows: Sect. 2 describes the major security issues for medical image data with significant ROI preservation. Section 3 presents recent related works on ROI preservation for medical images. Sections 4 and 5 present a comparative view of the performance of the different reviewed techniques using various performance parameters. Section 6 summarizes the overall contribution of the paper and future directions.
2 Security Issues on Medical Image Data with ROI Preservation

A medical image can be divided into two regions based on content sensitivity: (1) the region of interest (ROI) and (2) the region of non-interest (RONI), as demonstrated in Fig. 1. The ROI contains the sensitive information required for medical diagnosis, so it needs to be preserved over the transmission media. Any kind of tampering or noise addition to the ROI should be detected, located, and recovered at the receiver end using various data hiding techniques. In this way, the integrity of the ROI can be well protected even if it is attacked and tampered with by a third party, whereas RONI recovery cannot be achieved losslessly. Further, after segmentation of the medical image into ROI and RONI, watermark embedding in the spatial domain needs extra security, as it is easy to destroy the information embedded in the RONI. In this paper, various data hiding techniques for protecting and integrating medical images are reviewed, and a comparison and evaluation of their performance are discussed.
3 Related Works
This technique was proposed by Liu et al. [3] for verifying the integrity and authenticity of medical images. In this scheme, the authenticity data is generated from the hash value of a hospital logo, while the hash of the entire medical image is used for tamper detection. Before locating the tampered regions of the ROI, the ROI is divided into 16 × 16 non-overlapping blocks. Then, for each ROI block, CRC is adopted to generate the tamper localization information of the ROI.
For tamper recovery in the ROI regions, the IWT coefficients are used [3]. Further, to minimize the size of the tamper recovery information for the ROI, BTC techniques are used.
There are four basic phases in all reversible watermarking schemes: watermark generation, watermark embedding, watermark extraction, and security verification, as illustrated in Fig. 2. In the scheme of Liu et al. [3], the watermark generation itself proceeds in four phases:
(i) In the first phase, the authenticity information is generated from the hash value of the hospital logo.
(ii) In the second phase, the tamper detection information is generated as D = f(M) using SHA-1, where D is the 160-bit tamper detection information and M is the medical image.
(iii) In the third phase, the tamper localization information is generated using the CRC-16 of the ROI. It includes ROI normalization, ROI division into non-overlapping blocks, conversion of the generator function to binary digits, conversion of each 8-bit pixel to a vector, appending 0's to the vector, and dividing the vector by the polynomial generator to obtain a 16-bit remainder for tamper localization (a minimal sketch of this step is given after the list).
(iv) In the fourth phase, the information regarding tamper recovery is generated. This phase is also responsible for the quality recovery of the ROI. It executes in four steps: computing the coefficient approximation (CA) matrix of the IWT, dividing the CA into non-overlapping blocks, reconstructing them, and converting the reconstructed matrix to binary and re-arranging it into a vector to generate the tamper recovery information.
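A minimal Python sketch of the CRC-based tamper localization step (iii) follows; the block traversal and the CRC-16 polynomial are illustrative assumptions rather than the exact parameters of the reviewed scheme, and NumPy is assumed to be available.

```python
import numpy as np

def crc16(data: bytes, poly: int = 0x8005, init: int = 0x0000) -> int:
    """Bitwise CRC-16 over a byte string (polynomial chosen only for illustration)."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def roi_localization_info(roi: np.ndarray, block: int = 16) -> dict:
    """CRC-16 of every 16 x 16 non-overlapping ROI block (tamper localization info)."""
    h, w = roi.shape
    info = {}
    for r in range(0, h - h % block, block):
        for c in range(0, w - w % block, block):
            info[(r, c)] = crc16(roi[r:r + block, c:c + block].astype(np.uint8).tobytes())
    return info

def locate_tampered_blocks(received_roi: np.ndarray, info: dict, block: int = 16) -> list:
    """Flag every block whose CRC no longer matches the stored localization info."""
    return [pos for pos, crc in info.items()
            if crc16(received_roi[pos[0]:pos[0] + block,
                                  pos[1]:pos[1] + block].astype(np.uint8).tobytes()) != crc]
```

In the reviewed scheme, the resulting per-block values would be carried as part of the watermark so that the receiver can repeat the comparison and locate the tampered ROI blocks.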
In the watermark embedding phase, the SLT technique is used. S is the singular matrix obtained using SVD, which carries the most significant values. Further, an RDM-based function is applied to embed the watermark so that the medical image can be restored losslessly. During watermarking, the embedding into ROI and RONI is done without dividing the medical image.
The security verification phase ensures the authenticity and integrity of the quality and source of the medical images. It includes verifying the authenticity of the hospital logo using a hash function and the integrity of the medical image using integrity verification. If tampering is detected using the hash function, then a series of operations involving normalization, bit mapping, reconstruction, and inverse IWT can be applied to restore the medical image (Fig. 3).
Fig. 3 Test images for ROI tamper detection and recovery: a image watermark, b tampered erased
image, c tampered localization, d tampered erased recovery image, e watermarked image, f copy–
paste tampered image, g localization of copy–paste tampered blocks, and h recovery of copy–paste
tampered image
This method was proposed by Liu et al. [4] for data hiding on medical images in encrypted form. It comprises three phases: (1) image encryption, (2) data embedding, and (3) data extraction and recovery, as illustrated in Fig. 4.
In the image encryption phase, the image is partitioned into non-overlapping blocks and then encrypted. The original image is divided into three parts: ROI, RONI, and a border area. The ROI is selected using any polygon function, and the bottom line of the image can be selected as the border. Then, the hash value of the ROI is calculated using MD5, and the ROI is concatenated with the RONI and the border area. After this rearrangement, the encryption is done using a stream cipher with pseudorandom key bits. Finally, the ROI bits and the hash values are embedded into the LSBs of the border area to generate the encrypted image.
In the data embedding phase, the embedding process is applied to the encrypted medical image. First, the ROI portion is identified from the encrypted image; then, the LSBs of the ROI pixels are manipulated according to the data to be embedded. The EPR is concatenated with these LSBs to generate the embedded data.
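The two phases just described can be sketched as follows, assuming 8-bit grayscale pixels, a hash-seeded pseudorandom keystream standing in for the stream cipher, and plain LSB replacement for the embedding; the function names and the keystream construction are illustrative, not the authors' implementation.

```python
import hashlib
import numpy as np

def keystream(key: bytes, n: int) -> np.ndarray:
    """Pseudorandom byte stream derived from the key (illustrative construction)."""
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return np.random.default_rng(seed).integers(0, 256, size=n, dtype=np.uint8)

def encrypt_image(img: np.ndarray, key: bytes) -> np.ndarray:
    """Stream-cipher style encryption: XOR every pixel with a keystream byte."""
    flat = img.astype(np.uint8).ravel()
    return (flat ^ keystream(key, flat.size)).reshape(img.shape)

def embed_bits_lsb(region: np.ndarray, bits: str) -> np.ndarray:
    """Embed a bit string into the LSBs of the pixels of a region."""
    flat = region.astype(np.uint8).ravel().copy()
    assert len(bits) <= flat.size, "payload larger than region capacity"
    for i, b in enumerate(bits):
        flat[i] = (flat[i] & 0xFE) | int(b)
    return flat.reshape(region.shape)

def extract_bits_lsb(region: np.ndarray, n_bits: int) -> str:
    """Recover an embedded bit string from the LSBs of the region."""
    return "".join(str(p & 1) for p in region.astype(np.uint8).ravel()[:n_bits])
```

Decryption repeats the same XOR with the keystream, and in the reviewed scheme the EPR bits and hash values would form the payload passed to embed_bits_lsb.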
The extraction and recovery from the embedded image can be done in three ways, using the data hiding key, the encryption key, or both. If only the data hiding key is used, the receiver can extract the hidden data from the received image but cannot recover the original image. If the receiver has only the encryption key, then the received image can be decrypted but the hidden data cannot be extracted. So the receiver must have both the encryption and data hiding keys for extraction and lossless recovery of the original image. The execution of the above-mentioned technique is illustrated in Fig. 5, and its performance is analyzed in Table 1.
Fig. 5 Test image analysis: a original image, b rearranged image, c encrypted image, d embedded
image, e decrypted image, and f recovered image
The technique proposed by Bakshi and Patel [5] uses visual cryptography to secure telemedicine data for proper diagnosis. It involves five algorithms; the details of the encryption and decryption process are explained in Fig. 6. The first algorithm identifies the RONI for data embedding. It analyzes the minimum threshold value of the pixels, divides the image into rows and columns, then analyzes the pixels whose value is greater than the minimum threshold and marks them as RONI. In this way, ten RONI regions are located. The second algorithm operates on a portion of the ROI, which gets embedded into the RONI for integrity verification. For integrity verification in the ROI, the algorithm divides the image into two portions: the first portion is taken from the top left corner to the bottom right corner of the image, and the second portion from the top right corner to the bottom left corner. It thus obtains three integrity check values (ICVs) s1, s2, and s3 to be used for future integrity verification. The third algorithm converts the grayscale image to binary format and uses a pushing-error-forward-based error diffusion technique for halftoning. The fourth algorithm analyzes the original image by checking whether each pixel value equals 255 or is less than 255 and replaces the values in the three shares accordingly. The fifth algorithm combines all the above-mentioned algorithms, and other algorithms as required, to generate the embedding information in the RONI. The performance is analyzed in Fig. 7.
Fig. 7 a Original image [6], b embedded image, c share 1, d share 2, e share 3, f overlapped share 1, share 2, and share 3
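The halftoning step of the third algorithm can be illustrated with a standard error-diffusion pass; the Floyd–Steinberg weights used below are an assumed, representative choice of a "pushing error forward" kernel, since the exact kernel is not specified here.

```python
import numpy as np

def error_diffusion_halftone(gray: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Binarize a grayscale image, pushing the quantization error forward to
    not-yet-visited neighbours (Floyd-Steinberg weights, for illustration)."""
    img = gray.astype(np.float64).copy()
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = 255 if old >= threshold else 0
            out[y, x] = new
            err = old - new
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return out
```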
The mean square error (MSE) between the original and recovered images can be
calculated as follows:
$$\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(C_i - C_i'\right)^2 \qquad (2)$$
The structural similarity index (SSIM) is used to compare the similarity between the recovered and the original images. It can be calculated as follows:

$$\text{SSIM}(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \qquad (3)$$

where $\mu_x$ and $\mu_y$ are the averages of $x$ and $y$, $\sigma_x^2$ and $\sigma_y^2$ are the variances, and $\sigma_{xy}$ is the covariance of $x$ and $y$; $c_1$ and $c_2$ are balancing constants.
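For reference, Eqs. (2) and (3) can be computed directly as in the following sketch; the single-window SSIM and the default constants $c_1$ and $c_2$ are simplifying assumptions, since practical evaluations usually average SSIM over local windows.

```python
import numpy as np

def mse(original: np.ndarray, recovered: np.ndarray) -> float:
    """Mean square error, Eq. (2): average of squared pixel differences."""
    diff = original.astype(np.float64) - recovered.astype(np.float64)
    return float(np.mean(diff ** 2))

def ssim_global(x: np.ndarray, y: np.ndarray, data_range: float = 255.0) -> float:
    """Single-window SSIM, Eq. (3), with commonly used stabilizing constants."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```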
For the visual cryptography-based scheme reviewed above, the ROI and the signature information are embedded into the RONI using a halftoning-based RONI algorithm; visual cryptography (VC) techniques are then used to generate the shares, which are combined with the ROI to obtain the final transmitted image. The decryption and recovery process is completely reversible with constant time complexity, offering better confidentiality of patient records (SSIM = 0.9701, accuracy = 99.874).
6 Conclusion
This paper explores different techniques proposed to address security issues in ROI-based medical images and solutions for ROI-based privacy preservation. The different techniques are analyzed with respect to their basic parameters, methodology, advantages, and drawbacks, along with performance parameters. Although region-based analysis is done in all the reviewed papers, the embedding techniques differ. Based on this review, it can be concluded that heterogeneous parameters should be adopted for ROI manipulation to achieve robust security and better privacy preservation. The comparison table illustrates the different techniques in a qualitative and quantitative way, which provides future directions for the development of new algorithms for medical image security.
References
1. Cheddad, A., J. Condell, K. Curran, and P.M. Kevitt. 2010. Digital image steganography survey
and analysis of current methods. Signal Processing 90: 727–752. https://doi.org/10.1016/j.
sigpro.2009.08.010.
2. Coatrieux, G., H. Maitre, and B. Sankur, et al. 2000. Relevance of watermarking in medical
imaging. In Proceedings of IEEE EMBS International Conference on Information Technology
Applications in Biomedicine, 250–255.
3. Liu, Xiyao, Jieting Lou, Hui Fang, Yan Chen, Pingbo Ouyang, Yifen Wang, Beiji Zou, and Lei
Wang. 2019. A novel robust reversible watermarking scheme for protecting authenticity and
integrity of medical images. IEEE Access 7: 76580–76598. https://doi.org/10.1109/access.2019.2921894.
4. Liu, Yuling, Xinxin Qu, and Guojiang Xin. 2016. A ROI-based reversible data hiding scheme
in encrypted medical images. Journal of Visual Communication and Image Representation 39:
51–57. https://doi.org/10.1016/j.jvcir.2016.05.008.
5. Bakshi, Arvind, and A.K. Patel. 2019. Secure telemedicine using RONI halftoned visual
cryptography without pixel expansion. Journal of Information Security and Applications 46:
281–295. https://doi.org/10.1016/j.jisa.2019.03.004.
6. http://imaging.cancer.gov/.
7. Liu, Y., X. Qu, G. Xin, et al. 2015. ROI-based reversible data hiding scheme for medical images
with tamper detection. IEICE Transactions on Information and Systems E98-D (4): 769–774.
8. Xiong, Lizhi, and Danping Dong. 2019. Reversible data hiding in encrypted images with
somewhat homomorphic encryption based on sorting block-level prediction-error expansion.
Journal of Information Security and Applications 47: 78–85. https://doi.org/10.1016/j.jisa.
2019.04.005.
9. Koley, Subhadeep. 2019. A feature adaptive image watermarking framework based on Phase
Congruency and Symmetric Key Cryptography. Journal of King Saud University—CIS. https://
doi.org/10.1016/j.jksuci.2019.03.002.
10. Yan, X., L. Liu, Y. Lu, et al. 2019. Security analysis and classification of image secret sharing.
Journal of Information Security and Applications 47: 208–216. https://doi.org/10.1016/j.jisa.
2019.05.008.
A Competitive Analysis on Digital Image
Tamper Detection and Its Secure
Recovery Techniques Using
Watermarking
Abstract Digital images play a vital role in human life. Hence, their protection from unauthorized access is a serious matter of concern. Even if the contents are modified, their detection and recovery must be possible. Nowadays, a number of methods based on digital watermarking have been proposed to protect digital images, but not all offer similar capability in terms of security, authenticity, and recovery. This paper presents the basics of digital watermarking techniques along with their strengths and weaknesses for the detection of tampered images and their recovery process. A series of watermarking techniques with simulated results shows their working efficiency with quantitative result analysis.
1 Introduction
With the rapid growth of information sharing over the Internet, digital images are transmitted in huge volumes, yet they are vulnerable to modification and manipulation with widely available image manipulation tools. The integrity and genuineness of digital images can be assured by using a digital watermarking process, which embeds special information (text or image), known as a watermark, into the digital source [1]. To maintain the full integrity of digital images, research into watermarking-based image authentication has evolved. The first watermarking-based image authentication technique was proposed by Van Schyndel et al. [2] in 1994. The major concern in watermarking a digital image is that the modification introduced should be as small as possible and recoverable at the receiver end. We review previous fragile watermarking techniques in this paper.
M. Swain · D. Swain
Department of Computer Science, Rama Devi Women's University, Bhubaneswar, India
B. Ku. Paikaray (B)
Department of CSE, Centurion University of Technology and Management, Bhubaneswar, Odisha, India
e-mail: bijaypaikaray87@gmail.com
The rest of this paper is arranged as follows. In Sect. 2, the basics of watermarking are discussed. In Sect. 3, performance parameters are described, and in Sect. 4, some of the existing tamper detection and recovery methods are discussed along with a comparative analysis. Finally, the paper is concluded in Sect. 5.
2 Basics of Watermarking

The watermarking approach can be categorized based on certain attributes [3]. Watermarking is categorized into spatial domain and frequency domain approaches according to whether the watermark data is embedded into the spatial or the frequency domain representation of the original image. In the spatial domain watermarking method, the image pixel values are directly manipulated by replacing some bits of the pixels with watermark bits. In the frequency domain watermarking method, the original image is first transformed and then the watermark data is embedded into the transformed coefficients. According to the robustness of the algorithm, watermarking is classified into three types: robust, fragile, and semi-fragile. A robust watermarking method makes it hard to eliminate the watermark from the image under different kinds of attacks; thus, this technique can be used to insert copyright information. A fragile watermarking technique can detect a tampered image because any attack removes or damages the watermark. Semi-fragile watermarking is similar to fragile watermarking with additional features, but it can tolerate some unintentional attacks (Figs. 1 and 2).
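The spatial-domain case amounts to direct LSB substitution on pixels; the frequency-domain case can be illustrated by the sketch below, which embeds one bit per 8 × 8 block into a mid-frequency DCT coefficient using quantization (dither) modulation. The transform, coefficient position, and step size are assumptions chosen for illustration and do not reproduce any specific scheme reviewed later; SciPy is assumed to be available.

```python
import numpy as np
from scipy.fft import dctn, idctn

STEP = 24.0    # quantization step: larger = more robust, lower PSNR (assumed value)
POS = (4, 3)   # mid-frequency DCT coefficient modified in each 8x8 block (assumed)

def _quantize(coef: float, bit: int) -> float:
    """Dither modulation: snap the coefficient onto the lattice associated with `bit`."""
    return STEP * np.round((coef - bit * STEP / 2) / STEP) + bit * STEP / 2

def embed(cover: np.ndarray, bits: str) -> np.ndarray:
    """Embed one watermark bit per 8x8 block of a grayscale cover image."""
    out = cover.astype(np.float64).copy()
    h, w = out.shape
    blocks = [(r, c) for r in range(0, h - 7, 8) for c in range(0, w - 7, 8)]
    for (r, c), b in zip(blocks, bits):
        coeffs = dctn(out[r:r + 8, c:c + 8], norm="ortho")
        coeffs[POS] = _quantize(coeffs[POS], int(b))
        out[r:r + 8, c:c + 8] = idctn(coeffs, norm="ortho")
    return np.clip(np.round(out), 0, 255).astype(np.uint8)

def extract(marked: np.ndarray, n_bits: int) -> str:
    """Recover the bits by checking which lattice each coefficient lies closer to."""
    img = marked.astype(np.float64)
    h, w = img.shape
    blocks = [(r, c) for r in range(0, h - 7, 8) for c in range(0, w - 7, 8)][:n_bits]
    out = []
    for r, c in blocks:
        coef = dctn(img[r:r + 8, c:c + 8], norm="ortho")[POS]
        out.append("0" if abs(coef - _quantize(coef, 0)) <= abs(coef - _quantize(coef, 1)) else "1")
    return "".join(out)
```

A larger step size lets the embedded bits survive stronger distortions at the cost of a lower watermarked-image PSNR, which is exactly the robustness/imperceptibility trade-off discussed above.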
During watermarked image transmission, the image can face various attacks. The attacks are broadly categorized as unintentional attacks (UA) and intentional attacks (IA). During an unintentional attack, noise may be inserted due to signal fluctuations, and bits may toggle due to various reasons. In an intentional attack, the image is tampered with intentionally at the time of its storage and transmission; it is also known as a man-in-the-middle attack.
Fig. 1 Classification of watermarking approach
Fig. 2 Watermarking classifications
Intentional attacks can be further classified. A removal attack is an IA in which the attacker removes the original content of the image, possibly including copyright information. An addition attack is an IA in which some additional content is added to the watermarked image. A cryptographic attack is an IA in which the watermarking scheme is cracked to embed a misleading watermark by removing the original watermark. A copy-paste attack is an IA in which the attacker duplicates the watermarked data illegally so that a second copy of the watermarked image is generated. A geometric attack is an IA which involves all distortions of the image generated due to transformations like translation, rotation, scaling, shearing, cropping, line or column removal, etc.
3 Performance Parameters

The mean square error (MSE) between the reconstructed image and the original image measures the cumulative squared error; the lower the MSE value, the lower the error. It can be calculated as follows:

$$\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(C_i - C_i'\right)^2 \qquad (2)$$
The structural similarity index (SSIM) measures the similarity between two images and is used here to compare the watermarked image with the original. It can be calculated as follows:

$$\text{SSIM}(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \qquad (3)$$
The payload capacity represents the amount of hidden information present inside the transmitted image, expressed in bits per pixel (BPP). A higher payload capacity means that more private information can be carried in the sent image.
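As a worked example of this measure, assume (hypothetically) a 512 × 512 cover image carrying a 65,536-bit hidden payload:

$$\text{BPP} = \frac{\text{number of embedded bits}}{\text{number of pixels}} = \frac{65{,}536}{512 \times 512} = 0.25\ \text{bpp}$$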
4 Literature Review
In 2005, Hsieh et al. [4] proposed a hierarchical fragile watermarking scheme for image tamper detection and recovery using a parity check and a comparison between the average pixel values of each block. To improve tamper detection accuracy, three detection levels were used. In this approach, when two corresponding blocks are both tampered with, the quality of the recovered image is reduced.
In 2008, Lee and Lin [5] proposed a dual watermark scheme for tamper detection and recovery. In this approach, the watermark data of each non-overlapping block were maintained in two copies; if one copy is destroyed, the second copy provides another chance for block recovery. For tamper detection, a parity check and a public chaotic algorithm were used, and a secret key was used for recovery. The use of a dual watermark ensured a higher quality of the recovered image even when the watermarked image contains large tampered regions.
In 2010, Yang and Shen [6] proposed a watermarking scheme that creates an index table of the original image by the vector quantization (VQ) technique and embeds it into the original image for recovery. Wong's watermarking scheme [7] was used for tamper detection. This scheme achieves a higher probability of image recovery because the VQ index table can be embedded into the cover image several times.
In 2011, Li et al. [8] proposed a watermarking scheme based on a dual-redundant-ring structure (DRRS). For each block, two copies of the watermark data were embedded in two different blocks. Tampering was detected through the mapping block in the block-neighborhood by comparing it with inconsistent blocks. Since two copies of the watermark are provided, if the first copy used to recover a block is destroyed, the second copy gives another chance to recover it.
In 2012, He et al. [9] proposed a self-recovery fragile watermarking scheme using block-neighborhood tamper detection. This technique generates a nonlinear block mapping with a pseudorandom sequence and performs tamper detection with an optimized neighborhood characterization method.
In 2012, Qin et al. [10] proposed a fragile watermarking scheme with content restoration capability by using non-subsampled contourlet transform (NSCT) coefficients. The NSCT coefficients utilize an adaptive bit allocation mechanism to encode the restoration bits efficiently. Here, all the blocks are divided according to their degree of smoothness into smooth blocks and complex blocks, where smooth blocks are allocated fewer bits than complex blocks.
In 2013, Adiwijaya et al. [11] proposed a watermarking scheme based on the separation of the region of interest (ROI) and the region of non-interest (RONI). Watermark data was embedded in the ROI using a block-based watermarking method, and Huffman compression was used in the RONI. The approach is reversible in the ROI region and was used to detect tampering and recover from attacks.
In 2013, Tong et al. [12] proposed a watermarking approach providing tamper localization and self-recovery. By applying a combination of two authentication bits, in the MSB and the LSB, it gives better results in tamper detection and in defending against attacks. The tamper recovery was improved by using a sister-block embedding and optimization method, which finds the average value of the valid blocks from their eight neighboring blocks and recovers the unreconstructed blocks.
In 2014, Shivananda and Chetan [13] proposed a scheme based on fragile watermarking for digital image tamper detection and recovery. In this method, images were divided into uniform-sized blocks, and the least significant bits of each block were used to create authentication data. Tamper detection was performed using a dual-option parity check method. The scheme achieved a higher quality of recovery in tampered regions.
In 2015, Dhole and Patil [14] proposed a scheme based on fragile watermarking for tamper detection and recovery using self-recovery blocks. In this method, the first watermark image is obtained using block chaining, and a shuffled version of the original image is merged with the original image to get the final shuffled image. The final watermarked image is then obtained by merging the first watermark image with the final shuffled image.
In 2016, Sreenivas and Prasad [15] proposed a scheme based on self-embedding watermarking for tamper detection and self-recovery. The proposed method takes the average intensity of variable-length blocks and compares the blocks with the watermark bits.
In 2016, Molina-Garcia et al. [16] proposed a watermarking algorithm with authentication and self-recovery of tampered images using DWT. In this method, the digital image is first protected using halftoning, DWT, and QIM methods; then, through IDWT, the halftoning is inverted and median filtering is applied for authentication. Finally, tamper detection on the watermarked image and self-recovery are performed.
In 2018, Shehab et al. [17] proposed a scheme based on fragile watermarking for image authentication and self-recovery in medical applications. In this approach, images are divided into blocks and singular value decomposition is applied to each block. The singular matrix of each block is then used for block authentication, and the first five MSBs are used as the self-recovery information. The scheme can survive the vector quantization attack by using the two kinds of bits, i.e., block authentication and self-recovery bits.
In 2019, Liu et al. [18] proposed a reversible watermarking scheme where the images are divided into ROI and RONI. The tamper detection information is generated using a hash function, and tamper localization is done by a CRC calculation in each block of the ROI. Further, the information regarding tamper recovery is generated using the integer wavelet transform. All the required data for authentication, tamper detection, tamper localization, and recovery are embedded in the whole image.
In 2019, Tamal et al. [19] proposed a watermarking method for tamper detection that uses the IWT for watermarking. Here, the images are divided into ROI and RONI. Logistic mapping, ROI hash values, and the average value of the last bit in a 5 × 5 mask of the ROI are used to generate the watermark information. Tamper detection is performed at two levels using the hash value and the last bits (Table 1).
Table 1 Comparative analysis of different watermarking schemes in spatial and frequency domains
Methods | Basic domain | Tamper detection and recovery method | Watermarked image PSNR | Recovered image PSNR
[4] Spatial Yes >40 dB >30 dB
[5] Spatial Yes 40.68 dB 20 dB
[6] Spatial Yes 30–40 dB >30 dB
[8] Spatial Yes 44.26 dB 44–50 dB
[9] Spatial Yes 30–44.15 dB 32.04–62 dB
[10] Spatial Yes 51 dB 41–48 dB
[11] Spatial Yes 47–48.6 Not specified
[12] Spatial Yes 30–40 20–30
[13] Spatial Yes 30–44.15 dB 32.04–62 dB
[14] Spatial Yes 34–38 dB 36–44 dB
[15] Spatial Yes 30–44.15 dB 32.04–62 dB
[16] Frequency Yes >32 dB >32 dB
[17] Spatial Yes Not specified 30.25–38.96 dB
[18] Spatial Yes Not specified 41.2995 dB
[19] Frequency No 51.90 dB Not specified
5 Conclusion
This paper presents various watermarking techniques for tamper detection, localization, and recovery. All techniques are classified into two domains, i.e., spatial and frequency, and are compared in terms of the PSNR values of the watermarked and recovered images. The basic techniques for tamper detection and recovery can also be analyzed in the context of processing time and security. Also, new experimentation can be done to minimize the error detection and correction effort in the tampered regions. The above discussion provides ideas for developing new algorithms to address the discussed issues in digital images.
References
1. Bhargava, N., M.M. Sharma, A.S. Garhwal, and M. Mathuria. 2012. Digital image authen-
tication system based on digital watermarking. In IEEE International Conference on Radar,
Communication, and Computing (ICRCC). https://doi.org/10.1109/icrcc.2012.6450573.
2. Van Schyndel, R.G., A.Z. Tirkel, and C.F. Osborne. 1994. A digital watermark. In Proceedings
of the IEEE International Conference on Image Processing, vol. 2, 86-90.
3. Tamal, T.A., C. Saha, M.D. Foisal Hossain, and S. Rahman. 2019. Integer wavelet trans-
form based medical image watermarking for tamper detection. In International Conference on
Electrical, Computer and Communication Engineering (ECCE), 7–9. https://doi.org/10.1109/
ecace.2019.8679152.
4. Lin, P.L., C.K. Hsieh, and P.W. Huang. 2005. A hierarchical digital watermarking method for
image tamper detection and recovery. Pattern Recognition 38: 2519–2529.
5. Lee, T.Y., and S.D. Lin. 2008. Dual watermarking for image tamper detection and recovery.
Pattern Recognition 41: 3497–3506.
6. Yang, C.W., and J.J. Shen. 2010. Recover the tampered image based on VQ indexing. Signal
Processing 90: 331–343.
7. Wong, P.W., and N. Memon. 2001. Secret and public key image watermarking schemes for
image authentication and ownership verification. IEEE Transactions on Image Processing 10:
1593–1601.
8. Li, Chunlei, Y. Wang, B. Ma, and Z. Zhang. 2011. A novel self-recovery fragile watermarking
scheme based on dual-redundant-ring structure. Computers and Electrical Engineering 37:
927–940.
9. He, H., F. Chen, H.M. Tai, T. Kalker, and J. Zhang. 2012. Performance analysis of a
block-neighborhood-based self-recovery fragile watermarking scheme. IEEE Transactions on
Information Forensics and Security 7: 185–196.
10. Qin, C., C.C. Chang, and P.Y. Chen. 2012. Self-embedding fragile watermarking with
restoration capability based on adaptive bit allocation mechanism. Signal Processing 92:
1137–1150.
11. Adiwijaya, P.N. Faoziyah, F.P. Permana, T.A.B. Wirayuda, and U.N. Wisesty. 2013. Tamper
detection and recovery of medical image watermarking using modified LSB and Huffman
compression. In Second International Conference on Informatics & Applications (ICIA), 129–
132.
12. Tong, X.J., Y. Liu, M. Zhang, and Y. Chen. 2013. A novel chaos-based fragile watermarking
for image tampering detection and self-recovery. Signal Processing: Image Communication
28: 301–308.
13. Shivananda, N., and K.R. Chetan. 2014. A new fragile watermarking approach for tamper
detection and recovery of document images. In: IEEE International Conference on Advances
in Computing, Communications and Informatics (ICACCI), 1494–1498.
14. Dhole, V.S., and N.N. Patil. 2016. Self embedding fragile watermarking for image tamper-
ing detection and image recovery using self recovery blocks. In International Conference on
Computing Communication Control and Automation, 752–757.
15. Sreenivas, K., and V.K. Prasad. 2016. Improved block encoding method for an image self-
recovery approach. In International Conference on Information Communication and Embedded
Systems (ICICES), 3–7.
16. Molina-Garcia, J., R.R. Reyes, V. Pronomaryov, and C.C. Ramos. 2016. Watermarking algo-
rithm for authentication and self-recovery of tampered images using DWT. In 9th International
Kharkiv Symposium on Physics and Engineering of Microwaves, Millimeter and Submillimeter
Waves (MSMW), 1–4.
17. Shehab, A., M. Elhoseny, K. Muhammad, and A.K. Sangaiah. 2018. Secure and robust fragile
watermarking scheme for medical images. IEEE Access 6: 10269–10278.
18. Liu, X., J. Lou, H. Fan, Y. Chen, P. Ouyang, Y. Wang, B. Zou, and L. Wang. 2019. A novel
robust reversible watermarking scheme for protecting authenticity and integrity of medical
images. IEEE Access 7: 76580–76598.
19. Tamal, T.A., C. Saha, and S. Rahman. 2019. Integer wavelet transform based medical image
watermarking for tamper detection. In International Conference on Electrical, Computer and
Communication Engineering (ECCE). https://doi.org/10.1109/ecace.2019.8679152.
Elliptic Curve Cryptography-Based
Signcryption Scheme with a Strong
Designated Verifier for the Internet
of Things
Biswojit Nayak
Abstract The Internet of Things (IoT) is an emerging technology that is growing across the World Wide Web. It involves scenarios in which real-world objects transfer data over an insecure wireless network. Security in the IoT becomes more challenging because of the low computational and communication capacity of the objects. The proposed signcryption scheme combines a digital signature and symmetric key encryption in a single logical unit, which reduces the computational complexity compared to the traditional signature-then-encryption process; moreover, the digital signature of the sender can only be verified by the designated verifier. The computational and communication overhead of the elliptic curve cryptography (ECC) based scheme is lower because of its short key length at the same security level compared to other public key cryptosystems. The security hardness of the scheme is based on the elliptic curve discrete logarithm problem (ECDLP), and the scheme also satisfies various security requirements. The proposed scheme has low computational and communication overhead and low power consumption, making it suitable for IoT scenarios.
1 Introduction
The Internet of Things (IoT) comprises scenarios where all entities are connected and configured with unique identifiers for transferring data via an unsafe wireless network. Through these identifiers, the host machines identify senders and handle and manipulate the items. An IoT system is not only responsible for sensing but can also take actions and perform computation-based activities. Security and efficiency [1, 12] are two significant difficulties in IoT environments. Typical low-power IoT devices have low computing ability and restricted storage. As a consequence, setting up safe and effective communication connections becomes a significant problem for the IoT. The public key cryptosystem is commonly adopted in the IoT to guarantee secure communication for sensor nodes.
B. Nayak (B)
Utkal University, Bhubaneswar, Odisha, India
e-mail: biswojit22@gmail.com
Computation and communication overhead are two important criteria of computer network communication. To reduce the computational and communication overhead, Zheng [13] in 1997 proposed the concept of signcryption. In order to attain both authentication and confidentiality, signcryption provides both the notion of a digital signature and symmetric key encryption in a single logical step. Zheng also proved that the computational and communication cost of signcryption is less than that of the traditional signature and encryption process. A considerable amount of research has been done in the area of signcryption since 1997 [2–5, 11, 14].
The one-time symmetric key is used to encrypt messages in the signcryption scheme. The sender's private key and the receiver's public key are used to produce the one-time symmetric key. At the receiver end, the receiver's private key and the sender's public key are used to derive the same session key. In most real-time application communication, only a specific receiver, called the designated verifier, can verify the signature. Jakobsson [6] first proposed the concept of the designated verifier signature, where only the designated verifier can verify the signature of the signer. In the Jakobsson scheme, a third party cannot recognize whether the signature was generated by the signer or by the designated verifier.
In 2004, Saeednia et al. [10] proposed a scheme called the strong designated verifier scheme to overcome the weakness of the Jakobsson scheme. But later in 2008, Lee and Chang [7] identified that this scheme can be verified not only with the designated verifier's private key but also with the signer's private key. So, they proposed a strong designated verifiable signature scheme.
To overcome the computational and communication overhead, in 2012, Mohanty and Majhi [9] considered the mechanism of the Lee–Chang [7] scheme and proposed a signcryption scheme based on the hardness of the discrete logarithm problem. But the scheme has the disadvantage that it requires a large one-time symmetric key, so it incurs more computational and communication overhead. The proposed scheme can provide a smaller key with the same level of security by using elliptic curve cryptography [8].
This paper suggests a novel strong designated verifier signcryption scheme, where the signature can be verified only by the designated verifier. The hardness of the suggested scheme is based on the elliptic curve discrete logarithm problem (ECDLP) with a smaller key length. The suggested scheme can therefore be used in IoT technologies.
2 Preliminaries

The proposed scheme is based on the following computationally hard problem [3, 4].
Elliptic Curve Discrete Logarithm Problem (ECDLP): Let X and Y be two points on an elliptic curve with X = k·Y, where k is a scalar value. Given X and Y, it is difficult to derive k.

3 Formal Model of the Signcryption Scheme

A formal model of a signcryption scheme goes through the following three modules [15].
Setup: In the Setup phase, the signcryptor takes a security parameter k as input and produces the system's public parameters. The signcryptor then randomly picks his/her private key and generates the corresponding public key. The receiver also picks his/her private key and generates his/her public key.
Signcrypt: In this phase, if the sender wants to send a message M to the receiver, then he/she runs Signcrypt(M, SKA, PKB) to generate the ciphertext δ, where SKA and PKB are the private key of the sender and the public key of the receiver, respectively.
Unsigncrypt: After receiving the ciphertext δ, the receiver computes Unsigncrypt(δ, PKA, SKB) and obtains the plaintext M if δ is a valid ciphertext, where PKA and SKB are the public key of the sender and the private key of the receiver, respectively.
4 Proposed Scheme

This scheme involves two parties, namely the signcryptor and the recipient. There are three phases in the scheme: Setup, Signcrypt, and Unsigncrypt. In the first stage, the signcryptor creates and publishes all the public parameters of an elliptic curve, and each user creates his own private key and the related public key.
Setup: The signcryptor selects an elliptic curve defined as the set of points satisfying the equation y² = x³ + ax + b mod q. The parameters of the elliptic curve are as follows:
q: a large prime number
(a, b): coefficients of the elliptic curve, which satisfy 4a³ + 27b² ≠ 0 mod q
G: generator point of the elliptic curve
O: point of the elliptic curve at infinity
n: order of the generator point
H: {0, 1}* → Zq (hash function)
The public parameters are (q, a, b, G, n, H).
The signcryptor picks his private key SKS ∈ [1, 2, ..., (n − 1)] and registers his public key as PKS = SKS · G.
The recipient likewise picks his private key SKR ∈ [1, 2, ..., (n − 1)] and registers his public key as PKR = SKR · G.
h = H(M ∥ K2)
5 Security Analysis

5.1 Correctness

The signcrypted text δ = (c, h, s) is valid; its correctness is shown below.
Proof From the game described below, the suggested system is demonstrated to be secure against the adaptive chosen ciphertext attack.
Here, the following assumptions are made: the adversary A selects the private key of a signcryptor and chooses two messages M0 and M1 of the same length, which are sent to the challenger. Then, the challenger randomly selects a bit b and calculates the ciphertext δ with the designated recipient's public key PKR using the following formula:
δ = (c, h, s)
After the adversary receives the ciphertext δ, it guesses a bit b and produces a new ciphertext by randomly selecting a message M. Then, it selects a private key SKS ∈ Zq* randomly and computes the following necessary parameters for δ:
PKS = SKS · G
K = x · PKR = (K1, K2)
c = EK1(M)
h = H(M ∥ K2)
s = (x − h · SKS) mod n
Proof In the aforementioned scheme, the attacker cannot generate a valid (c, h, s) ciphertext without the private key of the sender, because deriving the sender's private key falls under the hardness of the elliptic curve discrete logarithm problem.
Even if the value of K is leaked or compromised, the attacker is unable to produce a valid ciphertext, as doing so needs the two parameters x and SKS. An attacker would need to solve the ECDLP before deriving x from K = x · PKR. Again, solving the ECDLP is computationally infeasible. Hence, the proposed scheme is unforgeable.
Definition 3 In the suggested signcryption scheme, no one other than the designated receiver can verify the signcrypted message [9].
Proof The sender generates a one-time symmetric key using the public key of the designated receiver. At the receiver end, the receiver regenerates the one-time symmetric key with its own private key. This cannot be done by any receiver other than the designated receiver, because this step is computationally hard under the ECDLP.
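Putting the displayed equations together, the following toy Python sketch walks through key setup, signcryption, and unsigncryption on a textbook-sized curve (y² = x³ + 2x + 2 over F₁₇ with a generator of prime order 19). The fixed one-time value x, the hash-based keystream standing in for the symmetric cipher E, and the unsigncryption step K = SKR·(s·G + h·PKS), reconstructed from the equations shown above, are illustrative assumptions rather than the author's exact construction.

```python
import hashlib

# Toy curve y^2 = x^3 + 2x + 2 over F_17 with generator G = (5, 1) of prime order 19
# (textbook-sized parameters; a real deployment would use a standardized curve).
P, A, B, G, N = 17, 2, 2, (5, 1), 19

def ec_add(p1, p2):
    """Affine point addition; None plays the role of the point at infinity."""
    if p1 is None: return p2
    if p2 is None: return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def ec_mul(k, point):
    """Double-and-add scalar multiplication k * point."""
    result, addend = None, point
    while k:
        if k & 1:
            result = ec_add(result, addend)
        addend = ec_add(addend, addend)
        k >>= 1
    return result

def _stream(seed: bytes, length: int) -> bytes:
    """Hash-based keystream standing in for the symmetric cipher E."""
    out, ctr = b"", 0
    while len(out) < length:
        out += hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:length]

def _H(msg: bytes, k2: int) -> int:
    return int.from_bytes(hashlib.sha256(msg + str(k2).encode()).digest(), "big") % N

def signcrypt(msg: bytes, sk_s: int, pk_r):
    """c = E_K1(M), h = H(M || K2), s = (x - h*SK_S) mod n, where K = x*PK_R."""
    x = 7                                  # one-time secret; chosen at random in practice
    k1, k2 = ec_mul(x, pk_r)
    c = bytes(m ^ t for m, t in zip(msg, _stream(str(k1).encode(), len(msg))))
    h = _H(msg, k2)
    return c, h, (x - h * sk_s) % N

def unsigncrypt(c, h, s, pk_s, sk_r):
    """Designated verifier rebuilds K = SK_R*(s*G + h*PK_S) = x*PK_R, decrypts, verifies."""
    k1, k2 = ec_mul(sk_r, ec_add(ec_mul(s, G), ec_mul(h, pk_s)))
    msg = bytes(b ^ t for b, t in zip(c, _stream(str(k1).encode(), len(c))))
    assert _H(msg, k2) == h, "verification failed"
    return msg

sk_s, sk_r = 3, 10
pk_s, pk_r = ec_mul(sk_s, G), ec_mul(sk_r, G)
print(unsigncrypt(*signcrypt(b"hello", sk_s, pk_r), pk_s, sk_r))   # b'hello'
```

Replacing the toy curve and the fixed x with a standardized curve and a fresh random value per message would be required in any real deployment.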
6 Conclusion
Our proposed scheme, based on the elliptic curve discrete logarithm problem, simultaneously provides unforgeability and security against the adaptive chosen ciphertext attack with a shorter key length. The proposed system achieves these security properties with computational cost savings compared to the traditional signature-then-encryption scheme, thus making the new scheme more suitable for IoT environments with restricted energy. Finally, since the proposed scheme is based on elliptic curve cryptography, it can be applied in IoT environments more efficiently because of its low computational and communication cost.
References
1. Alsaadi, Ebraheim, and Abdallah Tubaishat. 2015. Internet of things: features, challenges,
and vulnerabilities. International Journal of Advanced Computer Science and Information
Technology 4 (1): 1–13.
2. Baek, Joonsang, Ron Steinfeld, and Yuliang Zheng. 2007. Formal proofs for the security of
signcryption. Journal of Cryptology 20 (2): 203–235.
3. Elkamchouchi, Hassan M., Eman F. Abu Elkhair, and Yasmine Abouelseoud. 2013. An efficient
proxy signcryption scheme based on the discrete logarithm problem. International Journal of
Information Technology.
4. Hwang, Ren-Junn, Chih-Hua Lai, and Feng-Fu Su. 2005. An efficient signcryption scheme
with forward secrecy based on elliptic curve. Applied Mathematics and Computation 167 (2):
870–881.
5. Hyun, Suhng-Ill, Eun-Jun Yoon, and Kee-Young Yoo. 2008. Forgery attacks on Lee–Chang’s
strong designated verifier signature scheme. In Second International Conference on Future
Generation Communication and Networking Symposia, 2008. FGCNS’08, vol. 2. IEEE.
6. Jakobsson, Markus, Kazue Sako, and Russell Impagliazzo. 1996. Designated verifier proofs and
their applications. In Advances in Cryptology—EUROCRYPT96. Berlin, Heidelberg: Springer.
7. Lee, Ji-Seon, and Jik Hyun Chang. 2009. Comment on Saeednia et al.’s strong designated
verifier signature scheme. Computer Standards & Interfaces 31 (1): 258–260.
8. Lopez, Julio, and Ricardo Dahab. 2000. An Overview of Elliptic Curve Cryptography.
9. Mohanty, Sujata, and Banshidhar Majhi. 2012. A strong designated verifiable DL based sign-
cryption scheme. JIPS 8 (4): 567–574.
10. Saeednia, Shahrokh, Steve Kremer, and Olivier Markowitch. 2004. An efficient strong desig-
nated verifier signature scheme. In Information Security and Cryptology—ICISC, 2003, 40–54.
Berlin, Heidelberg: Springer.
11. Steinfeld, Ron, and Yuliang Zheng. 2000. A signcryption scheme based on integer factorization.
Information Security, 308–322. Berlin, Heidelberg: Springer.
12. Ting, Pei-Yih, Jia-Lun Tsai, and Tzong-Sun Wu. 2017. Signcryption method suitable for low-
power IoT devices in a wireless sensor network. IEEE Systems Journal 12 (3): 2385–2394.
13. Zheng, Yuliang. 1997. Digital signcryption or how to achieve cost (signature & encryption) ≪
cost (signature) + cost (encryption). In Advances in Cryptology—CRYPTO'97, 165–179.
Berlin, Heidelberg: Springer.
14. Zheng, Yuliang, and Hideki Imai. 1998. How to construct efficient signcryption schemes on
elliptic curves. Information Processing Letters 68 (5): 227–233.
15. Yu, Yong, et al. 2009. Identity based signcryption scheme without random oracles. Computer
Standards & Interfaces 31 (1): 56–62.
Performance of C++ Language Code
in Quantum Computing Environment
Abstract This paper considers the execution of different C++ language program codes based on simple programs, functions, class inheritance, recursion, and file handling under a Quantum Computing Environment using the Quantum++ simulator, and the execution of similar C++ language program codes on a traditional computer. The comparison of the results has been presented, analyzed, and concluded upon. The need for, and the advantages and disadvantages of, a Quantum Computing Environment are also discussed briefly in this paper.
1 Introduction
The world became familiar with Quantum Computing with the creation of the first Quantum Turing Machine (QTM) in 1985. With this, Deutsch also introduced a new variety of Quantum Computing Environment which supported a feature known as Quantum Parallelism, which was absent in the Classical Computing Environment; if replicated in a Classical Computing Environment, it would result in a significant slowdown in performance. Moving forward, in 1996, researchers proved that a Universal Quantum Simulator could be developed [1].
With a person being able to cross the seven seas in a finite amount of time through the accelerated speed of today's transport mechanisms, the world needs computations to be accelerated too. Computers contribute majorly to every field of study and also to the daily life of individuals. Nowadays, people need their computers to be as fast as they can be in terms of CPU processing time. In an era of transforming and evolving technology, it is a necessity to introduce a faster environment for better human–computer interaction. This need can be fulfilled by the use of Quantum Computing Environments. The tricky algorithms that require higher execution time can be executed more than 100 times faster using a Quantum Computing Environment; the comparison of execution times between a Traditional and a Quantum Computing Environment is shown in later sections. While searching a database, a Quantum Computing Environment provides up to four times the speed in terms of the number of queries needed. Using the principle of quantum superposition, a database with k items can be searched using k^(1/2) queries. This square-root speedup can have a significant impact on the time required for a search [10]. The Quantum Computing Environment offers an efficient, time-saving, and faster computing capability which shall basically accelerate the computing speed. When it comes to transforming an idea into reality, it is very important to consider the practical implications. The advantages and disadvantages of Quantum Computing have been listed in the next section.
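As a worked instance of the square-root speedup mentioned above, a database of k = 10⁶ items (a hypothetical size) would need on the order of

$$k^{1/2} = \sqrt{10^{6}} = 10^{3}\ \text{queries},$$

compared with up to 10⁶ queries in the classical case.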
4.1 Advantages
The main advantage of quantum computing is that it can execute tasks very fast when compared to the Classical Computing Environment. A Quantum Computing Environment is able to tackle a class of computational problems, like prime factorization, efficiently. Diving further into the topic, the atoms exchange their positions and transfer signals faster in a Quantum Computing Environment than in a Classical Computing Environment. Any classical algorithm (non-randomized or randomized) that is capable of solving the commutativity problem of a finite-dimensional algebra with bounded error needs O(n³) queries to the multiplication constants of the algebra, whereas in a Quantum Computing Environment it requires O(n^(3/2)) queries [11]. Quantum Computing Environments have the potential to satisfy the need for the extra computing power required by the next generation to transform modern society. In this era, lives could be saved by more methodical and effective drug designs, discoveries in materials science could be revolutionized by simulations of quantum processes, and Internet encryption could be replaced by much more reliable methods [12].
4.2 Disadvantages
6 Result Analysis
During the experiment, the execution times of different types of programs in two different environments, namely the Quantum Computing Environment and the Classical (or Sequential) Computing Environment, were recorded in Table 1. The graphical representation of the execution times of the different types of programs (listed in Table 1 with serial no. 1–7) in the Quantum Computing Environment (abbreviated as QCE in the graph) and in the Classical Computing Environment is shown in Fig. 1.
Figure 1 indicates that for every test case performed in both computing environments, the Quantum Computing Environment executed the code faster. For example, in the case of a simple program (listed at Serial No. 1 in Table 1), the execution time taken by the Quantum Computing Environment is 0.0001 s, whereas the execution time taken by the Classical Computing Environment is 0.126 s. It is evident from this test case that the QCE executes the code faster.
The ratio of the performance of the Quantum Computing Environment over the Classical Computing Environment is calculated as per Eq. (1).
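Consistent with the ratios reported in Table 2, Eq. (1) presumably takes the form of the classical-to-quantum execution time ratio; for the simple program, for instance:

$$\text{Performance ratio} = \frac{T_{\text{Classical}}}{T_{\text{Quantum}}} = \frac{0.126\ \text{s}}{0.0001\ \text{s}} = 1260 \qquad (1)$$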
Table 1 Execution times (in seconds) obtained after executing C++ program codes of different
types of programs in a Quantum Computing Environment (using quantum++) and Classical (or
Sequential) computer, respectively
S. No. Program type (in C++) Execution time in Execution time in
Quantum Computing Classical Computing
Environment (in seconds) Environment (in seconds)
1 Simple program (Hello 0.0001 0.126
World)
2 Program containing a 0.002 0.125
Predefined function
(factorial using random
function)
3 Class Object (operations 0.008 0.151
on a data member
associated with the Class
Object)
4 Inheritance of classes 0.007 0.133
(operations on data
members of inherited
classes)
5 Subroutine execution 0.011 0.143
(bubble sorting of an array
using a static array)
6 File handling (784 0.013 0.15
characters)
7 Factorial program using 0.0009 0.144
recursion
Fig. 1 Graphical representation of the execution times (in seconds) obtained after executing
C++ program codes of different types of programs in a Quantum Computing Environment (using
Quantum++) and Classical (or Sequential) computer, respectively
7 Conclusion
Table 2 Performance ratio of the Quantum Computing Environment over the Classical Computing
Environment for different types of C++ program language codes
S. No. Program type Performance ratio of Quantum
Computing Environment over Classical
Computing Environment
1 Simple program (Hello World) 1260
2 Program containing a Predefined function 62.5
(factorial using random function)
3 Class Object (operations on a data 18.875
member associated with the Class Object)
4 Inheritance of classes (operations on data 19
members of inherited classes)
5 Subroutine execution (bubble sorting of 13
an array using a static array)
6 File handling (784 characters) 11.54
7 Factorial program using recursion 160
Average ratio 220.702
This table also contains the average of all the performance ratios obtained after executing the different C++ program codes
Fig. 2 Graphical representation of the performance ratio between the Classical Computing Envi-
ronment and the Quantum Computing Environment for different types of C++ program language
codes
References
1. Rylander, Bart, Terence Soule, James Foster, and Jim Alves-Foss. 2000. Quantum genetic
algorithms, 373.
2. DiVincenzo, D.P. 1995. Principles of quantum computing. In Proceedings ISSCC ‘95—Inter-
national Solid-State Circuits Conference, San Francisco, CA, USA, 312–313. https://doi.org/
10.1109/isscc.1995.535569.
3. Narayanan, A. 1999. Quantum computing for beginners. In Proceedings of the 1999 Congress
on Evolutionary Computation-CEC99 (Cat. No. 99TH8406), vol. 3. Washington, DC, USA,
1999, 2231–2238. https://doi.org/10.1109/cec.1999.785552.
4. Jain, S. 2015. Quantum computer architectures: a survey. In 2015 2nd International Conference
on Computing for Sustainable Global Development (INDIACom), New Delhi, 2165–2169.
5. Kasivajhula, Siddhartha. 2006. Quantum computing: a survey, 249–253. https://doi.org/10.
1145/1185448.1185504.
6. Nielsen, Michael A., and Isaac L. Chuang. 2010. Quantum computation and quantum
information, 13. Cambridge University Press. ISBN 978-1-107-00217-3.
7. Ying, Mingsheng. 2010. Quantum computation, quantum theory and AI. Artificial Intelligence
174: 162–176. https://doi.org/10.1016/j.artint.2009.11.009.
8. Altenkirch, T., and Grattage J. 2005. A functional quantum programming language. In 20th
Annual IEEE Symposium on Logic in Computer Science (LICS’ 05), Chicago, IL, USA, 249–
258.
9. Ankenbrandt, C.A. 1990. An extension to the theory of convergence and a proof of the time
complexity of genetic algorithms. In FOGA.
10. Humble, T. 2018. Consumer applications of quantum computing: a promising approach for
secure computation, trusted data storage, and efficient applications. IEEE Consumer Electronics
Magazine 7 (6): 8–14.
11. Combarro, E.F., J. Ranilla, and I.F. Rúa. 2019. A quantum algorithm for the commutativity of
finite dimensional algebras. IEEE Access 7: 45554–45562.
12. McGeoch, C.C., R. Harris, S.P. Reinhardt, and P.I. Bunyk. 2019. Practical annealing-based
quantum computing. Computer 52 (6): 38–46.
13. Gheorghiu, V. 2018. Quantum++: a modern C++ quantum computing library. PLoS One 13
(12): e0208073.
Retrieval of Ontological Knowledge
from Unstructured Text
Abstract In this article, we examine the issue of the automatic ontology formation process from unstructured text data. To understand the ontology of a domain, the ontology should be expressed in terms of information tables and ontology graphs. An ontology graph consists of taxonomic and non-taxonomic relations. Non-taxonomic relations are easier for non-expert users to understand. Extracting non-taxonomic relations from an ontology is a challenge. In order to improve the ontology of the domain, an appropriate machine learning classifier needs to be investigated for feature classification.
1 Introduction
Use of the Internet is rapidly increasing day by day. A survey suggests that nearly 50% of the world population is using the Internet. The volume of data generation is increasing tremendously; on average, every human being has 340 times more digital data in his/her account than before. This digital data cannot be processed by orthodox data processing instruments in real time. The difficulty spans the areas of capture, storage, search, sharing, transfer, analysis, and representation of this digital data. The data generated can be heavy text files or multimedia. A large amount of data is generated at data warehouses, which can be in structured, semi-structured, or unstructured format. Structured data is in an organized format and can be represented in relational databases; meaningful information can be extracted from a relational database by performing straightforward search queries and search algorithms. Semi-structured data cannot be represented in relational databases, but it has some organizational attributes which make it somewhat easier to analyze. Unstructured data is complex and textual; it does not have any organizational properties and cannot be represented in relational databases. Unstructured data can be human generated or machine generated. Studies state that the volume of unstructured data is 90% and that of structured data is 10%. Unstructured data can be in the form of text or multimedia (image/audio/video). Analysis, organization, integration, and search of unstructured data are very important to infer knowledge which will help practically in various domains. Unstructured data is present in various domains such as healthcare applications, social media text analysis, industry applications, automatic personality detection, education, transportation, and many more. To infer knowledge from unstructured data, one needs to extract the ontology of the domain. Ontology is a data model that can be defined as the formal representation of the knowledge of a domain, expressed in terms of concepts and attributes by considering the semantic relations between classes. Attributes and concepts have associations between them, and these associations can be expressed in terms of taxonomic and non-taxonomic relations. A taxonomic relation is a type_of, is_a (instance_of), or part_of relation between ontology classes, whereas a non-taxonomic relation is any relation between concepts except a taxonomic relation. The ontology of the domain needs to be expressed in terms of information tables and an ontology graph, which make it easier for non-expert users to understand. Identifying non-taxonomic relations between concepts and attributes is a challenge. The manuscript is divided into Sect. 2 Ontology Formation Process, Sect. 3 Mathematical Modeling of the Ontology Formation Process, Sect. 4 Literature Review, and Sect. 5 Conclusion.
3 Mathematical Modeling of the Ontology Formation Process

A knowledge (information) system is defined as

KS = (X, T, {Y_a}_{a∈T}, f_inf)

where
X = nonempty finite set of objects, e.g., sentences from unstructured text,
T = nonempty finite set of attributes/characteristics,
{Y_a}_{a∈T} = family of sets of attribute/characteristic values,
f_inf : T × X → ⋃_{a∈T} Y_a = information function such that f_inf(a, x) ∈ Y_a for each a ∈ T and x ∈ X.
Let On be a given ontology. Formally, an ontological graph is a fourfold

On_G = (N, L, Z, ρ),

where:
• N is the nonempty, finite set of nodes representing concepts in the ontology On,
• L ⊆ N × N is the finite set of edges symbolizing semantic relations between concepts from N,
• Z is the set of types of relations assigned to the edges between concepts,
• ρ: L → Z is the function attributing a semantic relation to every edge.
Let On_G = (N, L, Z, ρ) be an ontological graph. A regional (local) ontological subgraph of On_G is a graph of the form

LOG = (N_L, L_L, Z_L, ρ_L),
An information system with ontological graphs is a tuple in which:
• X is the nonempty, finite set of entities,
• T is the nonempty, finite set of characteristics,
• {OG_a}_{a∈T} is the family of ontological graphs associated with the characteristics,
• f_inf : T × X → ⋃_{a∈T} V_a is the knowledge function such that f_inf(a, u) ∈ V_a for each a ∈ T and u ∈ X, where N_a is the set of concepts from the graph OG_a.
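A minimal Python sketch of the structures defined above follows: an ontological graph whose edges carry taxonomic (is_a, instance_of, part_of) or non-taxonomic relation types, plus a toy information table mapping attribute-object pairs to values. The concrete concept and relation names are illustrative assumptions.

```python
from dataclasses import dataclass, field

TAXONOMIC = {"is_a", "instance_of", "part_of"}   # type_of / is_a / part_of relations

@dataclass
class OntologicalGraph:
    """OG = (N, L, Z, rho): concepts, labelled edges, and the relation types they carry."""
    concepts: set = field(default_factory=set)    # N
    edges: dict = field(default_factory=dict)     # rho: L -> Z, with L a subset of N x N

    def add_relation(self, source: str, target: str, relation: str) -> None:
        self.concepts.update({source, target})
        self.edges[(source, target)] = relation

    def non_taxonomic_relations(self) -> dict:
        """The relations that are hardest to extract automatically from text."""
        return {edge: rel for edge, rel in self.edges.items() if rel not in TAXONOMIC}

# Information table KS = (X, T, {Y_a}, f_inf): (object, attribute) -> value
information_table = {
    ("sentence_1", "disease"): "diabetes",
    ("sentence_1", "treats"): "metformin",
}

og = OntologicalGraph()
og.add_relation("diabetes", "disease", "is_a")       # taxonomic relation
og.add_relation("metformin", "diabetes", "treats")   # non-taxonomic relation
print(og.non_taxonomic_relations())                  # {('metformin', 'diabetes'): 'treats'}
```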
4 Literature Review
Eduardo Castro et al. [1] proposed the Associative Classifier for Entity Resolution (AssocER), which is used for the classification of unstructured short texts. The results are not optimized, the method is not able to identify novel classes, and it is not able to perform self-training during the prediction phase. Pratibha [2] emphasized characteristic-based learning, which is used to find out features and also to estimate attributes of unstructured text; it is a content-based classification technique, but this method does not find non-taxonomic relations. Bartoli [3] proposes an entity extraction technique with less human intervention; improvements are needed in this technique in order to increase its accuracy. Hassan [4] proposes a joint convolutional neural network (CNN) and recurrent neural network (RNN) approach for sentence categorization. This research plays a significant part in decreasing the number of parameters by designing a convolutional layer after the recurrent layer as a replacement for the pooling layer; the technique can be used in information retrieval. Tekli [5] presents a review paper which provides a summarized and exhaustive review of the methods related to semi-structured data processing. Leng et al. [6] proposed the extraction of relationships from unstructured text with a high level of noisy and unrelated information. It is based on a semi-supervised learning approach and requires annotation from the user. Ritter et al. [7] performed a survey on techniques to fetch semantic information from unstructured data. Harleen et al. [8] analyzed unstructured data on Amazon Elastic Cloud for real-time data analytics; the authors do not focus on investigating the ontology behind the domain. Gabriele et al. [9] presented a review paper that shows the types of unstructured data available to researchers and the basic data mining techniques to investigate them; this is a survey paper. Chiange [10] proposes a graph model and an agglomerative algorithm for text document clustering. The proposed algorithm performs considerably better than orthodox clustering algorithms, such as k-means clustering, divisive partitioning, AutoClass, and hierarchical clustering. There can be overlap between documents, which this method resolves by supplying more summarized and detailed clustering results that allow concept overlap. Bafna [11] proposed a feature learning framework which automatically detects features or abstractions. This research investigated the importance of unsupervised learning and non-Euclidean distance techniques, and also explored dimensionality reduction. The main drawback of this research is that it only identifies taxonomic relations; non-taxonomic relations need to be explored for correct ontology framework formulations. Reyes Ortiz [12] performed a survey of natural language processing (NLP) approaches for unstructured data. Fang et al. [13] presented the Unified Automata Processor (UAP), a novel architecture that supplies general and efficient support for the finite automata (FA) required for NLP. The proposed architecture is implemented on CPU and GPU; the main drawback of this system is that it requires external hardware to run the proposed algorithm. Islam et al. [14] applied standard encryption techniques to unstructured data. The researchers show that categorizing unstructured text with respect to sensitivity levels improves the performance of the system.
Shen et al. [15] proposed a general architecture for linking named entities in Web free text with a heterogeneous knowledge network; a probabilistic linking model, which consolidates an entity popularity model with an entity object model; and a knowledge population algorithm to rigorously augment the network. The paper does not address non-taxonomic relation mapping. Sriraghav et al. [16] proposed an attribute-based opinion mining algorithm that focuses on user-centric attributes and helps users take well-informed decisions based on the attributes they rely on most. The authors propose a tool, applicable in any domain, that can analyze unstructured data by accepting domain- and user-specific attributes; however, human intervention is needed to input the domain-specific attributes. Tarasconi [17] proposes an emergency management tool using Twitter tweets and performs sentiment analysis on a Twitter dataset. However, this research is not able to accurately capture the notion of informativeness within hazard-related streams. Ahmad et al. [18] proposed a framework for text information extraction in terms of context vectors. The architecture depends on mappers and reducers developed on Apache Hadoop, large dimensionality is handled by clustering, the extracted context vectors have large dimensions, and the domain-specific attributes are entered manually. Fikry et al. [19] propose a business intelligence model for unstructured data management.
Rajpathak [24] proposes a D-matrix generation technique from unstructured data for fault diagnosis in the automobile domain. While constructing the D-matrix from unstructured data, non-taxonomic relations are not considered, and annotation is required for entity extraction. Krzysztof et al. [25] proposed the generation of information tables from unstructured data in the real estate domain; it is not a generalized solution and requires human intervention. Sadoddin et al. [26] investigated the issue of extracting unknown relations between 'concepts.' The authors examined the performance of various correlation methods in finding both 'direct' and 'indirect' relations between concepts. The concept list, treated as a dictionary, is given as input to the algorithm, and every entry from the dictionary is considered an independent concept. The significance of this method is the automatic investigation of concepts.
Gianis et al. [27] proposed a framework for graphically representing knowledge from structured and unstructured data that can be understood by non-expert users and that forms queries based on graphs. Logical operators such as 'AND,' 'OR,' and 'NOT' are not considered in the framework, categorization of concepts requires human intervention, and it is time-consuming for complicated queries. Mallek et al. [28] graphically represented unstructured data using hidden statistics. The graphical representation is dynamic, i.e., it changes according to live data, but ontology creation is manual and requires human intervention, and the work is limited to a single domain rather than being a generalized solution. Alexandru et al. [29] presented an indexing solution, named FusionDex, which gives a beneficial model for querying over distributed file systems and outperforms Hadoop, Grep, and Cloudera; however, graph-based user-oriented query retrieval is not addressed in this paper. Zhu et al. [30] proposed a combined architecture that mixes structured and unstructured text data. The combined index, also called a joint index, is a semantic index that represents the semantic associations between attributes and their various resources. The main concern is how to calculate the combined joint index; it is a complex task and difficult to implement in real time, as the query processing cost will be too high.
5 Conclusion
In this paper, we have reviewed the work done by previous researchers on the ontology formation process and have illustrated the process with an appropriate example. To accurately represent the ontology of a domain in terms of information tables and an ontology graph, focus needs to be given to non-taxonomic relations. More work needs to be done on attribute association modeling using non-taxonomic relations, and an appropriate machine learning algorithm needs to be explored for feature selection. A best-effort query answering tool also needs to be designed to apply business intelligence to unstructured text data.
References
1. Castro, E.P., et al. 2017. Classifying short unstructured data using the Apache Spark platform. In ACM/IEEE Joint Conference on Digital Libraries.
2. Pratibha, P. 2014. Attribute based classification and annotation of unstructured data in social
networks. In IEEE International Conference on Advanced Computing.
3. Bartoli, Alberto. 2017. Active learning of regular expressions for entity extraction. In IEEE
transaction on Cybernetics.
4. Hassan, Abdalraouf. 2017. Convolutional recurrent deep learning model for sentence classifi-
cation. Journal of IEEE Access.
5. Tekli, Joe. 2015. An overview on XML semantic disambiguation from unstructured text to semi-structured data: background, applications, and ongoing challenges. IEEE Transactions on Knowledge and Data Engineering.
6. Leng, Jiewu, et al. 2016. Mining and matching relationships from interaction contexts in a social manufacturing paradigm. IEEE Transactions on Systems, Man, and Cybernetics.
7. Ritter, Daniel, et al. 2017. Toward application integration with multimedia data. In IEEE International Enterprise Distributed Object Computing Conference.
8. Harleen. 2016. Analysis of hadoop performance and unstructured data using zeppelin. In IEEE
International Conference on Research Advances in Integrated Navigation Systems.
9. Gabriele, et al. 2016. Mining unstructured data in software repositories: current and future
trends. In IEEE International Conference on Software Analysis, Evolution and Reengineering.
10. Chiang, I-Jen. 2015. Agglomerative algorithm to discover semantics from unstructured big data. In IEEE International Conference on Big Data.
11. Bafna, Abhishek. 2015. Automated feature learning: mining unstructured data for useful
abstractions. In IEEE International Conference on Data Mining.
12. Reyes Ortiz, Jose A., et al. 2015. Clinical decision support systems: a survey of NLP-based
approaches from unstructured data. In IEEE International workshop on Database and Expert
Systems Applications.
13. Fang, Yanwei, et al. 2015. Fast support for unstructured data processing: the unified automata
processor. In ACM Proceedings of International Symposium on Microarchitecture.
14. Islam, Md. Rafiqul. 2014. An approach to provide security to unstructured big data. In IEEE
International Conference on Software Knowledge, Information Management and Applications.
15. Shen, Wei, et al. 2018. SHINE+: a general framework for domain-specific entity linking with
heterogeneous information networks. IEEE Transactions on Knowledge and Data Engineering.
16. Sriraghav, K. et al. 2017. ScrAnViz-A tool to scrap, analyze and visualize unstructured
data using attribute based opinion mining algorithm. In IEEE International Conference on
Innovations in Power and Advanced Computing Technologies.
17. Tarasconi, Francesco. 2017. The role of unstructured data in real time disaster related social media monitoring. In IEEE International Conference on Big Data.
18. Ahmad, Tanvir, et al. 2016. Framework to extract context vectors from unstructured data using
big data analytics. In IEEE Conference on Contemporary Computing.
19. Abdullah, Mohammad Fikry, et al. 2015. Business intelligence model for unstructured data management. In IEEE Conference on Electrical Engineering and Informatics.
20. Istephan, Sarmad, et al. 2015. Extensible query framework for unstructured medical data—a
big data approach. In IEEE International Conference on Data Mining Workshops.
21. Lee, Saun, et al. 2014. A multi-dimensional analysis and data cube for unstructured text and
social media. In IEEE International Conference on Big Data and Cloud Computing.
22. Saini, Akriti, et al. 2014. EmoXract: domain independent emotion mining model for unstructured data. In IEEE Conference on Contemporary Computing.
23. Ali, Mohamed, et al. 2017. The problem of learning non-taxonomic relationships of ontologies from unstructured data sources. In IEEE International Conference on Automation & Computing.
24. Rajpathak, Dnyanesh. 2014. An ontology-based text mining method to develop D-matrix from
unstructured text. In IEEE Transactions on Systems, Man, and Cybernetics: Systems.
25. Krzysztof, et al. 2017. From unstructured data included in real estate listings to information
systems over ontological graphs. In IEEE Conference on Information and Digital Technologies.
26. Sadoddin, Reza, et al. 2016. Mining and visualizing associations of concepts on a large-scale
unstructured data. In IEEE Conference on Big Data Computing Service & Application.
27. Gianis, et al. 2017. Graph Based Information Exploration Over Structured and Unstructured
Data. In IEEE Conference on Big Data.
28. Mallek, Maha, et al. 2017. Graphical representation of statistics hidden in unstructured data: a
software application. In IEEE International Conference on Systems, Man & Cybernetics.
29. Alexandru, et al. 2017. Toward scalable indexing and search on distributed and unstructured
data. In IEEE International Congress on Big Data.
30. Zhu, Chunying, et al. 2015. A combined index for mixed structured and unstructured data. In
IEEE International Conference Web Information Systems.
31. Sheokand, Vishal, et al. 2016. Best effort query answering in data spaces on unstructured data.
In IEEE International Conference on Computing, Communications and Automation.
Critical Non-functional Requirements
Modeling Profiler with Meta-Model
Semantics
Mahesh R. Dube
1 Introduction
The system is defined using well-formed goals and objectives requiring management support to deal with strategic and tactical planning. A system has operational as well as functional performance, which can be assured by performing a variety of tests with the aid of the analysis and design specification. Quality assurance indicates that the system processes are verified and the system outcomes are validated. The capabilities are integrated using a hierarchical structure and architectural baselines. The system classification can be seen in multiple contexts and domains. The external
M. R. Dube (B)
Department of Computer Engineering, Vishwakarma Institute of Technology,
Pune, Maharashtra, India
e-mail: mahesh.dube@vit.edu
The approach for indicating requirement evolution management is through version control visualized in terms of a version graph. The input for this graph is the configuration information of the requirements. The main argument for visualization is that it can represent information in ways that are easier to understand, thereby enhancing communication. When applied to requirements engineering, it is expected that the requirements specification can be transformed into a form that both the customer and the developer can comprehend more easily, thus bridging the communications gap between the two [6].
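A version graph of this kind can be pictured with a small directed structure in which nodes are requirement versions and edges record derivations; the sketch below is only an illustration, and the requirement identifiers in it are hypothetical.

```python
# Minimal sketch of a requirement version graph: nodes are requirement
# versions, directed edges point from a version to the versions derived
# from it. Identifiers are hypothetical.

version_graph = {
    "REQ-1/v1": ["REQ-1/v2"],               # v1 was revised into v2
    "REQ-1/v2": ["REQ-1/v3a", "REQ-1/v3b"], # v2 branched into two variants
    "REQ-1/v3a": [],
    "REQ-1/v3b": [],
}

def history(graph, start):
    """Depth-first walk over all versions reachable from 'start'."""
    seen, stack = [], [start]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.append(v)
            stack.extend(graph.get(v, []))
    return seen

print(history(version_graph, "REQ-1/v1"))
```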
The Object Management Group (OMG) introduced the Unified Modeling Language (UML), enabling system modelers to model a system with a use case-based approach that spans the project analysis phase through the deployment structure observed in the product release phase. Over a period of two decades, UML became the standard in the modeling arena, incorporating various methods as well as methodologies for software development. The concepts and diagrams represented in UML are based on the object-oriented paradigm, which considers abstraction, hierarchy, modularity, and encapsulation to be the prime properties [7–9].
Since UML is flexible and customizable, multiple models can be created indicating organizational business processes, application design, database architecture design, and workflow structures. The UML foundation was laid by adopting three methodologies, namely object-oriented analysis and design (OOAD, Booch method), object-oriented software engineering (OOSE, Jacobson), and the object-modeling technique (OMT, Rumbaugh). A model can represent minimal as well as maximal representations of objects. This means that the object design can be extended as shallow or deep structures indicating levels of abstraction within the scope and limits of the objects. Each object carries features, properties, aspects, or characteristics that can be static or dynamic. The models are thought of as a repository of knowledge about object characteristics and the cycles the object is expected to undergo during its construction and deployment [10–12].
Features of Models: A model needs to possess the following three features:
• Mapping feature: A model is based on an abstraction that is observable.
• Reduction feature: A model reflects a selection of an abstraction's properties, leading to a set of homogeneous abstractions in a view.
• Pragmatic feature: A model replaces an abstraction with respect to a defined intention, theme, scope, and purpose.
The universe of class names is defined by UECLASS, together with the universe of object identifiers UEOBJECT-ID and UECAR as the set of values used for associating a value assignment with each type. A series of semantic properties is described below, each describing a semantic domain in an environment. UETYPE indicates the universe of all type names, and the members of all named types are gathered in the universe UEUVAL of values. A T ∈ UETYPE denotes a type name.
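One rough way to picture these universes, purely as an illustration and not as the author's formalization, is to keep named collections for the type, class, and value universes, with a map from each type name to the values it admits; all concrete names in the sketch are assumptions.

```python
# Illustrative sketch of the semantic universes: type names map to the
# sets of values they admit; class names map to object identifiers.
# All concrete names below are assumptions made for the example.

UETYPE = {"Integer", "Boolean"}                 # universe of type names
UEUVAL = {
    "Integer": set(range(-3, 4)),               # values admitted by each type
    "Boolean": {True, False},
}
UECLASS = {"Sensor"}                            # universe of class names
UEOBJECT_ID = {"Sensor": {"sensor_1", "sensor_2"}}  # object identifiers per class

def well_typed(type_name, value):
    """Check that a value belongs to the value universe of its type."""
    return type_name in UETYPE and value in UEUVAL[type_name]

print(well_typed("Boolean", True))   # True
```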
4.1 Availability
It is the degree to which the system can maintain its resource log and make resources available to running program segments. The notation catalog for availability is indicated in Table 1.
4.2 Efficiency
The efficiency requirement describes “the degree to which software makes optimal
use of system resources as indicated by the attributes: time behavior and resource
behavior”. Table 2 indicates the notation profile for efficiency.
4.3 Flexibility
4.4 Performance
5 Conclusion
References
1. Jarke, M., and K. Pohl. 1994. Requirements engineering in 2001: (virtually) managing a
changing reality. Software Engineering Journal 257–266.
2. Easterbrook, S., R. Lutz, R. Covington, J. Kelly, Y. Ampo, and D. Hamilton. 1998. Experiences
using lightweight formal methods for requirements modeling. IEEE Transactions on Software
Engineering 24 (1): 04–14.
3. Kotonya, G., and I. Sommerville. 1996. Requirements engineering with viewpoints. Software
Engineering Journal 05–18.
4. Lamsweerde, A., D. Robert, and L. Emmanuel. 1998. Managing conflicts in goal-driven
requirements engineering. IEEE Transactions on Software Engineering 24 (11): 908–925.
5. Lamsweerde, A., and E. Letier. 2000. Handling obstacles in goal-oriented requirements
engineering. IEEE Transactions on Software Engineering 26 (10): 978–1005.
6. Mylopoulos, J., L. Chung, and B. Nixon. 1992. Representing and using nonfunctional require-
ments: a process-oriented approach. IEEE Transactions on Software Engineering 18 (6):
483–497.
7. Cysneiros, L., and J. Leite. 2004. Nonfunctional requirements: from elicitation to conceptual
models. IEEE Transactions on Software Engineering 30 (5): 328–350.
8. Gregoriades, A., and A. Sutcliffe. 2005. Scenario-based assessment of nonfunctional require-
ments. IEEE Transactions on Software Engineering 31 (5): 392–409.
9. Balasubramaniam, R., and M. Jarke. 2001. Toward reference models for requirements
traceability. IEEE Transactions on Software Engineering 27 (1): 58–93.
10. Jackson, E., and J. Sztipanovits. 2009. Formalizing the structural semantics of domain-specific
modeling languages. Journal of Software and System Model 8: 451–478. Springer-Verlag.
11. Thalheim, B., K.D. Schewe, and H. Ma. 2009. Conceptual application domain modeling. In
Proc. Sixth Asia-Pacific Conference on Conceptual Modelling (APCCM 2009), Wellington,
New Zealand. CRPIT, vol. 96, 49–57.
12. OMG. 2010. Unified Modeling Language (OMG UML) Superstructure. Version 2.3.
13. OMG. 2010. Unified Modeling Language (OMG UML) Infrastructure. Version 2.3.
14. Meta Object Facility (MOF) Core Specification OMG Specification. 2006. Version 2.0.
15. Object Constraint Language. 2010. OMG Specification. Version 2.2.
16. Nugroho, A. 2009. Level of detail in UML models and its impact on model comprehension: a
controlled experiment. Journal of Information and Software Technology 51: 1670–1685.
17. Aburub, F., M. Odeh, and I. Beeson. 2007. Modelling non-functional requirements of business
processes. Information and Software Technology 49: 1162–1171.
18. Mens, T., and P. Gorp. 2006. A taxonomy of model transformation. Electronic Notes in
Theoretical Computer Science 152: 125–142.
Some Routing Schemes and Mobility
Models for Real Terrain MANET
Abstract The primary challenges in a mobile ad hoc network (MANET) are the presence of obstacles, mobility, energy efficiency, and networking in a dynamic topology environment. Efficient routing with obstacle avoidance in a dynamic topology is a critical issue in MANET. Many mobility patterns have been recommended for the movement of nodes in the presence of obstacles in a MANET terrain, and several obstacle-avoiding routing techniques have also been proposed by well-known researchers. In this paper, many related articles are reviewed and briefly discussed. The paper outlines the advantages and drawbacks of each approach to identify possible research scope in route planning for dynamic MANET topologies in the presence of obstacles.
1 Introduction
B. K. Panda
Department of Electronics & Telecommunication Engineering,
Utkal University, Bhubaneswar, India
e-mail: bk_panda2001@yahoo.com
U. Bhanja
Departments of Electronics & Telecommunication Engineering, IGIT, Sarang, India
e-mail: urmilabhanja@gmail.com
P. K. Pattnaik (B)
School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT-DU),
Bhubaneswar, Odisha 751024, India
e-mail: patnaikprasant@gmail.com
ad hoc network, a testing lab requires very high cost, and so simulator tools are used for performance analysis. All simulation tools are designed considering plain terrain and do not take into account real terrain features and obstacles in the simulation area. The presence of obstacles affects the received signal strength at the receiver, and hence the performance of the MANET is reduced. Previously, many authors have considered geographic features and obstacles and analysed their impact on MANET performance, but they have not developed an efficient obstacle-, mobility- and congestion-aware, energy-efficient optimal routing protocol to overcome routing problems in terrain with convex and concave obstacles. Hence, in the current work, the authors review the outcomes of existing routing protocols and mobility schemes and outline the advantages and drawbacks of each approach and their scope to mitigate the challenges of mobility, network congestion and energy efficiency.
The mobility models used in MANET can be segregated into different categories as per their dependencies and restrictions, given below.
Random-based node movement: These mobility models are based on random movements of nodes; no inter-node dependencies or restrictions are considered.
Temporal dependencies in movement: Here, the present movement of a MANET node is decided based on its previous movement.
Group dependencies in movement: Here, the moving pattern of a MANET node is determined by the motion of other neighbouring nodes.
Geographic restrictions in terrain: Some areas of the terrain are restricted from MANET node movement.
Many random-based mobility models given by different authors are discussed below.
The random waypoint (RWP) model for node mobility presumes a fixed number of nodes in a fixed-size rectangular terrain area. At the start of the simulation, the nodes are placed uniformly in the rectangular terrain. Each mobile node sets a destination arbitrarily and selects a random speed distributed uniformly between [vmin, vmax]. When it reaches the destination, the node halts for an arbitrary time that is evenly distributed between [Pmin, Pmax], then selects a new direction and speed, and continues the same mobility pattern [5]. In the random walk (RWM) mobility model, every node sets an arbitrary direction that is evenly distributed in [0, 2π] (with an arbitrary speed evenly distributed in [vmin, vmax]) and proceeds for a set distance at that speed. After reaching the new point, the same process is repeated. The model is also known as Brownian motion, which is similar to the motion of particles in a fluid [6].
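A minimal sketch of the random waypoint behaviour described above, assuming a rectangular terrain, unit time steps, and arbitrary parameter values; it is meant only to make the waypoint/pause cycle concrete.

```python
import random
import math

def random_waypoint(steps, width=1000.0, height=1000.0,
                    v_min=1.0, v_max=20.0, p_min=0.0, p_max=10.0):
    """Minimal random waypoint trace for one node (unit time steps)."""
    x, y = random.uniform(0, width), random.uniform(0, height)
    trace = [(x, y)]
    t = 0
    while t < steps:
        dest_x, dest_y = random.uniform(0, width), random.uniform(0, height)
        speed = random.uniform(v_min, v_max)
        dist = math.hypot(dest_x - x, dest_y - y)
        travel = max(1, round(dist / speed))
        for i in range(1, travel + 1):            # move toward the waypoint
            trace.append((x + (dest_x - x) * i / travel,
                          y + (dest_y - y) * i / travel))
        x, y = dest_x, dest_y
        pause = round(random.uniform(p_min, p_max))
        trace.extend([(x, y)] * pause)            # pause at the waypoint
        t = len(trace)
    return trace[:steps]

print(len(random_waypoint(100)))
```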
The random direction model (RDM) allows a MANET node to set a direction and proceed in that direction at an arbitrary speed until it reaches a terrain edge. It then selects a new direction, and the same process is repeated. The benefit of this process is an even distribution of nodes over the rectangular terrain area. The drawback of RDM, as with RWP and RWM, is that it is unrealistic [6].
RWP, RWM and RDM mobility models are unrealistic in nature and also suffer from edge effects. To avoid these limitations, the smooth mobility model (SMM) is proposed, in which mobile nodes change speed and direction gradually [6]. Random movement-based mobility patterns cover the total terrain area, these mobility mechanisms are very easy to implement, and heterogeneous velocities can be incorporated smoothly.
Apart from temporal dependencies, there is a possibility that nodes may move in groups. Hence, there is a chance that the motion of a node may influence its neighbouring nodes or the nodes around it.
Spatial dependence can be realised using reference points. The paper [10] proposes a reference point group mobility (RPGM) approach to model group node movement. Group movement is modelled according to a random mobility model: the positions of the reference points change as per the random mobility model, but within a group the relative positions of the reference points do not change. Moreover, this model is silent about the presence of obstacles. In the work [11], a structured-group-mobility model is discussed. Here, a non-random movement vector is used, and the nodes in a group move according to a fixed, non-varying formation. These formations are similar to the formations found among fire-fighters, conference participants and tanks exchanging information. However, this model considers a plain terrain area. In [12], a social-network-inspired model uses interaction indicators, which are associated with all source-destination node pairs. A higher value of the interaction indicator is assumed to indicate a higher chance of social relational bonding and hence a smaller geographic gap. Nodes are clumped together within a cloud depending on the corresponding interaction indicator, the movement of all the nodes follows the random waypoint model, and the waypoints are selected as per their interaction indicators. This model is described for plain terrain. In the work [13], a community-oriented model is proposed, in which algorithms are utilised to separate the terrain nodes into different groups depending on an interaction parameter. The interaction indicators are modified over time. This model does not consider real terrain situations.
In group mobility scenarios, the RPGM model performs better than the others. In the RPGM model, apart from the feature of batch movement, additional required features can be obtained by using a suitable model for the given reference coordinate points in plain terrain.
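The reference point idea can be sketched as follows: the group's reference point is assumed to be moved by some group trajectory (for example a random waypoint model), and each member is placed within a bounded random offset of it; the offset radius and coordinates below are arbitrary.

```python
import random

def rpgm_step(reference_point, n_members, max_offset=50.0):
    """Place group members around the group's reference point (RPGM idea).

    The reference point itself is assumed to be moved elsewhere, e.g. by a
    random waypoint model; each member adds a bounded random offset to it.
    """
    rx, ry = reference_point
    members = []
    for _ in range(n_members):
        dx = random.uniform(-max_offset, max_offset)
        dy = random.uniform(-max_offset, max_offset)
        members.append((rx + dx, ry + dy))
    return members

print(rpgm_step((500.0, 500.0), n_members=4))
```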
The assumption of free node movement across the entire area is also considered unrealistic, even when temporal and group dependencies are taken into account. Many different approaches have been proposed by different authors to restrict node motion in parts of the terrain where mountains, rivers or buildings are present. The section below describes some approaches for handling node movement while avoiding restricted areas.
The graph-dependent model [14] utilises a graph in which the vertices are the probable destinations and the edges are the routes to those destinations. Based on this graph, the random waypoint method is applied: a node starts from an arbitrary position on the graph, selects a vertex, and proceeds towards it at an arbitrary velocity; it then selects the next destination randomly and moves at a random velocity. In the weight-based waypoint model [15], the graph vertices are particular areas like a lecture room, a meeting room, or a tea stall. Every node selects a destination within these terrain areas, and the graph edges carry the probabilities of selecting each destination.
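The weight-based selection of destinations can be made concrete with a small sketch in which each area carries probabilities for choosing the next destination area; the areas and probabilities below are hypothetical.

```python
import random

# Hypothetical area graph: from each area, probabilities of choosing the
# next destination area (weight-based waypoint idea).
transition = {
    "lecture_room": {"meeting_room": 0.5, "tea_stall": 0.3, "lecture_room": 0.2},
    "meeting_room": {"lecture_room": 0.6, "tea_stall": 0.4},
    "tea_stall":    {"lecture_room": 0.7, "meeting_room": 0.3},
}

def next_destination(current):
    """Pick the next destination area according to its edge probabilities."""
    areas = list(transition[current])
    weights = [transition[current][a] for a in areas]
    return random.choices(areas, weights=weights, k=1)[0]

print(next_destination("lecture_room"))
```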
In real terrain, many kinds of obstacles are present, such as mountains, rivers, buildings and walls. Hence, obstacles must be considered in the simulation area. In the model of [16], the edges of the buildings are used as input to create a Voronoi diagram. The motion graph includes the Voronoi diagram and extra vertices, which are formed at the junctions of the Voronoi edges and the building edges. With the help of the Voronoi diagram, the emulated paths keep an equal distance from all obstacles; these paths are not optimal paths. In this network, all lanes are assumed to be almost equidistant from the surrounding buildings, and all nodes proceed along the middle of the lane. Paper [17] models the presence of buildings and streets more realistically: lanes and buildings are represented more thoroughly and the movement is more realistic, but it is still not along optimal paths.
In this approach, the simulation area is divided into many sub-areas, which are then used for mobility modelling.
The author in paper [18] divides the area into low-density and high-density sub-areas. Clusters are treated as the vertices of a graph and the routes between them as its edges, with each edge associated with a probability. A MANET node is assumed to move within a sub-area for an arbitrary time stretch according to the RWP model; it then selects one path depending on the edge probabilities and proceeds along the selected path to the next sub-area. In paper [19], the entire terrain is divided into non-intersecting sub-areas. Within each area, nodes move according to a random mobility pattern. Inter-area transitions are handled in a similar way to the graph-based mobility model with the help of transition probabilities: when a mobile node is selected to change area, it proceeds to the target area and switches to some other mobility pattern. In the paper [20], the terrain is segmented into square pieces. Mobile nodes are modelled as walkers that move along the vertices of the lanes. Initially, the MANET nodes are arbitrarily scattered throughout the square area, and every node determines a particular direction along with a specified velocity. When a mobile node arrives at a junction, it changes its direction with a particular probability. The velocity of the node is altered over the course of time.
Work [21] describes the vehicular traffic pattern of urban environments. Each node selects a destination in the same way as in RWM and chooses the path with the smallest travel time; crossroad delays are emulated depending on the number of nodes present, and a regular placement of nodes over the terrain is observed. In [22], the authors evaluate routing protocol performance under different mobility models, such as random walk (RW), random waypoint (RWP) and random direction (RD), considering obstacles such as mountains in the terrain that limit the movement of nodes and block data transmission between nodes, and introduce a new parameter, the probability of reachability (POR). The POR is obtained by dividing the number of reachable routes by all possible routes among all source-sink pairs. The scenario is simulated using a specially designed MATLAB simulator, which shows a noticeable difference in the POR value in the presence of obstacles; the value also changes as the number of obstacles varies.
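One way to read this POR definition is as the fraction of source-sink pairs that still have at least one route after obstacle-blocked links are removed; the sketch below follows that reading on a hypothetical topology and is not the simulator used in [22].

```python
from collections import deque
from itertools import combinations

def reachable(adj, src, dst):
    """Breadth-first search over the obstacle-pruned adjacency list."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def probability_of_reachability(adj):
    """Fraction of node pairs that remain connected (one reading of POR)."""
    nodes = list(adj)
    pairs = list(combinations(nodes, 2))
    ok = sum(reachable(adj, s, d) for s, d in pairs)
    return ok / len(pairs)

# Hypothetical topology after removing links blocked by an obstacle:
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}  # D is cut off
print(probability_of_reachability(adj))  # 0.5 -> 3 of 6 pairs reachable
```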
The paper [23] outlines the physical layer's impact on the network output efficiency of MANET. Simulations are done considering the ITU line-of-sight (ITU-LoS) and non-line-of-sight (NLoS) propagation loss models. The same scenario is first simulated using the RWP mobility model for different obstacle counts in the terrain area to estimate the amount of data dropped under the different propagation models. The outcomes of the AODV routing protocol are analysed with 20 nodes.
MANET routing protocols may be segregated into three main categories [24–26]: proactive [24], reactive [25, 26] and hybrid [27].
In DSDV [24], each node maintains a complete routing table, and destination sequence numbers are used to avoid the possibility of route looping. If a large topology alteration occurs, the whole routing table information is changed, which adds very high overhead to the network in high-mobility scenarios.
The DSR protocol was proposed in paper [25]; according to the authors, DSR searches for a path when a source node tries to transmit a data packet to a receiving node. There are mainly two activities in DSR, i.e. the route discovery phase and the route maintenance phase. DSR uses source routing: the complete route address is stored in the route cache of a node, and this complete address guides data to the destination. A route error occurs when a route breaks, which initiates the route maintenance process, so that expired routes are removed from the route cache and a new route discovery is started. The work [28] compares a popular proactive protocol, DSDV, with a popular reactive protocol, DSR, and outlines their respective advantages and disadvantages.
AODV performs route discovery on demand using route request (RREQ) packets. Unlike DSR, AODV uses a distance vector technique; i.e. each node along the path keeps an interim route-guiding table during communication. AODV improves on DSR's RREQ handling by preventing excessive RREQ flooding using the time-to-live (TTL) concept. Nodes along a connected route keep the source/destination IP addresses, the sender's address and sequence numbers in their routing tables. This information is used to construct reverse paths for route reply (RREP) packets [28]. AODV uses sequence numbers to track and eliminate outdated routes. Route error (RERR) packets are generated and sent to the source when link breaks are identified, in order to eliminate the stale links and initiate a fresh route discovery process [28].
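A much simplified sketch of the TTL-limited (expanding-ring) route discovery idea: the source floods RREQs with a small TTL and widens it until the destination is reached. Sequence numbers, RREP forwarding, routing tables and RERR handling are omitted, so this is only an illustration of the flooding control, not the full AODV protocol.

```python
def expanding_ring_search(adj, src, dst, ttl_start=1, ttl_max=7):
    """Very simplified AODV-style route discovery with TTL-limited flooding.

    Returns a route (list of nodes) or None. Sequence numbers, RREP/RERR
    handling, and per-node routing tables are omitted for brevity.
    """
    for ttl in range(ttl_start, ttl_max + 1):
        # Flood RREQs up to 'ttl' hops, remembering the reverse path.
        frontier = [[src]]
        visited = {src}
        for _ in range(ttl):
            next_frontier = []
            for path in frontier:
                for neighbour in adj.get(path[-1], []):
                    if neighbour in visited:
                        continue
                    visited.add(neighbour)
                    new_path = path + [neighbour]
                    if neighbour == dst:
                        return new_path        # an RREP would travel back along this path
                    next_frontier.append(new_path)
            frontier = next_frontier
        # No route within this TTL: widen the ring and try again.
    return None

adj = {"S": ["A"], "A": ["S", "B"], "B": ["A", "D"], "D": ["B"]}
print(expanding_ring_search(adj, "S", "D"))   # ['S', 'A', 'B', 'D']
```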
In [29], the core principles of the AODV protocol are discussed, but possible future directions are not given. During simulation, metrics such as dropped packets, throughput and data delay may be considered [30, 31]. These parameters are useful indicators of grade of service and network efficiency, but the testing considers AODV alone, using the random waypoint mobility model, without comparing it with other protocols.
AODV is the most popular MANET protocol among all available protocols, so many improvements of AODV have been proposed by different authors to combat some problems of MANET. Some AODV variants are discussed below.
A major issue of MANET routing protocols like AODV, DSR and DSDV is the lack of multicast support, which would enable communication with multiple nodes and reduce control traffic overheads [32]. To tackle this issue, the MAODV routing technique is proposed. MAODV adds multicast operation support to the existing AODV protocol. MAODV uses the architecture of a modified AODV and adds multicast activation (MACT) and group hello (GRPH) messages; every node keeps two different routing tables, one for unicast and the other for multicast operation [33]. During route discovery, MAODV broadcasts route request packets towards the destination, and these can carry several destination IP addresses. Route reply packets are returned for each of these IP addresses; after receiving the RREPs, the transmitting node forwards a MACT message towards the sink node, activating a multicast path. Multicast paths are cached at the source, which keeps all multicast destinations in its memory, enabling unicast destinations to be obtained from the source tree without initiating route discovery. The leader of the group is the node that joined the group first. The leader is responsible for group management and maintenance, which is achieved by broadcasting GRPH messages that contain the group leader's IP address.
The authors in [34] propose a modified AODV routing protocol that is aware of node energy and node mobility (MEAODV). The MEAODV protocol achieves a 4–5% better packet delivery ratio, 20–24% lower convergence time and a 10–15% longer network lifetime compared to AODV. In [35], the authors present a new route selection technique based on factors such as the speed of the node, the direction of movement and the halt time. Depending on these parameter values, a new parameter called the mobility_factor is used for choosing the nodes that form a path between the transmitting node and the receiving node. Simulations of routing algorithms such as DSR and AODV are carried out with the RWP and Levy walk models. It is clearly observed from the simulations that, using the proposed method, these protocols show superior performance compared to normal AODV and DSR. However, this particular protocol is suitable only for plain terrain.
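The exact mobility_factor formula of [35] is not reproduced here; the sketch below only illustrates the general idea of combining speed, direction stability and halt time into a single score used to prefer more stable relay nodes, with made-up weights and normalization.

```python
def mobility_factor(speed, direction_change_deg, pause_time,
                    w_speed=0.5, w_dir=0.3, w_pause=0.2):
    """Illustrative stability score: lower speed, smaller direction change and
    a longer pause time give a higher score. The weights and normalization
    are assumptions, not the formula used in [35]."""
    speed_term = 1.0 / (1.0 + speed)                  # slower nodes score higher
    dir_term = 1.0 - min(direction_change_deg, 180) / 180.0
    pause_term = min(pause_time, 10.0) / 10.0         # cap the benefit of pausing
    return w_speed * speed_term + w_dir * dir_term + w_pause * pause_term

def pick_relay(candidates):
    """Choose the candidate neighbour with the highest mobility factor."""
    return max(candidates, key=lambda c: mobility_factor(*c[1]))

# (node, (speed m/s, direction change deg, pause time s)) -- hypothetical values
candidates = [("n1", (15.0, 90.0, 0.0)), ("n2", (2.0, 10.0, 5.0))]
print(pick_relay(candidates)[0])   # n2 is more stable
```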
Paper [36] proposes a new routing protocol known as cartography-enhanced OLSR (CE-OLSR) for multi-hop mobile ad hoc networks. The protocol combines an effective cartography scheme with a robust routing scheme. The cartography scheme uses the same reduced OLSR signalling traffic and improves responsiveness to network dynamics, and the resulting view is better and more accurate than the plain network topology. In the stability routing method, a reduced view of the collected cartography is used, which only considers links that do not exceed a threshold distance and do not cross obstacles. In urban areas, radio signals suffer very strong shadowing and fading effects and are often fully obstructed by concrete structures. However, these protocols are not mobility, congestion or energy aware. The work [37] extends this with a distributed autonomic technique for locating obstacles (ALOE) combined with the CE-OLSR protocol. The integration of ALOE and CE-OLSR gives a very good improvement in comparison with CE-OLSR with or without knowledge of the obstacle map. Two parameters, the covering range and precision ratios, are used to assess accurately the impact of this new method. Simulation outcomes indicate that the suggested obstacle-aware CE-OLSR technique accurately locates the borderline obstacles present in the terrain. It is observed that ALOE-CE-OLSR obtains route viability, throughput and end-to-end delay similar to CE-OLSR with prior awareness of the obstacle map. However, these protocols are not designed to mitigate mobility, congestion or energy consumption issues.
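The stability view described above can be pictured as filtering candidate links by a distance threshold and by a check that the straight line between two nodes does not cross an obstacle wall; the sketch below uses a standard segment-intersection test and hypothetical coordinates, and is not the CE-OLSR implementation.

```python
def ccw(a, b, c):
    """Signed area test used for segment orientation."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(p1, p2, q1, q2):
    """True if segment p1-p2 strictly crosses segment q1-q2 (collinear cases ignored)."""
    d1, d2 = ccw(q1, q2, p1), ccw(q1, q2, p2)
    d3, d4 = ccw(p1, p2, q1), ccw(p1, p2, q2)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

def stable_links(positions, links, obstacle_walls, max_range):
    """Keep links shorter than max_range that do not cross any obstacle wall."""
    kept = []
    for a, b in links:
        pa, pb = positions[a], positions[b]
        dist = ((pa[0] - pb[0]) ** 2 + (pa[1] - pb[1]) ** 2) ** 0.5
        if dist > max_range:
            continue
        if any(segments_cross(pa, pb, w1, w2) for w1, w2 in obstacle_walls):
            continue
        kept.append((a, b))
    return kept

# Hypothetical positions, links, and one obstacle wall segment
positions = {"a": (0, 0), "b": (100, 0), "c": (0, 80)}
links = [("a", "b"), ("a", "c")]
walls = [((50, -10), (50, 10))]         # a wall between a and b
print(stable_links(positions, links, walls, max_range=150))   # [('a', 'c')]
```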
4 Conclusion
It has been observed from the above review that MANET performance depends mostly on the mobility models and routing protocols used. The early protocols were designed for ideal or plain terrain, where the presence of obstacles such as rivers, mountains or buildings was not taken care of when designing the movement of nodes in the specified terrain. Later, a lot of work was done considering realistic terrain, i.e. in the presence of obstacles. First-generation protocols are normally proactive protocols, in which control overhead is the major problem; this is addressed in second-generation protocol design. Among second-generation protocols, AODV is the most popular and efficient one, but it is not mobility, congestion or energy aware. In modified AODV protocols, a few of these aspects are taken care of, but they are still designed for plain terrain. Hence, more research should be done to make the present protocols aware of mobility, congestion and energy consumption in the presence of obstacles.
References
1. Johnson, D.B., and D.A. Maltz. 1996. Dynamic source routing in Ad Hoc wireless networks.
In Mobile Computing, vol. 353, ed. T. Imielinski and H.F. Korth, 153–181. The Kluwer
International Series in Engineering and Computer Science, US, Springer.
2. Royer, E.M., and C.K. Toh. 1999. A review of current routing protocols for ad hoc mobile
wireless networks. IEEE Personal Communications 6 (2): 46–55.
3. Royer, E.M., P.M. Melliar-Smith, and L.E. Moser. 2001. An analysis of the optimum node
density for ad hoc mobile networks. In Proceedings of IEEE International Conference on
Communications, vol. 3, 857–861, Helsinki, Finland.
4. Chlamtac, I., M. Conti, and J.J.N. Liu. 2003. Mobile ad hoc networking: imperatives and
challenges. Ad Hoc Networks 1 (1): 13–64.
5. Bai, F., and A. Helmy. 2004. Wireless ad hoc and sensor networks. In Chapter 1: a survey of
mobility models. http://nile.usc.edu/helmy/important/Modified-Chapter1-5-30-04.pdf.
6. Camp, T., J. Boleng, and V. Davies. 2002. A survey of mobility models for ad hoc network
research. Wireless Communications and Mobile Computing 2 (5): 483–502.
7. Ariakhajorn, J., P. Wannawilai, and C. Sathiyatwiriyawong. 2006. A comparative study of ran-
dom waypoint and gauss-markov mobility models in the performance evaluation of MANET.
In International Symposium on Communications and Information Technologies, 894–899.
8. Bettstetter, C. 2001. Mobility modeling in wireless networks: categorization, smooth move-
ment, and border effects. ACM SIGMOBILE Mobile Computing and Communications Review
5 (3): 55–66.
9. Bettstetter, C. 2001. Smooth is better than sharp: a random mobility model for simulation
of wireless networks. In Proceedings 4th International Symposium Modeling, Analysis and
Simulation of Wireless and Mobile Systems MSWIM, 19–27. Rome, Italy.
10. Jayakumar, G. and G. Ganapati. 2008. Reference point group mobility and random waypoint
models in performance evaluation of MANET routing protocols. Journal of Computer Systems,
Networks, and Communications 2008: 10. Article ID 860364.
11. Blakely, K., and B. Lowekamp. 2004. A structured group mobility model for the simulation
of mobile ad hoc networks. In International Conference of Mobile Computing Network, Pro-
ceedings of the 2nd International Workshop Mobile Management & Wireless Access Protocol,
111–118. Philadelphia, USA.
12. Musolesi, M., S. Hailes, and C. Mascolo. 2004. An ad hoc mobility model founded on social
network theory. In Proceedings of the 7th ACM Int. Symposium on Modeling, Analysis and
Simulation of Wireless and Mobile Systems, 20–24. Venice, Italy.
13. Musolesi, M., and C. Mascolo. 2006. A community based mobility model for ad hoc network research. In Proceedings of the 2nd ACM/SIGMOBILE International Workshop on Multi-hop Ad Hoc Networks: from Theory to Reality REALMAN'06, Colocated with MobiHoc 2006, 31–38. Florence, Italy.
14. Bittner, S., W.-U. Raffel, and M. Scholz. 2005. The area graph-based mobility model and its
impact on data dissemination. In Proceedings IEEE PerCom, 268–272. Kauai Island, Hawaii,
USA.
15. Hsu, W.-J., K. Merchant, H.-W. Shu, C.-H. Hsu, and A. Helmy. 2005. Weighted waypoint
mobility model and its impact on ad hoc networks. ACM SIGMOBILE Mobile Computing and
Communications Review 9 (1): 59–63.
16. Jardosh, A., E.M. Belding-Royer, K.C. Almeroth, and S. Suri. 2003. Towards realistic mobility
models for mobile ad hoc networks. In Proceedings IEEE MobiCom, 217–229. San Diego,
USA.
17. Jardosh, A.P., E.M. Belding-Royer, K.C. Almeroth, and S. Suri. 2005. Real-world environment
models for mobile network evaluation. IEEE Journal on Selected Areas in Communications
23(3): 622–632.
18. Bettstetter, C., and C. Wagner. 2002. The spatial node distribution of the random way-
point mobility model. In Proceedings of the 1st German Workshop Mobile Ad-Hoc Network
WMAN’02, 41–58. Ulm, Germany.
19. Gunes, M., and J. Siekermann. 2005. CosMos—communication scenario and mobility scenario
generator for mobile Ad-Hoc networks. In Proceedings of the 2nd International Workshop
MANETs Interoper. Iss. MANETII’05. Las Vegas, USA.
20. Kraaier, J., and U. Killat. 2005. The random waypoint city model—user distribution in a
street-based mobility model for wireless network simulations. In Proceedings of the 3rd ACM
International Workshop on Wireless Mobile Applications and Services on WLAN hotspots,
100–103. Cologne, Germany.
21. Choffnes, D.R., and F.E. Bustamante. 2005. An integrated mobility and traffic model for vehicular
wireless networks. In International Conference on Mobile Computing Network, Proceedings
of the 2nd ACM International Workshop on Vehicular Ad Hoc Networks, 69–78. Cologne,
Germany.
22. Kumar, C., B. Bhushan, and S. Gupta. 2012. Evaluation of MANET performance in presence
of obstacles. International Journal of Ad hoc, Sensor & Ubiquitous Computing (IJASUC) 3(3).
23. Amjad, K., M. Ali, S. Jabbar, M. Hussain, S. Rho, and M. Kim. 2015. Impact of dynamic path
loss models in an urban obstacle aware ad hoc network environment. Journal of Sensors 2015:
8. Article ID 286270.
24. Perkins, C., and P. Bhagwat. 1994. Highly dynamic destination-sequenced distance-vector rout-
ing (DSDV) for mobile computers. In Proceedings of Sigcomm Conference on Communications
Architectures, Protocols and Applications, 234–244. London, England, UK.
25. Johnson, D.B., and D.A. Maltz. 1996. Dynamic source routing in ad hoc wireless networks.
In Mobile computing, vol. 5, ed. T. Imielinski and H. Korth, 153–181. Kluwer Academic
Publishers.
26. Perkins, C.E., and E.M. Royer. 1997. Ad-Hoc on-demand distance vector routing. In Pro-
ceedings of the 2nd IEEE Workshop on Mobile Computing Systems and Applications,
1–11.
27. Alotaibi, E., and B. Mukherjee. 2011. A survey on routing algorithms for wireless Ad-
Hoc and mesh networks. Computer Networks the International Journal of Computer and
Telecommunications Networking 56 (2): 940–965.
28. Divecha, B., A. Abraham, C. Grosan, and S. Sanyal. 2007. Analysis of dynamic source routing
and destination-sequenced distance-vector protocols for different mobility models. In Pro-
ceedings of First Asia International Conference on Modelling & Simulation, 224–229. Phuket,
Thailand.
29. Morshed, M., H. Rahman, R.R. Mazumder, and K.A.M. Lutfullah. 2009. Simulation and anal-
ysis of Ad-Hoc on-demand distance vector routing protocol. In Proceedings of ICIS, 610–614.
Seoul, Korea.
30. Huang, J., X. Fan, X. Xiang, M. Wan, Z. Zhuo, and Y. Yang. 2016. A clustering routing
protocol for mobile ad hoc networks. Mathematical Problems in Engineering 2016: 10. Article
ID 5395894.
31. Perkins, C.E., and E.M. Royer. 2001. Multicast operation of the Ad-Hoc on-demand distance
vector routing protocol. In Proceedings of 5th Annual ACM/IEEE International Conference on
MOBILE Computing and Networking, 207–218. Seattle, Washington, USA.
32. Mobaideen, W.A., H.M. Mimi, F.A. Masoud, and E. Qaddoura. 2007. Performance evaluation
of multicast Ad-Hoc on-demand distance vector protocol. Computer Communications 30 (9):
1931–1941.
33. Malarkodi, B., P. Gopal, and B. Venkataramani. 2009. Performance evaluation of Ad-Hoc
networks with different multicast routing protocols and mobility models. In Proceedings of
2009 International Conference on Advances in Recent Technologies in Communication and
Computing IEEE. 81–84. India.
34. Rashid, U., O. Waqar, and A.K. Kiani. 2017. Mobility and energy aware routing algorithm for
mobile Ad-Hoc networks. In International Conference on Electrical Engineering (ICEE), 1–5.
35. Sarkar, S., and R. Dutta. 2017. Mobility-aware route selection technique for mobile ad hoc
networks. IET Wireless Sensor Systems 3: 55–64.
36. Belghith, A., and M. Belhassen. 2012. CE-OLSR: a cartography and stability enhanced OLSR for dynamic MANETs with obstacles. KSII Transactions on Internet and Information Systems 6 (1).
37. Belhassen, M., A. Dhraief, A. Belghith, and H. Mathkour. 2018. ALOE: autonomic locating of obstructing entities in MANETs. Journal of Ambient Intelligence and Humanized Computing 1 (13).
Author Index
M
Mahajan, Manas, 235
Mahat, Maheep, 95
Malekar, Mrunal, 137
Mali, Suresh, 501
Marar, Shreerag, 297
Mehra, Rahul, 61, 287
Mehta, Bhavya, 61, 287
Mekhe, Yash, 95
Mirani, Param, 305
Mishra, Sanjeevani Subhadra, 173
Mishra, Sashikala, 255
Mishra, Shubham Kr., 81
Mohanty, Surajit, 105
Moharil, Ambarish, 137
Motwani, Akshay, 277
Motwani, Nikhil, 277
N
Naik, Pandurang, 113
Naik, Sourabh, 137
Narsapur, Rajat Rajesh, 493
Naskar, Subham, 367
Nath, Mahendra Prasad, 183
Navghare, Nilesh, 235
Nayak, Biswojit, 485
Nayak, Suvendu Kumar, 203
Neware, Nisha, 51
O
Obaid, Abdulrahman Mohammed Hussein, 425
Ojha, Ananta Charan, 203
P
Padwal, Shrinit, 255
Paikaray, Bijay Ku., 245, 465, 477
Panda, Banoj Kumar, 523
Pani, Santosh Kumar, 425
Pathak, Vikas B., 265
Patil, Aseem, 317
Patnaik, Prasant Kumar, 1
Pattanaik, Anshuman, 173
R
Rajgure, Sumit, 95
Rewale, Rahul, 193
Rizvi, Naela, 449
Rout, Raghunath, 403
Roy, Sharmistha, 449
S
Sagnika, Santwana, 149, 183
Sahoo, Monalisa, 213
Sahoo, Sony Snigdha, 213
Samal, Abhaya Kumar, 105, 403
Samantara, Tumbanath, 439
Samantraj, Sambhrant, 1
Sangamkar, Prasad, 235
Santhappan, Jayasri, 39, 223
Sengupta, Sharmila, 27
Shadi, Fagun, 137
Shah, Meet K., 493
Sheikh, Faisal, 297
Shrivastava, Akanksha, 355
Singhar, Sunil Samanta, 213
Singh, Jay Prakash, 161
Singh, Puran, 51
Sinha, Sheona, 81
Sonavane, Nikhil, 137
Sridharan, Aadityan, 343
Swain, Debabala, 245, 465, 477
Swain, Debabrata, 39, 51, 193, 277, 297
Swain, Monalisa, 477
U
Unavane, Aditya, 331
Urkude, Priya, 413
V
Valecha, Shivani, 27
Varade, Rahul S., 265
Vatwani, Nikhil, 27
Veenadhari, S., 69
Vidhate, Ashok D., 383
Vijan, Vinit, 277
Vijayabaskar, V., 393
W
Wadhwani, Priyanka, 27
Y
Yadav, Rahul, 113