Giduthuri - Sateesh - Babu - PHD - Thesis
A thesis submitted to the
School of Computer Engineering
Nanyang Technological University
by
Giduthuri Sateesh Babu
2014
Acknowledgements
I would like to express my deepest gratitude to my supervisor Dr. Suresh Sundaram for
his intelligent guidance and patient nurturing over the past years. I have learned so much
from his ways of critical thinking, and his analytic insights into the problems helped
greatly in the accomplishment of this research work. I am proud to have such a great
mentor on my way towards research, and he has made my stay at Nanyang Technological
University a truly rewarding experience.
I want to thank Dr. R. Savitha and Dr. B. S. Mahanand for their numerous helpful
comments and enlightening discussions throughout my research course. Their time and
effort are deeply appreciated.
I would also like to dedicate special thanks to my family and friends, especially
my parents, who have always been there for me. This research work would not have been
possible without their constant support and encouragement.
Thanks are also dedicated to the Center for Computational Intelligence for the research
facilities. I also acknowledge Nanyang Technological University for the financial support
and this precious opportunity of study. Finally, I pay my tributes to the God Almighty
for all His blessings.
Contents
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3 An Overview on Meta-cognition 22
3.1 Definitions of Important Concepts in Meta-cognition . . . . . . . . . 22
3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4 Meta-cognitive Radial Basis Function Network and Its EKF Based Se-
quential Learning Algorithm for Classification Problems 26
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
6.3.1 Statistical Significance Test . . . . . . . . . . . . . . . . . . . . . 61
6.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.2.1 Materials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Detection of AD . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.3.2 Imaging Biomarkers for AD Based on Age in OASIS Data Set . . 107
7.3.3 Imaging Biomarkers for AD Based on Gender in OASIS Data Set . 110
8 Parkinson's Disease Diagnosis using PBL-McRBFN Classifier 116
8.1 Literature Review on Parkinson's Disease . . . . . . . . . . . . . . . . . . 117
8.3 Early Diagnosis of Parkinson's Disease Based on Gene Expression Features 121
Publications 144
References 146
Abstract
This research work focuses on the development of meta-cognitive sequential learning al-
gorithms in Radial Basis Function (RBF) network classifiers, and their application to
the early diagnosis of neurodegenerative diseases. The important issues in existing se-
quential learning algorithms are the proper selection of training samples and finding a
minimal network architecture; moreover, the random sequence of sample arrival influences
the performance significantly.
reported in human learning that best learning strategies employ meta-cognition (meta-
human learning. Accordingly, McRBFN has two components, namely cognitive and
suitable learning strategies for each sample. When a new sample is presented, the meta-
cognitive component either deletes the sample or learns the sample or reserves the sample
for future use. Learning includes adding a new neuron or updating the parameters of the
existing neurons using an extended Kalman filter (EKF). The McRBFN using EKF for
parameter updates is referred to as EKF-McRBFN. EKF-McRBFN uses the computationally
intensive EKF based parameter update and does not utilize the past knowledge stored
in the network. Therefore, an efficient Projection Based Learning (PBL) algorithm for
McRBFN, referred to as PBL-McRBFN, has been developed. When a neuron is added
to the cognitive component, the Gaussian parameters
are determined based on the current sample and the output weights are estimated using
the PBL algorithm. When a new neuron is added, the existing neurons in the cognitive
component contribute to the output weight estimation, thereby utilizing the past knowledge
stored in the network. Performance evaluation results on multiple data sets clearly
indicate the superior performance of the proposed PBL-McRBFN and EKF-McRBFN
over existing popular classifiers. Experimental results also
highlight the advantages of self-regulated learning. The early diagnosis of AD from
Magnetic Resonance Imaging (MRI) scans is addressed next. The PBL-McRBFN
classifier has been evaluated on two well-known open access data sets, the Open Access Series
of Imaging Studies (OASIS) and the Alzheimer's disease Neuroimaging Initiative (ADNI)
data sets.
Morphometric features are extracted from the MRI scans using Voxel-Based Morphometry
(VBM). The study results clearly show that the PBL-McRBFN classifier produces a better
generalization performance than existing approaches. A generalization study conducted on
the ADNI data set with the PBL-McRBFN classifier trained on the OASIS data set shows
that the proposed PBL-McRBFN can also achieve significant results on the unseen data
set. Finally, the PBL-McRBFN-RFE feature selection approach has been proposed to
detect imaging biomarkers responsible for AD for different age groups and genders.
Next, the PBL-McRBFN classifier is used to predict PD from MRI scans. Further,
imaging biomarkers responsible for PD are detected with the proposed PBL-McRBFN-RFE
approach based on MRI scans. For completeness, the PBL-McRBFN classifier is also used
to detect PD from vocal and gait features. From the performance evaluation study, it
is evident that the proposed classifier is a promising tool for the early diagnosis of PD.
List of Figures
4.1 (a) Nelson and Narens model of meta-cognition (b) EKF-McRBFN model 27
overlapping conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.2 Class-wise significance (a), and instantaneous hinge error with self-regulatory
6.3 History of number of hidden neurons (a), self-regulated addition (b), and
7.2 Schematic diagram of the stages in feature extraction based on the VBM
analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
MRI of an AD patient (from right: sagittal view, coronal view and axial
view) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
7.4 Maximum intensity projections from OASIS data set - Normal persons vs.
AD patients (a) sagittal view (b) coronal view (c) axial view . . . . . . . 93
7.5 Maximum intensity projections from ADNI data set - Normal persons vs.
AD patients (a) sagittal view (b) coronal view (c) axial view . . . . . . . 93
7.6 Gray matter volume change from OASIS data set - Normal persons vs.
AD patients (a) sagittal view (b) coronal view (c) axial view . . . . . . . 94
7.7 Gray matter volume change from ADNI data set - Normal persons vs. AD
patients (a) sagittal view (b) coronal view (c) axial view . . . . . . . . . 95
7.9 Comparison of gray matter volume change - Normal persons vs. AD pa-
7.10 Comparison of gray matter volume change - Normal persons vs. AD pa-
tients from 60-69 (a&b), 70-79 (c&d), 80-Above (e&f) age groups in OASIS
7.11 Comparison of gray matter volume change - Normal persons vs. AD pa-
7.12 Comparison of gray matter volume change - Normal persons vs. AD pa-
8.2 Schematic diagram of the stages in feature extraction based on the VBM
analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
8.3 Maximum intensity projections from PPMI MRI data set - Normal persons
vs. PD patients (a) sagittal view (b) coronal view (c) axial view . . . . . 128
8.4 Gray matter volume change from PPMI MRI data set - Normal persons
vs. PD patients (a) sagittal view (b) coronal view (c) axial view . . . . . 129
8.5 Comparison of gray matter volume change - Normal persons vs. PD pa-
List of Tables
confidence level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
confidence level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.11 Two-tailed critical values (t-distribution) for the Dunnett test at 95% confidence level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
7.4 Performance comparison with existing results on the OASIS data set . . 97
7.6 Performance comparison with existing results on the ADNI data set . . . 100
samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8.1 Demographic information of PPMI MRI data used in our study . . . . . 120
8.3 Performance comparison on selected gene expression data set with p-value
< 0.05 from an average of 10 trials . . . . . . . . . . . . . . . . . . . . . 124
8.4 Performance comparison on selected gene expression data set with p-value
< 0.01 from an average of 10 trials . . . . . . . . . . . . . . . . . . . . . 125
8.5 Performance comparison on 2981 VBM features data set from an average
of 10 trials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
8.6 Performance comparison on ICA reduced features data sets from an aver-
8.7 VBM detected and PBL-McRBFN-RFE selected regions responsible for PD . . 131
8.9 Performance comparison on vocal data set from an average of 10 trials . 133
8.11 Performance comparison on gait data set from an average of 10 trials . . 135
List of Abbreviations
RBF Radial Basis Function
SNN Spiking Neural Networks
AD Alzheimer's disease
PD Parkinson's disease
MCI Mild Cognitive Impairment
LD Liver Disorders
BC Breast Cancer
ION Ionosphere
IS Image segmentation
VC Vehicle Classication
GI Glass Identication
CT Computed Tomography
Chapter 1
Introduction
1.1 Motivation
Over the past decade, a number of supervised learning algorithms have been developed
for pattern classification applications. Artificial neural networks have been widely used in
the fields of pattern classification, and they show their advantages over other methods in
their learning, generalization and adaptation capability, as well as their unique power for
nonlinear function approximation. In many practical applications, however,
the complete training data describing the input-output relationship is not available a
priori. For these problems, classical batch-learning algorithms are rather infeasible, and
sequential learning algorithms are preferred.
In a sequential learning framework, the training samples arrive one-by-one and the
samples are discarded after the learning process. Hence, it requires less memory and com-
putational time during the learning process. In addition, sequential learning algorithms
automatically determine the architecture that can accurately approximate the true de-
cision function described by a stream of training samples. Samples from a data stream need
not follow the same static underlying distribution. Traditionally, all sequential learning
algorithms use all the training samples for learning and do not regulate the learning pro-
cess. This inspires us to study human learning and develop a learning algorithm which
self-regulates its learning.
In the families of artificial neural networks, Radial Basis Function (RBF) neural networks
have been extensively used in a sequential learning framework due to their universal
approximation ability and simplicity of architecture. Hence, in this thesis, we consider the
RBF network for classification problems. Many sequential learning algorithms in the RBF
framework are available in the literature to solve classification problems, and a detailed
review of them is presented in Chapter 2.
On the other hand, educational psychologists have studied human learning for years
and suggested that the learning process is effective when the learners adopt self-regulation
in the learning process using meta-cognition. Cognition refers to
symbolic mental activities and mental representations that include attention, memory,
producing and understanding language, learning, reasoning, problem solving, and deci-
sion making. Meta-cognition means 'cognition about cognition'. The term meta-cognition
was first coined by Flavell [5]. He defined meta-cognition as the thoughts about one's
own thought processes and cognitions. Precisely, the learner should control the learning
process by planning and selecting learning strategies, and monitor the progress by an-
alyzing the effectiveness of the proposed learning strategies [6]. When necessary, these
strategies should be adapted appropriately. Meta-cognition thus pro-
vides a means to address what-to-learn, when-to-learn and how-to-learn, i.e., the ability
to identify the specific piece of required knowledge, and to judge when to start and stop learning
by emphasizing the best learning strategy.
There are several meta-cognition models available in human psychology and a brief
survey of various meta-cognition models is reported in [7]. Among the various models,
the model proposed by Nelson and Narens in [8] is simple and clearly highlights the
various actions in human meta-cognition.

[Figure: Nelson and Narens model of meta-cognition, showing the meta-cognitive and cognitive components connected by monitoring and control flows of information.]
The Nelson and Narens [8] model has two components, the cognitive component and the
meta-cognitive component. The information flow from the cognitive component to the meta-
cognitive component is considered monitoring, while the information flow in the reverse
direction is considered control. The basic notion underlying control is that the meta-
cognitive component modifies the cognitive component based on the monitoring signals. The in-
formation flowing from the meta-cognitive component to the cognitive component either
changes the state of the cognitive component or changes the cognitive component itself.
Monitoring informs the meta-cognitive component about the state of the cognitive com-
ponent, thus continuously updating the meta-cognitive component's model of the cognitive state.
In neural networks, the current state-of-the-art algorithms address the pure cognitive aspect
of human learning inspired from the human brain; the concept of self-regulated learning using
meta-cognition has not been fully exploited. In other words, the state-of-
the-art neural network algorithms address only the how-to-learn component of human
learning. In the neural networks field, a few algorithms have been developed which
address some of the meta-cognitive learning aspects [9, 10]. One of the first works in
this direction is the Self-adaptive Resource Allocation Network
(SRAN) [9]. SRAN is a sequential learning algorithm and it also addresses the what-to-
learn component of meta-cognition by selecting significant samples using misclassification
error and hinge loss error. The complex-valued version of the above algorithm is the Complex-valued
Self-regulating Resource Allocation Network (CSRAN) [10]. It has been shown in [9, 10]
that selecting significant samples and removing repetitive samples in learning helps to
improve the generalization performance. Addressing all
three components of meta-cognition with suitable learning strategies would further improve the
generalization ability of a neural network. The drawbacks in the above algorithms are:

• The selection of significant samples from the stream of training data is based on a simple
error criterion, which is not sufficient to address the significance of the samples.

• The allocation of the new hidden neuron center is done without considering the amount of
overlap with already existing neuron centers, which may lead to misclassification.
• Knowledge gained from past trained samples is not utilized in further learning.
• These algorithms use computationally intensive Extended Kalman Filter (EKF) for
parameter update.
An efficient learner must select appropriate samples for learning and adopt the best learning strategy to learn them accu-
rately. This thesis shall deal with the development of meta-cognitive sequential learning
algorithms for RBF network classifiers which overcome the above drawbacks. Also, the
interaction between the meta-cognitive and cognitive components and their influence on RBF
network learning will be dealt with accordingly. We evaluate the performance of the de-
veloped classifiers and compare them with existing classifiers. Also, we use the proposed classifier
for the early diagnosis of neurodegenerative diseases. Neurodegenerative diseases are
considered as a group of diseases that seriously and progressively impair the functions
of the nervous system through selective neuronal vulnerability of specific brain regions.
These diseases develop gradually, and most of them have no cure. The goal of treatment for such diseases is usually to
improve symptoms, relieve pain and increase mobility. In the modern world, management and
early diagnosis of these diseases is critical, and the associated clinical data sets are growing. Also, the sample imbalance in the classes of
these data sets makes it difficult to classify them. Alzheimer's disease (AD) is the most common neurodegenerative disease and is
one of the leading causes of death worldwide, with the associated estimated cost of care
exceeding $200 billion annually [11]. It is estimated that there are more than 5.4 million
Americans and about 30 million people worldwide suffering from AD, with the number
expected to increase dramatically as the global population ages. Today, AD remains the
largest unmet medical need in neurology, with the disease expected to afflict 100 million
by 2050.
Parkinson's disease (PD) is the second most common neurodegenerative disease, after
AD. It is estimated that there are more than 1 million Americans and about 10 million
people worldwide living with PD. Incidence of PD increases with age, but an estimated
four percent of people with PD are diagnosed before the age of 50 [12].
Early diagnosis of these diseases using Magnetic Resonance Imaging (MRI) will help to slow down their progress. MRI is the most important brain
imaging procedure that provides accurate information about the shape and volume of the
brain. MRI accurately monitors and identifies the tissue volume changes in all anatomical
regions of the brain. MRI helps to detect AD at an early stage, before irreversible
damage has been done [13]. With the increased number of elderly people, there will
be many more cases of AD and PD, and the databases of clinical studies of these diseases are also
growing. Also, identifying the most relevant and meaningful imaging biomarkers for
these diseases is important. Hence,
there is a need to develop algorithms which can handle these growing medical data sets
and detect the diseases at the early stages. This thesis shall also deal with handling these growing medical data
sets for the detection of AD and PD, and the identification of imaging biomarkers responsible
for AD and PD, which is a challenging task for the machine learning community.
1.2 Objectives
The main aim of this research work is to develop a generic framework of human meta-
cognitive learning mechanism in RBF network architecture. The research discusses and
evaluates the inter-relationship between the meta-cognitive knowledge, control and monitoring
signals such that the learning in the RBF network is efficient, which in turn improves the
performance of the RBF classifier significantly. The main research objectives could be summarized
as follows:
• To develop a meta-cognitive sequential learning algorithm in the RBF network framework that can
handle data one-by-one and only once, addressing the monitoring and control aspects of
meta-cognition. The requirements of such a sequential learning algorithm are fast and efficient learning, and it must use past knowledge
properly for self-regulation of learning. The RBF network is chosen due to the localiza-
tion property of its Gaussian function and its wide use in classification problems. The
algorithm must also handle growing data sets of neurodegenerative diseases, particu-
larly AD and PD. To handle the growing data sets, it must process the samples in
sequential mode with less computational effort, and the network architecture must
also evolve automatically.
• Early diagnosis of AD plays a major role in providing treatment that may slow down its progress. MRI is
the most important brain imaging procedure that provides accurate information of
the brain with high spatial resolution and can detect minute abnormalities. Hence,
early diagnosis of AD using MRI scans is another objective of this research work.
This objective is two-fold: first is AD classification using MRI scans with an accurate
prediction rate, and second is the selection of relevant imaging biomarkers for AD using
MRI scans. A wrapper based feature selection method, which depends on the
performance of the classifier, is used for the biomarker selection. In the medical
literature, it is reported that gender and age may be important modifying factors in AD.
• Existing PD diagnosis methods have been reported using vocal and gait features. Research studies on PD
discovered that early diagnosis of PD using vocal and gait features is impossible, be-
cause tremor and slow movements develop in PD patients only after approximately
70% of vulnerable dopaminergic neurons in the substantia nigra have already died.
Recent studies on gene expression analysis found that there is a profound change
in gene expression for individuals affected by PD. Also, early diagnosis of PD using
MRI has received little attention, and machine learning techniques have not been employed for early diagnosis of PD using gene
expression and MRI features. Hence, the other objectives of this research work
are the early diagnosis of PD based on gene expression features and the early diagnosis of PD based on
MRI scans.
This thesis develops meta-cognitive sequential learning algorithms and applies them to the early diagnosis
of neurodegenerative diseases. The major contributions of this thesis are categorized into two
parts, (I) the algorithm part and (II) the application part. First, we highlight the contributions in
the algorithm part. We propose a meta-cognitive radial basis function network based on the Nelson and Narens model. If an RBF
network analyzes its cognitive process and chooses suitable learning strategies adaptively
to improve its cognitive process, then it is referred to as a 'Meta-cognitive Radial Basis
Function Network' (McRBFN). McRBFN has two components, namely the cognitive
component and the meta-cognitive component. An RBF network with an evolving structure
is the fundamental building block of the cognitive component. The meta-cognitive com-
ponent devises sample deletion, neuron growth, parameter update and sample reserve
strategies, which directly address the basic principles of self-regulated human learning.
The meta-cognitive component controls the sequential learning process by selecting one of the above learning strategies for each
new training sample. The strategies are also adapted to accommodate coarse knowledge
first, followed by fine tuning. The sample delete strategy removes redundant samples to avoid
over-training.
(1) EKF Based Meta-cognitive Radial Basis Function Network: An Extended Kalman
Filter (EKF) based sequential learning algorithm has been proposed for McRBFN,
referred to as EKF-McRBFN. Its salient features are:
• Hinge loss error function is used for better estimation of the posterior probabil-
ity.
• Overlapping criteria are introduced to initialize the new hidden neuron param-
eters.
• The Projection Based Learning (PBL) algorithm works on the minimization of the hinge error function and finds the optimal network output parameters
utilizing the past knowledge stored in the network.
Neurodegenerative diseases take an enormous toll on affected patients and their families. In the application part, the proposed
PBL-McRBFN classifier is used in the early diagnosis of AD based on MRI scans. Also,
the PBL-McRBFN classifier combined with Recursive Feature Elimination (PBL-McRBFN-RFE) is applied to the imaging biomarker identification
problem. For this, morphometric features are extracted from MRI scans using
Voxel-Based Morphometry (VBM). VBM is one of the widely used, fully automated
techniques based on the Statistical Parametric Mapping (SPM) method, often employed for the investigation of tissue volume
changes between the brain MRI scans of the diseased group versus the normal
persons. We have used VBM analysis to identify the gray matter tissue probabilities, which serve as features. The contributions of this study
are two-fold. First, the imaging biomarkers for AD detected on the complete OASIS data set include the parahip-
pocampal gyrus, the hippocampus, the superior temporal gyrus, the insula, the
precentral gyrus and the extra-nuclear regions. Next, the PBL-McRBFN-RFE has
also been used to identify imaging biomarkers for AD from the OASIS gender-wise
and age-wise analysis. In the medical literature [15, 16], it is reported that age and
gender may be important modifying factors in AD; hence, we study the imaging biomarkers for AD based on age
and gender.
The results from the imaging biomarkers detection analysis based on age are:
• In the 60-69 age group AD patients, gray matter atrophy is observed in the
• In the 70-79 age group AD patients, gray matter atrophy is observed in the
• In the 80-89 age group AD patients, gray matter atrophy is observed in the
The results from the imaging biomarkers detection analysis based on gender are:
Gray matter atrophy is observed in the parahippocampal gyrus and the extra-nuclear regions, which are responsible for memory.
These features are obtained from MRI scans using VBM analysis. We also used PBL-McRBFN-RFE to detect imaging biomarkers responsible for PD based on
MRI scans. The superior temporal gyrus brain region detected by PBL-McRBFN-RFE has also been reported in PD studies in the literature.
In here, sequential learning algorithms are classified based on the framework: er-
ror driven, neuron significance, extreme learning machine, spiking neural networks,
kernel least mean square, and sequential classification algorithms. A comparative summary of these
algorithms is presented.
In the proposed McRBFN, the sample delete strategy addresses the what-to-learn by deleting insignificant samples from the data stream; the neu-
ron growth strategy and parameters update strategy address the how-to-learn by
which the cognitive component learns from the samples; and the self-adaptive nature
of the thresholds addresses the when-to-learn. The PBL-
McRBFN classifier uses the computationally less intensive PBL algorithm. The PBL al-
gorithm in PBL-McRBFN allows the network to `reuse' the knowledge gained from the past
samples in new hidden neuron parameters initialization and output weights esti-
mations.
[17], batch ELM [18] and standard Support Vector Machine (SVM); a quantitative
performance comparison with these classifiers is presented.
• Next, the early diagnosis of AD using the PBL-McRBFN classifier
from MRI scans is presented. In here, morphometric features are extracted from MRI scans us-
ing VBM. The performance of the PBL-McRBFN classifier has been evaluated on
the two well-known open access OASIS and ADNI data sets, and has been compared
with other state-of-the-art methods in the literature on the AD diagnosis problem
using these data sets. Next, we have demonstrated the generalization ability of the
PBL-McRBFN classifier trained with the OASIS data set and tested on the unseen ADNI data set. Finally, we
have used the PBL-McRBFN-RFE approach to select imaging biomarkers
for AD on the complete, different age group and gender OASIS data sets.
• Next, Parkinson's disease diagnosis studies using the PBL-McRBFN classifier with microarray gene expression, MRI, vocal and gait features
are presented. The obtained complete microarray gene expression data set consists
of the expression information of 22283 genes from each subject. Since the complete gene ex-
pression data set contains a large number of redundant genes, we also conducted studies on genes selected by
p-value at values less than 0.05/0.01. The quantitative performance com-
parison with existing classifiers on these data sets is presented.
• Chapter 9 summarizes the conclusions and provides directions for future research
work.
Chapter 2
Literature Review on Sequential
Learning Algorithms in Neural
Networks
In this chapter, key concepts in sequential learning algorithms for neural networks are
presented; these concepts are used in Chapters 4, 5, 6,
7 and 8. The review briefly discusses the different types of sequential learning models in neural networks.
In a sequential learning framework, the training samples arrive one-by-one and the
samples are discarded after the learning process. Hence, it requires less memory and com-
putational time during the learning process. In addition, sequential learning algorithms
automatically determine the minimal architecture that can accurately approximate the
true decision function described by the stream of training samples. In the families of artificial neural
networks, Radial Basis Function (RBF) neural networks have been extensively used in a
sequential learning framework due to their universal approximation ability and simplicity
of architecture. Hence, in this thesis, we consider the radial basis function neural network
for classification problems. Many sequential learning algorithms in the radial basis function
framework are available in the literature to solve classification problems and, depending
on the training method and the structure of the network, these learning algorithms can
be broadly classified as belonging to one of the following: error driven algorithms, neuron sig-
nificance based algorithms, extreme learning machine based algorithms, spiking neural
network algorithms, kernel least mean square based algorithms, and sequential classifica-
tion algorithms. These are discussed in detail next, along with incremental-decremental
SVM algorithms.
The Resource Allocation Network (RAN) was one of the first sequential learning algorithms intro-
duced in the literature. RAN evolves the network architecture required to approximate
the true function using a novelty based neuron growth criterion. In RAN, the novelty of
a sample is determined based on the error and the distance to the nearest neuron. If the
novelty criterion is satisfied, then a new hidden neuron is added to the network; otherwise,
the network parameters are updated using the Least Mean Squares (LMS) algorithm.
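To make the novelty test concrete, here is a minimal Python sketch of a RAN-style growth check; it is an illustration under assumptions, not the exact published formulation, and the threshold names `error_thresh` and `dist_thresh` are hypothetical:

```python
import numpy as np

def ran_is_novel(x, error, centers, error_thresh=0.1, dist_thresh=0.5):
    """RAN-style novelty test: a sample triggers neuron growth only if
    both the prediction error and the distance to the nearest existing
    hidden-neuron center exceed their thresholds."""
    if len(centers) == 0:
        return True          # empty network: the first sample is novel
    nearest = min(np.linalg.norm(x - mu) for mu in centers)
    return abs(error) > error_thresh and nearest > dist_thresh
```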
An improved version of RAN using the extended Kalman filter, RANEKF,
has been proposed in [20]. In RANEKF, the extended Kalman filter (EKF) is used rather
than the LMS algorithm to update the network parameters, resulting in more compact networks and better accuracy.
it can never be removed. Thus, both RAN and RANEKF could produce networks in
which some hidden neurons, although active initially, may subsequently end up con-
tributing little to the network output [21]. To overcome the above drawback, Minimal
Resource Allocation Network (MRAN) algorithm has been proposed in [21]. In MRAN,
the RANEKF is augmented with pruning strategy. The pruning strategy in MRAN re-
moves those hidden neurons that consistently make little contribution to the network
output. MRAN uses a sliding window of training samples in the growing and pruning
criteria to identify the hidden neurons that contribute relatively little to the network
output. Selection of the appropriate sizes for these windows critically depends on the
input data distribution. Since MRAN updates all network parameters, the EKF requires storage of a huge covariance
matrix and its inverse. Hence, the computational effort and memory requirements in the
training phase are quite high. To overcome the above weakness for real-time implemen-
tation, an algorithm has been proposed called the Extended Minimal Resource Allocation
Network (EMRAN). EMRAN adopts a winner-neuron strategy, where the winner neuron is defined as
that hidden neuron in the network which is closest to the training sample. The main
contribution of the EMRAN algorithm is that, in every step, only those parameters that
are related to the winner neuron are updated by the EKF algorithm.
An algorithm based on the significance of neurons has been proposed for
growing and pruning the RBF network architecture, and is called Growing And Pruning
RBFN (GAP-RBFN). A new hidden neuron is added if its significance exceeds a threshold
value, whereas existing hidden neurons are pruned if their significance is less than a certain
threshold. The significance of a neuron is defined as the
contribution made by that neuron to the network output averaged over all the training
samples received so far, and this requires the knowledge of the input data distribution.
GAP-RBFN uses a piecewise-linear approximation of
the Gaussian function to reduce the computational effort. Similar to EMRAN, in GAP-
RBFN also only the nearest hidden neuron parameters are updated by the EKF algorithm.
GAP-RBFN assumes that the training samples are uniformly distributed. If the training sample distribution
is not uniform, its performance degrades; hence, a Generalized GAP-RBFN (GGAP-RBFN) algorithm has been
presented in [24]. The GGAP-RBFN algorithm can be used for any arbitrary input
distribution.
The performance of the GAP-RBFN algorithm for classification problems has been evalu-
ated in [25], and improvements to GAP-RBFN for enhancing its performance in both accuracy
and training speed for classification problems have been presented in Fast GAP-RBFN.
In Fast GAP-RBFN, a Decoupled EKF (DEKF) is
used, whereas EKF is used in the MRAN and GAP-RBFN algorithms for network parame-
ter updates. EKF requires more computational effort and large memory for large input
dimensional problems. On the other hand, DEKF only considers the pairwise interdepen-
dence of the parameters from the same decoupled group, rather than the interdependence
of all the parameters in the network. When the number of hidden neurons becomes large,
DEKF results in a significant reduction in computational cost per training sample and
in memory requirements. Extreme Learning Machine (ELM) is a different learning
paradigm. It is a batch learning algorithm for a single-hidden layer feed forward neural
network. ELM randomly chooses input weights and analytically determines the output
weights using minimum norm least-squares. A complete survey of research works in the ELM
framework is presented in [27]. A sequential version of ELM using recursive least squares
has been presented in [28], referred to as the On-line Sequential Extreme Learning Machine
(OS-ELM), which can learn the samples one-by-one or chunk-by-chunk with fixed or varying
chunk size. In the OS-ELM algorithm, the input weights are selected randomly and the
output weights are calculated analytically using the least square error. For sequential
learning, the output weights are updated using recursive least squares. In the output weight
calculations, OS-ELM uses a small chunk of initial training data. This small chunk of
initial training data affects the training performance of OS-ELM. In case of sparse and
imbalanced data sets, the random selection of input weights with a fixed number of hidden
neurons affects the performance of OS-ELM significantly.
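The recursive least squares update underlying OS-ELM can be sketched as follows. This is a minimal single-sample illustration with assumed variable names and a sigmoid hidden layer, not the published implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
m, L, n = 4, 20, 3                      # input dim, hidden neurons, outputs
W = rng.standard_normal((m, L))         # random input weights (fixed)
b = rng.standard_normal(L)              # random biases (fixed)

def hidden(x):
    return 1.0 / (1.0 + np.exp(-(x @ W + b)))   # sigmoid hidden layer

# Initialization phase: small batch X0, T0 solved by least squares
X0, T0 = rng.standard_normal((30, m)), rng.standard_normal((30, n))
H0 = hidden(X0)
P = np.linalg.inv(H0.T @ H0 + 1e-6 * np.eye(L))  # small ridge for stability
beta = P @ H0.T @ T0                             # output weights

# Sequential phase: recursive least squares, one sample at a time
def oselm_update(x, t, P, beta):
    h = hidden(x).reshape(-1, 1)                 # (L, 1) hidden response
    P = P - (P @ h @ h.T @ P) / (1.0 + h.T @ P @ h)
    beta = beta + P @ h @ (t.reshape(1, -1) - h.T @ beta)
    return P, beta

P, beta = oselm_update(rng.standard_normal(m), rng.standard_normal(n), P, beta)
```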
Incremental versions of ELM have been proposed in [30, 31, 32]. In [30], Incremental
ELM (I-ELM) algorithm is presented. In I-ELM, every time only one hidden neuron
is randomly generated and added to the existing network. However, I-ELM does not
recalculate the output weights of all the existing hidden neurons when a new hidden
neuron is added. To improve the convergence of I-ELM, Convex Incremental ELM (CI-ELM)
has been proposed in [31]. In CI-ELM, the convergence rate is
improved by recalculating the output weights of the existing hidden neurons based on
a convex optimization method when a new hidden neuron is randomly added. CI-ELM
obtains a faster convergence rate and more compact network architecture while retaining
the I-ELM's simplicity and efficiency. The performance of I-ELM is further improved in the
Enhanced I-ELM algorithm (EI-ELM) [32]. In EI-ELM, every time k hidden neurons
are randomly generated and among the k randomly generated hidden neurons only the
most appropriate hidden neuron is added to the existing network. However, I-ELM, CI-
ELM and EI-ELM require the complete training data, so these algorithms cannot handle
sequential data.
Spiking Neural Networks (SNN), the third generation of neural network mod-
els, are more closely related to biological neurons than the classical artificial
neural networks of the previous generations. A sequential learning algorithm in the SNN
framework has been presented in [33] for a four-layer hierarchical neural network of
spiking neurons, in which learning is performed through synaptic plasticity and an adaptive network structure. An event driven
approach is used to optimize computation speed in order to simulate networks with a large
number of spiking neurons. Another sequential learning algorithm for SNN and its ap-
plication to taste recognition has been presented in [34]. This algorithm is developed
based on integrate-and-fire neurons with rank order coded inputs. The influence of input encoding on the network performance is
also explored.
There are a number of unaddressed issues in the above sequential algorithms for SNN [33,
34], such as fine tuning of learning parameters, automatic update of learning parameters
in continuously changing environments (as these are set manually), improving learning
speed for large size data sets, and the effect of handling imbalanced data sets on the
training performance.
In [35], a novel self-adaptation system has been presented to train a real mobile robot
using spiking neural networks that exploit the Spike Timing Dependent Plasticity (STDP) property. The spike response model is
used, and the trained SNNs are stored in a tree-type memory structure that is used
as experience for the robot to enhance its navigation ability in new and previously
trained environments. The memory was designed to have a simple searching mechanism.
Forgetting and online dynamic clustering techniques are used in order to control the
memory size. The system used the minimum network structure required for performing the
obstacle avoidance task, and its synaptic weights are changed online. However, more
work is needed to extend such systems to complex environments.
In the SVM framework, an incremental SVM has been presented in [38]. An incremental and decremental SVM
algorithm, referred to as single incremental decremental SVM (SIDSVM), has been presented
in [39]. It uses an on-line recursive algorithm for training SVMs, and it handles one
training sample at a time, removing old training samples while `adiabatically' adding new training samples to the solution.
The drawback of the SIDSVM algorithm is that when multiple training samples are added/removed,
it repeats the updating operation for each single training sample. It often requires
a high computational cost for real-time implementation. To overcome the above draw-
back, a multiple incremental decremental SVM (MIDSVM) algorithm has been proposed,
building on results from the optimization literature [42]. Here, multiple samples are added or removed simul-
taneously, and it is faster than the conventional incremental decremental support vector
machine.
Kernel least mean square algorithms are another family of kernel based learning other than SVM based learning algorithms. One of the first Kernel Least
Mean Square (KLMS) algorithms is presented in [43]. In the KLMS algorithm, LMS is
performed in the kernel feature space, yielding a nonlinear filter built from a weighted sum of kernel functions evaluated at each incoming data
sample. The mean square convergence study of the KLMS algorithm is presented in [44].
The major drawback of the KLMS algorithm is that, with time, the size of the filter as well as
the computational effort and memory requirement increases.
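A minimal sketch of the KLMS idea, and of why the filter grows with every sample, is given below; the class name and parameter values are illustrative assumptions:

```python
import numpy as np

class KLMS:
    """Minimal kernel least mean square filter: the prediction is a
    weighted sum of Gaussian kernels centered at past inputs, and each
    new sample appends one kernel unit scaled by the step size."""
    def __init__(self, step=0.5, width=1.0):
        self.step, self.width = step, width
        self.centers, self.coeffs = [], []

    def kernel(self, a, b):
        return np.exp(-np.sum((a - b) ** 2) / (2 * self.width ** 2))

    def predict(self, x):
        return sum(c * self.kernel(x, u)
                   for c, u in zip(self.coeffs, self.centers))

    def update(self, x, y):
        err = y - self.predict(x)             # instantaneous error
        self.centers.append(np.asarray(x, dtype=float))
        self.coeffs.append(self.step * err)   # LMS step in feature space
        return err
```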
To overcome the above drawback, an efficient method to constrain the increase in the length of the RBF filter has been proposed. It
uses a sequential Gaussian elimination method to test the linear dependency of each
new sample's feature vector with all the previous samples' feature vectors. The extended
and quantized versions of the KLMS algorithm are presented in [46, 47, 48, 49, 50, 51, 52].
A surprise criterion based kernel adaptive filtering algorithm is proposed in [51]; here, surprise quantifies the amount of information a sample contains
given a system state. The Quantized KLMS (QKLMS) algorithm is proposed in [49], which uses quantization
to compress the input (or feature) space of the kernel adaptive filters so as to control the
network size. In addition, a pruning and growing strategy is proposed based on a significance measure to constrain the network
size.
The Sequential Multi-Category Radial Basis Function network (SMC-RBFN) [53] algorithm has been
developed exclusively to solve classification problems. As the training samples are pre-
sented, SMC-RBFN adds new hidden neurons or updates the network parameters.
In the growth criterion, SMC-RBFN considers the similarity measure within the class, the mis-
classification rate and the prediction error. SMC-RBFN uses the hinge loss function instead
of the mean square loss function for a more accurate estimate of the posterior probabil-
ity. New hidden neuron parameters are allocated similar to the RAN algorithm. Otherwise,
the network parameters of the nearest hidden neuron of the same class are updated using DEKF.
A step towards meta-cognition has been made in the Self-adaptive Re-
source Allocation Network (SRAN) [9]. As each training sample is presented to
SRAN, based on the sample's hinge error, the sample is either used for network training
(growing/update) immediately, or pushed to the rear end of the stack for learning in the
future, or deleted from the data set. SRAN uses self-adaptive error based control pa-
rameters to identify a reduced training sample sequence with significant information and
removes the less significant samples (which are similar to the stored knowledge in the
network). The significance of a sample is measured using the
misclassification error and hinge loss error. New hidden neuron parameters are allocated
similar to other sequential algorithms, and all the network parameters are updated using
EKF. Otherwise, the sample is pushed to the rear end of the stack, to be presented to
the network in the future. These reserved samples can be used to fine-tune the network
parameters.
Table 2.1 summarizes the key differences among all the sequential learning algorithms discussed in this chapter.
2.2 Summary
In this chapter, we presented the different sequential learning algorithms for neural net-
works. We categorized the sequential learning algorithms as error driven algorithms,
neuron significance based algorithms, ELM based algorithms, spiking neural network
algorithms, KLMS based algorithms, and sequential classification algorithms, and dis-
cussed them along with incremental decremental SVM algorithms. All the sequential learning
algorithms for neural networks presented in the literature address the technique used to
learn the information contained in the training sample efficiently. But they do not self-
regulate their learning. In the literature, it has been shown that a self-regulated learner
using meta-cognition is the best learner. In the next chapter, we give an overview of human
meta-cognition.
Chapter 3
An Overview on Meta-cognition
In the previous chapter, a complete literature survey on sequential learning algorithms for
neural networks was presented. Existing sequential learning algorithms for radial basis
function neural networks use all the samples in the training data set to gain knowledge
about the information contained in the samples. In other words, they possess information-
processing abilities such as learning, remembering and
problem-solving, and these abilities are cognitive in nature. However, recent studies on
human learning have revealed that the learning process is effective when the learners
adopt self-regulation: they analyze and monitor their thinking
about their cognitive processes, develop new strategies to improve their cognitive skills
and evaluate the information contained in their memory. This chapter gives an overview
of meta-cognition. Flavell defined meta-cognition as 'knowledge
concerning one's own cognitive processes or anything related to them'. More recent
definitions describe meta-cognition as the knowledge of one's own
mental processes such that one can monitor, regulate, and direct them to a desired
goal.
Meta-cognition comprises meta-cognitive knowledge, meta-cognitive
monitoring, and meta-cognitive control. Meta-cognitive monitoring refers to the assessment of one's own cog-
nitive activity, such as judging whether you are approaching the correct
solution to a problem, or assessing how well you understand what you are
reading [5].
A brief survey of various meta-cognition models is reported in [7]. Among the various models, the
model proposed by Nelson and Narens in [8] is simple and clearly highlights the various
actions in the meta-cognition of human beings. It has two components, the cognitive component
and the meta-cognitive component. The information flow from the cognitive component
to the meta-cognitive component is considered monitoring, while the information flow in
the reverse direction is considered control. The information flowing from the meta-
cognitive component to the cognitive component either changes the state of the cognitive
component or changes the cognitive component itself. Monitoring informs the meta-
cognitive component about the state of the cognitive component, thus continuously updating
the meta-cognitive component's 'model of the cognitive state'.
[Figure: Nelson and Narens model of meta-cognition, with the meta-cognitive and cognitive components connected by monitoring and control flows of information.]
Educational psychologists have suggested that the learning process is effective when
the learners adopt self-regulation in the learning process using meta-cognition [3, 4]. In
meta-cognitive learning, the learner controls the learning process by planning and selecting
learning strategies, and monitors the progress by analyzing the effectiveness of the pro-
posed learning strategies. When necessary, these strategies should be adapted appropri-
ately. Meta-cognition provides the ability to identify the specific piece of required
knowledge, and to judge when to start and stop learning by emphasizing the best learning strategy.
Hence, there is a need to develop a meta-cognitive neural network classifier that is capa-
ble of deciding what-to-learn, when-to-learn and how-to-learn the decision function from
a stream of training data. In the literature, SRAN [9] addresses the what-to-learn component by selecting
significant samples using misclassification error and hinge loss error. It has been shown
that selecting appropriate samples for learning and removing repetitive samples help
to improve the generalization performance. Addressing all
the three components of human learning with suitable learning strategies would improve
the generalization ability of a neural network. The drawbacks in the existing sequential
learning algorithms are: a) the samples for training are selected based on a simple error
criterion, which is not sufficient to address the significance of samples; b) the new hidden
neuron center is allocated independently, which may overlap with already existing neuron
centers, leading to misclassification; c) the knowledge gained from past samples is not used;
and d) the computationally intensive EKF is used for parameter update. Hence, in this thesis, we develop a meta-cognitive sequential learning
algorithm for a radial basis function neural network based on a generic framework of meta-cognition.
3.4 Summary
In this chapter, we presented an overview of meta-cognition, including the definitions
of its major concepts. Next, the models of meta-cognitive learning were reviewed. Finally,
the motivation for meta-cognitive learning in neural networks was discussed. In
this thesis, a radial basis function neural network which can self-regulate its learning
is developed, referred to as a meta-cognitive radial basis function
network. In the next chapter, we introduce such a meta-cognitive radial basis function
network and present its sequential learning algorithm to perform classification tasks.
Chapter 4
Meta-cognitive Radial Basis Function
Network and Its EKF Based Sequential
Learning Algorithm for Classification
Problems
4.1 Introduction
In the previous chapter, an overview of human meta-cognition, models of meta-cognition
and the motivation for meta-cognitive learning in the neural network framework was presented.
Human learners regulate their learning by analyzing their cognitive processes: they de-
velop new strategies to improve their cognitive skills and evaluate the information con-
tained in their memory. If a radial basis function network analyzes its cognitive process
and chooses suitable learning strategies adaptively to improve its cognitive process, then it
is referred to as a Meta-cognitive Radial Basis Function Network (McRBFN). This chap-
ter focuses on the development of McRBFN and its Extended Kalman Filter (EKF) based
sequential learning algorithm, referred to as EKF-McRBFN.
Given a stream of training data samples $\{(\mathbf{x}^1, c^1), \cdots, (\mathbf{x}^t, c^t), \cdots\}$, where $\mathbf{x}^t = [x_1^t, \cdots, x_m^t]^T \in \Re^m$ is the $m$-dimensional input of the $t$-th sample and $c^t \in \{1, \ldots, n\}$ is its class label.
[Figure 4.1: (a) Nelson and Narens model of meta-cognition; (b) EKF-McRBFN model]
Here $n$ is the total number of classes. The coded class labels $\mathbf{y}^t = [y_1^t, \cdots, y_j^t, \cdots, y_n^t]^T \in \Re^n$ are given by:

$$y_j^t = \begin{cases} 1 & \text{if } c^t = j \\ -1 & \text{otherwise} \end{cases}, \quad j = 1, \cdots, n \qquad (4.1)$$

The objective of a classifier is to approximate the underlying decision function that maps $\mathbf{x}^t \in \Re^m \rightarrow \mathbf{y}^t \in \Re^n$.
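As a concrete illustration of Eq. (4.1), the following minimal Python sketch (illustrative, not from the thesis) converts a class index into the coded target vector:

```python
import numpy as np

def coded_label(c, n):
    """Eq. (4.1): y_j = 1 if c == j, else -1, for j = 1, ..., n."""
    y = -np.ones(n)
    y[c - 1] = 1.0      # classes are indexed 1..n in the text
    return y

print(coded_label(2, 4))    # [-1.  1. -1. -1.]
```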
The EKF-McRBFN architecture is developed based on the Nelson and Narens meta-cognition model [8]. Fig.
4.1(a) shows the Nelson and Narens meta-cognition model, and EKF-McRBFN is developed
analogous to this model as shown in Fig. 4.1(b). Similar to the Nelson and Narens
model, EKF-McRBFN has two components, as shown in the schematic diagram in Fig. 4.2,
namely the cognitive component and the meta-cognitive component. The cognitive com-
ponent of EKF-McRBFN is a three-layered feed forward radial basis function network
with a Gaussian activation function in the hidden layer, as shown in Fig. 4.3. The meta-
cognitive component contains a copy of the cognitive component. When a new training
sample arrives, the meta-cognitive component of EKF-McRBFN predicts the class label
[Figure 4.2: Schematic diagram of EKF-McRBFN. For each new sample (x^t, c^t) from the data stream, the meta-cognitive component monitors the predicted class label, the confidence of the classifier, the maximum hinge error and the class-wise significance, and controls the cognitive component by selecting the best learning strategy: sample delete, neuron growth, parameter update or sample reserve. The cognitive component is the RBF network, whose outputs feed a decision device producing the predicted class label.]
and estimates the knowledge present in the new training sample with respect to the cog-
nitive component, and then selects a suitable learning strategy for the current sample. Thereby, it addresses the three funda-
mental principles of human learning: what-to-learn, when-to-learn and how-to-learn. EKF-McRBFN is thus a self-regulated learning
system, where the meta-cognition provides appropriate learning strategies based on the knowledge present in the new training sample.
EKF-McRBFN begins with zero hidden neurons and selects a suitable strategy for each
sample to achieve the objective. First, we present the cognitive component, and next we
describe the meta-cognitive component. The cognitive component is a radial basis
function network. The input layer maps all features to the hidden layer without doing any
[Figure 4.3: Cognitive component of EKF-McRBFN, a radial basis function network mapping the input x^t through K Gaussian hidden responses h_k(x^t) and output weights w_kj to the predicted outputs.]
transformation, the hidden layer employs the Gaussian activation function, and the output layer is linear. Assume that the network
builds $K$ Gaussian neurons from the $t-1$ training samples. For a given training sample $\mathbf{x}^t$,
the predicted output $\hat{\mathbf{y}}^t = [\hat{y}_1^t, \cdots, \hat{y}_j^t, \cdots, \hat{y}_n^t]^T$ of the EKF-McRBFN classifier with $K$
hidden neurons is

$$\hat{y}_j^t = \sum_{k=1}^{K} w_{kj} h_k^t, \quad j = 1, \cdots, n \qquad (4.2)$$

where $w_{kj}$ is the weight connecting the $k$-th hidden neuron to the $j$-th output neuron and

$$h_k^t = \exp\left(-\frac{\|\mathbf{x}^t - \boldsymbol{\mu}_k^l\|^2}{(\sigma_k^l)^2}\right), \quad k = 1, \cdots, K \qquad (4.3)$$

where $\boldsymbol{\mu}_k^l \in \Re^m$ is the center and $\sigma_k^l \in \Re^+$ is the width of the $k$-th hidden neuron. Here, the superscript $l$ denotes the class to which the $k$-th hidden neuron belongs. The objective is to estimate the number of hidden neurons ($K$), the neuron centers ($\boldsymbol{\mu}_k^l$), the widths ($\sigma_k^l$) and the output weights ($\mathbf{w}$) of the network, such that the network approximates the underlying decision function accurately. The meta-cognitive component regulates this estimation using its knowledge measures and self-regulated thresholds.
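Eqs. (4.2)-(4.3) amount to the following forward pass; this is an illustrative sketch with assumed array shapes:

```python
import numpy as np

def rbf_predict(x, centers, widths, W):
    """Eqs. (4.2)-(4.3): Gaussian hidden responses h_k followed by a
    linear output layer.  centers: (K, m), widths: (K,), W: (K, n)."""
    h = np.exp(-np.sum((centers - x) ** 2, axis=1) / widths ** 2)
    return h @ W

# tiny usage example with two hidden neurons and two outputs
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
widths = np.array([0.5, 0.5])
W = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(rbf_predict(np.array([0.9, 1.1]), centers, widths, W))
```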
During the learning process, the meta-cognitive component monitors the cognitive component and updates its dynamic model
of the cognitive component. When a new training sample (the $t$-th sample) is presented, the meta-cognitive component estimates the knowl-
edge present in the new training sample with respect to the cognitive component using
its knowledge measures. The meta-cognitive component uses the predicted class label ($\hat{c}^t$),
the maximum hinge error ($E^t$), the confidence of classifier ($\hat{p}(c^t|\mathbf{x}^t)$) and the class-wise significance
($\psi_c$) as the measures of knowledge in the new training sample. Self-regulated thresholds
are adapted to capture the knowledge presented in the new training sample. Using these measures and thresholds, the meta-cognitive component con-
structs two sample based learning strategies and two neuron based learning strategies.
One of these strategies is selected for the new training sample such that the cognitive
component learns the decision function accurately. The knowledge measures are defined below.

Predicted class label ($\hat{c}^t$):

$$\hat{c}^t = \arg\max_{j \in 1, \cdots, n} \hat{y}_j^t \qquad (4.4)$$
Maximum hinge error ($E^t$): The objective of the classifier is to minimize the error be-
tween the predicted output and the actual output. It has
been shown in [56, 57] that a classifier developed using the hinge loss error estimates the
posterior probability more accurately than a classifier developed using the mean square er-
ror. Hence, in EKF-McRBFN, we use the hinge loss error $\mathbf{e}^t = [e_1^t, \cdots, e_j^t, \cdots, e_n^t]^T \in \Re^n$ defined as

$$e_j^t = \begin{cases} 0 & \text{if } y_j^t \hat{y}_j^t > 1 \\ y_j^t - \hat{y}_j^t & \text{otherwise} \end{cases}, \quad j = 1, \cdots, n \qquad (4.5)$$

where $y_j^t$ is the actual output and $\hat{y}_j^t$ is the predicted output at the $j$-th neuron for the $t$-th sample. The maximum absolute hinge error ($E^t$) is given by

$$E^t = \max_{j \in 1, 2, \cdots, n} \left|e_j^t\right| \qquad (4.6)$$
where $e_j^t$ is the hinge error at the $j$-th neuron for the $t$-th sample.
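Eqs. (4.5)-(4.6) can be transcribed directly; the sketch below (with illustrative names) returns both the hinge error vector and its maximum absolute value:

```python
import numpy as np

def hinge_error(y, y_hat):
    """Eq. (4.5): e_j = 0 where y_j * y_hat_j > 1, else y_j - y_hat_j;
    Eq. (4.6): E is the maximum absolute hinge error over outputs."""
    e = np.where(y * y_hat > 1.0, 0.0, y - y_hat)
    return e, float(np.max(np.abs(e)))

y = np.array([1.0, -1.0, -1.0])       # coded labels from Eq. (4.1)
y_hat = np.array([0.4, -1.2, 0.3])    # network outputs
e, E = hinge_error(y, y_hat)          # E feeds the self-regulated thresholds
```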
Confidence of classifier ($\hat{p}(c^t|\mathbf{x}^t)$): The confidence level of classification, or the predicted posterior probability, is estimated from the predicted output as

$$\hat{p}(c^t|\mathbf{x}^t) = \frac{\min\left(1, \max\left(-1, \hat{y}_{c^t}^t\right)\right) + 1}{2} \qquad (4.7)$$

Class-wise significance ($\psi_c$): In a Gaussian RBF network, the input samples are mapped onto a hyper-dimensional sphere as shown in [58]. The knowledge or spherical potential of any sample can be measured as its distance from the center of this sphere in the projected feature space $S$ [59]. Let $\phi(\cdot)$ denote the mapping from the input space to the feature space $S$, and let the center of the $K$-dimensional feature space be $\phi_0 = \frac{1}{K}\sum_{k=1}^{K} \phi(\boldsymbol{\mu}_k)$. The knowledge present in the new data $\mathbf{x}^t$ can be expressed as the potential of the data in the original space, which is the squared distance from the $K$-dimensional feature mapping to the center $\phi_0$:

$$\psi = \left\|\phi(\mathbf{x}^t) - \phi_0\right\|^2 \qquad (4.8)$$

The potential ($\psi$) is thus given as

$$\psi = h\left(\mathbf{x}^t, \mathbf{x}^t\right) - \frac{2}{K} \sum_{k=1}^{K} h\left(\mathbf{x}^t, \boldsymbol{\mu}_k^l\right) + \frac{1}{K^2} \sum_{k,r=1}^{K} h\left(\boldsymbol{\mu}_k^l, \boldsymbol{\mu}_r^l\right) \qquad (4.9)$$

where the Gaussian kernel $h\left(\mathbf{x}^t, \boldsymbol{\mu}_k^l\right)$ is expressed as $\exp\left(-\|\mathbf{x}^t - \boldsymbol{\mu}_k^l\|^2/(\sigma_k^l)^2\right)$. From the above equation, we can see that for the Gaussian function the first term ($h(\mathbf{x}^t, \mathbf{x}^t)$) and the last term ($\frac{1}{K^2}\sum_{k,r=1}^{K} h(\boldsymbol{\mu}_k^l, \boldsymbol{\mu}_r^l)$) are constants. Since the potential is a measure of novelty, it reduces to

$$\psi \approx -\frac{2}{K} \sum_{k=1}^{K} h\left(\mathbf{x}^t, \boldsymbol{\mu}_k^l\right) \qquad (4.10)$$
In classification problems, the class-wise distribution plays a vital role and will influence the performance of the classifier significantly [53]. Hence,
we use the measure of the spherical potential of the new training sample $\mathbf{x}^t$ belonging
to class $c$ with respect to the neurons associated with the same class (i.e., $l = c$). Let $K_c$ be
the number of neurons associated with the class $c$; then the class-wise spherical potential, or
class-wise significance, is

$$\psi_c = \frac{1}{K_c} \sum_{k=1}^{K_c} h\left(\mathbf{x}^t, \boldsymbol{\mu}_k^c\right) \qquad (4.11)$$

We can observe that the negative sign and the constant factor 2 in Eq. (4.10) are removed in Eq. (4.11),
because the measure of spherical potential is not affected by them. Also note that the measure
of the spherical potential is different from the potential function method referred to in [60].
The spherical potential explicitly indicates the knowledge contained in the sample: a
higher value of spherical potential (close to one) indicates that the sample is similar
to the existing knowledge in the cognitive component, and a smaller value of spherical
potential (close to zero) indicates that the sample is novel.
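A small sketch of the class-wise spherical potential of Eq. (4.11) follows; it is illustrative and assumes one width per neuron as in Eq. (4.3):

```python
import numpy as np

def class_wise_significance(x, centers, widths, classes, c):
    """Eq. (4.11): mean Gaussian kernel between the sample x and the
    hidden-neuron centers of class c.  Near 1: similar to stored
    knowledge; near 0: novel sample."""
    mask = classes == c
    if not mask.any():
        return 0.0                     # no neuron for this class yet
    d2 = np.sum((centers[mask] - x) ** 2, axis=1)
    return float(np.mean(np.exp(-d2 / widths[mask] ** 2)))
```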
The meta-cognitive component devises various learning strategies using the knowledge mea-
sures and self-regulated thresholds, which directly address the basic principles of self-
regulated human learning. For each new training sample, the meta-cognitive component selects
one of the following four learning strategies (a compact sketch of this selection logic is given after the list).
• Sample delete strategy: If the new training sample contains information similar
to the knowledge present in the cognitive component, then delete the new training
sample from the training data set without using it in the learning process.

• Neuron growth strategy: Use the new training sample to add a new hidden
neuron to the cognitive component.

• Parameter update strategy: The new training sample is used to update the
parameters of the cognitive component.

• Sample reserve strategy: The new training sample contains some information,
but it is not significant at the current stage; such samples can be used at a later stage of the learning process for fine
tuning the parameters of the cognitive component. These samples may eventually be discarded
without learning or used for fine tuning the cognitive component parameters at a
later stage.
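The following sketch summarizes this control flow; the criteria and default threshold values are simplified illustrations of the strategies detailed below, not the thesis' precise conditions:

```python
def choose_strategy(c_true, c_pred, confidence, E, psi_c,
                    beta_d=0.9, beta_c=0.5, beta_a=1.3, beta_u=0.5):
    """Illustrative control flow of the meta-cognitive component.
    The default threshold values are placeholders; in EKF-McRBFN,
    beta_a and beta_u are self-adapted during learning."""
    if c_pred == c_true and confidence >= beta_d:
        return "delete"      # sample adds nothing new: discard it
    if c_pred != c_true and psi_c <= beta_c and E >= beta_a:
        return "grow"        # novel and misclassified: add a neuron
    if c_pred == c_true and E >= beta_u:
        return "update"      # useful sample: refine parameters (EKF)
    return "reserve"         # push to the rear of the training sequence
```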
Most of the existing sequential learning algorithms address only neuron addition/pruning,
whereas the above four strategies together help to achieve the best human learning ability. Also, these strategies are
adapted such that they suit the current training sample. Since the meta-cognitive
component self-regulates these strategies, manual intervention during learning is minimized.
The principle behind these four learning strategies is described in detail below:
• Sample delete strategy: Prevents similar samples from being learnt, which
avoids over-training and reduces the computational effort. When the predicted
class label of the new training sample is the same as the actual class label and the con-
fidence level (estimated posterior probability) is greater than the expected value,
i.e., $\hat{c}^t = c^t$ and $\hat{p}(c^t|\mathbf{x}^t) \geq \beta_d$, then
the new training sample does not provide additional information to the classifier
and can be deleted from the training sequence without being used in the learning process.
Here $\beta_d$ is the delete threshold, which controls the number of samples participating in the
learning process. If one selects $\beta_d$ close to 1, then no sample will be deleted and
all the training samples participate in the learning process, which results in over-
training with similar samples. Reducing $\beta_d$ below the desired accuracy results
in the deletion of too many samples from the training sequence, and the resultant
network may not satisfy the desired accuracy. Hence, it is fixed at the expected
accuracy level; in our simulation studies, it is selected in the range of [0.9 - 0.95].
The sample deletion strategy prevents learning of samples with similar information,
thereby reducing the computational effort.
• Neuron growth strategy: When the new training sample contains significant
information and the estimated class label is different from the actual class label,
then one needs to add a new hidden neuron to capture the knowledge. The neuron
growth criterion requires that the sample is misclassified ($\hat{c}^t \neq c^t$), that its class-wise
significance is low ($\psi_c \leq \beta_c$) and that its maximum hinge error is high ($E^t \geq \beta_a$) (Eq. (4.13)),
where $\beta_c$ is the knowledge threshold and $\beta_a$ is the addition threshold. The thresh-
olds $\beta_c$ and $\beta_a$ allow samples with significant knowledge to be learned first, and then use
the other samples for fine tuning. If $\beta_c$ is chosen closer to zero and the initial value
of $\beta_a$ is chosen closer to the maximum value of the hinge error (i.e., 2, because class
labels are coded to -1 or 1), then very few neurons will be added to the network.
Such a network will not approximate the function properly. If $\beta_c$ is chosen closer
to one and the initial value of $\beta_a$ is chosen closer to the minimum value of the hinge
error, then the resultant network may contain many neurons with poor generaliza-
tion ability. Hence, the range for the knowledge threshold can be selected in the
interval [0.3 - 0.7], and the initial value of the addition threshold is set between the two
extremes (around 1.3; see Fig. 4.5). The addition threshold is self-adapted as (see the sketch below)

$$\beta_a := \delta \beta_a + (1 - \delta) E^t \qquad (4.14)$$

where $\delta$ is the slope that controls the rate of self-adaptation and is set close to 1.
$\beta_a$ allows samples with significant knowledge to be learned first and then uses the other
samples for fine tuning. The hypothesis behind Eq. (4.14) is that, as the learning process
progresses, the network uses samples with higher hinge error than the initial ones for neuron
growth.
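The self-adaptation of Eq. (4.14) (and the analogous update for the threshold $\beta_u$ given later in Eq. (4.26)) is a one-line exponential smoothing, sketched here for illustration:

```python
def adapt_threshold(beta, E, delta=0.99):
    """Exponential smoothing in the style of Eq. (4.14): the threshold
    drifts toward the hinge error of the samples actually learned, so
    samples with higher error drive growth as learning proceeds."""
    return delta * beta + (1.0 - delta) * E
```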
If the growth criterion given in Eq. (4.13) is satisfied, then a new hidden neuron $K+1$
is added and its parameters are initialized as explained below. Existing learning
algorithms in the literature do not consider overlapping and distinct cluster criteria
in assigning the new neuron parameters. However, the overlapping condition will
significantly influence the performance. The new training sample may overlap
with other classes or may come from a distinct cluster far away from the nearest neuron
in the same class. Hence, EKF-McRBFN measures the inter/intra class nearest neuron
distances from the current sample in assigning the new neuron parameters. The
nearest neurons and distances are defined as follows.
Let $nrS$ be the nearest hidden neuron in the intra-class (i.e., $l = c$) with center
($\boldsymbol{\mu}_{nrS}^c$) and width ($\sigma_{nrS}^c$) parameters, and $nrI$ be the nearest hidden neuron in
the inter-class (i.e., $l \neq c$) with center ($\boldsymbol{\mu}_{nrI}^l$) and width ($\sigma_{nrI}^l$) parameters. They
are defined as

$$nrS = \arg\min_{k:\, l = c} \|\mathbf{x}^t - \boldsymbol{\mu}_k^c\| \qquad (4.15)$$

$$nrI = \arg\min_{k:\, l \neq c} \|\mathbf{x}^t - \boldsymbol{\mu}_k^l\| \qquad (4.16)$$

Let the Euclidean distances from the new training sample to $nrS$ and $nrI$ be given as

$$d_S = \|\mathbf{x}^t - \boldsymbol{\mu}_{nrS}^c\| \qquad (4.17)$$

$$d_I = \|\mathbf{x}^t - \boldsymbol{\mu}_{nrI}^l\| \qquad (4.18)$$
EKF-McRBFN categorizes the overlapping conditions into four categories. Fig. 4.4 shows pictorially the distribution of intra-
class (same class) and inter-class (different class) samples, and four different samples, one for each
overlapping condition:
Distinct sample: when a new training sample is far away from both the intra-class and inter-class
nearest neurons, the training sample does not overlap with any class cluster and forms a new
distinct cluster. In Fig. 4.4, the square symbol represents this case of training
sample. In this case, the new hidden neuron center ($\boldsymbol{\mu}_{K+1}^c$), width ($\sigma_{K+1}^c$) and output weights ($w_{K+1}$) are initialized as

$$\boldsymbol{\mu}_{K+1}^c = \mathbf{x}^t; \quad \sigma_{K+1}^c = \max\left(0.00001,\; \kappa \sqrt{\mathbf{x}^{tT} \mathbf{x}^t}\right); \quad w_{K+1} = \mathbf{e}^t \qquad (4.19)$$

where $\kappa$ is a positive constant that controls the overlap of the hidden units in the input space, which lies in the range $0.5 \leq \kappa \leq 1$.
[Figure 4.4: Distribution of intra-class and inter-class samples under the four overlapping conditions: distinct sample, no overlap, minimum overlap and significant overlap.]
No overlap with the inter-class: when a new training sample is closer to the intra-class near-
est neuron than to the inter-class nearest neuron, i.e., the
intra/inter class distance ratio is less than 1, then the sample does not overlap
with the other classes. In Fig. 4.4, the plus symbol represents this case of training
sample. In this case, the new hidden neuron center ($\boldsymbol{\mu}_{K+1}^c$), width ($\sigma_{K+1}^c$) and output weights ($w_{K+1}$) are initialized as

$$\boldsymbol{\mu}_{K+1}^c = \mathbf{x}^t; \quad \sigma_{K+1}^c = \kappa \|\mathbf{x}^t - \boldsymbol{\mu}_{nrS}^c\|; \quad w_{K+1} = \mathbf{e}^t \qquad (4.20)$$
Minimum overlap with the inter-class: when a new training sample is close
to the inter-class nearest neuron, i.e., the intra/inter class distance ratio is in the range 1 to 1.5, then the sample
has minimum overlap with the other class. In Fig. 4.4, the cross symbol
represents this case of training sample. In this case, the center of the new
hidden neuron is shifted away from the inter-class nearest neuron and is initialized as

$$\boldsymbol{\mu}_{K+1}^c = \mathbf{x}^t + \zeta \left(\mathbf{x}^t - \boldsymbol{\mu}_{nrI}^l\right); \quad \sigma_{K+1}^c = \kappa \|\mathbf{x}^t - \boldsymbol{\mu}_{nrS}^c\| \qquad (4.21)$$
where $\zeta$ is the center shift factor, which determines how much the center has to be
shifted from the new training sample location; it lies in the range [0.01 - 0.1].
Since the center of the new hidden neuron is shifted from the position of the new
training sample, the weight parameter of the new hidden neuron is calculated
as

$$w_{K+1} = \frac{\mathbf{e}^t}{h_{K+1}^t} \qquad (4.22)$$

where

$$h_{K+1}^t = \exp\left(-\frac{\|\mathbf{x}^t - \boldsymbol{\mu}_{K+1}^c\|^2}{(\sigma_{K+1}^c)^2}\right) \qquad (4.23)$$
Significant overlap with the inter-class: when a new training sample is
very close to the inter-class nearest neuron compared to the intra-class nearest
neuron, i.e., the intra/inter class distance ratio is more than 1.5, then the
sample has significant overlap with the other class. In Fig. 4.4, the triangle
symbol represents this case of training sample. In this case, the center of the
new hidden neuron is shifted further away from the inter-class nearest neuron, and the
width and weight parameters are determined similar to Eqs. (4.20) and
(4.22). A compact sketch of these four initialization cases follows.
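The sketch below illustrates the four cases under stated assumptions: the `far` cutoff for the distinct case and the doubled shift for the significant-overlap case are placeholders, not thesis values, and the returned weights would still be rescaled via Eq. (4.22) when the center is shifted:

```python
import numpy as np

def init_new_neuron(x, e, centers, classes, c, kappa=0.7, zeta=0.05, far=2.0):
    """Sketch of the four initialization cases (Eqs. (4.19)-(4.21)).
    Returns (center, width, output_weights) for neuron K+1."""
    intra, inter = classes == c, classes != c
    if not intra.any() or not inter.any():
        # no reference neurons yet: treat as a distinct sample, Eq. (4.19)
        return x.copy(), max(1e-5, kappa * np.sqrt(x @ x)), e
    d_s = np.min(np.linalg.norm(centers[intra] - x, axis=1))   # intra-class
    j = np.argmin(np.linalg.norm(centers[inter] - x, axis=1))
    mu_inter = centers[inter][j]
    d_i = np.linalg.norm(mu_inter - x)                         # inter-class
    if d_s > far and d_i > far:               # distinct sample, Eq. (4.19)
        return x.copy(), max(1e-5, kappa * np.sqrt(x @ x)), e
    ratio = d_s / d_i
    if ratio < 1.0:                           # no overlap, Eq. (4.20)
        return x.copy(), kappa * d_s, e
    # overlap: move the center away from the inter-class nearest neuron
    shift = zeta if ratio <= 1.5 else 2 * zeta   # larger shift if significant
    mu = x + shift * (x - mu_inter)           # Eq. (4.21)
    return mu, kappa * d_s, e                 # weights rescaled via Eq. (4.22)
```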
The above mentioned center and width determination conditions help in minimizing
the misclassification arising from overlap between classes.

• Parameter update strategy: The cognitive component parameters are updated
when the new training sample is classified correctly and its maximum hinge error
exceeds the self-adaptive update threshold, i.e., when $\hat{c}^t = c^t$ and $E^t \geq \beta_u$ is
satisfied. If the initial value of $\beta_u$ is chosen closer to the maximum
hinge error for correctly classified samples (i.e., 1), then very few samples will be used for adapting the network
37
Chapter 4. EKF-McRBFN
parameters and most of the samples will be pushed to the end of the training
sequence. The resultant network will not accurately approximate the function. If
a lower value is chosen, then all samples will be used in updating the network
parameters without altering the training sequence. Hence, the range for the initial
value of update threshold can be selected in the interval [0.4 - 0.7]. The βu is
where δ is the slope that controls the rate of self-adaption of update threshold and
parameters
$$\alpha^t = \alpha^{t-1} + G^t e^t \qquad (4.27)$$

where e^t is the error obtained from the hinge loss function for the t-th sample and G^t is the Kalman gain

$$G^t = P^t B^t\left[R + (B^t)^T P^t B^t\right]^{-1} \qquad (4.28)$$

where R is the measurement noise, $P^t \in \Re^{z\times z}$ is the error covariance matrix, and B^t is the matrix of partial derivatives of the network output with respect to the parameters, which drives the parameter estimate toward the error minima.
When a new hidden neuron is added, the dimensionality of the error covariance matrix P^t is increased to

$$P^t = \begin{bmatrix} P^{t-1}_{z\times z} & 0_{z\times(m+n+1)} \\ 0_{(m+n+1)\times z} & p_0 I_{(m+n+1)\times(m+n+1)} \end{bmatrix} \qquad (4.31)$$

where I is the identity matrix, 0 is the zero matrix and p_0 is the initial estimated uncertainty.
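A minimal numpy sketch of the EKF step in Eqs. (4.27)-(4.28) and the covariance growth in Eq. (4.31) follows. The covariance update line is the standard EKF form (the thesis's exact form is not reproduced in this excerpt), and the default values of R and p_0 are placeholders, not the thesis's settings:

```python
import numpy as np

def ekf_step(alpha, P, B, e, R=1.0):
    """One EKF update, Eqs. (4.27)-(4.28).  alpha: (z,) parameters,
    P: (z, z) error covariance, B: (z, n) partial derivatives of the
    output w.r.t. the parameters, e: (n,) hinge loss error."""
    S = R * np.eye(e.size) + B.T @ P @ B     # innovation covariance
    G = P @ B @ np.linalg.inv(S)             # Kalman gain, Eq. (4.28)
    alpha = alpha + G @ e                    # Eq. (4.27)
    P = P - G @ B.T @ P                      # standard EKF covariance update
    return alpha, P

def grow_covariance(P, m, n, p0=1.0):
    """Extend P by (m + n + 1) rows/cols for a new neuron, Eq. (4.31)."""
    z, k = P.shape[0], m + n + 1
    P_new = np.zeros((z + k, z + k))
    P_new[:z, :z] = P
    P_new[z:, z:] = p0 * np.eye(k)
    return P_new
```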
• Sample reserve strategy : If the new training sample does not satisfy either the deletion or the neuron growth or the cognitive component parameters update criterion, then the sample is pushed to the rear of the training sequence. Since the meta-cognitive component adapts the strategies based on the knowledge in the current sample, these reserved samples may be used at a later stage. Ideally, the training process stops when no further sample is available in the data stream, or when the samples in the reserve remain the same. The complete EKF-McRBFN learning algorithm is summarized in Pseudo-code 1.
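A compact Python skeleton of this sequential loop is sketched below; predict, spherical_potential, add_neuron and ekf_update are hypothetical method names, and the stopping test for an unchanged reserve is a simplification, so this is an illustration rather than the thesis's pseudo-code:

```python
from collections import deque

def train_ekf_mcrbfn(stream, net, beta_d, beta_c, beta_a, beta_u, delta=0.99):
    """Sketch of the EKF-McRBFN sequential loop (hypothetical net methods)."""
    queue = deque(stream)          # (x, c) pairs in arrival order
    reserved = 0                   # consecutive samples sent to the reserve
    while queue and reserved < len(queue):
        x, c = queue.popleft()
        conf, c_hat, E = net.predict(x)          # confidence, label, hinge error
        if c_hat == c and conf >= beta_d:        # sample delete strategy
            reserved = 0
            continue
        novel = net.spherical_potential(x, c) <= beta_c
        if c_hat != c and E >= beta_a and novel: # neuron growth strategy
            net.add_neuron(x, c)
            beta_a = delta * beta_a + (1 - delta) * E
            reserved = 0
        elif c_hat == c and E >= beta_u:         # parameters update strategy
            net.ekf_update(x, c)
            beta_u = delta * beta_u + (1 - delta) * E
            reserved = 0
        else:                                    # sample reserve strategy
            queue.append((x, c))
            reserved += 1
```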
In summary, the sample delete strategy addresses the what-to-learn by removing insignificant samples from the data stream, the neuron growth strategy and the parameters update strategy address the how-to-learn, by which the cognitive component learns from the samples, and the sample reserve strategy addresses the when-to-learn by presenting the samples in the learning process at an appropriate time. We next discuss how the meta-cognitive thresholds influence the performance of the EKF-McRBFN and provide some guidelines for their initialization.
Figure 4.5: Sample deleting, parameters updating, misclassification and neuron growing regions of the hinge error in EKF-McRBFN.
If the spherical potential is close to 1, then the sample is similar to the existing knowledge; a lower value of the spherical potential indicates that the sample is novel. If one selects the threshold β_c close to zero, then the network does not allow the addition of neurons. Similarly, if one selects the threshold β_c close to one, then all samples will be identified as novel samples. Hence, β_c should be selected such that the network approximates the decision function accurately with high confidence. The self-regulated addition threshold (β_a) and update threshold (β_u) are used to select appropriate samples for efficient learning. The β_d, β_a and β_u thresholds depend on the hinge error (E^t). Note that the hinge error E^t lies in the interval [0, 2]. The characteristics of the thresholds and their influence on the EKF-McRBFN performance can be explained by dividing the error region into three sub-regions, namely, the sample deleting region, the parameter updating region, and the neuron growing region, as shown in Fig. 4.5.
In the sample deleting region, the sample is similar to the existing knowledge in the cognitive component. In that case the current sample is deleted from the training sequence without being used in the learning process. The confidence of the classifier decreases from 1 to 0 as the hinge error (E^t) increases from 0 to 2, as shown in Fig. 4.5. Suppose one selects the deletion threshold to be 0.5; then many samples satisfying the condition will be deleted without being used in learning. Hence, the resultant classifier may not provide good generalization performance. If one selects a value close to 1, say 0.99, then most of the similar samples will be used in learning, which will result in over-training.
The addition threshold β_a is combined with other conditions, which measure misclassification and knowledge. The minimum possible prediction error one can get when a sample is misclassified is 1. If one selects the threshold β_a close to 1, then all misclassified samples will be used for neuron addition. If one selects the threshold β_a close to 2, then very few neurons will be added and the resultant network may not approximate the decision surface. Note that the meta-cognitive component adapts the addition threshold such that a new hidden neuron is added for a sample with higher error. Hence, the initial value of the addition threshold (β_a) can be selected in the neuron growing region between 1 and 2 shown in Fig. 4.5.
EKF-McRBFN updates the parameters when the predicted class is accurate and the hinge error (E^t) is greater than β_u. When the predicted class label is accurate, the hinge error is less than 1. If one selects the threshold β_u close to 1, then very few samples will be used in updating, and hence the resultant network does not approximate the decision function. If one selects the threshold β_u close to 0, then all samples will be used for updating. EKF-McRBFN updates the parameters using samples which produce higher error. Hence, the initial value of the update threshold (β_u) can be selected in the range [0.4, 0.7].
4.5 Summary

In this chapter, we have presented a sequential learning algorithm for the meta-cognitive radial basis function network using EKF, based on human meta-cognitive learning principles. The meta-cognitive component controls the learning of the cognitive component by choosing a suitable strategy for training the cognitive component in EKF-McRBFN. Also, the learning strategies consider the sample overlapping conditions for proper initialization of new hidden neurons. The meta-cognitive component appropriately adapts the learning strategies and hence efficiently decides what-to-learn, when-to-learn and how-to-learn. The overlap-based initialization present in the neuron growth strategy helps in proper initialization of the new hidden neuron parameters and also minimizes the misclassification error. The main drawbacks of the EKF-McRBFN classifier are that the knowledge gained from past samples is not used properly and that the EKF parameter update is computationally intensive. In the next chapter, a fast and efficient projection based sequential learning algorithm for McRBFN is presented.
Chapter 5

Projection Based Learning Algorithm for Meta-cognitive RBF Network Classifier
5.1 Introduction

In the previous chapter, an EKF based sequential learning algorithm for the meta-cognitive radial basis function network (EKF-McRBFN) classifier, based on the principles of human meta-cognition, was presented. Therein, the meta-cognitive learning helps the radial basis function network select significant samples for learning, and the overlap-based initialization of new hidden neuron parameters minimizes the misclassification. Also, the knowledge measures and self-regulated thresholds help the network to approximate the underlying function efficiently.

However, EKF-McRBFN does not use the knowledge gained from past samples and also uses a computationally intensive EKF for parameter update. To overcome these drawbacks, in this chapter we introduce a fast and efficient Projection Based Learning (PBL) algorithm for McRBFN. The McRBFN uses the PBL to obtain the network output weights: the hidden neuron parameters (center and width) are determined based on the current sample, and the output weights are estimated using the projection based algorithm. When a new neuron is added, the knowledge of past samples stored in the network as neuron centers is used as pseudo-samples in projection based learning. Thereby, the proposed algorithm exploits the knowledge stored in the network for proper initialization. The problem of finding the optimal weights is formulated as a linear programming problem using the principles of minimization and real calculus. The Projection Based Learning (PBL) algorithm then converts the linear programming problem into a system of linear equations and provides a solution for the optimal weights, corresponding to the minima of the error function. The meta-cognitive component uses the projection based learning algorithm for the learning process instead of the computationally intensive extended Kalman filter.
Projection based learning algorithm : The projection based learning algorithm works on the principle of minimization of an error function and finds the optimal network output parameters for which the error function is minimum, i.e., the network achieves the minimum point of the error function. The considered error function is the sum of squared hinge loss errors at the output neurons:

$$J_t = \sum_{j=1}^{n}\left(e_j^t\right)^2, \quad t = 1, 2, \cdots \qquad (5.1)$$

where $e^t = \left[e_1^t, \cdots, e_j^t, \cdots, e_n^t\right]^T \in \Re^n$ is the hinge loss error defined as

$$e_j^t = \begin{cases} 0 & \text{if } y_j^t\,\hat{y}_j^t > 1 \\ y_j^t - \hat{y}_j^t & \text{otherwise} \end{cases} \qquad (5.2)$$
From the definition of the hinge loss error in Eq. (5.2), J_t becomes zero when $y_j^t\hat{y}_j^t > 1$, and when $y_j^t\hat{y}_j^t \le 1$, J_t becomes

$$J_t = \sum_{j=1}^{n}\left(y_j^t - \sum_{k=1}^{K} w_{kj} h_k^t\right)^2 \qquad (5.3)$$

The overall error function, considering all samples observed up to the current sample t, is

$$J(\mathbf{W}) = \frac{1}{2}\sum_{i=1}^{t} J_i = \frac{1}{2}\sum_{i=1}^{t}\sum_{j=1}^{n}\left(y_{ji} - \sum_{k=1}^{K} w_{kj} h_{ik}\right)^2 \qquad (5.4)$$

where $h_{ik}$ is the response of the k-th hidden neuron for the i-th training sample.
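As a small sketch of Eq. (5.2), assuming coded class labels $y_j \in \{-1, 1\}$ as used elsewhere in this thesis:

```python
import numpy as np

def hinge_error(y, y_hat):
    """Hinge loss error of Eq. (5.2): zero where y_j * y_hat_j > 1,
    otherwise the residual y_j - y_hat_j."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    e = y - y_hat
    e[y * y_hat > 1] = 0.0
    return e
```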
The optimal output weights ($\mathbf{W}^* \in \Re^{K\times n}$) are estimated such that the total error $J(\mathbf{W})$ reaches its minimum. The minimum is obtained by equating the first partial derivative of $J(\mathbf{W})$ with respect to each output weight to zero, i.e.,

$$\frac{\partial J(\mathbf{W})}{\partial w_{pj}} = 0, \quad p = 1, \cdots, K;\ j = 1, \cdots, n \qquad (5.6)$$

Equating the first partial derivative to zero and re-arranging, we get

$$\sum_{k=1}^{K}\sum_{i=1}^{t} h_{ik} h_{ip} w_{kj} = \sum_{i=1}^{t} h_{ip} y_{ji} \qquad (5.7)$$

which can be written as

$$\sum_{k=1}^{K} a_{kp} w_{kj} = b_{pj}, \quad p = 1, \cdots, K;\ j = 1, \cdots, n \qquad (5.8)$$

or, in matrix form,

$$\mathbf{A}\mathbf{W} = \mathbf{B} \qquad (5.9)$$

where the elements of the matrix $\mathbf{A} \in \Re^{K\times K}$ are

$$a_{kp} = \sum_{i=1}^{t} h_{ik} h_{ip}, \quad k = 1, \cdots, K;\ p = 1, \cdots, K \qquad (5.10)$$
and the elements of the matrix $\mathbf{B} \in \Re^{K\times n}$ are

$$b_{pj} = \sum_{i=1}^{t} h_{ip} y_{ji}, \quad p = 1, \cdots, K;\ j = 1, \cdots, n \qquad (5.11)$$
Eq. (5.8) gives a set of K×n linear equations with K×n unknown output weights W. The proof that the matrix A is invertible is given at the end of this chapter.
The solution for W obtained from the set of equations given in Eq. (5.9) corresponds to a minimum if $\partial^2 J/\partial w_{lp}^2 > 0$. The second derivative of the error function $J(\mathbf{W})$ with respect to the output weights is

$$\frac{\partial^2 J(\mathbf{W})}{\partial w_{lp}^2} = \sum_{i=1}^{t} h_{ip} h_{ip} = \sum_{i=1}^{t} |h_{ip}|^2 > 0 \qquad (5.12)$$

As the second derivative of the error function $J(\mathbf{W})$ is positive, the following observations can be made from Eq. (5.12):

1. The error function is convex in the output weights, and hence has a unique minimum.

2. The output weight $\mathbf{W}^*$ obtained as a solution to the set of linear equations (Eq. (5.9)) corresponds to this minimum of the error function.

The solution for the system of equations in Eq. (5.9) can be determined as

$$\mathbf{W}^* = \mathbf{A}^{-1}\mathbf{B} \qquad (5.13)$$
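In batch form, Eqs. (5.10)-(5.13) reduce to a K×K normal system. A minimal numpy sketch, where H is the t×K matrix of hidden responses $h_{ik}$ and Y is the t×n coded-target matrix, follows; np.linalg.solve is used instead of forming $\mathbf{A}^{-1}$ explicitly, which is numerically preferable but equivalent to Eq. (5.13):

```python
import numpy as np

def pbl_output_weights(H, Y):
    """Batch PBL solution: A = H^T H (Eq. 5.10), B = H^T Y (Eq. 5.11),
    W* = A^{-1} B (Eq. 5.13), solved without an explicit inverse."""
    return np.linalg.solve(H.T @ H, H.T @ Y)
```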
The meta-cognitive component of PBL-McRBFN employs the same learning strategies as EKF-McRBFN; however, the principles underlying the learning strategies are modified for the PBL algorithm. The principle behind the learning strategies in PBL-McRBFN is described in detail below:
• Sample delete strategy : Prevents similar samples from being learnt, which avoids over-training and reduces the computational effort. When the predicted class label of the new training sample is the same as the actual class label and the confidence level (estimated posterior probability) is greater than the expected value, then the new training sample does not provide additional information to the classifier and can be deleted from the training sequence without being used in the learning process. The sample deletion strategy prevents learning of samples with similar information, and thereby avoids over-training and reduces the computational effort.
• Neuron growth strategy : When a new training sample contains significant information and the predicted class label is different from the actual class label, then one needs to add a new hidden neuron to represent the knowledge contained in the sample. The neuron growth criterion is

$$\left(\hat{c}^t \ne c^t \ \ \textrm{OR}\ \ E^t \ge \beta_a\right)\ \ \textrm{AND}\ \ \psi_c(x^t) \le \beta_c \qquad (5.15)$$

where β_c is the knowledge threshold and β_a is the self-regulated addition threshold. The thresholds β_c and β_a allow the samples with significant knowledge to be learnt first, and the remaining samples to be used for fine tuning. The PBL-McRBFN growth strategy in Eq. (5.15) is slightly different from EKF-McRBFN: here a new neuron is added when the class labels are different even if the hinge error is low. This change will reduce the misclassification error.
The addition threshold is self-adapted as β_a := δβ_a + (1 − δ)E^t, where δ is the slope that controls the rate of self-adaptation and is set close to one.
The center and width parameters of the new neuron (K+1) are initialized based on the overlapping conditions described in chapter 4; for better continuity, the overlapping conditions are also described in this section. Existing learning algorithms in the literature do not consider overlapping and distinct cluster criteria in assigning the new neuron parameters. However, the overlapping condition significantly influences the performance. The new training sample may overlap with other classes or may come from a distinct cluster far away from the nearest neuron in the same class. Hence, PBL-McRBFN measures the inter/intra class nearest neuron distances from the current sample in assigning the new neuron parameters.
Let nrS be the nearest hidden neuron in the intra-class (i.e., l = c) with center ($\mu^c_{nrS}$) and width ($\sigma^c_{nrS}$) parameters, and nrI be the nearest hidden neuron in the inter-class (i.e., l ≠ c) with center ($\mu^l_{nrI}$) and width ($\sigma^l_{nrI}$) parameters, defined as in chapter 4. Based on the Euclidean distances from the new training sample to nrS and nrI, the overlapping conditions are classified as follows:
Distinct sample : when a new training sample is far away from both the intra-class and inter-class nearest neurons, the training sample does not overlap with any class cluster and forms a new distinct cluster. In this case, the new hidden neuron center ($\mu^c_{K+1}$) and width ($\sigma^c_{K+1}$) parameters are determined as

$$\mu^c_{K+1} = x^t;\quad \sigma^c_{K+1} = \max\left(0.00001,\ \kappa\sqrt{x^{tT}x^t}\right) \qquad (5.21)$$
where κ is the positive constant that controls the overlap of the hidden units in the input space, and lies in the range 0.5 ≤ κ ≤ 1.

No overlapping with the inter-class : when a new training sample is closer to the intra-class nearest neuron than to the inter-class nearest neuron, i.e., the intra/inter class distance ratio is less than 1, then the sample does not overlap with the other classes. In this case, the new hidden neuron center ($\mu^c_{K+1}$) and width ($\sigma^c_{K+1}$) parameters are determined as

$$\mu^c_{K+1} = x^t;\quad \sigma^c_{K+1} = \kappa\|x^t - \mu^c_{nrS}\| \qquad (5.22)$$
Minimum overlapping with the inter-class : when a new training sample is close to the inter-class nearest neuron, i.e., the intra/inter class distance ratio is in the range 1 to 1.5, then the sample has minimum overlapping with the other class. In this case, the center of the new hidden neuron is shifted away from the inter-class nearest neuron and the parameters are determined as

$$\mu^c_{K+1} = x^t + \zeta\left(x^t - \mu^l_{nrI}\right);\quad \sigma^c_{K+1} = \kappa\|x^t - \mu^c_{nrS}\| \qquad (5.23)$$

where ζ is the center shift factor which determines how much the center has to be shifted from the new training sample location; it lies in the range [0.01, 0.1].

Significant overlapping with the inter-class : when a new training sample is very close to the inter-class nearest neuron compared to the intra-class nearest neuron, i.e., the intra/inter class distance ratio is more than 1.5, then the sample has significant overlapping with the other class. In this case also, the center of the new hidden neuron is shifted away from the inter-class nearest neuron as in Eq. (5.23).
The above mentioned center and width determination conditions help in minimizing the misclassification error.
The existing sequential learning algorithms initialize the output weight of a new neuron based on the error of the current sample alone; the influence of past samples is not considered in the weight initialization. This issue is dealt with in this chapter: the existing knowledge of past samples, stored in the network as neuron centers, is used to initialize the output weight of the new neuron. Although the past samples themselves are discarded after learning, the information present in them is retained in the network. The centers of the neurons provide the distribution of the past samples in feature space. These centers can therefore be used as pseudo-samples to capture the effect of the past samples when the matrices A and B are extended for the new neuron.
The new elements of the matrix A contributed by the pseudo-samples are

$$a_{K+1,p} = \sum_{i=1}^{K+1} h_{iK+1}\, h_{ip}, \quad p = 1, \cdots, K, \quad \text{where}\ \ h_{ip} = \exp\left(-\frac{\|\mu_i^l - \mu_p^l\|^2}{(\sigma_p^l)^2}\right) \qquad (5.26)$$

and the value $a_{K+1,K+1} \in \Re^{+}$ is assigned as

$$a_{K+1,K+1} = \sum_{i=1}^{K+1} h_{iK+1}\, h_{iK+1} \qquad (5.27)$$
The matrix B is extended as

$$\mathbf{B}^t_{(K+1)\times n} = \begin{bmatrix} \mathbf{B}^{t-1}_{K\times n} + (\mathbf{h}^t)^T (\mathbf{y}^t)^T \\ \mathbf{b}_{K+1} \end{bmatrix} \qquad (5.28)$$

where $\mathbf{b}_{K+1} \in \Re^{1\times n}$ is a row vector assigned as

$$b_{K+1,j} = \sum_{i=1}^{K+1} h_{iK+1}\, \tilde{y}_{ji}, \quad j = 1, \cdots, n \qquad (5.29)$$
where the pseudo-output $\tilde{y}_{ji}$ for a pseudo-sample (the center of a neuron belonging to class l) is given as

$$\tilde{y}_{ji} = \begin{cases} 1 & \text{if } l = j \\ -1 & \text{otherwise} \end{cases}, \quad j = 1, \cdots, n \qquad (5.30)$$
Finally, the output weights are estimated as

$$\begin{bmatrix} \mathbf{W}_K^t \\ \mathbf{w}_{K+1}^t \end{bmatrix} = \left(\mathbf{A}^t_{(K+1)\times(K+1)}\right)^{-1} \mathbf{B}^t_{(K+1)\times n} \qquad (5.31)$$

$$\begin{bmatrix} \mathbf{W}_K^t \\ \mathbf{w}_{K+1}^t \end{bmatrix} = \begin{bmatrix} \mathbf{A}^{t-1}_{K\times K} + (\mathbf{h}^t)^T \mathbf{h}^t & \mathbf{a}_{K+1}^T \\ \mathbf{a}_{K+1} & a_{K+1,K+1} \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{B}^{t-1}_{K\times n} + (\mathbf{h}^t)^T (\mathbf{y}^t)^T \\ \mathbf{b}_{K+1} \end{bmatrix} \qquad (5.32)$$

where $\mathbf{W}_K^t$ is the output weight matrix for the K hidden neurons, and $\mathbf{w}_{K+1}^t$ is the vector of output weights for the new hidden neuron after learning from the t-th sample.
The matrix inversion in Eq. (5.32) is carried out recursively using block matrix identities as

$$\left(\mathbf{A}^t_{(K+1)\times(K+1)}\right)^{-1} = \begin{bmatrix} \left(\mathbf{A}^t_{K\times K}\right)^{-1} & \mathbf{0} \\ \mathbf{0} & 0 \end{bmatrix} + \frac{1}{\Delta}\begin{bmatrix} -\left(\mathbf{A}^t_{K\times K}\right)^{-1}\mathbf{a}_{K+1}^T \\ 1 \end{bmatrix}\begin{bmatrix} -\mathbf{a}_{K+1}\left(\mathbf{A}^t_{K\times K}\right)^{-1} & 1 \end{bmatrix} \qquad (5.33)$$

where $\Delta = a_{K+1,K+1} - \mathbf{a}_{K+1}\left(\mathbf{A}^{t-1}_{K\times K} + (\mathbf{h}^t)^T\mathbf{h}^t\right)^{-1}\mathbf{a}_{K+1}^T$, $\mathbf{A}^t_{K\times K} = \mathbf{A}^{t-1}_{K\times K} + (\mathbf{h}^t)^T\mathbf{h}^t$, and $\left(\mathbf{A}^t_{K\times K}\right)^{-1}$ is calculated as

$$\left(\mathbf{A}^t_{K\times K}\right)^{-1} = \left(\mathbf{A}^{t-1}_{K\times K}\right)^{-1} - \frac{\left(\mathbf{A}^{t-1}_{K\times K}\right)^{-1}(\mathbf{h}^t)^T\mathbf{h}^t\left(\mathbf{A}^{t-1}_{K\times K}\right)^{-1}}{1 + \mathbf{h}^t\left(\mathbf{A}^{t-1}_{K\times K}\right)^{-1}(\mathbf{h}^t)^T} \qquad (5.34)$$
After calculating the inverse of the matrix in Eq. (5.32) using Eqs. (5.33) & (5.34), the output weights are obtained as

$$\mathbf{W}_K^t = \left[\mathbf{I}_{K\times K} + \frac{\left(\mathbf{A}^{t-1}_{K\times K}\right)^{-1}\mathbf{a}_{K+1}^T\mathbf{a}_{K+1}}{\Delta}\right]\left[\mathbf{W}_K^{t-1} + \left(\mathbf{A}^{t-1}_{K\times K}\right)^{-1}(\mathbf{h}^t)^T(\mathbf{y}^t)^T\right] - \frac{\left(\mathbf{A}^{t-1}_{K\times K}\right)^{-1}\mathbf{a}_{K+1}^T\mathbf{b}_{K+1}}{\Delta} \qquad (5.35)$$

$$\mathbf{w}_{K+1}^t = -\frac{1}{\Delta}\left[\mathbf{a}_{K+1}\left(\mathbf{W}_K^{t-1} + \left(\mathbf{A}^{t-1}_{K\times K}\right)^{-1}(\mathbf{h}^t)^T(\mathbf{y}^t)^T\right) - \mathbf{b}_{K+1}\right] \qquad (5.36)$$
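The following numpy sketch traces Eqs. (5.33)-(5.36) for one neuron addition, following the reconstructed grouping above; the inputs a_vec, a_kk and b_vec are assumed to be precomputed from the pseudo-samples via Eqs. (5.26)-(5.29), and all names are illustrative rather than the thesis's implementation:

```python
import numpy as np

def grow_output_weights(A_prev_inv, W_prev, h, y, a_vec, a_kk, b_vec):
    """Sketch of the neuron-growth weight update, Eqs. (5.33)-(5.36).
    A_prev_inv: (K, K) inverse of A^{t-1};  W_prev: (K, n) weights;
    h: (K,) hidden responses;  y: (n,) coded target;
    a_vec: (K,) Eq. (5.26);  a_kk: Eq. (5.27);  b_vec: (n,) Eq. (5.29)."""
    # Sherman-Morrison update of (A^{t-1} + h h^T)^{-1}, Eq. (5.34)
    Ah = A_prev_inv @ h
    At_inv = A_prev_inv - np.outer(Ah, Ah) / (1.0 + h @ Ah)
    # Schur complement of the bordered matrix, Eq. (5.33)
    delta = a_kk - a_vec @ (At_inv @ a_vec)
    # Shared intermediate term W^{t-1} + (A^{t-1})^{-1} h y^T
    W_mid = W_prev + np.outer(A_prev_inv @ h, y)
    Aa = A_prev_inv @ a_vec
    W_new = (W_mid + np.outer(Aa, a_vec @ W_mid) / delta
             - np.outer(Aa, b_vec) / delta)            # Eq. (5.35)
    w_new = -(a_vec @ W_mid - b_vec) / delta           # Eq. (5.36)
    return np.vstack([W_new, w_new]), At_inv
```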
• Parameters update strategy : The output weight parameters are updated when the predicted class label is the same as the actual class label and the hinge error is greater than the self-regulated update threshold β_u, which is adapted as β_u := δβ_u + (1 − δ)E^t, where δ is the slope that controls the rate of self-adaption of the parameter update threshold and is set close to one. When a sample is used to update the output weight parameters, the PBL algorithm minimizes the total error over all samples observed so far, i.e.,

$$\frac{\partial J_{1,t}(\mathbf{W}_K)}{\partial w_{pj}} = \frac{\partial J_{1,(t-1)}(\mathbf{W}_K)}{\partial w_{pj}} + \frac{\partial J_t(\mathbf{W}_K)}{\partial w_{pj}} = 0, \quad p = 1, \cdots, K;\ j = 1, \cdots, n \qquad (5.39)$$
Equating the first partial derivative to zero and re-arranging Eq. (5.39), we get

$$\left(\mathbf{A}^{t-1} + (\mathbf{h}^t)^T\mathbf{h}^t\right)\mathbf{W}_K^t - \left(\mathbf{B}^{t-1} + (\mathbf{h}^t)^T(\mathbf{y}^t)^T\right) = \mathbf{0} \qquad (5.40)$$

By substituting $\mathbf{B}^{t-1} = \mathbf{A}^{t-1}\mathbf{W}_K^{t-1}$ and $\mathbf{A}^{t-1} + (\mathbf{h}^t)^T\mathbf{h}^t = \mathbf{A}^t$, and adding/subtracting the term $(\mathbf{h}^t)^T\mathbf{h}^t\mathbf{W}_K^{t-1}$ on both sides, Eq. (5.40) reduces to

$$\mathbf{W}_K^t = \left(\mathbf{A}^t\right)^{-1}\left[\mathbf{A}^t\mathbf{W}_K^{t-1} + (\mathbf{h}^t)^T\left((\mathbf{y}^t)^T - \mathbf{h}^t\mathbf{W}_K^{t-1}\right)\right] \qquad (5.41)$$

$$\mathbf{W}_K^t = \mathbf{W}_K^{t-1} + \left(\mathbf{A}^t\right)^{-1}(\mathbf{h}^t)^T\mathbf{e}^t \qquad (5.42)$$

where $\mathbf{e}^t$ is the hinge loss error for the t-th sample obtained from Eq. (5.2).
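A short numpy sketch of this recursive update, combining the Sherman-Morrison step of Eq. (5.34) with the weight correction of Eq. (5.42):

```python
import numpy as np

def pbl_parameter_update(A_inv, W, h, e):
    """Recursive PBL parameter update, Eqs. (5.34) and (5.42)."""
    Ah = A_inv @ h
    A_inv = A_inv - np.outer(Ah, Ah) / (1.0 + h @ Ah)   # (A^t)^{-1}
    W = W + np.outer(A_inv @ h, e)                      # Eq. (5.42)
    return A_inv, W
```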
• Sample reserve strategy : This strategy is similar to the one described in chapter 4. If the new training sample does not satisfy either the deletion or the neuron growth or the cognitive component parameters update criterion, then the current sample is pushed to the rear of the training sequence. Since PBL-McRBFN modifies the strategies based on the current sample knowledge, these samples may be used in the learning process at a later stage. Ideally, the training process stops when no further sample is available in the data stream; the terminating criteria are the same as those described in section 4.4.1.
The complete PBL-McRBFN learning algorithm is summarized in Pseudo-code 2.
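The loop structure mirrors the EKF-McRBFN sketch given earlier; the main difference is the growth test, sketched here against the reconstructed criterion of Eq. (5.15) (argument names are hypothetical):

```python
def pbl_growth_check(c_hat, c, E, psi_c, beta_c, beta_a):
    """Neuron growth test of Eq. (5.15): misclassification OR a large
    hinge error, combined with novelty (class-wise significance)."""
    return (c_hat != c or E >= beta_a) and psi_c <= beta_c
```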
In summary, the sample delete strategy addresses the what-to-learn by deleting insignificant samples from the training data set, the neuron growth strategy and the parameters update strategy address the how-to-learn efficiently, by which the cognitive component learns from the samples, and the sample reserve strategy addresses the when-to-learn by presenting the samples in the learning process at an appropriate time. We now summarize the similarities and differences between the EKF-McRBFN and PBL-McRBFN algorithms:
• The sample deletion strategy in both algorithms is the same. The sample deletion strategy helps the network to avoid over-training and saves computational effort. The sample deletion strategy addresses the what-to-learn; thus, what-to-learn is identical in both algorithms.

• The sample reserve strategy in both algorithms is the same. The sample reserve strategy depends on the self-adaptive meta-cognitive thresholds. The meta-cognitive addition and update thresholds are also self-adapted in the same manner in both algorithms.

• In EKF-McRBFN, a new hidden neuron is added when the predicted class label of the sample is different from the actual class label and the maximum hinge error is greater than the addition threshold, in addition to the novelty condition. In PBL-McRBFN, a new hidden neuron is added when the predicted class label of the sample is different from the actual class label or the maximum hinge error is greater than the addition threshold, in addition to the novelty condition. Hence, the PBL-McRBFN algorithm may add slightly more hidden neurons.
• In the EKF-McRBFN algorithm, the new hidden neuron output weights are estimated based on the instantaneous prediction error (e^t), while in the PBL-McRBFN algorithm the new hidden neuron output weights are estimated using the existing knowledge of the past trained samples stored in the network. Thus, the knowledge gained from past samples is exploited in the PBL-McRBFN algorithm.

• In EKF-McRBFN, the existing neuron parameters are updated using the computationally intensive extended Kalman filter algorithm. In PBL-McRBFN, the existing neuron output weights are updated using the projection based learning algorithm.

• The neuron growth and parameter update strategies in the two algorithms are different; thus, the how-to-learn differs. The PBL-McRBFN algorithm uses the past knowledge of trained samples in how-to-learn, and thus its classification performance is expected to be better than that of the EKF-McRBFN algorithm.
5.5 Summary

In this chapter, we have presented a Projection Based Learning (PBL) algorithm for the meta-cognitive radial basis function network classifier. The projection based learning accurately estimates the output weights by direct minimization of the hinge loss error. Knowledge gained from the past samples is used in initializing the new hidden neuron parameters. In the next chapter, the proposed PBL-McRBFN and EKF-McRBFN classifiers are evaluated using a number of benchmark multi-category and binary classification problems.
Chapter 6

Performance Evaluation of EKF-McRBFN and PBL-McRBFN Classifiers
In the previous chapters 4 and 5, we proposed two sequential learning algorithms for the meta-cognitive radial basis function neural network, namely EKF-McRBFN and PBL-McRBFN; the latter is computationally fast and efficient. In this chapter, we evaluate the performance of the proposed EKF-McRBFN and PBL-McRBFN against the best performing sequential learning algorithm reported in the literature (SRAN) [9], batch ELM [18] and the standard Support Vector Machine (SVM) [37] on real-world benchmark binary and multi-category classification problems. The study covers data sets with small and large numbers of samples, low and high dimensional features, and balanced and unbalanced class distributions, in both binary and multi-category classification problems. The detailed specifications of the 5 binary and 10 multi-category classification data sets are given in Table 6.1. Note that the data sets are taken from the UCI machine learning repository, except for the satellite imaging [56], global cancer map using micro-array gene expression [62] and acoustic emission [63] data sets.

Table 6.1: Specification of the benchmark binary and multi-category data sets, listing for each data set the number of features, the number of classes, the number of training and testing samples, and the training and testing imbalance factors (I.F).

The sample imbalance in the training and testing sets is measured using the imbalance factor

$$I.F = 1 - \frac{n}{N}\min_{j=1\cdots n} N_j \qquad (6.1)$$

where $N_j$ is the total number of training samples in class j and $N = \sum_{j=1}^{n} N_j$.
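For example (a minimal sketch of Eq. (6.1)):

```python
import numpy as np

def imbalance_factor(class_counts):
    """I.F of Eq. (6.1): 1 - (n / N) * min_j N_j for class sizes N_j."""
    c = np.asarray(class_counts)
    return 1.0 - c.size * c.min() / c.sum()

# A perfectly balanced 3-class set gives I.F = 0:
assert imbalance_factor([50, 50, 50]) == 0.0
```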
For efficient comparison, we present the data sets under the following categories:

• Binary class data sets : All the considered binary class data sets have high sample imbalance.

Low dimensional : Liver Disorders (LD), Pima Indian diabetes (PIMA) and Breast Cancer (BC) have low dimensional features with a relatively smaller number of training samples.

High dimensional : Heart disease (HEART) and Ionosphere (ION) data sets have a smaller number of training samples with high dimensional features.
• Multi-category data sets : The considered multi-category data sets are grouped into three categories:

Well balanced : Iris classification (IRIS), Image segmentation (IS) and Wine determination (WINE) data sets have an equal number of training samples per class. These data sets have varying numbers of features and training/testing samples.

Unbalanced : Vehicle Classification (VC), Acoustic Emission (AE) and Glass Identification (GI) data sets have lower dimensional features and highly imbalanced training samples. The Global Cancer Mapping using micro-array gene expression (GCM) data set has high dimensional features with high sample imbalance.

Large : Letter recognition (LETTER), Satellite Imaging (SI) and Landsat Satellite (LAND) data sets have a relatively large number of training and testing samples.
We have conducted the simulations for the PBL-McRBFN, EKF-McRBFN, SRAN, ELM and SVM classifiers on all the data sets in MATLAB 2011 on a desktop PC with an Intel Core 2 Duo, 2.66 GHz CPU and 3 GB RAM. The tuneable parameters of each algorithm are selected by cross-validation on the training data sets. For the ELM classifier [18], the number of hidden neurons is chosen by cross-validation on the training data. The simulations for batch SVM with Gaussian kernels are carried out using the LIBSVM package in C [65]. For the SVM classifier, the parameters (c, γ) are optimized using a grid search technique. The McRBFN parameters (including the overlap constant κ) are also optimized using a grid search technique by cross-validating the results on the training samples. Simulations on the large data sets LETTER, SI and LAND are conducted on a high-performance computer with an Intel Xeon, 3.16 GHz CPU and 16 GB RAM.
The class-level and global performance measures, together with a statistical significance test on the performance of multiple classifiers on multiple data sets, are used for the performance comparison. The confusion matrix Q is used to obtain the class-level performance measure, the percentage classification accuracy of class j:

$$\eta_j = \frac{q_{jj}}{N_j} \times 100\% \qquad (6.2)$$

where $q_{jj}$ is the total number of correctly classified samples in class j. The global measures used in the evaluation are the average per-class classification accuracy ($\eta_a$), the over-all classification accuracy ($\eta_o$) and the geometric classification accuracy ($\eta_g$), defined as:

$$\eta_a = \frac{1}{n}\sum_{j=1}^{n}\eta_j \qquad (6.3)$$

$$\eta_o = \frac{\sum_{j=1}^{n} q_{jj}}{N} \times 100\% \qquad (6.4)$$

$$\eta_g = \sqrt[n]{\eta_1\,\eta_2\cdots\eta_n} \qquad (6.5)$$
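A small numpy sketch of Eqs. (6.2)-(6.5) from a confusion matrix, assuming rows of Q index the true classes:

```python
import numpy as np

def accuracies(Q):
    """Eqs. (6.2)-(6.5) from an n x n confusion matrix Q (rows = true
    classes).  Returns per-class, average, over-all and geometric
    accuracies; assumes every class accuracy is non-zero for eta_g."""
    eta_j = 100.0 * np.diag(Q) / Q.sum(axis=1)        # Eq. (6.2)
    eta_a = eta_j.mean()                              # Eq. (6.3)
    eta_o = 100.0 * np.trace(Q) / Q.sum()             # Eq. (6.4)
    eta_g = np.exp(np.log(eta_j).mean())              # Eq. (6.5)
    return eta_j, eta_a, eta_o, eta_g
```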
Since the developed classifier is compared with multiple classifiers over multiple data sets, the Friedman test followed by the Bonferroni-Dunn test is used to establish the statistical significance of the results [66]. The Friedman test is used to compare multiple classifiers (U) over multiple data sets (V). Let $r_i^j$ be the rank of the j-th classifier on the i-th data set. The Friedman test compares the average ranks of the classifiers, $R_j = \frac{1}{V}\sum_i r_i^j$. Under the null-hypothesis, which states that all the classifiers are equivalent and so their ranks $R_j$ should be equal, the Friedman statistic is

$$\chi_F^2 = \frac{12V}{U(U+1)}\left[\sum_j R_j^2 - \frac{U(U+1)^2}{4}\right] \qquad (6.6)$$

Since Friedman's $\chi_F^2$ is undesirably conservative, the modified (Iman and Davenport) statistic
$$F_F = \frac{(V-1)\chi_F^2}{V(U-1) - \chi_F^2} \qquad (6.7)$$

which follows the F-distribution with U − 1 (df1) and (U − 1)(V − 1) (df2) degrees of freedom, is used in this thesis. The F-distribution is defined as the probability distribution of the ratio of two independent χ² distributions, each divided by its respective degrees of freedom. The aim of the statistical test is to show that the performance of the PBL-McRBFN classifier is substantially different from the other classifiers with a confidence level of 1 − α. If the calculated $F_F > F_{\alpha/2,(U-1),(U-1)(V-1)}$ or $F_F < F_{1-\alpha/2,(U-1),(U-1)(V-1)}$, then the null-hypothesis is rejected. The statistical tables for the critical values can be found in [68].
The Bonferroni-Dunn test [69] is a post-hoc test that can be performed after rejection of the null-hypothesis. It is used to compare the PBL-McRBFN classifier against all the other classifiers. This test assumes that the performances of two classifiers are significantly different if the corresponding average ranks differ by at least the critical difference (CD)

$$CD = q_\alpha\sqrt{\frac{U(U+1)}{6V}} \qquad (6.8)$$

where the critical values $q_\alpha$ are based on the studentized range statistic divided by $\sqrt{2}$, as given in [66].
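Both test statistics and the critical difference can be computed directly from a rank table; a minimal numpy sketch (with q_alpha taken from Table 6.6):

```python
import numpy as np

def friedman_and_cd(ranks, q_alpha):
    """Friedman chi-square (Eq. 6.6), Iman-Davenport statistic (Eq. 6.7)
    and Bonferroni-Dunn critical difference (Eq. 6.8) from a (V, U) rank
    table: V data sets, U classifiers."""
    V, U = ranks.shape
    R = ranks.mean(axis=0)                                   # average ranks
    chi2 = 12.0 * V / (U * (U + 1)) * (np.sum(R ** 2) - U * (U + 1) ** 2 / 4)
    FF = (V - 1) * chi2 / (V * (U - 1) - chi2)               # Eq. (6.7)
    CD = q_alpha * np.sqrt(U * (U + 1) / (6.0 * V))          # Eq. (6.8)
    return chi2, FF, CD
```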
The performance comparison results are organized as follows.

Binary class data sets : The performance measures and the number of neurons and samples used for the PBL-McRBFN, EKF-McRBFN, SRAN, ELM and SVM classifiers on all the 5 binary class data sets are reported in Table 6.2. From the performance comparison results in Table 6.2, one can see that, in the case of the low dimensional LD and PIMA data sets, the proposed PBL-McRBFN uses fewer samples for training and achieves better generalization performance than EKF-McRBFN, a 4-7% improvement over SRAN, and a 1-4% improvement over ELM and SVM with a smaller number of neurons. In the case of the simple BC data set, PBL-McRBFN uses fewer samples for training, achieves an improvement over SRAN, ELM and SVM, and matches the performance of EKF-McRBFN. In the case of the high dimensional HEART and ION data sets, PBL-McRBFN uses fewer samples for training and achieves better generalization performance: a 1-2% improvement over EKF-McRBFN, a 4-5% improvement over SRAN, and a 6-9% improvement over SVM and ELM. The meta-cognitive learning in PBL-McRBFN and EKF-McRBFN helps in capturing the knowledge accurately in the case of high sample imbalance problems. We can also notice that the PBL-McRBFN takes less computational time than EKF-McRBFN.
Multi-category data sets : The performance measures and the number of neurons and samples used for the PBL-McRBFN, EKF-McRBFN, SRAN, ELM and SVM classifiers on all the 10 multi-category data sets are reported in Table 6.3. From Table 6.3, one can see that, in the case of the well balanced IS, IRIS and WINE data sets, PBL-McRBFN uses only 42-50% of the training samples to achieve better generalization performance than EKF-McRBFN, SRAN, SVM and ELM with a smaller number of neurons. The sample deletion strategy removes redundant samples from the training set and thereby improves the generalization performance. For the highly unbalanced data sets, the proposed PBL-McRBFN is able to achieve significantly better performance than the other classifiers. In the case of the VC and GI data sets, PBL-McRBFN uses fewer samples and achieves better generalization performance: approximately a 1-5% improvement over EKF-McRBFN and ELM, and a 3-12% improvement over SRAN and SVM. In the case of the low dimensional AE data set, PBL-McRBFN achieves slightly better generalization performance than SVM and ELM. In the case of the high dimensional GCM data set, PBL-McRBFN achieves significantly better generalization performance than SRAN and EKF-McRBFN. In the case of the large LETTER, SI and LAND data sets, the generalization performance of PBL-McRBFN is better than that of ELM and SVM by approximately 2%, 2-3% and 2-4% respectively, using a smaller number of training samples and neurons. We can notice that the PBL-McRBFN takes less computational time than EKF-McRBFN on all data sets. Due to the computationally complex EKF parameter update, EKF-McRBFN and SRAN experience memory problems on large problems like LETTER, SI and LAND; hence, the results for EKF-McRBFN and SRAN on these problems are not reported. From Tables 6.2 and 6.3, we can say that the proposed PBL-McRBFN improves the generalization performance over the other classifiers considered.
To compare the performance of PBL-McRBFN with the EKF-McRBFN, SRAN, ELM and SVM classifiers on the various benchmark data sets, we employ a non-parametric Friedman test followed by the Bonferroni-Dunn test, as described in [66]. The Friedman test compares whether the mean of an individual experimental condition differs significantly from the aggregate mean across all conditions. If the measured F-statistic is greater than the critical F-statistic at the 95% confidence level, then one rejects the equality-of-mean hypothesis (i.e., the hypothesis that the classifiers used in our study perform similarly on different data sets). If the Friedman test rejects the equality hypothesis, then pair-wise post-hoc tests should be conducted to determine which mean is different from the others. We have ranked the 5 different classifiers based on the ηo and ηa testing efficiencies on 12 different data sets from Table 6.1. The ranks based on the overall testing efficiency for each data set are provided in Table 6.4. The Friedman statistic (χ²_F as in Eq. (6.6)) is 25.49 and the modified (Iman and Davenport) statistic (F_F as in Eq. (6.7)) is 13.78. With 5 classifiers and 12 data sets, the modified statistic is distributed according to the F-distribution with 4 and 44 degrees of freedom. The critical value for rejecting the null hypothesis at the 95% confidence level is 3.09 (Table 6.5).
Table 6.4: Ranks of the classifiers based on the overall testing efficiency (η_o)

Data set   PBL-McRBFN   EKF-McRBFN   SRAN   ELM   SVM
HEART      1            2            3      4     5
LD         2            1            5      3     4
PIMA       2            1            3      5     4
BC         1.5          1.5          3      5     4
ION        1            2            4      5     3
IS         1            2            3      5     4
IRIS       1            2            4      4     4
VC         1            2            4      3     5
GI         3            2            1      4     5
GCM        1            4            2      4     4
Table 6.5: Two-tailed critical values (F-distribution) for the Friedman test at the 95% confidence level

df1      1      2      3      4      5      6      7      8      9      10
df2=44   5.385  4.016  3.429  3.093  2.871  2.711  2.591  2.496  2.419  2.355
Since the modified statistic is greater than the critical value (13.78 >> 3.09), we can reject the null hypothesis at a confidence level of 95%. Hence, we can say that the performance of all 5 classifiers is different on these 12 data sets. Next, the Bonferroni-Dunn test is conducted for a pair-wise comparison of the classifiers. Here, the proposed PBL-McRBFN classifier is used as a control.
Table 6.6: Critical values for the Bonferroni-Dunn test at the 95% confidence level

# Classifiers   2      3      4      5      6      7      8      9      10
q_0.05          1.960  2.241  2.394  2.498  2.576  2.638  2.690  2.724  2.773
The critical difference computed using Eq. (6.8) is 1.61 at the 95% confidence level, and the reference q_α values for different numbers of classifiers are provided in Table 6.6. From Table 6.4, the average ranks for the five classifiers are PBL-McRBFN: 1.50, EKF-McRBFN: 2.00, SRAN: 3.29, ELM: 4.00, and SVM: 4.20. The differences in average rank between the proposed PBL-McRBFN classifier and the other four classifiers are PBL-McRBFN & EKF-McRBFN: 0.50, PBL-McRBFN & SRAN: 1.79, PBL-McRBFN & ELM: 2.50, and PBL-McRBFN & SVM: 2.70. Note that the differences in average rank for the PBL-McRBFN & SRAN, PBL-McRBFN & ELM and PBL-McRBFN & SVM pairs are greater than the critical difference (CD) at the 95% confidence level, i.e., 1.79 > 1.61, 2.50 > 1.61 and 2.70 > 1.61. The difference in average rank for the PBL-McRBFN & EKF-McRBFN pair is less than the CD at the 95% confidence level, i.e., 0.50 < 1.61. Hence, we can say that PBL-McRBFN performs slightly better than the EKF-McRBFN classifier and significantly better than the SRAN, ELM and SVM classifiers with a confidence of 95%, based on the overall testing efficiency. The ranks based on the average testing efficiency for each data set are provided in Table 6.7.
The Friedman statistic (χ²_F as in Eq. (6.6)) is 27.69 and the modified statistic (F_F as in Eq. (6.7)) is 16.99. Since the modified statistic is greater than the critical value (16.99 >> 3.09), we can reject the null hypothesis at a confidence level of 95%. Hence, we can say that the performance of all 5 classifiers is different on these 12 data sets. From Table 6.7, the average ranks for the five classifiers are PBL-McRBFN: 1.16, EKF-McRBFN: 2.25, SRAN: 3.87, ELM: 3.58, and SVM: 4.12. The differences in average rank between the proposed PBL-McRBFN classifier and the other four classifiers are PBL-McRBFN & EKF-McRBFN: 1.09, PBL-McRBFN & SRAN: 2.71, PBL-McRBFN & ELM: 2.42, and PBL-McRBFN & SVM: 2.96. Note that the differences in average rank for the PBL-McRBFN & SRAN, PBL-McRBFN & ELM and PBL-McRBFN & SVM pairs are greater than the critical difference (CD) at the 95% confidence level, i.e., 2.71 > 1.61, 2.42 > 1.61 and 2.96 > 1.61.
Table 6.7: Ranks of the classifiers based on the average testing efficiency (η_a)

Data set   PBL-McRBFN   EKF-McRBFN   SRAN   ELM   SVM
HEART      1            2            3      4     5
LD         1            2            5      3     4
PIMA       1            2            5      4     3
BC         1.5          1.5          3      5     4
ION        1            2            3      5     4
IS         1            2            3      5     4
IRIS       1            2            4      4     4
VC         1            2            4      3     5
GI         1            3            4      2     5
GCM        1            4            5      2     3
The difference in average rank for the PBL-McRBFN & EKF-McRBFN pair is less than the CD at the 95% confidence level, i.e., 1.09 < 1.61. Hence, we can say that PBL-McRBFN performs slightly better than the EKF-McRBFN classifier and significantly better than the SRAN, ELM and SVM classifiers with a confidence of 95%, based on the average testing efficiency. Next, a 10 random trial study is conducted for the PBL-McRBFN, EKF-McRBFN, ELM and SVM classifiers on 12 different data sets from Table 6.1, excluding the LETTER, SI and LAND data sets. For this study, 10 random trial data sets are generated by maintaining the imbalance factor for the 12 data sets.
Binary class data sets : The overall (η_o) and geometric (η_g) testing efficiencies, the F-score, and the number of neurons and samples used for the PBL-McRBFN, EKF-McRBFN, ELM and SVM classifiers on all the 5 binary class data sets are reported in Table 6.8. From the performance comparison results in Table 6.8, one can see that
Table 6.8: PBL-McRBFN, EKF-McRBFN, ELM and SVM classifiers: 10 random trial results comparison on the binary class data sets

Data set  Classifier    #Neurons        Samples used (%)  Testing η_o     Testing η_g     F-score
                        Mean   Dev      Mean   Dev        Mean   Dev      Mean   Dev      Mean  Dev
HEART     SVM           44     7.39     100    0          77.3   2.69     77.35  2.56     0.76  0.02
          ELM           46.5   2.41     100    0          73.2   2.79     73.18  2.93     0.71  0.03
          EKF-McRBFN    20.2   3.96     67.5   10.87      78.25  3.29     78.01  3.18     0.76  0.02
          PBL-McRBFN    28.2   2.39     81.2   12.2       81.7   1.13     81.37  1.02     0.79  0.01
LD        SVM           157.5  4.72     100    0          69.21  2.1      62.91  4.17     0.75  0.02
          ELM           127    15.67    100    0          64.55  3.8      63.52  4.04     0.68  0.04
          EKF-McRBFN    64.9   8.65     53.2   8.83       67.79  1.86     64.46  4.15     0.73  0.01
          PBL-McRBFN    78.1   8.55     65.1   8.22       70.82  1.08     69.87  1.57     0.74  0.01
PIMA      SVM           252.7  42.28    100    0          76.76  1.45     66.78  6.58     0.57  0.07
          ELM           172    25.73    100    0          70.86  1.44     65.50  2.67     0.53  0.03
          EKF-McRBFN    79.8   6.66     46.1   7.90       75.67  1.67     73.56  1.35     0.63  0.01
          PBL-McRBFN    100.8  8.67     46.4   7.39       74.45  2.12     74.41  1.00     0.63  0.01
BC        SVM           27.7   3.6      100    0          96.55  0.57     96.41  0.67     0.94  0.01
          ELM           37.9   1.34     100    0          96.97  0.47     96.89  0.87     0.95  0.01
          EKF-McRBFN    9.4    3.47     13.4   4.99       97.72  0.52     98.12  0.35     0.96  0.0
          PBL-McRBFN    12.3   4.02     35.8   11.61      97.77  0.60     97.95  0.37     0.96  0.01
ION       SVM           70.9   10.27    100    0          91.24  1.11     90.10  2.69     0.87  0.02
          ELM           46     1.76     100    0          80.26  2.3      74.63  2.59     0.69  0.03
          EKF-McRBFN    18.4   6.16     51.5   11.08      90.63  3.31     90.47  2.92     0.87  0.04
          PBL-McRBFN    20.6   3.68     47.5   11.14      93.90  1.31     93.74  1.78     0.91  0.01
in the case of the low dimensional LD and PIMA data sets, the proposed PBL-McRBFN uses fewer samples for training and achieves significantly better generalization performance, approximately a 4-9% improvement over ELM and SVM with a smaller number of neurons, and performance comparable to EKF-McRBFN with a slightly larger number of neurons. In the case of the simple BC data set, PBL-McRBFN and EKF-McRBFN perform similarly, use fewer samples for training, and achieve an improvement over ELM and SVM with a smaller number of neurons. In the case of the high dimensional HEART and ION data sets, PBL-McRBFN uses fewer samples for training and achieves better generalization performance: a 3-4% improvement over SVM, an 18-19% improvement over ELM, and an improvement over EKF-McRBFN.
Multi-category data sets : The overall (η_o) and geometric (η_g) testing efficiencies, and the number of neurons and samples used for the PBL-McRBFN, EKF-McRBFN, ELM and SVM classifiers on all the 10 multi-category data sets are reported in Table 6.9. From Table 6.9, we can see that PBL-McRBFN performs similarly to EKF-McRBFN and significantly better than ELM and SVM on all the 10 multi-category data sets. In the case of the well balanced IS, IRIS and WINE data sets, PBL-McRBFN and EKF-McRBFN use only 45-55% of the training samples to achieve 2-3% better generalization performance than SVM and ELM with a smaller number of neurons. The meta-cognitive sample deletion criterion helps in removing redundant samples from the training set and thereby improves the generalization performance. For the highly unbalanced data sets, one can see that the proposed PBL-McRBFN is able to achieve significantly better performance than the other classifiers. In the case of the VC and GI data sets, PBL-McRBFN uses fewer samples and achieves better generalization performance, with an improvement over SVM and a 3-6% improvement over ELM. In the case of the low dimensional AE data set, PBL-McRBFN achieves a generalization performance improvement of 1% over SVM and ELM. In the case of the high dimensional GCM data set, with a smaller number of neurons and using fewer samples, PBL-McRBFN achieves significantly better generalization performance. From Tables 6.8 and 6.9, we can see the better generalization performance of PBL-McRBFN. To statistically compare the performance of PBL-McRBFN with the EKF-McRBFN, ELM and SVM classifiers on the various benchmark data sets, we employ a one-way repeated measures Analysis of Variance (ANOVA) followed by a pair-wise comparison using the post-hoc Dunnett test [70]. The ANOVA compares whether the mean of an individual experimental condition differs significantly from the aggregate mean across all conditions. If the measured F-score is greater than the critical F-statistic at the 95% confidence level, then one rejects the equality-of-mean hypothesis (i.e., the hypothesis that the classifiers used in our study perform similarly on different data sets).
Table 6.9: PBL-McRBFN, EKF-McRBFN, ELM and SVM classifiers: 10 random trial results comparison on the multi-category data sets

Data set  Classifier    #Neurons        Samples used (%)  Testing η_o     Testing η_g
                        Mean   Dev      Mean    Dev       Mean   Dev      Mean   Dev
IS        SVM           107.3  10.91    100     0         90.92  0.43     90.13  0.46
          ELM           55.59  10.91    100     0         90.13  0.72     89.40  0.83
          EKF-McRBFN    36.8   5.63     54.1    8.60      91.64  0.95     91.16  1.08
          PBL-McRBFN    48.9   4.67     44.3    5.60      91.50  1.31     91.07  1.39
IRIS      SVM           16.6   0.93     100     0         95.99  1.17     95.90  1.22
          ELM           16     4.59     100     0         96.38  1.08     96.29  1.16
          EKF-McRBFN    5.2    1.93     44.6    16.51     96.95  1.17     96.86  1.23
          PBL-McRBFN    7.8    1.61     47.7    20.55     97.14  1.18     97.08  1.21
WINE      SVM           37     3.02     100     0         96.02  1.2      96.77  0.93
          ELM           15.2   2.09     100     0         94.83  1.35     95.58  1.08
          EKF-McRBFN    5.5    2.36     50      14.68     98.64  0.71     98.86  0.66
          PBL-McRBFN    10.6   1.77     53.6    14.43     98.31  1.16     98.62  1.17
VC        SVM           246.8  58.2     100     0         73.2   2.01     69.4   3.61
          ELM           150    25.3     100     0         77.09  3.44     75.16  3.83
          EKF-McRBFN    140    10.98    78.7    7.03      78.96  1.32     76.34  2.85
          PBL-McRBFN    210.8  7.28     68.39   11.65     79.62  1.28     78.33  1.55
AE        SVM           21.7   4.48     100     0         97.66  1.02     97.62  1.43
          ELM           14.5   3.68     100     0         97.88  0.87     97.34  1.11
          EKF-McRBFN    5.7    1.70     34.5    11.35     98.90  0.51     98.52  0.72
          PBL-McRBFN    7.2    2.69     30.4    9.56      98.31  1.31     98.13  1.25
GCM       SVM           120.6  5.78     100     0         76.53  2.80     63.91  5.59
          ELM           114.4  10.2     100     0         91.80  2.83     63.39  4.71
          EKF-McRBFN    71     1.58     73.6    4.13      71.73  0.03     73.17  7.79
          PBL-McRBFN    68.2   2.82     70.0    5.02      83.04  4.55     76.78  5.42
GI        SVM           176.6  18.9     100     0         77.23  5.01     85.83  3.62
          ELM           86     9.17     100     0         80.09  3.05     88.81  2.28
          EKF-McRBFN    73.2   9.17     41.8    3.45      80.00  4.62     86.67  4.18
          PBL-McRBFN    82.4   9.17     40.0    6.55      89.71  1.94     94.40  1.08
Table 6.10: One-tailed critical values (F-distribution) for the ANOVA test at the 95% confidence level

df1      1      2      3      4      5      6      7      8      9      10
df2=33   4.139  3.284  2.891  2.658  2.502  2.389  2.302  2.234  2.178  2.132

Table 6.11: Two-tailed critical values (t-distribution) for the Dunnett test at the 95% confidence level

df   t_0.1   t_0.05   t_0.025   t_0.01   t_0.005
33   1.692   2.034    2.348     2.378    3.008
If the one-way repeated measures ANOVA test rejects the equality hypothesis, then pair-wise post-hoc tests should be conducted to determine which mean is different from the others. In our study, we have used 4 different classifiers and 12 different data sets. For a given data set and a classifier, 10 random trials are conducted to measure the mean and variance of the testing performance. The results of the four classifiers on the 12 data sets are organized as four groups, and ANOVA monitors three kinds of variations in the data, viz., the within-group variation, the between-group variation and the total variation. The F-score obtained using the repeated measures one-way ANOVA test is 7.42, which is greater than the F-statistic at the 95% confidence level (F_{3,33,0.05} is 2.89), i.e., 7.42 > 2.89; the corresponding reference F-distribution table is given in Table 6.10. Hence, one can reject the equality-of-mean hypothesis. Next, the post-hoc Dunnett test is conducted; here, the proposed PBL-McRBFN classifier is used as a control. The t-observed values obtained for the individual pairs are PBL-McRBFN & EKF-McRBFN: 1.93, PBL-McRBFN & ELM: 4.31, and PBL-McRBFN & SVM: 3.63. Note that the t-observed values for the PBL-McRBFN & ELM and PBL-McRBFN & SVM pairs are greater than the t-critical value (t_{33,0.025} is 2.348 at the 95% confidence level; the corresponding reference t-distribution table is given in Table 6.11), while the t-observed value for the PBL-McRBFN & EKF-McRBFN pair is less than the t-critical value. Hence, we can say that PBL-McRBFN performs slightly better than the EKF-McRBFN classifier and significantly better than the ELM and SVM classifiers with a confidence of 95%, based on the 10 random trials.
We now illustrate the working of the four meta-cognitive learning strategies (delete, growth, update and reserve) using PBL-McRBFN for the Image segmentation (IS) data set. IS is a seven-class multi-category data set containing 210 training samples and 2100 testing samples. The tuneable parameters are chosen as follows: deletion threshold (β_d) = 0.9, knowledge threshold (β_c) = 0.5117, initial addition threshold (β_a) = 1.3, initial update threshold (β_u) = 0.6769, neuron overlap constant (κ) = 0.655 and hidden neuron center shifting factor (ζ) = 0.0111. We shall now consider how the different strategies of meta-cognitive learning in PBL-McRBFN operate:
• Sample delete strategy : When the predicted class label of the new training sample is the same as the actual class label and the confidence level (estimated posterior probability) is greater than the expected value, then the new training sample does not provide additional information to the classifier and can be deleted from the training sequence without being used in the learning process. We exemplify the working of this strategy with Fig. 6.1, which gives a snapshot of the prediction confidence of the PBL-McRBFN classifier for the samples in the range 100-150, along with the deletion threshold (β_d). Those samples with a confidence level greater than β_d are deleted without participating in the PBL-McRBFN learning process. In Fig. 6.1, the confidence of the PBL-McRBFN classifier for the sample at instant 130 is higher than the deletion threshold (β_d = 0.9), and the sample is thus deleted without participating in the learning process. By deleting correctly classified samples with negligible novel information, the sample deletion strategy helps the network avoid over-training and saves computational effort.
Figure 6.1: Prediction confidence of the PBL-McRBFN classifier for the training sample instants 100-150, with the deletion threshold (β_d) and the deleted samples marked.

• Neuron growth strategy : When a new training sample contains novel information and the predicted class label is different from the actual class label, then a new hidden neuron is added to the network. The class-wise significance for these samples and the knowledge threshold (β_c) are given in Fig.
6.2(a), whereas Fig. 6.2(b) gives the hinge loss error and the self-regulated addition and deletion thresholds for these 50-100 training samples. A sample contains novel information if the class-wise significance is lower than the knowledge threshold (β_c = 0.5117) and the hinge error is greater than the self-regulatory addition threshold (β_a = 1.3). It can be noticed from Figs. 6.2(a)&(b) that, even though a sample is novel, a new hidden neuron is not added to the network if the hinge error criterion is not satisfied. Consider the sample at instant 61: since the class-wise significance is lower than the knowledge threshold and the instantaneous hinge error is higher than the addition threshold, a new hidden neuron is added to the network. The neuron growth history and the self-regulated thresholds during the learning process for the IS data set are given in Figs. 6.3(a)&(b). The neuron history is plotted against only the samples used in training; PBL-McRBFN uses only 89 samples out of the 210 training samples. One can notice from Fig. 6.3(b) that the self-regulatory addition threshold adapts its value based on the predicted maximum hinge error.
Figure 6.2: Class-wise significance (a), and instantaneous hinge error with self-regulatory thresholds (b) in PBL-McRBFN for the Image segmentation data set.
Figure 6.3: History of the number of hidden neurons (a), the self-regulated addition threshold (b), and the self-regulated update threshold (c) in PBL-McRBFN for the Image segmentation data set.
• Parameters update strategy : A correctly classified sample (i.e., the predicted class label is the same as the actual class label) is used to update the network parameters in PBL-McRBFN for the IS data set when the hinge error is greater than the self-regulatory update threshold (β_u = 0.6769). In Fig. 6.2(b), the sample at instant 79 does not contain significant novel information, but its hinge error is greater than the update threshold, and hence it is used for the parameter update. The self-regulatory parameter update threshold adapts its value based on the predicted maximum hinge error, as shown in Fig. 6.3(c).
• Sample reserve strategy : The samples which do not satisfy either the deletion or the neuron growth or the parameters update criterion are reserved by the network to be considered for learning later. These samples, by virtue of the self-regulatory nature of the addition and parameter update thresholds, may be used later in the learning process. There are a few such reserve samples in the PBL-McRBFN training process for the IS data set. These samples are pushed to the rear of the data stream to be learnt later, and may be used in the learning process at a later stage to fine tune the network.
Next, we analyze the significance of the meta-cognitive strategies in the PBL algorithm using the Quantized Kernel Least Mean Square (QKLMS) [49] algorithm as a baseline. QKLMS is a recently developed online kernel adaptive filtering algorithm in which vector quantization is applied to compress the input (or feature) space of the kernel adaptive filters so as to control the growth of the RBF structure. For this purpose, we have conducted three different experiments:

(1) Effect of how-to-learn : The QKLMS algorithm is trained using the original samples in their original order. This experiment highlights the advantage of the learning strategies in PBL-McRBFN.

(2) Effect of when-to-learn : The QKLMS algorithm is trained using the samples selected by PBL-McRBFN, sequentially presented in the same order in which they are used by PBL-McRBFN. We call this algorithm QKLMS*. This experiment emphasizes the advantage of the self-adaptive nature of the thresholds in deciding when a sample is learnt.

(3) Effect of what-to-learn : The QKLMS algorithm is trained using the samples selected by PBL-McRBFN followed by the deleted samples. We call this algorithm QKLMS**. This experiment emphasizes the advantage of the sample delete strategy in PBL-McRBFN.
The overall and average efficiencies of training and testing and the number of hidden neurons are given in Table 6.12 for all the nine benchmark data sets. From the results, the following observations can be made:

(1) The PBL-McRBFN classifier's training and testing performance is better than QKLMS on all the nine data sets. In the case of the high dimensional binary class ION data set, PBL-McRBFN uses fewer hidden neurons and achieves a 23% better testing performance than QKLMS. In the case of the unbalanced multi-category GI data set, PBL-McRBFN uses fewer hidden neurons and achieves a 15% better testing performance than QKLMS. This clearly shows that the meta-cognitive learning strategies improve the performance.

(2) QKLMS* uses fewer training samples (those selected by PBL-McRBFN) and achieves a 1-3% improvement in testing over QKLMS with a smaller number of hidden neurons. On the high dimensional binary class ION data set, the QKLMS* testing performance is 15% better than QKLMS. This shows that when to learn a training sample is important in training a classifier.

(3) QKLMS** uses all the samples (the samples selected by PBL-McRBFN followed by the deleted samples); its training performance is similar to or better than that of the QKLMS algorithm. However, its testing performance is slightly lower than that of QKLMS*. In the case of the high dimensional binary class ION data set, QKLMS** uses more hidden neurons and achieves a 4% improvement in training but a 5% decrement in testing over QKLMS*. Also, in the case of the unbalanced multi-category GI data set, QKLMS** uses more hidden neurons and achieves a lower testing performance than QKLMS*. This shows that what to learn is important in a learning algorithm.
Overall, PBL-McRBFN selects appropriate samples through the different (delete, growth, update and reserve) strategies and addresses the what-to-learn, when-to-learn and how-to-learn efficiently. The aforementioned results also clearly highlight that the use of the meta-cognitive principles present in PBL-McRBFN improves the generalization performance.
6.7 Summary

In this chapter, we have presented a performance evaluation study of the proposed EKF-McRBFN and PBL-McRBFN classifiers on benchmark binary and multi-category classification problems with a wide range of imbalance factors. The qualitative and quantitative performance analysis using multiple data sets clearly indicates the superior performance of the proposed PBL-McRBFN and EKF-McRBFN classifiers over the other classifiers considered in this study. The results also show that the PBL-McRBFN classifier performs better than the EKF-McRBFN classifier. Hence, in the next chapters, PBL-McRBFN is used in the early diagnosis of Alzheimer's disease and Parkinson's disease.
Chapter 7

Alzheimer's Disease Diagnosis using PBL-McRBFN Classifier
Neurodegenerative diseases progressively impair the functions of the nervous system through selective neuronal loss. These diseases can be serious or life-threatening, and most of them have no cure. The goal of treatment for such diseases is usually to improve symptoms, relieve pain and increase mobility. Alzheimer's disease (AD) is the most common neurodegenerative disease [71]. Parkinson's disease (PD) is the second most common neurodegenerative disease, after AD.

In this chapter, we use the PBL-McRBFN classifier for the early diagnosis of AD using MRI scans. Since the classifier developed using PBL-McRBFN accurately approximates the decision boundary, we also propose a Recursive Feature Elimination approach to identify the brain regions responsible for AD. AD is associated with problems in memory, thinking and learning, confusion and poor judgment. AD is considered to be one of the most common forms of dementia, characterized by a significant loss or decline in memory and other cognitive abilities. Around 60-80% of age-related dementia is caused by AD [73]. The only way to make a definitive diagnosis of AD is the post-mortem examination of brain tissue for the neurofibrillary tangles and amyloid plaques that define AD. Early detection of AD using non-invasive
neuroimaging techniques will help in providing assistance to patients, and thereby one can plan a treatment that may slow down its progress. One such non-invasive method for the early detection of AD is brain imaging. Commonly used brain imaging techniques for this purpose are: Computed Tomography (CT) [74, 75], Single Photon Emission Computed Tomography (SPECT) [76, 77], Positron Emission Tomography (PET) [77] and Magnetic Resonance Imaging (MRI).
Studies using CT scans for AD diagnosis have been described in [74, 75]. However, due to a lower spatial resolution and the possibility of unreliable structural change detection in the early stages of the disease, CT has been employed only in very few cases. SPECT and PET are functional brain imaging techniques which use a radioactive substance to detect the changes in the blood flow and metabolism in the brain. For AD diagnosis, several studies using SPECT and PET images have been reported in [76, 77]. Both PET and SPECT involve the use of ionizing radiation, which is harmful if the scans are repeated. Hence, the use of PET and SPECT in normal persons is typically limited to a single scan, which may not provide adequate information for a proper diagnosis. Also, the lack of spatial resolution in the SPECT images influences the diagnostic accuracy.
MRI is one of the most important brain imaging procedures that provides accurate information about the shape and volume of the brain. Compared to CT, SPECT and PET scans, MRI provides a high spatial resolution and can detect minute abnormalities in the brain. The use of MRI for the accurate detection of AD has recently become a very active research area [80, 79]. MRI helps to detect AD at an early stage, before irreversible damage has been done [13]. Early detection of AD from MRI requires appropriate methods to detect,
locate and quantify tissue atrophy in the brain. Primarily, the visual assessment of brain atrophy is carried out by an expert based on the MRI. This may be adequate in a normal clinical setting, but it is not enough to obtain quantitative measures such as the fine incremental grades of atrophy and the overall brain volume [81].
The early detection of AD from MRI can be cast as a binary classification problem, and one can employ machine learning techniques to automatically detect AD [78, 79, 82]. The main idea behind using machine learning techniques is to relate the brain volume changes to the onset of AD. The two major ways of estimating the brain volume changes from the MRI are: i) the Region-of-Interest (ROI) approach; and ii) the whole brain morphometric approach.
The ROI approach is used to measure a specific regional brain volume and to investigate the abnormal tissue structures associated with AD [83]. The ROI approach requires a priori knowledge about the abnormal regions, which in practice is not always available. Moreover, it relies on the manual segmentation of structures such as the hippocampus and entorhinal cortex, which is laborious and time consuming [84, 85]. In [86], the volumes of the manually segmented hippocampus and entorhinal cortex are measured to discriminate between AD patients and normal persons. The major shortcomings in the use of manual ROI segmentation are that it is laborious and error prone. Recently, an automatic method for the segmentation of the hippocampus using probabilistic and anatomical priors has been proposed for the detection of AD patients [82]. In [82], automatically segmented hippocampus volumes have been used to classify AD patients and normal persons. Although ROI techniques for AD analysis have been widely used, it is difficult for them to accurately identify the brain volume changes in AD patients when the tissue loss is small. To overcome these shortcomings, several approaches that enable the assessment of the whole brain have been reported in the literature.
Voxel Based Morphometry (VBM) is a widely used whole brain morphometric analysis technique [90]. VBM is based on the Statistical Parametric Mapping (SPM) method, often employed for the investigation of tissue volume changes between the brain MRI scans of a diseased group versus normal persons. In the VBM analysis, the brain MRI scans undergo various preprocessing steps before the voxel-wise parametric tests [91]. The preprocessing steps involved in the VBM analysis produce probability maps of the gray matter, white matter and cerebrospinal fluid tissue classes in a given voxel. Several studies have reported AD detection using VBM features and a Support Vector Machine (SVM) classifier [78, 79, 92, 93, 94]. These
methods use different morphological features and different data sets for AD detection. In [78], a data set of 90 samples from Minnesota is used; a statistical parametric map of the gray matter tissue is obtained using these 90 samples, and this map is used to extract the features for an SVM classifier. In another study, the Regional Analysis of Volumes Examined in Normalized Space (RAVENS) approach is used to extract the features from a smaller set of Baltimore longitudinal study data. Here, 15 probable mild AD patients and 15 normal persons' samples are used for the AD detection. RAVENS based feature extraction with an SVM classifier provides good performance, but the feature extraction process is computationally intensive. For completely mild AD patient data, the computational effort in RAVENS increases further and it influences the accuracy of the extracted features, which affects the SVM classifier performance.
SVM is based on the evaluation of the discrimination power for classification; hence, it has limitations in dealing with noisy data, which is the case for neuroimaging data. Also, the high dimensional VBM features make AD classification difficult, and hence feature reduction techniques have been increasingly used for dimensionality reduction in neuroimage classification studies [79, 95, 96, 97]. Principal Component Analysis (PCA) and Independent Component Analysis (ICA) are the widely used feature construction techniques. PCA is
a subspace learning method and transfers the original features into a new linear subspace [79, 97]. ICA, as one of the important techniques of blind signal separation, has been shown to provide a powerful method for analyzing neuroimaging data [98, 99]. However, these feature construction techniques transform the original feature space and fail to preserve the important subsets of the feature space. Also, the reduced features do not provide any information about the original voxels, and hence about the regions in the brain responsible for AD. The feature selection problem has also been addressed using genetic algorithms; an Integer Coded Genetic Algorithm (ICGA) has been used along with a neural network in [100]. However, the selection of the population size and other parameters affects the performance of such approaches. Hence, in this chapter we propose a computationally less intensive wrapper based feature selection method, in which the features are selected using recursive feature elimination.
Machine learning algorithms for AD detection require samples with significant information as training samples. An ideal classifier for AD detection must incorporate sample selection. The meta-cognitive learning in PBL-McRBFN selects appropriate samples for learning and has been found to be effective. Hence, in this chapter we use the PBL-McRBFN classifier for AD detection. First, the morphometric features are extracted using the VBM. Next, the high dimensional VBM features are used for classification using PBL-McRBFN. The following sections present a description of the MRI data, the VBM analysis for feature extraction, and the performance results of the PBL-McRBFN classifier.
Figure 7.1: Schematic of the proposed AD detection approach: the MRI scan undergoes unified segmentation, smoothing and statistical testing to produce the morphometric features, which the PBL-McRBFN classifier uses to decide AD / Non-AD.
7.2.1 Materials

7.2.1.1 OASIS data set

In our study, the publicly available Open Access Series of Imaging Studies (OASIS) data set has been used [101]. The OASIS data set is a cross-sectional collection of 416 persons covering the adult life span, aged between 18 and 96, including individuals with early-stage AD. The data includes 218 persons aged between 18 and 59 years and 198 persons aged between 60 and 96 years. Of the 198 older persons, 98 had no AD, i.e., a Clinical Dementia Rating (CDR) of 0; 70 persons had been diagnosed with very mild AD (CDR = 0.5), 28 persons with mild AD (CDR = 1) and 2 persons with moderate AD (CDR = 2). The AD patients have scores between 14 and 30 on the Mini-Mental State Examination (MMSE), and the normal persons have MMSE scores between 25 and 30. In our study, we have considered the 198 elderly persons, comprising 98 normal persons and 100 AD patients. For each person, whole-brain T1-weighted 3-dimensional MPRAGE scans were acquired on a Siemens 1.5T scanner. The acquired volumes had 128 sagittal 1.25 mm slices without gaps and a pixel resolution of 256×256 (1×1 mm). The OASIS data set demographics are summarized below.
In our study, we also used the data obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) data set [102]. The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies and non-profit organizations.
The primary goal of ADNI has been to test whether serial MRI, positron emission tomography, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of Mild Cognitive Impairment (MCI) and early AD. Determination of sensitive and specific markers of very early AD progression is intended to aid researchers and clinicians in developing new treatments and monitoring their effectiveness, as well as lessening the time and cost of clinical trials. The Principal Investigator of this initiative is Michael W. Weiner, MD, VA Medical Center and University of California, San Francisco. ADNI is the result of the efforts of many co-investigators from a broad range of academic institutions and private corporations, and persons have been recruited from over 50 sites across the U.S. and Canada. The initial goal of ADNI was to recruit 800 adults, aged 55 to 90, to participate in the research: approximately 200 cognitively normal older individuals to be followed for 3 years, 400 people with MCI to be followed for 3 years and 200 people with early AD to be followed for 2 years. For up-to-date information, see www.adni-info.org.
In our study, we have considered all the 432 elderly persons (232 normal persons and 200 AD patients) available in the ADNI data set as of February 2012. Standard 1.5T MRI scans are used; the detailed MRI protocols and preprocessing steps are presented in [103]. The demographics for the 432 persons used in our study are also summarized.
Figure 7.2: Schematic diagram of the stages in feature extraction based on the VBM
analysis
The flow diagram of the feature extraction process is shown in Fig. 7.2. VBM is a voxel-wise comparison of the local tissue volumes of gray matter within a group or across groups of persons using MRI scans. In our study, VBM is used to detect significant gray matter differences between the AD patients and the normal persons. The VBM detected voxel locations of significant regions are further used as masks in order to extract the features from all gray matter segmented MRI scans. VBM is performed on the OASIS and ADNI data sets using the Statistical Parametric Mapping (SPM) software package [91].
In a VBM analysis, the brain MR images undergo various preprocessing steps before the voxel-wise parametric tests are carried out on them. In our study, VBM analysis based on a recently proposed unified segmentation model is performed [104]. The steps involved in the VBM analysis are: unified segmentation, smoothing and statistical testing, in that order. The unified segmentation step is a generative modeling approach, in which tissue segmentation, bias correction and image registration are combined in a single model [104]. The unified segmentation framework combines deformable tissue probability maps with a Gaussian mixture model. The MR brain images of both the AD patients
and normal persons are segmented into the gray matter tissue class. The segmented and normalized gray matter images are then smoothed by convolving them with an isotropic Gaussian kernel.
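As a rough sketch of this smoothing step (the kernel width and voxel size below are illustrative assumptions; the thesis does not state the values used here):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_gm_map(gm_volume, fwhm_mm=8.0, voxel_mm=1.0):
    """Convolve a gray matter volume with an isotropic Gaussian kernel.

    The FWHM is converted to the Gaussian standard deviation in voxels:
    sigma = FWHM / (2 * sqrt(2 * ln 2)) / voxel size.
    """
    sigma_vox = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / voxel_mm
    return gaussian_filter(gm_volume, sigma=sigma_vox)

# Example on a random stand-in for a segmented gray matter volume.
gm = np.random.rand(91, 109, 91)
smoothed = smooth_gm_map(gm)
```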
For a better understanding of the VBM analysis, we show three planar views (sagittal, coronal and axial) of the original images and of the images after every stage of the VBM analysis in Fig. 7.3(a-c). Fig. 7.3(a) shows the different planar views of the MRI scan. From the MRI scan, one has to perform bias correction and tissue segmentation, and register the segmented image to a standard template to remove non-uniform artifacts. The images after undergoing these steps are shown in Fig. 7.3(b). From Fig. 7.3(b), we can see that the unified segmentation in the VBM analysis efficiently identifies the gray matter in the MRI scans. These segmented images are then smoothed by convolving them with an isotropic Gaussian kernel, and the resultant images are shown in Fig. 7.3(c). The smoothing process averages the concentration of the gray matter around each voxel, which helps considerably in the subsequent voxel-by-voxel statistical analysis [104].
The smoothed brain volumes of the AD patients and normal persons are used in the statistical analysis to identify regions of gray matter concentration that are significantly related to AD. These regions are used to extract the features for accurate identification of AD. For the statistical analysis, a general linear model is used to detect the volumetric changes in gray matter across the AD patients and normal persons. In our statistical analysis, the estimated total intracranial volume is used as a covariate in the design matrix of the general linear model. Also, a two-sample t-test is performed on the smoothed images of the normal persons and AD patients, and a multiple comparison correction method, namely family-wise error correction with P < 0.05, has been applied. Following the application of the general linear model and statistical tests, the significance of any differences in gray matter volume is ascertained using the theory of Gaussian random fields [105]. These tests result in a maximum intensity projection map, which is then used to extract the features from the individual segmented gray matter images for further analysis.
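The sketch below illustrates the voxel-wise group comparison; a Bonferroni threshold stands in for the family-wise error control that SPM performs with Gaussian random field theory, so it is an approximation of, not the exact procedure used in, this analysis:

```python
import numpy as np
from scipy.stats import ttest_ind

def significant_voxels(gm_ad, gm_normal, alpha=0.05):
    """Voxel-wise two-sample t-test between AD and normal gray matter maps.

    gm_ad, gm_normal: arrays of shape (subjects, voxels). A Bonferroni
    threshold approximates family-wise error control at level alpha.
    """
    _, pvals = ttest_ind(gm_ad, gm_normal, axis=0)
    return pvals < alpha / gm_ad.shape[1]  # boolean mask of significant voxels

# Stand-in data: 100 AD and 98 normal subjects, 5000 voxels each.
mask = significant_voxels(np.random.rand(100, 5000), np.random.rand(98, 5000))
```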
For a better understanding, we show the maximum intensity projections of the significant voxels in the sagittal, coronal and axial views in Fig. 7.4 and Fig. 7.5.
From Fig. 7.4 and Fig. 7.5, it can be noted that there are significant areas of decreased gray matter density in the AD patients relative to the normal persons, indicating
Figure 7.3: Results of the unified segmentation and smoothing steps performed on the MRI of an AD patient (from right: sagittal view, coronal view and axial view)
Figure 7.4: Maximum intensity projections from OASIS data set - Normal persons vs.
AD patients (a) sagittal view (b) coronal view (c) axial view
Figure 7.5: Maximum intensity projections from ADNI data set - Normal persons vs.
AD patients (a) sagittal view (b) coronal view (c) axial view
Figure 7.6: Gray matter volume change from OASIS data set - Normal persons vs. AD
patients (a) sagittal view (b) coronal view (c) axial view
that the gray matter in these locations is lower for the AD patients. A total of 19879/23797 features are extracted from the OASIS/ADNI data sets using the above VBM analysis, and these features are then used for the classification of AD patients. It is reported in the literature that VBM produces different significant areas of gray matter density change in the brain when the voxel-wise statistics are computed with different groups of persons (e.g., male vs. female, only female, etc.) and different covariates (e.g., gender, age, etc.) [78, 106]. This also implies that, by employing the above VBM analysis, one can obtain group-specific significant regions.
To locate the above regions with respect to their spatial locations in the brain, these regions are overlaid on the sliced sections of the commonly used Montreal Neurological Institute (MNI) brain template, and the results are shown in Fig. 7.6 and Fig. 7.7. From Fig. 7.6 and Fig. 7.7, one can infer the regions of the brain which are affected significantly in the AD patients. In other words, if during an MRI scan one notices that the gray matter in these specific locations is lower, one can infer a good possibility of AD.
Figure 7.7: Gray matter volume change from ADNI data set - Normal persons vs. AD
patients (a) sagittal view (b) coronal view (c) axial view
This section presents the results of AD detection using the PBL-McRBFN classifier with MRI. Before we present the details of the study results, we highlight the sequence in which the studies have been carried out. First, we evaluate the AD detection performance of the PBL-McRBFN classifier using the OASIS/ADNI data sets and also compare the performance with existing results in the literature. Next, the generalization ability of the classifier is shown by testing the ADNI samples on the PBL-McRBFN classifier developed using the OASIS data set. Further, we present a method to identify the imaging biomarkers for AD using the proposed RFE method for feature reduction and the PBL-McRBFN classifier. Finally, we present detailed studies based on age/gender groups in the OASIS data to identify age- and gender-specific imaging biomarkers.
For the 198 OASIS samples, morphometric features were extracted using the VBM. However, as the feature space is large, not all features may be responsible for AD. Hence, the obtained morphometric features from the VBM analysis are further reduced statistically using ICA. We employed the FastICA (fixed-point) algorithm [107]; the FastICA package for MATLAB [108] is used to reduce the 19879 features to 50. We have conducted 10 random trials of experiments on the complete 19879 feature set and the reduced 50 feature set, with each trial containing 50% of the total samples for training and the remaining for testing. The training/testing accuracy, sensitivity and specificity obtained from the PBL-McRBFN classifier are presented in Table 7.3. The PBL-McRBFN testing accuracy using the 50 reduced features is 72.33%. The PBL-McRBFN testing accuracy on the complete feature set is 75.8%, which is 3% more than on the 50 reduced feature set; this is because the considered binary classification problem consists of MRI scans of 100 `very mild to moderate AD' patients and 98 healthy elderly persons. The AD patients group spans a wide range of CDR values from 0.5 to 2, whereas the healthy elderly persons have a CDR of 0.
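A minimal sketch of this repeated random-trial protocol is shown below; the scikit-learn style classifier is a stand-in for PBL-McRBFN, which is an assumption made only for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

def random_trials(clf, X, y, n_trials=10, train_frac=0.5):
    """Repeated random splits reporting accuracy, sensitivity, specificity."""
    stats = []
    for seed in range(n_trials):
        Xtr, Xte, ytr, yte = train_test_split(
            X, y, train_size=train_frac, stratify=y, random_state=seed)
        clf.fit(Xtr, ytr)
        tn, fp, fn, tp = confusion_matrix(yte, clf.predict(Xte)).ravel()
        stats.append([(tp + tn) / len(yte),  # accuracy
                      tp / (tp + fn),        # sensitivity
                      tn / (tn + fp)])       # specificity
    return np.mean(stats, axis=0), np.std(stats, axis=0)
```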
Next, we compare the PBL-McRBFN classification results on the OASIS data set with existing results in the literature.
Table 7.4: Performance comparison with existing results on the OASIS data set
In [109], an integrated feature extraction and selection algorithm was proposed that contains two iterative steps, viz., a constrained subspace learning based feature extraction method and SVM based feature selection. Here, VBM is used to extract features from MRI scans, and the integrated feature extraction and selection algorithm (IPCA) is used to select features in conjunction with SVM based classification.
In [110], Wenlu Yang et al. proposed an ICA based method for feature extraction and automatic classification of AD related MRI data. Here, features are extracted using VBM followed by ICA, with SVM based classification.
The above two methods used the same OASIS data set as described in Table 7.1. The first method reported results of 4 random trials and the second method reported a single-trial result. Hence, we compare the single and 10 random trial results obtained from PBL-McRBFN with the results of the methods in [109, 110], as presented in Table 7.4.
From Table 7.4, it is observed that the PBL-McRBFN based classification of AD patients and normal persons (both single trial and 10 random trials) is 6-14% better than the
results of the methods proposed in [109, 110]. PBL-McRBFN gives better results on the complete feature set extracted by VBM than on the reduced features after ICA. The PBL-McRBFN classification efficiency on the complete feature set is 10% more than the PCA based SVM classification efficiency and higher than the IPCA algorithm based SVM classification efficiency proposed in [109], and 14% more than the ICA based SVM classification efficiency proposed in [110]. The PBL-McRBFN classification efficiency on the reduced feature set is 8% more than the PCA based SVM classification efficiency, 5% more than the IPCA algorithm based SVM classification efficiency proposed in [109] and 12% more than the ICA based SVM classification efficiency proposed in [110]. Since PBL-McRBFN uses sample selection for proper learning of the decision function, its performance is better than the results reported in [109, 110].
Next, we evaluate the PBL-McRBFN classifier on the ADNI data set [103]. The complete ADNI data set consists of 232 normal persons and 200 AD
patients. After verification of the unified segmentation results, 6 normal persons and 4 AD patients were excluded from the VBM analysis (due to bad segmentation). In our study we considered 422 samples; for each sample, 23797 morphometric features were obtained from the VBM analysis. Here also, the obtained morphometric features are further statistically reduced to 200 features by ICA [107]. In our classification study, for each of the 10 random trial experiments, 50% of the samples are randomly chosen for training and the remaining are used for testing. The PBL-McRBFN testing accuracy on the complete feature set is 85.28%, which is 3% more than on the 200 reduced feature set. The classification performance of the PBL-McRBFN classifier using both the complete and the 200 reduced feature sets is also tabulated.
Next, we compare the PBL-McRBFN classification results using the ADNI data set with existing results.
Here, we compare the results of the PBL-McRBFN classifier with some recent results reported in the literature that are also based on the ADNI MRI data set for AD classification. In particular, four recent methods are compared in Table 7.6. In [111], the automatic diagnostic capabilities of four structural MRI feature extraction methods (manifold based learning, hippocampal volume, cortical thickness and tensor-based morphometry) are compared using an SVM classifier. The best result, obtained using tensor-based morphometry, is provided in Table 7.6. In [112], a Linear Program (LP) boosting method with a novel additional regularization has been proposed to incorporate the spatial smoothness of the MRI feature space into the learning process. In [14], ten methods, which include five voxel-based methods, three cortical thickness based methods and two hippocampus based methods, are compared using an SVM classifier. The best result, obtained using the voxel-wise Gray Matter (GM) features, is provided in Table 7.6. In [94], 93 volumetric features extracted from 93 ROI in GM density maps are used with an SVM classifier.
From Table 7.6, it can be seen that, among the VBM based feature methods, PBL-McRBFN's performance is 3% more than that of the LP boosting method [112] and 2% lower than that of the SVM method [14]. This may be due to the fact that the SVM method in [14] uses a lower number of subjects in the study. We also compare the performance of the PBL-McRBFN classifier using the VBM features with the method of SVM using the 93 ROI features [94] and with the method using the tensor-based morphometry features [111].
Table 7.6: Performance comparison with existing results on the ADNI data set
Next, we study the generalization capability of the PBL-McRBFN classifier trained with the OASIS data set and tested on the unseen samples from the ADNI data set. The logical schematic diagram of this generalization capability study of the PBL-McRBFN classifier is shown in Fig. 7.8. For this study, the Maximum Intensity Projection (MIP) of the gray matter voxel locations obtained from the OASIS data set (possible brain regions for AD) is used to extract the morphometric features from the ADNI data set in the VBM feature extraction procedure, apart from the normal processes of unified segmentation and smoothing. VBM selected 19879 voxel locations from the OASIS data set, and the same 19879 voxel locations are used to extract the morphometric features from the ADNI data set. These unseen ADNI samples with 19879 morphometric features are tested with the best classifier developed using the OASIS training samples. From the description of the OASIS and ADNI data sets, we can see that these data sets are collected from people with different demographics and geographic locations. Hence, the data sets represent a wide variation in the data distribution. Therefore, 25% of the ADNI samples are further used for adaptation of the OASIS trained PBL-McRBFN classifier, and the same is tested using the remaining ADNI samples. Such a generalization capability of the classifier avoids repeating the computationally intensive VBM feature extraction process, and unifies and simplifies the diagnosis mechanism. The generalization performance of the OASIS trained PBL-McRBFN classifier on the unseen ADNI data set is presented below.
Figure 7.8: Logical schematic of the generalization study - the OASIS MIP mask is applied to new ADNI MRI scans, and the PBL-McRBFN classifier is adapted through meta-cognitive learning.
In the first experiment, the PBL-McRBFN classifier is trained on the OASIS training data set (50% of the OASIS samples) and is tested on the 422 samples from the ADNI data set. This experiment is called `Without' because all ADNI samples are tested using the PBL-McRBFN classifier trained on OASIS samples, without any ADNI samples for adaptation. The classification accuracy on the unseen ADNI samples from the above experiment is 62.39%. Hence, we can say that a PBL-McRBFN classifier for AD detection trained with VBM features from one data set (OASIS) can classify unseen samples from the other data set (ADNI). Further, 25% of the samples from the ADNI data set were used to adapt the PBL-McRBFN classifier using meta-cognitive principles, and the same was tested on the remaining 75% of the samples; the testing accuracy is 77.27%. This experiment is called `With 25%' because 25% of the ADNI samples are used for adaptation of the PBL-McRBFN classifier trained on OASIS samples. Hence, we can say that, with minor adaptation, PBL-McRBFN can classify unseen samples from other data sets accurately.
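A minimal sketch of the masking and adaptation split used in this study is given below; the volume size and the random stand-in mask are illustrative assumptions:

```python
import numpy as np

# Stand-in for the 19879 voxel locations selected by VBM on OASIS;
# the same boolean mask is then applied to the ADNI gray matter maps.
n_voxels = 91 * 109 * 91
oasis_mask = np.zeros(n_voxels, dtype=bool)
oasis_mask[np.random.choice(n_voxels, 19879, replace=False)] = True

def extract_features(gm_flat_volumes):
    """Apply the OASIS-derived voxel mask to flattened gray matter maps."""
    return gm_flat_volumes[:, oasis_mask]

# Adaptation split: 25% of the ADNI samples adapt the OASIS-trained
# classifier, the remaining 75% are held out for testing.
adni = np.random.rand(422, n_voxels)
features = extract_features(adni)
n_adapt = int(0.25 * len(features))
adapt_set, test_set = features[:n_adapt], features[n_adapt:]
```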
The results show that the MIP of the gray matter voxel locations generated using VBM from one data set (OASIS) is able to discriminate samples from the other data set (ADNI). For growing data sets, the sequential learning PBL-McRBFN classifier is able to capture the functional relationship between the VBM features and the class labels (disease status). The results also show that a PBL-McRBFN classifier trained on one data set (OASIS), with minor adaptation using a few samples from the other data set (ADNI), achieves significant testing accuracy on a large set of unseen samples.
Next, the imaging biomarkers for AD are identified using the OASIS data set. VBM extracted the gray matter voxels which are statistically variant between the normal persons and the AD patients. However, all the voxels generated by VBM may not be responsible for AD; hence, there is a need to find the minimal set of features among the VBM generated voxels that maximizes the detection of AD. The selected minimal set of features can be termed the imaging biomarkers for AD. In the literature, many feature selection techniques have been proposed; in general, the goal of feature selection is to reduce the dimensionality. Filter and wrapper methods are two kinds of well-known feature selection techniques for high dimensional data sets [113]. In the filter method, features are selected on the basis of the feature separability of the training samples, which is independent of the learning algorithm. The separability only takes into account the correlations between the features, so the selected features may not be optimal. Wrapper methods search for critical features based on the learning algorithm, and often give better results than filter methods. RFE is a computationally less intensive wrapper based feature selection method. The basic principle of RFE is to initially include all features of a large region and to gradually exclude features that do not contribute to discriminating patterns from different classes. Whether a feature in the current feature set is retained or discarded depends on the classification performance resulting from training a classifier with the current set of features. In order to increase the likelihood that the best features are selected, feature elimination progresses gradually and includes cross-validation steps. In each feature elimination step, a single feature is discarded, until a core set of features with the highest discriminative power remains.
In this study, informative features are selected by the method of RFE utilizing a PBL-McRBFN classifier. The PBL-McRBFN-RFE algorithm conducts feature selection in a sequential elimination manner, starting with all the features and discarding one feature at a time from the top. The discarded feature is added back to the feature set if the training/testing efficiency decreases. The PBL-McRBFN-RFE feature selection algorithm runs while the number of selected features in the current iteration (s) is more than a predefined minimum limit on the number of features to be selected (r) and is not equal to the number of selected features in the previous iteration.
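A minimal sketch of this elimination loop is given below, under the assumption that a helper evaluate(subset) trains and tests the PBL-McRBFN classifier on a feature subset and returns its efficiency; the helper is hypothetical:

```python
def pbl_mcrbfn_rfe(features, evaluate, r):
    """Sequential feature elimination as described above.

    features: iterable of feature indices; evaluate(subset) -> efficiency;
    r: predefined minimum number of features to keep. The loop stops when
    s <= r or when s equals the count from the previous iteration.
    """
    selected = list(features)
    prev_count = -1
    while len(selected) > r and len(selected) != prev_count:
        prev_count = len(selected)
        baseline = evaluate(selected)
        for f in list(selected):            # discard one feature at a time
            trial = [g for g in selected if g != f]
            score = evaluate(trial)
            if score >= baseline:           # keep the discard only if the
                selected = trial            # efficiency does not decrease
                baseline = score
    return selected
```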
First, we present the identification of imaging biomarkers for AD from the complete OASIS data set using PBL-McRBFN-RFE. In the medical literature it is reported that there are differences between the regions affected by AD in male and female persons [15, 16]. Hence, we also conducted age-wise and gender-wise analyses to identify imaging biomarkers.

Table 7.8: VBM detected and PBL-McRBFN-RFE selected regions from complete OASIS data set

Table 7.9: PBL-McRBFN classifier performance comparison with VBM detected and PBL-McRBFN-RFE selected features on complete OASIS data set

From the complete OASIS data set, the
minimal set of voxels most relevant to AD is found using PBL-McRBFN-RFE. The brain regions corresponding to the VBM detected voxels (19879 voxels with 198 OASIS samples) are reported in Table 7.8. PBL-McRBFN-RFE selected 906 voxels out of the 19879; the brain regions corresponding to these 906 voxels are also reported in Table 7.8. MNI templates of the complete 19879 and the selected 906 voxel regions are shown in Fig. 7.9. The testing performance of PBL-McRBFN on this selected 906 feature set is 84.95%,
(a) VBM Detected 19879 voxels (b) PBL-McRBFN-RFE Selected 906 voxels
Figure 7.9: Comparison of gray matter volume change - Normal persons vs. AD patients
from complete OASIS data set
as shown in Table 7.9. To check the discriminating capability of the selected 906 voxels, the classification performance of the PBL-McRBFN classifier on the unseen ADNI samples with the selected 906 voxels is also evaluated. We found that the voxels selected by PBL-McRBFN-RFE are located in brain regions such as the superior temporal gyrus, the insula, the precentral gyrus and the extra-nuclear region. These regions are consistent with those reported in the medical literature as biomarkers for AD [114, 115, 116]. Hence, the gray matter atrophy in the brain regions detected by PBL-McRBFN-RFE among the VBM features may be more relevant to the detection of AD.
To study the effect of age, an analysis is conducted on the OASIS data set based on the ages of the patients. Among the 198 persons in the OASIS data set, 40 persons are in the age group 60-69, 83 persons are in the age group 70-79 and 75 persons are aged 80 and above. We conducted the analysis for each age group separately.
Study of the 60-69 age group: VBM extracted 292 features for the 40 persons in this age group; with a 50-50% training and testing split, PBL-McRBFN obtained its testing performance on the complete feature set. After performing PBL-McRBFN-RFE on the 292 feature set, 25 features were selected, and the testing performance of PBL-McRBFN on this selected set was evaluated.
Study of the 70-79 age group: VBM extracted 3298 features for the 83 persons in this age group; with a 50-50% training and testing split, PBL-McRBFN obtained its best testing performance on the complete feature set. After performing PBL-McRBFN-RFE on the 3298 feature set, 90 features were selected, and the testing performance of PBL-McRBFN on this selected set was evaluated.
Study of the 80-and-above age group: VBM extracted 1047 features for the 75 persons in this age group; with a 50-50% training and testing split, PBL-McRBFN obtained its best testing performance on the complete feature set. After performing PBL-McRBFN-RFE on the 1047 feature set, 154 features were selected, and the testing performance of PBL-McRBFN on this selected set was evaluated.
The VBM detected and PBL-McRBFN-RFE selected brain regions (voxels) for the different age groups are listed in Table 7.11, and the corresponding PBL-McRBFN performance
Table 7.11: VBM detected and PBL-McRBFN-RFE selected regions from age-wise OASIS data sets
Table 7.12: PBL-McRBFN classifier performance comparison with VBM detected and PBL-McRBFN-RFE selected features on age-wise OASIS data sets
(e) VBM Detected 1047 voxels (f) PBL-McRBFN-RFE Selected 154 voxels
Figure 7.10: Comparison of gray matter volume change - Normal persons vs. AD patients from the 60-69 (a&b), 70-79 (c&d) and 80-and-above (e&f) age groups in the OASIS data set
results are shown in Table 7.12. MNI templates of the complete VBM and the PBL-McRBFN-RFE selected voxel regions are shown in Fig. 7.10. In each age group analysis, the voxels selected by PBL-McRBFN-RFE give better classification accuracy than the complete VBM feature set.
From the age-wise analysis, we can see that the PBL-McRBFN classifier detects AD accurately in the 60-69 age group, and the 25 voxels selected by PBL-McRBFN-RFE are still able to classify AD accurately. The PBL-McRBFN-RFE detected brain region responsible for AD in the 60-69 age group is the superior temporal gyrus, which contains the primary auditory cortex and is responsible for processing sounds. Hence, we can conclude that AD patients in the 60-69 age group may have auditory related problems as indicators of AD. In the 70-79 age group, the detected brain regions responsible for AD are the parahippocampal gyrus and the extra-nuclear region, which are responsible for memory encoding and retrieval. Hence, we can conclude that AD patients in the 70-79 age group may have memory related problems. In the 80-and-above age group, the detected brain regions responsible for AD are the hippocampus, the parahippocampal gyrus and the lateral ventricle, which are involved in the consolidation of short-term memory into long-term memory, spatial navigation, and memory encoding and retrieval. Hence, we can conclude that AD patients in this age group may have memory related problems.
Gender may be a modifying factor in AD's development and expression. To verify this, a gender-wise analysis is conducted on the OASIS data set. Among the 198 persons in the OASIS data set, 67 persons are male and 131 persons are female. We have conducted the analysis on the male and female groups separately. First, the analysis is conducted considering the 67 male persons alone. VBM extracted 1239 voxels for the 67 male persons. The corresponding brain regions are shown in Table 7.13. The best PBL-McRBFN testing performance on the complete 1239 features is 79.8% with a 50-50% training and testing data set split, as shown in Table 7.14. After performing PBL-McRBFN-RFE on
Table 7.13: VBM detected and PBL-McRBFN-RFE selected regions from male-OASIS
data set
Table 7.14: PBL-McRBFN classifier performance comparison with VBM detected and PBL-McRBFN-RFE selected features on male-OASIS data set
the OASIS male data set, 31 voxels were selected, and the testing performance of PBL-McRBFN on this selected feature set was evaluated. The brain regions corresponding to the 31 voxels are listed in Table 7.13. MNI templates of the complete 1239 and the selected 31 voxel regions are shown in Fig. 7.11. All the 31 voxels are from the insular cortex region, which is responsible for emotion and consciousness. The insula region is also reported in AD research studies [117, 118], where it is associated with hypometabolism. Hence, we can conclude that male AD patients may have emotion related problems. Next, the analysis is conducted considering the 131 female persons alone. VBM extracted 15203 voxels for the 131 female persons. The corresponding brain regions of the 15203 voxels are shown in Table 7.15. The best PBL-McRBFN testing performance on the complete feature set is 79.93% with a 50-50% training and testing data set split, as shown in Table 7.16.
Figure 7.11: Comparison of gray matter volume change - Normal persons vs. AD patients
from Male-OASIS data set
Table 7.15: VBM detected and PBL-McRBFN-RFE selected regions from female-OASIS
data set
Table 7.16: PBL-McRBFN classifier performance comparison with VBM detected and PBL-McRBFN-RFE selected features on female-OASIS data set
(a) VBM Detected 15203 voxels (b) PBL-McRBFN-RFE Selected 294 voxels
Figure 7.12: Comparison of gray matter volume change - Normal persons vs. AD patients
from Female-OASIS data set
After performing PBL-McRBFN-RFE on the OASIS female data set, 294 voxels were selected, and the testing performance of PBL-McRBFN on this selected feature set is 85.44%, as shown in Table 7.16. The brain regions corresponding to the 294 voxels are listed in Table 7.15. MNI templates of the complete 15203 and the selected 294 voxel regions are shown in Fig. 7.12. We found that these selected 294 voxels are located in brain regions such as the parahippocampal gyrus and the extra-nuclear regions, which are responsible for memory encoding and retrieval. Hence, we can conclude that female AD patients may have memory related problems.
The above detailed study results indicate the superior performance of the proposed PBL-McRBFN classifier and the PBL-McRBFN-RFE feature selection approach.
7.4 Discussion
The AD detection performance of PBL-McRBFN is better than the existing results in the literature for the OASIS and ADNI data sets. The generalization capability of the PBL-McRBFN classifier has been demonstrated by testing unseen samples from the ADNI data set using the PBL-McRBFN classifier trained with samples from the OASIS data set. Using PBL-McRBFN-RFE, we have identified the imaging biomarkers (significant regions in the brain) responsible for AD using the OASIS data set. Based on our study, gray matter atrophy is identified in AD patients in the superior temporal gyrus, the insula, the precentral gyrus and the extra-nuclear regions, which have also been highlighted in the medical literature [114, 115, 116]. Further, we have carried out a detailed analysis based on age and gender. Based on our analysis, the indicators that emerge for the onset of AD are:
• In the age group 60-69: Degradation in sound processing capability (primary auditory cortex)
• In the age group 70-79: Memory related problems (parahippocampal gyrus and extra nuclear)
• In the age group 80 and above: Memory related problems (hippocampus, parahippocampal gyrus and lateral ventricle)
• In males: Emotion related problems (insula)
• In females: Memory related problems (parahippocampal gyrus and extra nuclear)
These results indicate that PBL-McRBFN-RFE can identify the imaging biomarkers for AD, and this approach can be used for other similar problems.
7.5 Summary
In this chapter, the AD diagnosis problem is solved by employing the PBL-McRBFN classifier. Morphometric features were extracted from MRI scans using VBM. For the simulation studies, we have used the well-known OASIS and ADNI data sets. The performance of the PBL-McRBFN classifier has been evaluated on the complete morphometric feature set obtained from the VBM analysis and also on the reduced feature sets from ICA. Since the data sets contain very mild AD patients and fewer samples, AD detection using the complete VBM features provides better performance than the ICA reduced features. The performance of PBL-McRBFN is also better than the existing results reported in the literature. Next, the performance evaluation on the ADNI data set with the PBL-McRBFN classifier trained on the OASIS data set shows that the proposed PBL-McRBFN can also achieve significant results on an unseen data set. Finally, imaging biomarkers responsible for AD are detected with the PBL-McRBFN-RFE approach using the OASIS data set. Imaging biomarkers responsible for AD were found for different age groups and for both genders.
Parkinson's disease is the second most widely reported neurodegenerative disease, next only to AD. Hence, in the next chapter, the diagnosis of Parkinson's disease using the PBL-McRBFN classifier is presented.
Chapter 8
Parkinson's Disease Diagnosis using
PBL-McRBFN Classifier
In this chapter, we use the PBL-McRBFN classifier for the diagnosis of Parkinson's disease based on microarray gene expression, MRI scans, and vocal and gait features.
Parkinson's Disease (PD) is a neurodegenerative disorder caused by the loss of dopaminergic neurons in the pars compacta of the substantia nigra [119]. The most important symptoms of PD include muscle rigidity, tremors, and changes in speech and gait [119, 120]. PD is more common in elderly people over the age of 50 and has affected millions of people worldwide: according to the global declaration for PD, 6.3 million people across all races and cultures were affected by this disease worldwide in 2013. Although significant research advances have been made, including the recent identification of possible genetic and environmental risk factors for PD, further research is required to illustrate the underlying disease mechanisms. At present, there is no cure for PD, and the diagnosis of PD is based on medical history and a neurological examination conducted by interviewing and observing the patient in person using disease rating scales. The Unified Parkinson's Disease Rating Scale (UPDRS), the Hoehn and Yahr scale, the Schwab and England Activities of Daily Living (ADL) scale, the PDQ-39, the PD Non-motor Symptoms (NMS) questionnaire and the NMS survey are the most commonly used PD rating scales. Reliable diagnosis of PD using these scales is difficult, especially in its early stages [121]. As the symptoms of PD are comorbid with those of other neurological diseases, automatic approaches based on machine learning techniques are needed to increase the reliability of PD diagnosis.
Most machine learning approaches for PD diagnosis are based on detecting dysphonia and tremor symptoms. Nearly one third of PD patients exhibit a voice impairment. Most vocal-feature experiments use a data set created in [122], consisting of sustained vowel phonations from 31 people, of whom 23 have PD. In [123], kernel support vector machines were used, with a search to select the best kernel width and penalty value. In [124], a Support Vector Machine (SVM) was used with features selected by the minimum Redundancy Maximum Relevance (mRMR) criterion. For mRMR, all the available samples are used in the mutual information computations. Also, the above two approaches use all the available data samples to optimize the SVM parameters, which is inevitable when working with such a small data set. In [125], four independent classification approaches (neural networks, data mining neural, logistic regression and decision trees) are compared for the diagnosis of PD. Among the four approaches, the neural network classifier (a multi-layer feed-forward neural network) performed best. A drawback of the above neural network approach is the random initialization of weights and the heuristic determination of the network size, which affect the performance significantly. Also, it requires retraining when the training samples change over time. In [126], a parallel neural networks approach is used for the prediction of PD. The training time and complexity of the parallel network approach increase as the number of parallel networks increases. In [127], an adaptive neuro-fuzzy classifier with linguistic hedges is used for feature selection and classification. Linguistic hedges feature selection requires an optimization search over a huge set of theoretically possible encoding combinations for each feature and is hence computationally intensive. In [128], a fuzzy c-means clustering-based feature weighting with k-NN classification approach is presented. The choice of the number of clusters is critical in this approach; in
addition, increasing the number of training samples may further complicate the choice of the number of clusters.
PD patients exhibit large gait variability compared to normal persons [129]. Gait analysis is routinely used in clinical settings to assess these gait disorders. Gait analysis is a systematic study of human motion from measurements of the spatial-temporal parameters of the gait cycle, the motion of joints and segments, force/moments, and electromyography patterns of muscle activation. Machine learning approaches using different gait features have been reported in the literature for PD classification [130, 131, 121]. In [130], image data is obtained from plantar pressure measurements of the right foot during heel-to-toe motion for 17 controls and 21 PD patients, and SVM is applied to distinguish the gait patterns. Here, other important basic, kinetic and kinematic features are not used in the gait analysis. In [131], a data set from 20 controls and 12 PD patients consisting of basic spatiotemporal, kinematic and kinetic gait features is used, and the ability of ANN and SVM classifiers is discussed. The above two studies on gait features use their own proprietary data, with a small number of subjects. In [121], data collected from sensors located under the feet of 73 controls and 93 PD patients is used. In their approach, a wavelet transform has been employed to extract the relevant features, and neural networks with weighted fuzzy membership functions have been used to approximate the functional relationship between the gait features and the class labels.
Recent studies on gene expression analysis have found that there is a profound change in gene expression for individuals affected by PD [132]. These studies discovered that diagnosis of early-stage PD using vocal and gait features is impossible, because tremor symptoms appear only after a significant portion of the vulnerable dopaminergic neurons in the substantia nigra have already died [132]. However, machine learning approaches for PD classification based on gene analysis have been limited. Therefore, there is a need to devise a new machine learning approach for PD diagnosis based on gene expression features.
Over the past two decades, neuroimaging techniques such as Positron Emission Tomography (PET), Single Photon Emission Computed Tomography (SPECT), Magnetic Resonance Imaging (MRI) [135] and Transcranial Brain Sonography (TCS)
[136] have increasingly been employed to predict PD, to understand the neuropathological complications, and to monitor disease progression [137]. MRI is far more widely available than PET and SPECT and is most commonly used in clinical practice to differentiate PD patients from normal persons [135]. However, machine learning approaches for PD classification based on MRI scans have been limited. Therefore, there is a need to devise a new machine learning approach for PD diagnosis based on MRI scans.
8.2 Materials
We have considered four possible ways to diagnose PD using the PBL-McRBFN classifier: based on microarray gene expression, MRI, vocal and gait features. The gene expression data set used in our study is available in the ParkDB database [138] under the accession number E-GEOD-6613. The ParkDB database is the first complete set of re-analyzed, curated and annotated microarray data sets. The considered data set was obtained by transcriptional profiling of RNA extracted from the whole blood of 50 PD patients at an early stage and 22 controls [132]. The 22283 oligonucleotide probe sets (short sections of genes) on the microarrays are used to analyze the difference in the expression of genes between PD patients and controls. The Robust Multi-array Analysis (RMA) method in the Limma package [139] is used to normalize and summarize the probe intensity measurements. Thus, the complete gene expression data set contains 22283 features for each of the 72 subjects.
T1-weighted MR images were selected from the PPMI data set. We have considered the 239 persons (112 normal persons and 127 PD patients) available in the data set as of April 2012. Among the 239 persons, the MR images of 31 normal persons and 34 PD patients were excluded due to bad quality. Whole brain T1-weighted 3D MPRAGE MR images were acquired using at least a 1.5 Tesla scanner with a repetition time between 5-11 ms and an echo delay time between 2-6 ms. The acquired volumes have a slice thickness ranging from 1-1.5 mm and voxel dimensions of 1.0 mm × 1.0 mm × 1.2 mm. Detailed information on the MRI protocols and preprocessing steps is presented in [140]. The demographics for the data used in our study are given in Table 8.1.
Table 8.1: Demographic information of PPMI MRI data used in our study
Group Normal Persons PD Patients
The vocal data set created by Max Little [122] has been used for PD classification by detecting dysphonia. The recordings consist of 195 entries collected from 31 people, of whom 23 suffer from PD. Of the 195 samples, 147 are from PD patients and 48 from controls. An average of six phonations was recorded from each subject, ranging from 1 to 36 seconds in length. The 22 attributes used in this prediction task can be broadly classified into jitter (variation in fundamental frequency), shimmer (variation in amplitude) and other measures of dysphonia.
A gait data set is used to distinguish between PD and normal gait. This data set consists of 166 samples, containing measures of gait from 93 PD patients and 73 controls. The data set was collected using 8 sensors underneath each foot and includes the vertical ground reaction force records of the subjects as they walked at their usual, self-selected pace for approximately 2 minutes on level ground. The 10 attributes used in this prediction task are: left swing interval (sec), right swing interval (sec), left swing interval (% of stride), right swing interval (% of stride), double support interval (sec), double support interval (% of stride), left stride variability, right stride variability, cadence and speed. The 4 swing interval measures and the 2 double support interval measures are ranked as the top attributes with maximum relevance and minimum redundancy for gait analysis [142].
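As a rough illustration of such relevance ranking (scikit-learn's mutual information estimator stands in for the relevance half of the mRMR criterion of [142]; the redundancy term is omitted and the data are random stand-ins):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

gait_names = ["left_swing_s", "right_swing_s", "left_swing_pct",
              "right_swing_pct", "dsupport_s", "dsupport_pct",
              "left_stride_var", "right_stride_var", "cadence", "speed"]

# Stand-in data: 166 samples x 10 gait attributes with binary PD labels.
X = np.random.rand(166, 10)
y = np.random.randint(0, 2, 166)

# Rank the attributes by estimated mutual information with the class label.
relevance = mutual_info_classif(X, y, random_state=0)
for name, score in sorted(zip(gait_names, relevance), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```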
First, we present the PBL-McRBFN classifier performance for PD classification using the microarray gene expression data set. The PBL-McRBFN classifier performance is evaluated using gene expression features in two scenarios, as shown in Fig. 8.1. In the first scenario, its performance is evaluated on ICA reduced features from the complete set of 22283 genes, as shown in Fig. 8.1(a). Next, its performance is evaluated on ICA reduced features from the selected 1594/412 genes with significance levels p < 0.05/0.01, as shown in Fig. 8.1(b). We have conducted 10 random trials of experiments for every ICA reduced feature set. In each trial, 75% of the total samples are randomly selected for training and 25% for testing. The classification performance of PBL-McRBFN is compared with that of the SVM classifier.
The high dimensionality of the complete gene expression data set affects the classifier performance in PD prediction. Hence, we select the most informative genes based on p-value selection from the ParkDB database.
Figure 8.1: PBL-McRBFN classifier on ICA reduced features from: (a) the complete 22283 genes, reduced by ICA to 10/25/50 features; (b) the selected 1594/412 genes (p < 0.05/0.01), reduced by ICA to 10/25/50 features. In both pipelines the PBL-McRBFN classifier outputs PD/non-PD.
When less stringent constraints are incorporated, with a gene fold change greater than 1.5 (on a binary logarithmic scale) and a p-value less than 0.05, 1594 genes are extracted. When more stringent constraints are incorporated, with the same fold change (1.5) and an increased significance level (p-value less than 0.01), 412 genes are extracted. The expression features of the above two sets of selected genes for the same 72 subjects constitute the selected gene expression data sets.
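A minimal sketch of this gene filtering step follows; reading the threshold as |log2 fold change| >= log2(1.5) is an assumption, and the per-gene statistics are random stand-ins:

```python
import numpy as np

def select_genes(log2_fc, pvals, p_cut=0.05, fc_cut=1.5):
    """Select genes by fold change and significance.

    log2_fc: per-gene log2 fold change between PD and control groups;
    pvals: per-gene p-values. The fold-change test is interpreted here as
    |log2 fc| >= log2(fc_cut), one reading of 'fold change greater than
    1.5 on a binary logarithmic scale'.
    """
    keep = (np.abs(log2_fc) >= np.log2(fc_cut)) & (pvals < p_cut)
    return np.where(keep)[0]

# Stand-in statistics for the 22283 probe sets.
fc = np.random.randn(22283)
p = np.random.rand(22283)
loose = select_genes(fc, p, p_cut=0.05)   # analogous to the 1594-gene set
strict = select_genes(fc, p, p_cut=0.01)  # analogous to the 412-gene set
```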
However, as the feature space of the complete gene expression data set and the selected gene expression data sets is high compared to the number of samples, it is difficult to predict PD accurately. Hence, the obtained complete and selected gene expression features are transformed using ICA such that the components of the transformed data are statistically as independent from each other as possible. ICA can be applied for blind source separation, exploratory data analysis and feature extraction. Feature extraction using ICA is a promising application: the extracted feature vectors from the ICA analysis are as independent from each other as possible, i.e., the extracted features do not contain mutual information about other features.
In our study, we employ the FastICA (fixed-point) algorithm [107]; the FastICA package for MATLAB [108] is used to reduce the complete and selected gene expression features to 10, 25 and 50 features.
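The thesis performs this step with the MATLAB FastICA package; the Python sketch below, using scikit-learn's FastICA, is an assumed equivalent shown only for illustration:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Stand-in gene expression matrix: 72 subjects x 1594 selected genes.
X = np.random.rand(72, 1594)

# Reduce to 10 independent components, mirroring the 10/25/50 settings.
ica = FastICA(n_components=10, random_state=0)
X_reduced = ica.fit_transform(X)  # shape: (72, 10)
```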
The mean and standard deviation of the testing efficiencies and the F-score obtained from 10 random trials of experiments with PBL-McRBFN and SVM, using the ICA reduced feature data sets from the complete and selected gene expression data sets, are presented in Tables 8.2-8.4. From Tables 8.2-8.4, it is evident that the PBL-McRBFN generalization performance is better than that of the SVM classifier on both the complete and the selected gene expression data sets.
Table 8.2: Performance comparison on the complete gene expression data set from an average of 10 trials (columns: features type, number of features, algorithm, and testing overall efficiency ηo %, average efficiency ηa % and F-score, each reported as mean (std))
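For reference, the forms assumed here for the reported measures are the conventional ones (the thesis defines them formally in an earlier chapter):

```latex
\eta_o = \frac{\sum_{c=1}^{C} q_{cc}}{N} \times 100, \qquad
\eta_a = \frac{1}{C} \sum_{c=1}^{C} \frac{q_{cc}}{N_c} \times 100, \qquad
F\text{-score} = \frac{2PR}{P + R}
```

where q_cc is the number of correctly classified samples of class c, N_c the number of test samples in class c, N the total number of test samples, and P and R the precision and recall of the positive class.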
On the complete gene expression data set, the PBL-McRBFN classifier achieves better generalization performance with 10 ICA reduced features than with 25 or 50 features, as shown in Table 8.2. From Table 8.2, we can see that on 10 ICA reduced features, the ηa of PBL-McRBFN is 8% more than that of SVM, with a better F-score value. The corresponding comparisons on the 25 and 50 feature sets relative to the PBL-McRBFN classifier are also given in Table 8.2. We can also see that the PBL-McRBFN performance on the original 22283 features is higher than on the ICA reduced features and 2% more than the SVM performance.
On the selected gene expression data set with p-value < 0.05, the PBL-McRBFN classifier achieves better generalization performance with 10 ICA reduced features than with 25 or 50 features, as shown in Table 8.3. From Table 8.3, we can see that on 10 ICA reduced features, the ηa of PBL-McRBFN is 7% more than that of SVM, with a better F-score value. The better performance of both classifiers on the selected gene expression data set with p-value < 0.05 can be observed when compared to the performance on the complete gene expression data set. The ηa of PBL-McRBFN on the selected gene expression data set with p-value < 0.05 is 13% more than on the complete gene expression data set; this is due to the presence of more redundant gene information relative to PD in the complete gene expression data set.
Table 8.3: Performance comparison on the selected gene expression data set with p-value < 0.05 from an average of 10 trials (columns: features type, number of features, algorithm, and testing overall efficiency ηo, average efficiency ηa and F-score, each reported as mean (std))
On the selected gene expression data set with p-value < 0.01, the PBL-McRBFN classifier achieves better generalization performance with 10 ICA reduced features than with 25 or 50 features, as shown in Table 8.4. From Table 8.4, we can see that on the 10 ICA reduced features data set, the ηa of PBL-McRBFN is 30% more than that of SVM, with a better F-score value. A minor reduction in the performance of both classifiers on the selected gene expression data set with p-value < 0.01 can be observed when compared to the performance on the selected gene expression data set with p-value < 0.05, while the performance is better when compared to that on the complete gene expression data set. The ηa of PBL-McRBFN on the
selected gene expression data set with p-value < 0.01 is 1% less than on the selected gene expression data set with p-value < 0.05; this is due to the absence of a few informative genes relative to PD in the selected gene expression data set with p-value < 0.01.
Table 8.4: Performance comparison on the selected gene expression data set with p-value < 0.01 from an average of 10 trials (columns: features type, number of features, algorithm, and testing overall efficiency ηo, average efficiency ηa and F-score, each reported as mean (std))
From Tables 8.2 to 8.4, we can see that the PBL-McRBFN classifier achieves its best performance on the 10 ICA features data set obtained from the selected gene expression data set with p-value < 0.05. The ηa of the PBL-McRBFN classifier on the 10 ICA features data set obtained from the selected gene expression data set with p-value < 0.05 is 97.17%, which is 1% more than the ηa of the PBL-McRBFN classifier on the selected gene expression data set with p-value < 0.05 without ICA feature reduction (96.87%). The PBL-McRBFN classifier performance on all three gene expression data sets without ICA feature reduction is the same. Thus, we can observe that the changes in the performance of the PBL-McRBFN classifier on the three gene expression data sets with ICA reduced features are due to the poor performance of ICA. We can also see that the PBL-McRBFN performance on the original 1594 and 412 features is higher than on the ICA reduced features and better when compared to the SVM performance.
Next, we present the PBL-McRBFN classifier performance for PD classification using the MRI data set. For the first time in the literature, we conducted a PD prediction study using MRI scans; hence, no study is available in the literature on PD prediction using MRI for comparison. PBL-McRBFN is used for PD classification, and the RFE method is used for feature selection. RFE uses the training algorithm (PBL-McRBFN) recursively to eliminate irrelevant features, discarding in each step the least important feature, whose elimination has the least effect on the classification performance.
VBM is used to detect the gray matter differences between PD patients and normal persons, and to extract morphometric features from the MRI scans. The VBM analysis used in this study is as described in Section 7.2.2. The flow diagram of the feature extraction process is shown in Fig. 8.2. For a better understanding, we show the maximum intensity projections of the significant voxels in the sagittal, coronal and axial views in Fig. 8.3.
To locate the above regions with respect to their spatial locations in the brain, these regions were overlaid on the sliced sections of the commonly used MNI brain template, and the results are shown in Fig. 8.4. From Fig. 8.3 and Fig. 8.4, it is inferred that there are significant gray matter volume differences in the superior temporal gyrus, middle temporal gyrus, parahippocampal gyrus, sub-gyral and insula regions of the brain, which have also been highlighted in the medical literature [144].
The voxel locations of the VBM detected significant regions are used as a mask in order to extract the features from all the segmented gray matter images. The feature extraction process computes a vector of the gray matter segmentation values at the detected voxel locations for each subject.
Figure 8.2: Schematic diagram of the stages in feature extraction based on the VBM
analysis.
A total of 2981 features (gray matter tissue probability values) are extracted from the VBM identified regions and are then used as input to the PBL-McRBFN classifier. However, as the feature space of the VBM detected feature set is high compared to the number of samples, the VBM features are further reduced statistically by ICA [107]. In our study, we employ the FastICA (fixed-point) algorithm [107]; the FastICA package for MATLAB [108] is used to obtain the reduced feature sets. The PBL-McRBFN classifier performance is evaluated on both the complete VBM features and the ICA reduced features. We have conducted 10 random trials of experiments for the VBM feature set and for every ICA reduced feature set. In each trial, 75% and 25% of the samples are randomly chosen for training and testing, respectively. The classification performance obtained during the 10 random trials for the PBL-McRBFN and SVM classifiers on
Figure 8.3: Maximum intensity projections from PPMI MRI data set - Normal persons
vs. PD patients (a) sagittal view (b) coronal view (c) axial view
the 2981 VBM feature set are presented in Table 8.5. From Table 8.5, we can see that the testing accuracy of PBL-McRBFN is 3% more than that of SVM, with better sensitivity and specificity values. Thus, the PBL-McRBFN classifier performs an efficient classification of the VBM morphometric features from MRI scans for the prediction of PD.
Table 8.5: Performance comparison on the 2981 VBM features data set from an average of 10 trials (mean (std); first three columns training, last three testing)

Algorithm | Train Acc | Train Sens | Train Spec | Test Acc | Test Sens | Test Spec
SVM | 96.40 (2.18) | 98.52 (1.55) | 94.00 (3.61) | 79.06 (3.63) | 83.04 (6.30) | 74.5 (9.55)
PBL-McRBFN | 93.04 (2.45) | 95.88 (2.92) | 89.83 (6.20) | 82.32 (2.50) | 83.47 (6.41) | 81.00 (4.59)
Figure 8.4: Gray matter volume change from PPMI MRI data set - Normal persons vs.
PD patients (a) sagittal view (b) coronal view (c) axial view
Next, the PBL-McRBFN classifier performance is evaluated on the ICA reduced features from the VBM analysis. The 2981 morphometric features extracted by VBM were reduced to different combinations of 10 and 50 features using FastICA [108]. The mean and standard deviation of the classification performance obtained during the 10 random trials for the PBL-McRBFN and SVM classifiers on the different ICA reduced feature sets are presented in Table 8.6. In each trial, 75% of the total samples are randomly selected for training and 25% for testing. From Table 8.6, we can see that the PBL-McRBFN classifier achieves better generalization performance with 10 ICA reduced features than with 50 ICA reduced features. On 10 ICA reduced features, the testing accuracy of PBL-McRBFN is 4% more than that of SVM, with better sensitivity and specificity values. Similarly, on 50 ICA reduced features, the testing accuracy of PBL-McRBFN is 5% more than that of the SVM classifier, with better sensitivity and specificity values.
Table 8.6: Performance comparison on the ICA reduced features data sets from an average of 10 trials (mean (std); first three columns training, last three testing)

# Features | Algorithm | Train Acc | Train Sens | Train Spec | Test Acc | Test Sens | Test Spec
10 | SVM | 94.76 (2.52) | 97.94 (1.02) | 91.16 (4.97) | 67.44 (3.46) | 75.65 (8.50) | 58.00 (10.85)
10 | PBL-McRBFN | 93.28 (2.18) | 94.55 (3.18) | 91.83 (4.04) | 71.39 (4.25) | 63.91 (12.64) | 80.00 (9.71)
50 | SVM | 98.35 (2.88) | 98.52 (2.77) | 98.16 (3.08) | 63.25 (3.06) | 67.82 (8.24) | 58.00 (8.88)
50 | PBL-McRBFN | 92.50 (9.69) | 92.79 (9.59) | 92.16 (1.27) | 68.83 (2.94) | 68.26 (7.68) | 69.50 (10.12)
From Tables 8.5 and 8.6, it is observed that the PBL-McRBFN classifier gives better results with the 2981 morphometric features extracted from the VBM analysis than with the reduced features from the VBM and ICA analysis. The PBL-McRBFN classification accuracy on the VBM feature set is 82.32%, which is 11% more than the classification accuracy on the ICA reduced 10 feature set; this is because the considered binary classification problem consists of a small number of MRI volumes with high dimensional VBM features of PD patients and normal persons.
Next, we identify the imaging biomarkers for PD using the PPMI MRI data set. In the previous section, the PBL-McRBFN classifier performance was evaluated on the features obtained from the VBM analysis and the further reduced ICA features. VBM involves voxel-wise statistical analysis of MRI volumes and infers regions in which brain volume differs between PD and normal persons. All the inferred regions from VBM may not be useful for predicting PD, and the further reduced ICA features do not provide any information on the critical brain regions relevant to PD. In this study, we have conducted an analysis to identify the most significant brain regions (imaging biomarkers) responsible for PD.
The goal of feature selection is to remove as many irrelevant and redundant features as possible and to find a feature subset such that, with the reduced low dimensional data, a machine learning classifier can achieve better performance. Filter and wrapper methods are two kinds of well-known feature selection techniques for high dimensional data [113]. In the filter method, features are selected on the basis of the feature separability of the training samples, which is independent of the learning algorithm. The separability only takes into account the correlations between the features, so the selected features may not be optimal. Wrapper methods search for critical features based on the learning algorithm, and often give better results than filter methods. RFE is a computationally less intensive wrapper based feature selection method. In this study, we used RFE feature selection utilizing a PBL-McRBFN classifier. The PBL-McRBFN-RFE algorithm conducts feature selection in a sequential elimination manner, starting with all the features and discarding one feature at a time.
Table 8.7: VBM detected and PBL-McRBFN-RFE selected regions responsible for PD
The VBM analysis detected a total of 2981 features. The brain regions corresponding to the 2981 VBM detected features are reported in Table 8.7. The mean testing performance of PBL-McRBFN on the complete 2981 feature set is given in Table 8.8. To identify the most significant brain regions responsible for PD, the minimal set of features is found using PBL-McRBFN-RFE. After performing feature selection using PBL-McRBFN-RFE, 19 features were selected; the brain regions corresponding to the 19 PBL-McRBFN-RFE selected features are reported in Table 8.7. MNI templates of the complete 2981 and the selected 19 feature regions are shown in Fig. 8.5. The mean
testing performance of PBL-McRBFN on the selected 19 feature set is 87.21%, as shown in Table 8.8. From Table 8.8, we can see that the 19 features selected by the PBL-McRBFN-RFE approach give a better prediction rate than the 2981 features.
Table 8.8: Performance comparison of the VBM detected and PBL-McRBFN-RFE selected features (mean (std); first three columns training, last three testing)

# Features | Train Acc | Train Sens | Train Spec | Test Acc | Test Sens | Test Spec
2981 (VBM) | 93.04 (2.45) | 95.88 (2.92) | 89.83 (6.20) | 82.32 (2.50) | 83.47 (6.41) | 81.00 (4.59)
19 (VBM + PBL-McRBFN-RFE) | 92.50 (9.69) | 92.79 (9.59) | 92.17 (12.77) | 87.21 (3.67) | 87.39 (4.32) | 87.00 (10.32)
Figure 8.5: Comparison of gray matter volume change - Normal persons vs. PD patients
in Superior temporal gyrus region
All the 19 selected features are located in the superior temporal gyrus brain region. The superior temporal gyrus is one of three (sometimes two) gyri in the temporal lobe of the human brain and is involved in auditory processing, including language and social cognition. The superior temporal gyrus is consistently reported in medical research studies as a biomarker of PD [137, 143, 144]. Hence, the brain region detected by PBL-McRBFN-RFE among the VBM features from MRI scans may be highly relevant to the prediction of PD.
Table 8.9: Performance comparison on the vocal data set from an average of 10 trials (mean (std))

Algorithm | Overall (ηo) | Average (ηa) | F-score
SVM | 96.94 (2.40) | 96.67 (2.16) | 0.9221 (0.0286)
PBL-McRBFN | 98.97 (1.07) | 99.35 (0.68) | 0.9934 (0.0069)
Next, we present the PD detection performance using vocal features. PD causes an impairment in the normal production of vocal sounds, known as dysphonia. The voice of people with dysphonia sounds hoarse, strained or effortful. Telemonitoring of PD using measurements of dysphonia has a vital role in its early diagnosis, as the symptoms of PD occur gradually and mostly target elderly people, for whom physical visits to the clinic are difficult.
The mean and standard deviation of the testing efficiencies and the F-score obtained during the 10 random trials with the PBL-McRBFN and SVM classifiers are presented in Table 8.9. In each trial, 75% of the total samples are randomly selected for training and 25% for testing. From Table 8.9, we can see that on the vocal data set, the ηa of PBL-McRBFN is 3% more than that of the SVM classifier, with a better F-score value. The results reported in the literature on the same vocal data set are given in Table 8.10. From Table 8.10, it is evident that PBL-McRBFN performs better than the best results reported in the literature on the same vocal data set. On the 50-50% train-test combination, the best PBL-McRBFN accuracy is higher than that of the k-NN approach with fuzzy c-means clustering (97.93%) [128]. Thus, the PBL-McRBFN classifier performs an efficient classification of the vocal features for the prediction of PD.
Finally, we present the PD detection performance using gait features. Gait is a person's manner of movement during walking. Gait analysis is a systematic study of human motion from the measurements of spatial-temporal parameters of the gait cycle, the motion of joints and segments, force/moments, and electromyography patterns of muscle activation.
Table 8.10: PBL-McRBFN classifier performance comparison with studies in the literature on the vocal data set (columns: study, algorithm, testing accuracy)
Table 8.11: Performance comparison on the gait data set from an average of 10 trials (mean (std))

Algorithm | Overall (ηo) | Average (ηa) | F-score
SVM | 77.56 (3.78) | 77.37 (3.68) | 0.7803 (0.0508)
PBL-McRBFN | 83.90 (2.86) | 84.36 (2.42) | 0.8519 (0.0340)
Table 8.12: PBL-McRBFN classifier performance comparison with studies in the literature using gait patterns (columns: study, algorithm, testing accuracy)
The mean and standard deviation of the testing efficiencies and the F-score obtained during the 10 random trials for the PBL-McRBFN and SVM classifiers are presented in Table 8.11. In each trial, 75% of the total samples are randomly selected for training and 25% for testing. From Table 8.11, we can see that the ηa of PBL-McRBFN is 7% more than that of SVM, with a better F-score value. The results reported in the literature on the same gait feature data are given in Table 8.12. From Table 8.12, we can see that on the same gait data, the PBL-McRBFN accuracy is approximately 5% more than that of the neural network approach with weighted fuzzy membership functions on wavelet based feature extraction [121]. Thus, the PBL-McRBFN classifier performs an efficient classification of the gait features for the detection of PD.
135
Chapter 8. Parkinson's Disease Diagnosis using PBL-McRBFN
8.7 Summary
In this chapter, the PD diagnosis problem was solved by employing the PBL-McRBFN classifier. The early diagnosis of PD based on micro-array gene expression and MRI features, and the detection of PD based on vocal and gait features, were presented. The quantitative comparison with the SVM classifier and existing results in the literature clearly indicates the superior performance of PBL-McRBFN in classifying individuals with or without PD. Imaging biomarkers responsible for PD are detected with the PBL-McRBFN-RFE approach using the PPMI MRI data set. Identification of genes responsible for PD using feature selection techniques will help doctors to track the development of the disease.
In the next chapter, we shall summarize the work done in this thesis and conclude it with directions for future work.
Chapter 9
Conclusions and Future Works
9.1 Conclusions
This thesis focuses on the development and application of meta-cognitive sequential learning algorithms in radial basis function networks for classification problems with fewer samples, a high dimensional feature set, and high sample imbalance. For the first time in the literature, human meta-cognition principles are integrated into a radial basis function network. Human-like self-regulated learning helps the radial basis function network to achieve better generalization performance, and the developed classifiers are applied to the early diagnosis of neurodegenerative diseases, namely Alzheimer's disease and Parkinson's disease. To summarize, the major contributions of this thesis are:
(a) Development of an Extended Kalman Filter based Meta-cognitive Radial Basis Function Network (EKF-McRBFN) and its sequential learning algorithm for classification problems.
(b) Development of a Projection Based Learning algorithm for the Meta-cognitive Radial Basis Function Network (PBL-McRBFN) and its application to Alzheimer's disease diagnosis.
(c) Development of a PBL-McRBFN based Recursive Feature Elimination (PBL-McRBFN-RFE) approach for imaging biomarkers detection of Alzheimer's disease based on MRI scans.
(d) Early diagnosis of Parkinson's disease based on micro-array gene expression, MRI scans, gait and vocal features, and application of the PBL-McRBFN-RFE approach to identify imaging biomarkers for Parkinson's disease.
First, a Meta-cognitive Radial Basis Function Network (McRBFN) has been developed, and its sequential learning algorithm has been derived using the Extended Kalman Filter (EKF). The McRBFN using EKF is referred to as 'EKF-McRBFN'. The McRBFN has a cognitive and a meta-cognitive component, where the meta-cognitive component monitors and controls the learning ability of the cognitive component. A radial basis function network with the Gaussian activation function acts as the cognitive component. The cognitive component begins with zero hidden neurons and adds a neuron or updates an existing neuron based on the learning strategy chosen by the meta-cognitive component for every sample in the training data set. Thus, the meta-cognitive component controls the learning of the cognitive component. When a neuron is added, the parameters of the new neuron are initialized based on the sample overlapping conditions, and when a neuron is updated, its parameters are updated using the EKF. The meta-cognitive component decides what-to-learn, when-to-learn and how-to-learn, and can postpone the learning of a sample by reserving the sample for future use. The performance of EKF-McRBFN has been studied on benchmark classification problems.
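As a rough illustration of this control loop, the following Python sketch shows how a meta-cognitive component might choose among the sample-delete, neuron-growth, parameter-update and sample-reserve strategies. The thresholds and the predict_error monitoring helper are hypothetical stand-ins, not the exact criteria used by EKF-McRBFN.

def metacognitive_step(network, sample, reserve,
                       delete_thr=0.05, update_thr=0.1, grow_thr=0.8):
    # One self-regulated learning step (illustrative only): the strategy
    # is chosen from the prediction error on the incoming sample.
    x, y = sample
    error = network.predict_error(x, y)    # hypothetical monitoring signal
    if error < delete_thr:
        return 'sample-delete'             # nothing new: discard the sample
    if error > grow_thr:
        network.add_neuron(x, y)           # novel sample: grow a hidden neuron
        return 'neuron-growth'
    if error > update_thr:
        network.update_parameters(x, y)    # e.g., an EKF parameter update
        return 'parameter-update'
    reserve.append(sample)                 # learn later: sample-reserve
    return 'sample-reserve'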
Next, a Projection Based Learning (PBL) algorithm has been developed for the McRBFN, and the resulting classifier is referred to as 'PBL-McRBFN'. When a new neuron is added to the network, the projection based learning algorithm of PBL-McRBFN initializes the input parameters based on the distance criterion, and computes the optimum output weights by minimizing a hinge-loss error function. The problem of finding the optimal output weights is formulated as a linear system, and the output weights are obtained by solving a set of simultaneous linear equations. While adding a new neuron, the existing neurons are used as pseudo-samples to represent the knowledge of the past samples. Thus, the PBL-McRBFN explicitly uses the input-output relationship defined by the training data set.
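The closed-form computation of the output weights can be sketched as follows. This is a minimal least-squares illustration of solving a set of simultaneous linear equations for the output weights under an assumed Gaussian hidden layer; it omits the hinge-loss weighting and the pseudo-sample handling of the actual PBL algorithm.

import numpy as np

def gaussian_hidden_output(X, centers, widths):
    # Hidden response H[i, k] = exp(-||x_i - mu_k||^2 / sigma_k^2)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / widths[None, :] ** 2)

def pbl_output_weights(X, Y, centers, widths):
    # Solve (H^T H) W = H^T Y for the output weight matrix W,
    # where Y is a coded class-label matrix (e.g., +1/-1 columns).
    H = gaussian_hidden_output(X, centers, widths)
    return np.linalg.solve(H.T @ H, H.T @ Y)   # simultaneous linear equations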
The performance of PBL-McRBFN has been studied on the problem of AD detection from MRI scans using the OASIS [101] and ADNI [102] data sets. Study results show that PBL-McRBFN accurately discriminates AD patients from normal subjects in both the OASIS and ADNI data sets. It is observed from the results that the VBM features give better performance than ICA reduced features, as the data sets contain very mild AD patients and fewer samples. Thus, from the studies conducted on the OASIS and ADNI data sets, we can infer that human meta-cognitive principles in machine learning algorithms improve classification performance. The generalization capability of PBL-McRBFN has also been studied by training it on the OASIS data set and testing its performance on the ADNI data set. Performance results on this cross-data-set study also indicate the better performance of PBL-McRBFN.
Next, a PBL-McRBFN based Recursive Feature Elimination (PBL-McRBFN-RFE) approach has been proposed to identify imaging biomarkers for AD. Imaging biomarkers responsible for AD are detected with the proposed PBL-McRBFN-RFE approach using the OASIS data set. The imaging biomarkers identified using the PBL-McRBFN-RFE approach are in the parahippocampal gyrus, the hippocampus, the superior temporal gyrus, the insula, the precentral gyrus and the extra-nuclear regions. These regions are also indicated in existing AD research studies.
Next, the PBL-McRBFN-RFE approach has also been used to identify imaging biomarkers for AD from gender-wise and age-wise analyses of the OASIS data set. The results reveal age-specific and gender-specific patterns of gray matter atrophy. In the 60-69 age group AD patients, gray matter atrophy is observed in the superior temporal gyrus region, which is responsible for processing sounds. In the 70-79 age group AD patients, gray matter atrophy is observed in regions including the extra-nuclear regions, which are responsible for memory encoding and retrieval. In the 80-89 age group AD patients, gray matter atrophy is observed in the hippocampus, the parahippocampal gyrus and the lateral ventricle regions, which are also responsible for memory encoding and retrieval. In male AD patients, gray matter atrophy is observed in the insula region, which is responsible for emotion and consciousness. The insula region is also reported in AD research studies [117, 118]. In female AD patients, gray matter atrophy is observed in the parahippocampal gyrus and the extra-nuclear regions, which are responsible for memory encoding and retrieval.
Finally, the PBL-McRBFN classifier has been applied to the diagnosis of PD based on micro-array gene expression, MRI scans, gait and vocal features. Here, the PBL-McRBFN classifier is evaluated on data sets for each of these feature types.
In comparison with existing results in the literature, the performance results from the above studies on these data sets show that the PBL-McRBFN performs better than the existing approaches. Finally, imaging biomarkers responsible for PD are detected with the proposed PBL-McRBFN-RFE feature selection approach using the PPMI MRI data set. The PBL-McRBFN-RFE results show that the superior temporal gyrus brain region plays a more significant role than others in detecting PD.
9.2 Future Works
9.2.1 Algorithms
Monitoring Signals :
The meta-cognitive component of the proposed networks uses the prediction error, the class-wise distance and the predicted class labels as the monitoring signals. However, in the literature of human meta-cognition, the feel-of-knowing has been used as the monitoring signal. The term feel-of-knowing (FOK) refers to a retrieval state in which an individual may fail to recall an item from memory but still feel that it would be recognized on a later criterion test. Incorporating an FOK-like monitoring signal into the meta-cognitive component is a possible direction for future work.
q-Gaussian Radial Basis Function :
The Cauchy radial basis function is preferred in applications like image retrieval [149] and CT image reconstruction [150]. The q-Gaussian radial basis function generalizes the Gaussian function through q-exponential expressions [152]. Thus, the modification of the q-parameter allows the basis function to change its shape (Gaussian, Cauchy, inverted multi-quadratic, etc.) and helps the q-Gaussian function to match the shape of the kernel and the distribution of the distances better [152]. The q-parameter helps the q-Gaussian RBF to reproduce different RBFs [152]. For example, when q → 1, the q-Gaussian converges to a Gaussian RBF, while q → 2 and q → 3 converge to a Cauchy RBF and an inverted multi-quadratic RBF, respectively. Thus, the q-Gaussian helps to realize different radial basis functions for different values of the parameter q. Therefore, it is desirable to employ an activation function like the q-Gaussian in the proposed networks, since it provides the flexibility to realize different radial basis functions within a single framework.
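A small sketch of the q-Gaussian activation follows. It is an illustrative implementation of the standard q-exponential form, with the width and normalization conventions assumed rather than taken from [152].

import numpy as np

def q_gaussian(r, sigma=1.0, q=1.0):
    # q-Gaussian radial basis function phi(r):
    #   q -> 1: Gaussian                  exp(-r^2 / sigma^2)
    #   q  = 2: Cauchy                    1 / (1 + r^2 / sigma^2)
    #   q  = 3: inverted multi-quadratic  1 / sqrt(1 + 2 r^2 / sigma^2)
    u = r ** 2 / sigma ** 2
    if np.isclose(q, 1.0):
        return np.exp(-u)                          # Gaussian limit
    base = np.maximum(1.0 - (1.0 - q) * u, 0.0)    # q-exponential argument
    return base ** (1.0 / (1.0 - q))

Varying q therefore morphs a single activation function smoothly between the Gaussian, Cauchy and inverted multi-quadratic shapes, which is exactly the flexibility discussed above.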
Feature Selection :
One important issue in the diagnosis of neurodegenerative diseases is the curse of dimensionality, where the data sets have fewer samples with a very high dimensional feature set. Hence, there is a need to select an appropriate feature set for better performance, and to select the best feature subset that is non-redundant and most relevant to the class distributions. We plan to work along this direction for better performance results.
9.2.2 Applications
Alzheimer's Disease Diagnosis :
In the present work, the PBL-McRBFN classifier has been used to predict AD patients from normal persons based on VBM features and ICA reduced features extracted from MRI scans. A similar approach can be used to predict Mild Cognitive Impairment (MCI) patients from normal persons. MCI is an early stage of AD and it increases the risk of developing AD. If one were able to successfully treat MCI such that the progression of these individuals to AD could be delayed by one year, there would be a significant saving.
Parkinson's Disease Diagnosis :
In this work, the PBL-McRBFN classifier has been used to predict PD patients from normal persons based on micro-array gene expression and MRI scans, and for PD detection based on gait and vocal features. Also, the PBL-McRBFN-RFE approach is used for the detection of imaging biomarkers based on MRI scans. A similar feature selection approach can be used to detect biomarkers based on gene expression features.
Publications List
Journals
1. G. Sateesh Babu, S. Suresh and B. S. Mahanand, A novel PBL-McRBFN-RFE approach for identification of critical brain regions responsible for Parkinson's disease, Expert Systems with Applications, vol. 41(2), pp: 478-488, 2014.
2. G. Sateesh Babu and S. Suresh, Sequential Projection-Based Metacognitive Learning in a Radial Basis Function Network for Classification Problems, IEEE Transactions on Neural Networks and Learning Systems, vol. 24(2), pp: 194-206, 2013.
3. G. Sateesh Babu and S. Suresh, Parkinson's Disease Prediction Using Gene Expression - A Projection Based Learning Meta-cognitive Neural Classifier Approach, Expert Systems with Applications, 2013.
4. G. Sateesh Babu and S. Suresh, Meta-cognitive RBF Network and Its Projection Based Learning algorithm for classification problems, Applied Soft Computing, vol. 13(1), pp: 654-666, 2013.
5. G. Sateesh Babu and S. Suresh, Meta-cognitive Neural Network for classification problems in a sequential learning framework, Neurocomputing, vol. 81, pp: 86-96, 2012.
Conference Proceedings
1. G. Sateesh Babu, S. Suresh and B. S. Mahanand, Meta-cognitive q-Gaussian RBF network for binary classification: Application to mild cognitive impairment (MCI), Intl. Joint Conf. Neural Networks (IJCNN), Dallas (Texas, USA), 2013.
2. G. Sateesh Babu and S. Suresh, Meta-cognitive Radial Basis Function Network for classification problems, Intl. Joint Conf. Neural Networks (IJCNN), Brisbane (Australia), pp: 2907-2914, 2012.
3. G. Sateesh Babu, S. Suresh and B. S. Mahanand, Alzheimer's disease detection using a Projection Based Learning Meta-cognitive RBF Network, Intl. Joint Conf. Neural Networks (IJCNN), Brisbane (Australia), pp: 408-415, 2012.
4. G. Sateesh Babu and S. Suresh, A Projection Based Learning Meta-cognitive RBF Network Classifier for effective diagnosis of Parkinson's disease, Intl. Conf. on Neural Information Processing (ICONIP), Lecture Notes in Computer Science, Springer Berlin Heidelberg, 2012.
Bibliography
Springer-Verlag, 2002.
[2] A. Wenden, Learner strategies for learner autonomy . Great Britain: Prentice Hall,
1998.
ment and self management among experienced language learners, Modern Language Journal, vol. 85, no. 2, pp. 279-290, 2001.
J. H. Flavell, Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry, American Psychologist, vol. 34, no. 10, pp. 906-911, 1979.
T. O. Nelson and L. Narens, Metamemory: A theoretical framework and new findings, Psychology of Learning and Motivation, vol. 26, pp. 125-173, 1990.
[9] S. Suresh, K. Dong, and H. Kim, A sequential learning algorithm for self adaptive resource allocation network classifier, Neurocomputing, vol. 73, no. 16-18, pp. 3012-3019, 2010.
R. Cuingnet et al., Automatic classification of patients with Alzheimer's disease from structural MRI: A comparison of ten methods using the ADNI database, NeuroImage, vol. 56, no. 2, pp. 766-781, 2011.
language of AD patients: A longitudinal study, Neurology, vol. 45, no. 2, pp. 299-302, 1995.
behavior problems in Alzheimer's disease, Neurology, vol. 54, no. 2, pp. 427-432, 2000.
[17] G. Sateesh Babu and S. Suresh, Meta-cognitive Neural Network for classification problems in a sequential learning framework, Neurocomputing, vol. 81, pp. 86-96, 2012.
[18] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, Extreme Learning Machine for Regression and Multiclass Classification, IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 42, no. 2, pp. 513-529, 2012.
[19] J. C. Platt, A resource allocation network for function interpolation, Neural Computation, vol. 3, no. 2, pp. 213-225, 1991.
for function approximation using minimal radial basis function neural networks,
tems, IEE Proceedings: Control Theory and Applications, vol. 147, no. 4, pp. 476-484, 2000.
learning algorithm for growing and pruning RBF (GAP-RBF) networks, IEEE Transactions on Systems, Man, and Cybernetics, Part B, Cybernetics, vol. 34, no. 6, pp. 2284-2292, 2004.
RBF network for classification problems, Neurocomputing, vol. 70, no. 16-18,
[26] G.-B. Huang, Q. Y. Zhu, and C. K. Siew, Extreme learning machine: A new learning scheme of feedforward neural networks, in Proc. IEEE Intl. Joint Conf. on Neural Networks (IJCNN), vol. 2, pp. 985-990, 2004.
[27] G.-B. Huang, D. Wang, and Y. Lan, Extreme Learning Machines: A Survey, International Journal of Machine Learning and Cybernetics, vol. 2, no. 2, pp. 107-122, 2011.
[28] N.-Y. Liang, G.-B. Huang, P. Saratchandran, and N. Sundararajan, A fast and accurate online sequential learning algorithm for feedforward networks, IEEE Transactions on Neural Networks, vol. 17, no. 6, pp. 1411-1423, 2006.
[30] G.-B. Huang, L. Chen, and C.-K. Siew, Universal approximation using incremental constructive feedforward networks with random hidden nodes, IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 879-892, 2006.
[31] G.-B. Huang and L. Chen, Convex incremental extreme learning machine, Neurocomputing, vol. 70, no. 16-18, pp. 3056-3062, 2007.
[32] G.-B. Huang and L. Chen, Enhanced random search based incremental extreme learning machine, Neurocomputing, vol. 71, no. 16-18, pp. 3460-3468, 2008.
[34] S. Soltic, S. Wysoski, and N. Kasabov, Evolving spiking neural networks for taste recognition, in Proc. IEEE Intl. Joint Conf. on Neural Networks (IJCNN), 2008.
[35] F. Alnajjar, I. Bin Mohd Zin, and K. Murase, A Spiking Neural Network with
learning for spiking neural networks: A review and new strategies, in 2010 IEEE 9th International Conference on Cybernetic Intelligent Systems (CIS), pp. 1-6, 2010.
[37] C. Cortes and V. Vapnik, Support vector networks, Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
[38] P. Laskov, C. Gehl, S. Krüger, and K.-R. Müller, Incremental Support Vector Learning: Analysis, Implementation and Applications, Journal of Machine Learning Research, vol. 7, pp. 1909-1936, 2006.
[40] J. Ma, J. Theiler, and S. Perkins, Accurate Online Support Vector Regression, Neural Computation, vol. 15, no. 11, pp. 2683-2703, 2003.
support vector machines, IEEE Transactions on Neural Networks, vol. 21, no. 7,
W. Liu, P. P. Pokharel, and J. C. Príncipe, The Kernel Least-Mean-Square Algorithm, IEEE Transactions on Signal Processing, vol. 56, no. 2, pp. 543-554, 2008.
[44] B. Chen, S. Zhao, P. Zhu, and J. C. Príncipe, Mean square convergence analysis for kernel least mean square algorithm, Signal Processing, vol. 92, no. 11, pp. 2624-2632, 2012.
[45] P. P. Pokharel, W. Liu, and J. C. Príncipe, Kernel least mean square algorithm with constrained growth, Signal Processing, vol. 89, no. 3, pp. 257-265, 2009.
Least Squares Algorithm, IEEE Transactions on Signal Processing, vol. 57, no. 10, pp. 3801-3814, 2009.
framework for spike train signal processing, Neural Computation, vol. 21, no. 2,
[49] B. Chen, S. Zhao, P. Zhu, and J. C. Príncipe, Quantized Kernel Least Mean Square Algorithm, IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 1, pp. 22-32, 2012.
[50] P. Zhu, B. Chen, and J. C. Príncipe, A novel extended kernel recursive least squares algorithm, Neural Networks, vol. 32, pp. 349-357, 2012.
[52] S. Zhao, B. Chen, P. Zhu, and J. C. Príncipe, Fixed budget quantized kernel least-mean-square algorithm, Signal Processing, vol. 93, no. 9, pp. 2759-2770, 2013.
Classifier using Radial Basis Function Networks, Neurocomputing, vol. 71, no. 7-
[54] T. Harris and R. Hodges, eds., The literacy dictionary: The vocabulary of reading and writing. Newark, DE: International Reading Association, 1995.
[55] L. R. Squire, Mechanisms of memory, Science, vol. 232, no. 4758, pp. 1612-1619, 1986.
convex risk minimization, Annals of Statistics, vol. 32, no. 1, pp. 56-85, 2003.
[58] B. Scholkopf and A.-J. Smola, Learning with Kernels. MIT Press, Cambridge, MA, 2002.
[59] H. Hoffmann, Kernel PCA for novelty detection, Pattern Recognition, vol. 40, no. 3, pp. 863-874, 2007.
[61] C. Blake and C. Merz, UCI repository of machine learning databases, University of California, Irvine, 1998.
[64] S. Suresh, S. N. Omkar, V. Mani, and T. N. Guru Prakash, Lift coefficient prediction at high angle of attack using recurrent neural network, Aerospace Science and Technology, vol. 7, no. 8, pp. 595-602, 2003.
[65] C.-C. Chang and C.-J. Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, pp. 27:1-27:27, 2011.
[66] J. Demsar, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research, vol. 7, no. 1, pp. 1-30, 2006.
[67] R. L. Iman and J. M. Davenport, Approximations of the critical region of the Friedman statistic, Communications in Statistics, vol. 9, no. 6, pp. 571-595, 1980.
[68] J. H. Zar, Biostatistical analysis (4th Ed.). Englewood Cliffs, New Jersey: Prentice-Hall, 1999.
[69] O. J. Dunn, Multiple comparisons among means, Journal of the American Statistical Association, vol. 56, no. 293, pp. 52-64, 1961.
Lewy body, vascular and frontotemporal dementia, and hippocampal sclerosis in the state of Florida brain bank, Alzheimer Disease and Associated Disorders, vol. 16,
probable Alzheimer's disease, Archives of Neurology, vol. 52, no. 7, pp. 659-664, 1995.
the use of NINCDS-ADRDA and DSM-III-R criteria, SPECT, X-ray CT, and Apo
classification of MR scans in Alzheimer's disease, Brain, vol. 131, no. 3, pp. 681-689, 2008.
Toga, Tracking Alzheimer's disease, Annals of the New York Academy of Sciences, vol. 1097, pp. 183-214, 2007.
[81] P. Vemuri and C. Jack Jr., Role of structural MRI in Alzheimer's disease,
ease and mild cognitive impairment applied on data from ADNI, Hippocampus, vol. 19, no. 6, pp. 579-587, 2009.
hippocampus in preclinical AD, Neurology, vol. 58, no. 8, pp. 1188-1196, 2002.
dementia and Alzheimer's disease, Neurology, vol. 52, no. 1, pp. 91-100, 1999.
cingulate cortex in first episode schizophrenia, Human Brain Mapping, vol. 29, no. 4, pp. 478-489, 2008.
[91] SPM8, Wellcome Trust Centre for Neuroimaging, Institute of Neurology, UCL, London.
Alzheimer's disease and mild cognitive impairment, NeuroImage, vol. 55, no. 3,
[95] Y. Fan, D. Shen, and C. Davatzikos, Classification of structural images via high-
Springer-Verlag, 2005.
[97] U. Yoon, J.-M. Lee, K. Im, Y.-W. Shin, B. H. Cho, I. Y. Kim, J. S. Kwon, and S. I. Kim, Pattern classification using principal components of cortical thickness and its discriminative pattern in schizophrenia, NeuroImage, vol. 34, no. 4, pp. 1405-1415, 2007.
data: Examining the assumptions, Human Brain Mapping, vol. 6, no. 5-6, pp. 368-372, 1998.
[99] L. Xu, G. Pearlson, and V. D. Calhoun, Joint source based morphometry identifies linked gray and white matter group differences, NeuroImage, vol. 44, no. 3, pp. 777-789, 2009.
[101] D. S. Marcus, T. H. Wang, J. Parker, J. G. Csernansky, J. C. Morris, and R. L. Buckner, Open Access Series of Imaging Studies (OASIS): Cross-sectional MRI data in young, middle aged, nondemented, and demented older adults, Journal of Cognitive Neuroscience, vol. 19, pp. 1498-1507, 2007.
[102] S. G. Mueller et al., The Alzheimer's Disease Neuroimaging Initiative, Neuroimaging Clinics of North America, vol. 15, no. 4, pp. 869-877, 2005.
initiative (ADNI): MRI methods, Journal of Magnetic Resonance Imaging, vol. 27, no. 4, pp. 685-691, 2008.
[106] M. García-Sebastián, A. Savio, M. Graña, and J. Villanúa, On the use of morphometry based features for Alzheimer's disease detection on MRI,
[107] A. Hyvarinen, Fast and robust fixed-point algorithms for independent component analysis, IEEE Transactions on Neural Networks, vol. 10, no. 3, pp. 626-634, 1999.
[108] H. Gavert, J. Hurri, J. Sarela, and A. Hyvarinen, The FastICA package for MATLAB.
[109] Y. Fan and D. Shen, Integrated feature extraction and selection for neuroimage classification, in Medical Imaging 2009: Image Processing, (Lake Buena Vista, FL, USA), p. 72591U, 2009.
[110] W. Yang, H. Xia, B. Xia, L. M. Lui, and X. Huang, ICA-based feature extraction
ADNI dataset, NeuroImage, vol. 48, no. 1, pp. 138-149, 2009.
[113] I. Guyon and A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research, vol. 3, pp. 1157-1182, 2003.
[118] D. Bonthius, A. Solodkin, and G. Van Hoesen, Pathology of the insular cortex in Alzheimer disease depends on cortical architecture, Journal of Neuropathology and Experimental Neurology, vol. 64, no. 10, pp. 910-922, 2005.
[120] C.-W. Cho, W.-H. Chao, S.-H. Lin, and Y.-Y. Chen, A vision-based analysis system for gait recognition in patients with Parkinson's disease, Expert Systems with Applications, vol. 36, no. 3, Part 2, pp. 7033-7039, 2009.
[121] S.-H. Lee and J. S. Lim, Parkinson's disease classification using gait characteristics and wavelet-based feature extraction, Expert Systems with Applications, vol. 39, no. 8, pp. 7338-7344, 2012.
M. A. Little, P. E. McSharry, S. J. Roberts, D. A. E. Costello, and I. M. Moroz, Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection, BioMedical Engineering OnLine, vol. 6, no. 23, pp. 1-19, 2007.
of Dysphonia, Journal of Medical Systems, vol. 34, no. 4, pp. 591-599, 2010.
disease, Expert Systems with Applications, vol. 37, no. 2, pp. 1568-1572, 2010.
[126] F. Strom and R. Koker, A parallel neural network approach to prediction of Parkinson's Disease, Expert Systems with Applications, vol. 38, no. 10, pp. 12470-12474, 2011.
Disease from Sustained Phonation Tests Using ANN and Adaptive Neuro-Fuzzy Classifier, Journal of Engineering Science and Design, vol. 1, no. 2, pp. 59-64, 2010.
[129] G. T. Stebbins and C. G. Goetz, Factor structure of the Unified Parkinson's Disease Rating Scale: Motor Examination section, Movement Disorders, vol. 13, no. 4, pp. 633-636, 1998.
[130] H.-S. Jeon, J. Han, W.-J. Yi, B. Jeon, and K. S. Park, Classification of Parkinson gait and normal gait using Spatial-Temporal Image of Plantar pressure, in Engineering in Medicine and Biology Society, 30th Annual International Conference of the IEEE, pp. 4672-4675, 2008.
Machine Learning Approach, Journal of Applied Sciences, vol. 12, no. 2, pp. 180-185, 2012.
[133] H. Shinotoh and D. B. Calne, The Use of PET in Parkinson's Disease, Brain and Cognition, vol. 28, no. 3, pp. 297-310, 1995.
coded real-time sonography, Neurology, vol. 45, no. 1, pp. 182-184, 1995.
[139] G. K. Smyth, Linear models and empirical bayes methods for assessing differential expression in microarray experiments, Statistical Applications in Genetics and Molecular Biology, vol. 3, no. 1, 2004.
K. Marek et al., The Parkinson Progression Marker Initiative (PPMI), Progress in Neurobiology, vol. 95, no. 4, pp. 629-635, 2011.
ease, European Journal of Neuroscience, vol. 26, no. 8, pp. 2369-2375, 2007.
[142] H. Zheng, M. Yang, H. Wang, and S. McClean, Machine Learning and Statistical
imaging study of patients with Parkinson's disease with mild cognitive impairment
[145] B. Shahbaba and R. Neal, Nonlinear Models Using Dirichlet Process Mixtures,
chines: Sparsity and Accuracy, IEEE Transactions on Neural Networks , vol. 21,
Disease, in Medical Biometrics (D. Zhang and M. Sonka, eds.), vol. 6165 of Lecture Notes in Computer Science, pp. 306-314, Springer Berlin / Heidelberg, 2010.
[149] K. Shkurko and X. Qi, A Radial Basis Function and Semantic Learning Space
[150] J. Zhang and H. Li, A Reconstruction Approach to CT with Cauchy RBFs Network, in Advances in Neural Networks - ISNN (F.-L. Yin, J. Wang, and C. Guo, eds.), vol. 3174 of Lecture Notes in Computer Science, pp. 531-536, Springer Berlin Heidelberg, 2004.
[151] A. Saranli and B. Baykal, Complexity reduction in radial basis function (RBF) networks by using radial B-spline functions, Neurocomputing, vol. 18, no. 1-3,
[152] F. Fernández-Navarro et al., q-Gaussian Radial Basis Functions Neural Networks with a Hybrid Algorithm for binary classification, Neurocomputing, vol. 75, no. 1, pp. 123-134, 2012.