
Meta-cognitive Sequential Learning in RBF Network for Diagnosis of Neurodegenerative Diseases

A thesis submitted to the
School of Computer Engineering
Nanyang Technological University

by

Giduthuri Sateesh Babu

in fulfilment of the requirements for the degree of Doctor of Philosophy

2014
Acknowledgements

I would like to express my deepest gratitude to my supervisor, Dr. Suresh Sundaram, for his insightful guidance and patient nurturing over the past years. I have learned so much from his way of critical thinking, and his analytic insights into the problems helped greatly in the accomplishment of this research work. I am proud to have such a great mentor on my way towards research, and he has made my stay at Nanyang Technological University a truly valuable experience.

I want to thank Dr. R. Savitha and Dr. B. S. Mahanand for their numerous helpful comments and enlightening discussions throughout my research course. Their time and assistance were a valuable contribution to this research work.

I would also like to dedicate special thanks to my family and friends, especially my parents, who have always been there for me. This research work would not have been possible without their love and encouragement.

Thanks are also due to the Center for Computational Intelligence for the research facilities. I also acknowledge Nanyang Technological University for the financial support and this precious opportunity of study. Finally, I pay my tributes to God Almighty for blessing me in all my endeavors.
Contents

Acknowledgements i
Abstract vi
List of Figures viii
List of Tables x
List of Abbreviations xiii

1 Introduction 1
1.1 Motivation 1
1.2 Objectives 5
1.3 Thesis Contributions 7
1.4 Thesis Organization 10

2 Literature Review on Sequential Learning Algorithms in Neural Networks 13
2.1 Sequential Learning Algorithms 14
2.1.1 Error Driven Algorithms 14
2.1.2 Neuron Significance Based Algorithms 15
2.1.3 Extreme Learning Machine Based Algorithms 16
2.1.4 Spiking Neural Networks Algorithms 17
2.1.5 Incremental-Decremental SVM Algorithms 18
2.1.6 Kernel Least Mean Square Based Algorithms 18
2.1.7 Sequential Classification Algorithms 19
2.2 Summary 21

3 An Overview on Meta-cognition 22
3.1 Definitions of Important Concepts in Meta-cognition 22
3.2 Models of Meta-cognition 23
3.3 Motivation for Meta-cognitive Learning 24
3.4 Summary 25

4 Meta-cognitive Radial Basis Function Network and Its EKF Based Sequential Learning Algorithm for Classification Problems 26
4.1 Introduction 26
4.2 Classification Problem Definition 26
4.3 EKF-McRBFN Classifier 27
4.3.1 Cognitive Component of EKF-McRBFN 28
4.3.2 Meta-cognitive Component of EKF-McRBFN 29
4.4 EKF-McRBFN Algorithm 39
4.4.1 Guidelines for EKF-McRBFN Thresholds Initialization 39
4.5 Summary 42

5 Projection Based Learning Algorithm for Meta-cognitive RBF Network Classifier 44
5.1 Introduction 44
5.2 PBL-McRBFN Classifier 45
5.2.1 Cognitive Component of PBL-McRBFN 45
5.2.2 Meta-cognitive Component of PBL-McRBFN 47
5.3 PBL-McRBFN Algorithm 54
5.4 Salient Features of PBL-McRBFN Algorithm 54
5.5 Summary 57

6 Performance Evaluation of EKF-McRBFN and PBL-McRBFN Classifiers 58
6.1 Data Sets Description 58
6.2 Simulation Environment 60
6.3 Performance Measures 61
6.3.1 Statistical Significance Test 61
6.4 Performance Evaluation 62
6.4.1 Binary-class Data Sets 62
6.4.2 Multi-category Data Sets 64
6.4.3 Statistical Performance Comparison 66
6.4.4 10 Random Trial Results 69
6.5 Work Flow of Meta-cognitive Strategies 74
6.6 Study on the Effect of Meta-cognition 78
6.7 Summary 81

7 Alzheimer's Disease Diagnosis using PBL-McRBFN Classifier 82
7.1 Literature Review on Alzheimer's Disease 83
7.1.1 Region-of-Interest Approach 84
7.1.2 Whole Brain Morphometric Approach 85
7.2 Early Diagnosis of Alzheimer's Disease Based on MRI Features 86
7.2.1 Materials 88
7.2.2 Voxel Based Morphometry Based Feature Extraction 90
7.2.3 Experimental Results 95
7.2.4 PBL-McRBFN Classifier Performance on the OASIS Data Set 96
7.2.5 PBL-McRBFN Classification Performance on the ADNI Data Set 98
7.2.6 Generalization Capability of the PBL-McRBFN Classifier for the Detection of AD 101
7.3 Identification of Imaging Biomarkers for AD 103
7.3.1 Imaging Biomarkers for AD in Complete OASIS Data Set 105
7.3.2 Imaging Biomarkers for AD Based on Age in OASIS Data Set 107
7.3.3 Imaging Biomarkers for AD Based on Gender in OASIS Data Set 110
7.4 Discussion 114
7.5 Summary 115

8 Parkinson's Disease Diagnosis using PBL-McRBFN Classifier 116
8.1 Literature Review on Parkinson's Disease 117
8.2 Materials 119
8.2.1 Microarray Gene Expression Data Set 119
8.2.2 MRI Data Set 120
8.2.3 Vocal Data Set 120
8.2.4 Gait Data Set 121
8.3 Early Diagnosis of Parkinson's Disease Based on Gene Expression Features 121
8.3.1 p-value Based Gene Selection 121
8.3.2 ICA Based Feature Reduction 122
8.3.3 Performance of PBL-McRBFN on ICA Reduced Features from Complete Genes 123
8.3.4 Performance of PBL-McRBFN on ICA Reduced Features from Statistically Selected Genes 124
8.4 Early Diagnosis of Parkinson's Disease Based on MRI Features 126
8.4.1 VBM Based Feature Extraction 126
8.4.2 Performance of PBL-McRBFN on VBM Features 127
8.4.3 Performance of PBL-McRBFN on Reduced Features 129
8.4.4 Identification of Imaging Biomarkers for PD 130
8.5 PD Diagnosis Based on Vocal Features 133
8.6 PD Diagnosis Based on Gait Features 133
8.7 Summary 136

9 Conclusions and Future Works 137
9.1 Conclusions 137
9.2 Future Works 141
9.2.1 Plan of Work for McRBFN 141
9.2.2 Applications 142

Publications 144

References 146
Abstract

This research work focuses on the development of meta-cognitive sequential learning algorithms for Radial Basis Function (RBF) network classifiers, and on their application to the early diagnosis of neurodegenerative diseases. The important issues in existing sequential learning algorithms are the proper selection of training samples, finding a minimal network structure, and the selection of an appropriate learning strategy. In addition, the random sequence of sample arrival influences performance significantly. Studies of human learning report that the best learning strategies employ meta-cognition (meta-cognition means cognition about cognition) to address the fundamental problems of what-to-learn, when-to-learn and how-to-learn. This thesis develops such meta-cognitive sequential learning algorithms in the RBF network for classification problems. An RBF network employing a meta-cognitive algorithm is called a `meta-cognitive RBF network' (McRBFN).

McRBFN is developed based on the Nelson and Narens model of meta-cognition in human learning. Accordingly, McRBFN has two components, namely a cognitive component and a meta-cognitive component. An RBF network with evolving structure is the cognitive component, and a self-regulatory learning mechanism is the meta-cognitive component. The meta-cognitive component controls the learning of the cognitive component by choosing suitable learning strategies for each sample. When a new sample is presented, the meta-cognitive component either deletes the sample, learns the sample, or reserves the sample for future use. Learning includes adding a new neuron or updating the parameters of the existing neurons using an extended Kalman filter (EKF). The McRBFN using EKF for parameter updates is referred to as `EKF-McRBFN'.

EKF-McRBFN uses a computationally intensive EKF-based parameter update and does not utilize the past knowledge stored in the network. Therefore, an efficient Projection Based Learning (PBL) algorithm for McRBFN, referred to as PBL-McRBFN, has been developed. When a neuron is added to the cognitive component, the Gaussian parameters are determined based on the current sample and the output weights are estimated using the PBL algorithm. When a new neuron is added, the existing neurons in the cognitive component are used as pseudo-samples in PBL. Thereby, the proposed algorithm exploits the knowledge stored in the network for proper initialization.

The performance of EKF-McRBFN and PBL-McRBFN has been evaluated using a number of benchmark classification problems. The statistical performance comparisons on multiple data sets clearly indicate the superior performance of the proposed PBL-McRBFN and EKF-McRBFN over existing popular classifiers. Experimental results also show that the PBL-McRBFN classifier performs better than the EKF-McRBFN classifier.

Another significant contribution of this thesis is the early diagnosis of neurodegenerative diseases. In this thesis, we employed PBL-McRBFN for the early diagnosis of Alzheimer's disease (AD) and Parkinson's disease (PD).

The early diagnosis of AD from Magnetic Resonance Imaging (MRI) scans is formulated as a binary classification problem. The performance of the PBL-McRBFN classifier has been evaluated on two well-known open-access data sets: the Open Access Series of Imaging Studies (OASIS) and the Alzheimer's Disease Neuroimaging Initiative (ADNI). Morphometric features are extracted from MRI scans using Voxel-Based Morphometry (VBM). The study results clearly show that the PBL-McRBFN classifier produces better generalization performance compared to the state-of-the-art AD detection results. Also, a generalization study conducted on the ADNI data set with a PBL-McRBFN classifier trained on the OASIS data set shows that the proposed PBL-McRBFN can achieve significant results on an unseen data set. Finally, a PBL-McRBFN-RFE feature selection approach has been proposed to detect imaging biomarkers responsible for AD for different age groups and for both genders using the OASIS data set.

The early diagnosis of PD is also formulated as a binary classification problem. The PBL-McRBFN classifier is used to predict PD using microarray gene expression data. Next, the PBL-McRBFN classifier is used to predict PD from MRI scans. Further, imaging biomarkers responsible for PD are detected with the proposed PBL-McRBFN-RFE approach based on MRI scans. For completeness, the PBL-McRBFN classifier is also used to detect PD from vocal and gait features. From the performance evaluation study, it is evident that the generalization performance of the proposed PBL-McRBFN classifier is better than the state-of-the-art PD detection results.
List of Figures

1.1 Nelson and Narens model of meta-cognition 2
3.1 Nelson and Narens model of meta-cognition 24
4.1 (a) Nelson and Narens model of meta-cognition (b) EKF-McRBFN model 27
4.2 Schematic diagram of EKF-McRBFN 28
4.3 Cognitive component: RBF network 29
4.4 Schematic representation of training samples corresponding to overlapping/no-overlapping conditions 36
4.5 Error regions of various thresholds in EKF-McRBFN 41
6.1 Exemplification of sample deletion strategy in PBL-McRBFN for Image segmentation data set 75
6.2 Class-wise significance (a), and instantaneous hinge error with self-regulatory thresholds (b) in PBL-McRBFN for Image segmentation data set 76
6.3 History of number of hidden neurons (a), self-regulated addition (b), and update thresholds (c) in PBL-McRBFN for Image segmentation data set 77
7.1 Schematic diagram of the AD detection using PBL-McRBFN classifier 87
7.2 Schematic diagram of the stages in feature extraction based on the VBM analysis 90
7.3 Results of the unified segmentation and smoothing steps performed on MRI of an AD patient (from right: sagittal view, coronal view and axial view) 92
7.4 Maximum intensity projections from OASIS data set - Normal persons vs. AD patients (a) sagittal view (b) coronal view (c) axial view 93
7.5 Maximum intensity projections from ADNI data set - Normal persons vs. AD patients (a) sagittal view (b) coronal view (c) axial view 93
7.6 Gray matter volume change from OASIS data set - Normal persons vs. AD patients (a) sagittal view (b) coronal view (c) axial view 94
7.7 Gray matter volume change from ADNI data set - Normal persons vs. AD patients (a) sagittal view (b) coronal view (c) axial view 95
7.8 Schematic representation of the generalization capability study of OASIS-trained PBL-McRBFN classifier on ADNI data set 102
7.9 Comparison of gray matter volume change - Normal persons vs. AD patients from complete OASIS data set 106
7.10 Comparison of gray matter volume change - Normal persons vs. AD patients from 60-69 (a&b), 70-79 (c&d), 80-Above (e&f) age groups in OASIS data set 109
7.11 Comparison of gray matter volume change - Normal persons vs. AD patients from Male-OASIS data set 112
7.12 Comparison of gray matter volume change - Normal persons vs. AD patients from Female-OASIS data set 113
8.1 PBL-McRBFN classifier on ICA reduced features from: (a) Complete genes, (b) Selected genes 122
8.2 Schematic diagram of the stages in feature extraction based on the VBM analysis 127
8.3 Maximum intensity projections from PPMI MRI data set - Normal persons vs. PD patients (a) sagittal view (b) coronal view (c) axial view 128
8.4 Gray matter volume change from PPMI MRI data set - Normal persons vs. PD patients (a) sagittal view (b) coronal view (c) axial view 129
8.5 Comparison of gray matter volume change - Normal persons vs. PD patients in Superior temporal gyrus region 132
List of Tables

2.1 Comparison of supervised sequential learning algorithms 20
6.1 Specification of benchmark binary and multi-class data sets 59
6.2 Performance comparison of PBL-McRBFN, EKF-McRBFN, SRAN, ELM and SVM on binary class data sets 63
6.3 Performance comparison on multi-category data sets 65
6.4 Ranks based on the overall testing efficiency (η_o) 67
6.5 Two-tailed critical values (F-distribution) for the Friedman test at 95% confidence level 67
6.6 Critical values for the Bonferroni-Dunn test at 95% confidence level 67
6.7 Ranks based on the average testing efficiency (η_a) 69
6.8 PBL-McRBFN, EKF-McRBFN, SRAN, ELM and SVM classifiers 10 random trial results comparison on binary class data sets 70
6.9 10 random trial results comparison on multi-category data sets 72
6.10 One-tailed critical values (F-distribution) for the ANOVA test at 95% confidence level 73
6.11 Two-tailed critical values (t-distribution) for the Dunnett test at 95% confidence level 73
6.12 Effect of meta-cognitive learning principles in the QKLMS algorithm 80
7.1 Demographic information of OASIS data used in our study 88
7.2 Demographic information of ADNI data used in our study 89
7.3 Classification performance of PBL-McRBFN on the OASIS data set 96
7.4 Performance comparison with existing results on the OASIS data set 97
7.5 Classification performance of PBL-McRBFN on the ADNI data set 99
7.6 Performance comparison with existing results on the ADNI data set 100
7.7 Generalization performance of PBL-McRBFN classifier on unseen ADNI samples 101
7.8 VBM detected and PBL-McRBFN-RFE selected regions from complete OASIS data set 105
7.9 PBL-McRBFN classifier performance comparison with VBM detected and PBL-McRBFN-RFE selected features on complete OASIS data set 105
7.10 Generalization performance of PBL-McRBFN classifier on unseen ADNI samples with selected 906 features 106
7.11 VBM detected and PBL-McRBFN-RFE selected regions from age-wise OASIS data sets 108
7.12 PBL-McRBFN classifier performance comparison with VBM detected and PBL-McRBFN-RFE selected features on age-wise OASIS data sets 108
7.13 VBM detected and PBL-McRBFN-RFE selected regions from male-OASIS data set 111
7.14 PBL-McRBFN classifier performance comparison with VBM detected and PBL-McRBFN-RFE selected features on male-OASIS data set 111
7.15 VBM detected and PBL-McRBFN-RFE selected regions from female-OASIS data set 112
7.16 PBL-McRBFN classifier performance comparison with VBM detected and PBL-McRBFN-RFE selected features on female-OASIS data set 113
8.1 Demographic information of PPMI MRI data used in our study 120
8.2 Performance comparison on complete gene expression data set from an average of 10 trials 123
8.3 Performance comparison on selected gene expression data set with p-value < 0.05 from an average of 10 trials 124
8.4 Performance comparison on selected gene expression data set with p-value < 0.01 from an average of 10 trials 125
8.5 Performance comparison on 2981 VBM features data set from an average of 10 trials 128
8.6 Performance comparison on ICA reduced features data sets from an average of 10 trials 130
8.7 VBM detected and PBL-McRBFN-RFE selected regions responsible for PD 131
8.8 PBL-McRBFN classifier performance on VBM detected and PBL-McRBFN-RFE selected features from an average of 10 trials 132
8.9 Performance comparison on vocal data set from an average of 10 trials 133
8.10 PBL-McRBFN classifier performance comparison with studies in the literature on vocal data set 134
8.11 Performance comparison on gait data set from an average of 10 trials 135
8.12 PBL-McRBFN classifier performance comparison with studies in the literature using gait patterns 135
List of Abbreviations

RBF Radial Basis Function
EKF Extended Kalman Filter
RAN Resource Allocation Network
RANEKF Resource Allocation Network-Extended Kalman Filter
MRAN Minimal Resource Allocation Network
EMRAN Extended Minimum Resource Allocation Network
GAP-RBFN Growing And Pruning Radial Basis Function Network
GGAP-RBFN Generalized Growing And Pruning Radial Basis Function Network
FGAP-RBFN Fast Growing And Pruning Radial Basis Function Network
SRAN Self-adaptive Resource Allocation Network
SMC-RBFN Sequential Multi-Category Radial Basis Function Network
ELM Extreme Learning Machine
OS-ELM On-line Sequential Extreme Learning Machine
I-ELM Incremental Extreme Learning Machine
CI-ELM Convex Incremental Extreme Learning Machine
EI-ELM Enhanced Incremental Extreme Learning Machine
SNN Spiking Neural Networks
SVM Support Vector Machine
SIDSVM Single Incremental Decremental Support Vector Machine
MIDSVM Multiple Incremental Decremental Support Vector Machine
LMS Least Mean Squares
RKHS Reproducing Kernel Hilbert Space
KLMS Kernel Least Mean Square
KLMS-CG Kernel Least Mean Square with Constrained Growth
QKLMS Quantized Kernel Least Mean Square
QKLMS-FB Quantized Kernel Least Mean Square with Fixed-Budget
McRBFN Meta-cognitive Radial Basis Function Network
EKF-McRBFN Extended Kalman Filter based Meta-cognitive Radial Basis Function Network
PBL Projection Based Learning
PBL-McRBFN Projection Based Learning Meta-cognitive Radial Basis Function Network
RFE Recursive Feature Elimination
PBL-McRBFN-RFE Projection Based Learning Meta-cognitive Radial Basis Function Network with Recursive Feature Elimination
AD Alzheimer's disease
PD Parkinson's disease
MCI Mild Cognitive Impairment
MRI Magnetic Resonance Imaging
VBM Voxel-Based Morphometry
SPM Statistical Parametric Mapping
LD Liver Disorders
PIMA Pima Indian diabetes
BC Breast Cancer
HEART Heart disease
ION Ionosphere
IRIS Iris classification
IS Image segmentation
WINE Wine determination
AE Acoustic Emission classification
VC Vehicle Classification
GI Glass Identification
GCM Global Cancer Mapping using micro-array gene expression
LETTER Letter recognition
SI Satellite Image classification
LAND Landsat Satellite
ANOVA Analysis of Variance
CT Computed Tomography
SPECT Single-Photon Emission Computed Tomography
PET Positron Emission Tomography
TCS Transcranial Brain Sonography
CDR Clinical Dementia Rating
MMSE Mini-Mental State Examination
MPRAGE Magnetization-Prepared Rapid-Acquisition Gradient Echo
ADNI Alzheimer's Disease Neuroimaging Initiative
OASIS Open Access Series of Imaging Studies
PPMI Parkinson's Progression Markers Initiative
Chapter 1

Introduction

1.1 Motivation

Over the past decade, a number of supervised learning algorithms have been developed for pattern classification applications. Artificial neural networks have been widely used in the field of pattern classification; they show advantages over other methods in their learning, generalization and adaptation capability, as well as their unique power for nonlinear mapping. In most practical applications, especially in medical diagnosis, the complete training data describing the input-output relationship are not available a priori. For these problems, classical batch-learning algorithms are rather infeasible, and sequential learning is employed instead [1].

In a sequential learning framework, the training samples arrive one-by-one and each sample is discarded after the learning process. Hence, sequential learning requires less memory and computational time during the learning process. In addition, sequential learning algorithms automatically determine the architecture that can accurately approximate the true decision function described by a stream of training samples. Samples from a data stream need not follow a single static underlying distribution. Traditionally, sequential learning algorithms use all training samples for learning and do not regulate the learning process. This inspires us to study human learning and to develop learning algorithms that mimic the best learning strategies in neural networks.
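To make the framework concrete, the following Python sketch shows the interface a one-pass sequential learner exposes; the class and function names are illustrative, not taken from the thesis.

```python
import numpy as np

class SequentialClassifier:
    """Skeleton of a one-pass sequential learner: each training sample is
    seen once, used to update the model, and then discarded."""

    def learn(self, x: np.ndarray, y: int) -> None:
        """Update the internal parameters using only the current sample."""
        raise NotImplementedError

    def predict(self, x: np.ndarray) -> int:
        raise NotImplementedError


def train_on_stream(model: SequentialClassifier, stream):
    """Consume an iterator of (x, y) pairs one-by-one; no sample is stored,
    so memory cost is independent of the stream length."""
    for x, y in stream:
        model.learn(x, y)  # the sample is discarded after this call
    return model
```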

In the families of artificial neural networks, Radial Basis Function (RBF) neural networks have been used extensively in a sequential learning framework due to their universal approximation ability and simplicity of architecture. Hence, in this thesis, we consider the RBF network for classification problems. Many sequential learning algorithms in the RBF framework are available in the literature to solve classification problems; a detailed review of these algorithms is provided in Chapter 2.

On the other hand, educational psychologists have studied human learning for years and suggested that the learning process is effective when learners adopt self-regulation in the learning process using meta-cognition [2, 3, 4]. Cognition is defined as a group of symbolic mental activities and mental representations that includes attention, memory, producing and understanding language, learning, reasoning, problem solving, and decision making. Meta-cognition means cognition about cognition. The term meta-cognition was first coined by Flavell [5], who defined it as the thoughts about one's own thought processes and cognitions. Precisely, the learner should control the learning process by planning and selecting learning strategies, and should monitor progress by analyzing the effectiveness of the chosen learning strategies [6]. When necessary, these strategies should be appropriately adapted. The meta-cognition present in human beings provides a means to address what-to-learn, when-to-learn and how-to-learn, i.e., the ability to identify the specific piece of required knowledge and to judge when to start and stop learning while employing the best learning strategy.

There are several meta-cognition models available in human psychology, and a brief survey of various meta-cognition models is reported in [7]. Among the various models, the model proposed by Nelson and Narens in [8] is simple and clearly highlights the various actions in human meta-cognition, as shown in Fig. 1.1.

[Figure 1.1: Nelson and Narens model of meta-cognition. The meta-cognitive component sits above the cognitive component; monitoring carries the flow of information upward, and control carries it downward.]
The Nelson and Narens model [8] has two components: the cognitive component and the meta-cognitive component. The flow of information from the cognitive component to the meta-cognitive component is considered monitoring, while the information flow in the reverse direction is considered control. The basic notion underlying control is that the meta-cognitive component modifies the cognitive component based on these signals. The information flowing from the meta-cognitive component to the cognitive component either changes the state of the cognitive component or changes the cognitive component itself. Monitoring informs the meta-cognitive component about the state of the cognitive component, thus continuously updating the meta-cognitive component's model of the cognitive component, including the case of `no change in state'.

In neural networks, the current state-of-the-art algorithms address the purely cognitive aspect of human learning inspired by the human brain; the concept of self-regulated learning using meta-cognition is not exploited. Self-regulated learning refers to the ability of a learning system to decide what-to-learn, when-to-learn and how-to-learn. The current state-of-the-art neural network algorithms address only the how-to-learn component of human learning. In the neural networks field, a few algorithms have been developed that address some of the meta-cognitive learning aspects [9, 10]. One of the first works in neural networks to deal with meta-cognition is the Self-adaptive Resource Allocation Network (SRAN) [9]. SRAN is a sequential learning algorithm that also addresses the what-to-learn component of meta-cognition by selecting significant samples using misclassification error and hinge loss error. The complex-valued version of this algorithm is the Complex-valued Self-regulating Resource Allocation Network (CSRAN) [10]. It has been shown in [9, 10] that selecting significant samples and removing repetitive samples during learning helps to improve generalization performance. Therefore, it is apparent that emulating the three components of meta-cognition with suitable learning strategies would improve the generalization ability of a neural network. The drawbacks of the above algorithms are:

• The selection of significant samples from the stream of training data is based on a simple error criterion, which is not sufficient (a sketch of this kind of error-based selection appears after this list).

• The allocation of a new hidden neuron center without considering the amount of overlap with already existing neuron centers leads to misclassification.

• The knowledge gained from past trained samples is not utilized in further learning.

• These algorithms use a computationally intensive Extended Kalman Filter (EKF) for parameter update.
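As an illustration of the first drawback, the sketch below shows the kind of simple error-based significance check SRAN uses, assuming one-of-K coded targets in {-1, +1}; a sample is learned only if it is misclassified or its hinge error is large.

```python
import numpy as np

def is_significant(y_true: np.ndarray, y_pred: np.ndarray, err_thr: float) -> bool:
    """SRAN-style what-to-learn test (simplified): keep a sample for
    learning if it is misclassified or its maximum hinge error is high."""
    misclassified = int(np.argmax(y_pred)) != int(np.argmax(y_true))
    hinge = np.maximum(0.0, 1.0 - y_true * y_pred)  # zero once y_true*y_pred >= 1
    return misclassified or float(np.max(hinge)) > err_thr
```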

Hence, there is a need to develop a learning algorithm that automatically selects appropriate samples for learning and adopts the best learning strategy to learn them accurately. This thesis deals with the development of meta-cognitive sequential learning algorithms for RBF network classifiers that overcome the above drawbacks. The interaction between the meta-cognitive and cognitive components and their influence on RBF network learning are also dealt with accordingly. We evaluate the performance of the developed classifiers and compare them with existing classifiers. We also use the proposed classifier to detect neurodegenerative diseases.

Another important motivation of the thesis is to identify imaging biomarkers for the early diagnosis of neurodegenerative diseases. Neurodegenerative diseases are generally considered a group of diseases that seriously and progressively impair the functions of the nervous system through selective neuronal vulnerability of specific brain regions. Depending on their type, neurodegenerative diseases can be serious or life-threatening, and most of them have no cure. The goal of treatment for such diseases is usually to improve symptoms, relieve pain and increase mobility. In the modern world, the management and development of computerized patient-care systems for chronic neurodegenerative diseases is critical due to growing data sets. Also, the class imbalance in neurodegenerative disease data makes these problems difficult for machine learning techniques to solve. Alzheimer's disease (AD) is the most common neurodegenerative disease and one of the leading causes of death worldwide, with the associated estimated cost of care exceeding $200 billion annually [11]. It is estimated that more than 5.4 million Americans and about 30 million people worldwide suffer from AD, with the number expected to increase dramatically as the global population ages. Today, AD remains the largest unmet medical need in neurology, with the disease expected to afflict 100 million people by 2050.

Parkinson's disease (PD) is the second most common neurodegenerative disease, after AD. It is estimated that more than 1 million Americans and about 10 million people worldwide are living with PD. The incidence of PD increases with age, but an estimated four percent of people with PD are diagnosed before the age of 50 [12].

The early diagnosis of the most common neurodegenerative diseases, particularly AD and PD, using non-invasive brain imaging techniques such as Magnetic Resonance Imaging (MRI) will help to slow down the progress of these diseases. MRI is the most important brain imaging procedure that provides accurate information about the shape and volume of the brain. MRI accurately monitors and identifies tissue volume changes in all anatomical regions of the brain, and helps to detect AD at an early stage, before irreversible damage has been done [13]. With the increasing number of elderly people there will be many more cases of AD and PD, and the databases from clinical studies of these diseases are also growing. Moreover, identifying the most relevant and meaningful imaging biomarkers with predictive power for neurodegenerative disease detection is important [14]. Hence, there is a need to develop algorithms that can handle these growing medical data sets and identify imaging biomarkers to predict these neurodegenerative diseases at an early stage. This thesis therefore also deals with handling these growing medical data sets for the detection of AD and PD, and with the identification of imaging biomarkers responsible for AD and PD, which is a challenging task for the machine learning community.

1.2 Objectives

The main aim of this research work is to develop a generic framework of the human meta-cognitive learning mechanism in the RBF network architecture. The research discusses and evaluates the inter-relationship between meta-cognitive knowledge and the control and monitoring signals, such that learning in the RBF network is efficient; this in turn improves the performance of the RBF classifier significantly. The main research objectives can be summarized as follows:

• Sequential Meta-cognitive Learning Algorithm for Radial Basis Function Network: Develop a meta-cognitive learning algorithm for an RBF network, based on the generic framework of meta-cognition proposed in the Nelson and Narens model, that handles data one-by-one and only once. Aspects of meta-cognitive monitoring such as ease-of-learning, judgement-of-learning and feeling-of-knowing need to be explored in a machine learning framework. The major requirements of a meta-cognitive sequential learning algorithm are fast and efficient learning, and the proper use of past knowledge for self-regulation of learning. The RBF network is chosen due to the localization property of its Gaussian function and its wide use in classification problems. The algorithm must also handle growing data sets of neurodegenerative diseases, particularly AD and PD. To handle the growing data sets, it must process the samples in sequential mode with less computational effort, and the network architecture must adapt to changes in the data distribution.

• Alzheimer's Disease Diagnosis Problem: AD is the most common neurodegenerative disease. Early diagnosis of AD using non-invasive brain imaging methods plays a major role in providing treatment that may slow down its progress. MRI is the most important brain imaging procedure; it provides accurate information about the brain with high spatial resolution and can detect minute abnormalities. Hence, early diagnosis of AD using MRI scans is another objective of this research work. This objective is two-fold: first, AD classification using MRI scans with an accurate prediction rate, and second, selection of relevant imaging biomarkers for AD using MRI scans. A wrapper-based feature selection method, which depends on the classification algorithm, is needed to detect imaging biomarkers for AD. In the medical literature, it is reported that gender and age may be important modifying factors in AD's development and expression. Hence, finding imaging biomarkers for AD for different age groups and genders is another objective.

• Parkinson's Disease Diagnosis Problem: PD is the second most common neurodegenerative disease, after AD. In the literature, PD diagnosis using machine learning methods has been reported using vocal and gait features. Research studies on PD have discovered that early diagnosis of PD using vocal and gait features is impossible, because tremor and slow movements develop in PD patients only after approximately 70% of the vulnerable dopaminergic neurons in the substantia nigra have already died. Recent studies on gene expression analysis found that there is a profound change in gene expression for individuals affected by PD. Also, early diagnosis of PD using non-invasive brain imaging techniques such as MRI is needed. In the literature, machine learning techniques have not been employed for the early diagnosis of PD using gene expression and MRI features. Hence, further objectives of this research work are:

- Early diagnosis of PD using gene expression features.

- Early diagnosis of PD and identification of imaging biomarkers for PD using MRI scans.

1.3 Thesis Contributions

This thesis brings several contributions to the field of neural networks and the early diagnosis of neurodegenerative diseases. The major contributions of this thesis are categorized into two parts: (I) the algorithm part and (II) the application part. First, we highlight the contributions in the algorithm part and then present the contributions in the application part.

I. Meta-cognitive Radial Basis Function Network

To incorporate human meta-cognitive principles in neural networks, we implemented a meta-cognitive radial basis function network based on the Nelson and Narens model. If an RBF network analyzes its cognitive process and chooses suitable learning strategies adaptively to improve its cognitive process, then it is referred to as a `Meta-cognitive Radial Basis Function Network' (McRBFN). McRBFN has two components, namely the cognitive component and the meta-cognitive component. An RBF network with evolving structure is the fundamental building block of the cognitive component. The meta-cognitive component devises sample deletion, neuron growth, parameter update and sample reserve strategies, which directly address the basic principles of self-regulated human learning (i.e., what-to-learn, when-to-learn and how-to-learn). The meta-cognitive part controls the sequential learning process by selecting one of the above learning strategies for each new training sample. The strategies are also adapted to accommodate coarse knowledge first, followed by fine tuning. The sample deletion strategy removes redundant samples to avoid over-training.

(1) EKF Based Meta-cognitive Radial Basis Function Network: An Extended Kalman Filter (EKF) based sequential learning algorithm has been proposed for McRBFN, referred to as EKF-McRBFN, for classification problems. The novel contributions of this algorithm are listed below; a sketch of the class-wise significance measure follows the list.

• A hinge loss error function is used for better estimation of the posterior probability.

• The spherical potential is used to calculate the class-wise significance of a sample.

• Overlapping criteria are introduced to initialize the new hidden neuron parameters.

• EKF is used to estimate the network parameters.
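The spherical-potential measure can be pictured as follows: a minimal sketch, assuming Gaussian hidden neurons, in which the class-wise significance of a sample is the mean kernel response of the neurons belonging to its class (the precise definition is given in Chapter 4).

```python
import numpy as np

def classwise_significance(x, centers, widths, neuron_classes, c):
    """Mean Gaussian activation of the hidden neurons of class c for
    sample x.  A low value indicates the sample carries novel knowledge
    for its class and may justify adding a neuron."""
    idx = [k for k, cls in enumerate(neuron_classes) if cls == c]
    if not idx:  # no neuron of this class yet: maximally novel
        return 0.0
    acts = [np.exp(-np.sum((x - centers[k]) ** 2) / widths[k] ** 2) for k in idx]
    return float(np.mean(acts))
```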

(2) Projection Based Learning Meta-cognitive Radial Basis Function Network: The above EKF-based sequential algorithm uses a computationally intensive EKF parameter update and does not utilize the past knowledge stored in the network. Hence, a less computationally intensive Projection Based Learning (PBL) sequential algorithm has been proposed for McRBFN, referred to as PBL-McRBFN, for classification problems.

• The efficient PBL algorithm is implemented based on the principle of minimization of a hinge error function, and it finds the optimal network output parameters for which the error function is minimum (see the sketch after this list).

• The existing hidden neurons in the network are used as pseudo-samples to exploit the past knowledge.
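In batch form, the projection step reduces to solving a small linear system for the output weights, as in the sketch below; the thesis derives the sequential, hinge-loss-driven version in Chapter 5, so this shows only the underlying least-squares idea.

```python
import numpy as np

def pbl_output_weights(Phi: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Estimate RBF output weights by projection: solve the normal
    equations (Phi^T Phi) W = Phi^T Y directly instead of iterating
    with an EKF.  Phi is (samples x hidden neurons), Y is
    (samples x classes)."""
    A = Phi.T @ Phi
    B = Phi.T @ Y
    # lstsq tolerates a rank-deficient A, unlike an explicit inverse.
    W, *_ = np.linalg.lstsq(A, B, rcond=None)
    return W
```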

II. Early Diagnosis of Neurodegenerative Diseases

Neurodegenerative diseases, of which AD and PD are the most common, take an enormous toll on affected patients and their families. In the application part, the proposed PBL-McRBFN classifier is used for the early diagnosis of AD based on MRI scans. The PBL-McRBFN classifier is also used for the early diagnosis of PD based on microarray gene expression features and MRI scans.

3. Alzheimer's Disease Diagnosis Problem: The PBL-McRBFN classifier is used to solve the diagnosis problem for AD, the most common neurodegenerative disease. The early diagnosis of AD from MRI scans is formulated as a binary classification problem. For this, morphometric features are extracted from MRI scans using Voxel-Based Morphometry (VBM). VBM is one of the widely used, fully automated, whole-brain morphometric analyses. VBM is based on the Statistical Parametric Mapping (SPM) method, often employed for the investigation of tissue volume changes between the brain MRI scans of a diseased group and those of normal persons. We have used VBM analysis to identify the probability of gray matter in a given voxel, where a voxel is defined as a volume element representing the intensity of a point in three-dimensional space. In this study, the contributions are two-fold:

• The PBL-McRBFN classifier is used to predict AD from the morphometric feature set obtained from the VBM analysis.

• Recursive Feature Elimination (RFE) is incorporated in PBL-McRBFN, and the resulting feature selection scheme, called PBL-McRBFN-RFE, is proposed to identify critical imaging biomarkers relevant to AD using MRI scans (a generic sketch of the RFE loop follows this list).
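The generic RFE wrapper loop is sketched below; train_fn and importance_fn are placeholders standing in for PBL-McRBFN training and its feature-importance measure, so this is an outline of the wrapper idea rather than the thesis's exact procedure.

```python
import numpy as np

def rfe_select(X, y, train_fn, importance_fn, n_keep):
    """Recursive feature elimination: repeatedly train the classifier on
    the surviving features, score each feature, and drop the least
    important one until n_keep features remain."""
    surviving = list(range(X.shape[1]))
    while len(surviving) > n_keep:
        model = train_fn(X[:, surviving], y)
        scores = importance_fn(model)          # one score per surviving feature
        surviving.pop(int(np.argmin(scores)))  # eliminate the weakest feature
    return surviving
```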

The imaging biomarkers identified using PBL-McRBFN-RFE are in the parahippocampal gyrus, the hippocampus, the superior temporal gyrus, the insula, the precentral gyrus and the extra-nuclear regions. Next, PBL-McRBFN-RFE has also been used to identify imaging biomarkers for AD from gender-wise and age-wise analyses of the OASIS data set. In the medical literature [15, 16], it is reported that age and gender may be important modifying factors in AD's development and expression. To verify this, we conducted imaging biomarker detection analyses based on age and gender.

The results of the imaging biomarker detection analysis based on age are:

• In AD patients in the 60-69 age group, gray matter atrophy is observed in the superior temporal gyrus region, which is responsible for processing sounds.

• In AD patients in the 70-79 age group, gray matter atrophy is observed in the parahippocampal gyrus and the extra-nuclear regions, which are responsible for memory encoding and retrieval.

• In AD patients in the 80-89 age group, gray matter atrophy is observed in the hippocampus, the parahippocampal gyrus and the lateral ventricle regions, which are responsible for the consolidation of short-term memory into long-term memory, spatial navigation, and memory encoding and retrieval.

The results of the imaging biomarker detection analysis based on gender are:

• In male AD patients, gray matter atrophy is observed in the insula region, which is responsible for emotion and consciousness.

• In female AD patients, gray matter atrophy is observed in the parahippocampal gyrus and the extra-nuclear regions, which are responsible for memory encoding and retrieval.

4. Parkinson's Disease Diagnosis Problem: The diagnosis problem for PD, another common neurodegenerative disease, has been handled by employing the PBL-McRBFN classifier. The early diagnosis of PD is also formulated as a binary classification problem. In this study, the contributions are:

• The PBL-McRBFN classifier is used to predict PD using microarray gene expression features obtained from genes selected at different significance levels (a sketch of this significance-based screening follows the list).

• The PBL-McRBFN classifier is used to predict PD from the morphometric feature set obtained from MRI scans using VBM analysis. We also used the PBL-McRBFN-RFE approach to identify critical imaging biomarkers relevant to PD using MRI scans. The superior temporal gyrus brain region detected by the PBL-McRBFN-RFE imaging biomarker analysis may play a more significant role than other regions in PD.

• The PBL-McRBFN classifier is also used to detect PD using the standard vocal and gait PD data sets.
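A minimal sketch of the significance-based gene screening follows, assuming a per-gene two-sample t-test as the source of the p-values (Chapter 8 details the actual selection): genes whose p-value falls below the chosen level (0.05 or 0.01) are retained.

```python
import numpy as np
from scipy import stats

def select_genes(X_pd: np.ndarray, X_control: np.ndarray, alpha: float = 0.05):
    """Per-gene two-sample t-test between PD and control subjects; both
    arrays are (subjects x genes).  Returns the indices of genes whose
    p-value is below alpha."""
    _, pvals = stats.ttest_ind(X_pd, X_control, axis=0)
    return np.where(pvals < alpha)[0]
```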

1.4 Thesis Organization


The thesis is organized as follows:


• In Chapter 2, a literature review of the existing supervised sequential learning algorithms for neural networks and the motivation for meta-cognitive learning are presented. Here, sequential learning algorithms are classified based on their framework: error driven, neuron significance, extreme learning machine, spiking neural networks, incremental-decremental SVM, kernel least mean square, and sequential classification algorithms. A brief review of each class of sequential learning algorithms is presented.

• In Chapter 3, an overview of human meta-cognition, including the main concepts of meta-cognition and the models of meta-cognition available in the literature, is presented. Next, the motivation for meta-cognitive learning is explained.

• In Chapter 4, EKF-McRBFN is proposed and its sequential learning algorithm for classification problems is presented in detail. A self-regulatory sequential learning mechanism which decides what-to-learn, when-to-learn and how-to-learn efficiently is presented. In the EKF-McRBFN algorithm, the sample deletion strategy addresses what-to-learn by deleting insignificant samples from the data stream; the neuron growth strategy and the parameter update strategy address how-to-learn, i.e., the way the cognitive component learns from the samples; and the self-adaptive nature of the meta-cognitive thresholds, together with the sample reserve strategy, addresses when-to-learn by presenting the samples to the learning process according to the knowledge present in each sample.

• In Chapter 5, an efficient sequential PBL-McRBFN classifier is presented. The PBL-McRBFN classifier uses the computationally less intensive PBL algorithm, which accurately estimates the output weights by direct minimization of the error. PBL-McRBFN allows the network to `reuse' the knowledge gained from past samples in the initialization of new hidden neuron parameters and in the estimation of output weights.

• In Chapter 6, the performance of the two developed algorithms, EKF-McRBFN and PBL-McRBFN, is evaluated on real-world benchmark binary and multi-category classification problems with a wide range of imbalance factors. The performance of the proposed EKF-McRBFN and PBL-McRBFN algorithms is compared with the best performing sequential learning algorithm reported in the literature (SRAN) [17], batch ELM [18] and the standard Support Vector Machine (SVM). A quantitative performance analysis, based on the number of samples used in training, the number of hidden neurons and the average/overall testing efficiency, is performed. A qualitative performance study based on the Friedman test is also conducted.

• In Chapter 7, the PBL-McRBFN classifier developed in the earlier chapter is employed to solve the AD diagnosis problem from MRI scans. Here, morphometric features are extracted from MRI scans using VBM. The performance of the PBL-McRBFN classifier has been evaluated on the two well-known open-access OASIS and ADNI data sets, and compared with other state-of-the-art methods in the literature on the AD diagnosis problem using these data sets. Next, we demonstrate the generalization capability of the PBL-McRBFN classifier by training it on the OASIS data set and testing it on the unseen ADNI data set. Finally, we propose the PBL-McRBFN-RFE approach to identify the imaging biomarkers for AD on the complete, age-group-wise and gender-wise OASIS data sets.

• In Chapter 8, the PBL-McRBFN classifier is employed to solve the PD diagnosis problem. Here, the diagnosis of PD using the PBL-McRBFN classifier with microarray gene expression, MRI, vocal and gait features is presented. The complete microarray gene expression data set consists of expression information for 22283 genes per subject. Since the complete gene expression data set contains a large number of redundant genes, we also conducted PD prediction using the PBL-McRBFN classifier on the most informative genes, selected at p-values less than 0.05/0.01. A quantitative performance comparison of PBL-McRBFN with the SVM classifier on the PD diagnosis problem is presented. Further, the proposed PBL-McRBFN-RFE approach is used to identify the imaging biomarkers for PD using MRI scans.

• Chapter 9 summarizes the conclusions and provides directions for future research work.
Chapter 2

Literature Review on Sequential Learning Algorithms in Neural Networks

In this chapter, key concepts in sequential learning algorithms for neural networks are reviewed. The literature review on neurodegenerative diseases is provided in Chapters 7 and 8. The review briefly discusses the different types of learning models in neural networks for handling sequential data.

In a sequential learning framework, the training samples arrive one-by-one and each sample is discarded after the learning process. Hence, it requires less memory and computational time during the learning process. In addition, sequential learning algorithms automatically determine the minimal architecture that can accurately approximate the decision function described by a stream of training samples. In the families of neural networks, Radial Basis Function (RBF) neural networks have been used extensively in a sequential learning framework due to their universal approximation ability and simplicity of architecture. Hence, in this thesis, we consider the radial basis function neural network for classification problems. Many sequential learning algorithms in the radial basis function framework are available in the literature to solve classification problems. Depending on the training method of the network and its structure, these learning algorithms can be broadly classified as belonging to one of the following: error driven algorithms, neuron significance based algorithms, extreme learning machine based algorithms, spiking neural networks algorithms, kernel least mean square based algorithms, and sequential classification algorithms. These are discussed in detail next, along with incremental-decremental SVM algorithms.

2.1 Sequential Learning Algorithms

2.1.1 Error Driven Algorithms

The Resource Allocation Network (RAN) [19] is the first sequential learning algorithm introduced in the literature. RAN evolves the network architecture required to approximate the true function using a novelty based neuron growth criterion. In RAN, the novelty of a sample is determined based on its error and its distance to the nearest neuron. If the novelty criterion is satisfied, a new hidden neuron is added to the network; otherwise, the network parameters are updated using the Least Mean Squares (LMS) algorithm.
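The RAN growth test can be summarized as in the sketch below (in the original algorithm the distance threshold also decays from a large to a small value over training; that schedule is omitted here for brevity).

```python
import numpy as np

def ran_is_novel(x, error, centers, eps, e_min):
    """RAN novelty criterion: add a neuron only if the prediction error
    is large AND the sample is far from every existing center; otherwise
    the LMS update is applied to the existing parameters."""
    if not centers:  # empty network: the first sample is always novel
        return True
    d_nearest = min(np.linalg.norm(x - c) for c in centers)
    return abs(error) > e_min and d_nearest > eps
```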

An enhancement of RAN, known as the RAN Extended Kalman Filter (RANEKF) algorithm, has been proposed in [20]. In RANEKF, an extended Kalman filter (EKF) is used rather than the LMS algorithm for updating the network parameters.

One drawback of both RAN and RANEKF is that once a hidden neuron is added, it can never be removed. Thus, both RAN and RANEKF can produce networks in which some hidden neurons, although active initially, may subsequently end up contributing little to the network output [21]. To overcome this drawback, the Minimal Resource Allocation Network (MRAN) algorithm has been proposed in [21]. In MRAN, RANEKF is augmented with a pruning strategy, which removes those hidden neurons that consistently make little contribution to the network output. MRAN uses sliding windows of training samples in the growing and pruning criteria to identify the hidden neurons that contribute relatively little to the network output. Selection of appropriate sizes for these windows depends critically on the distribution of the training samples.

Since MRAN updates all network parameters, the EKF requires storage of a huge covariance matrix and its inverse. Hence, the computational effort and memory requirements in the training phase are quite high. To overcome this weakness for real-time implementation, an algorithm called the Extended Minimum Resource Allocation Network (EMRAN) has been proposed in [22]. EMRAN is an improved version of MRAN in which a `winner neuron' strategy is incorporated. In EMRAN, the winner neuron is defined as the hidden neuron in the network that is closest to the training sample. The main contribution of the EMRAN algorithm is that, in every step, only those parameters related to the winner neuron are updated by the EKF algorithm.

2.1.2 Neuron Significance Based Algorithms

In [23], the significance of a neuron with respect to the input distribution is used as a criterion for growing and pruning the RBF network architecture; the resulting algorithm is called the Growing And Pruning RBFN (GAP-RBFN). A new hidden neuron is added if its significance exceeds a threshold value, whereas existing hidden neurons are pruned if their significance falls below a certain threshold. In the GAP-RBFN algorithm, the significance of a neuron is defined based on the contribution made by that neuron to the network output, averaged over all the training samples received so far; this requires knowledge of the input data distribution. The significance of a neuron is calculated with a simplified piecewise-linear approximation to the Gaussian function to reduce the computational effort. As in EMRAN, in GAP-RBFN only the nearest hidden neuron's parameters are updated by the EKF algorithm.
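An empirical stand-in for this significance measure is sketched below; GAP-RBFN itself replaces the sample average with a closed-form estimate derived from the assumed input distribution and a piecewise-linear kernel approximation, so this shows only the underlying idea.

```python
import numpy as np

def neuron_significance(weight, center, width, X):
    """Average absolute contribution of one Gaussian hidden neuron over
    samples X (one sample per row).  Grow when a candidate neuron's
    significance exceeds a threshold; prune any existing neuron whose
    significance falls below it."""
    acts = np.exp(-np.sum((X - center) ** 2, axis=1) / width ** 2)
    return float(np.abs(weight) * np.mean(acts))
```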

In GAP-RBFN algorithm, neuron signicance is calculated based on the assumption

that the training samples are uniformly distributed. If the training samples distribution

is non-uniform then the performance of GAP-RBFN will be aected. To overcome the

above drawback, Generalized version of GAP-RBFN (GGAP-RBFN) algorithm has been

presented in [24]. The GGAP-RBFN algorithm can be used for any arbitrary input

distribution.

The performance of the GAP-RBFN algorithm on classification problems has been evaluated in [25], and improvements to GAP-RBFN that enhance both its accuracy and its training speed for classification problems have been presented in the Fast GAP-RBFN (FGAP-RBFN) algorithm [25]. The FGAP-RBFN algorithm uses a Decoupled EKF (DEKF), whereas EKF is used in the MRAN and GAP-RBFN algorithms for network parameter update. EKF requires more computational effort and large memory for problems with high input dimension. In contrast, DEKF only considers the pairwise interdependence of the parameters within the same decoupled group, rather than the interdependence of all the parameters in the network. When the number of hidden neurons becomes large,

DEKF yields a significant reduction in the computational cost per training sample and in the storage requirements for the error covariance matrix.

2.1.3 Extreme Learning Machine Based Algorithms


Extreme Learning Machine (ELM) [26] is a well-known fast learning neural network paradigm. It is a batch learning algorithm for a single-hidden-layer feed forward neural network: ELM randomly chooses the input weights and analytically determines the output weights using the minimum norm least-squares solution. A complete survey of research in the ELM framework is presented in [27]. A sequential version of ELM using recursive least squares has been presented in [28], referred to as the On-line Sequential Extreme Learning Machine (OS-ELM). OS-ELM can handle samples one-by-one or chunk-by-chunk with varying chunk size. In the OS-ELM algorithm, the input weights are selected randomly and the output weights are calculated analytically using the least squares solution; for sequential learning, the output weights are updated using recursive least squares. OS-ELM uses a small chunk of initial training data to initialize the output weight calculation, and the choice of this initial chunk affects its training performance. For sparse and imbalanced data sets, the random selection of input weights with a fixed number of hidden neurons significantly affects the performance of OS-ELM, as shown in [29].
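To make the OS-ELM update concrete, here is a minimal sketch of its two phases under the standard recursive least squares formulation; the helper names (`oselm_init`, `oselm_step`) and variable names are our own illustrative choices, not notation from [28]:

```python
import numpy as np

def oselm_init(H0, Y0):
    """Initialization phase: batch least squares on a small initial chunk.
    H0: hidden-layer outputs (N0 x L) for the chunk; Y0: targets (N0 x n)."""
    P = np.linalg.inv(H0.T @ H0)   # inverse correlation matrix (L x L)
    beta = P @ H0.T @ Y0           # initial output weights (L x n)
    return P, beta

def oselm_step(P, beta, Ht, Yt):
    """Sequential phase: recursive least squares update for one new chunk."""
    I = np.eye(Ht.shape[0])
    K = P @ Ht.T @ np.linalg.inv(I + Ht @ P @ Ht.T)  # gain
    P = P - K @ Ht @ P                               # updated inverse correlation
    beta = beta + P @ Ht.T @ (Yt - Ht @ beta)        # updated output weights
    return P, beta
```

Note that the sequential step never revisits old samples: the matrix P carries all the information needed from the past, which is what makes the algorithm one-pass.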

Incremental versions of ELM have been proposed in [30, 31, 32]. In [30], the Incremental ELM (I-ELM) algorithm is presented: each time, a single hidden neuron is randomly generated and added to the existing network. However, I-ELM does not recalculate the output weights of the existing hidden neurons when a new hidden neuron is added. To improve the convergence of I-ELM, the Convex Incremental ELM (CI-ELM) algorithm is presented in [31]. In CI-ELM, the convergence rate of I-ELM is improved by recalculating the output weights of the existing hidden neurons using a convex optimization method when a new hidden neuron is randomly added. CI-ELM achieves a faster convergence rate and a more compact network architecture while retaining I-ELM's simplicity and efficiency. The performance of I-ELM is further improved in the Enhanced I-ELM (EI-ELM) algorithm [32]. In EI-ELM, k hidden neurons are randomly generated at each step, and only the

most appropriate of the k candidate hidden neurons is added to the existing network. However, I-ELM, CI-ELM and EI-ELM require the complete training data, so these algorithms cannot handle sequential data.

2.1.4 Spiking Neural Networks Algorithms


Spiking Neural Networks (SNN), known as the third generation of neural network models, are more closely related to biological neurons than the classical artificial neural networks of the previous generations. A sequential learning algorithm in the SNN framework has been presented in [33] for a four layer hierarchical neural network of two-dimensional integrate-and-fire neuronal maps. In this algorithm, training is performed through synaptic plasticity and an adaptive network structure. An event driven approach is used to optimize computation speed in order to simulate networks with a large number of spiking neurons. Another sequential learning algorithm for SNN, with an application to taste recognition, has been presented in [34]. This algorithm is developed based on integrate-and-fire neurons with rank order coded inputs. The influence of information encoding in a population of spiking neurons on the performance of SNN was also explored.

There are a number of unaddressed issues in the above sequential algorithms for SNN [33, 34], such as fine tuning of the learning parameters, automatic update of the learning parameters in continuously changing environments (as these are set manually), improving the learning speed for large data sets, and the effect of handling imbalanced data sets on the training performance.

In [35], a novel self-adaptation system has been presented to train a real mobile robot for optimal navigation in dynamic environments by training a number of SNNs having the Spike Timing Dependent Plasticity (STDP) property. The spike response model is used, and the trained SNNs are stored in a tree-type memory structure that serves as experience for the robot, enhancing its navigation ability in new and previously trained environments. The memory was designed to have a simple searching mechanism, and forgetting and online dynamic clustering techniques are used to control the memory size. The system uses the minimum network structure required for the obstacle avoidance task, and its synaptic weights are changed online. However, more


experimental data need to be collected to demonstrate the robot's navigation ability in a dynamic office environment. A complete review of sequential learning algorithms for SNN is presented in [36].

2.1.5 Incremental-Decremental SVM Algorithms


The Support Vector Machine (SVM) [37] is a widely used algorithm for solving classification problems. A sequential learning version in the support vector machine framework, called incremental SVM, has been presented in [38]. An incremental and decremental SVM algorithm, referred to as single incremental decremental SVM (SIDSVM), has been presented in [39]. It uses an on-line recursive algorithm for training SVMs and handles one sample at a time by retaining the Karush-Kuhn-Tucker conditions on all previously seen training samples while `adiabatically' adding a new training sample to the solution. This approach has been adapted to other variants of kernel machines in [38, 40].

The drawback of the SIDSVM algorithm is that when multiple training samples are added or removed, it repeats the updating operation for each individual training sample, which often incurs a high computational cost in real-time implementation. To overcome this drawback, a multiple incremental decremental SVM (MIDSVM) algorithm has been proposed in [41]. The MIDSVM algorithm is developed based on multi-parametric programming from the optimization literature [42]. Here, multiple samples are added or removed simultaneously, and the algorithm is faster than the conventional incremental decremental support vector machines presented in [39].

2.1.6 Kernel Least Mean Square Based Algorithms


Kernel least mean square based learning algorithms are also candidates for on-line kernel based learning, apart from SVM based learning algorithms. One of the first Kernel Least Mean Square (KLMS) algorithms is presented in [43]. In the KLMS algorithm, LMS is extended to a Reproducing Kernel Hilbert Space (RKHS), resulting in an adaptive filter built as a weighted sum of kernel functions evaluated at each incoming data sample. A mean square convergence study of the KLMS algorithm is presented in [44]. The major drawback of the KLMS algorithm is that, with time, the size of the filter as well as the computational effort and memory requirement increase. To overcome this


drawback, an efficient method to constrain the growth of the RBF length is proposed in [45], referred to as KLMS with Constrained Growth (KLMS-CG). The KLMS-CG algorithm uses a sequential Gaussian elimination method to test the linear dependency of each new sample's feature vector on the feature vectors of all previous samples. Extended and quantized versions of the KLMS algorithm are presented in [46, 47, 48, 49, 50, 51, 52]. A nonparametric, information theoretic KLMS based on a surprise criterion is proposed in [51]; here, surprise quantifies the amount of information a sample contains given a system state. The Quantized KLMS (QKLMS) algorithm proposed in [49] is based on a simple online vector quantization method. In QKLMS, quantization is applied to compress the input (or feature) space of the kernel adaptive filter so as to control the growth of the RBF structure. In Fixed-Budget QKLMS (QKLMS-FB) [52], a growing and pruning strategy based on a significance measure is proposed to constrain the network size.
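As a minimal sketch of the KLMS idea just described (not the exact formulation of [43]), the filter output is a weighted sum of Gaussian kernels centered at past inputs, and each new sample appends its scaled prediction error as a new coefficient; note how the memory grows by one unit per sample, which is precisely the drawback the constrained and quantized variants above address:

```python
import numpy as np

def gauss_kernel(a, b, sigma=1.0):
    return np.exp(-np.linalg.norm(a - b) ** 2 / (2 * sigma ** 2))

class KLMS:
    """Minimal kernel LMS filter: a growing sum of kernels at past inputs."""
    def __init__(self, eta=0.5, sigma=1.0):
        self.eta, self.sigma = eta, sigma
        self.centers, self.coeffs = [], []

    def predict(self, x):
        return sum(a * gauss_kernel(x, c, self.sigma)
                   for a, c in zip(self.coeffs, self.centers))

    def update(self, x, y):
        e = y - self.predict(x)           # prediction error on the new sample
        self.centers.append(x)            # the filter grows by one unit here,
        self.coeffs.append(self.eta * e)  # which is the KLMS drawback noted above
        return e
```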

2.1.7 Sequential Classification Algorithms


In the literature, some algorithms have been developed to solve only classification problems. The Sequential Multi-Category Radial Basis Function network (SMC-RBFN) [53] algorithm has been developed exclusively for classification objectives. As the training samples are presented, SMC-RBFN either adds new hidden neurons or updates the network parameters. In its growth criterion, SMC-RBFN considers the within-class similarity measure, the misclassification rate and the prediction error. SMC-RBFN uses the hinge loss function instead of the mean square loss function for a more accurate estimate of the posterior probability. New hidden neuron parameters are allocated as in the RAN algorithm; otherwise, the parameters of the nearest hidden neuron of the same class are updated using DEKF.

Another sequential classification algorithm has been presented in the Self-adaptive Resource Allocation Network (SRAN) [9]. As each training sample is presented to SRAN, based on the sample hinge error, the sample is either used for network training (growing/update) immediately, pushed to the rear end of the stack for learning in the future, or deleted from the data set. SRAN uses self-adaptive, error based control parameters to identify a reduced training sample sequence with significant information and removes the less significant samples (those similar to the knowledge already stored in the network) to avoid over-training. In the growth/update criterion, SRAN considers the explicit misclassification error and the hinge loss error. New hidden neuron parameters are allocated as in the other sequential algorithms, and all the network parameters are updated using EKF. Otherwise, the sample is pushed to the rear end of the stack, to be presented to the network in the future; these reserved samples can later be used to fine-tune the network parameters.

Table 2.1 summarizes the key differences among all the sequential learning algorithms discussed above.

Table 2.1: Comparison of supervised sequential learning algorithms

| Algorithm | Architecture | Activation Function | Features | Complexity | Sample Selection |
|---|---|---|---|---|---|
| RAN | Self-adaptive | Gaussian | Novelty based neuron growth | Least mean square method | No |
| RANEKF | Self-adaptive | Gaussian | EKF parameter update | EKF parameter update | No |
| MRAN | Self-adaptive | Gaussian | Pruning strategy | EKF parameter update | No |
| EMRAN | Self-adaptive | Gaussian | Winner neuron strategy | EKF parameter update | No |
| GAP-RBFN | Self-adaptive | Gaussian | Neuron significance concept | EKF parameter update | No |
| GGAP-RBFN | Self-adaptive | Gaussian | Any arbitrary input sampling distribution | EKF parameter update | No |
| FGAP-RBFN | Self-adaptive | Gaussian | Decoupled EKF parameter update | Decoupled EKF parameter update | No |
| OS-ELM | Fixed | Additive, Radial | Learns data one-by-one or chunk-by-chunk | Random selection of neuron parameters | No |
| SMC-RBFN | Self-adaptive | Gaussian | Hinge loss function | EKF parameter update | No |
| SRAN | Self-adaptive | Gaussian | Sequence alteration | EKF parameter update | Yes |
| SIDSVM | Self-adaptive | Kernels | On-line SVM learning, adiabatic increments | Parametric programming | No |
| MIDSVM | Self-adaptive | RBF kernel | Handles multiple samples at a time | Multi-parametric programming | No |
| KLMS | Self-adaptive | RBF kernel | LMS extended to a RKHS | Stochastic gradient in RKHS | No |
| QKLMS | Self-adaptive | RBF kernel | Quantized feature space | Stochastic gradient in RKHS | No |
| QKLMS-FB | Self-adaptive | RBF kernel | Significance measure based growing and pruning | Stochastic gradient in RKHS | No |


2.2 Summary
In this chapter, we presented the different sequential learning algorithms for neural networks, categorized as: error driven algorithms, neuron significance based algorithms, ELM based algorithms, spiking neural network algorithms, incremental decremental SVM algorithms, KLMS based algorithms, and sequential classification algorithms. All the sequential learning algorithms for neural networks presented in the literature address the technique used to learn the information contained in the training samples efficiently, but they do not self-regulate their learning. In the literature, it has been shown that a self-regulated learner using meta-cognition is the best learner. In the next chapter, we give an overview of human meta-cognitive learning and review models of meta-cognition, which encompass meta-cognitive knowledge and self-regulation.

Chapter 3
An Overview on Meta-cognition

In the previous chapter, a complete literature survey on sequential learning algorithms for neural networks was presented. Existing sequential learning algorithms for radial basis function neural networks use all the samples in the training data set to gain knowledge about the information contained in the samples. In other words, they possess the information-processing abilities of humans, including perception, learning, remembering, judging, and problem-solving; these abilities are cognitive in nature. However, recent studies on human learning have revealed that the learning process is effective when learners adopt self-regulation in the learning process using meta-cognition [3, 4]. Meta-cognition means `cognition about cognition'. In a meta-cognitive framework, human beings think about their cognitive processes, develop new strategies to improve their cognitive skills, and evaluate the information contained in their memory. This chapter gives an overview of human meta-cognition and the motivation for meta-cognitive learning. First, we define important concepts relevant to meta-cognition. Next, we present a brief review of models of meta-cognition in the literature. Finally, we give the motivation for meta-cognitive learning for a radial basis function neural network.

3.1 Definitions of Important Concepts in Meta-cognition


• Cognition: The mental process of knowing, including aspects such as awareness, perception, reasoning, and judgment.

• Meta-cognition: The term meta-cognition is defined in [5] as `one's knowledge concerning one's own cognitive processes or anything related to them'. A more recent

definition of meta-cognition is given in [54] as the awareness and knowledge of one's mental processes such that one can monitor, regulate, and direct them to a desired goal.

• Major concepts of meta-cognition: The three major concepts of meta-cognition that have been investigated extensively are meta-cognitive knowledge, meta-cognitive monitoring, and meta-cognitive control. These terms are defined as follows:

  – Meta-cognitive Knowledge: Defined as declarative knowledge about cognition. Declarative knowledge is composed of facts, beliefs, and episodes that can be stated and used to access conscious awareness [55].

  – Meta-cognitive Monitoring: Defined as assessing the current state of a cognitive activity, such as judging whether you are approaching the correct solution to a problem, or assessing how well you understand what you are reading [5].

  – Meta-cognitive Control: Defined as regulating an ongoing cognitive activity, such as stopping the process, continuing it, or changing it [5].

3.2 Models of Meta-cognition


There are several meta-cognition models available in human physiology, and a brief survey of various meta-cognition models is reported in [7]. Among the various models, the model proposed by Nelson and Narens in [8] is simple and clearly highlights the various actions in human meta-cognition, as shown in Fig. 3.1. The model is analogous to meta-cognition in human beings and has two components, the cognitive component and the meta-cognitive component. The information flow from the cognitive component to the meta-cognitive component is considered monitoring, while the information flow in the reverse direction is considered control. The information flowing from the meta-cognitive component to the cognitive component either changes the state of the cognitive component or changes the cognitive component itself. Monitoring informs the meta-cognitive component about the state of the cognitive component, thus continuously updating the meta-cognitive component's model of the cognitive component, including `no change in state'.


Figure 3.1: Nelson and Narens model of meta-cognition (the meta-cognitive component monitors the cognitive component, and control flows in the reverse direction).

3.3 Motivation for Meta-cognitive Learning


Recent studies in human learning suggest that the learning process is effective when learners adopt self-regulation in the learning process using meta-cognition [3, 4]. In meta-cognitive learning, the learner controls the learning process by planning and selecting learning strategies, and monitors progress by analyzing the effectiveness of the chosen learning strategies. When necessary, these strategies are adapted appropriately. Meta-cognition in human beings provides a means to address what-to-learn, when-to-learn and how-to-learn, i.e., the ability to identify the specific piece of required knowledge and to judge when to start and stop learning by emphasizing the best learning strategy. Hence, there is a need to develop a meta-cognitive neural network classifier that is capable of deciding what-to-learn, when-to-learn and how-to-learn the decision function from the training data by emulating human self-regulated learning.

Among the existing sequential learning algorithms, the Self-adaptive Resource Allocation Network (SRAN) [9] addresses the what-to-learn component of meta-cognition by selecting significant samples using the misclassification error and the hinge loss error. It has been shown that selecting appropriate samples for learning and removing repetitive samples help in improving the generalization performance. Therefore, it is evident that emulating the three components of human learning with suitable learning strategies would improve the generalization ability of a neural network. The drawbacks of the existing sequential learning algorithms are: a) the samples for training are selected based on a simple error criterion, which is not sufficient to capture the significance of samples; b) the new hidden


neuron center is allocated independently and may overlap with already existing neuron centers, leading to misclassification; c) knowledge gained from past samples is not used; and d) the parameter update is computationally intensive.

In this thesis, to overcome the above drawbacks, we develop a meta-cognitive learning algorithm for a radial basis function neural network based on the generic framework of meta-cognition proposed by Nelson and Narens.

3.4 Summary
In this chapter, we presented an overview of meta-cognition, including the definitions of its major concepts. Next, models of meta-cognitive learning were reviewed. Finally, the motivation for meta-cognitive learning in the neural network framework was explained. In this thesis, a radial basis function neural network that can self-regulate its learning based on its meta-knowledge is termed a meta-cognitive radial basis function neural network. In the next chapter, we introduce such a meta-cognitive radial basis function network and present its sequential learning algorithm for classification tasks.

Chapter 4
Meta-cognitive Radial Basis Function
Network and Its EKF Based Sequential
Learning Algorithm for Classification
Problems

4.1 Introduction
In the previous chapter, an overview of human meta-cognition, models of meta-cognition, and the motivation for meta-cognitive learning in the neural network framework was presented. In a meta-cognitive framework, human beings think about their cognitive processes, develop new strategies to improve their cognitive skills, and evaluate the information contained in their memory. If a radial basis function network analyzes its cognitive process and adaptively chooses suitable learning strategies to improve that process, it is referred to as a `Meta-cognitive Radial Basis Function Network' (McRBFN). This chapter focuses on the development of McRBFN and its Extended Kalman Filter (EKF) based sequential learning algorithm. First, we define the classification problem in the sequential learning framework. Next, we present the learning algorithm.

4.2 Classification Problem Definition


The classification problem in a sequential learning framework can be defined as follows. Given a stream of training data samples $\{(\mathbf{x}^1, c^1), \cdots, (\mathbf{x}^t, c^t), \cdots\}$, where $\mathbf{x}^t = [x_1^t, \cdots, x_m^t]^T \in \Re^m$ is the $m$-dimensional input of the $t$-th sample and $c^t \in \{1, \cdots, n\}$ is its class label, with $n$ the total number of classes, the coded class labels $\mathbf{y}^t = [y_1^t, \cdots, y_j^t, \cdots, y_n^t]^T \in \Re^n$ are given by:

$$ y_j^t = \begin{cases} 1 & \text{if } c^t = j \\ -1 & \text{otherwise} \end{cases} \qquad j = 1, \cdots, n \qquad (4.1) $$

The objective of a classifier is to approximate the underlying decision function that maps $\mathbf{x}^t \in \Re^m \rightarrow \mathbf{y}^t \in \Re^n$.

Figure 4.1: (a) Nelson and Narens model of meta-cognition (b) EKF-McRBFN model

4.3 EKF-McRBFN Classifier


In this section, we present the architecture of the EKF based Meta-cognitive Radial Basis Function Network (EKF-McRBFN) classifier and its working principles. The EKF-McRBFN architecture is developed based on the Nelson and Narens meta-cognition model [8]. Fig. 4.1(a) shows the Nelson and Narens meta-cognition model, which is analogous to meta-cognition in human beings and has two components, a cognitive component and a meta-cognitive component. The information flow from the cognitive component to the meta-cognitive component is considered monitoring, while the information flow in the reverse direction is considered control. EKF-McRBFN is developed based on this model, as shown in Fig. 4.1(b). Similar to the Nelson and Narens model, EKF-McRBFN has two components, shown in the schematic diagram in Fig. 4.2: the cognitive component and the meta-cognitive component. The cognitive component of EKF-McRBFN is a three layered feed forward radial basis function network with Gaussian activation functions in the hidden layer, as shown in Fig. 4.3. The meta-cognitive component contains a copy of the cognitive component. When a new training sample arrives, the meta-cognitive component of EKF-McRBFN predicts the class label and estimates the knowledge present in the new training sample with respect to the cognitive component.


Figure 4.2: Schematic diagram of EKF-McRBFN. The meta-cognitive component (top) contains a dynamic model of the cognitive component, the knowledge measures (predicted class label, confidence of classifier, maximum hinge error, class-wise significance) and the learning strategies (sample delete, neuron growth, parameter update, sample reserve); monitoring flows upward and control (the best learning strategy) flows downward to the cognitive component (bottom), the RBF network that maps the data stream to the predicted class label.

Based on this information, the meta-cognitive component selects a suitable learning strategy for the current sample, thereby addressing the three fundamental issues in learning: a) what-to-learn, b) when-to-learn, and c) how-to-learn.

The meta-cognitive component is a regulatory system that helps the adaptive cognitive component learn the input-output relationship efficiently. It is similar to a feedback system, where meta-cognition provides appropriate learning strategies based on the monitoring signals from the cognitive component.

EKF-McRBFN begins with zero hidden neurons and selects a suitable strategy for each sample to achieve the objective. First, we present the cognitive component; next, we highlight the various learning strategies of the meta-cognitive component.

4.3.1 Cognitive Component of EKF-McRBFN


The cognitive component of EKF-McRBFN is a three layered feed forward radial basis function network. The input layer passes all features to the hidden layer without any transformation, the hidden layer employs the Gaussian activation function, and the output layer uses a linear activation function, as shown in Fig. 4.3.

Figure 4.3: Cognitive component: RBF network

Without loss of generality, we assume that the meta-cognitive learning algorithm has built $K$ Gaussian neurons from the $t-1$ training samples seen so far. For a given training sample $\mathbf{x}^t$, the predicted output $\hat{\mathbf{y}}^t = [\hat{y}_1^t, \cdots, \hat{y}_j^t, \cdots, \hat{y}_n^t]^T$ of the EKF-McRBFN classifier with $K$ hidden neurons is

$$ \hat{y}_j^t = \sum_{k=1}^{K} w_{kj} h_k^t, \qquad j = 1, \cdots, n \qquad (4.2) $$

where $w_{kj}$ is the weight connecting the $k$-th hidden neuron to the $j$-th output neuron and $h_k^t$ is the response of the $k$-th hidden neuron to the input $\mathbf{x}^t$, given by

$$ h_k^t = \exp\left( -\frac{\|\mathbf{x}^t - \boldsymbol{\mu}_k^l\|^2}{(\sigma_k^l)^2} \right), \qquad k = 1, \cdots, K \qquad (4.3) $$

where $\boldsymbol{\mu}_k^l \in \Re^m$ is the center and $\sigma_k^l \in \Re^+$ is the width of the $k$-th hidden neuron. Here, the superscript $l$ denotes the class to which the hidden neuron belongs.

The objective is to estimate the number of hidden neurons ($K$), the neuron centers ($\boldsymbol{\mu}_k^l$), widths ($\sigma_k^l$) and output weights ($\mathbf{W}$) of the network such that the network approximates the decision function accurately.
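A minimal sketch of this forward computation (Eqs. (4.2)-(4.3)) is given below; the array shapes and names are illustrative assumptions, not part of the thesis notation:

```python
import numpy as np

def rbf_forward(x, centers, widths, W):
    """Cognitive component output for one input sample.
    x: (m,) input; centers: (K, m); widths: (K,); W: (K, n) output weights."""
    # Gaussian response of each hidden neuron, Eq. (4.3)
    h = np.exp(-np.sum((x - centers) ** 2, axis=1) / widths ** 2)
    # Linear output layer, Eq. (4.2): y_hat_j = sum_k w_kj * h_k
    y_hat = h @ W
    return h, y_hat
```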

4.3.2 Meta-cognitive Component of EKF-McRBFN


The meta-cognitive component contains a dynamic model of the cognitive component, knowledge measures and self-regulated thresholds. During the learning process, the meta-cognitive component monitors the cognitive component and updates its dynamic model of it. When a new ($t$-th) training sample is presented to EKF-McRBFN, the meta-cognitive component estimates the knowledge present in the new training sample with respect to the cognitive component using its knowledge measures. The meta-cognitive component uses the predicted class label ($\hat{c}^t$), the maximum hinge error ($E^t$), the confidence of the classifier ($\hat{p}(c^t|\mathbf{x}^t)$) and the class-wise significance ($\psi_c$) as the measures of the knowledge in the new training sample. The self-regulated thresholds are adapted to capture the knowledge presented in the new training sample. Using the knowledge measures and self-regulated thresholds, the meta-cognitive component constructs two sample based learning strategies and two neuron based learning strategies. One of these strategies is selected for the new training sample such that the cognitive component learns it accurately and achieves better generalization performance.

The meta-cognitive knowledge measures are defined as follows:

Predicted class label ($\hat{c}^t$): Using the predicted output $\hat{\mathbf{y}}^t$, the predicted class label $\hat{c}^t$ is obtained as

$$ \hat{c}^t = \arg\max_{j \in 1, \cdots, n} \hat{y}_j^t \qquad (4.4) $$

where $n$ is the total number of classes.

Maximum hinge error ($E^t$): The objective of the classifier is to minimize the error between the predicted output $\hat{\mathbf{y}}^t$ and the actual output $\mathbf{y}^t$. For classification problems, it has been shown in [56, 57] that a classifier developed using the hinge loss error estimates the posterior probability more accurately than one developed using the mean square error. Hence, in EKF-McRBFN, we use the hinge loss error $\mathbf{e}^t = [e_1^t, \cdots, e_j^t, \cdots, e_n^t]^T \in \Re^n$ defined as

$$ e_j^t = \begin{cases} 0 & \text{if } y_j^t \hat{y}_j^t > 1 \\ y_j^t - \hat{y}_j^t & \text{otherwise} \end{cases} \qquad j = 1, \cdots, n \qquad (4.5) $$

where $y_j^t$ is the actual output and $\hat{y}_j^t$ is the predicted output at the $j$-th neuron for the $t$-th sample. The maximum absolute hinge error ($E^t$) is given by

$$ E^t = \max_{j \in 1, 2, \cdots, n} \left| e_j^t \right| \qquad (4.6) $$


where $e_j^t$ is the hinge error at the $j$-th output neuron for the $t$-th sample.

Confidence of classifier ($\hat{p}(c^t|\mathbf{x}^t)$): The confidence level of classification, or predicted posterior probability, is given by

$$ \hat{p}(c^t|\mathbf{x}^t) = \frac{\min(1, \max(-1, \hat{y}_j^t)) + 1}{2}, \qquad j = c^t \qquad (4.7) $$

where $\hat{y}_j^t$ is the predicted output at the neuron corresponding to the actual class label of the $t$-th sample.

Class-wise significance ($\psi_c$): In general, the input feature $\mathbf{x}^t$ is mapped onto a hyper-dimensional spherical feature space $S$ using the $K$ Gaussian neurons, i.e., $\mathbf{x}^t \rightarrow \phi(\mathbf{x}^t)$, where $\phi(\mathbf{x}^t)$ is the input feature in the feature space. Therefore, all $\phi(\mathbf{x}^t)$ lie on a hyper-dimensional sphere, as shown in [58]. The knowledge, or spherical potential, of any sample in the original space is expressed as its squared distance from the hyper-dimensional mapping $S$ [59].

In EKF-McRBFN, the centers ($\boldsymbol{\mu}$) and widths ($\sigma$) of the Gaussian neurons describe the feature space $S$. Let the center of the $K$-dimensional feature space be $\phi_0 = \frac{1}{K}\sum_{k=1}^{K}\phi(\boldsymbol{\mu}_k)$. The knowledge present in the new data $\mathbf{x}^t$ can be expressed as the potential of the data in the original space, which is the squared distance from the $K$-dimensional feature mapping to the center $\phi_0$. The potential ($\psi$) is given by

$$ \psi = \|\phi(\mathbf{x}^t) - \phi_0\|^2 \qquad (4.8) $$

As shown in [59], the above equation can be expanded as

$$ \psi = h(\mathbf{x}^t, \mathbf{x}^t) - \frac{2}{K}\sum_{k=1}^{K} h(\mathbf{x}^t, \boldsymbol{\mu}_k^l) + \frac{1}{K^2}\sum_{k,r=1}^{K} h(\boldsymbol{\mu}_k^l, \boldsymbol{\mu}_r^l) \qquad (4.9) $$

where the Gaussian kernel $h(\mathbf{x}^t, \boldsymbol{\mu}_k^l)$ is expressed as $\exp\left(-\|\mathbf{x}^t - \boldsymbol{\mu}_k^l\|^2/(\sigma_k^l)^2\right)$. From the above equation, we can see that for the Gaussian function the first term ($h(\mathbf{x}^t, \mathbf{x}^t)$) and the last term ($\frac{1}{K^2}\sum_{k,r=1}^{K} h(\boldsymbol{\mu}_k^l, \boldsymbol{\mu}_r^l)$) are constants. Since the potential is a measure of novelty, these constants may be discarded and the potential reduces to

$$ \psi \approx -\frac{2}{K}\sum_{k=1}^{K} h(\mathbf{x}^t, \boldsymbol{\mu}_k^l) \qquad (4.10) $$


Since we are addressing classification problems, the class-wise distribution plays a vital role and significantly influences the performance of the classifier [53]. Hence, we measure the spherical potential of the new training sample $\mathbf{x}^t$ belonging to class $c$ with respect to the neurons associated with the same class (i.e., $l = c$). Let $K_c$ be the number of neurons associated with class $c$; then the class-wise spherical potential, or class-wise significance ($\psi_c$), is defined as

$$ \psi_c = \frac{1}{K_c}\sum_{k=1}^{K_c} h(\mathbf{x}^t, \boldsymbol{\mu}_k^c) \qquad (4.11) $$

Note that the negative sign and the constant factor 2 in Eq. (4.10) are removed in Eq. (4.11), since they do not affect the measure of spherical potential. Also note that this measure of spherical potential is different from the potential function method referred to in [60]. The spherical potential explicitly indicates the knowledge contained in the sample: a higher value of spherical potential (close to one) indicates that the sample is similar to the existing knowledge in the cognitive component, while a smaller value of spherical potential (close to zero) indicates that the sample is novel.
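The four knowledge measures (Eqs. (4.4)-(4.7) and (4.11)) can be computed directly from the network response; the sketch below assumes the `rbf_forward` helper from the previous section, class labels coded as ±1, and an array `labels` holding the class of each hidden neuron:

```python
import numpy as np

def knowledge_measures(x, y, c, centers, widths, labels, W):
    """Meta-cognitive knowledge measures for one sample (x, y) of class c."""
    h, y_hat = rbf_forward(x, centers, widths, W)
    c_hat = int(np.argmax(y_hat)) + 1                  # predicted class label, Eq. (4.4)
    e = np.where(y * y_hat > 1, 0.0, y - y_hat)        # hinge loss error, Eq. (4.5)
    E = np.max(np.abs(e))                              # maximum hinge error, Eq. (4.6)
    p_hat = (np.clip(y_hat[c - 1], -1, 1) + 1) / 2     # confidence of classifier, Eq. (4.7)
    same = labels == c
    psi_c = h[same].mean() if same.any() else 0.0      # class-wise significance, Eq. (4.11)
    return c_hat, E, p_hat, psi_c, e
```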

4.3.2.1 Learning Strategies

The meta-cognitive component devises various learning strategies using the knowledge measures and self-regulated thresholds, which directly address the basic principles of self-regulated human learning (i.e., what-to-learn, when-to-learn and how-to-learn). The meta-cognitive part controls the learning process in the cognitive component by selecting one of the following four learning strategies for each new training sample.

• Sample delete strategy: If the new training sample contains information similar to the knowledge already present in the cognitive component, delete it from the training data set without using it in the learning process.

• Neuron growth strategy: Use the new training sample to add a new hidden neuron to the cognitive component. During neuron addition, sample overlapping conditions are identified so that the new hidden neuron is allocated appropriately.

• Parameter update strategy: The new training sample is used to update the parameters of the cognitive component; EKF is used for the update.


• Sample reserve strategy: The new training sample contains some information, but it is not significant; such samples can be used at a later stage of the learning process for fine tuning the parameters of the cognitive component, or discarded without learning.

Most of the existing sequential learning algorithms address only neuron addition/pruning and parameter update. In the proposed EKF-McRBFN classifier, these learning strategies help achieve the best human learning ability. Moreover, the strategies are adapted to suit the current training sample. Since the meta-cognitive component addresses what-to-learn, when-to-learn and how-to-learn, it improves the generalization ability of the cognitive component.

The principle behind these four learning strategies is described in detail below:

• Sample delete strategy: This strategy prevents similar samples from being learnt, which avoids over-training and reduces the computational effort. When the predicted class label of the new training sample is the same as the actual class label and the confidence level (estimated posterior probability) is greater than the expected value, the new training sample provides no additional information to the classifier and can be deleted from the training sequence without being used in the learning process. The sample delete criterion is given by

$$ \hat{c}^t == c^t \;\; \text{AND} \;\; \hat{p}(c^t|\mathbf{x}^t) \ge \beta_d \qquad (4.12) $$

The deletion threshold ($\beta_d$) controls the number of samples participating in the learning process. If one selects $\beta_d$ close to 1, then no sample is deleted and all the training samples participate in the learning process, which results in over-training with similar samples. Reducing $\beta_d$ below the desired accuracy results in the deletion of too many samples from the training sequence, and the resultant network may not attain the desired accuracy. Hence, it is fixed at the expected accuracy level; in our simulation studies, it is selected in the range [0.9-0.95].


• Neuron growth strategy: When the new training sample contains significant information and the estimated class label differs from the actual class label, a new hidden neuron must be added to capture the knowledge. The neuron growth criterion is given by

$$ \hat{c}^t \ne c^t \;\; \text{AND} \;\; \psi_c(\mathbf{x}^t) \le \beta_c \;\; \text{AND} \;\; E^t \ge \beta_a \qquad (4.13) $$

where $\beta_c$ is the knowledge threshold and $\beta_a$ is the addition threshold. The thresholds $\beta_c$ and $\beta_a$ allow samples with significant knowledge to be learnt first, and use the remaining samples for fine tuning. If $\beta_c$ is chosen close to zero and the initial value of $\beta_a$ is chosen close to the maximum value of the hinge error (i.e., 2, because class labels are coded as -1 or 1), then very few neurons are added to the network; such a network will not approximate the function properly. If $\beta_c$ is chosen close to one and the initial value of $\beta_a$ is chosen close to the minimum value of the hinge error, then the resultant network may contain many neurons with poor generalization ability. Hence, the knowledge threshold can be selected in the interval [0.3-0.7], and the initial value of the addition threshold in the interval [1.3-1.7].

The addition threshold $\beta_a$ is adapted as follows:

$$ \beta_a^t = \delta \beta_a^{t-1} + (1 - \delta) E^t \qquad (4.14) $$

where $\delta$ is the slope that controls the rate of self-adaptation and is set close to 1. The hypothesis behind Eq. (4.14) is that, as the learning process progresses, the network uses samples with higher hinge error than initially for neuron growth.

If the growth criterion in Eq. (4.13) is satisfied, a new hidden neuron $K+1$ is added and its parameters are initialized as explained below. Existing learning algorithms in the literature do not consider overlapping and distinct-cluster criteria when assigning the parameters of the new neuron. However, the overlapping condition significantly influences performance. The new training sample may overlap

with other classes, or it may come from a distinct cluster far away from the nearest neuron of the same class. Hence, EKF-McRBFN measures the inter/intra-class nearest neuron distances from the current sample when assigning the new neuron parameters.

Let $nrS$ be the nearest hidden neuron in the intra-class (i.e., $l == c$) with center $\boldsymbol{\mu}_{nrS}^c$ and width $\sigma_{nrS}^c$, and let $nrI$ be the nearest hidden neuron in the inter-class (i.e., $l \ne c$) with center $\boldsymbol{\mu}_{nrI}^l$ and width $\sigma_{nrI}^l$. They are defined as

$$ nrS = \arg\min_{l==c;\;\forall k} \|\mathbf{x}^t - \boldsymbol{\mu}_k^l\| \qquad (4.15) $$

$$ nrI = \arg\min_{l \ne c;\;\forall k} \|\mathbf{x}^t - \boldsymbol{\mu}_k^l\| \qquad (4.16) $$

Let the Euclidean distances from the new training sample to $nrS$ and $nrI$ be given as

$$ d_S^t = \|\mathbf{x}^t - \boldsymbol{\mu}_{nrS}^c\| \qquad (4.17) $$

$$ d_I^t = \|\mathbf{x}^t - \boldsymbol{\mu}_{nrI}^l\| \qquad (4.18) $$

Using the nearest neuron distances, we can determine the overlapping/no-overlapping conditions in four categories; Fig. 4.4 pictorially shows the distribution of the intra-class (same class) and inter-class (different class) samples, and one sample for each overlapping condition:

– Distinct sample: When a new training sample is far away from both the intra- and inter-class nearest neurons ($d_S^t >> \sigma_{nrS}^c$ AND $d_I^t >> \sigma_{nrI}^l$), the sample does not overlap with any class cluster and forms a new distinct cluster. In Fig. 4.4, the square symbol represents this case. Here, the new hidden neuron center ($\boldsymbol{\mu}_{K+1}^c$), width ($\sigma_{K+1}^c$) and weight ($\mathbf{w}_{K+1}$) parameters are determined as

$$ \boldsymbol{\mu}_{K+1}^c = \mathbf{x}^t; \quad \sigma_{K+1}^c = \max\left(0.00001,\; \kappa \sqrt{\mathbf{x}^{tT}\mathbf{x}^t}\right); \quad \mathbf{w}_{K+1} = \mathbf{e}^t \qquad (4.19) $$

where $\kappa$ is a positive constant that controls the overlap of the responses of the hidden units in the input space and lies in the range $0.5 \le \kappa \le 1$.


Figure 4.4: Schematic representation of training samples corresponding to the overlapping/no-overlapping conditions (intra-class and inter-class clusters with a distinct sample, a no-overlapping sample, a minimum overlapping sample, and a significant overlapping sample).

– No-overlapping sample: When a new training sample is close to the intra-class nearest neuron, i.e., the intra/inter-class distance ratio is less than 1, the sample does not overlap with the other classes. In Fig. 4.4, the plus symbol represents this case. Here, the new hidden neuron center ($\boldsymbol{\mu}_{K+1}^c$), width ($\sigma_{K+1}^c$) and weight ($\mathbf{w}_{K+1}$) parameters are determined as

$$ \boldsymbol{\mu}_{K+1}^c = \mathbf{x}^t; \quad \sigma_{K+1}^c = \kappa \|\mathbf{x}^t - \boldsymbol{\mu}_{nrS}^c\|; \quad \mathbf{w}_{K+1} = \mathbf{e}^t \qquad (4.20) $$

– Minimum overlapping with the inter-class: When a new training sample is close to the inter-class nearest neuron compared to the intra-class nearest neuron, i.e., the intra/inter-class distance ratio is in the range 1 to 1.5, the sample has minimum overlap with the other class. In Fig. 4.4, the cross symbol represents this case. Here, the center of the new hidden neuron is shifted away from the inter-class nearest neuron and towards the intra-class nearest neuron, and is initialized as

$$ \boldsymbol{\mu}_{K+1}^c = \mathbf{x}^t + \zeta(\boldsymbol{\mu}_{nrS}^c - \boldsymbol{\mu}_{nrI}^l); \quad \sigma_{K+1}^c = \kappa \|\boldsymbol{\mu}_{K+1}^c - \boldsymbol{\mu}_{nrS}^c\| \qquad (4.21) $$


where $\zeta$ is the center shift factor, which determines how far the center is shifted from the location of the new training sample; it lies in the range [0.01-0.1]. Since the center of the new hidden neuron is shifted from the position of the new training sample, the weight parameter of the new hidden neuron is calculated as

$$ \mathbf{w}_{K+1} = \mathbf{e}^t / h_{K+1}^t \qquad (4.22) $$

where

$$ h_{K+1}^t = \exp\left( -\frac{\|\mathbf{x}^t - \boldsymbol{\mu}_{K+1}^c\|^2}{(\sigma_{K+1}^c)^2} \right) \qquad (4.23) $$

– Significant overlapping with the inter-class: When a new training sample is very close to the inter-class nearest neuron compared to the intra-class nearest neuron, i.e., the intra/inter-class distance ratio is more than 1.5, the sample overlaps significantly with the other class. In Fig. 4.4, the triangle symbol represents this case. Here, the center of the new hidden neuron is shifted away from the inter-class nearest neuron and is initialized as

$$ \boldsymbol{\mu}_{K+1}^c = \mathbf{x}^t - \zeta(\boldsymbol{\mu}_{nrI}^l - \mathbf{x}^t); \quad \sigma_{K+1}^c = \kappa \|\boldsymbol{\mu}_{K+1}^c - \boldsymbol{\mu}_{nrI}^l\| \qquad (4.24) $$

The weight parameter of the new hidden neuron is calculated as given in Eq. (4.22).

The above center and width determination conditions help in minimizing misclassification in the EKF-McRBFN classifier.
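As an illustration of the four overlap cases, the following sketch initializes the new hidden neuron's parameters from the intra/inter-class nearest-neuron distances per Eqs. (4.15)-(4.24); the factor `far` used to decide "far away" in the distinct-sample test is our own assumption, as are the default values of κ and ζ:

```python
import numpy as np

def init_new_neuron(x, e, c, centers, widths, labels,
                    kappa=0.7, zeta=0.05, far=2.0):
    """Initialize center, width and weight of a new hidden neuron of class c.
    `far` (how many widths count as 'far away') is our own assumption."""
    same, diff = labels == c, labels != c
    iS = np.argmin(np.linalg.norm(centers[same] - x, axis=1))  # nrS, Eq. (4.15)
    iI = np.argmin(np.linalg.norm(centers[diff] - x, axis=1))  # nrI, Eq. (4.16)
    mu_S, sig_S = centers[same][iS], widths[same][iS]
    mu_I, sig_I = centers[diff][iI], widths[diff][iI]
    dS = np.linalg.norm(x - mu_S)                              # Eq. (4.17)
    dI = np.linalg.norm(x - mu_I)                              # Eq. (4.18)

    if dS > far * sig_S and dI > far * sig_I:  # distinct sample, Eq. (4.19)
        mu, sig = x.copy(), max(1e-5, kappa * np.sqrt(x @ x))
    elif dS / dI < 1.0:                        # no overlapping, Eq. (4.20)
        mu, sig = x.copy(), kappa * dS
    elif dS / dI <= 1.5:                       # minimum overlapping, Eq. (4.21)
        mu = x + zeta * (mu_S - mu_I)
        sig = kappa * np.linalg.norm(mu - mu_S)
    else:                                      # significant overlapping, Eq. (4.24)
        mu = x - zeta * (mu_I - x)
        sig = kappa * np.linalg.norm(mu - mu_I)

    h = np.exp(-np.linalg.norm(x - mu) ** 2 / sig ** 2)  # Eq. (4.23)
    w = e if np.allclose(mu, x) else e / h               # Eqs. (4.19)/(4.20) vs (4.22)
    return mu, sig, w
```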

• Network parameters update strategy: The cognitive component parameters $\boldsymbol{\alpha} = [\mathbf{w}_1, \boldsymbol{\mu}_1^l, \sigma_1, \cdots, \mathbf{w}_K, \boldsymbol{\mu}_K^l, \sigma_K]^T$ are updated if the following criterion is satisfied:

$$ \hat{c}^t == c^t \;\; \text{AND} \;\; E^t \ge \beta_u \qquad (4.25) $$

where $\beta_u$ is the update threshold. If $\beta_u$ is chosen close to 50% of the maximum hinge error (i.e., 1), then very few samples are used for adapting the network


parameters, and most of the samples are pushed to the end of the training sequence; the resultant network will not accurately approximate the function. If a lower value is chosen, then all samples are used in updating the network parameters without altering the training sequence. Hence, the initial value of the update threshold can be selected in the interval [0.4-0.7]. The threshold $\beta_u$ is adapted based on the prediction error as:

$$ \beta_u^t = \delta \beta_u^{t-1} + (1 - \delta) E^t \qquad (4.26) $$

where $\delta$ is the slope that controls the rate of self-adaptation of the update threshold and is set close to 1. The advantage of the self-adaptive thresholds is that they help in selecting the samples for adding a hidden neuron or for updating the parameters.

EKF-McRBFN uses the extended Kalman filter to update the cognitive component parameters:

$$ \boldsymbol{\alpha}^t = \boldsymbol{\alpha}^{t-1} + \mathbf{G}^t \mathbf{e}^t \qquad (4.27) $$

where $\mathbf{e}^t$ is the error obtained from the hinge loss function for the $t$-th sample and $\mathbf{G}^t \in \Re^{z \times n}$ is the Kalman gain matrix given by:

$$ \mathbf{G}^t = \mathbf{P}^t \mathbf{B}^t \left[ \mathbf{R} + (\mathbf{B}^t)^T \mathbf{P}^t \mathbf{B}^t \right]^{-1} \qquad (4.28) $$

where $z = K(m + n + 1)$, $\mathbf{R} = r_0 \mathbf{I}_{n \times n}$ is the variance of the measurement noise, $\mathbf{P}^t \in \Re^{z \times z}$ is the error covariance matrix, and $\mathbf{B}^t$ is the matrix of partial derivatives of the output with respect to the parameters $\boldsymbol{\alpha}$, given by

$$ \mathbf{B}^t = \left[ h_1^t \mathbf{I}_{n \times n},\; h_1^t \frac{2\mathbf{w}_1}{(\sigma_1^l)^2}(\mathbf{x}^t - \boldsymbol{\mu}_1^l)^T,\; h_1^t \frac{2\mathbf{w}_1}{(\sigma_1^l)^3}\|\mathbf{x}^t - \boldsymbol{\mu}_1^l\|^2, \cdots, \right. $$
$$ \left. h_K^t \mathbf{I}_{n \times n},\; h_K^t \frac{2\mathbf{w}_K}{(\sigma_K^l)^2}(\mathbf{x}^t - \boldsymbol{\mu}_K^l)^T,\; h_K^t \frac{2\mathbf{w}_K}{(\sigma_K^l)^3}\|\mathbf{x}^t - \boldsymbol{\mu}_K^l\|^2 \right]^T \qquad (4.29) $$

The error covariance matrix is updated by

$$ \mathbf{P}^{t+1} = \left[ \mathbf{I}_{z \times z} - \mathbf{G}^t (\mathbf{B}^t)^T \right] \mathbf{P}^t + q_0 \mathbf{I}_{z \times z} \qquad (4.30) $$

The addition of the artificial process noise ($q_0$) helps in avoiding convergence to a local minimum.


When a new hidden neuron is added, the dimensionality of the error covariance matrix $\mathbf{P}^t$ is increased to

$$ \mathbf{P}^t = \begin{bmatrix} \mathbf{P}^{t-1}_{z \times z} & \mathbf{0}_{z \times (m+n+1)} \\ \mathbf{0}_{(m+n+1) \times z} & p_0 \mathbf{I}_{(m+n+1) \times (m+n+1)} \end{bmatrix} \qquad (4.31) $$

where $\mathbf{I}$ is the identity matrix, $\mathbf{0}$ is the zero matrix and $p_0$ is the initial estimated uncertainty.
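A compact sketch of this EKF step (Eqs. (4.27), (4.28) and (4.30)) is shown below; constructing the Jacobian B of Eq. (4.29) is specific to the network structure, so it is taken here as an input, and r0, q0 are the noise constants from the text:

```python
import numpy as np

def ekf_update(alpha, P, e, B, r0=1.0, q0=1e-4):
    """One EKF step on the flattened parameter vector alpha (z,).
    e: (n,) hinge error; B: (z, n) Jacobian of outputs w.r.t. parameters."""
    z, n = B.shape
    R = r0 * np.eye(n)
    G = P @ B @ np.linalg.inv(R + B.T @ P @ B)      # Kalman gain, Eq. (4.28)
    alpha = alpha + G @ e                           # parameter update, Eq. (4.27)
    P = (np.eye(z) - G @ B.T) @ P + q0 * np.eye(z)  # covariance update, Eq. (4.30)
    return alpha, P
```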

• Sample reserve strategy: If the new training sample satisfies neither the deletion criterion, nor the neuron growth criterion, nor the parameter update criterion, the sample is pushed to the rear of the training sequence. Since EKF-McRBFN adapts its strategies based on the knowledge in the current sample, these reserved samples may be used at a later stage.

Ideally, the training process stops when no further sample is available in the data stream. In practice, the stopping criterion is that the set of samples in the reserve remains unchanged.

4.4 EKF-McRBFN Algorithm


To summarize, the EKF-McRBFN algorithm is given in pseudo code form in Pseudocode 1.

In EKF-McRBFN, the sample delete strategy addresses what-to-learn by deleting insignificant samples from the data stream; the neuron growth strategy and the parameter update strategy address how-to-learn, i.e., how the cognitive component learns from the samples; and the self-adaptive nature of the meta-cognitive thresholds, together with the sample reserve strategy, addresses when-to-learn by presenting the samples to the learning process according to the knowledge they contain.

4.4.1 Guidelines for EKF-McRBFN Thresholds Initialization


In this section, we explain the influence of the $\beta_c$, $\beta_d$, $\beta_a$ and $\beta_u$ thresholds on the performance of EKF-McRBFN and provide some guidelines for their initialization.


Pseudocode 1: Pseudo code for the EKF-McRBFN classification algorithm.

Input: Present the training data one-by-one to the network from the data stream.
Output: Decision function that estimates the relationship between the feature space and the class labels.
START
  Initialization: Assign the first sample as the first neuron (K = 1).
    The parameters of the neuron are chosen as shown in Eq. (4.19).
  For each training sample (x^t, y^t) DO
    The meta-cognitive component computes the significance of the sample
    with respect to the cognitive component:
      Compute the cognitive component output y_hat^t using Eq. (4.2).
      Find the predicted class label c_hat^t, maximum hinge error E^t,
      confidence of classifier p_hat(c^t|x^t) and class-wise significance psi_c
      using Eqs. (4.4), (4.6), (4.7) and (4.11).
    Based on the above measures, the meta-cognitive component selects
    one of the following strategies:
      Sample delete strategy:
        IF c_hat^t == c^t AND p_hat(c^t|x^t) >= beta_d THEN
          Delete the sample from the sequence without learning.
      Neuron growth strategy:
        ELSEIF c_hat^t != c^t AND psi_c(x^t) <= beta_c AND E^t >= beta_a THEN
          Add a neuron to the network (K = K + 1).
          Choose the parameters of the new hidden neuron using
          Eqs. (4.19) to (4.24). Update the addition threshold according
          to Eq. (4.14) and increase the dimensionality of P.
      Parameters update strategy:
        ELSEIF c_hat^t == c^t AND E^t >= beta_u THEN
          Update the parameters of the cognitive component using EKF
          according to Eq. (4.27). Update the update threshold according
          to Eq. (4.26).
      Sample reserve strategy:
        ELSE
          The current sample (x^t, y^t) is pushed to the rear end of the
          sample stack to be used in the future; it can later be used to
          fine-tune the cognitive component parameters.
      ENDIF
    The cognitive component executes the selected strategy.
  ENDDO
END


Figure 4.5: Error regions of the various thresholds in EKF-McRBFN. The hinge error $E^t$ (vertical axis, from 0 to 2) is divided into a sample deleting region, a parameters updating region, and a neuron growing region; the correct classification region lies below a hinge error of 1 and the misclassification region above it, with $\beta_d$, $\beta_u$ and $\beta_a$ marking the region boundaries.

The knowledge threshold ($\beta_c$) helps in identifying the novelty of the current sample and depends on the spherical potential, whose range is between 0 and 1. A spherical potential close to 1 means that the sample is similar to the existing knowledge, while a smaller value indicates that the sample is novel. If one selects $\beta_c$ close to zero, the network does not allow the addition of neurons; similarly, if one selects $\beta_c$ close to one, all samples are identified as novel. Hence, $\beta_c$ can be selected in the range [0.3-0.7].

The deletion threshold ($\beta_d$) prevents over-training by removing samples that are predicted accurately with high confidence. The self-regulated addition threshold ($\beta_a$) and update threshold ($\beta_u$) are used to select appropriate samples for efficient learning. The $\beta_d$, $\beta_a$ and $\beta_u$ thresholds depend on the hinge error $E^t$; note that $E^t$ lies in [0, 2]. The characteristics of the thresholds and their influence on EKF-McRBFN performance can be explained by dividing the error range into three sub-regions, namely the sample deleting region, the parameters updating region, and the neuron growing region, as shown in Fig. 4.5.

If the confidence of the classifier (estimated posterior probability) is greater than $\beta_d$ and the predicted class label ($\hat{c}^t$) is the same as the actual class label, then the current sample is similar to the existing knowledge in the cognitive component. In that case, the current sample is deleted from the training sequence without being used in the learning process. The confidence of the classifier decreases from 1 to 0 as the hinge error ($E^t$) increases from 0 to 2, as shown in Fig. 4.5. Suppose one selects the deletion threshold to be 0.5; then many samples satisfying the condition are deleted without being used in learning, and the resultant classifier may not provide good generalization performance. If one selects a value close to 1, say 0.99, then most of the similar samples are used in learning, which results in over-training. Hence, $\beta_d$ can be selected in the range [0.9-0.95].

The addition threshold $\beta_a$ is combined with the other conditions, which measure misclassification and knowledge. The minimum possible prediction error when there is a misclassification is 1; hence, $\beta_a$ should be greater than 1. If one selects $\beta_a$ close to 1, then all misclassified samples are used for neuron addition. If one selects $\beta_a$ close to 2, then very few neurons are added and the resultant network may not approximate the decision surface. Note that the meta-cognitive component adapts the addition threshold such that a new hidden neuron is added for samples with higher error. Hence, the initial value of the addition threshold ($\beta_a$) can be selected in the range [1.3-1.7].

EKF-McRBFN updates the parameters when the predicted class is accurate and the hinge error ($E^t$) is greater than $\beta_u$. When the predicted class label is accurate, the value of $E^t$ lies between 0 and 1. If one selects $\beta_u$ close to 1, then no sample is used for updating, and the resultant network does not approximate the decision function. If one selects $\beta_u$ close to 0, then all samples are used for updating. EKF-McRBFN updates the parameters using the samples that produce higher error. Hence, the initial value of the update threshold ($\beta_u$) can be selected in the range [0.4-0.7].
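Collecting the above guidelines, a plausible initialization together with the self-adaptation rule of Eqs. (4.14) and (4.26) could look as follows; the specific numbers are mid-range picks within the recommended intervals, not values prescribed by the thesis:

```python
# Mid-range initial values within the recommended intervals (illustrative picks)
thresholds = {
    "beta_d": 0.92,   # deletion threshold, range [0.9, 0.95]
    "beta_c": 0.5,    # knowledge threshold, range [0.3, 0.7]
    "beta_a": 1.5,    # addition threshold (initial), range [1.3, 1.7]
    "beta_u": 0.55,   # update threshold (initial), range [0.4, 0.7]
}
delta = 0.99          # slope close to 1, controls the self-adaptation rate

def adapt(beta, E, delta=delta):
    """Self-adaptation of beta_a / beta_u, Eqs. (4.14) and (4.26)."""
    return delta * beta + (1 - delta) * E
```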

4.5 Summary
In this chapter, we presented a sequential learning algorithm for the meta-cognitive radial basis function network using EKF, based on human meta-cognitive learning principles, for classification problems. The meta-cognitive component in McRBFN helps in choosing a suitable strategy for training the cognitive component of EKF-McRBFN. The meta-cognitive component appropriately adapts the learning strategies and hence efficiently decides what-to-learn, when-to-learn and how-to-learn. In addition, the overlapping conditions in the neuron growth strategy help in the proper initialization of the parameters of new hidden neurons and also minimize the misclassification error. The main drawbacks of the EKF-McRBFN classifier are that the knowledge gained from past samples is not used properly and that it employs the computationally intensive extended Kalman filter for parameter update.

In the next chapter, a fast and efficient projection based sequential learning algorithm for the meta-cognitive radial basis function network classifier is presented.

43
Chapter 5
Projection Based Learning Algorithm
for Meta-cognitive RBF Network
Classifier

5.1 Introduction
In the previous chapter, an EKF based sequential learning algorithm for the meta-cognitive radial basis function network (EKF-McRBFN) classifier, based on the principles of human meta-cognition, was presented. Therein, meta-cognitive learning helps the radial basis function network achieve better performance by controlling what-to-learn, when-to-learn and how-to-learn. The sample overlapping conditions for the allocation of new hidden neuron parameters minimize misclassification. Also, the knowledge measures and self-regulated thresholds help the network approximate the underlying function efficiently, with a compact network structure.

However, EKF-McRBFN does not use the knowledge gained from past samples and also uses a computationally intensive EKF for parameter update. To overcome these drawbacks, in this chapter we introduce a fast and efficient Projection Based Learning (PBL) algorithm for McRBFN. The McRBFN using PBL to obtain the network parameters is referred to as the `Projection Based Learning algorithm for a Meta-cognitive Radial Basis Function Network' (PBL-McRBFN).


5.2 PBL-McRBFN Classifier


In PBL-McRBFN, when a neuron is added to the cognitive component, the Gaussian parameters (center and width) are determined based on the current sample, and the output weights are estimated using the projection based algorithm. When a new neuron is added, the existing neurons in the cognitive component are used as pseudo-samples in projection based learning; thereby, the proposed algorithm exploits the knowledge stored in the network for proper initialization. The problem of finding the optimal weights is first formulated as a linear programming problem using the principles of minimization and real calculus. The Projection Based Learning (PBL) algorithm then converts the linear programming problem into a system of linear equations and provides a solution for the optimal weights, corresponding to the minimum of the error function.

We present a detailed description of the cognitive and meta-cognitive components of PBL-McRBFN in the following sections.

5.2.1 Cognitive Component of PBL-McRBFN


In PBL-McRBFN, the cognitive component uses the Projection Based Learning (PBL) algorithm for the learning process instead of the computationally intensive extended Kalman filter. The PBL algorithm is described as follows.

Projection based learning algorithm: The projection based learning algorithm works on the principle of minimization of an error function and finds the optimal network output parameters for which the error function is minimum, i.e., the network reaches the minimum point of the error function.

The error function considered is the sum of squared hinge loss errors at the output neurons. The error function for the $t$-th sample is defined as

$$ J_t = \sum_{j=1}^{n} \left( e_j^t \right)^2, \qquad t = 1, 2, \cdots \qquad (5.1) $$

where $\mathbf{e}^t = [e_1^t, \cdots, e_j^t, \cdots, e_n^t]^T \in \Re^n$ is the hinge loss error defined as

$$ e_j^t = \begin{cases} 0 & \text{if } y_j^t \hat{y}_j^t > 1 \\ y_j^t - \hat{y}_j^t & \text{otherwise} \end{cases} \qquad j = 1, \cdots, n \qquad (5.2) $$


From the definition of the hinge loss error in Eq. (5.2), $J_t$ is zero when $y_j^t \hat{y}_j^t > 1$; when $y_j^t \hat{y}_j^t < 1$, $J_t$ becomes

$$ J_t = \sum_{j=1}^{n} \left( y_j^t - \sum_{k=1}^{K} w_{kj} h_k^t \right)^2 \qquad (5.3) $$

Over all $t$ training samples, the overall error function becomes

$$ J(\mathbf{W}) = \frac{1}{2}\sum_{i=1}^{t} J_i = \frac{1}{2}\sum_{i=1}^{t}\sum_{j=1}^{n}\left( y_j^i - \sum_{k=1}^{K} w_{kj} h_k^i \right)^2 \qquad (5.4) $$

where $h_k^i$ is the response of the $k$-th hidden neuron to the $i$-th training sample.

The optimal output weights ($\mathbf{W} \in \Re^{K \times n}$) are estimated such that the total error reaches its minimum:

$$ \mathbf{W}^* := \arg\min_{\mathbf{W} \in \Re^{K \times n}} J(\mathbf{W}) \qquad (5.5) $$

The optimal $\mathbf{W}^*$ corresponding to the minimum of the error function $J(\mathbf{W})$ is obtained by equating the first order partial derivatives of $J(\mathbf{W})$ with respect to the output weights to zero, i.e.,

$$ \frac{\partial J(\mathbf{W})}{\partial w_{pj}} = 0, \qquad p = 1, \cdots, K; \; j = 1, \cdots, n \qquad (5.6) $$

Equating the first partial derivative to zero and re-arranging, we get

$$ \sum_{k=1}^{K}\sum_{i=1}^{t} h_k^i h_p^i w_{kj} = \sum_{i=1}^{t} h_p^i y_j^i \qquad (5.7) $$

Eq. (5.7) can be written as

$$ \sum_{k=1}^{K} a_{kp} w_{kj} = b_{pj}, \qquad p = 1, \cdots, K; \; j = 1, \cdots, n \qquad (5.8) $$

which can be represented in matrix form as

$$ \mathbf{A}\mathbf{W} = \mathbf{B} \qquad (5.9) $$

where the projection matrix $\mathbf{A} \in \Re^{K \times K}$ is given by

$$ a_{kp} = \sum_{i=1}^{t} h_k^i h_p^i, \qquad k = 1, \cdots, K; \; p = 1, \cdots, K \qquad (5.10) $$


and the output matrix \mathbf{B} \in \Re^{K \times n} is

b_{pj} = \sum_{i=1}^{t} h_p^i y_j^i, \quad p = 1, \cdots, K; \; j = 1, \cdots, n \qquad (5.11)

Eq. (5.8) gives a set of K \times n linear equations with K \times n unknown output weights \mathbf{W}. The proof that the matrix \mathbf{A} is invertible is given at the end of this chapter.

The solution for \mathbf{W} obtained from the set of equations in Eq. (5.9) is a minimum if \partial^2 J / \partial w_{lp}^2 > 0. The second derivative of the error function J with respect to the output weights is given by

\frac{\partial^2 J(\mathbf{W})}{\partial w_{lp}^2} = \sum_{i=1}^{t} h_p^i h_p^i = \sum_{i=1}^{t} |h_p^i|^2 > 0 \qquad (5.12)

As the second derivative of the error function J(\mathbf{W}) is positive, the following observations can be made from Eq. (5.12):

1. The function J is a convex function.

2. The output weight \mathbf{W}^* obtained as a solution to the set of linear equations (Eq. (5.9)) is the weight corresponding to the minimum of the error function J.

If the projection matrix \mathbf{A} is a positive definite symmetric matrix, then it is invertible. The solution for the system of equations in Eq. (5.9) can then be determined as

\mathbf{W}^* = \mathbf{A}^{-1} \mathbf{B} \qquad (5.13)
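As an illustration, a minimal sketch of the batch PBL solve of Eqs. (5.9)-(5.13) in NumPy is given below; it assumes the hidden responses and coded targets are available as matrices, and the function name is hypothetical.

import numpy as np

def pbl_output_weights(H, Y):
    # H: (t, K) hidden neuron responses h_k^i; Y: (t, n) coded labels.
    A = H.T @ H          # projection matrix, Eq. (5.10)
    B = H.T @ Y          # output matrix, Eq. (5.11)
    # Solving the linear system is numerically preferable to forming
    # A^{-1} explicitly; A is symmetric positive definite here.
    return np.linalg.solve(A, B)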

5.2.2 Meta-cognitive Component of PBL-McRBFN

In PBL-McRBFN, the functioning of the meta-cognitive component is the same as in McRBFN; however, the principles underlying the learning strategies are modified for the PBL algorithm.

5.2.2.1 Learning Strategies

The principles behind the learning strategies in PBL-McRBFN are described in detail below:

• Sample delete strategy: This strategy prevents similar samples from being learnt, which avoids over-training and reduces the computational effort. When the predicted class label of the new training sample is the same as the actual class label and the confidence level (estimated posterior probability) is greater than the expected value, then the new training sample does not provide additional information to the classifier and can be deleted from the training sequence without being used in the learning process. The sample delete criterion is given by

\hat{c}^t == c^t \;\; \text{AND} \;\; \hat{p}(c^t | \mathbf{x}^t) \ge \beta_d \qquad (5.14)

The deletion threshold \beta_d controls the number of samples participating in the learning process. By preventing the learning of samples with similar information, the strategy avoids over-training and reduces the computational effort.

• Neuron growth strategy: When a new training sample contains significant information and the predicted class label is different from the actual class label, a new hidden neuron must be added to represent the knowledge contained in the sample. The neuron growth criterion is given by

\left( \hat{c}^t \ne c^t \;\; \text{OR} \;\; E^t \ge \beta_a \right) \;\; \text{AND} \;\; \psi_c(\mathbf{x}^t) \le \beta_c \qquad (5.15)

where \beta_c is the knowledge threshold and \beta_a is the addition threshold. The thresholds \beta_c and \beta_a allow the samples with significant knowledge to be learnt first, and then use the other samples for fine tuning.

The PBL-McRBFN growth criterion in Eq. (5.15) is slightly different from that of EKF-McRBFN: here, a new neuron is added when the class labels differ even if the hinge error is low. This change reduces the misclassification error.

The threshold \beta_a is self-adapted as

\beta_a^t = \delta \beta_a^{t-1} + (1 - \delta) E^t \qquad (5.16)

where \delta is the slope that controls the rate of self-adaptation and is set close to one. The \beta_a adaptation allows PBL-McRBFN to add neurons only when the samples presented to the cognitive network contain significant information, as in the sketch below.
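A sketch of this self-adaptation (Eq. (5.16), and identically Eq. (5.38) for \beta_u later in this section) is shown below; the default value of delta is illustrative only.

def adapt_threshold(beta_prev, E_t, delta=0.99):
    # Exponential smoothing of the threshold towards the current
    # maximum hinge error E^t; delta close to one gives slow drift.
    return delta * beta_prev + (1.0 - delta) * E_t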


The center and width parameters of the new neuron (K + 1) are initialized based on the overlapping conditions discussed in the growth strategy of chapter 4. For better continuity, the overlapping conditions are also described in this section. Existing learning algorithms in the literature do not consider overlapping and distinct cluster criteria in assigning the new neuron parameters. However, the overlapping condition significantly influences the performance. The new training sample may overlap with other classes, or it may come from a distinct cluster far away from the nearest neuron in the same class. Hence, PBL-McRBFN measures the inter/intra class nearest neuron distances from the current sample when assigning the new neuron parameters.

Let nrS be the nearest hidden neuron in the intra-class (i.e., l == c) with center \mu_{nrS}^c and width \sigma_{nrS}^c parameters, and nrI be the nearest hidden neuron in the inter-class (i.e., l \ne c) with center \mu_{nrI}^l and width \sigma_{nrI}^l parameters. They are defined as

nrS = \arg \min_{l == c; \; \forall k} \| \mathbf{x}^t - \mu_k^l \| \qquad (5.17)

nrI = \arg \min_{l \ne c; \; \forall k} \| \mathbf{x}^t - \mu_k^l \| \qquad (5.18)

The Euclidean distances from the new training sample to nrS and nrI are given by

d_S^t = \| \mathbf{x}^t - \mu_{nrS}^c \| \qquad (5.19)

d_I^t = \| \mathbf{x}^t - \mu_{nrI}^l \| \qquad (5.20)

Using the nearest neuron distances, we can determine the overlapping/no-overlapping conditions as follows:

– Distinct sample: When a new training sample is far away from both the intra- and inter-class nearest neurons (d_S^t >> \sigma_{nrS}^c AND d_I^t >> \sigma_{nrI}^l), the new training sample does not overlap with any class cluster and forms a new distinct cluster. In this case, the new hidden neuron center \mu_{K+1}^c and width \sigma_{K+1}^c parameters are determined as

\mu_{K+1}^c = \mathbf{x}^t; \quad \sigma_{K+1}^c = \max\left( 0.00001, \; \kappa \sqrt{(\mathbf{x}^t)^T \mathbf{x}^t} \right) \qquad (5.21)


where \kappa is a positive constant controlling the overlap of the responses of the hidden units in the input space, and lies in the range 0.5 \le \kappa \le 1.

– No overlapping: When a new training sample is close to the intra-class nearest neuron, i.e., the intra/inter class distance ratio is less than 1, the sample does not overlap with the other classes. In this case, the new hidden neuron center \mu_{K+1}^c and width \sigma_{K+1}^c parameters are determined as

\mu_{K+1}^c = \mathbf{x}^t; \quad \sigma_{K+1}^c = \kappa \| \mathbf{x}^t - \mu_{nrS}^c \| \qquad (5.22)

– Minimum overlapping with the inter-class: When a new training sample is close to the inter-class nearest neuron compared to the intra-class nearest neuron, i.e., the intra/inter class distance ratio is in the range 1 to 1.5, the sample has minimum overlap with the other class. In this case, the center of the new hidden neuron is shifted away from the inter-class nearest neuron and towards the intra-class nearest neuron, and is initialized as

\mu_{K+1}^c = \mathbf{x}^t + \zeta (\mu_{nrS}^c - \mu_{nrI}^l); \quad \sigma_{K+1}^c = \kappa \| \mu_{K+1}^c - \mu_{nrS}^c \| \qquad (5.23)

where \zeta is the center shift factor, which determines how far the center is shifted from the new training sample location. It lies in the range [0.01, 0.1].

– Significant overlapping with the inter-class: When a new training sample is very close to the inter-class nearest neuron compared to the intra-class nearest neuron, i.e., the intra/inter class distance ratio is more than 1.5, the sample has significant overlap with the other class. In this case, the center of the new hidden neuron is shifted away from the inter-class nearest neuron and is initialized as

\mu_{K+1}^c = \mathbf{x}^t - \zeta (\mu_{nrI}^l - \mathbf{x}^t); \quad \sigma_{K+1}^c = \kappa \| \mu_{K+1}^c - \mu_{nrI}^l \| \qquad (5.24)

The above center and width determination conditions, implemented in the sketch below, help in minimizing the misclassification in the PBL-McRBFN classifier.
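The following is a minimal sketch of Eqs. (5.17)-(5.24), assuming the nearest intra-class (nrS) and inter-class (nrI) neurons have already been found; since the text writes the distinct-cluster test with ">>", the sketch reads it as "greater than twice the width", which is an assumption rather than the thesis' exact rule.

import numpy as np

def init_new_neuron(x, mu_S, sigma_S, mu_I, sigma_I, kappa=0.7, zeta=0.05):
    # mu_S, sigma_S: nearest intra-class neuron; mu_I, sigma_I: inter-class.
    d_S = np.linalg.norm(x - mu_S)                  # Eq. (5.19)
    d_I = np.linalg.norm(x - mu_I)                  # Eq. (5.20)
    if d_S > 2.0 * sigma_S and d_I > 2.0 * sigma_I:
        # distinct cluster, Eq. (5.21)
        mu, sigma = x.copy(), max(1e-5, kappa * np.sqrt(x @ x))
    elif d_S / d_I < 1.0:
        # no overlap with other classes, Eq. (5.22)
        mu, sigma = x.copy(), kappa * d_S
    elif d_S / d_I <= 1.5:
        # minimum overlap: shift towards the intra-class neuron, Eq. (5.23)
        mu = x + zeta * (mu_S - mu_I)
        sigma = kappa * np.linalg.norm(mu - mu_S)
    else:
        # significant overlap: shift away from the inter-class neuron, Eq. (5.24)
        mu = x - zeta * (mu_I - x)
        sigma = kappa * np.linalg.norm(mu - mu_I)
    return mu, sigma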


Existing sequential learning algorithms initialize the output weight of a new neuron based only on the error for the current sample. The influence of past samples is not considered in the weight initialization, which can affect the performance of the classifier significantly. This issue is addressed in this chapter: the knowledge of past samples stored in the network as neuron centers is used to initialize the weight of the new neuron. When a neuron is added to PBL-McRBFN, the output weights are estimated using PBL based on the existing knowledge of past samples stored in the network, as follows.

The size of the matrix \mathbf{A} is increased from K \times K to (K+1) \times (K+1):

\mathbf{A}^t_{(K+1) \times (K+1)} = \begin{bmatrix} \mathbf{A}^{t-1}_{K \times K} + (\mathbf{h}^t)^T \mathbf{h}^t & \mathbf{a}_{K+1}^T \\ \mathbf{a}_{K+1} & a_{K+1,K+1} \end{bmatrix} \qquad (5.25)

where \mathbf{h}^t = \left[ h_1^t, h_2^t, \cdots, h_K^t \right] is the vector of the existing K hidden neuron responses for the new (tth) training sample. In sequential learning, samples are discarded after learning, but the information present in the past samples is stored in the network. The neuron centers describe the distribution of past samples in the feature space, and can therefore be used as pseudo-samples to capture the effect of past samples. Hence, the existing hidden neurons are used as pseudo-samples to calculate the \mathbf{a}_{K+1} and a_{K+1,K+1} terms.

\mathbf{a}_{K+1} \in \Re^{1 \times K} is assigned as

a_{K+1,p} = \sum_{i=1}^{K+1} h_{K+1}^i h_p^i, \quad p = 1, \cdots, K \quad \text{where} \;\; h_p^i = \exp\left( -\frac{\| \mu_i^l - \mu_p^l \|^2}{(\sigma_p^l)^2} \right) \qquad (5.26)

and the scalar a_{K+1,K+1} \in \Re^+ is assigned as

a_{K+1,K+1} = \sum_{i=1}^{K+1} h_{K+1}^i h_{K+1}^i \qquad (5.27)

The size of the matrix \mathbf{B} is increased from K \times n to (K+1) \times n:

\mathbf{B}^t_{(K+1) \times n} = \begin{bmatrix} \mathbf{B}^{t-1}_{K \times n} + (\mathbf{h}^t)^T (\mathbf{y}^t)^T \\ \mathbf{b}_{K+1} \end{bmatrix} \qquad (5.28)

and \mathbf{b}_{K+1} \in \Re^{1 \times n} is a row vector assigned as

b_{K+1,j} = \sum_{i=1}^{K+1} h_{K+1}^i \tilde{y}_j^i, \quad j = 1, \cdots, n \qquad (5.29)


where \tilde{y}^i is the pseudo-output for the ith pseudo-sample or hidden neuron (\mu_i^l), given as

\tilde{y}_j^i = \begin{cases} 1 & \text{if } l = j \\ -1 & \text{otherwise} \end{cases}, \quad j = 1, \cdots, n \qquad (5.30)

Finally, the output weights are estimated as

\begin{bmatrix} \mathbf{W}_K^t \\ \mathbf{w}_{K+1}^t \end{bmatrix} = \left( \mathbf{A}^t_{(K+1) \times (K+1)} \right)^{-1} \mathbf{B}^t_{(K+1) \times n} \qquad (5.31)

The above can be expanded as

\begin{bmatrix} \mathbf{W}_K^t \\ \mathbf{w}_{K+1}^t \end{bmatrix} = \begin{bmatrix} \mathbf{A}^{t-1}_{K \times K} + (\mathbf{h}^t)^T \mathbf{h}^t & \mathbf{a}_{K+1}^T \\ \mathbf{a}_{K+1} & a_{K+1,K+1} \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{B}^{t-1}_{K \times n} + (\mathbf{h}^t)^T (\mathbf{y}^t)^T \\ \mathbf{b}_{K+1} \end{bmatrix} \qquad (5.32)

where \mathbf{W}_K^t is the output weight matrix for the K existing hidden neurons, and \mathbf{w}_{K+1}^t is the vector of output weights for the new hidden neuron after learning from the tth sample.

The inverse of the matrix \mathbf{A}^t_{(K+1) \times (K+1)} is calculated recursively using matrix identities as

\left( \mathbf{A}^t_{(K+1) \times (K+1)} \right)^{-1} = \begin{bmatrix} \left( \mathbf{A}^t_{K \times K} \right)^{-1} & \mathbf{0} \\ \mathbf{0} & 0 \end{bmatrix} + \frac{1}{\Delta} \begin{bmatrix} -\left( \mathbf{A}^t_{K \times K} \right)^{-1} \mathbf{a}_{K+1}^T \\ 1 \end{bmatrix} \begin{bmatrix} -\left( \mathbf{A}^t_{K \times K} \right)^{-1} \mathbf{a}_{K+1}^T \\ 1 \end{bmatrix}^T \qquad (5.33)

where \Delta = a_{K+1,K+1} - \mathbf{a}_{K+1} \left( \mathbf{A}^{t-1}_{K \times K} + (\mathbf{h}^t)^T \mathbf{h}^t \right)^{-1} \mathbf{a}_{K+1}^T and \mathbf{A}^t_{K \times K} = \mathbf{A}^{t-1}_{K \times K} + (\mathbf{h}^t)^T \mathbf{h}^t, whose inverse is calculated as

\left( \mathbf{A}^t_{K \times K} \right)^{-1} = \left( \mathbf{A}^{t-1}_{K \times K} \right)^{-1} - \frac{\left( \mathbf{A}^{t-1}_{K \times K} \right)^{-1} (\mathbf{h}^t)^T \mathbf{h}^t \left( \mathbf{A}^{t-1}_{K \times K} \right)^{-1}}{1 + \mathbf{h}^t \left( \mathbf{A}^{t-1}_{K \times K} \right)^{-1} (\mathbf{h}^t)^T} \qquad (5.34)
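Eq. (5.34) is the Sherman-Morrison identity for a rank-one update, which is what keeps the sequential solve cheap; a minimal sketch with an illustrative function name is given below.

import numpy as np

def rank_one_inverse_update(A_inv, h):
    # Given the symmetric (A^{t-1})^{-1} and the response vector h^t
    # (length K), return (A^{t-1} + h^T h)^{-1} without a full inversion.
    Ah = A_inv @ h
    return A_inv - np.outer(Ah, Ah) / (1.0 + h @ Ah)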

After calculating the inverse of the matrix in Eq. (5.32) using Eqs. (5.33) and (5.34), the resultant equations are

\mathbf{W}_K^t = \left[ \mathbf{I}_{K \times K} + \frac{\left( \mathbf{A}^{t-1}_{K \times K} \right)^{-1} \mathbf{a}_{K+1}^T \mathbf{a}_{K+1}}{\Delta} \right] \left[ \mathbf{W}_K^{t-1} + \left( \mathbf{A}^{t-1}_{K \times K} \right)^{-1} (\mathbf{h}^t)^T (\mathbf{y}^t)^T \right] - \frac{\left( \mathbf{A}^{t-1}_{K \times K} \right)^{-1} \mathbf{a}_{K+1}^T \mathbf{b}_{K+1}}{\Delta} \qquad (5.35)

\mathbf{w}_{K+1}^t = -\frac{1}{\Delta} \left[ \mathbf{a}_{K+1} \left( \mathbf{W}_K^{t-1} + \left( \mathbf{A}^{t-1}_{K \times K} \right)^{-1} (\mathbf{h}^t)^T (\mathbf{y}^t)^T \right) - \mathbf{b}_{K+1} \right] \qquad (5.36)


• Parameters update strategy: The current (tth) training sample is used to update the output weights of the cognitive component (\mathbf{W}_K = [\mathbf{w}_1, \mathbf{w}_2, \cdots, \mathbf{w}_K]^T) if the following criterion is satisfied:

c^t == \hat{c}^t \;\; \text{AND} \;\; E^t \ge \beta_u \qquad (5.37)

The threshold \beta_u is adapted based on the hinge error as

\beta_u^t = \delta \beta_u^{t-1} + (1 - \delta) E^t \qquad (5.38)

where \delta is the slope that controls the rate of self-adaptation of the parameter update and is set close to one.

When a sample is used to update the output weight parameters, the PBL algorithm updates them as follows:

\frac{\partial J_{1,t}(\mathbf{W}_K^t)}{\partial w_{pj}} = \frac{\partial J_{1,(t-1)}(\mathbf{W}_K^t)}{\partial w_{pj}} + \frac{\partial J_t(\mathbf{W}_K^t)}{\partial w_{pj}} = 0, \quad p = 1, \cdots, K; \; j = 1, \cdots, n \qquad (5.39)

Equating the first partial derivative to zero and re-arranging Eq. (5.39), we get

\left( \mathbf{A}^{t-1} + (\mathbf{h}^t)^T \mathbf{h}^t \right) \mathbf{W}_K^t - \left( \mathbf{B}^{t-1} + (\mathbf{h}^t)^T (\mathbf{y}^t)^T \right) = 0 \qquad (5.40)

By substituting \mathbf{B}^{t-1} = \mathbf{A}^{t-1} \mathbf{W}_K^{t-1} and \mathbf{A}^{t-1} + (\mathbf{h}^t)^T \mathbf{h}^t = \mathbf{A}^t, and adding/subtracting the term (\mathbf{h}^t)^T \mathbf{h}^t \mathbf{W}_K^{t-1} on both sides, Eq. (5.40) reduces to

\mathbf{W}_K^t = \left( \mathbf{A}^t \right)^{-1} \left( \mathbf{A}^t \mathbf{W}_K^{t-1} + (\mathbf{h}^t)^T \left( (\mathbf{y}^t)^T - \mathbf{h}^t \mathbf{W}_K^{t-1} \right) \right) \qquad (5.41)

Finally, the output weights are updated as

\mathbf{W}_K^t = \mathbf{W}_K^{t-1} + \left( \mathbf{A}^t \right)^{-1} (\mathbf{h}^t)^T (\mathbf{e}^t)^T \qquad (5.42)

where \mathbf{e}^t is the hinge loss error for the tth sample obtained from Eq. (5.2).
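A sketch of this update step, assuming the inverse of \mathbf{A}^t is maintained with the rank-one update shown earlier, follows; the function name is illustrative.

import numpy as np

def update_output_weights(W, A_t_inv, h, e):
    # Eq. (5.42): W^t = W^{t-1} + (A^t)^{-1} (h^t)^T (e^t)^T,
    # with h the (K,) hidden responses and e the (n,) hinge loss error.
    return W + np.outer(A_t_inv @ h, e)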

• Sample reserve strategy: The PBL-McRBFN reserve strategy is the same as discussed in chapter 4. If the new training sample satisfies neither the deletion, the neuron growth, nor the cognitive component parameters update criterion, the current sample is pushed to the rear of the training sequence. Since PBL-McRBFN adapts its strategies based on the knowledge in the current sample, these samples may be used at a later stage.

Ideally, the training process stops when no further sample is available in the data stream. In real time, however, the stopping criterion is that the samples in the reserve remain the same. The guidelines for selecting the parameters \beta_c, \beta_d, \beta_a and \beta_u in the PBL-McRBFN algorithm are the same as in the EKF-McRBFN algorithm, as described in subsection 4.4.1.

5.3 PBL-McRBFN Algorithm

To summarize, the PBL-McRBFN algorithm is given in pseudo code form in Pseudocode 2.

In PBL-McRBFN, the sample delete strategy addresses what-to-learn by deleting insignificant samples from the training data set; the neuron growth strategy and the parameters update strategy address how-to-learn efficiently, i.e., how the cognitive component learns from the samples; and the self-adaptive nature of the meta-cognitive thresholds, together with the sample reserve strategy, addresses when-to-learn by presenting samples to the learning process according to the knowledge present in each sample.

5.4 Salient Features of PBL-McRBFN Algorithm

In this section we list the similarities and dissimilarities of the EKF-McRBFN and PBL-McRBFN learning algorithms.

Similarities of EKF-McRBFN and PBL-McRBFN:

• The sample deletion strategy is the same in both algorithms. It helps the network avoid over-training and saves computational effort. Since the sample deletion strategy addresses what-to-learn, what-to-learn is the same in both learning algorithms (EKF-McRBFN and PBL-McRBFN).

• The sample reserve strategy is the same in both algorithms. It addresses when-to-learn, in addition to the self-adaptive nature of the meta-cognitive thresholds. The meta-cognitive addition and update thresholds are also


Pseudocode 2 Pseudo code for the PBL-McRBFN classification algorithm.

Input : Present the training data one-by-one to the network
        from the data stream.
Output : Decision function that estimates the relationship
         between the feature space and the class label.
START
Initialization : Assign the first sample as the first neuron (K = 1).
  The parameters of the neuron are chosen as shown in Eq. (5.21).
For each training sample (x^t, y^t)
DO
  The meta-cognitive component computes the significance of the sample
  with respect to the cognitive component :
    Compute the cognitive component output ŷ^t using Eq. (4.2).
    Find the predicted class label ĉ^t, maximum hinge error E^t,
    confidence of the classifier p̂(c^t | x^t) and class-wise
    significance ψ_c using Eqs. (4.4), (4.6), (4.7) and (4.11).
  Based on the above measures, the meta-cognitive component
  selects one of the following strategies :
  Sample delete strategy :
  IF ĉ^t == c^t AND p̂(c^t | x^t) ≥ β_d THEN
    Delete the sample from the sequence without learning.
  Neuron growth strategy :
  ELSEIF (ĉ^t ≠ c^t OR E^t ≥ β_a) AND ψ_c(x^t) ≤ β_c THEN
    Add a neuron to the network (K = K + 1).
    Choose the center and width parameters of the new hidden neuron
    using Eqs. (5.21) to (5.24) and estimate the new hidden neuron
    output weights using Eq. (5.36). Update the existing hidden neuron
    output weights using Eq. (5.35). Update the self-adaptive
    meta-cognitive addition threshold according to Eq. (5.16).
  Parameters update strategy :
  ELSEIF c^t == ĉ^t AND E^t ≥ β_u THEN
    Update the parameters of the cognitive component using Eq. (5.42).
    Update the self-adaptive meta-cognitive update threshold according
    to Eq. (5.38).
  Sample reserve strategy :
  ELSE
    The current sample (x^t, y^t) is pushed to the rear
    end of the sample stack to be used in the future. It can be
    used later to fine-tune the cognitive component parameters.
  ENDIF
  The cognitive component executes the selected strategy.
ENDDO
END

adapted in the same form in both algorithms. Thus, when-to-learn is the same in both learning algorithms.

Dissimilarities of EKF-McRBFN and PBL-McRBFN:

• The neuron growth criterion in the PBL-McRBFN algorithm is different from that of EKF-McRBFN. In the EKF-McRBFN algorithm, a new hidden neuron is added when the predicted class label of the sample is different from the actual class label and the maximum hinge error is greater than the knowledge threshold, in addition to the novelty condition. In the PBL-McRBFN algorithm, a new hidden neuron is added when either the predicted class label of the sample is different from the actual class label or the maximum hinge error is greater than the knowledge threshold, in addition to the novelty condition. Hence, the PBL-McRBFN algorithm may add slightly more hidden neurons than the EKF-McRBFN algorithm.

• In the EKF-McRBFN algorithm, the new hidden neuron output weights are estimated based on the instantaneous prediction error (e^t). In the PBL-McRBFN algorithm, the new hidden neuron output weights are estimated using the existing knowledge of past trained samples stored in the network. Thus, the knowledge gained from past trained samples is used in further learning, and the execution of the neuron growth strategy in PBL-McRBFN is different from that in the EKF-McRBFN algorithm.

• In the EKF-McRBFN algorithm, the network parameters are updated using the extended Kalman filter algorithm. In PBL-McRBFN, the existing neuron output weights are updated using the projection based learning algorithm. Hence, the execution of the parameter update strategy in PBL-McRBFN is different from that in the EKF-McRBFN algorithm.

• Since the neuron growth and parameter update strategies differ between the two algorithms, how-to-learn is different in the two learning algorithms. The PBL-McRBFN algorithm uses the past knowledge of trained samples in how-to-learn; thus its classification performance is expected to be better than that of the EKF-McRBFN algorithm.


5.5 Summary

In this chapter, we have presented a Projection Based Learning (PBL) algorithm for the Meta-cognitive Radial Basis Function Network (McRBFN) classifier. Projection based learning accurately estimates the output weights by direct minimization of the hinge loss error. Knowledge gained from past samples is used in initializing the parameters of new hidden neurons and in estimating the output weights.

In the next chapter, the performance of the proposed EKF-McRBFN and PBL-McRBFN classifiers is evaluated using a number of benchmark multi-category and binary classification problems and compared with other standard classifiers.

Chapter 6

Performance Evaluation of EKF-McRBFN and PBL-McRBFN Classifiers

In the previous chapters 4 and 5, we proposed two sequential learning algorithms for the meta-cognitive radial basis function neural network: EKF-McRBFN and PBL-McRBFN. Compared to the EKF-McRBFN classifier, the PBL-McRBFN classifier is fast and efficient.

In this chapter, we present the performance comparison of the proposed EKF-McRBFN and PBL-McRBFN with the best performing sequential learning algorithm reported in the literature (SRAN) [9], batch ELM [18] and the standard Support Vector Machine (SVM) [37] on real-world benchmark binary and multi-category classification data sets from the UCI machine learning repository [61].

6.1 Data Sets Description

In order to extensively verify the performance of the proposed algorithms, we have chosen data sets with small and large numbers of samples, low and high dimensional features, and balanced and unbalanced class distributions, in both binary and multi-category classification problems. The detailed specifications of the 5 binary and 10 multi-category classification data sets are given in Table 6.1. Note that the data sets are taken from the UCI machine learning repository, except for the satellite imaging [56], global cancer map using micro-array gene expression


Table 6.1: Specification of benchmark binary and multi-category data sets

Data sets  # Features  # Classes  # Samples           I.F                 Random
                                  Training  Testing   Training  Testing   Trial
IS         19          7          210       2100      0         0         Yes
IRIS       4           3          45        105       0         0         Yes
WINE       13          3          60        118       0         0.29      Yes
SI         6           9          9108      262144    0         0.87      No
LETTER     16          26         13333     6667      0.06      0.1       No
VC         18          4          424       422       0.1       0.12      Yes
AE         5           4          62        137       0.1       0.33      Yes
GCM        98          14         144       46        0.22      0.39      Yes
LAND       36          6          4435      2000      0.43      0.26      No
GI         9           6          336       105       0.68      0.77      Yes
HEART      13          2          70        200       0.14      0.1       Yes
LD         6           2          200       145       0.17      0.14      Yes
PIMA       8           2          400       368       0.22      0.39      Yes
BC         9           2          300       383       0.26      0.33      Yes
ION        34          2          100       251       0.28      0.28      Yes

[62] and acoustic emission [63] data sets. The sample imbalance in training and testing is measured using the imbalance factor (I.F), defined as

I.F = 1 - \frac{n}{N} \min_{j = 1 \cdots n} N_j \qquad (6.1)

where N_j is the total number of training samples in class j and N = \sum_{j=1}^{n} N_j.
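For illustration, a one-function NumPy sketch of Eq. (6.1) is given below; the function name is hypothetical.

import numpy as np

def imbalance_factor(class_counts):
    # class_counts holds N_j for each class j.
    N_j = np.asarray(class_counts, dtype=float)
    n, N = N_j.size, N_j.sum()
    return 1.0 - (n / N) * N_j.min()

# A perfectly balanced three-class set gives I.F = 0:
# imbalance_factor([70, 70, 70]) -> 0.0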
For efficient comparison, we present them under the following categories:

• Binary class data sets: All the considered binary class data sets have high sample imbalance and are grouped into two categories.

– Low dimensional: Liver Disorders (LD), Pima Indian diabetes (PIMA) and Breast Cancer (BC) have low dimensional features with relatively smaller numbers of training samples.

– High dimensional: Heart disease (HEART) and Ionosphere (ION) data sets have smaller numbers of training samples with high dimensional features.

• Multi-category data sets: The considered 10 multi-category data sets are grouped into three categories.

– Well balanced: Iris classification (IRIS), Image segmentation (IS) and Wine determination (WINE) data sets have equal numbers of training samples per class. These data sets have varying numbers of features and training/testing samples.

– Imbalanced: Acoustic Emission classification (AE), Vehicle Classification (VC) and Glass Identification (GI) data sets have lower dimensional features and highly imbalanced training samples. The Global Cancer Mapping using micro-array gene expression (GCM) data set has high dimensional features with high sample imbalance.

– Large number of samples: Letter recognition (LETTER), Satellite Image classification (SI) and Landsat Satellite (LAND) data sets have relatively large numbers of samples and classes.

6.2 Simulation Environment

For this performance comparison study, experiments are conducted for the PBL-McRBFN, EKF-McRBFN, SRAN, ELM and SVM classifiers on all the data sets in MATLAB 2011 on a desktop PC with an Intel Core 2 Duo 2.66 GHz CPU and 3 GB RAM. The tunable parameters of PBL-McRBFN, EKF-McRBFN and SRAN are chosen using cross-validation on the training data sets. For the ELM classifier [18], the number of hidden neurons is obtained using the constructive-destructive procedure presented in [64]. The simulations for batch SVM with Gaussian kernels are carried out using the LIBSVM package in C [65]. For the SVM classifier, the parameters (c, γ) are optimized using a grid search technique. For the PBL-McRBFN and EKF-McRBFN classifiers, the parameters (β_d, β_a, β_c, β_u and κ) are also optimized using a grid search technique by cross-validating results on the training samples. Simulations on the large data sets LETTER, SI and LAND are conducted on a high-performance computer with an Intel Xeon 3.16 GHz CPU and 16 GB RAM.


6.3 Performance Measures

The class-wise performance measures, namely overall/average efficiencies, and a statistical significance test on the performance of multiple classifiers over multiple data sets are used for performance comparison. The confusion matrix \mathbf{Q} is used to obtain the class-level and global performance of the various classifiers. Class-level performance is measured by the percentage classification (\eta_j), defined as

\eta_j = \frac{q_{jj}}{N_j} \times 100\% \qquad (6.2)

where q_{jj} is the total number of correctly classified samples in class j. The global measures used in the evaluation are the average per-class classification accuracy (\eta_a), the overall classification accuracy (\eta_o) and the geometric classification accuracy (\eta_g), defined as

\eta_a = \frac{1}{n} \sum_{j=1}^{n} \eta_j \qquad (6.3)

\eta_o = \frac{\sum_{j=1}^{n} q_{jj}}{N} \times 100\% \qquad (6.4)

\eta_g = \sqrt[n]{\eta_1 \eta_2 \cdots \eta_n} \qquad (6.5)
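A minimal sketch computing these measures from a confusion matrix follows; it assumes Q is arranged with rows as true classes, and the function name is illustrative.

import numpy as np

def global_efficiencies(Q, N_per_class):
    # Class-level percentages eta_j of Eq. (6.2).
    eta = 100.0 * np.diag(Q) / np.asarray(N_per_class, dtype=float)
    eta_a = eta.mean()                                  # Eq. (6.3)
    eta_o = 100.0 * np.trace(Q) / np.sum(N_per_class)   # Eq. (6.4)
    eta_g = float(np.prod(eta) ** (1.0 / eta.size))     # Eq. (6.5)
    return eta_a, eta_o, eta_g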

6.3.1 Statistical Significance Test

The classification efficiency by itself is not a conclusive measure of classifier performance [66]. Since the developed classifier is compared with multiple classifiers over multiple data sets, the Friedman test followed by the Bonferroni-Dunn test is used to establish the statistical significance of the PBL-McRBFN classifier. A brief description of the conducted tests is given below.

The Friedman test is used to compare multiple classifiers (U) over multiple data sets (V). Let r_i^j be the rank of the jth classifier on the ith data set. The Friedman test compares the average ranks of the classifiers, R_j = \frac{1}{V} \sum_i r_i^j. Under the null hypothesis, which states that all the classifiers are equivalent and so their ranks R_j should be equal, the Friedman statistic is given by

\chi_F^2 = \frac{12 V}{U (U + 1)} \left[ \sum_j R_j^2 - \frac{U (U + 1)^2}{4} \right] \qquad (6.6)


which follows the \chi^2 (Chi-square) distribution with U - 1 degrees of freedom. A \chi^2 distribution is the distribution of a sum of squares of U independent standard normal variables.

Iman and Davenport showed that Friedman's statistic (\chi_F^2) is overly conservative and derived a better statistic in [67], given by

F_F = \frac{(V - 1) \chi_F^2}{V (U - 1) - \chi_F^2} \qquad (6.7)

which follows the F-distribution with U - 1 (df1) and (U - 1)(V - 1) (df2) degrees of freedom; this statistic is used in this thesis. The F-distribution is defined as the probability distribution of the ratio of two independent \chi^2 distributions, each divided by its respective degrees of freedom. The aim of the statistical test is to show that the performance of the PBL-McRBFN classifier is substantially different from the other classifiers with a confidence level of 1 - \alpha. If the calculated F_F > F_{\alpha/2, (U-1), (U-1)(V-1)} or F_F < F_{1-\alpha/2, (U-1), (U-1)(V-1)}, then the null hypothesis is rejected. The statistical tables for critical values can be found in [68].

The Bonferroni-Dunn test [69] is a post-hoc test that can be performed after rejection of the null hypothesis. It is used to compare the PBL-McRBFN classifier against all the other classifiers. This test assumes that the performances of two classifiers are significantly different if the corresponding average ranks differ by at least the critical difference (CD)

CD = q_\alpha \sqrt{\frac{U (U + 1)}{6 V}} \qquad (6.8)

where the critical values q_\alpha are based on the studentized range statistic divided by \sqrt{2}, as given in [66].
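A short sketch of both tests on a rank matrix (one row of classifier ranks per data set) is given below; the function names are illustrative, and the critical values still come from standard tables such as Tables 6.5 and 6.6.

import numpy as np

def friedman_statistics(ranks):
    # ranks: (V, U) ranks of U classifiers on V data sets.
    V, U = ranks.shape
    R = ranks.mean(axis=0)                              # average ranks R_j
    chi2_F = (12.0 * V / (U * (U + 1))) * (np.sum(R ** 2)
              - U * (U + 1) ** 2 / 4.0)                 # Eq. (6.6)
    F_F = (V - 1) * chi2_F / (V * (U - 1) - chi2_F)     # Eq. (6.7)
    return chi2_F, F_F

def critical_difference(U, V, q_alpha):
    # Bonferroni-Dunn critical difference, Eq. (6.8).
    return q_alpha * np.sqrt(U * (U + 1) / (6.0 * V))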

6.4 Performance Evaluation

The performance evaluation on binary and multi-category data sets is presented separately as follows.

6.4.1 Binary-class Data Sets

The performance measures, namely the overall (η_o) and average (η_a) testing efficiencies and the numbers of neurons and samples used, for the PBL-McRBFN, EKF-McRBFN, SRAN, ELM and SVM


Table 6.2: Performance comparison of PBL-McRBFN, EKF-McRBFN, SRAN, ELM and SVM on binary class data sets

Data set  Classifier   # Neurons  # Samples  Training     Testing  Testing
                                  used (%)   Time (sec)   ηo       ηa
HEART     SVM          42^a       100        0.03         75.5     75.10
          ELM          36         100        0.15         76.50    75.91
          SRAN         28         80         0.53         78.50    77.52
          EKF-McRBFN   26         65.7       0.55         80.50    79.65
          PBL-McRBFN   20         98.5       0.53         81.50    81.47
LD        SVM          141^a      100        0.30         71.03    70.21
          ELM          100        100        0.15         72.41    71.41
          SRAN         91         75.5       3.37         66.84    65.77
          EKF-McRBFN   68         55         2.05         73.79    71.60
          PBL-McRBFN   87         58         2.95         73.10    72.63
PIMA      SVM          221^a      100        0.20         77.45    76.33
          ELM          100        100        0.20         76.63    75.25
          SRAN         97         57.5       12.24        78.53    74.90
          EKF-McRBFN   76         48.2       6.45         80.16    77.31
          PBL-McRBFN   100        41.2       5.95         79.62    79.13
BC        SVM          24^a       100        0.04         96.61    97.06
          ELM          66         100        0.12         96.35    96.48
          SRAN         7          30.3       0.17         96.87    97.26
          EKF-McRBFN   9          9          0.62         97.39    97.85
          PBL-McRBFN   13         15         0.60         97.39    97.85
ION       SVM          43^a       100        0.20         91.24    88.51
          ELM          32         100        0.18         89.64    87.52
          SRAN         21         86         3.71         90.84    91.88
          EKF-McRBFN   20         39         1.02         95.62    95.60
          PBL-McRBFN   18         58         0.52         96.41    96.47

^a Number of support vectors


classifiers on all the 5 binary class data sets are reported in Table 6.2. From the performance comparison results in Table 6.2, one can see that in the case of the low dimensional LD and PIMA data sets, the proposed PBL-McRBFN uses fewer samples for training and achieves better generalization performance, approximately 1-2% improvement over EKF-McRBFN, 4-7% improvement over SRAN, and 1-4% improvement over ELM and SVM, with fewer neurons. In the case of the simple BC data set, PBL-McRBFN uses fewer samples for training and achieves slightly better generalization performance, approximately 1% improvement over SRAN, ELM and SVM, and the same performance as EKF-McRBFN. In the case of the high dimensional HEART and ION data sets, PBL-McRBFN uses fewer samples for training and achieves better generalization performance: 1-2% improvement over EKF-McRBFN, 4-5% improvement over SRAN, and 6-9% improvement over SVM and ELM. The overlapping conditions and class-specific criteria in the learning strategies of PBL-McRBFN and EKF-McRBFN help in capturing the knowledge accurately in high sample imbalance problems. We can also notice that PBL-McRBFN takes less computational time than EKF-McRBFN on all data sets.

6.4.2 Multi-category Data Sets

The testing efficiencies η_o and η_a and the numbers of neurons and samples used for the PBL-McRBFN, EKF-McRBFN, SRAN, ELM and SVM classifiers on all the 10 multi-category data sets are reported in Table 6.3. From Table 6.3, one can see that in the case of the well balanced IS, IRIS and WINE data sets, PBL-McRBFN uses only 42-50% of the training samples to achieve better generalization performance, approximately 2-3% improvement over EKF-McRBFN, SRAN, SVM and ELM, with fewer neurons. The sample deletion criterion in the meta-cognitive component helps in removing redundant samples from the training set and thereby improves the generalization performance. For highly unbalanced data sets, the proposed PBL-McRBFN is able to achieve significantly better performance than the other classifiers. In the case of the VC and GI data sets, PBL-McRBFN uses fewer samples and achieves better generalization performance, approximately 1-5% improvement over EKF-McRBFN and ELM, and 3-12% improvement over SRAN and SVM. In the case of the low dimensional AE data set, PBL-McRBFN achieves slightly better generalization performance, a 1% improvement over SVM, and performs similarly to EKF-McRBFN, SRAN and


Table 6.3: Performance comparison on multi-category data sets

Data set  Classifier   # Neurons  # Samples  Training     Testing  Testing
                                  used (%)   Time (sec)   ηo       ηa
IS        SVM          127^a      100        0.72         91.38    91.38
          ELM          49         100        0.22         90.23    90.23
          SRAN         47         53.8       22           92.29    92.29
          EKF-McRBFN   49         44.2       10.67        93.38    93.38
          PBL-McRBFN   50         42.3       1.69         94.19    94.19
IRIS      SVM          13^a       100        0.02         96.19    96.19
          ELM          10         100        0.01         96.19    96.19
          SRAN         8          64.4       0.08         96.19    96.19
          EKF-McRBFN   5          48.8       0.47         97.14    97.14
          PBL-McRBFN   6          44.4       0.38         98.10    98.10
WINE      SVM          36^a       100        0.06         97.46    98.04
          ELM          10         100        0.03         97.46    98.04
          SRAN         12         76.6       0.16         96.61    97.19
          EKF-McRBFN   9          45         0.55         98.30    98.49
          PBL-McRBFN   11         48.3       0.45         98.31    98.69
VC        SVM          340^a      100        1.3          70.62    68.51
          ELM          150        100        0.36         77.01    77.59
          SRAN         113        70.4       55           75.12    76.86
          EKF-McRBFN   146        57.9       562.12       77.72    78.72
          PBL-McRBFN   175        51.2       21.83        78.91    79.09
AE        SVM          22^a       100        0.11         98.54    97.95
          ELM          10         100        0.16         99.27    98.91
          SRAN         10         62.9       0.09         99.27    98.91
          EKF-McRBFN   5          32.2       0.69         99.27    98.91
          PBL-McRBFN   5          14.5       0.45         99.27    98.91
GCM       SVM          137^a      100        0.03         76.08    74.76
          ELM          55         100        0.03         76.08    80.23
          SRAN         92         77         345.68       78.26    71.42
          EKF-McRBFN   71         76.3       325.11       76.08    73.57
          PBL-McRBFN   72         75.6       0.81         93.47    91.67
GI        SVM          183^a      100        0.92         70.47    75.61
          ELM          80         100        0.38         81.31    87.43
          SRAN         59         47.3       28           86.21    80.95
          EKF-McRBFN   73         34.8       13.61        85.71    87.03
          PBL-McRBFN   71         34.2       2.90         84.76    92.72
SI        SVM          1298^a     100        -            92.21    90.14
          ELM          1500       100        -            88.39    -
          PBL-McRBFN   1243       32.1       2118.98      90.76    92.17
LETTER    SVM          4429^a     100        -            92.94    -
          ELM          -          100        -            93.51    -
          PBL-McRBFN   1654       25         9875.15      95.42    95.44
LAND      SVM          981^a      100        4.64         87.90    86.08
          ELM          380        100        2.27         87.45    84.20
          PBL-McRBFN   245        20.6       26.75        89.35    88.56

^a Number of support vectors

ELM. In the case of the high dimensional GCM data set, PBL-McRBFN achieves significantly better generalization performance, approximately 11-20% improvement over SVM, ELM, SRAN and EKF-McRBFN. In the case of the large-sample LETTER, SI and LAND data sets, the generalization performance of PBL-McRBFN is better than that of ELM and SVM by approximately 2%, 2-3% and 2-4%, respectively, using fewer training samples and neurons. We can also notice that PBL-McRBFN takes less computational time than EKF-McRBFN on all data sets. Owing to the computationally complex EKF parameter update, EKF-McRBFN and SRAN run into memory problems on large problems such as LETTER, SI and LAND; hence, the results for EKF-McRBFN and SRAN on these problems are not presented here.

From Tables 6.2 and 6.3, we can say that the proposed PBL-McRBFN improves the generalization performance over a wide range of sample imbalance.

6.4.3 Statistical Performance Comparison

In order to compare the performance of the proposed PBL-McRBFN over the EKF-McRBFN, SRAN, ELM and SVM classifiers on various benchmark data sets, we employ a non-parametric Friedman test followed by the Bonferroni-Dunn test as described in [66]. The Friedman test compares whether the mean of an individual experimental condition differs significantly from the aggregate mean across all conditions. If the measured F-statistic is greater than the critical F-statistic at the 95% confidence level, then one rejects the equality-of-means hypothesis (that the classifiers used in our study perform similarly on different data sets). If the Friedman test rejects the equality hypothesis, then pair-wise post-hoc tests should be conducted to determine which mean differs from the others. We have ranked the 5 different classifiers based on the η_o and η_a testing efficiencies on 12 different data sets from Table 6.1.

• Comparison based on the overall testing efficiency (η_o): Ranks of all 5 classifiers based on the overall testing efficiency for each data set are provided in Table 6.4. The Friedman statistic (χ_F^2 as in Eq. (6.6)) is 25.49 and the modified (Iman and Davenport) statistic (F_F as in Eq. (6.7)) is 13.78. For 5 classifiers and 12 data sets, the modified statistic is distributed according to the F-distribution with 4 and 44 degrees of freedom. The critical value for rejecting the null hypothesis at the 95%


Table 6.4: Ranks based on the overall testing efficiency (ηo)

Data sets  PBL-McRBFN  EKF-McRBFN  SRAN  ELM  SVM
HEART      1           2           3     4    5
LD         2           1           5     3    4
PIMA       2           1           3     5    4
BC         1.5         1.5         3     5    4
ION        1           2           4     5    3
IS         1           2           3     5    4
IRIS       1           2           4     4    4
WINE       1           2           5     3.5  3.5
VC         1           2           4     3    5
AE         2.5         2.5         2.5   2.5  5
GI         3           2           1     4    5
GCM        1           4           2     4    4
Average rank (Rj)  1.5  2.00  3.29  4.00  4.20

Table 6.5: Two-tailed critical values (F-distribution) for the Friedman test at the 95% confidence level

df1=    1      2      3      4      5      6      7      8      9      10
df2=44  5.385  4.016  3.429  3.093  2.871  2.711  2.591  2.496  2.419  2.355

confidence level (F_{4,44,0.025}) is 3.09, and the corresponding reference F-distribution table is given in Table 6.5. Since the modified statistic is greater than the critical value (13.78 >> 3.09), we can reject the null hypothesis at a confidence level of 95%. Hence, we can say that the performance of the 5 classifiers differs on these 12 data sets based on the overall testing efficiency.

Next, we conduct a pair-wise comparison using the Bonferroni-Dunn test to highlight the performance significance of the PBL-McRBFN classifier with respect to the other classifiers. Here, the proposed PBL-McRBFN classifier is used as the control.

Table 6.6: Critical values for the Bonferroni-Dunn test at the 95% confidence level

# Classifiers=  2      3      4      5      6      7      8      9      10
q0.05           1.960  2.241  2.394  2.498  2.576  2.638  2.690  2.724  2.773


From Eq. (6.8), the critical difference (CD) is calculated as 1.61 at the 95% confidence level, and the reference q_α values for different numbers of classifiers are provided in Table 6.6. From Table 6.4, the average ranks of the five classifiers are PBL-McRBFN: 1.50, EKF-McRBFN: 2.00, SRAN: 3.29, ELM: 4.00 and SVM: 4.20. The differences in average rank between the proposed PBL-McRBFN classifier and the other four classifiers are PBL-McRBFN & EKF-McRBFN: 0.50, PBL-McRBFN & SRAN: 1.79, PBL-McRBFN & ELM: 2.50 and PBL-McRBFN & SVM: 2.70. Note that the differences in average rank for the PBL-McRBFN & SRAN, PBL-McRBFN & ELM and PBL-McRBFN & SVM pairs are greater than the critical difference (CD) at the 95% confidence level, i.e., 1.79 > 1.61, 2.50 > 1.61 and 2.70 > 1.61. The difference in average rank for the PBL-McRBFN & EKF-McRBFN pair is less than the CD at the 95% confidence level, i.e., 0.50 < 1.61. Hence, we can say that PBL-McRBFN performs slightly better than the EKF-McRBFN classifier and significantly better than the SRAN, ELM and SVM classifiers with a confidence of 95% based on the overall testing efficiency.

• Comparison based on the average testing efficiency (η_a): Ranks of all 5 classifiers based on the average testing efficiency for each data set are provided in Table 6.7. The Friedman statistic (χ_F^2 as in Eq. (6.6)) is 27.69 and the modified statistic (F_F as in Eq. (6.7)) is 16.99. Since the modified statistic is greater than the critical value (16.99 >> 3.09), we can reject the null hypothesis at a confidence level of 95%. Hence, we can say that the performance of the 5 classifiers differs on these 12 data sets based on the average testing efficiency.

From Table 6.7, the average ranks of the five classifiers are PBL-McRBFN: 1.16, EKF-McRBFN: 2.25, SRAN: 3.87, ELM: 3.58 and SVM: 4.12. The differences in average rank between the proposed PBL-McRBFN classifier and the other classifiers are PBL-McRBFN & EKF-McRBFN: 1.09, PBL-McRBFN & SRAN: 2.71, PBL-McRBFN & ELM: 2.42 and PBL-McRBFN & SVM: 2.96. Note that the differences in average rank for the PBL-McRBFN & SRAN, PBL-McRBFN & ELM and PBL-McRBFN & SVM pairs are greater than the critical difference (CD) at the 95% confidence level, i.e., 2.71 > 1.61, 2.42 > 1.61 and 2.96 > 1.61. The difference in average rank for the PBL-McRBFN & EKF-McRBFN pair is less than the CD


Table 6.7: Ranks based on the average testing efficiency (ηa)

Data sets  PBL-McRBFN  EKF-McRBFN  SRAN  ELM  SVM
HEART      1           2           3     4    5
LD         1           2           5     3    4
PIMA       1           2           5     4    3
BC         1.5         1.5         3     5    4
ION        1           2           3     5    4
IS         1           2           3     5    4
IRIS       1           2           4     4    4
WINE       1           2           5     3.5  3.5
VC         1           2           4     3    5
AE         2.5         2.5         2.5   2.5  5
GI         1           3           4     2    5
GCM        1           4           5     2    3
Average rank (Rj)  1.16  2.25  3.87  3.58  4.12

at the 95% confidence level, i.e., 1.09 < 1.61. Hence, we can say that PBL-McRBFN performs slightly better than the EKF-McRBFN classifier and significantly better than the SRAN, ELM and SVM classifiers with a confidence of 95% based on the average testing efficiency.

6.4.4 10 Random Trial Results

For this performance comparison study, 10 random trial experiments are conducted for the PBL-McRBFN, EKF-McRBFN, ELM and SVM classifiers on 12 different data sets from Table 6.1, excluding the LETTER, SI and LAND data sets. For this study, the 10 random trial data sets are generated while maintaining the imbalance factor of the 12 data sets.

Binary class data sets: The performance measures, namely the overall (η_o) and geometric (η_g) testing efficiencies, the F-score, and the numbers of neurons and samples used for the PBL-McRBFN, EKF-McRBFN, ELM and SVM classifiers on all the 5 binary class data sets, are reported in Table 6.8. From the performance comparison results in Table 6.8, one can see that


Table 6.8: PBL-McRBFN, EKF-McRBFN, ELM and SVM classifiers: 10 random trial results comparison on binary class data sets

Data set  Classifier   # Neurons       # Samples      Testing ηo     Testing ηg     F-score
                                       used (%)
                       Mean    Dev     Mean   Dev     Mean   Dev     Mean   Dev     Mean  Dev
HEART     SVM          44^a    7.39    100    0       77.3   2.69    77.35  2.56    0.76  0.02
          ELM          46.5    2.41    100    0       73.2   2.79    73.18  2.93    0.71  0.03
          EKF-McRBFN   20.2    3.96    67.5   10.87   78.25  3.29    78.01  3.18    0.76  0.02
          PBL-McRBFN   28.2    2.39    81.2   12.2    81.7   1.13    81.37  1.02    0.79  0.01
LD        SVM          157.5^a 4.72    100    0       69.21  2.1     62.91  4.17    0.75  0.02
          ELM          127     15.67   100    0       64.55  3.8     63.52  4.04    0.68  0.04
          EKF-McRBFN   64.9    8.65    53.2   8.83    67.79  1.86    64.46  4.15    0.73  0.01
          PBL-McRBFN   78.1    8.55    65.1   8.22    70.82  1.08    69.87  1.57    0.74  0.01
PIMA      SVM          252.7^a 42.28   100    0       76.76  1.45    66.78  6.58    0.57  0.07
          ELM          172     25.73   100    0       70.86  1.44    65.50  2.67    0.53  0.03
          EKF-McRBFN   79.8    6.66    46.1   7.90    75.67  1.67    73.56  1.35    0.63  0.01
          PBL-McRBFN   100.8   8.67    46.4   7.39    74.45  2.12    74.41  1.00    0.63  0.01
BC        SVM          27.7^a  3.6     100    0       96.55  0.57    96.41  0.67    0.94  0.01
          ELM          37.9    1.34    100    0       96.97  0.47    96.89  0.87    0.95  0.01
          EKF-McRBFN   9.4     3.47    13.4   4.99    97.72  0.52    98.12  0.35    0.96  0.0
          PBL-McRBFN   12.3    4.02    35.8   11.61   97.77  0.60    97.95  0.37    0.96  0.01
ION       SVM          70.9^a  10.27   100    0       91.24  1.11    90.10  2.69    0.87  0.02
          ELM          46      1.76    100    0       80.26  2.3     74.63  2.59    0.69  0.03
          EKF-McRBFN   18.4    6.16    51.5   11.08   90.63  3.31    90.47  2.92    0.87  0.04
          PBL-McRBFN   20.6    3.68    47.5   11.14   93.90  1.31    93.74  1.78    0.91  0.01

^a Number of support vectors

in the case of the low dimensional LD and PIMA data sets, the proposed PBL-McRBFN uses fewer samples for training and achieves significantly better generalization performance, approximately 4-9% improvement over ELM and SVM with fewer neurons, and better generalization performance, approximately 1-5% improvement over EKF-McRBFN, with slightly more neurons. In the case of the simple BC data set, PBL-McRBFN and EKF-McRBFN perform similarly: both use fewer samples for training and achieve slightly better generalization performance, approximately 1% improvement over ELM and SVM, with fewer neurons. In the case of the high dimensional HEART and ION data sets, PBL-McRBFN uses fewer samples for training and achieves better generalization performance, a 3-4% improvement over SVM and an 18-19% improvement over ELM, as well as approximately 3% improvement over EKF-McRBFN.

Multi-category data sets: The overall (η_o) and geometric (η_g) testing efficiencies and the numbers of neurons and samples used for the PBL-McRBFN, EKF-McRBFN, ELM and SVM classifiers on the 7 multi-category data sets are reported in Table 6.9. From Table 6.9, we can see that PBL-McRBFN performs similarly to EKF-McRBFN and significantly better than ELM and SVM on all the multi-category data sets. In the case of the well balanced IS, IRIS and WINE data sets, PBL-McRBFN and EKF-McRBFN use only 45-55% of the training samples to achieve 2-3% better generalization performance than SVM and ELM with fewer neurons. The meta-cognitive sample deletion criterion helps in removing redundant samples from the training set and thereby improves the generalization performance. For highly unbalanced data sets, the proposed PBL-McRBFN achieves significantly better performance than the other classifiers. In the case of the VC and GI data sets, PBL-McRBFN uses fewer samples and achieves better generalization performance, approximately 2-8% improvement over EKF-McRBFN, 9% improvement over SVM and 3-6% improvement over ELM. In the case of the low dimensional AE data set, PBL-McRBFN and EKF-McRBFN achieve slightly better generalization performance, a 1% improvement over SVM and ELM. In the case of the high dimensional GCM data set, with fewer neurons and using fewer samples, PBL-McRBFN achieves better generalization performance, approximately 3% improvement over EKF-McRBFN, and significantly better generalization performance, approximately 13% improvement over SVM and ELM.

From Tables 6.8 and 6.9, we can see the better generalization performance of PBL-McRBFN compared with the other classifiers based on the 10 random trial results.

Statistical comparison based on the geometric testing efficiency (η_g):

In order to statistically compare the performance of the proposed PBL-McRBFN classifier with the EKF-McRBFN, ELM and SVM classifiers on various benchmark data sets based on the geometric testing efficiency (η_g), we employ a one-way repeated measures analysis of variance (ANOVA) followed by a pair-wise comparison using the post-hoc Dunnett test [70]. The ANOVA measure compares whether the mean of an individual experimental condition differs significantly from the aggregate mean across all conditions. If the measured F-score is greater than the F-statistic at the 95% confidence level, then one rejects the equality-of-means hypothesis (that the classifiers used in our study perform similarly on different data sets).


Table 6.9: 10 random trial results comparison on multi-category data sets

Data set  Classifier   # Neurons       # Samples      Testing ηo      Testing ηg
                                       used (%)
                       Mean    Dev     Mean   Dev     Mean   Dev      Mean   Dev
IS        SVM          107.3^a 10.91   100    0       90.92  0.43     90.13  0.46
          ELM          55.59   10.91   100    0       90.13  0.72     89.40  0.83
          EKF-McRBFN   36.8    5.63    54.1   8.60    91.64  0.95     91.16  1.08
          PBL-McRBFN   48.9    4.67    44.3   5.60    91.50  1.31     91.07  1.39
IRIS      SVM          16.6^a  0.93    100    0       95.99  1.17     95.90  1.22
          ELM          16      4.59    100    0       96.38  1.08     96.29  1.16
          EKF-McRBFN   5.2     1.93    44.6   16.51   96.95  1.17     96.86  1.23
          PBL-McRBFN   7.8     1.61    47.7   20.55   97.14  1.18     97.08  1.21
WINE      SVM          37^a    3.02    100    0       96.02  1.2      96.77  0.93
          ELM          15.2    2.09    100    0       94.83  1.35     95.58  1.08
          EKF-McRBFN   5.5     2.36    50     14.68   98.64  0.71     98.86  0.66
          PBL-McRBFN   10.6    1.77    53.6   14.43   98.31  1.16     98.62  1.17
VC        SVM          246.8^a 58.2    100    0       73.2   2.01     69.4   3.61
          ELM          150     25.3    100    0       77.09  3.44     75.16  3.83
          EKF-McRBFN   140     10.98   78.7   7.03    78.96  1.32     76.34  2.85
          PBL-McRBFN   210.8   7.28    68.39  11.65   79.62  1.28     78.33  1.55
AE        SVM          21.7^a  4.48    100    0       97.66  1.02     97.62  1.43
          ELM          14.5    3.68    100    0       97.88  0.87     97.34  1.11
          EKF-McRBFN   5.7     1.70    34.5   11.35   98.90  0.51     98.52  0.72
          PBL-McRBFN   7.2     2.69    30.4   9.56    98.31  1.31     98.13  1.25
GCM       SVM          120.6^a 5.78    100    0       76.53  2.80     63.91  5.59
          ELM          114.4   10.2    100    0       91.80  2.83     63.39  4.71
          EKF-McRBFN   71      1.58    73.6   4.13    71.73  0.03     73.17  7.79
          PBL-McRBFN   68.2    2.82    70.0   5.02    83.04  4.55     76.78  5.42
GI        SVM          176.6^a 18.9    100    0       77.23  5.01     85.83  3.62
          ELM          86      9.17    100    0       80.09  3.05     88.81  2.28
          EKF-McRBFN   73.2    9.17    41.8   3.45    80.00  4.62     86.67  4.18
          PBL-McRBFN   82.4    9.17    40.0   6.55    89.71  1.94     94.40  1.08

^a Number of support vectors


Table 6.10: One-tailed critical values (F-distribution) for the ANOVA test at the 95% confidence level

df1=    1      2      3      4      5      6      7      8      9      10
df2=33  4.139  3.284  2.891  2.658  2.502  2.389  2.302  2.234  2.178  2.132

Table 6.11: Two-tailed critical values (t-distribution) for the Dunnett test at the 95% confidence level

df   t0.1   t0.05  t0.025  t0.01  t0.005
33   1.692  2.034  2.348   2.378  3.008

If the one-way repeated measures ANOVA test rejects the equality hypothesis, then pair-wise post-hoc tests should be conducted to determine which mean differs from the others. In our study, we have used 4 different classifiers and 12 different data sets. For a given data set and classifier, 10 random trials are conducted to measure the mean and variance of the classifier's performance. The geometric testing efficiencies (η_g) of the 4 classifiers on the 12 data sets are organized as four groups, and ANOVA monitors three kinds of variation in the data, viz., within-group variation, between-group variation and the total variation. The F-score obtained using the repeated measures one-way ANOVA test is 7.42, which is greater than the F-statistic at the 95% confidence level (F_{3,33,0.05} = 2.89), i.e., 7.42 > 2.89; the corresponding reference F-distribution table is given in Table 6.10. Hence, one can reject the mean equality hypothesis at a confidence level of 95%, i.e., the performances of the 4 classifiers differ across the data sets.

Next, we conduct a pair-wise comparison using the parametric Dunnett test to highlight the performance significance of the PBL-McRBFN classifier with respect to the other classifiers. Here, the proposed PBL-McRBFN classifier is used as the control. The observed t values for the individual pairs are PBL-McRBFN & EKF-McRBFN: 1.93, PBL-McRBFN & ELM: 4.31 and PBL-McRBFN & SVM: 3.63. Note that the observed t values for the PBL-McRBFN & ELM and PBL-McRBFN & SVM pairs are greater than the critical t value (t_{33,0.025} = 2.34 at the 95% confidence level; the corresponding reference t-distribution table is given in Table 6.11), while the observed t value for the PBL-McRBFN & EKF-McRBFN pair is less than the critical t value. Hence, we can say that PBL-McRBFN performs slightly better than the EKF-McRBFN classifier and significantly better than the ELM and SVM classifiers with a confidence of 95% based on the 10 random trial performance evaluation study.


6.5 Work Flow of Meta-cognitive Strategies

In this section we exemplify the dynamic working nature of the meta-cognitive strategies (delete, growth, update and reserve) using PBL-McRBFN on the Image segmentation (IS) data set. IS is a seven-class multi-category data set containing 210 training samples and 2100 testing samples. The tunable parameters are chosen as follows: deletion threshold (β_d) = 0.9, knowledge threshold (β_c) = 0.5117, addition threshold (β_a) = 1.3, update threshold (β_u) = 0.6769, self-adaptation slope control parameter (δ) = 0.9927, hidden neuron overlap constant (κ) = 0.655 and hidden neuron center shifting factor (ζ) = 0.0111. We shall now consider how the different strategies of meta-cognitive learning in PBL-McRBFN aid its accurate classification.

• Sample delete strategy: When the predicted class label of the new training sample is the same as the actual class label and the confidence level (estimated posterior probability) is greater than the expected value, the new training sample does not provide additional information to the classifier and can be deleted from the training sequence without being used in the learning process. We exemplify the working of this strategy in Fig. 6.1, which gives a snapshot of the prediction confidence of the PBL-McRBFN classifier for samples in the range 100-150 along with the deletion threshold (β_d). Samples with confidence level greater than β_d are deleted without participating in the PBL-McRBFN learning process. In Fig. 6.1, the confidence of the PBL-McRBFN classifier for the sample at instant 130 is higher than the deletion threshold (β_d = 0.9), and it is thus deleted without participating in the learning process. By deleting correctly classified samples with negligible novel information, the sample deletion strategy helps the network avoid over-training and saves computational effort.

• Neuron growth strategy: When a new training sample contains novel information and the predicted class label is different from the actual class label, a new hidden neuron is added. The novelty of a sample is determined by the class-wise significance (ψ_c) and the maximum hinge error (E^t). Let us study the effect of these measures on the IS problem by considering a snapshot of training samples 50-100. The class-wise significance for these samples and the knowledge threshold (β_c) are given in Fig.


[Figure 6.1: Exemplification of the sample deletion strategy in PBL-McRBFN for the Image segmentation data set. The figure plots the confidence of the classifier against training sample instants 100-150, with the delete threshold (β_d) and the deleted samples marked.]

6.2(a), whereas Fig. 6.2(b) gives the hinge loss error and the self-regulated addition and update thresholds for these training samples. A sample contains novel information if the class-wise significance is less than the knowledge threshold (β_c = 0.5117) and the hinge error is greater than the self-regulatory addition threshold (β_a = 1.3). It can be noticed from Figs. 6.2(a) and (b) that even when a sample is novel, a new hidden neuron is not added to the network if the hinge error criterion is not satisfied. Consider the sample at instant 61: since the measured class-wise significance is lower than the knowledge threshold and the instantaneous hinge error is higher than the addition threshold, a new hidden neuron is added to the network. The history of the number of neurons and of the addition threshold in the PBL-McRBFN learning process for the IS data set is given in Figs. 6.3(a) and (b). The neuron history is plotted against only the samples used in training; PBL-McRBFN uses only 89 out of the 210 training samples. One can notice from Fig. 6.3(b) that the self-regulatory addition threshold adapts its value based on the predicted maximum hinge error.


[Figure 6.2: Class-wise significance (a), and instantaneous hinge error with self-regulatory thresholds (b) in PBL-McRBFN for the Image segmentation data set. Panel (a) plots ψ_c against the knowledge threshold β_c; panel (b) plots E^t against the self-regulated addition (β_a) and update (β_u) thresholds, marking the samples considered for neuron addition, parameter update and reserve, for training sample instants 50-100.]


[Figure 6.3: History of the number of hidden neurons (a), the self-regulated addition threshold (b), and the self-regulated update threshold (c) in PBL-McRBFN for the Image segmentation data set.]


• Parameters update strategy: A correctly classified sample (i.e., one whose predicted class label is the same as the actual class label) is used to update the network parameters in PBL-McRBFN for the IS data set when the hinge error is greater than the self-regulatory update threshold (β_u = 0.6769). In Fig. 6.2(b), the sample at instant 79 does not contain significant novel information but its hinge error is greater than the update threshold; thus it is used to update the network parameters. The self-regulatory parameter update threshold adapts its value based on the predicted maximum hinge error, as shown in Fig. 6.3(c).

• Sample reserve strategy: Samples which satisfy neither the deletion, the neuron growth, nor the parameters update criterion are reserved by the network to be considered for learning later. By virtue of the self-regulatory nature of the addition and parameter update thresholds, these samples may be used in the learning process at a later stage. In Fig. 6.2(b), the sample at instant 76 is reserved to be used later in the learning process. There are a few such reserved samples in the PBL-McRBFN training process for the IS data set. These samples are pushed to the rear of the data stream to be learnt later, and may be used at a later stage to fine-tune the network.

6.6 Study on the Effect of Meta-cognition

In this section, we analyze the effect of the meta-cognitive strategies in the PBL-McRBFN algorithm using the Quantized Kernel Least Mean Square (QKLMS) [49] algorithm as a baseline. QKLMS is a recently developed online kernel adaptive filtering algorithm based on a simple online vector quantization method. In QKLMS, quantization is applied to compress the input (or feature) space of the kernel adaptive filter so as to control the growth of the RBF structure, as in the sketch below.
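To make the baseline concrete, the following is a rough Python sketch of the QKLMS idea: a Gaussian-kernel LMS filter whose dictionary growth is limited by a quantization size eps_q, so that a new sample within eps_q of an existing center merely updates that center's coefficient instead of adding a unit. This paraphrases [49] under assumed parameter names; it is not the exact implementation used in the experiments.

import numpy as np

def qklms_step(x, y, centers, alphas, eta=0.1, eps_q=0.5, width=1.0):
    # Predict with the current radial basis expansion, then update.
    if centers:
        C = np.asarray(centers)
        k = np.exp(-np.sum((C - x) ** 2, axis=1) / (2.0 * width ** 2))
        e = y - float(np.asarray(alphas) @ k)
        d = np.linalg.norm(C - x, axis=1)
        j = int(np.argmin(d))
        if d[j] <= eps_q:
            alphas[j] += eta * e          # quantize: merge into nearest center
            return e
    else:
        e = y
    centers.append(np.asarray(x, dtype=float))  # otherwise grow the structure
    alphas.append(eta * e)
    return e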

For this purpose, we have conducted three different experiments on the QKLMS algorithm using nine different data sets.

(1) Effect of how-to-learn: The QKLMS algorithm is trained using the original samples and its performance is compared with the PBL-McRBFN algorithm. This experiment emphasizes the advantage of the neuron growth and update strategies in PBL-McRBFN.

(2) Effect of when-to-learn: The QKLMS algorithm is trained using the samples selected by PBL-McRBFN during training, i.e., the selected samples are sequentially presented in the same order in which PBL-McRBFN used them. We call this algorithm QKLMS*. This experiment emphasizes the advantage of the self-adaptive nature of the meta-cognitive thresholds and the sample reserve strategy in PBL-McRBFN.

(3) Effect of what-to-learn: The QKLMS algorithm is trained using the samples selected by PBL-McRBFN during training followed by the deleted samples. We call this algorithm QKLMS**. This experiment emphasizes the advantage of the sample delete strategy in PBL-McRBFN.

The overall and average training and testing efficiencies and the number of hidden neurons are given in Table 6.12 for all nine benchmark data sets. From the results, one can make the following observations.

(1) The PBL-McRBFN classifier's training and testing performance is better than that of QKLMS on all nine data sets. In the case of the high dimensional binary class ION data set, PBL-McRBFN uses fewer hidden neurons and achieves better testing performance, a 23% improvement over QKLMS. Also, in the case of the unbalanced multi-category GI data set, PBL-McRBFN uses fewer hidden neurons and achieves better testing performance, a 15% improvement over QKLMS. This clearly shows that the meta-cognitive learning principles in the PBL-McRBFN classifier help achieve better performance.


(2) QKLMS* uses fewer training samples (those selected by PBL-McRBFN) and achieves a 1-3% improvement in testing over QKLMS with fewer hidden neurons. On the high dimensional binary class ION data set, the testing performance of QKLMS* is improved by 15% over QKLMS. This shows that when-to-learn a training sample is important in training a classifier.

(3) QKLMS** uses all samples (the samples selected by PBL-McRBFN followed by the deleted samples); its training performance is similar to or better than that of the QKLMS algorithm. However, its testing performance is slightly lower than that of QKLMS*. In the case of the high dimensional binary class ION data set, QKLMS** uses more hidden neurons and

79
Chapter 6. Performance Evaluation of EKF-McRBFN and PBL-McRBFN

Table 6.12: Eect of Meta-cognitive learning principles in the QKLMS algorithm


Data Classier # Neurons Training Testing
set ηo ηa ηo ηa
QKLMS 50 91.43 90.42 78.00 77.07
HEART QKLMS∗ 50 78.57 76.67 80.00 78.68
QKLMS∗∗ 25 77.00 77.78 77.00 77.77
PBL-McRBFN 20 100 100 81.50 81.47

QKLMS 150 79.25 79.15 68.75 68.82


PIMA QKLMS∗ 140 80.25 78.07 73.09 70.23
QKLMS∗∗ 150 80.25 79.61 70.65 71.67
PBL-McRBFN 100 92.25 91.19 79.62 79.13

QKLMS 20 98.67 98.75 96.34 96.86


BC QKLMS∗ 4 97.33 96.76 97.38 97.45
QKLMS∗∗ 7 97.33 96.77 97.12 97.25
PBL-McRBFN 13 99.00 98.83 97.39 97.85

QKLMS 20 91.00 90.54 78.08 73.85


ION QKLMS∗ 14 87.00 84.98 82.47 78.98
QKLMS∗∗ 20 91.00 87.50 80.08 73.69
PBL-McRBFN 18 99.00 99.22 96.41 96.47

QKLMS 207 100 100 93.09 93.09


IS QKLMS∗ 89 100 100 93.67 93.67
QKLMS∗∗ 207 98.09 98.09 92.09 92.09
PBL-McRBFN 50 98.57 98.57 94.19 94.19

QKLMS 13 100 100 96.19 96.19


IRIS QKLMS∗ 9 100 100 98.10 98.10
QKLMS∗∗ 13 100 100 96.19 96.19
PBL-McRBFN 6 100 100 98.10 98.10

QKLMS 20 96.67 96.67 92.37 93.91


WINE QKLMS∗ 14 96.67 96.67 95.76 96.19
QKLMS∗∗ 19 95.00 95.00 93.22 94.03
PBL-McRBFN 11 100 100 98.31 98.69

QKLMS 300 88.68 88.64 70.85 70.95


VC QKLMS∗ 298 93.39 93.29 73.46 73.58
QKLMS∗∗ 620 92.92 92.71 69.19 69.19
PBL-McRBFN 175 96.46 96.47 78.91 79.09

QKLMS 80 89.91 95.02 73.33 77.21


GI QKLMS∗ 68 87.15 93.48 78.09 78.88
QKLMS∗∗ 80 94.49 95.80 72.38 74.45
PBL-McRBFN 71 94.49 97.29 84.76 92.72

80
Chapter 6. Performance Evaluation of EKF-McRBFN and PBL-McRBFN


achieves 4% improvement in training and 5 % decrement in testing over QKLMS .

∗∗
Also in case of unbalanced multi-category GI data set, QKLMS uses more hidden

neurons and achieves 2 % improvement in training and 4 % decrement in testing over


QKLMS . This shows what-to-learn in training is important in a learning algorithm.

The PBL-McRBFN algorithm uses the meta-cognitive principles by implementing different (delete, growth, update and reserve) strategies and thereby addresses what-to-learn, when-to-learn and how-to-learn efficiently. The aforementioned results also clearly highlight that the meta-cognitive principles present in PBL-McRBFN improve the performance of the QKLMS algorithm significantly.

6.7 Summary

In this chapter, we have presented a performance evaluation study of the proposed EKF-McRBFN and PBL-McRBFN using a number of benchmark multi-category and binary classification problems with a wide range of imbalance factors. The qualitative and quantitative performance analysis using multiple data sets clearly indicates the superior performance of the proposed PBL-McRBFN and EKF-McRBFN classifiers over the other classifiers considered in this study. The results also show that the PBL-McRBFN classifier performs better than the EKF-McRBFN classifier. Hence, in the next chapters, PBL-McRBFN is used in the early diagnosis of neurodegenerative diseases such as Alzheimer's disease and Parkinson's disease.

Chapter 7
Alzheimer's Disease Diagnosis using PBL-McRBFN Classifier

In this chapter, we present an application of the proposed PBL-McRBFN classifier in the area of medical informatics, particularly for the early diagnosis of neurodegenerative diseases. Neurodegenerative diseases are generally considered as a group of diseases that seriously and progressively impair the functions of the nervous system through selective neuronal vulnerability of specific brain regions. Depending on their type, neurodegenerative diseases can be serious or life-threatening, and most of them have no cure. The goal of treatment for such diseases is usually to improve symptoms, relieve pain and increase mobility. Alzheimer's disease (AD) is the most common neurodegenerative disease [71]. Parkinson's disease (PD) is the second most common neurodegenerative disease, after AD. The prevalence of AD and PD is increasing in the elderly [72].

In this chapter, we use the PBL-McRBFN classifier for the early diagnosis of AD using MRI scans. Since the classifier developed using PBL-McRBFN accurately approximates the decision boundary, we also propose a Recursive Feature Elimination approach (called PBL-McRBFN-RFE) to identify the most relevant and meaningful imaging biomarkers with predictive power for the diagnosis of AD.

AD is a progressive neurodegenerative disease that causes memory loss, problems in learning, confusion and poor judgment. AD is considered to be one of the most common causes of dementia among elderly persons. Dementia is a clinical syndrome characterized by significant loss or decline in memory and other cognitive abilities. Around 60-80% of age-related dementia is caused by AD [73]. The only way to make a definitive diagnosis of AD is from a brain autopsy revealing the characteristic neurofibrillary tangles and amyloid plaques that define AD. Early detection of AD using non-invasive neuroimaging techniques will help in providing assistance to patients, and thereby one can slow down the progression of the disease.

The literature review on AD detection is presented in the next section.

7.1 Literature Review on Alzheimer's Disease

Early detection of AD using non-invasive methods plays a major role in providing treatment that may slow down its progress. One such non-invasive method for the early detection of AD is brain imaging. Commonly used brain imaging techniques for this purpose are: Computed Tomography (CT) [74, 75], Single-Photon Emission Computed Tomography (SPECT) [76, 77], Positron Emission Tomography (PET) [77] and Magnetic Resonance Imaging (MRI) [78, 79].

Studies using CT scans for AD diagnosis have been described in [74, 75]. However, due to its lower spatial resolution and the possibility of unreliable structural change detection in the early stages of the disease, CT has been employed only in very few cases. SPECT and PET are functional brain imaging techniques which use a radioactive substance to detect changes in blood flow and metabolism in the brain. For AD diagnosis, several studies using SPECT and PET images have been reported in [76, 77]. Both PET and SPECT involve the use of ionizing radiation and are harmful if used repeatedly. Hence, the use of PET and SPECT in normal persons is typically limited to a single scan, which may not provide adequate information for a proper diagnosis. Also, the lack of spatial resolution in SPECT images influences the accuracy of AD detection.

MRI is one of the most important brain imaging procedures and provides accurate information about the shape and volume of the brain. Compared to CT, SPECT and PET scans, MRI provides a high spatial resolution and can detect minute abnormalities in the brain. The use of MRI for the accurate detection of AD has recently become a very active research area [80, 79]. MRI helps to detect AD at an early stage, before irreversible damage has been done [13]. Early detection of AD from MRI requires appropriate methods to detect, locate and quantify tissue atrophy in the brain. Primarily, a visual assessment of the degree of atrophy in the neuroanatomical structures is performed by an expert using MRI. This may be adequate in a normal clinical setting, but it is not enough to obtain quantitative measures such as fine incremental grades of atrophy and overall brain volume [81].

The early detection of AD from MRI can be cast as a binary classification problem, and one can employ machine learning techniques to automatically detect AD [78, 79, 82]. The main idea behind using machine learning techniques is to relate brain volume changes to the onset of AD. Two major ways of estimating brain volume changes from MRI are: i) the Region-of-Interest (ROI) approach; and ii) the whole brain morphometric approach.

7.1.1 Region-of-Interest Approach

The ROI approach has been traditionally used to obtain a regional measurement of brain volume and to investigate the abnormal tissue structures associated with AD [83]. In the ROI approach, a volumetric analysis is performed by manually delineating specific brain regions. In practice, a priori knowledge about abnormal regions is not always available. However, in AD diagnosis, many studies rely on the manual tracing of the hippocampus and entorhinal cortex, which is laborious and time consuming [84, 85]. In [86], the volumes of the manually segmented hippocampus and entorhinal cortex are measured to discriminate between AD patients and normal persons. The major shortcomings in the use of the manual ROI approach are that it is dependent on the tracer's expertise, time-consuming and error-prone. Recently, an automatic method for the segmentation of the hippocampus using probabilistic and anatomical priors has been proposed for the detection of AD patients [82]. In [82], automatically segmented hippocampus volumes have been used to classify AD patients and normal persons. Although ROI techniques for AD analysis have been widely used, it is difficult for them to accurately identify the brain volume changes in AD patients when the tissue loss is small. To overcome these shortcomings, several approaches that enable the assessment of the whole brain have been reported in the literature [87, 88, 89, 90].


7.1.2 Whole Brain Morphometric Approach

Voxel Based Morphometry (VBM) is one of the most widely used, fully automated, whole brain morphometric analyses [90]. VBM is based on the Statistical Parametric Mapping (SPM) method and is often employed for the investigation of tissue volume changes between the brain MRI scans of a diseased group and those of normal persons. In the VBM analysis, the brain MRI scans undergo various preprocessing steps before the voxel-wise parametric tests [91]. The preprocessing steps involved in the VBM analysis are: normalization, segmentation, modulation and smoothing. The VBM analysis identifies the probability of the gray matter, white matter and cerebrospinal fluid tissue classes in a given voxel, where a voxel is defined as a volume element representing the intensity of a point in a three-dimensional space [90].

In the literature, AD classification studies have been conducted using morphometric features and the Support Vector Machine (SVM) classifier [78, 79, 92, 93, 94]. These methods use different morphological features and different data sets for AD detection. In [78], 90 samples (33 probable mild AD patients and 57 normal) from Rochester, Minnesota are used. A statistical parametric map on gray matter tissues is obtained using these 90 samples and this map is used to extract the features for a SVM classifier. In [79], a mass-preserving Regional Analysis of Volumes Examined in Normalized Space (RAVENS) is used to extract the features from a smaller set of Baltimore longitudinal study data. Here, samples from 15 probable mild AD patients and 15 normal persons are used for AD detection. RAVENS based feature extraction with a SVM classifier provides good performance, but the feature extraction process is computationally intensive. For data consisting entirely of mild AD patients, the computational effort in RAVENS increases further, and this influences the accuracy of the extracted features, which in turn affects the SVM classifier performance.

SVM is based on the evaluation of discrimination power for classification, and hence it has limitations in dealing with noisy data, which is the case for neuroimaging data. Also, the high dimensional VBM features make AD classification difficult, and hence feature reduction techniques have been increasingly used for dimensionality reduction in neuroimage classification studies [79, 95, 96, 97]. Principal Component Analysis (PCA) and Independent Component Analysis (ICA) are the most widely used feature construction techniques. PCA is a subspace learning method and transfers the original features into a new linear subspace [79, 97]. ICA, as one of the important techniques of blind signal separation, has been shown to provide a powerful method for neuroimaging data [98, 99]. However, these techniques involve a careful selection of parameters, such as the number of components, to preserve the important subsets of the feature space. Also, the reduced features do not provide any information on the original voxels, and hence on the regions in the brain responsible for AD. The feature selection problem has also been addressed using genetic algorithms; an Integer Coded Genetic Algorithm (ICGA) has been used along with a neural network in [100]. However, the selection of the population size and other parameters affects the convergence of the genetic algorithm. Recursive Feature Elimination (RFE) is a computationally less intensive wrapper based feature selection method, in which the selection of the features depends upon the classifier model.

Machine learning algorithms for AD detection require samples with significant information as training samples. An ideal classifier for AD detection must incorporate sample selection in training for effective learning. The PBL-McRBFN proposed in chapter 5 selects the samples for learning and has been found to be effective. Hence, in this chapter we propose a RFE approach with the efficient classification method PBL-McRBFN (referred to as PBL-McRBFN-RFE) for AD classification, which identifies critical imaging biomarkers relevant to AD at the same time.

In the next section, we present AD detection using the PBL-McRBFN classifier with morphometric features obtained from MRI scans.

7.2 Early Diagnosis of Alzheimer's Disease Based on MRI Features

The framework of our method is shown in Fig. 7.1. First, the morphometric features are extracted from all MRI scans using VBM. Next, the high dimensional VBM features are used for classification using PBL-McRBFN. The following sections present a description of the MRI data, the VBM analysis for feature extraction, and the performance results of the PBL-McRBFN classifier in AD detection.

Figure 7.1: Schematic diagram of the AD detection using PBL-McRBFN classifier (MRI scan → unified segmentation → smoothing → statistical testing, together constituting voxel based morphometry and yielding maximum intensity projections → morphometric features → PBL-McRBFN classifier → AD/Non-AD)

7.2.1 Materials

7.2.1.1 OASIS data set

In our study, the publicly available Open Access Series of Imaging Studies (OASIS) data set has been used [101]. The OASIS data set is a cross-sectional collection of 416 persons covering the adult life span, aged between 18 and 96, including individuals with early-stage AD. The data includes 218 persons aged between 18 and 59 years and 198 persons aged between 60 and 96 years. Of the 198 older persons, 98 had no AD, i.e., a Clinical Dementia Rating (CDR) of 0; 70 persons have been diagnosed with very mild AD (CDR=0.5), 28 persons with mild AD (CDR=1) and 2 persons with moderate AD (CDR=2). The AD patients have scores between 14 and 30 on the Mini-Mental State Examination (MMSE), and the normal persons have MMSE scores between 25 and 30. In our study, we have considered the 198 elderly persons, comprising 98 normal persons and 100 AD patients. For each person, whole-brain T1-weighted 3-dimensional MPRAGE (Magnetization-Prepared Rapid-Acquisition Gradient Echo) images have been acquired on a Siemens 1.5T scanner. The acquired volumes had 128 sagittal 1.25 mm slices without gaps and a pixel resolution of 256×256 (1×1 mm). The OASIS data set demographics and dementia details are summarized in Table 7.1.

Table 7.1: Demographic information of OASIS data used in our study

Group                 Normal Persons   AD Patients
No. of persons        98               100
Percentage of male    26.5%            41.0%
Age (mean±std)        75.92±8.99       74.76±7.12
MMSE (mean±std)       28.96±1.21       24.32±4.17
CDR 0/0.5/1/2         98/0/0/0         0/70/28/2

7.2.1.2 ADNI data set

In our study, we also used data obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) data set [102]. The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies and non-profit organizations, as a USD 60 million, 5-year public private partnership. The primary goal of ADNI has been to test whether serial MRI, positron emission tomography, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of Mild Cognitive Impairment (MCI) and early AD. Determination of sensitive and specific markers of very early AD progression is intended to aid researchers and clinicians to develop new treatments and monitor their effectiveness, as well as to lessen the time and cost of clinical trials. The Principal Investigator of this initiative is Michael W. Weiner, MD, VA Medical Center and University of California, San Francisco. ADNI is the result of the efforts of many co-investigators from a broad range of academic institutions and private corporations, and persons have been recruited from over 50 sites across the U.S. and Canada. The initial goal of ADNI was to recruit 800 adults, aged 55 to 90, to participate in the research: approximately 200 cognitively normal older individuals to be followed for 3 years, 400 people with MCI to be followed for 3 years and 200 people with early AD to be followed for 2 years. For up-to-date information, see www.adni-info.org.

In our study, we have considered all the 432 elderly persons (232 normal persons and 200 AD patients) available in the ADNI data set as of February 2012. Standard 1.5T screening/baseline T1-weighted images obtained using the volumetric 3D MPRAGE protocol, with resolutions ranging from 0.9 mm × 0.9 mm × 1.20 mm to 1.3 mm × 1.3 mm × 1.20 mm, are included from the ADNI data set. Detailed information on the MRI protocols and preprocessing steps is presented in [103]. The demographics of the 432 elderly persons used in our study are shown in Table 7.2.

Table 7.2: Demographic information of ADNI data used in our study

Group                 Normal Persons   AD Patients
No. of persons        232              200
Percentage of male    51.7%            51.5%
Age (mean±std)        76.01±5.00       75.65±7.70
MMSE (mean±std)       29.11±1.00       23.29±2.05
CDR 0/0.5/1/2         232/0/0/0        0/98/102/0

Figure 7.2: Schematic diagram of the stages in feature extraction based on the VBM analysis

7.2.2 Voxel Based Morphometry Based Feature Extraction

A feature extraction method based on VBM is employed in this work [100]. The flow diagram of the feature extraction process is shown in Fig. 7.2. VBM is a voxel-wise comparison of the local tissue volumes of gray matter within a group or across groups of persons using MRI scans. In our study, VBM is used to detect significant gray matter differences between the AD patients and normal persons. The voxel locations of the significant regions detected by VBM are further used as masks in order to extract the features from all gray matter segmented MRI scans. VBM is performed on the OASIS and ADNI data sets using the Statistical Parametric Mapping (SPM) software package [91].

In a VBM analysis, the brain MR images undergo various preprocessing steps before the voxel-wise parametric tests are carried out on them. In our study, a VBM analysis based on a recently proposed unified segmentation model is performed [104]. The steps involved in the VBM analysis are: unified segmentation, smoothing and statistical testing, in that order. The unified segmentation step is a generative modeling approach, in which tissue segmentation, bias correction and image registration are combined in a single model [104]. The unified segmentation framework combines deformable tissue probability maps with a Gaussian mixture model. The MR brain images of both the AD patients and normal persons are segmented into the gray matter tissue class. The segmented and normalized gray matter images are then smoothed by convolving them with an isotropic Gaussian kernel. In our approach, a 10 mm full-width at half-maximum Gaussian kernel is employed.
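The smoothing step itself was performed with the SPM package; the short sketch below only illustrates, assuming a segmented gray matter volume is available as a NumPy array, how the 10 mm full-width at half-maximum (FWHM) specification translates into the standard deviation of the Gaussian kernel.

import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_gray_matter(gm_volume, voxel_size_mm=(1.0, 1.0, 1.0), fwhm_mm=10.0):
    # gm_volume     : 3-D array of gray matter probabilities.
    # voxel_size_mm : voxel dimensions along each axis, in millimetres.
    # fwhm_mm       : full-width at half-maximum of the kernel (10 mm here).
    # FWHM and sigma are related by FWHM = sigma * sqrt(8 * ln 2) ≈ 2.355 sigma.
    sigma_mm = fwhm_mm / np.sqrt(8.0 * np.log(2.0))
    sigma_vox = [sigma_mm / v for v in voxel_size_mm]   # per-axis, in voxels
    return gaussian_filter(gm_volume, sigma=sigma_vox)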

For a better understanding of the VBM analysis, we show three planar views (sagittal, coronal and axial) of the original images and of the images after every stage of the VBM analysis in Fig. 7.3(a-c). Fig. 7.3(a) shows the different planar views of the MRI scan. From the MRI scan, one has to perform bias correction and tissue segmentation, and register the segmented image to a standard template to remove non-uniform artifacts. The images after undergoing these steps are shown in Fig. 7.3(b). From Fig. 7.3(b), we can see that the unified segmentation in the VBM analysis efficiently identifies the gray matter in the MRI scans. These segmented images are then smoothed by convolving them with an isotropic Gaussian kernel, and the resultant images are shown in Fig. 7.3(c). The smoothing process averages the concentration of the gray matter around each voxel, and this helps considerably in the subsequent voxel-by-voxel statistical analysis [104].

The smoothed brain volumes of the AD patients and normal persons are used in the statistical analysis to identify the regions of gray matter concentration that are significantly related to AD. These regions will be used to extract the features for the accurate identification of AD. For the statistical analysis, a general linear model is used to detect the volumetric changes in gray matter across the AD patients and normal persons. In our statistical analysis, the estimated total intracranial volume is used as a covariate in the design matrix of the general linear model. Also, a two-sample t-test is performed on the smoothed images of the normal persons and AD patients, and a multiple comparison correction method, namely a family wise error correction with P < 0.05, has been applied. Following the application of the general linear model and the statistical tests, the significance of any differences in gray matter volume is ascertained using the theory of Gaussian random fields [105]. These tests result in a maximum intensity projection map, which is then used to extract the features from the individual segmented gray matter images for further analysis.
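The actual statistical testing is carried out in SPM using the general linear model, the total intracranial volume covariate and random field theory. As a simplified stand-in, the sketch below illustrates the underlying voxel-wise two-sample t-test with a crude Bonferroni-style surrogate for the family wise error correction; the variable names are illustrative.

import numpy as np
from scipy import stats

def voxelwise_ttest(gm_normal, gm_ad, p_thresh=0.05):
    # gm_normal, gm_ad : arrays of shape (n_subjects, n_voxels) holding the
    # smoothed gray matter values for the two groups.
    t, p = stats.ttest_ind(gm_normal, gm_ad, axis=0)
    # Correct for the number of voxels tested (a crude surrogate for the
    # family wise error control based on Gaussian random fields in SPM).
    mask = p < (p_thresh / gm_normal.shape[1])
    return mask          # boolean mask of significant voxel locations

The resulting mask plays the role of the maximum intensity projection map: features for each subject are then read off at the masked voxel locations, e.g. features = smoothed_gm[:, mask].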

For better understanding, we show the maximum intensity projections of the significant voxels in sagittal, coronal and axial views in Fig. 7.4 and Fig. 7.5.

Figure 7.3: Results of the unified segmentation and smoothing steps performed on MRI of an AD patient (from right: sagittal view, coronal view and axial view)

Figure 7.4: Maximum intensity projections from OASIS data set - Normal persons vs. AD patients (a) sagittal view (b) coronal view (c) axial view

Figure 7.5: Maximum intensity projections from ADNI data set - Normal persons vs. AD patients (a) sagittal view (b) coronal view (c) axial view

From Fig. 7.4 and Fig. 7.5, it can be noted that there are significant areas of decreased gray matter density in the AD patients relative to the normal persons, indicating that the gray matter in these locations is lower for the AD patients. A total of 19879/23797 features are extracted from the OASIS/ADNI data sets using the above VBM analysis, and these features are then used for the classification of AD patients. It is found in the literature that VBM produces different significant areas of gray matter density change in the brain when the voxel-wise statistics are computed with different groups of persons (e.g., male vs. female, only female, etc.) and different covariates (e.g., gender, age, etc.) [78, 106]. This also implies that by employing the above VBM analysis one can obtain different sets (with varying numbers) of feature vectors.

To locate the above regions with respect to their spatial locations in the brain, these regions are overlaid on the sliced sections of the commonly used Montreal Neurological Institute (MNI) brain template; the results are shown in Fig. 7.6 and Fig. 7.7. From Fig. 7.6 and Fig. 7.7, one can infer the regions of the brain that are significantly affected in AD patients. In other words, if during the MRI scans we notice that the gray matter in these specific locations is lower, one can infer a good likelihood of these patients developing AD later.

Figure 7.6: Gray matter volume change from OASIS data set - Normal persons vs. AD patients (a) sagittal view (b) coronal view (c) axial view

Figure 7.7: Gray matter volume change from ADNI data set - Normal persons vs. AD patients (a) sagittal view (b) coronal view (c) axial view

7.2.3 Experimental Results

In this section, we present the systematic studies that have been carried out on AD detection using the PBL-McRBFN classifier with MRI. Before presenting the detailed study results, we highlight the sequence in which the studies have been carried out. First, we present the performance results of the PBL-McRBFN classifier for AD detection using the OASIS/ADNI data sets and compare the performance with existing results in the literature. Next, the generalization capability of the PBL-McRBFN classifier is shown by testing the ADNI samples on a PBL-McRBFN classifier developed using the OASIS data set. Further, we present a method to identify the imaging biomarkers for AD using the proposed RFE method for feature reduction together with the PBL-McRBFN classifier. Finally, we present detailed studies based on age/gender groups in the OASIS data to identify the imaging biomarkers for AD.

Table 7.3: Classification performance of PBL-McRBFN on the OASIS data set (mean (std) over 10 random trials)

Feature Type  # Features  # Neurons  |  Training: Accuracy / Sensitivity / Specificity (%)  |  Testing: Accuracy / Sensitivity / Specificity (%)
VBM+ICA       50          49         |  93.47 (1.71) / 95.92 (2.59) / 91.02 (3.09)          |  72.33 (2.48) / 72.00 (2.00) / 72.65 (6.85)
VBM           19879       43         |  92.86 (1.90) / 91.83 (6.92) / 93.88 (4.99)          |  75.80 (1.01) / 71.20 (7.82) / 80.41 (7.98)

7.2.4 PBL-McRBFN Classifier Performance on the OASIS Data Set

The complete OASIS data set consists of 198 samples. For each sample, 19879 morphometric features were extracted using VBM. However, as the feature space is large, not all features may be responsible for AD. Hence, the morphometric features obtained from the VBM analysis are further reduced statistically using ICA. We employed the FastICA (fixed-point) algorithm [107]: the FastICA package for MATLAB [108] is applied to the VBM detected morphometric features, which are reduced to 50 features.
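The study itself used the FastICA package for MATLAB [108]; an equivalent illustration with scikit-learn's FastICA is sketched below, with n_components=50 matching the reduced feature count used here.

from sklearn.decomposition import FastICA

def reduce_features_ica(X, n_components=50, seed=0):
    # X : array of shape (n_subjects, n_voxels) of VBM morphometric features.
    # Returns the subjects projected onto n_components independent components.
    ica = FastICA(n_components=n_components, random_state=seed, max_iter=1000)
    return ica.fit_transform(X)   # shape: (n_subjects, n_components)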

We conducted 10 random trials of experiments using the PBL-McRBFN classifier on both the complete 19879 feature set and the reduced 50 feature set, with each trial using 50% of the total samples for training and the remaining for testing. The training/testing accuracy, sensitivity and specificity obtained with the PBL-McRBFN classifier are presented in Table 7.3. PBL-McRBFN produces a testing accuracy of 72.33% using the 50 reduced features. The PBL-McRBFN testing accuracy on the complete feature set is 75.8%, which is 3% higher than on the 50 reduced feature set. This is because the considered binary classification problem consists of MRI scans of 100 `very mild to moderate AD' patients and 98 healthy elderly persons; hence, the PBL-McRBFN classifier requires all the morphometric features to separate the AD patient group, which covers a wide range of CDR from 0.5 to 2, from the healthy elderly persons.
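The accuracy, sensitivity and specificity values reported in the tables that follow are the usual confusion-matrix quantities. A minimal sketch, assuming label 1 denotes an AD patient and label 0 a normal person, is:

import numpy as np

def binary_metrics(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy    = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)      # fraction of AD patients detected
    specificity = tn / (tn + fp)      # fraction of normals correctly rejected
    return accuracy, sensitivity, specificity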
Next, we compare the PBL-McRBFN classification results based on the OASIS data set with other reported results in the literature.

Comparison with Related Works on the OASIS Data Set:

In [109], Yong Fan et al. proposed a method called integrated feature extraction and selection for neuroimage classification, which presents an integrated feature extraction and selection algorithm that contains two iterative steps, viz. a constrained subspace learning based feature extraction method and SVM based feature selection. Here, VBM is used to extract features from MRI scans, and the integrated feature extraction and selection algorithm (IPCA) is used to select features in conjunction with SVM based classification.

In [110], Wenlu Yang et al. proposed a method based on ICA, called ICA based feature extraction and automatic classification of AD related MRI data. Here, features are extracted using VBM followed by ICA, with SVM based classification.

The above two methods used the same OASIS data set as described in Table 7.1. The first method reported results over 4 random trials and the second method reported a single trial result. Hence, we compare the single and 10 random trial results obtained from PBL-McRBFN with the results from the methods in [109, 110]; the comparison is presented in Table 7.4.

Table 7.4: Performance comparison with existing results on the OASIS data set

Feature Type  Algorithm (evaluation protocol)                                # Features  Testing Accuracy(%)
VBM+ICA       PBL-McRBFN, 50-50% train-test data, single/10 random trials    50          74.79/72.33
VBM           PBL-McRBFN, 50-50% train-test data, single/10 random trials    19879       76.90/75.80
VBM+PCA       SVM [109], 50-50% train-test data, 4 random trials             -           66.9
VBM+IPCA      SVM [109], 50-50% train-test data, 4 random trials             -           69.7
VBM+ICA       SVM [110], 50-50% train-test data, single trial                200         62.8

From Table 7.4, it is observed that the PBL-McRBFN based classification of AD patients and normal persons (both single trial and 10 random trials) is 6-14% better than the results from the methods proposed in [109, 110]. PBL-McRBFN gives better results on the complete feature set extracted by VBM than on the reduced features after ICA. The PBL-McRBFN classification efficiency on the complete feature set is 10% higher than the PCA based SVM classification efficiency and 7% higher than that of the IPCA based SVM proposed in [109], and 14% higher than the ICA based SVM classification efficiency proposed in [110]. On the 50 feature set, PBL-McRBFN performs 8% better than the PCA based SVM, 5% better than the IPCA based SVM proposed in [109] and 12% better than the ICA based SVM proposed in [110]. Since PBL-McRBFN uses sample selection for the proper learning of the decision function, the performance of the PBL-McRBFN classifier is better than the results reported using the well-known SVM classifier.

7.2.5 PBL-McRBFN Classification Performance on the ADNI Data Set

The performance of the PBL-McRBFN classifier has also been evaluated using the ADNI data set [103]. The complete ADNI data set consists of 232 normal persons and 200 AD patients. After verification of the unified segmentation results, 6 normal persons and 4 AD patients were excluded (due to bad segmentation) from the VBM analysis. In our study we considered 422 samples; for each sample, 23797 morphometric features were obtained from the VBM analysis. Here also, the obtained morphometric features are further statistically reduced to 200 features by ICA [107]. In our classification study, for each of the 10 random trial experiments, 50% of the samples are randomly chosen for training and the remaining are used for testing. PBL-McRBFN produces a testing accuracy of 82.38% using the 200 reduced features. The PBL-McRBFN testing accuracy on the complete feature set is 85.27%, which is 3% higher than on the 200 reduced feature set. The classification performance of the PBL-McRBFN classifier using both the complete and the 200 reduced feature sets is given in Table 7.5.

Next, we compare the PBL-McRBFN classification results using the ADNI data set with other reported results in the literature.

Table 7.5: Classification performance of PBL-McRBFN on the ADNI data set (mean (std) over 10 random trials)

Feature Type  # Features  # Neurons  |  Training: Accuracy / Sensitivity / Specificity (%)  |  Testing: Accuracy / Sensitivity / Specificity (%)
VBM+ICA       200         67         |  94.85 (2.64) / 93.06 (4.10) / 96.63 (2.20)          |  82.38 (0.53) / 77.14 (3.11) / 87.61 (2.42)
VBM           23797       64         |  96.35 (0.94) / 95.71 (1.32) / 96.99 (1.00)          |  85.27 (1.02) / 82.03 (2.34) / 88.49 (1.77)

Comparison with Related Works on the ADNI Data Set:

Here, we compare the results of the PBL-McRBFN classifier with some recent results reported in the literature that are also based on the MRI ADNI data set for AD classification. In particular, four recent methods are compared in Table 7.6. In [111], the automatic diagnostic capabilities of four structural MRI feature extraction methods (manifold based learning, hippocampal volume, cortical thickness and tensor-based morphometry) are compared using a SVM classifier; the best result, obtained using tensor-based morphometry, is provided in Table 7.6. In [112], a Linear Program (LP) boosting method with a novel additional regularization has been proposed to incorporate the spatial smoothness of the MRI feature space into the learning process. In [14], ten methods, which include five voxel-based methods, three cortical thickness based methods and two hippocampus based methods, are compared using a SVM classifier; the best result, obtained using the voxel-wise Gray Matter (GM) features, is provided in Table 7.6. In [94], 93 volumetric features extracted from the 93 ROIs in the GM density maps of MRI data have been used for classification.

Table 7.6: Performance comparison with existing results on the ADNI data set

Feature Type               Algorithm (evaluation protocol)                                       Subjects              Testing Accuracy(%)
VBM                        PBL-McRBFN, 50-50% train-test data, single/10 random trials           226 Normal, 196 AD    86.02/85.27
VBM                        PBL-McRBFN, 75-25% train-test data, single trial                      226 Normal, 196 AD    87.22
VBM                        PBL-McRBFN, 95-5% train-test data, single trial                       226 Normal, 196 AD    91.67
VBM                        LP boosting [112], leave-N-out cross-validation                       94 Normal, 89 AD      82.00
VBM                        SVM [14], 50-50% train-test data, single trial                        162 Normal, 137 AD    88.58
93 ROI                     SVM [94], 10-fold cross-validation                                    52 Normal, 51 AD      86.20
Tensor-based morphometry   SVM [111], 95-5% train-test data, leave-N-out cross-validation,       231 Normal, 198 AD    87.00
                           100 times

From Table 7.6, it can be seen that, among the VBM based feature methods, PBL-McRBFN's performance is 3% higher than that of the LP boosting method [112] and 2% lower than that of the SVM method [14]. This may be due to the fact that the SVM method in [14] uses a smaller number of subjects in its study. Comparing the performance of the PBL-McRBFN classifier using the VBM features with the SVM method using the 93 ROI features [94] and with the method using the tensor based morphometry features [111], one can see that PBL-McRBFN's performance is similar.

Table 7.7: Generalization performance of PBL-McRBFN classifier on unseen ADNI samples

% of ADNI Samples in Adaptation  # Neurons  |  Training: Accuracy / Sensitivity / Specificity (%)  |  Testing: Accuracy / Sensitivity / Specificity (%)
Without                          48         |  94.90 / 93.88 / 95.92                               |  62.39 / 95.92 / 31.42
With 25%                         81         |  91.66 / 85.45 / 97.87                               |  77.27 / 97.87 / 83.43

7.2.6 Generalization Capability of the PBL-McRBFN Classifier for the Detection of AD

The aim of this study is to evaluate the generalization capability of the PBL-McRBFN classifier trained with the OASIS data set when tested on unseen samples from the ADNI data set. The logical schematic diagram of the generalization capability study of the PBL-McRBFN classifier is shown in Fig. 7.8. For this study, the Maximum Intensity Projection (MIP) of the gray matter voxel locations obtained from the OASIS data set (the possible brain regions for AD) is used to extract the morphometric features from the ADNI data set in the VBM feature extraction procedure, apart from the normal processes of unified segmentation and smoothing. VBM selected 19879 voxel locations from the OASIS data set, and the same 19879 voxel locations are used to extract the morphometric features from the ADNI data set. These unseen ADNI samples with 19879 morphometric features are tested with the best classifier developed using the OASIS training samples. From the descriptions of the OASIS and ADNI data sets, we can see that these data sets are collected from people with different demographics and geographic locations; hence, the data sets represent a wide variation in the data distribution. Therefore, 25% of the ADNI samples are further used to adapt the OASIS trained PBL-McRBFN classifier, which is then tested using the remaining ADNI samples. Such a generalization capability of the classifier avoids repeating the computationally intensive VBM feature extraction process, and unifies and simplifies the diagnosis mechanism. The generalization performance of the OASIS trained PBL-McRBFN classifier on the unseen ADNI data set is given in Table 7.7.
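A minimal sketch of the mask-based feature extraction used in this study is given below; it assumes the smoothed ADNI gray matter volumes have already been registered to the same template space as the OASIS MIP mask, and the variable names are illustrative.

def extract_masked_features(smoothed_gm, oasis_mask):
    # smoothed_gm : (n_subjects, x, y, z) smoothed gray matter volumes (ADNI).
    # oasis_mask  : boolean (x, y, z) array marking the 19879 significant
    #               voxel locations selected by VBM on the OASIS data set.
    # Returns an (n_subjects, n_selected_voxels) feature matrix.
    return smoothed_gm[:, oasis_mask]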

Figure 7.8: Schematic representation of the generalization capability study of the OASIS trained PBL-McRBFN classifier on the ADNI data set (new ADNI MRI → unified segmentation → smoothing → VBM feature extraction using the OASIS MIP → PBL-McRBFN classifier with meta-cognitive learning → AD/Non-AD)

To evaluate the PBL-McRBFN generalization capability, the PBL-McRBFN classifier is first trained on the OASIS training data set (50% of the OASIS samples) and is tested on the 422 samples from the ADNI data set. This experiment is called `Without' because all ADNI samples are tested using the PBL-McRBFN classifier trained on OASIS samples, without any ADNI samples for adaptation. The classification accuracy on the unseen ADNI samples from this experiment is 62.39%. Hence, we can say that a PBL-McRBFN classifier for AD detection trained with VBM features using one data set (OASIS) can classify unseen samples from the other data set (ADNI). Further, 25% of the samples from the ADNI data set were used to adapt the PBL-McRBFN classifier using the meta-cognitive principles, and the classifier was then tested on the remaining 75% of the samples; the resulting testing accuracy is 77.27%. This experiment is called `With 25%' because 25% of the ADNI samples are used for the adaptation of the PBL-McRBFN classifier trained on OASIS samples. Hence, we can say that, with minor adaptation, PBL-McRBFN can classify unseen samples from other data sets accurately.
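The `Without'/`With 25%' protocol can be summarized with the following Python sketch; train_sequential and predict are hypothetical stand-ins for the PBL-McRBFN learning and decision functions, which are not reproduced here.

import numpy as np

def cross_dataset_study(train_sequential, predict, X_oasis, y_oasis,
                        X_adni, y_adni, adapt_frac=0.25, seed=0):
    # train_sequential(X, y) presents samples one by one to the classifier
    # (standing in for PBL-McRBFN's meta-cognitive learning); predict(X)
    # returns class labels. Both are assumptions, not the thesis code.
    rng = np.random.default_rng(seed)
    train_sequential(X_oasis, y_oasis)              # base training on OASIS

    # 'Without': test all ADNI samples directly.
    acc_without = np.mean(predict(X_adni) == y_adni)

    # 'With 25%': adapt on a random quarter of ADNI, test on the rest.
    idx = rng.permutation(len(X_adni))
    n_adapt = int(adapt_frac * len(X_adni))
    train_sequential(X_adni[idx[:n_adapt]], y_adni[idx[:n_adapt]])
    rest = idx[n_adapt:]
    acc_with = np.mean(predict(X_adni[rest]) == y_adni[rest])
    return acc_without, acc_with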

The results show that the MIP of the gray matter voxel locations, generated using VBM from the samples of one data set (the OASIS data set), is able to discriminate samples from another data set (the ADNI data set). For growing data sets, the sequential learning PBL-McRBFN classifier is able to capture the functional relationship between the VBM features and the class labels (disease status). The results also show that a PBL-McRBFN classifier trained on one data set (OASIS), with minor adaptation using a few samples from another data set (ADNI), achieves significant testing accuracy on a larger set of unseen samples.

7.3 Identification of Imaging Biomarkers for AD

In this section, we present the identification of imaging biomarkers for AD using the OASIS data set. VBM extracted the gray matter voxels that are statistically different between the normal persons and AD patients. Not all the voxels generated by VBM may be responsible for detecting AD. Therefore, we propose PBL-McRBFN-RFE to find the minimal set of features among the VBM generated voxels that maximizes the detection of AD. The selected minimal set of features can be termed the imaging biomarkers for AD. In the literature, many feature selection techniques have been proposed; in general, the goal of feature selection is to reduce the dimensionality. Filter and wrapper methods are two well-known kinds of feature selection techniques for high dimensional data sets [113]. In the filter method, features are selected on the basis of the feature separability of the training samples, which is independent of the learning algorithm. The separability only takes into account the correlations between the features, so the selected features may not be optimal. Wrapper methods search for critical features based on the learning algorithm, and often give better results than filter methods.

Recursive Feature Elimination (RFE) is a computationally less intensive wrapper based feature selection method. The basic principle of RFE is to initially include all features of a large region, and to gradually exclude the features that do not contribute to discriminating patterns from different classes. Whether a feature in the current feature set contributes enough to be kept is determined by the discriminative power of the feature, obtained by training a classifier with the current set of features. In order to increase the likelihood that the best features are selected, the feature elimination progresses gradually and includes cross-validation steps. In each feature elimination step, a single feature is discarded, until a core set of features with the highest discriminative power remains.

In this study, informative features are selected by the method of RFE utilizing the PBL-McRBFN classifier; RFE utilizing PBL-McRBFN is referred to as PBL-McRBFN-RFE. PBL-McRBFN-RFE conducts feature selection in a sequential elimination manner, which starts with all the features and discards one feature at a time from the top. The discarded feature is added back to the feature set if the training/testing efficiency decreases. The PBL-McRBFN-RFE feature selection algorithm runs as long as the number of features selected in the current iteration (s) is more than the predefined minimum limit on the number of features to be selected (r) and is not equal to the number of features selected in the previous iteration (p). To summarize, the PBL-McRBFN-RFE feature selection algorithm is given in pseudo code form in Pseudocode 3.

Pseudocode 3: Pseudo code for the PBL-McRBFN-RFE feature selection algorithm.

Input:  N data samples {(x_t, y_t)},
        r: predefined minimum limit on the number of features to be selected.
Output: S: selected feature set.
START
Initialization: Initialize the set of selected features (S) to the full
                feature set. Assign the number of selected features in the
                current iteration (s) to the number of features in set S,
                and the number of selected features in the previous
                iteration (p) to zero.
WHILE s > r AND s ≠ p DO
    Assign p to the number of features in set S.
    FOR each feature in set S DO
        Remove the first feature in set S.
        Train the PBL-McRBFN classifier with the remaining features in set S
        and calculate the training and testing efficiencies.
        IF training OR testing efficiency decreases THEN
            Insert the removed feature at the rear end of set S.
        ENDIF
    ENDFOR
    Assign s to the number of features in set S.
ENDWHILE
RETURN selected feature set S.
END
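A compact Python rendering of the same loop is sketched below; train_eval is a hypothetical stand-in that trains the PBL-McRBFN classifier on a feature subset and returns its training and testing efficiencies, and the baseline-efficiency bookkeeping is one reasonable reading of the pseudo code above.

def pbl_mcrbfn_rfe(features, train_eval, r):
    # features   : list of feature indices, initially the full set.
    # train_eval : callable returning (training_eff, testing_eff) for a
    #              feature subset; stands in for training PBL-McRBFN.
    # r          : minimum number of features to retain.
    S = list(features)
    prev_len = 0
    base_train, base_test = train_eval(S)
    while len(S) > r and len(S) != prev_len:
        prev_len = len(S)
        for _ in range(prev_len):
            candidate = S.pop(0)                 # tentatively drop the front feature
            tr, te = train_eval(S)
            if tr < base_train or te < base_test:
                S.append(candidate)              # efficiency dropped: keep it, move to rear
            else:
                base_train, base_test = tr, te   # accept the elimination
            if len(S) <= r:
                break
    return S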

First, we present the identification of imaging biomarkers for AD from the complete OASIS data set using PBL-McRBFN-RFE. In the medical literature, it is reported that the regions affected by AD differ between male and female persons [15, 16]. Hence, we also conducted an age-wise and a gender-wise analysis to identify imaging biomarkers for AD using PBL-McRBFN-RFE.

Table 7.8: VBM detected and PBL-McRBFN-RFE selected regions from the complete OASIS data set

Feature Type           # Features  Identified Regions
VBM                    19879       Parahippocampal gyrus, Amygdala, Hippocampus, Superior temporal gyrus,
                                   Insula, Sub-gyrus, Precentral gyrus, Extra-nuclear
VBM + PBL-McRBFN-RFE   906         Parahippocampal gyrus, Hippocampus, Superior temporal gyrus, Insula,
                                   Precentral gyrus, Extra-nuclear

Table 7.9: PBL-McRBFN classifier performance comparison with VBM detected and PBL-McRBFN-RFE selected features on the complete OASIS data set

# Features                  |  Training: Accuracy / Sensitivity / Specificity (%)  |  Testing: Accuracy / Sensitivity / Specificity (%)
19879 (VBM)                 |  94.90 / 93.87 / 95.91                               |  76.90 / 64 / 89.79
906 (VBM+PBL-McRBFN-RFE)    |  91.84 / 91.83 / 91.83                               |  84.96 / 74 / 95.91

7.3.1 Imaging Biomarkers for AD in Complete OASIS Data Set

In the imaging biomarker identification analysis on the complete OASIS data set, the minimal set of voxels that are most relevant to AD is found using PBL-McRBFN-RFE. The brain regions corresponding to the VBM detected voxels (19879 voxels with 198 OASIS samples) are reported in Table 7.8. PBL-McRBFN-RFE selected 906 voxels among the 19879 voxels; the brain regions corresponding to these 906 voxels are also reported in Table 7.8. MNI templates of the complete 19879 and the selected 906 voxel regions are shown in Fig. 7.9. The testing performance of PBL-McRBFN on this selected 906 feature set is 84.96%, as shown in Table 7.9. To check the discriminating capability of the selected 906 voxels, we have conducted a generalization capability study as in section 7.2.6. The generalization performance of the PBL-McRBFN classifier on unseen ADNI samples with the selected 906 features is given in Table 7.10.

Figure 7.9: Comparison of gray matter volume change - Normal persons vs. AD patients from the complete OASIS data set ((a) VBM detected 19879 voxels, (b) PBL-McRBFN-RFE selected 906 voxels)

Table 7.10: Generalization performance of PBL-McRBFN classifier on unseen ADNI samples with selected 906 features

% of ADNI Samples in Adaptation  # Neurons  |  Training: Accuracy / Sensitivity / Specificity (%)  |  Testing: Accuracy / Sensitivity / Specificity (%)
Without                          38         |  91.83 / 91.83 / 91.83                               |  52.59 / 92.34 / 12.83
With 25%                         50         |  80.75 / 63.63 / 97.87                               |  75.86 / 62.22 / 89.50

We found that the voxels selected by PBL-McRBFN-RFE are located at brain regions such as the superior temporal gyrus, the insula, the precentral gyrus and the extra-nuclear region. These regions are consistent with those reported in the medical literature as biomarkers for AD [114, 115, 116]. Hence, the gray matter atrophy in the brain regions detected by PBL-McRBFN-RFE among the VBM features may be the most relevant to the detection of AD.

7.3.2 Imaging Biomarkers for AD Based on Age in OASIS Data Set

The brain regions affected in AD patients may differ based on their ages. To verify this, an analysis is conducted on the OASIS data set based on the ages of the persons. Among the 198 persons in the OASIS data set, 40 persons are in the age group 60-69, 83 persons are in the age group 70-79 and 75 persons are aged 80 and above. We have conducted this analysis separately for the different age groups.

Study of the 60-69 age group: VBM extracted 292 features from the 40 persons in this age group. With a 50-50% training and testing split, PBL-McRBFN obtained a testing performance of 100% on the 292 feature set. After performing PBL-McRBFN-RFE on the 292 feature set, 25 features were selected, and the testing performance of PBL-McRBFN on this selected feature set is 100%.

Study of the 70-79 age group: VBM extracted 3298 features from the 83 persons in this age group. With a 50-50% training and testing split, PBL-McRBFN obtained a best testing performance of 91.67% on the 3298 feature set. After performing PBL-McRBFN-RFE on the 3298 feature set, 90 features were selected, and the testing performance of PBL-McRBFN on this selected feature set is 95.83%.

Study of the 80-and-above age group: VBM extracted 1047 features from the 75 persons in this age group. With a 50-50% training and testing split, PBL-McRBFN obtained a best testing performance of 89.47% on the 1047 feature set. After performing PBL-McRBFN-RFE on the 1047 feature set, 154 features were selected, and the testing performance of PBL-McRBFN on this selected feature set is 94.59%.

The VBM detected and PBL-McRBFN-RFE selected brain regions of the voxels for the different age groups are listed in Table 7.11, and the corresponding PBL-McRBFN performance results are shown in Table 7.12. MNI templates of the complete VBM and the PBL-McRBFN-RFE selected voxel regions are shown in Fig. 7.10. In each age group analysis, the voxels selected by PBL-McRBFN-RFE give better classification accuracy than the VBM detected regions.

Table 7.11: VBM detected and PBL-McRBFN-RFE selected regions from age-wise OASIS data sets

Age Group  Feature Type           # Features  Identified Regions
60-69      VBM                    292         Superior temporal gyrus, Postcentral gyrus
           VBM + PBL-McRBFN-RFE   25          Superior temporal gyrus
70-79      VBM                    3298        Parahippocampal gyrus, Amygdala, Extra-nuclear, Uncus, Third ventricle
           VBM + PBL-McRBFN-RFE   90          Parahippocampal gyrus, Extra-nuclear
80-Above   VBM                    1047        Hippocampus, Parahippocampal gyrus, Lateral ventricle, Extra-nuclear
           VBM + PBL-McRBFN-RFE   154         Hippocampus, Parahippocampal gyrus, Lateral ventricle

Table 7.12: PBL-McRBFN classifier performance comparison with VBM detected and PBL-McRBFN-RFE selected features on age-wise OASIS data sets

Age Group  # Features                 |  Training: Accuracy / Sensitivity / Specificity (%)  |  Testing: Accuracy / Sensitivity / Specificity (%)
60-69      292 (VBM)                  |  93.75 / 87.5 / 100                                  |  100 / 100 / 100
           25 (VBM+PBL-McRBFN-RFE)    |  93.75 / 87.5 / 100                                  |  100 / 100 / 100
70-79      3298 (VBM)                 |  100 / 100 / 100                                     |  91.67 / 83.33 / 100
           90 (VBM+PBL-McRBFN-RFE)    |  95.14 / 95.83 / 94.44                               |  95.83 / 91.67 / 100
80-Above   1047 (VBM)                 |  92.10 / 100 / 84.21                                 |  89.47 / 100 / 78.94
           154 (VBM+PBL-McRBFN-RFE)   |  91.81 / 88.88 / 94.73                               |  94.59 / 94.44 / 94.73

Figure 7.10: Comparison of gray matter volume change - Normal persons vs. AD patients from the 60-69 (a&b), 70-79 (c&d) and 80-Above (e&f) age groups in the OASIS data set ((a) VBM detected 292 voxels, (b) PBL-McRBFN-RFE selected 25 voxels, (c) VBM detected 3298 voxels, (d) PBL-McRBFN-RFE selected 90 voxels, (e) VBM detected 1047 voxels, (f) PBL-McRBFN-RFE selected 154 voxels)

From the age-wise analysis, we can see that the PBL-McRBFN classifier detects AD accurately in the 60-69 age group, and the 25 voxels selected by PBL-McRBFN-RFE are still able to classify AD accurately. The brain region detected by PBL-McRBFN-RFE as responsible for AD in the 60-69 age group is the superior temporal gyrus, which contains the primary auditory cortex and is responsible for processing sounds. Hence, we can conclude that AD patients in the 60-69 age group may have auditory related problems as indicators of AD. In the 70-79 age group, the detected brain regions responsible for AD are the parahippocampal gyrus and the extra-nuclear region, which are responsible for memory encoding and retrieval. Hence, we can conclude that AD patients in the 70-79 age group may have memory related problems. In the 80-and-above age group, the detected brain regions responsible for AD are the hippocampus, the parahippocampal gyrus and the lateral ventricle, which are associated with the consolidation of short-term memory into long-term memory, spatial navigation, and memory encoding and retrieval. Hence, we can conclude that AD patients in the 80-and-above age group may have major difficulties with memory.

7.3.3 Imaging Biomarkers for AD Based on Gender in OASIS Data Set

In the medical literature [15, 16], it is reported that gender may be an important modifying factor in AD's development and expression. To verify this, a gender-wise analysis is conducted on the OASIS data set. Among the 198 persons in the OASIS data set, 67 persons are male and 131 persons are female. We have conducted the analysis using PBL-McRBFN-RFE on the male and female persons separately.

Male persons study: Here, the AD imaging biomarker identification analysis is conducted considering the 67 male persons alone. VBM extracted 1239 voxels from the 67 male persons; the corresponding brain regions are shown in Table 7.13. PBL-McRBFN obtained a best testing performance of 79.81% on the complete 1239 features with a 50-50% training and testing data set split, as shown in Table 7.14. After performing PBL-McRBFN-RFE on the complete OASIS male data set, 31 voxels were selected, and the testing performance of PBL-McRBFN on this reduced feature set is 89.81%, as shown in Table 7.14.

Table 7.13: VBM detected and PBL-McRBFN-RFE selected regions from the male-OASIS data set

Feature Type           # Features  Identified Regions
VBM                    1239        Parahippocampal gyrus, Transverse temporal gyrus, Insula, Superior temporal
                                   gyrus, Sub-gyrus, Extra-nuclear, Inferior parietal lobule
VBM + PBL-McRBFN-RFE   31          Insula

Table 7.14: PBL-McRBFN classifier performance comparison with VBM detected and PBL-McRBFN-RFE selected features on the male-OASIS data set

# Features                 |  Training: Accuracy / Sensitivity / Specificity (%)  |  Testing: Accuracy / Sensitivity / Specificity (%)
1239 (VBM)                 |  96.15 / 100 / 92.30                                 |  79.81 / 75 / 84.61
31 (VBM+PBL-McRBFN-RFE)    |  100 / 100 / 100                                     |  89.81 / 95 / 84.61

The brain regions corresponding to the 31 voxels are listed in Table 7.13. MNI templates of the complete 1239 and the selected 31 voxel regions are shown in Fig. 7.11. All the 31 voxels are from the insular cortex region, which is responsible for emotion and consciousness. The insula region is also reported in AD research studies [117, 118], where it is associated with hypometabolism. Hence, we can conclude that male AD patients may have emotion related problems.

Figure 7.11: Comparison of gray matter volume change - Normal persons vs. AD patients from the male-OASIS data set ((a) VBM detected 1239 voxels, (b) PBL-McRBFN-RFE selected 31 voxels)

Female persons study: Here, the AD imaging biomarker identification analysis is conducted considering the 131 female persons alone. VBM extracted 15203 voxels from the 131 female persons; the corresponding brain regions of the 15203 voxels are shown in Table 7.15. PBL-McRBFN obtained a best testing performance of 79.93% on the complete 15203 features with a 50-50% training and testing data set split, as shown in Table 7.16.

After performing PBL-McRBFN-RFE on the OASIS female data set, 294 voxels were selected, and the testing performance of PBL-McRBFN on this selected feature set is 85.44%, as shown in Table 7.16. The brain regions corresponding to the 294 voxels are listed in Table 7.15. MNI templates of the complete 15203 and the selected 294 voxel regions are shown in Fig. 7.12. We found that these selected 294 voxels are located at brain regions such as the parahippocampal gyrus and the extra-nuclear region, which are responsible for memory encoding and retrieval. Hence, we can conclude that female AD patients may have memory related problems.

Table 7.15: VBM detected and PBL-McRBFN-RFE selected regions from the female-OASIS data set

Feature Type           # Features  Identified Regions
VBM                    15203       Parahippocampal gyrus, Amygdala, Superior temporal gyrus, Inferior temporal
                                   gyrus, Middle temporal gyrus, Insula, Sub-gyrus, Extra-nuclear
VBM + PBL-McRBFN-RFE   294         Parahippocampal gyrus, Extra-nuclear

Table 7.16: PBL-McRBFN classifier performance comparison with VBM detected and PBL-McRBFN-RFE selected features on the female-OASIS data set

# Features                  |  Training: Accuracy / Sensitivity / Specificity (%)  |  Testing: Accuracy / Sensitivity / Specificity (%)
15203 (VBM)                 |  90.99 / 93.10 / 88.88                               |  79.93 / 79.31 / 80.55
294 (VBM+PBL-McRBFN-RFE)    |  90.33 / 86.20 / 94.44                               |  85.44 / 93.10 / 77.77

Figure 7.12: Comparison of gray matter volume change - Normal persons vs. AD patients from the female-OASIS data set ((a) VBM detected 15203 voxels, (b) PBL-McRBFN-RFE selected 294 voxels)

The above detailed study results indicate the superior performance of the proposed PBL-McRBFN classifier. Also, the PBL-McRBFN-RFE approach identifies the imaging biomarkers for the onset of AD.

7.4 Discussion
The AD detection performance of PBL-McRBFN is better than the existing results in the literature for the OASIS and ADNI data sets. The generalization capability of the PBL-McRBFN classifier has been demonstrated by testing unseen samples from the ADNI data set using a PBL-McRBFN classifier trained with samples from the OASIS data set. Using the proposed PBL-McRBFN-RFE, we have identified the imaging biomarkers (critical regions in the brain) responsible for AD using the OASIS data set. In our study, gray matter atrophy in AD patients was identified in the superior temporal gyrus, the insula, the precentral gyrus and the extra-nuclear regions, which have also been highlighted in the medical literature [114, 115, 116]. Further, we have carried out a detailed analysis based on age and gender. Based on this analysis, the indicators that emerge for the onset of AD are:

• In the age group 60-69: degradation in sound processing capability (primary auditory cortex)

• In the age group 70-79: memory related problems (parahippocampal gyrus and extra-nuclear)

• In the age group 80-89: problems in short-term/long-term memory, encoding/retrieval and spatial navigation (parahippocampal gyrus and lateral ventricle)


• In male persons: emotion related problems, usually associated with hypometabolism (insula)

• In female persons: mainly memory related problems (parahippocampal gyrus and extra-nuclear)

Thus, the proposed approach provides an imaging biomarker identification mechanism for AD, and the approach can be applied to other similar problems.

7.5 Summary
In this chapter, the AD diagnosis problem was solved by employing the PBL-McRBFN classifier. Morphometric features were extracted from MRI scans using VBM. For the simulation studies, we used the well-known OASIS and ADNI data sets. The performance of the PBL-McRBFN classifier was evaluated on the complete morphometric feature set obtained from the VBM analysis and also on reduced feature sets from ICA. Since the data sets contain very mild AD patients and fewer samples, AD detection using the complete VBM features provides better performance than the ICA reduced features. The performance of the proposed method was compared against state-of-the-art methods reported in the literature. Next, the performance evaluation on the ADNI data set with a PBL-McRBFN classifier trained on the OASIS data set showed that the proposed PBL-McRBFN can also achieve significant results on an unseen data set. Finally, the imaging biomarkers responsible for AD were detected with the PBL-McRBFN-RFE approach using the OASIS data set; they were identified for different age groups and for both genders.

Parkinson's disease is the second most widely reported neurodegenerative disease, next only to AD. Hence, in the next chapter, the diagnosis of Parkinson's disease using the PBL-McRBFN classifier is presented.

Chapter 8
Parkinson's Disease Diagnosis using
PBL-McRBFN Classifier

In this chapter, we use the PBL-McRBFN classifier for the diagnosis of Parkinson's disease based on microarray gene expression, MRI scans, vocal and gait features.

Parkinson's Disease (PD) is characterized by progressive degeneration of dopaminergic neurons in the pars compacta of the substantia nigra [119]. The most important symptoms of PD include muscle rigidity, tremors, and changes in speech and gait [119, 120]. PD is more common in elderly people over the age of 50 and has affected millions of people worldwide; according to the global declaration for PD, 6.3 million people across all races and cultures were affected by this disease in 2013. Although significant research advances have been made, including the recent identification of possible genetic and environmental risk factors for PD, further research is required to elucidate the underlying causes of PD and to discover improved treatments. At present there is no cure for PD, and the diagnosis of PD is based on medical history and a neurological examination conducted by interviewing and observing the patient in person using disease rating scales. The Unified Parkinson's Disease Rating Scale (UPDRS), the Hoehn and Yahr scale, the Schwab and England Activities of Daily Living (ADL) scale, PDQ-39, the PD Non-motor Symptoms (NMS) questionnaire and the NMS survey are the most commonly used PD rating scales. Reliable diagnosis of PD using these scales is difficult, especially in its early stages [121]. As the symptoms of PD overlap with those of other neurological diseases, only 75% of clinical diagnoses of PD are confirmed to be idiopathic PD at autopsy. Thus, automatic approaches based on machine learning techniques are needed to increase the diagnosis accuracy and to assist physicians in making better decisions.


The literature review on machine learning approaches for diagnosis of PD is presented

in the next section.

8.1 Literature Review on Parkinson's Disease

In the literature, machine learning approaches for PD classification have been undertaken by detecting dysphonia and tremor symptoms. Nearly one third of PD patients exhibit a group of vocal impairment symptoms known as dysphonia [122]. Machine learning approaches for PD classification by detecting dysphonia using acoustic measurements have been evaluated on a data set created in [122], consisting of sustained vowel phonations from 31 people, of whom 23 have PD. In [123], a kernel support vector machine is used for PD classification; an exhaustive search process is implemented to select the best kernel width and penalty value. In [124], a Support Vector Machine (SVM) classifier is used with feature selection based on the maximum-relevance-minimum-redundancy (mRMR) criterion. For mRMR, all the available samples are used in the mutual information computations. Moreover, both of these approaches use all the available data samples to optimize the SVM parameters, which is unavoidable when working with such a small data set. In [125], four independent classification approaches (neural networks, DMneural, logistic regression and decision trees) are compared for the diagnosis of PD. Among the four approaches, the neural network classifier (a multi-layer feed-forward neural network trained with the Levenberg-Marquardt algorithm) yields the best performance. The drawback of this neural network approach is the random initialization of weights and the heuristic determination of the number of hidden neurons, which affect the classification performance significantly. It also requires retraining when the training samples change with time. In [126], a parallel neural networks approach is used for the prediction of PD. The training time and complexity of the parallel network approach increase as the number of parallel networks increases. In [127], an adaptive neuro-fuzzy classifier with linguistic hedges is used for feature selection and classification. Linguistic hedges feature selection requires an optimization search over a huge set of theoretically possible encoding combinations for each feature and is hence computationally intensive. In [128], a fuzzy c-means clustering-based feature weighting with k-NN classification approach is presented. The choice of the


number of clusters or the selection of the k neighbors significantly affects the performance. In addition, increasing the number of training samples may further complicate the choice of the number of clusters.

PD patients exhibit large gait variability compared to normal persons [129]. Gait analysis is routinely used in clinical settings to assess these gait disorders. Gait analysis is the study of locomotion and typically consists of measurements of spatial-temporal parameters of the gait cycle, motion of joints and segments, forces/moments, and electromyography patterns of muscle activation. Machine learning approaches using different gait features have been reported in the literature for PD classification [130, 131, 121]. In [130], image data is obtained from plantar pressure measurements of the right foot during heel-to-toe motion from 17 controls and 21 PD patients, and an SVM is applied to distinguish the gait patterns; other important basic, kinetic and kinematic features are not used in this gait analysis. In [131], a data set from 20 controls and 12 PD patients consisting of basic spatiotemporal, kinematic and kinetic gait features is used, and the ability of ANN and SVM classifiers is discussed. These two studies on gait features use their own proprietary data, with a small number of subjects. In [121], data collected from sensors located under the feet of 73 controls and 93 PD patients is used; a wavelet transform is employed to extract the relevant features, and neural networks with weighted fuzzy membership functions are used to approximate the functional relationship between the extracted features and the class label.

Recent studies on gene expression analysis found that there is a profound change in gene expression in individuals affected by PD [132]. These studies discovered that diagnosis of early stage PD using vocal and gait features is impossible, because tremor and slow movements develop in PD patients only after approximately 70% of the vulnerable dopaminergic neurons in the substantia nigra have already died [132]. However, machine learning approaches for PD classification based on gene analysis have been limited. Therefore, there is a need for devising a new machine learning approach for PD classification based on gene analysis.

Over the past two decades, neuroimaging techniques such as Positron Emission Tomography (PET) [133], Single-Photon Emission Computed Tomography (SPECT) [134], Magnetic Resonance Imaging (MRI) [135] and Transcranial Brain Sonography (TCS)

[136] have increasingly been employed to predict PD, to elucidate the neuropathological mechanisms and compensatory responses underlying symptoms and treatment-associated complications, and to monitor disease progression [137]. MRI is far more widely available than PET and SPECT and is most commonly used in clinical practice to differentiate PD patients from normal persons [135]. However, machine learning approaches for PD classification based on MRI scans have been limited. Therefore, there is a need for devising a new machine learning approach for PD classification based on MRI scans.

8.2 Materials
We have considered four possible ways to diagnose PD using the PBL-McRBFN classifier:

(a) Prediction of PD from microarray gene expression features.

(b) Prediction of PD from MRI scans.

(c) Detection of PD from vocal features.

(d) Detection of PD from gait features.

8.2.1 Microarray Gene Expression Data Set

In this study, the normalized microarray gene expression data is obtained from the ParkDB database [138] under the accession number E-GEOD-6613. ParkDB is the first queryable database dedicated to gene expression in PD, and contains a complete set of re-analyzed, curated and annotated microarray data sets. The considered data set is obtained by transcriptional profiling of RNA extracted from the whole blood of 50 early-stage PD patients and 22 controls [132]. The extracted 22283 oligonucleotide probe sets (short sections of genes) on the microarrays are used to analyze the difference in gene expression between PD patients and controls. The Robust Multi-array Analysis (RMA) method in the Limma package [139] is used to normalize and summarize the probe intensity measurements. Thus, the complete gene expression data set contains 72 subjects with expression information for 22283 genes.


8.2.2 MRI Data Set

MRI data is obtained from the Parkinson's Progression Markers Initiative (PPMI) data set (www.ppmi-info.org/data). Standard 1.5T baseline 3D volumetric T1-weighted brain MR images were selected from the PPMI data set. We have considered the 239 persons (112 normal persons and 127 PD patients) available in the data set as of April 2012. Among these 239 persons, the MR images of 31 normal persons and 34 PD patients were excluded due to failure of the segmentation method.

Whole brain T1-weighted, 3D MPRAGE MR images were acquired using at least a 1.5 Tesla scanner, with a repetition time between 5-11 ms and an echo delay time between 2-6 ms. The acquired volumes have slice thicknesses ranging from 1-1.5 mm and voxel dimensions of 1.0 mm × 1.0 mm × 1.20 mm. Detailed information on the MRI protocols and preprocessing steps is presented in [140]. The demographics of the data used in our study are shown in Table 8.1.

Table 8.1: Demographic information of PPMI MRI data used in our study

Group             Normal Persons   PD Patients
No. of persons    112              127
Sex (M/F)         64/48            86/41
Age (mean±std)    58.35±11.31      61.83±9.67

8.2.3 Vocal Data Set

The vocal data set, obtained from voice recordings originally made at the University of Oxford by Max Little [122], has been used for PD classification by detecting dysphonia. The recordings consist of 195 entries collected from 31 people, of whom 23 suffer from PD. Of the 195 samples, 147 are from PD patients and 48 from controls. An average of six phonations was recorded from each subject, ranging from 1 to 36 sec in length. The 22 attributes used in this prediction task can be broadly classified into jitter (variation in fundamental frequency), shimmer (variation in amplitude), harmonic/noise ratio (amplitude of noise relative to tonal components in the speech), fundamental frequency, descriptive statistics and correlation factors (non-linear measures) [122].


8.2.4 Gait Data Set

The gait data set provided by PhysioBank [141] has been used to discriminate Parkinsonian gait from normal gait. This data set consists of 166 samples, containing gait measures from 93 PD patients and 73 controls. The data, collected by 8 sensors underneath each foot, include the vertical ground reaction force records of subjects as they walked at their usual, self-selected pace for approximately 2 minutes on level ground. The 10 attributes used in this prediction task are the left swing interval (sec), right swing interval (sec), left swing interval (% of stride), right swing interval (% of stride), double support interval (sec), double support interval (% of stride), left stride variability, right stride variability, cadence and speed. The 4 swing interval measures and the 2 double support interval measures are ranked as the top attributes with maximum relevance and least redundancy for gait analysis [142].

8.3 Early Diagnosis of Parkinson's Disease Based on Gene Expression Features

In this section, we present the performance evaluation of PBL-McRBFN on PD classification using the microarray gene expression data set. The PBL-McRBFN classifier performance is evaluated using gene expression features in two scenarios, as shown in Fig. 8.1. In the first scenario, its performance is evaluated on ICA reduced features from the complete set of 22283 genes, as shown in Fig. 8.1(a). Next, its performance is evaluated on ICA reduced features from the 1594/412 genes selected with significance levels p < 0.05/0.01, as shown in Fig. 8.1(b). We have conducted 10 random trials of experiments for every ICA reduced feature set. In each trial, 75% of the total samples are randomly selected for training and 25% for testing. The classification performance of PBL-McRBFN is compared with the standard SVM classifier.
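To make the protocol concrete, a minimal sketch in Python is given below (an illustrative stand-in, not the thesis's actual implementation: the synthetic X and y and the train_pbl_mcrbfn/predict_pbl_mcrbfn routines are hypothetical placeholders for the PBL-McRBFN code):

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    X = rng.standard_normal((72, 10))   # synthetic stand-in for the feature matrix
    y = rng.integers(0, 2, 72)          # synthetic PD / non-PD labels

    test_acc = []
    for trial in range(10):             # 10 random trials
        # 75% of the samples for training, 25% for testing, reshuffled each trial
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                                  random_state=trial)
        model = train_pbl_mcrbfn(X_tr, y_tr)        # hypothetical trainer
        y_hat = predict_pbl_mcrbfn(model, X_te)     # hypothetical predictor
        test_acc.append(accuracy_score(y_te, y_hat))

    print("testing accuracy: mean %.4f, std %.4f" % (np.mean(test_acc), np.std(test_acc)))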

8.3.1 p-value Based Gene Selection

The complete gene expression data set consists of a large number of redundant genes, which affect the classifier performance on PD prediction. Hence, we select the most informative genes based on p-value selection from the ParkDB database. When less stringent constraints are incorporated, with a gene fold change greater than 1.5 (on a binary logarithmic scale) and a p-value less than 0.05, 1594 genes are extracted.

Figure 8.1: PBL-McRBFN classifier on ICA reduced features from: (a) the complete gene expression data set (pre-processing: 22283 genes → ICA → 10/25/50 features → PBL-McRBFN classifier → PD/Non-PD); (b) the selected genes (pre-processing: p < 0.05/0.01, 1594/412 genes → ICA → 10/25/50 features → PBL-McRBFN classifier → PD/Non-PD).

When more stringent constraints are incorporated, with the same fold change (1.5) and an increased significance level (p-value less than 0.01), 412 genes are extracted. These two sets of selected gene expression features for the same 72 subjects are considered as the selected gene expression data sets.

However, as the feature space of the complete and selected gene expression data sets is high dimensional compared to the number of samples, it is difficult to predict PD accurately. Hence, the complete and selected gene expression features are further reduced statistically by ICA [107].

8.3.2 ICA Based Feature Reduction

The basic goal of independent component analysis is to find a transformation in which the components of the transformed data are statistically as independent from each other as possible. ICA can be applied to blind source separation, exploratory data analysis and feature extraction. Feature extraction is a promising application of ICA: the feature vectors extracted by the ICA analysis are as independent from each other as possible, i.e., the extracted features do not contain mutual information about other features.

In our study, we employ the FastICA (fixed-point) algorithm [107]; the FastICA package for MATLAB [108] is used to reduce the complete and selected gene expression features to combinations of 10, 25 and 50 features.
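An equivalent reduction with scikit-learn's FastICA is sketched below (a reasonable stand-in for the MATLAB package actually used; X_p05 refers to the selected gene matrix from the earlier sketch):

    from sklearn.decomposition import FastICA

    def ica_reduce(X, n_features, seed=0):
        # project the samples-by-genes matrix onto n_features statistically
        # independent components using the fixed-point FastICA algorithm
        ica = FastICA(n_components=n_features, random_state=seed, max_iter=1000)
        return ica.fit_transform(X)

    X10, X25, X50 = (ica_reduce(X_p05, k) for k in (10, 25, 50))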


The mean classification performance measures (average/overall testing efficiencies and F-score) obtained from 10 random trials of experiments with PBL-McRBFN and SVM using the ICA reduced feature data sets from the complete and selected gene expression data sets are presented in Tables 8.2-8.4. From these tables, it is evident that the generalization performance of PBL-McRBFN is better than that of the SVM classifier on both the complete and selected gene expression data sets.
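For reference, the performance measures used in these tables follow their standard definitions (restated here in our notation for convenience; the formal definitions are given in the earlier chapters of the thesis). With $q_{cc}$ the number of correctly classified samples in class $c$, $N_c$ the number of samples in class $c$, $N$ the total number of samples, and $TP$, $TN$, $FP$, $FN$ the usual confusion-matrix counts:

\[
\eta_o = \frac{1}{N}\sum_{c=1}^{C} q_{cc} \times 100\%, \qquad
\eta_a = \frac{1}{C}\sum_{c=1}^{C} \frac{q_{cc}}{N_c} \times 100\%,
\]
\[
F\text{-score} = \frac{2\,TP}{2\,TP + FP + FN}, \qquad
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}.
\]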

8.3.3 Performance of PBL-McRBFN on ICA Reduced Features from Complete Genes

Table 8.2: Performance comparison on the complete gene expression data set from an average of 10 trials (testing accuracy, mean (std))

Features Type   # Features   Algorithm     Overall (ηo %)   Average (ηa %)   F-score
ICA Reduced     10           SVM           72.78 (7.15)     66.87 (7.84)     0.8053 (0.0656)
                             PBL-McRBFN    85.55 (4.68)     84.94 (3.66)     0.8896 (0.0424)
ICA Reduced     25           SVM           72.78 (7.61)     67.99 (9.39)     0.8037 (0.0620)
                             PBL-McRBFN    83.89 (4.86)     83.75 (5.55)     0.8763 (0.0438)
ICA Reduced     50           SVM           71.67 (8.05)     60.94 (6.50)     0.8166 (0.0498)
                             PBL-McRBFN    84.44 (5.73)     83.58 (3.41)     0.8802 (0.0536)
Complete        22283        SVM           86.67 (7.03)     72.03 (9.99)     0.9201 (0.0439)
                             PBL-McRBFN    88.89 (5.24)     88.53 (6.21)     0.9226 (0.0393)

On the complete gene expression data set, the PBL-McRBFN classifier achieves better generalization performance with 10 ICA reduced features than with 25 or 50 features, as shown in Table 8.2. From the table, we can see that on 10 ICA reduced features the ηa of PBL-McRBFN is 8% higher than that of SVM, with a better F-score value. Similarly, on 25 and 50 features, the ηa of PBL-McRBFN is higher than that of the SVM classifier. The ηa of PBL-McRBFN is reduced by 1% on 25 and 50 features compared to 10 features; this is due to the redundancy of the ICA features. On 25 and 50 features, the SVM classifier performance is reduced more significantly than that of the PBL-McRBFN classifier. We can also see that the PBL-McRBFN performance on the original 22283 features is higher than on the ICA reduced features, and 2% higher than the SVM performance.


8.3.4 Performance of PBL-McRBFN on ICA Reduced Features from Statistically Selected Genes

On the selected gene expression data set with p-value < 0.05, the PBL-McRBFN classifier achieves better generalization performance with 10 ICA reduced features than with 25 or 50 features, as shown in Table 8.3. From the table, we can see that on 10 ICA reduced features the ηa of PBL-McRBFN is 7% higher than that of SVM, with a better F-score value. Both classifiers perform better on the selected gene expression data set with p-value < 0.05 than on the complete gene expression data set. The ηa of PBL-McRBFN on the selected gene expression data set with p-value < 0.05 is 13% higher than on the complete gene expression data set; this is due to the presence of more redundant gene information relative to PD in the complete gene expression data set.

Table 8.3: Performance comparison on the selected gene expression data set with p-value < 0.05 from an average of 10 trials (testing accuracy, mean (std))

Features Type   # Features   Algorithm     Overall (ηo)     Average (ηa)     F-score
ICA Reduced     10           SVM           86.67 (5.37)     84.17 (7.53)     0.9047 (0.0388)
                             PBL-McRBFN    96.67 (4.68)     97.17 (4.32)     0.9769 (0.0318)
ICA Reduced     25           SVM           83.39 (5.11)     80.42 (7.09)     0.8862 (0.0494)
                             PBL-McRBFN    88.33 (7.14)     89.25 (7.26)     0.9097 (0.0596)
ICA Reduced     50           SVM           78.33 (8.47)     71.44 (10.26)    0.8499 (0.0619)
                             PBL-McRBFN    84.99 (6.95)     85.51 (5.00)     0.8820 (0.0665)
Complete        1594         SVM           95.55 (3.51)     96.27 (3.17)     0.9687 (0.0258)
                             PBL-McRBFN    100 (0)          100 (0)          1 (0)

On the selected gene expression data set with p-value < 0.01, the PBL-McRBFN classifier achieves better generalization performance with 10 ICA reduced features than with 25 or 50 features, as shown in Table 8.4. From the table, we can see that on the 10 ICA reduced features data set the ηa of PBL-McRBFN is 30% higher than that of SVM, with a better F-score value. A minor reduction in the performance of both classifiers on the selected gene expression data set with p-value < 0.01 can be observed when compared to the performance on the selected gene expression data set with p-value < 0.05, while the performance remains better than on the complete gene expression data set. The ηa of PBL-McRBFN on the


selected gene expression data set with p-value < 0.01 is 1% less than on the selected gene expression data set with p-value < 0.05; this is due to the absence of a few informative genes relative to PD in the selected gene expression data set with p-value < 0.01.

Table 8.4: Performance comparison on the selected gene expression data set with p-value < 0.01 from an average of 10 trials (testing accuracy, mean (std))

Features Type   # Features   Algorithm     Overall (ηo)     Average (ηa)     F-score
ICA Reduced     10           SVM           72.78 (7.15)     66.87 (7.84)     0.8974 (0.0660)
                             PBL-McRBFN    95.55 (3.51)     96.02 (3.32)     0.9676 (0.0260)
ICA Reduced     25           SVM           90.00 (6.83)     86.43 (10.27)    0.9312 (0.0454)
                             PBL-McRBFN    94.44 (5.23)     94.64 (6.24)     0.9582 (0.0391)
ICA Reduced     50           SVM           73.33 (5.74)     68.03 (10.17)    0.8120 (0.0381)
                             PBL-McRBFN    83.89 (4.09)     85.28 (6.33)     0.8762 (0.0375)
Complete        412          SVM           96.67 (3.88)     97.19 (3.25)     0.9753 (0.0292)
                             PBL-McRBFN    100 (0)          100 (0)          1 (0)

From Tables 8.2 to 8.4, we can see that the PBL-McRBFN classifier achieves the best performance on the 10 ICA features data set obtained from the selected gene expression data set with p-value < 0.05. The ηa of the PBL-McRBFN classifier on this data set is 97.17%, which is 1% higher than the ηa of the PBL-McRBFN classifier on the selected gene expression data set with p-value < 0.05 without ICA feature reduction (96.87%). The performance of the PBL-McRBFN classifier on all three gene expression data sets without ICA feature reduction is the same. Thus, we can observe that the changes in the performance of the PBL-McRBFN classifier on the three gene expression data sets with ICA reduced features are due to the poor performance of ICA. We can also see that the PBL-McRBFN performance on the original 1594 and 412 features is higher than on the ICA reduced features and better than the SVM performance. PBL-McRBFN accurately classifies PD with the original 1594 and 412 statistically selected gene expression features.


8.4 Early Diagnosis of Parkinson's Disease Based on MRI Features

In this section, we present the performance evaluation of PBL-McRBFN on PD classification using the MRI data set. To the best of our knowledge, this is the first PD prediction study in the literature based on MRI scans; hence, no prior results are available in the literature for comparison. We also propose the PBL-McRBFN-RFE approach to identify the imaging biomarkers (critical brain regions) responsible for PD. In the PBL-McRBFN-RFE approach, PBL-McRBFN is used for PD classification and the RFE method is used for feature selection. RFE applies the training algorithm (PBL-McRBFN) recursively to eliminate irrelevant features one at a time. RFE seeks to improve generalization performance by eliminating the least important feature, i.e., the feature whose elimination has the least effect on classification performance.

8.4.1 VBM Based Feature Extraction

The VBM analysis is used in this study to identify the regional differences in gray matter between PD patients and normal persons, and to extract morphometric features from the MRI scans. The VBM analysis used in this study is as described in Section 7.2.2. The flow diagram of the feature extraction process is shown in Fig. 8.2. For better understanding, we show the maximum intensity projections of the significant voxels in the sagittal, coronal and axial views in Fig. 8.3.

To locate the above regions with respect to their spatial locations in the brain, these regions were overlaid on the sliced sections of the commonly used MNI brain template, and the results are shown in Fig. 8.4. From Fig. 8.3 and Fig. 8.4, it is inferred that there are significant gray matter volume differences in the superior temporal gyrus, middle temporal gyrus, parahippocampal gyrus, sub-gyral and insula regions of the brain, which have also been highlighted in the medical literature [144].

The voxel locations of the VBM detected significant regions are used as a mask in order to extract the features from all the segmented gray matter images. The feature extraction process computes a vector with all the gray matter segmentation values for the voxel locations included in each VBM identified region.
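A minimal NumPy sketch of this masking step is given below (illustrative only: the array shapes and the synthetic data are assumptions, not the actual VBM pipeline outputs):

    import numpy as np

    rng = np.random.default_rng(0)
    gm_volumes = rng.random((20, 40, 48, 40))     # per-subject 3-D gray matter maps (synthetic)
    vbm_mask = rng.random((40, 48, 40)) > 0.999   # VBM-detected significant voxels (synthetic)

    mask_idx = np.flatnonzero(vbm_mask)           # voxel locations used as the mask
    # one row per subject, one gray matter probability value per significant voxel
    features = gm_volumes.reshape(len(gm_volumes), -1)[:, mask_idx]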


Figure 8.2: Schematic diagram of the stages in feature extraction based on the VBM
analysis.

A total of 2981 features (gray matter tissue probability values) are extracted from the VBM identified regions and are then used as input to the PBL-McRBFN classifier. However, as the dimension of the VBM detected feature set is high compared to the number of samples, it is difficult to predict PD accurately. Hence, the extracted VBM features are further reduced statistically by ICA [107]. In our study, we employ the FastICA (fixed-point) algorithm [107]; the FastICA package for MATLAB [108] is used to reduce the complete set of 2981 VBM detected features to combinations of 10 and 50 features.

The PBL-McRBFN classifier performance is evaluated using the VBM detected features and the ICA reduced features. We have conducted 10 random trials of experiments for the VBM feature set and for every ICA reduced feature set. In each trial, 75% and 25% of the samples are randomly chosen for training and testing, respectively. The classification performance of PBL-McRBFN is compared with the SVM classifier [37].

8.4.2 Performance of PBL-McRBFN on VBM Features

The complete MRI data set consists of 239 samples with 2981 morphometric features. The mean and standard deviation of the training/testing accuracy, sensitivity and specificity obtained during the 10 random trials for the PBL-McRBFN and SVM classifiers on the


Figure 8.3: Maximum intensity projections from PPMI MRI data set - Normal persons
vs. PD patients (a) sagittal view (b) coronal view (c) axial view

2981 VBM feature set are presented in Table 8.5. In each trial, 75% of the total samples are randomly selected for training and 25% for testing. From Table 8.5, we can see that the testing accuracy of PBL-McRBFN is 3% higher than that of SVM, with better sensitivity and specificity values. Thus, the PBL-McRBFN classifier performs an efficient classification of the VBM morphometric features from MRI scans for the prediction of PD.

Table 8.5: Performance comparison on the 2981 VBM features data set from an average of 10 trials (all values mean (std))

              Training                                                  Testing
Algorithm     Accuracy(%)   Sensitivity(%)  Specificity(%)              Accuracy(%)   Sensitivity(%)  Specificity(%)
SVM           96.40 (2.18)  98.52 (1.55)    94.00 (3.61)                79.06 (3.63)  83.04 (6.30)    74.5 (9.55)
PBL-McRBFN    93.04 (2.45)  95.88 (2.92)    89.83 (6.20)                82.32 (2.50)  83.47 (6.41)    81.00 (4.59)


Figure 8.4: Gray matter volume change from PPMI MRI data set - Normal persons vs.
PD patients (a) sagittal view (b) coronal view (c) axial view

8.4.3 Performance of PBL-McRBFN on Reduced Features

Feature reduction using ICA is performed on the morphometric features obtained from the VBM analysis. The 2981 morphometric features extracted from VBM were reduced to combinations of 10 and 50 features using FastICA [108]. The mean and standard deviation of the training/testing accuracy, sensitivity and specificity obtained during the 10 random trials for the PBL-McRBFN and SVM classifiers on the different ICA reduced feature sets are presented in Table 8.6. In each trial, 75% of the total samples are randomly selected for training and 25% for testing. From Table 8.6, we can see that the PBL-McRBFN classifier achieves better generalization performance with 10 ICA reduced features than with 50. On 10 ICA reduced features, the testing accuracy of PBL-McRBFN is 4% higher than that of SVM, with better sensitivity and specificity values. Similarly, on 50 ICA reduced features, the testing accuracy of PBL-McRBFN is 5% higher than that of the SVM classifier, with better sensitivity and specificity values.


The testing accuracy of PBL-McRBFN is reduced by 3% on 50 features compared to 10 features; this is due to the redundancy of the ICA features.

Table 8.6: Performance comparison on the ICA reduced features data sets from an average of 10 trials (all values mean (std))

# ICA                        Training                                          Testing
Features   Algorithm         Accuracy(%)   Sensitivity(%)  Specificity(%)      Accuracy(%)   Sensitivity(%)  Specificity(%)
10         SVM               94.76 (2.52)  97.94 (1.02)    91.16 (4.97)        67.44 (3.46)  75.65 (8.50)    58.00 (10.85)
           PBL-McRBFN        93.28 (2.18)  94.55 (3.18)    91.83 (4.04)        71.39 (4.25)  63.91 (12.64)   80.00 (9.71)
50         SVM               98.35 (2.88)  98.52 (2.77)    98.16 (3.08)        63.25 (3.06)  67.82 (8.24)    58.00 (8.88)
           PBL-McRBFN        92.50 (9.69)  92.79 (9.59)    92.16 (1.27)        68.83 (2.94)  68.26 (7.68)    69.50 (10.12)

From Tables 8.5 and 8.6, it is observed that the PBL-McRBFN classifier gives better results with the 2981 morphometric features extracted from the VBM analysis than with the features reduced by VBM and ICA analysis. The PBL-McRBFN classification accuracy on the VBM feature set is 82.32%, which is 11% higher than the classification accuracy on the 10 ICA reduced features; this is because the considered binary classification problem consists of a small number of MRI volumes with high dimensional VBM features, drawn from PD patients and normal persons of both genders and different ages.

8.4.4 Identification of Imaging Biomarkers for PD

In this section, we present the identification of the imaging biomarkers responsible for PD using the PPMI MRI data set. In the previous section, the PBL-McRBFN classifier performance was evaluated on features obtained from the VBM analysis and on further reduced ICA features. VBM involves voxel-wise statistical analysis of MRI volumes and infers regions in which brain volume differs between PD patients and normal persons. Not all the regions inferred by VBM may be useful for predicting PD, and the further reduced ICA features do not provide any information about the critical brain regions relevant to PD. In this study, we have conducted an analysis to identify the most significant brain regions (imaging biomarkers) for PD. Identification of imaging biomarkers for PD can be considered a general feature selection problem. Feature selection techniques attempt to remove as


many irrelevant and redundant features as possible and to find a feature subset such that, with the reduced low dimensional data, a machine learning classifier can achieve better performance. Filter and wrapper methods are two kinds of well-known feature selection techniques for high dimensional data [113]. In the filter method, features are selected on the basis of the feature separability of the training samples, which is independent of the learning algorithm. The separability only takes into account the correlations between the features, so the selected features may not be optimal. Wrapper methods search for critical features based on the learning algorithm, and often give better results than filter methods. RFE is a computationally less intensive wrapper based feature selection method. In this study, we used RFE feature selection utilizing the PBL-McRBFN classifier; RFE utilizing PBL-McRBFN is referred to as PBL-McRBFN-RFE. PBL-McRBFN-RFE conducts feature selection in a sequential elimination manner, which starts with all the features and discards one feature at a time.
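A schematic sketch of this backward elimination loop is shown below (a simplification: each candidate removal is scored by validation accuracy, and train_and_score is a hypothetical wrapper that trains PBL-McRBFN on the given feature subset and returns its test accuracy):

    def rfe_select(X_tr, y_tr, X_val, y_val, n_keep, train_and_score):
        # start with all features and discard one per iteration
        active = list(range(X_tr.shape[1]))
        while len(active) > n_keep:
            scores = []
            for f in active:
                subset = [g for g in active if g != f]   # candidate set without f
                scores.append(train_and_score(X_tr[:, subset], y_tr,
                                              X_val[:, subset], y_val))
            # the least important feature is the one whose removal leaves the
            # highest (i.e., least degraded) classification performance
            active.pop(max(range(len(active)), key=scores.__getitem__))
        return active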

Table 8.7: VBM detected and PBL-McRBFN-RFE selected regions responsible for PD

Feature Type            # Features   Identified Regions
VBM                     2981         Superior temporal gyrus, Middle temporal gyrus,
                                     Parahippocampal gyrus, Sub-gyral, Insula
VBM + PBL-McRBFN-RFE    19           Superior temporal gyrus

The VBM analysis detected a total of 2981 features. The brain regions corresponding to the 2981 VBM detected features are reported in Table 8.7. The mean testing performance of PBL-McRBFN on the 2981 VBM features over 10 random trials is 82.32%, as shown in Table 8.8. To identify the most significant brain regions responsible for PD, a minimal set of features is found using PBL-McRBFN-RFE. After performing feature selection using PBL-McRBFN-RFE, 19 of the 2981 features were selected. The brain regions corresponding to the 19 PBL-McRBFN-RFE selected features are reported in Table 8.7. MNI templates of the complete 2981 and selected 19 feature regions are shown in Fig. 8.5. The mean testing performance of PBL-McRBFN on the selected 19 features over 10 random trials


is 87.21%, as shown in Table 8.8. From Table 8.8, we can see that the 19 features selected by the PBL-McRBFN-RFE approach give a better prediction rate than the 2981 features.

Table 8.8: PBL-McRBFN classifier performance on VBM detected and PBL-McRBFN-RFE selected features from an average of 10 trials (all values mean (std))

                            Training                                        Testing
# Features                  Accuracy(%)   Sensitivity(%)  Specificity(%)    Accuracy(%)   Sensitivity(%)  Specificity(%)
2981 (VBM)                  93.04 (2.45)  95.88 (2.92)    89.83 (6.20)      82.32 (2.50)  83.47 (6.41)    81.00 (4.59)
19 (VBM + PBL-McRBFN-RFE)   92.50 (9.69)  92.79 (9.59)    92.17 (12.77)     87.21 (3.67)  87.39 (4.32)    87.00 (10.32)

Figure 8.5: Comparison of gray matter volume change - Normal persons vs. PD patients in the superior temporal gyrus region. (a) VBM detected 2981 voxels; (b) PBL-McRBFN-RFE selected 19 voxels.

We found that the 19 voxels selected by PBL-McRBFN-RFE are located in the superior temporal gyrus region of the brain. The superior temporal gyrus is one of three (sometimes two) gyri in the temporal lobe of the human brain, and is involved in auditory processing, including language and social cognition. The superior temporal gyrus is consistently reported in medical research studies as a biomarker of PD [137, 143, 144]. Hence, the brain region detected by PBL-McRBFN-RFE among the VBM features from the MRI volumes may play a more significant role than others in PD.


Table 8.9: Performance comparison on the vocal data set from an average of 10 trials (testing accuracy, mean (std))

Algorithm     Overall (ηo)    Average (ηa)    F-score
SVM           96.94 (2.40)    96.67 (2.16)    0.9221 (0.0286)
PBL-McRBFN    98.97 (1.07)    99.35 (0.68)    0.9934 (0.0069)

8.5 PD Diagnosis Based on Vocal Features

Research on PD has shown that approximately 90% of patients exhibit some form of vocal impairment [123]. PD patients display a constellation of vocal symptoms, including impairment in the normal production of vocal sounds, known as dysphonia. The voice of people with dysphonia sounds hoarse, strained or effortful. Telemonitoring of PD using measurements of dysphonia has a vital role in its early diagnosis, as the symptoms of PD occur gradually and mostly affect elderly people, for whom physical visits to the clinic are costly and inconvenient.

The mean and standard deviation of the testing efficiencies and F-score obtained during the 10 random trials with the PBL-McRBFN and SVM classifiers are presented in Table 8.9. In each trial, 75% of the total samples are randomly selected for training and 25% for testing. From Table 8.9, we can see that on the vocal data set the ηa of PBL-McRBFN is 3% higher than that of the SVM classifier, with a better F-score value. The results reported in the literature on the same vocal data set are given in Table 8.10, from which it is evident that the generalization performance of the PBL-McRBFN classifier is also better than the results reported in the literature. On the 50-50% train-test combination, the best PBL-McRBFN generalization performance is 98.63%, which is approximately 1% higher than the k-NN approach with fuzzy c-means clustering (97.93%) [128]. Thus, the PBL-McRBFN classifier performs an efficient classification of the vocal features for the prediction of PD.

8.6 PD Diagnosis Based on Gait Features

PD greatly influences the patient's gait, reducing speed, stride length and the total range of movement during walking. Gait analysis is a systematic study of human motion based on measurements of the spatial-temporal parameters of the gait cycle, the motion of joints and

Table 8.10: PBL-McRBFN classifier performance comparison with studies in the literature on the vocal data set

Study                       Algorithm                                              Testing Accuracy
M. A. Little et al. [123]   SVM, 50 trials with bootstrap                          91.4 (4.4)
Shahbaba et al. [145]       Dirichlet process multinomial logit,                   87.7 (3.3)
                            5-fold cross validation
Psorakis et al. [146]       Non-sparse Expectation-Maximization,                   89.5 (6.6)
                            10-fold cross validation
Sakar et al. [124]          SVM, 50 trials with bootstrap                          92.8 (1.2)
Pei-Fang Guo et al. [147]   Genetic Programming and Expectation                    93.1 (2.9)
                            Maximization, 10-fold cross validation
Resul Das et al. [125]      Neural network, 65-35% train-test data                 92.9
Mehmet et al. [127]         Adaptive neuro-fuzzy classifier with linguistic        94.72
                            hedges, 10 random trials, 50-50% train-test data
F. Strom et al. [126]       9 parallel neural networks, 30 random trials,          91.2 (1.6)
                            60-40% train-test data
PBL-McRBFN approach         PBL-McRBFN, 10 random trials,                          96.83 (0.97)
                            50-50% train-test data
PBL-McRBFN approach         PBL-McRBFN, 10 random trials,                          97.67 (1.31)
                            60-40% train-test data
PBL-McRBFN approach         PBL-McRBFN, 10 random trials,                          99.35 (0.68)
                            75-25% train-test data


Table 8.11: Performance comparison on the gait data set from an average of 10 trials (testing accuracy, mean (std))

Algorithm     Overall (ηo)    Average (ηa)    F-score
SVM           77.56 (3.78)    77.37 (3.68)    0.7803 (0.0508)
PBL-McRBFN    83.90 (2.86)    84.36 (2.42)    0.8519 (0.0340)

Table 8.12: PBL-McRBFN classifier performance comparison with studies in the literature using gait patterns

Study                            Algorithm                                    Testing Accuracy
Sang-Hong Lee et al. [121]       Neural network with weighted fuzzy           77.33
(93 PD patients, 73 controls)    membership functions, single trial,
                                 50-50% train-test data
PBL-McRBFN approach              PBL-McRBFN, single trial,                    82.52
(93 PD patients, 73 controls)    50-50% train-test data
PBL-McRBFN approach              PBL-McRBFN, 10 random trials,                84.36 (2.42)
(93 PD patients, 73 controls)    75-25% train-test data

segments, forces/moments, and electromyography patterns of muscle activation. Gait analysis is an important tool in the assessment of PD.

The mean and standard deviation of the testing efficiencies and F-score obtained during the 10 random trials for the PBL-McRBFN and SVM classifiers are presented in Table 8.11. In each trial, 75% of the total samples are randomly selected for training and 25% for testing. From Table 8.11, we can see that the ηa of PBL-McRBFN is 7% higher than that of SVM, with a better F-score value. The result reported in the literature on the same gait features is given in Table 8.12. From Table 8.12, we can see that on the same gait data set with the 50-50% train-test combination, the PBL-McRBFN generalization performance is approximately 5% higher than the neural network approach with weighted fuzzy membership functions on wavelet based feature extraction [121]. Thus, the PBL-McRBFN classifier performs an efficient classification of the gait features for the prediction of PD.


8.7 Summary
In this chapter, the PD diagnosis problem was solved by employing the PBL-McRBFN classifier. The early diagnosis of PD based on microarray gene expression and MRI features, and the detection of PD based on vocal and gait features, were presented. The quantitative comparison with the SVM classifier and with existing results in the literature clearly indicates the superior performance of the proposed PBL-McRBFN classifier for the prediction of individuals with or without PD. The imaging biomarkers responsible for PD were detected with the PBL-McRBFN-RFE approach using the PPMI MRI data set. Identification of the genes responsible for PD using feature selection techniques would help doctors track the development of the disease; this will be undertaken as a future study.

In the next chapter, we summarize the work done in this thesis and conclude by giving plans for future directions.

Chapter 9
Conclusions and Future Works

9.1 Conclusions
This thesis focused on the development and application of meta-cognitive sequential learning algorithms in a radial basis function network for classification problems with fewer samples, high dimensional feature sets, and high sample imbalance. For the first time in the literature, human meta-cognition principles were integrated into a radial basis function network. Human-like self-regulated learning helps the radial basis function network achieve better generalization through the proper selection of samples and strategies in learning. In this thesis, two such meta-cognitive algorithms, EKF-McRBFN and PBL-McRBFN, were developed to handle classification problems. One of the developed meta-cognitive learning algorithms, PBL-McRBFN, was applied to the early diagnosis of the neurodegenerative diseases Alzheimer's disease and Parkinson's disease. To summarize, the major contributions of this thesis are:

(a) Development of an Extended Kalman Filter based Meta-cognitive Radial Basis Function Network (EKF-McRBFN) classifier.

(b) Development of a Projection Based Learning algorithm for a Meta-cognitive Radial Basis Function Network (PBL-McRBFN) classifier.

(c) Application of the PBL-McRBFN classifier to the early diagnosis of Alzheimer's disease based on MRI scans, and development of the PBL-McRBFN-RFE feature selection approach for imaging biomarker detection of Alzheimer's disease based on MRI scans.


(d) Application of PBL-McRBFN to the early diagnosis of Parkinson's disease based on micro-array gene expression, MRI scans, gait and vocal features, and application of the PBL-McRBFN-RFE approach for imaging biomarker detection of Parkinson's disease based on MRI scans.

The major conclusions from the above studies are:

(1) EKF-McRBFN Classifier:

The McRBFN classifier has been developed based on the principles of human meta-cognition, and its sequential learning algorithm has been derived using the Extended Kalman Filter (EKF). The McRBFN using EKF is referred to as 'EKF-McRBFN'. The McRBFN has a cognitive and a meta-cognitive component, where the latter monitors and controls the learning ability of the former. A radial basis function network with Gaussian activation functions is the cognitive component, and a self-regulatory learning mechanism is its meta-cognitive component. The cognitive component begins with zero hidden neurons and adds a neuron or updates an existing neuron based on the learning strategy chosen by the meta-cognitive component for every sample in the training data set. The meta-cognitive component thus controls the cognitive component by deciding what-to-learn, when-to-learn and how-to-learn. It realizes what-to-learn by deleting samples that contain knowledge similar to that already learnt by the network. It decides how-to-learn by using the sample either to add a neuron or to update an existing neuron. When a sample is used to add a new neuron, the parameters of the new neuron are initialized based on the sample overlapping conditions; when a sample is used to update an existing neuron, the parameters are updated using the EKF. The meta-cognitive component decides when-to-learn by reserving the sample for future use. The performance study of EKF-McRBFN on benchmark classification problems shows its superior performance in comparison to existing learning algorithms.
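A schematic of this sample-by-sample control loop is sketched below (a simplification for illustration only: the thresholds and the net methods are hypothetical placeholders, and the thesis's actual criteria combine the predicted class label, the error measure and the nearest-neuron distance with self-adaptive thresholds):

    def metacognitive_step(net, x, y, delete_thr, add_thr):
        # one decision of the meta-cognitive component for a training sample
        err = net.prediction_error(x, y)        # monitoring signal
        if err < delete_thr:
            return "delete"                     # what-to-learn: knowledge already present
        if err > add_thr:
            net.add_neuron(x, y)                # how-to-learn: grow a new hidden neuron
            return "grow"
        if net.predicted_class(x) != y:
            net.update_nearest_neuron(x, y)     # how-to-learn: EKF parameter update
            return "update"
        return "reserve"                        # when-to-learn: keep the sample for later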

(2) PBL-McRBFN Classifier:

Here, a PBL algorithm has been developed for the McRBFN. The McRBFN using


the PBL algorithm is referred to as 'PBL-McRBFN'. Similar to EKF-McRBFN, the PBL-McRBFN also has a cognitive and a meta-cognitive component: an RBF network with Gaussian activation functions is the cognitive component, and a self-regulatory learning mechanism is its meta-cognitive component. However, when a new neuron is added to the network, the projection based learning algorithm of PBL-McRBFN initializes the input parameters based on a distance criterion and computes the optimum output weights by minimizing a hinge-loss error function. The problem of minimizing the error function is solved as a linear programming problem, and the output weights are obtained by solving a set of simultaneous linear equations. While adding a new neuron, the existing neurons are used as pseudo-samples to represent the knowledge of the past samples. Thus, the PBL-McRBFN explicitly uses the knowledge in the past samples and a computationally efficient algorithm to map the input-output relationship defined by the training data set. The performance study of PBL-McRBFN on benchmark classification problems shows its superior performance in comparison to EKF-McRBFN and existing learning algorithms.
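The final step of the projection based learning reduces to a linear system in the output weights; a minimal NumPy sketch is given below (a least-squares stand-in for the thesis's hinge-loss formulation; the small ridge term is a numerical safeguard added here, not part of the thesis's derivation):

    import numpy as np

    def output_weights(H, Y, ridge=1e-8):
        # solve the simultaneous linear equations (H^T H) W = H^T Y for the
        # output weights, with H the hidden responses (samples x neurons)
        # and Y the coded class targets (samples x classes)
        A = H.T @ H + ridge * np.eye(H.shape[1])
        return np.linalg.solve(A, H.T @ Y)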

(3) Early Diagnosis of Alzheimer's Disease using PBL-McRBFN based on MRI Scans:

The developed PBL-McRBFN classifier is used for the automatic, early diagnosis of AD from MRI scans using the OASIS [101] and ADNI [102] data sets. The study results show that the PBL-McRBFN classifier accurately distinguishes AD patients from normal subjects in both the OASIS and ADNI data sets. It is observed from the results that AD diagnosis using the complete voxel-based morphometric features provides better performance than the ICA reduced features, as the data sets contain very mild AD patients and fewer samples. Thus, from the studies conducted on the OASIS and ADNI data sets, we can infer that human meta-cognitive principles in a machine learning algorithm improve the classification performance significantly. Next, the generalization ability of PBL-McRBFN was studied by training PBL-McRBFN on the OASIS data set and testing its performance on the ADNI data set; the results of this study show that PBL-McRBFN is capable of generalizing to an unseen data set. Finally, the PBL-McRBFN-RFE feature selection approach is


proposed to identify the imaging biomarkers for AD. The imaging biomarkers responsible for AD are detected with the proposed PBL-McRBFN-RFE approach using the OASIS data set; they lie in the parahippocampal gyrus, the hippocampus, the superior temporal gyrus, the insula, the precentral gyrus and the extra-nuclear regions. These regions are also indicated as biomarkers for AD in the medical literature [114, 115, 116].

Next, the PBL-McRBFN-RFE approach has also been used to identify the imaging biomarkers for AD from the gender-wise and age-wise analysis of the OASIS data set. The results indicate the following:

In AD patients in the 60-69 age group, gray matter atrophy is observed in the superior temporal gyrus region, which is responsible for processing sounds. In AD patients in the 70-79 age group, gray matter atrophy is observed in the parahippocampal gyrus and the extra-nuclear regions, which are responsible for memory encoding and retrieval. In AD patients in the 80-89 age group, gray matter atrophy is observed in the hippocampus, the parahippocampal gyrus and the lateral ventricle regions, which are responsible for the transfer of short-term memory to long-term memory, spatial navigation, and memory encoding and retrieval. In male AD patients, gray matter atrophy is observed in the insula region, which is responsible for emotion and consciousness; the insula region is also reported in AD research studies [117, 118], and it is associated with hypometabolism. In female AD patients, gray matter atrophy is observed in the parahippocampal gyrus and the extra-nuclear regions, which are responsible for memory encoding and retrieval.

(4) Early Diagnosis of Parkinson's Disease using PBL-McRBFN:

The PBL-McRBFN classifier is used for the early diagnosis of PD using micro-array gene expression, MRI scans, gait and vocal features, in the following ways:

• Early detection of PD using PBL-McRBFN based on micro-array gene expression feature data obtained from the ParkDB database [138].


• Early detection of PD using PBL-McRBFN based on MRI scans obtained from the PPMI data set [140].

• Detection of PD using PBL-McRBFN based on vocal feature data [122].

• Detection of PD using PBL-McRBFN based on gait feature data [141].

The performance results from the above studies show that PBL-McRBFN performs better than the existing results reported in the literature on these data sets. Finally, the imaging biomarkers responsible for PD are detected with the proposed PBL-McRBFN-RFE feature selection approach using the PPMI MRI data set. The PBL-McRBFN-RFE results show that the superior temporal gyrus brain region plays a more significant role than others in detecting PD. The superior temporal gyrus is involved in auditory processing, including language and social cognition, and is consistently reported in medical research studies as a biomarker of PD [137, 143, 144].

9.2 Future Works

9.2.1 Plan of Work for McRBFN
Monitoring Signals for McRBFN:
In this thesis, the meta-cognitive learning considers only the error, the nearest neuron distance and the predicted class labels as the monitoring signals. However, in the literature on human meta-cognition, the feel-of-knowing has been used as a monitoring signal. The term feel-of-knowing (FOK) refers to the state in which an individual may fail to recall an item from memory but still feels that it would be recognized on a later test. The original FOK definition was proposed by Hart [148] as a composite of two criteria:

• A feeling that the sought-after information is known.

• A feeling that the sought-after information can be correctly identified on a later criterion test.

Hence, the addition of a feel-of-knowing (FOK) measure, using a combination of the class-wise significance measure and the reliability of the meta-cognitive component, as a monitoring


signal for PBL-McRBFN will be undertaken as a future study.

Selection of the Radial Basis Function in McRBFN:

In this thesis, the Gaussian radial basis function is considered in McRBFN. However, the Cauchy radial basis function is preferred in applications like image retrieval [149] and computerized tomography [150], while the inverse multi-quadratic radial basis function is preferred in real-time signal processing applications [151]. A q-Gaussian function parameterizes the standard Gaussian distribution by replacing exponential expressions with q-exponential expressions [152]. The modification of the q-parameter allows the representation of different basis functions (Gaussian, Cauchy, inverse multi-quadratic, etc.) and helps the q-Gaussian function to better match the shape of the kernel to the distribution of the distances [152]. For example, as q → 1 the q-Gaussian converges to a Gaussian RBF, while q → 2 and q → 3 yield a Cauchy RBF and an inverse multi-quadratic RBF, respectively. Thus, the q-Gaussian helps to realize different radial basis functions for different values of the parameter q. Therefore, it is desirable to employ an activation function like the q-Gaussian that allows radial basis function kernels whose shape can be relaxed or contracted.
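A small sketch of such a q-Gaussian basis function is given below (the limiting cases follow the standard q-exponential definition quoted above):

    import numpy as np

    def q_gaussian(r2, sigma2, q):
        # q-exponential of -r^2/sigma^2, where r2 is the squared distance to
        # the centre: q -> 1 recovers the Gaussian exp(-r^2/sigma^2), q = 2
        # gives the Cauchy RBF 1/(1 + r^2/sigma^2), and q = 3 the inverse
        # multi-quadratic 1/sqrt(1 + 2 r^2/sigma^2)
        if abs(q - 1.0) < 1e-9:
            return np.exp(-r2 / sigma2)
        base = np.maximum(1.0 - (1.0 - q) * (r2 / sigma2), 0.0)
        return base ** (1.0 / (1.0 - q))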

Feature Selection:
One important issue in the diagnosis of neurodegenerative diseases is the curse of dimensionality, where the data sets have few samples with very high dimensional feature sets. Hence, there is a need to select an appropriate feature set for better performance, i.e., the best feature subset that is non-redundant and most relevant to the class distributions. We plan to work along this direction for better performance results.

9.2.2 Applications
Alzheimer's Disease Diagnosis:
In the present work, the PBL-McRBFN classifier has been used to distinguish AD patients from normal persons based on VBM features extracted from MRI scans, and the PBL-McRBFN-RFE approach has been used for AD imaging biomarker detection based on


MRI scans. A similar approach can be used to distinguish Mild Cognitive Impairment (MCI) patients from normal persons. MCI is an early stage of AD and increases the risk of developing AD. If one were able to successfully treat MCI such that the progression of these individuals to AD could be delayed by one year, there would be a significant saving.

Parkinson's Disease Diagnosis:

In the present work, the PBL-McRBFN classifier has been used to predict PD patients from normal persons based on micro-array gene expression and MRI scans, and for PD detection based on gait and vocal features. The PBL-McRBFN-RFE approach has also been used to detect imaging biomarkers for PD based on MRI scans. A similar PBL-McRBFN-RFE feature selection approach can be used to detect biomarkers based on gene expression features.

Publications List

Journals
1. G. Sateesh Babu, S. Suresh and B. S. Mahanand, A novel PBL-McRBFN-RFE approach for identification of critical brain regions responsible for Parkinson's disease, Expert Systems with Applications, vol. 41(2), pp. 478-488, 2014.

2. G. Sateesh Babu and S. Suresh, Sequential Projection-Based Metacognitive Learning in a Radial Basis Function Network for Classification Problems, IEEE Transactions on Neural Networks and Learning Systems, vol. 24(2), pp. 194-206, 2013.

3. G. Sateesh Babu and S. Suresh, Parkinson's Disease Prediction Using Gene Expression - A Projection Based Learning Meta-cognitive Neural Classifier Approach, Expert Systems with Applications, vol. 40(5), pp. 1519-1529, 2013.

4. G. Sateesh Babu and S. Suresh, Meta-cognitive RBF Network and Its Projection Based Learning algorithm for classification problems, Applied Soft Computing, vol. 13(1), pp. 654-666, 2013.

5. G. Sateesh Babu and S. Suresh, Meta-cognitive Neural Network for classification problems in a sequential learning framework, Neurocomputing, vol. 81, pp. 86-96, 2012.

Conference Proceedings
1. G. Sateesh Babu, S. Suresh and B. S. Mahanand, Meta-cognitive q-Gaussian RBF Network for Binary Classification: Application to Mild Cognitive Impairment (MCI), Intl. Joint Conf. Neural Networks (IJCNN), Dallas (Texas, USA), pp. 1-8, 2013.


2. G. Sateesh Babu, S. Suresh and B. S. Mahanand, A Sequential Projection Based

Learning Meta-cognitive RBF Network Classier: Application to Alzheimer's dis-

ease detection, International Conference on Machine Learning (ICML): Workshop


on Machine Learning for Clinical Data Analysis , Edinburgh (Scotland) 2012.

3. G. Sateesh Babu, R. Savitha and S. Suresh, A Projection Based Learning in Meta-cognitive Radial Basis Function Network for classification problems, Intl. Joint Conf. Neural Networks (IJCNN), Brisbane (Australia), pp: 2907-2914, 2012.

4. G. Sateesh Babu, S. Suresh and B. S. Mahanand, Alzheimer's disease detection

using a Projection Based Learning Meta-cognitive RBF Network, Intl. Joint Conf.
Neural Networks (IJCNN) , Brisbane (Australia), pp: 408-415, 2012.

5. G. Sateesh Babu, S. Suresh, K. Uma Sangumathi and H. J. Kim, A Projection Based Learning Meta-cognitive RBF Network Classifier for effective diagnosis of Parkinson's disease, International Symposium on Neural Networks (ISNN), Shenyang (China), Part II, LNCS 7368, pp: 611-620, Springer-Verlag Berlin Heidelberg, 2012.
