Kernel Learning Algorithms for Face Recognition
By Jun-Bao Li, Shu-Chuan Chu and Jeng-Shyang Pan
Jun-Bao Li, Shu-Chuan Chu and Jeng-Shyang Pan, Kernel Learning Algorithms for Face Recognition, 2014. DOI 10.1007/978-1-4614-0161-2_1. © Springer Science+Business Media New York 2014
1. Introduction
Jun-Bao Li¹, Shu-Chuan Chu² and Jeng-Shyang Pan³
(1)
Department of Automatic Test and Control, Harbin Institute of Technology, Yi-Kuang Street 2, Harbin, People’s Republic of China
(2)
School of Information and Engineering, Flinders University of South Australia, Sturt Road, Bedford Park, SA, 5042, Australia
(3)
HIT Shenzhen Graduate School, Harbin Institute of Technology, Room 424, Building D, Shenzhen Graduate School of Harbin Institute of Technology, Xili University Town, Nanshan, Shenzhen City, Guangdong Province, People’s Republic of China
Jun-Bao Li (Corresponding author)
Email: junboalihit@gmail.com
Shu-Chuan Chu
Email: scchu@bit.kuas.edu.tw
Jeng-Shyang Pan
Email: jspan@cc.kuas.edu.tw
Abstract
Face recognition (FR) has become a popular research topic in the computer vision, image processing, and pattern recognition communities. The recognition performance of a practical FR system is largely influenced by variations in illumination conditions, viewing directions or poses, facial expression, aging, and disguises. FR has wide applications in commercial, law-enforcement, and military settings, such as airport security and access control, building surveillance and monitoring, human–computer intelligent interaction and perceptual interfaces, and smart environments in homes, offices, and cars.
1.1 Basic Concept
Face recognition (FR) has become a popular research topic in the computer vision, image processing, and pattern recognition communities. The recognition performance of a practical FR system is largely influenced by variations in illumination conditions, viewing directions or poses, facial expression, aging, and disguises. FR has wide applications in commercial, law-enforcement, and military settings, such as airport security and access control, building surveillance and monitoring, human–computer intelligent interaction and perceptual interfaces, and smart environments in homes, offices, and cars. An effective FR method must consider both which features are used to represent a face image and how a new face image is classified based on this representation. Current feature extraction methods can be divided into signal-processing-based and statistical-learning-based methods. Among signal-processing-based methods, Gabor-wavelet-based feature extraction is widely used to represent face images, because the kernels of Gabor wavelets resemble the two-dimensional receptive field profiles of mammalian cortical simple cells; they capture spatial localization, orientation selectivity, and spatial-frequency selectivity, which helps cope with variations in illumination and facial expression. Among statistical-learning-based methods, dimension reduction methods are widely used. In this book, we pay more attention to learning-based FR methods. Existing methods include supervised learning, unsupervised learning, and semi-supervised learning.
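To make the signal-processing side concrete, the following is a minimal sketch (not taken from the book) of the standard two-dimensional Gabor kernel commonly used to build such filter banks; the kernel size and the parameter values (wavelength `lam`, orientation `theta`, envelope width `sigma`, aspect ratio `gamma`, phase `psi`) are illustrative choices only.

```python
# A minimal sketch of a real 2-D Gabor kernel: a Gaussian envelope times a cosine wave.
import numpy as np

def gabor_kernel(size=31, lam=8.0, theta=0.0, sigma=4.0, gamma=0.5, psi=0.0):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_t = x * np.cos(theta) + y * np.sin(theta)       # rotate coordinates by theta
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t ** 2 + (gamma * y_t) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_t / lam + psi)
    return envelope * carrier

# A small bank over four orientations; each kernel is convolved with the face image
# (e.g., with scipy.signal.convolve2d) to produce one Gabor feature map.
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
```

Convolving a face image with such a bank at several scales and orientations yields the Gabor feature maps referred to above.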
1.1.1 Supervised Learning
Supervised learning is a popular learning paradigm that maps input data into a feature space, and it includes classification and regression. During learning of the mapping function, samples with class labels are used for training. Supervised learning has been discussed extensively in the pattern recognition and machine learning literature.
Supervised learning methods fall into two kinds: generative and discriminative. Generative models assume that the data are independently and identically distributed according to some probability density function; examples include maximum a posteriori (MAP) estimation [1], empirical Bayes, and variational Bayes [2]. Instead of modeling the data-generation process, discriminative methods model the decision boundary between the classes directly. The decision boundary is represented by a parametric function of the data obtained by minimizing the classification error on the training set [1]. Empirical risk minimization (ERM) is a widely adopted principle in discriminative supervised learning, underlying, for example, neural networks [3] and logistic regression [2]. Because the decision boundary is modeled directly rather than probabilistically, ERM can overfit; Vapnik's structural risk minimization (SRM) principle [4] addresses this by adding a regularity criterion to the empirical risk, so that the classifier has good generalization ability. Most of the above classifiers implicitly or explicitly require the data to be represented as vectors in a suitable vector space [5].
Ensemble classifiers combine multiple component classifiers to obtain a meta-classifier; examples include bagging [6] and boosting [7, 8]. Bagging, short for bootstrap aggregation, trains multiple instances of a classifier on different subsamples of the training data. Boosting samples the training data more intelligently, giving higher preference to samples that are difficult for the existing ensemble to classify.
1.1.2 Unsupervised Learning
Unsupervised learning (clustering) is a significantly more difficult problem than classification. Many clustering algorithms have been proposed [9], and they can be broadly divided into groups. As an example of a sum-of-squared-error (SSE) minimization algorithm, K-means is the most popular and widely used clustering algorithm; it is initialized with a set of random cluster centers. Related algorithms include ISODATA [10] and learning vector quantization [11].
Parametric mixture models are widely used in machine learning [12]; for example, the Gaussian mixture model (GMM) [13, 14] has been used extensively for clustering. Since it assumes that each component is homogeneous, unimodal, and generated by a Gaussian density, its performance is limited. To address this, latent Dirichlet allocation [15], a multinomial mixture model, was proposed. Several mixture models have been extended to nonparametric form by taking the number of components to infinity [16–18]. Spectral clustering algorithms [19–21] are popular nonparametric methods that minimize a graph-partitioning objective function. Kernel K-means is a related kernel-based algorithm that generalizes Euclidean-distance-based K-means to arbitrary metrics in a feature space: using the kernel trick, the data are implicitly mapped into a higher-dimensional space by a possibly nonlinear map, and K-means clustering is performed in that space.
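The following is a minimal sketch (not from the text) of kernel K-means as just described: every distance to a cluster mean is expressed through entries of the Gram matrix K, so the nonlinear map itself is never evaluated. The function name and arguments are illustrative.

```python
# Kernel K-means: assign points to clusters using only the Gram matrix K.
import numpy as np

def kernel_kmeans(K, n_clusters, n_iter=100, seed=0):
    """K: (n, n) kernel (Gram) matrix. Returns hard cluster labels for the n points."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=n)              # random initial assignment
    for _ in range(n_iter):
        dist = np.zeros((n, n_clusters))
        for c in range(n_clusters):
            idx = np.flatnonzero(labels == c)
            if idx.size == 0:                               # re-seed an empty cluster
                idx = rng.integers(n, size=1)
            # ||Phi(x_i) - mean_c||^2 = K_ii - 2 * mean_j K_ij + mean_{j,l} K_jl
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, idx].mean(axis=1)
                          + K[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

For example, K can be built from the Gaussian kernel introduced in Sect. 1.2.1, after which `kernel_kmeans(K, n_clusters=3)` returns a hard clustering of the points.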
1.1.3 Semi-Supervised Algorithms
Semi-supervised learning methods attempt to improve the performance of supervised or unsupervised learning in the presence of side information. This side information can take the form of unlabeled samples in the supervised case or pair-wise constraints in the unsupervised case. Earlier work by Robbins and Monro [22] on sequential learning can also be viewed as related to semi-supervised learning, as can Vapnik's overall risk minimization (ORM) principle [23]. Usually, the underlying geometry of the data is captured by representing the data as a graph, with samples as vertices and pair-wise similarities between samples as edge weights. Several graph-based algorithms have been proposed, such as label propagation [24], Markov random walks [25], graph-cut algorithms [26], the spectral graph transducer [27], and low-density separation [28]. A second common assumption is the cluster assumption [29]; successful semi-supervised algorithms built on it include the transductive SVM (TSVM) [30] and the semi-supervised SVM [31]. These algorithms assume a model for the decision boundary and yield a classifier directly.
1.2 Kernel Learning
The kernel method was first introduced at the Computational Learning Theory conference in 1992, where support vector machine (SVM) theory was presented and triggered a major innovation in machine learning. The key idea of the SVM is that the inner product of nonlinearly mapped vectors is defined by a kernel function. Based on the kernel function, the data are implicitly mapped into a high-dimensional feature space by the kernel mapping, and many kernel learning methods have since been obtained by kernelizing linear learning methods.
Kernel learning theory has attracted wide attention from researchers, and kernel learning methods have been successfully applied to pattern recognition, regression estimation, and other areas [32–38].
1.2.1 Kernel Definition
A kernel function defines the nonlinear inner product $$ \left\langle {\Upphi (x),\Upphi (y)} \right\rangle $$ of two vectors x and y:
$$ k(x,y) = \left\langle {\Upphi (x),\Upphi (y)} \right\rangle $$(1.1)
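As a standard illustration (not part of the original text), consider vectors in the plane and the degree-2 polynomial kernel $$ k(x,y) = (x \cdot y)^{2} $$ (cf. Eq. (1.2) below); an explicit feature map realizing Eq. (1.1) is
$$ \Upphi (x) = \left( {x_{1}^{2} ,\;\sqrt 2 x_{1} x_{2} ,\;x_{2}^{2} } \right),\quad \left\langle {\Upphi (x),\Upphi (y)} \right\rangle = \left( {x_{1} y_{1} + x_{2} y_{2} } \right)^{2} = (x \cdot y)^{2} $$
so the kernel value can be evaluated without ever forming $$ \Upphi (x) $$ explicitly.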
The definition is grounded in the notions of the Gram matrix and positive definite matrices.
1.
Kernel construction
The kernel function is a crucial factor influencing kernel learning: different kernel functions lead to different generalization behaviour of kernel learning machines such as the SVM, so researchers construct different kernels for different applications. In current kernel learning algorithms, the polynomial kernel, Gaussian kernel, sigmoid kernel, and RBF kernel are the most popular, as follows.
Polynomial kernel
$$ k(x,y) = (x \cdot y)^{d} \quad (d \in N) $$(1.2)
Gaussian kernel
$$ k(x,y) = \exp \left( { - \frac{{\left\| {x - y} \right\|^{2} }}{{2\sigma^{2} }}} \right)\quad (\sigma > 0) $$(1.3)
Sigmoid kernel
$$ k\left( {x,z} \right) = \tanh (\alpha \langle x,z\rangle + \beta ),\,\left( {\alpha > 0,\beta < 0} \right) $$(1.4)
RBF kernel
$$ k\left( {x,z} \right) = \exp \left( { - \rho d(x,z)} \right),\,\left( {\rho > 0} \right) $$(1.5)
where d(x,z) is a distance measure.
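As a minimal sketch (not from the book), the four kernels of Eqs. (1.2)–(1.5) can be written directly as functions of two sample vectors; the parameter names follow the equations above and the default values are illustrative only.

```python
# Direct implementations of the kernels in Eqs. (1.2)-(1.5) for 1-D sample vectors.
import numpy as np

def polynomial_kernel(x, y, d=2):
    return np.dot(x, y) ** d                                    # Eq. (1.2)

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))   # Eq. (1.3)

def sigmoid_kernel(x, y, alpha=1.0, beta=-1.0):
    return np.tanh(alpha * np.dot(x, y) + beta)                 # Eq. (1.4)

def rbf_kernel(x, y, rho=1.0, dist=lambda a, b: np.linalg.norm(a - b)):
    return np.exp(-rho * dist(x, y))                            # Eq. (1.5), d(x, z) is any distance
```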
1.2.2 Kernel Character
From the definition of a kernel, it may seem that one must first construct the nonlinear mapping space and then compute the inner product of the mapped input vectors in that space. In practice, however, the kernel function itself represents the nonlinear mapping space, so the inner product can be computed without ever working in that space explicitly, and the mapping function need not be known in advance. For a given kernel function, a kernel-induced feature space can be constructed in which the inner product is defined by the kernel. Quantities involving vectors in this nonlinear mapping space can be computed as follows.
(a)
Vector norm:
$$ \left\| {\Upphi (x)} \right\|_{2} = \sqrt {\left\| {\Upphi (x)} \right\|^{2} } = \sqrt {\langle \Upphi (x),\Upphi (x)\rangle } = \sqrt {k(x,x)} $$(1.6)
(b)
Squared norm of a linear combination of vectors:
$$ \begin{aligned} \left\| {\sum\limits_{i = 1}^{l} {\alpha_{i} \Upphi (x_{i} )} } \right\|^{2} & = \left\langle \sum\limits_{i = 1}^{l} {\alpha_{i} \Upphi (x_{i} )} ,\sum\limits_{j = 1}^{l} {\alpha_{j} \Upphi (x_{j} )} \right\rangle \\ & = \sum\limits_{i = 1}^{l} {\alpha_{i} \sum\limits_{j = 1}^{l} {\alpha_{j} \left\langle \Upphi (x_{i} ),\Upphi (x_{j} )\right\rangle } } = \sum\limits_{i = 1}^{l} {\alpha_{i} \sum\limits_{j = 1}^{l} {\alpha_{j} k(x_{i} ,x_{j} )} } \\ \end{aligned} $$(1.7)
(c)
Squared norm of the difference of two vectors:
$$ \begin{aligned} \left\| {\Upphi (x) - \Upphi (z)} \right\|^{2} & = \left\langle \Upphi (x) - \Upphi (z),\Upphi (x) - \Upphi (z)\right\rangle \\ & = \left\langle \Upphi (x),\Upphi (x)\right\rangle - 2\left\langle \Upphi (x),\Upphi (z)\right\rangle + \left\langle \Upphi (z),\Upphi (z)\right\rangle \\ & = k(x,x) - 2k(x,z) + k(z,z) \\ \end{aligned} $$(1.8)
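The identities (1.6)–(1.8) can be turned into a short computational sketch (not from the book): all feature-space norms and distances below are obtained from kernel evaluations alone, and the helper names are illustrative.

```python
# Feature-space quantities computed purely from kernel evaluations.
import numpy as np

def feature_norm(k, x):
    """||Phi(x)|| = sqrt(k(x, x)), Eq. (1.6)."""
    return np.sqrt(k(x, x))

def combination_norm_sq(k, alphas, xs):
    """||sum_i alpha_i Phi(x_i)||^2 = sum_{i,j} alpha_i alpha_j k(x_i, x_j), Eq. (1.7)."""
    K = np.array([[k(xi, xj) for xj in xs] for xi in xs])
    a = np.asarray(alphas)
    return float(a @ K @ a)

def feature_distance_sq(k, x, z):
    """||Phi(x) - Phi(z)||^2 = k(x,x) - 2 k(x,z) + k(z,z), Eq. (1.8)."""
    return k(x, x) - 2.0 * k(x, z) + k(z, z)

# Example with the Gaussian kernel of Eq. (1.3), sigma = 1:
gauss = lambda x, z: np.exp(-np.sum((x - z) ** 2) / 2.0)
print(feature_distance_sq(gauss, np.array([0.0, 1.0]), np.array([1.0, 1.0])))
```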
According to T. M. Cover's pattern classification theory, a complex pattern classification problem is more likely to be linearly separable when cast into a high-dimensional feature space than when kept in a low-dimensional space.
Suppose that $$ k $$ is a real positive definite kernel and $$ R^{\mathbb{R}} : = \left\{ {f:{\mathbb{R}} \to R} \right\} $$ denotes the space of functions from $$ {\mathbb{R}} $$ to $$ R $$ . Then the mapping from $$ {\mathbb{R}} $$ to $$ R^{\mathbb{R}} $$ defined by
$$ \begin{aligned} \Upphi : &\, {\mathbb{R}} \to R^{\mathbb{R}} \\ & x \to k( \cdot ,x) \\ \end{aligned} $$(1.9)
is called the reproducing kernel map.
Mercer's proposition: given a function $$ k $$ on $$ {\mathbb{R}}^{2} $$ , suppose that the integral operator
$$ T_{k} :H_{2} ({\mathbb{R}}) \to H_{2} ({\mathbb{R}}) $$$$ \left( {T_{k} f} \right)(x): = \int\limits_{\mathbb{R}} {k(x,x^{\prime } )f(x^{\prime } ){\text{d}}\mu (x^{\prime } )} $$(1.10)
is positive, that is, for all $$ f \in H_{2} ({\mathbb{R}}) $$ ,
$$ \int\limits_{{{\mathbb{R}}^{2} }} {k(x,x^{\prime } )f(x)f(x^{\prime } ){\text{d}}\mu (x)\mu (x^{\prime} )}\,\geq\,0 $$(1.11)
Then
$$ k(x,x^{\prime } ) = \sum\limits_{j = 1}^{{n_{f} }} {\lambda_{j} \psi_{j} (x)\psi_{j} (x^{\prime } )} $$(1.12)
Suppose that $$ k $$ is a kernel satisfying Mercer's proposition, and define the mapping
$$ \Upphi :{\mathbb{R}} \to h_{2}^{{n_{f} }} $$$$ x \to \left( {\sqrt {\lambda_{j} } \,\psi_{j} (x)} \right)_{{j = 1,2, \ldots ,n_{f} }} $$(1.13)
which is called the Mercer kernel map. Here, $$ \psi_{j} \in H_{2} ({\mathbb{R}}) $$ are the eigenfunctions of $$ T_{k} $$ with corresponding eigenvalues $$ \lambda_{j} $$ , and $$ n_{f} $$ and $$ \psi_{j} $$ are defined as in Mercer's proposition.
Suppose that $$ k $$ is a Mercer kernel and $$ \Upphi $$ is the Mercer kernel map. Then, for all $$ (x,x^{\prime } ) \in {\mathbb{R}}^{2} $$ ,
$$ \left\langle {\Upphi (x),\Upphi (x^{\prime } )} \right\rangle = k(x,x^{\prime } ) $$(1.14)
The Mercer kernel map is thus used to construct a Hilbert space in which the inner product is defined by the kernel function; both Mercer kernels and positive definite kernels can be characterized through inner products in such a Hilbert space.
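A simple numerical check (not from the book) of the finite-sample analogue of the Mercer map in Eq. (1.13): eigendecomposing a Gram matrix yields an explicit empirical feature map whose inner products reproduce the kernel values, as in Eq. (1.14). The sample size, kernel width, and variable names are illustrative.

```python
# Empirical (finite-sample) Mercer map built from the eigendecomposition of a Gram matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                                   # 20 sample points in R^3
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / 2.0)                                          # Gaussian Gram matrix, sigma = 1

eigvals, eigvecs = np.linalg.eigh(K)                           # K = V diag(lambda) V^T
eigvals = np.clip(eigvals, 0.0, None)                          # guard against tiny negative round-off
Phi = eigvecs * np.sqrt(eigvals)                               # row i ~ (sqrt(lambda_j) psi_j(x_i))_j

print(np.allclose(Phi @ Phi.T, K))                             # inner products recover k(x_i, x_j)
```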
Suppose that $$ {\mathbb{R}} = [a,c] $$ is a compact region and $$ k:[a,c] \times [a,c] \to C $$ is a continuous function. Then $$ k $$ is a positive definite kernel if and only if, for every continuous function $$ f:{\mathbb{R}} \to C $$ ,
$$ \int\limits_{{{\mathbb{R}}^{2} }} {k(x,x^{\prime } )f(x)f(x^{\prime } ){\text{d}}x{\text{d}}x^{\prime } \ge 0}. $$(1.15)
1.3 Current Research Status
Kernel functions were introduced into pattern recognition as early as the 1960s, but kernel methods became a hot research topic only after the SVM was successfully applied in pattern recognition [39, 40]. In subsequent research, Schölkopf introduced kernel learning into feature extraction [41–43] and proposed kernel principal component analysis (KPCA) [41, 42], while Mika [44–47], Baudat [48], and Roth [49] extended linear discriminant analysis (LDA) to kernel versions using the kernel trick. Since then, kernel learning and related research have attracted broad interest, and its development can be divided into three stages. In the first stage, before 2000, kernel learning research was just beginning; the main results were KPCA and kernel discriminant analysis, and few other results were achieved. In the second stage, from 2000 to 2004, further kernel learning algorithms appeared, such as the kernel HMM [50] and kernel associative memory [51]; this stage is regarded as the basis for subsequent research on kernel learning.
In the third stage, from 2005 to the present, many researchers have devoted their efforts to kernel learning, developing many kernel learning methods and applying them to practical problems. Several universities and research institutions, such as Yale, MIT, and Microsoft, carried out kernel research early and achieved fruitful results. In China, Shanghai Jiao Tong University, Nanjing University, Harbin Institute of Technology and its Shenzhen Graduate School, and other institutions have more recently carried out research on kernel learning algorithms and applications and have achieved some results.
Although research on kernel learning spans only about a decade, it has formed a relatively complete research system, and a number of research branches have developed: kernel-based classification, kernel clustering, feature extraction based on kernel learning, kernel-based neural networks, and kernel applications.
1.3.1 Kernel Classification
Kernel learning methods originated with SVMs [39, 40, 52], which are typical classification algorithms. In subsequent studies, researchers developed a variety of kernel-based classification algorithms. Peng et al. applied the kernel method to improve the nearest-neighbor classifier [53], implementing nearest-neighbor classification in the nonlinear mapping space. More recently, researchers have proposed new kernel-based classification algorithms: Jiao et al. [54] proposed the kernel matching pursuit classifier (KMPC), and Zhang et al. [55] proposed a kernel-learning-based minimum distance classifier that incorporates kernel-parameter optimization into the algorithm design, so that the kernel parameters are adjusted automatically and the ability to handle nonlinear classification problems is enhanced. In addition, Jiao et al. [56] proposed a kernel matching pursuit classification algorithm.
1.3.2 Kernel Clustering
Kernel clustering has developed only in recent years as an important branch of kernel learning. Ma et al. [57] proposed a discriminant analysis algorithm based on kernel clustering, whose main idea is to use kernel learning to map the original data into a high-dimensional feature space and then perform C-means clustering there for discriminant analysis. The work in [58] extended the spectral clustering method with kernel methods to a kernel spectral clustering method, and Kima [59] and other researchers presented comparisons of various kernel-based clustering methods. Applied research on kernel clustering has also received much attention: researchers have used kernel clustering in target tracking, character recognition, and other fields, and studies show that kernel clustering algorithms have been applied successfully and are widely used.
1.3.3 Kernel Feature Extraction
Kernel-based feature extraction has been the most active research topic in the kernel learning field. Kernel feature extraction algorithms can draw on the rich research results of linear feature extraction, and their wide range of applications has driven the rapid development of this branch. Most of the algorithms are extensions and improvements of linear feature extraction algorithms; the landmark methods are the KPCA algorithm [42] and the KFD algorithm [49]. The success of these two algorithms lies in using the kernel method to overcome the difficulties that linear principal component analysis (PCA) and LDA encounter when dealing with classification problems involving highly complex nonlinear distribution structures. In subsequent work, Yang et al. proposed a KPCA algorithm for FR, and a facial feature extraction method based on the combined Fisherface algorithm was also presented [60]. The combined Fisherface method extracts two different characteristics of an object using PCA and KPCA respectively; the two characteristics are complementary for image recognition, and their combination is finally used to identify the image. Liu [61] extended the polynomial kernel function to a fractional-power polynomial model and combined it with KPCA and Gabor wavelets for FR. Lu, Liang, and others proposed the kernel direct discriminant analysis (KDDA) algorithm [62–64], which differs from traditional KDA algorithms. Liang et al. [65] and Liang [66] proposed two criteria for solving the mapping matrix; their algorithms are similar in that both solve for eigenvectors of a minimum-degree mapping matrix between uncorrelated and correlated original samples, and good recognition results were reported. Yang [67] analyzed the KFD algorithm theoretically and proposed a two-stage KPCA + LDA implementation of KFD, and Belhumeur et al. [68] proposed the improved Fisherface algorithm to solve the small sample size (SSS) problem; Yang et al. subsequently proved the rationality of the algorithm theoretically [69]. In addition, Baudat et al. [70] proposed a kernel-based generalized discriminant analysis algorithm whose difference from KDA is that it seeks a transformation matrix under which the interclass scatter matrix becomes a zero matrix. Zheng and other researchers proposed a weighted maximum margin discriminant analysis algorithm [71], Wu et al. proposed a fuzzy kernel discriminant analysis algorithm [72], and Tao et al. proposed improvements to the KDDA algorithm [73]. In fact, the choice of kernel parameters has a great impact on the performance of these algorithms. To reduce this sensitivity, researchers have applied kernel parameter selection methods to KDA to improve its performance; for example, Huang [74], Wang [75], and Chen [76] selected kernel function parameters to improve KDA, and other improved KDA algorithms are presented in the literature [77–86]. In addition, many other kernel-based learning methods have been presented for feature extraction and classification: Wang et al. proposed a kernel-based HMM algorithm [87]; Yang applied the kernel method to independent component analysis (ICA) for feature extraction and presented kernel independent component analysis (KICA) [88]; Chang et al. [89] proposed a kernel particle filter for target tracking; and Zhang et al. [90] proposed Kernel Pooled Local Subspaces for feature extraction and classification.
1.3.4 Kernel Neural Network
In recent years, the kernel method has also been applied to neural networks. For example, Shi et al. [91] combined the reproducing kernel with neural networks to propose reproducing kernel neural networks. A classical application of the kernel idea in neural networks is the self-organizing map (SOM) [60, 92–94], whose goal is to represent points of the original high-dimensional space by points in a low-dimensional space such that the original distances or similarity relations are preserved. Zhang et al. [91] proposed a kernel associative memory algorithm combined with wavelet feature extraction, Zhang et al. [95] proposed a kernel-based FR algorithm combining Gabor wavelet associative memory, Sussner et al. [96] proposed an algorithm based on dual kernel associative memory, and Wang et al. [97] used the empirical kernel map to enhance the performance of associative memory methods.
1.3.5 Kernel Application
With the development of kernel research, kernel learning methods have become widely used in many applications, for example, character recognition [98, 99], FR [100–102], text classification [103, 104], DNA analysis [105–107], expert systems [108], and image retrieval [109]. Kernel-learning-based FR is the most popular application, and kernel methods provide one solution to the pose, illumination, and expression (PIE) problems of FR.
1.4 Problems and Contributions
Kernel learning is an important research topic in machine learning; both theoretical and applied results have been achieved and are widely used in pattern recognition, data mining, computer vision, and image and signal processing. Nonlinear problems are largely solved with kernel functions, and system performance measures such as recognition accuracy and prediction accuracy are greatly increased. However, kernel learning methods still suffer from a key problem: the selection of the kernel function and its parameters. Research shows that the kernel function and its parameters directly influence the data distribution in the nonlinear feature space, and an inappropriate selection degrades the performance of kernel learning. Research on self-adaptive learning of the kernel function and its parameters therefore has important theoretical value for solving the kernel selection problem widely faced by kernel learning machines, and equally important practical significance for improving kernel learning systems.
The main contributions of this book are described as follows.
First, to address the parameter selection problem of kernel learning algorithms, the book proposes a kernel optimization method based on the data-dependent kernel. The definition of the data-dependent kernel is extended, and the optimal parameters of the data-dependent kernel are obtained by solving an optimization problem formulated with the Fisher criterion and the maximum margin criterion. The two kernel optimization algorithms are evaluated and analyzed from two different viewpoints.
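For orientation, the sketch below shows the conformally scaled form k(x, y) = q(x) q(y) k0(x, y) commonly used for data-dependent kernels in the kernel-optimization literature; the exact definition adopted in later chapters, and the Fisher-criterion or maximum-margin optimization of the coefficients, may differ, and all names and parameter values here are illustrative.

```python
# Conformally scaled (data-dependent) kernel: k(x, y) = q(x) * q(y) * k0(x, y).
import numpy as np

def base_kernel(x, y, sigma=1.0):
    """Basic (base) Gaussian kernel k0."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def factor_q(x, expansion_vectors, coeffs, a0=1.0, gamma=1.0):
    """q(x) = a0 + sum_i a_i exp(-gamma * ||x - v_i||^2), with illustrative expansion vectors v_i."""
    terms = [a * np.exp(-gamma * np.sum((x - v) ** 2))
             for a, v in zip(coeffs, expansion_vectors)]
    return a0 + sum(terms)

def data_dependent_kernel(x, y, expansion_vectors, coeffs):
    q_x = factor_q(x, expansion_vectors, coeffs)
    q_y = factor_q(y, expansion_vectors, coeffs)
    return q_x * q_y * base_kernel(x, y)
```

In such a construction the coefficients of q(·) are the quantities that a criterion such as the Fisher or maximum margin criterion would optimize, which is the role kernel optimization plays in this book.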
Second, to address the computational efficiency and storage problems of kernel-learning-based image feature extraction, an image-matrix-based Gaussian kernel that operates directly on images is proposed; the image matrix need not be converted to a vector when the kernel is used for image feature extraction. Moreover, by combining the data-dependent kernel with kernel optimization, we propose an adaptive image-matrix-based Gaussian kernel that not only operates directly on the image matrix but also adaptively adjusts its parameters according to the input image matrix. This kernel improves the performance of kernel-learning-based image feature extraction.
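As a hedged illustration only (the book's exact construction may differ), a Gaussian kernel can be evaluated directly on two image matrices by using a matrix distance such as the Frobenius norm, which avoids reshaping each image into a long vector:

```python
# Gaussian kernel evaluated directly on image matrices (illustrative sketch).
import numpy as np

def matrix_gaussian_kernel(A, B, sigma=1.0):
    """A, B: 2-D image matrices of identical shape; no vectorization is required."""
    diff = A - B
    return np.exp(-np.sum(diff ** 2) / (2.0 * sigma ** 2))   # squared Frobenius norm of A - B
```

With the Frobenius norm this sketch is numerically the same as the usual Gaussian kernel on vectorized images; its point is merely to show how the matrix form sidesteps the explicit vectorization step, while the adaptive, data-dependent variant described above additionally tunes the kernel parameters per input.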
Third, to address the selection of the kernel function and its parameters in traditional kernel discriminant analysis, the data-dependent kernel is applied to kernel discriminant analysis. Two algorithms, FC + FC-based adaptive kernel discriminant analysis and MMC + FC-based adaptive kernel discriminant analysis, are proposed. Both follow a two-stage scheme that combines kernel optimization with linear projection: they adaptively adjust the kernel structure according to the distribution of the input samples and optimize the mapping of the sample data from the input space to the feature space, so the extracted features have greater class discriminative ability than those of traditional kernel discriminant analysis. For the parameter selection problem of traditional kernel discriminant analysis, the book also presents nonparameter kernel discriminant analysis (NKDA), which avoids the loss of classifier performance caused by ill-chosen parameters. For kernel function and parameter selection, kernel structure self-adaptive discriminant analysis algorithms are proposed and verified with simulations.
Fourth, the book addresses problems of the recently proposed locality preserving projection (LPP) algorithm: (1) the class label information of the training samples is not used during training; (2) LPP is a linear-transformation-based feature extraction method and cannot extract nonlinear features; and (3) LPP suffers from a parameter selection problem when constructing the nearest-neighbor graph. To solve these problems, this book proposes a supervised kernel LPP algorithm that constructs the nearest-neighbor graph with a supervised, parameter-free method, so that the extracted nonlinear features have the greatest class discriminative ability. The improved algorithm overcomes the above problems of LPP and enhances its feature extraction performance.
Fifth, to address the pose, illumination, and expression (PIE) problems of image feature extraction for FR, three kernel-learning-based FR algorithms are proposed. (1) To exploit the advantages of both signal-processing-based and learning-based image feature extraction, a face image feature extraction method combining Gabor wavelets with enhanced kernel discriminant analysis is proposed. (2) The polynomial kernel is extended to a fractional-power polynomial model and applied to kernel discriminant analysis, yielding a fractional-power-polynomial-model-based kernel discriminant analysis for facial image feature extraction. (3) To make full use of both the linear and nonlinear features of images, an adaptive fusion of PCA and KPCA for face image feature extraction is proposed.
Sixth, to address the dependence of KPCA on the number of training samples and on the kernel function and its parameters, the book presents one-class-support-vector-based sparse kernel principal component analysis (SKPCA). Moreover, the data-dependent kernel is introduced and extended within the proposed SKPCA.