Subclass Marginal Fisher Analysis

Maronidis, Anastasios; Tefas, Anastasios; Pitas, Ioannis

2015 IEEE Symposium Series on Computational Intelligence, 2015

Maronidis, A., Tefas, A., & Pitas, I. (2016). Subclass Marginal Fisher Analysis. In 2015 IEEE Symposium Series on Computational Intelligence (SSCI 2015): Proceedings of a meeting held 7-10 December 2015, Cape Town, South Africa (pp. 1391-1398). Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/SSCI.2015.198

This is the author accepted manuscript (AAM). The final published version (version of record) is available online via IEEE at 10.1109/SSCI.2015.198.

Anastasios Maronidis∗, Anastasios Tefas† and Ioannis Pitas‡
Department of Informatics, Aristotle University of Thessaloniki, P.O.Box 451, 54124 Thessaloniki, Greece
Email: ∗amaronidis@iti.gr, †tefas@aiia.csd.auth.gr, ‡pitas@aiia.csd.auth.gr

Abstract: Subspace learning techniques have been extensively used for dimensionality reduction (DR) in many pattern classification problem domains. Recently, Discriminant Analysis (DA) methods, which use subclass information for the discrimination between the data classes, have attracted much attention. As DA methods are strongly dependent on the underlying distribution of the data, techniques whose functionality is based on neighbourhood information among the data samples have emerged. For instance, based on the Graph Embedding (GE) framework, which is a platform for developing novel DR methods, Marginal Fisher Analysis (MFA) has been proposed.
Although MFA overcomes the above distribution limitations, it fails to model potential subclass structure that might lie within the classes of the data. In this paper, motivated by the need to alleviate these shortcomings, we propose a novel DR technique, called Subclass Marginal Fisher Analysis (SMFA), which combines the strength of subclass DA methods with the versatility of MFA. The new method is built by extending the GE framework so as to include subclass information. Through a series of experiments on various real-world datasets, it is shown that SMFA outperforms the state of the art in most cases, demonstrating the potential of exploiting subclass neighbourhood information in the DR process.

I. INTRODUCTION

Dimensionality reduction (DR) is an important process for achieving efficient pattern classification. In recent years, a variety of subspace learning algorithms for DR have been developed. Locality Preserving Projections (LPP) [1], [2] and Principal Component Analysis (PCA) [3] are two of the most popular unsupervised linear DR algorithms, with a wide range of applications. In addition, supervised methods like Linear Discriminant Analysis (LDA) [4] have shown superior performance in many classification problems, since through the DR process they aim at achieving data class discrimination.

In practice, it is often the case that several data clusters appear inside the same class, which creates the need to integrate this information into the DR process. Along these lines, techniques such as Clustering Discriminant Analysis (CDA) [5] and Subclass Discriminant Analysis (SDA) [6] have been proposed. Both of them utilize a specific objective criterion that incorporates data subclass information, aiming to discriminate subclasses that belong to different classes while putting no constraints on subclasses within the same class.

Although the above methods have proven their potential in various classification problems, their performance is highly dependent on specific assumptions about the underlying distribution of the data samples [4]. Since such assumptions are rarely satisfied in real-world problems, there is a clear need to overcome the limitations of the above methods. Towards this end, the authors of [7] have presented a Graph Embedding (GE) framework, which serves as a platform for developing new DR methods. Using GE, they have proposed Marginal Fisher Analysis (MFA), which uses neighbourhood information among adjacent samples within and between the classes of a dataset. The advantage of MFA is that it models the intra-class compactness and the inter-class separability using vicinity information among the samples, ignoring the underlying distribution of the data classes.

Although MFA overcomes the limitations related to class distribution, it entirely ignores potential structure within the classes in the form of subclasses. Such structure is anticipated to provide the DR process with crucial information, which may allow better discrimination of the classes. In this paper, extending the GE framework [7] so as to include subclass information, we propose a novel Subclass Marginal Fisher Analysis (SMFA) algorithm for supervised dimensionality reduction. The new method combines the modularity of subclass-based methods with the strength of MFA, as it models the margins among classes using neighbourhood information between the samples belonging to the several subclasses. This combination enables SMFA to overcome the shortcomings stemming from the distribution constraints of the data, leading to improved classification performance. Indeed, through an experimental comparison, it is shown that our method outperforms a number of state-of-the-art dimensionality reduction methods in terms of classification accuracy.

The remainder of this paper is organized as follows. A literature review of related work is presented in Section II. The GE framework, which is employed for developing our method, is described in Section III, while the novel SMFA method along with its kernelization is presented in Section IV. A comparison of SMFA with the state-of-the-art subspace methods mentioned in the Introduction is conducted in Section V on a number of real-world datasets. Finally, conclusions are drawn in Section VI.

II. RELATED WORK

Although LDA proves to be an effective method in many classification problems, it has some fundamental limitations. For instance, it suffers from the small sample size problem, which occurs when the number of training samples is smaller than the data dimensionality. In this case, LDA fails to optimize its objective criterion, due to the singularity of the involved matrices. A solution to this problem has been provided in [8], where the authors propose the use of the pseudo-inverse of a matrix in order to overcome matrix singularity. Another approach is to use PCA as a preprocessing step to reduce data dimensionality and then apply LDA, resulting in the combined PCA + LDA method [4]. Regularization techniques have also been employed for overcoming the small sample size problem [11], [12]. Moreover, as an indirect way of dealing with the singularity problem, another method (2D-LDA), in which the data are represented as matrices, has been proposed in [10]. As stated in [9], an additional problem appears when some of the smallest eigenvalues of the within-class scatter matrix correspond to noisy features of the data. A factorization that prunes the noisy bases of the within-class scatter matrix and a correlation-based criterion have been proposed in [9] for solving these problems.

Another strong limitation is that, for achieving optimal discrimination in Bayesian terms, LDA postulates that the samples of the data classes follow multivariate Gaussian distributions with a common covariance matrix and different means [13]. In real problems, though, the class data might not be normally distributed. Many extensions of LDA have been proposed in the literature for circumventing these limitations [14], [15], [16], [17]. Among the most effective methods towards this end is Marginal Fisher Analysis [7], which is based on the Graph Embedding framework. MFA uses adjacency information among the data samples and succeeds in overcoming the above-mentioned distribution limitations. However, MFA ignores information stemming from potential subclass structure within the data classes.

As already mentioned in the Introduction, CDA and SDA have been proposed for exploiting the subclass structure of the data. Along the same lines, a Mixture Subclass Discriminant Analysis (MSDA) method that modifies the objective function of SDA has been proposed in [18]. Moreover, the link between MSDA and the Gaussian mixture model has been established using the Expectation-Maximization framework. In the same work, MSDA has been further extended in several ways, so that the subclass separation problem is solved and nonlinearly separable subclass structure is tackled using the kernel trick. In [19], a Multiple-Exemplar Discriminant Analysis (MEDA) method is presented, in which the classes are represented by exemplar vectors and an objective criterion is constructed from these exemplars. In this vein, the subclass means can be used as exemplars, hence exploiting the subclass structure of the data. Subspace learning and clustering have been treated together in an iterative process in [20]. Intra-cluster similarity and inter-cluster separability are enhanced using an initial cluster estimation in the subspace-learning step.
Then, affinity propagation is adopted for clustering the reduced data, providing an updated clustering estimation. In [21], the authors combine global with local geometric structures using a regularization technique. The singularity problem is tackled by imposing a penalty on the parameters, and the optimal parameter is chosen based on a model selection approach.

For conducting nonlinear DR, the application of the kernel trick to the linear approaches has been proposed [22]. The main idea is to first map the data from the initial space to a high-dimensional Hilbert space, where they might be linearly separable, and then use a linear subspace method. This approach results in the kernelized versions of the linear techniques that have already been developed, i.e., Kernel Principal Component Analysis (KPCA) [23], Kernel Discriminant Analysis (KDA) [24], Kernel Clustering Discriminant Analysis (KCDA) [25], Kernel Subclass Discriminant Analysis (KSDA) [26], etc.

From the above review, it appears that the several limitations stemming from the data distributions or the singularity of the involved matrices have been successfully addressed by dedicated methods. However, there is still room for improvement, as the new methods introduce new limitations. For instance, subclass-based methods postulate that the data subclasses have Gaussian distributions, hence shifting the problem from classes to subclasses. Moreover, although some of the above-mentioned techniques manage to deal with such limitations and optimally model the distributions of the training data, the generalization ability to the test data still remains an open challenge. To this end, as we will see in the following sections, our method overcomes distribution-related limitations while at the same time offering good generalization ability.

III. GRAPH EMBEDDING

In the GE framework [7], the set of data samples to be projected into a low-dimensional space is represented by two graphs, namely the intrinsic graph $G_{int} = \{\mathcal{X}, W_{int}\}$ and the penalty graph $G_{pen} = \{\mathcal{X}, W_{pen}\}$, where $\mathcal{X} = \{x_1, x_2, \cdots, x_n\}$ is the set of the data samples in both graphs. Moreover, $W_{int}$ and $W_{pen}$ are the intrinsic and the penalty weight matrices, respectively. The intrinsic weight matrix models the similarity connections between every pair of data samples that have to be reinforced after the projection. The penalty weight matrix contains the connections between the data samples that must be suppressed after the projection. In both matrices these connections can have negative values. A negative value has the opposite effect, i.e., a negative value in the intrinsic matrix means that the corresponding data samples should diverge, and a negative value in the penalty matrix means that the corresponding data samples should converge after the projection.

The problem of DR can now be interpreted in an alternative way. It is desirable to project the initial data to the new low-dimensional space such that the geometrical structure of the data is preserved. The corresponding objective function for optimization is:

$$\underset{\mathrm{tr}\{Y B Y^T\} = d}{\arg\min}\; J(Y), \tag{1}$$

$$J(Y) = \frac{1}{2}\,\mathrm{tr}\Big\{\sum_q \sum_p (y_q - y_p)\, W_{int}(q, p)\, (y_q - y_p)^T\Big\}, \tag{2}$$

where $Y = [y_1, y_2, \cdots, y_n]$ are the projected vectors, $d$ is a constant, $B$ is a constraint matrix defined to remove an arbitrary scaling factor in the embedding and $W_{int}(q, p)$ is the value of $W_{int}$ at position $(q, p)$. The structure of the objective function (2) implies that the larger the value $W_{int}(q, p)$ is, the smaller the distance between the projections of the data samples $x_q$ and $x_p$ has to be. By using some simple algebraic manipulations, equation (2) becomes:

$$J(Y) = \mathrm{tr}\{Y L_{int} Y^T\}, \tag{3}$$

where $L_{int} = D_{int} - W_{int}$ is the intrinsic Laplacian matrix and $D_{int}$ is the degree matrix, defined as the diagonal matrix which has at position $(q, q)$ the value $D_{int}(q, q) = \sum_p W_{int}(q, p)$. Similarly, the Laplacian matrix $L_{pen} = D_{pen} - W_{pen}$ of the penalty graph is often used as the constraint matrix $B$. Thus, the above optimization problem becomes:

$$\arg\min\; \frac{\mathrm{tr}\{Y L_{int} Y^T\}}{\mathrm{tr}\{Y L_{pen} Y^T\}}. \tag{4}$$

The optimization of the above objective function is achieved by solving the generalized eigenproblem:

$$L_{int} v = \lambda L_{pen} v, \tag{5}$$

keeping the eigenvectors which correspond to the smallest eigenvalues. This approach leads to the optimal projection of the given data samples. In order to achieve out-of-sample projection, the linearization of the above approach should be used [7]. If we employ $y = V^T x$, the optimization problem in (1) and (2) becomes:

$$\underset{\mathrm{tr}\{V^T X L_{pen} X^T V\} = d}{\arg\min}\; J(V), \tag{6}$$

$$J(V) = \frac{1}{2}\,\mathrm{tr}\Big\{V^T \Big(\sum_q \sum_p (x_q - x_p)\, W_{int}(q, p)\, (x_q - x_p)^T\Big) V\Big\}, \tag{7}$$

where $X = [x_1, x_2, \ldots, x_n]$. By using simple algebraic manipulations, we have:

$$J(V) = \mathrm{tr}\{V^T X L_{int} X^T V\}. \tag{8}$$

Similarly to the direct approach, the optimal eigenvectors are given by solving the generalized eigenproblem:

$$X L_{int} X^T v = \lambda X L_{pen} X^T v. \tag{9}$$

IV. SUBCLASS MARGINAL FISHER ANALYSIS

In this section, motivated by the well-known Marginal Fisher Analysis (MFA) method presented in [7], we propose a novel algorithm for dimensionality reduction, called Subclass Marginal Fisher Analysis (SMFA), employing the GE framework. The new method combines the power of subclass methods with the flexibility of standard MFA to overcome the limitation of the intra-class Gaussian distribution assumption. The intrinsic graph matrix characterizes the intra-subclass compactness, while the penalty graph matrix characterizes the inter-class separability. Both graph matrices are built using neighbouring information of the graph nodes.
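
Since both graph matrices enter the method only through the linearized graph embedding problem of Eq. (9), it may help to see that step in isolation. The following Python sketch is one possible way to solve Eq. (9) for given symmetric, non-negative weight matrices; it is not the authors' implementation. The function names, the Cholesky-based reduction to a standard eigenproblem and the small `ridge` term (added so that the right-hand matrix is positive definite) are assumptions made here for illustration.

```python
import numpy as np


def laplacian(W):
    """Graph Laplacian L = D - W, where D is the diagonal degree matrix."""
    return np.diag(W.sum(axis=1)) - W


def linear_graph_embedding(X, W_int, W_pen, n_components, ridge=1e-6):
    """Return the projection matrix V for Eq. (9), smallest eigenvalues first.

    X has shape (features, samples), i.e. X = [x_1, ..., x_n] as in the text.
    `ridge` is an assumption of this sketch, not part of the paper: it keeps
    X L_pen X^T positive definite so that a Cholesky factor exists.
    """
    A = X @ laplacian(W_int) @ X.T
    B = X @ laplacian(W_pen) @ X.T
    m = B.shape[0]
    B = B + ridge * (np.trace(B) / m) * np.eye(m)

    # Reduce A v = lambda B v to a standard symmetric eigenproblem:
    # with B = C C^T and u = C^T v, we get (C^-1 A C^-T) u = lambda u.
    C = np.linalg.cholesky(B)
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T
    M = (M + M.T) / 2.0  # remove round-off asymmetry

    eigvals, U = np.linalg.eigh(M)  # eigenvalues in ascending order
    V = C_inv.T @ U[:, :n_components]  # map back: v = C^-T u
    return V, eigvals[:n_components]
```

Under these assumptions, a new sample `x` would be projected as `V.T @ x`, which corresponds to $y = V^T x$ in the text.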

More specifically, based on the graph embedding formulation presented in Section III, the intrinsic graph matrix is defined as:

$$W_{int}(p, q) = \begin{cases} 1, & \text{if } p \in N_{k_{int}}(q) \text{ or } q \in N_{k_{int}}(p), \\ 0, & \text{otherwise,} \end{cases} \tag{10}$$

where $N_{k_{int}}(q)$ denotes the index set of the $k_{int}$ nearest neighbours of the $q$-th sample in the same subclass. The penalty graph matrix is defined as:

$$W_{pen}(p, q) = \begin{cases} 1, & \text{if } p \in M_{k_{pen}}(q) \text{ or } q \in M_{k_{pen}}(p), \\ 0, & \text{otherwise,} \end{cases} \tag{11}$$

where $M_{k_{pen}}(q)$ denotes the set of samples that belong to the $k_{pen}$ nearest neighbours of $q$ outside the class of $q$. It is worth noting that, in contrast to the intrinsic graph matrix, the values of the penalty graph matrix depend on the class information regardless of the subclass labels. In this way we avoid imposing constraints between subclasses belonging to the same class, which offers better generalization.

The proposed SMFA algorithm inherits all the advantages of the standard MFA method. More specifically, there is no assumption on the data distribution, since the intra-subclass compactness is encoded by the nearest neighbours of the data belonging to the same subclass and the inter-class separability is modelled using the margins among the classes. Moreover, the functionality of SMFA is based on two parameters, $k_{int}$ and $k_{pen}$, which, when appropriately adjusted, may help avoid overfitting and thus improve the generalization ability of the method. Also, the available projection dimensionality using SMFA is determined by $k_{pen}$ and is almost always much larger than that of LDA, CDA and SDA. Finally, SMFA is capable of leveraging potential subclass structure of the data, which in many cases may boost its performance. In Section V, the superiority of SMFA over a number of previously presented state-of-the-art DR methods in terms of classification accuracy is demonstrated through a series of experiments.

A. Kernel Subclass Marginal Fisher Analysis

In this section, the kernelization of SMFA (KSMFA) is presented.
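
Because KSMFA reuses the graph matrices of Eqs. (10) and (11) unchanged, a concrete construction of them is sketched here first. This is a minimal illustration rather than the authors' code: the function name `smfa_graphs`, the use of Euclidean distances in the input space and the brute-force neighbour search are assumptions of this sketch. Subclass labels are compared together with class labels, so they may be numbered independently within each class.

```python
import numpy as np


def smfa_graphs(X, classes, subclasses, k_int, k_pen):
    """Build W_int (Eq. 10) and W_pen (Eq. 11).

    X has shape (features, samples); `classes` and `subclasses` are integer
    label arrays with one entry per sample.
    """
    classes = np.asarray(classes)
    subclasses = np.asarray(subclasses)
    n = X.shape[1]

    # Pairwise squared Euclidean distances between the columns of X.
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)

    idx = np.arange(n)
    W_int = np.zeros((n, n))
    W_pen = np.zeros((n, n))
    for q in range(n):
        # Candidates for N_kint(q): same class and same subclass, excluding q.
        same_sub = idx[(classes == classes[q])
                       & (subclasses == subclasses[q])
                       & (idx != q)]
        # Candidates for M_kpen(q): every sample outside the class of q.
        other_cls = idx[classes != classes[q]]

        nn_int = same_sub[np.argsort(dist[q, same_sub])[:k_int]]
        nn_pen = other_cls[np.argsort(dist[q, other_cls])[:k_pen]]

        # The "or" in Eqs. (10)-(11) makes both matrices symmetric.
        W_int[q, nn_int] = 1.0
        W_int[nn_int, q] = 1.0
        W_pen[q, nn_pen] = 1.0
        W_pen[nn_pen, q] = 1.0
    return W_int, W_pen
```

Note that two samples from different subclasses of the same class are connected in neither graph, which is how the construction leaves such subclasses unconstrained.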
Kernels are widely used in classification problems where the data are not linearly separable, and in unsupervised learning when the data lie on a nonlinear manifold. Let us denote by $\mathcal{X}$ the initial data space, by $\mathcal{F}$ a Hilbert space and by $f$ the non-linear mapping function from $\mathcal{X}$ to $\mathcal{F}$. The main idea is to first map the original data from the initial space into a high-dimensional Hilbert space and then perform linear subspace analysis in that space. If we denote by $m_{\mathcal{F}}$ the dimensionality of the Hilbert space, then the above procedure is described as:

$$\mathcal{X} \ni x_q \rightarrow y_q = f(x_q) = \begin{pmatrix} \sum_{p=1}^{n} a_{1p}\, k(x_q, x_p) \\ \vdots \\ \sum_{p=1}^{n} a_{m_{\mathcal{F}} p}\, k(x_q, x_p) \end{pmatrix} \in \mathcal{F}, \tag{12}$$

where $k$ is the kernel function. From the above equation it follows that

$$Y = A^T K, \tag{13}$$

where $K$ is the Gram matrix, which has at position $(q, p)$ the value $K_{qp} = k(x_q, x_p)$, and

$$A = [a_1 \cdots a_{m_{\mathcal{F}}}] = \begin{pmatrix} a_{11} & \cdots & a_{m_{\mathcal{F}} 1} \\ \vdots & \ddots & \vdots \\ a_{1n} & \cdots & a_{m_{\mathcal{F}} n} \end{pmatrix} \tag{14}$$

is the map coefficient matrix. Consequently, the final KSMFA optimization becomes:

$$\arg\min\; \frac{\mathrm{tr}\{A^T K L_{int} K A\}}{\mathrm{tr}\{A^T K L_{pen} K A\}}, \tag{15}$$

where $L_{int} = D_{int} - W_{int}$ and $L_{pen} = D_{pen} - W_{pen}$, and $W_{int}$, $W_{pen}$ are those defined in Eqs. (10) and (11), respectively. Similarly to the linear case, in order to find the optimal projections, we solve the generalized eigenproblem:

$$K L_{int} K a = \lambda K L_{pen} K a, \tag{16}$$

keeping the eigenvectors that correspond to the smallest eigenvalues.

B. Subclass Extraction

From the above discussion, the need for efficient data clustering is evident. A variety of clustering methods have been proposed in the literature. Techniques such as K-means and Expectation-Maximization (EM) [27] have been used for extracting clusters in a database. It is well known that there is no method that consistently outperforms the others. A relatively new technique relying on spectral graph theory [28], called Spectral Clustering (SC), has also been proposed for data clustering. It has been shown that SC often outperforms traditional clustering algorithms such as K-means [29].
However, the use of this method has certain limitations, described in [30]. SC can also be used to estimate the number of subclasses within each class [29]. Another potential advantage of SC is that it uses the Gram matrix, which is also used by KSMFA. Therefore, when combining SC with KSMFA, the Gram matrix has to be calculated only once, which reduces the computational load. In this paper, a multiscale Spectral Clustering (MSC) approach, proposed in [31], has been used in order to extract clusters within each class of the data at different scales.

V. EXPERIMENTAL RESULTS

We conducted classification experiments on several real-world datasets using LPP, PCA, LDA, MFA, CDA, SDA and SMFA, along with their kernel counterparts. For validating the performance of the algorithms, 5-fold cross-validation has been used. For automatically extracting the subclass structure, we have utilized the MSC technique [31], keeping the most plausible partition for each dataset. For classifying the data, the Nearest Centroid (NC) classifier has been used with the LPP, PCA, LDA and MFA algorithms, while the Nearest Cluster Centroid (NCC) classifier [32] has been used with the CDA, SDA and SMFA algorithms. In NCC, the cluster centroids are calculated and the test sample is assigned to the class of the nearest cluster centroid. NC and NCC were selected because they provide the optimal classification solutions in Bayesian terms, thus indicating whether the DR methods have reached the goal described by their specific criterion. In the following paragraphs, we briefly present the datasets that have been used along with the performance rates of the various subspace learning methods.

A. Classification experiments

For the classification experiments, we have used diverse publicly available datasets offered for various classification problems.
More specifically, FER-AIIA, BU, JAFFE and KANADE were used for facial expression recognition, XM2VTS for face frontal view recognition, and MNIST and SEMEION for optical digit recognition. Finally, IONOSPHERE, MONK and PIMA were used in order to further extend our experimental study to diverse data classification problems.

In our experiments, for performing DR we have used both the linear and the RBF kernel approach. The maximal dimensionality of the reduced space is determined by the rank of the corresponding matrices utilized by the discriminant analysis methods. Moreover, LPP depends on a parameter, namely the variance of the Gaussian similarity function used when constructing the affinity matrix. Searching for the optimal variance in order to achieve the best classification results makes the comparison very complex. In this paper, for the sake of simplicity and relying on our own empirical studies, this parameter was allowed to take values in the range $[0.1 \cdot \hat{E}(d_{ij}),\ 2.0 \cdot \hat{E}(d_{ij})]$ with step $0.1 \cdot \hat{E}(d_{ij})$, where $\hat{E}$ denotes the sample mean and $d_{ij}$ is the Euclidean distance between samples $i$ and $j$.

The cross-validation classification accuracy rates for the several subspace learning methods over the utilized datasets are summarized in Tables I and II for the linear and the kernel methods, respectively. The optimal dimensionality of the projected space that returned the above results is given in parentheses. For each dataset, the best performance rate among linear and kernel methods separately is highlighted in bold, while the best overall performance rate among all methods, both linear and kernel, is surrounded by a rectangle. For ranking the methods in terms of classification performance, we further conducted a post-hoc Bonferroni test [33] for each pair of methods.
The performance of two methods is significantly different if the corresponding average ranks differ by at least the critical difference [34]

$$CD = q_\alpha \sqrt{\frac{j(j+1)}{6T}},$$

where $j$ is the number of methods compared, $T$ is the number of datasets and the critical values $q_\alpha$ can be found in [35]. In our comparisons we set $\alpha = 0.05$. The ranking has been performed including both linear and kernel methods in the comparison, as well as separately for the linear and kernel methods. The classification performance rank of each method is given in the last two rows of Tables I and II. Specific rank denotes the rank of a method among the linear or among the kernel methods, independently. Overall rank refers to the rank of each method among both the linear and the kernel methods. The ranking results are also illustrated in Fig. 1, left and right, for the linear and kernel methods, respectively. The vertical axis in both figures depicts the various methods, while the horizontal axis depicts the performance ranking. The circles indicate the mean rank and the intervals around them indicate the confidence interval, as determined by the CD value.
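
The critical difference can be evaluated directly from the formula above. The sketch below is only an illustration: the function names are introduced here, and j = 7 methods per family and T = 12 datasets are read off the rows and columns of Tables I and II. The critical value `q_alpha` has to be looked up in [35], so it is passed in as a parameter; the value 1.0 in the last line is a placeholder, not a tabulated critical value.

```python
import math


def critical_difference(q_alpha, n_methods, n_datasets):
    """CD = q_alpha * sqrt(j (j + 1) / (6 T))."""
    return q_alpha * math.sqrt(n_methods * (n_methods + 1) / (6.0 * n_datasets))


def significantly_different(rank_a, rank_b, cd):
    """Two methods differ significantly if their average ranks differ by at least CD."""
    return abs(rank_a - rank_b) >= cd


# Scale factor sqrt(j (j + 1) / (6 T)) for j = 7 and T = 12.
# q_alpha = 1.0 is a placeholder, NOT a tabulated critical value.
factor = critical_difference(1.0, 7, 12)
```

For j = 7 and T = 12 the factor multiplying $q_\alpha$ is $\sqrt{56/72} \approx 0.882$.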
Overlapping intervals between two methods indicate that there is no statistically significant difference between the corresponding ranks.

TABLE I: Cross Validation Classification Accuracies (%) of Linear Methods on Several Real-World Datasets

DATASET        LPP        PCA        LDA        MFA        CDA        SDA        SMFA
FER-AIIA       40.9(3)    31.0(120)  64.6(6)    72.6(10)   73.2       75.5(11)   72.6(12)
BU             39.4(298)  38.1(49)   51.6(6)    52.4(6)    49.1(16)   52.3(15)   49.3(11)
JAFFE          46.8(18)   37.6(39)   53.2(6)    61.5(14)   40.0(15)   54.1(6)    44.9(20)
KANADE         34.2(92)   43.3(46)   67.1(6)    66.3(19)   59.7(7)    67.1(5)    63.8(9)
MNIST          71.1(259)  79.9(135)  84.6(9)    82.8(38)   84.8(15)   85.1(14)   85.3(40)
SEMEION        53.6(99)   83.2(55)   88.2(9)    86.9(8)    89.2(19)   89.4(19)   87.5(10)
XM2VTS         95.7(54)   92.0(86)   70.5(1)    97.7(4)    98.1(3)    97.4(2)    98.4(4)
IONOSPHERE     84.6(23)   72.3(15)   78.9(1)    76.0(12)   80.6(2)    83.4(2)    84.3(26)
MONK 1         66.7(3)    68.3(5)    50.8(1)    71.7(2)    70.0(4)    74.2(3)    78.3(2)
MONK 2         56.0(1)    53.3(4)    52.0(1)    58.7(2)    54.2(1)    54.0(2)    60.7(1)
MONK 3         77.2(5)    80.9(4)    49.4(1)    81.6(1)    74.6(2)    66.3(2)    86.1(5)
PIMA           61.8(1)    63.5(6)    56.5(1)    74.4(1)    60.5(3)    73.5(3)    74.9(1)
SPECIFIC RANK  5.1        5.8        5.0        3.0        4.0        2.7        2.3
OVERALL RANK   9.0        9.8        8.5        5.0        6.6        5.0        4.0

TABLE II: Cross Validation Classification Accuracies (%) of Kernel Methods on Several Real-World Datasets

DATASET        KLPP       KPCA       KDA        KMFA       KCDA       KSDA       KSMFA
FER-AIIA       50.2(252)  41.5(29)   54.9(6)    61.3(9)    56.1(12)   53.5(12)   56.7(39)
BU             52.7(317)  35.9(290)  46.6(6)    44.4(29)   41.0(13)   48.0(14)   39.9(18)
JAFFE          28.8(98)   25.9(58)   42.4(6)    47.8(6)    36.1(18)   46.3(5)    34.1(13)
KANADE         32.7(99)   33.2(88)   44.3(6)    46.6(6)    40.0(6)    38.5(6)    45.8(7)
MNIST          81.4(299)  64.5(155)  86.0(9)    86.4(21)   83.4(19)   85.2(15)   86.7(34)
SEMEION        83.8(99)   77.4(77)   95.3(9)    90.0(11)   94.1(19)   95.9(19)   94.9(20)
XM2VTS         71.3(297)  74.7(56)   61.3(1)    78.7(31)   71.5(3)    57.3(4)    81.2(4)
IONOSPHERE     83.7(23)   70.3(2)    92.9(1)    92.3(1)    93.1(1)    92.9(1)    92.6(1)
MONK 1         63.3(2)    72.5(1)    55.8(1)    60.0(1)    58.3(4)    61.7(3)    70.8(4)
MONK 2         54.8(1)    59.8(3)    69.7(1)    70.8(2)    78.7(1)    54.5(1)    79.7(2)
MONK 3         62.5(2)    79.2(5)    51.7(1)    79.2(2)    67.5(2)    58.3(1)    73.3(2)
PIMA           50.7(3)    67.5(4)    48.9(1)    54.0(3)    52.5(3)    52.9(1)    56.2(3)
SPECIFIC RANK  5.3        5.0        4.3        2.8        3.9        4.1        2.6
OVERALL RANK   10.2       10.0       8.1        6.3        8.2        8.3        6.1

The first remark from Tables I and II and Fig. 1 is that SMFA and KSMFA outperform the remaining methods in the linear and kernel case, respectively. Although their superiority is not statistically significant over all remaining methods, these two methods offer a strong potential to improve the performance of the state of the art in many classification domains. In addition, it is interesting to observe the robustness of SMFA and MFA, along with their kernel counterparts, across the datasets. This observation, combined with the fact that both methods rely on the same motivations, shows the advantage gained by encoding the data distributions using neighbouring information between the samples: it helps overcome the several limitations previously presented in this paper while at the same time offering good generalization ability.

As a general remark, the superiority of subclass methods over unimodal ones is evident, with MFA and KMFA being notable exceptions. The top overall performance is shown by SMFA, followed by SDA and MFA, while the worst performance is shown by KLPP. More specifically, on the one hand, SDA, MFA and KMFA display on average the best performance in facial expression recognition problems. On the other hand, in optical digit recognition, face frontal view recognition and the remaining classification problems, SMFA and KSMFA clearly have on average the best performance. In comparing linear with kernel methods, a simple calculation yields a mean overall rank equal to 6.84 for the linear methods and 8.17 for the kernel ones.
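
The mean overall ranks quoted above can be reproduced from the OVERALL RANK rows of Tables I and II. The snippet below is only a check of that arithmetic; the dictionary and variable names are introduced here for illustration.

```python
# OVERALL RANK rows of Tables I and II.
linear_overall = {"LPP": 9.0, "PCA": 9.8, "LDA": 8.5, "MFA": 5.0,
                  "CDA": 6.6, "SDA": 5.0, "SMFA": 4.0}
kernel_overall = {"KLPP": 10.2, "KPCA": 10.0, "KDA": 8.1, "KMFA": 6.3,
                  "KCDA": 8.2, "KSDA": 8.3, "KSMFA": 6.1}

mean_linear = sum(linear_overall.values()) / len(linear_overall)
mean_kernel = sum(kernel_overall.values()) / len(kernel_overall)

print(f"{mean_linear:.2f} {mean_kernel:.2f}")  # prints "6.84 8.17"
```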
Although the difference between the two approaches (i.e., linear and kernel) is significant, we must admit that there is ample room for improving the kernel results by varying the RBF parameter, as the selection of this parameter is not trivial and may easily lead to over-fitting. In fact, the top performance rates presented in this paper have been obtained by testing indicative values of this parameter. It is also interesting to observe that the use of kernels proves beneficial for some methods on certain datasets, while it degrades the performance of others. For instance, from Tables I and II, the use of kernels boosts the performance of PCA on three of the last four datasets (i.e., MONK 1, MONK 2 and PIMA), while this is not the case, for example, on XM2VTS. There are two main reasons for this. Firstly, while some datasets contain linearly separable classes, others may need a kernel to achieve such separability. Secondly, in order to reduce the computational cost of our experiments, we have used the same kernel parameter values per dataset across all methods, and nothing guarantees that the same value is optimal for every method.

Fig. 1: Ranking of Various Methods After Pairwise Post-Hoc Bonferroni Tests on Real Data. (Left: Linear Methods, Right: Kernel Methods)

VI. CONCLUSIONS

The main contribution of this paper is a novel Subclass Marginal Fisher Analysis (SMFA) dimensionality reduction method. The functionality of SMFA is based on adjacency information of data samples within the same subclass as well as the proximity of “marginal” samples belonging to different classes.
In this way, the new method combines the flexibility of neighbourhood modelling methods, like MFA, with the modularity offered by subclass information. This combination helps overcome inherent limitations stemming from the data distributions while offering good generalization ability. Through an extensive experimental study, it has been shown that SMFA outperforms a number of state-of-the-art subspace learning methods on many real-world datasets pertaining to various classification domains. Similar remarks can be drawn for KSMFA. Moreover, as a general remark, subclass-based methods exhibit superior classification accuracy compared with unimodal ones, which demonstrates the potential of including subclass information in the dimensionality reduction process.

Although the performance of the proposed method is impressive, there is still room for exploring new methods employing the Graph Embedding framework, either by designing completely new methods or by modifying SMFA. Experimenting in this direction is part of our future plans. Moreover, in order to further reinforce the outcomes of this paper and lend more credibility to SMFA, in the near future we intend to extend our current experimental study to more datasets from additional classification domains.
