
Graph Embedding Exploiting Subclasses

2015 IEEE Symposium Series on Computational Intelligence, 2015
Maronidis, A., Tefas, A., & Pitas, I. (2016). Graph Embedding Exploiting Subclasses. In 2015 IEEE Symposium Series on Computational Intelligence (SSCI 2015): Proceedings of a meeting held 7-10 December 2015, Cape Town, South Africa (pp. 1452-1459). Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/SSCI.2015.206
Graph Embedding Exploiting Subclasses

Anastasios Maronidis, Anastasios Tefas and Ioannis Pitas
Department of Informatics, Aristotle University of Thessaloniki, P.O. Box 451, 54124 Thessaloniki, Greece
Email: amaronidis@iti.gr, tefas@aiia.csd.auth.gr, pitas@aiia.csd.auth.gr

Abstract—Recently, subspace learning methods for Dimensionality Reduction (DR), like Subclass Discriminant Analysis (SDA) and Clustering-based Discriminant Analysis (CDA), which use subclass information for the discrimination between the data classes, have attracted much attention. In parallel, important work has been accomplished on Graph Embedding (GE), which is a general framework unifying several subspace learning techniques. In this paper, GE is extended to integrate subclass discriminant information, resulting in the novel Subclass Graph Embedding (SGE) framework. The kernelization of SGE is also presented. It is shown that SGE constitutes a generalization of the typical GE that includes subclass DR methods. In this vein, the theoretical link of the SDA and CDA methods with SGE is established. The efficacy and power of SGE are substantiated by comparing subclass DR methods against a diversity of unimodal methods, all pertaining to the SGE framework, via a series of experiments on various real-world data.

I. INTRODUCTION

In recent years, a variety of subspace learning algorithms for dimensionality reduction (DR) has been developed. Locality Preserving Projections (LPP) [1], [2] and Principal Component Analysis (PCA) [3] are two of the most popular unsupervised linear DR algorithms, with a wide range of applications. Moreover, supervised methods like Linear Discriminant Analysis (LDA) [4] have shown superior performance in many classification problems, since through the DR process they aim at achieving data class discrimination. In practice, it is often the case that many data clusters appear inside the same class, imposing the need to integrate this information into the DR approach. Along these lines, techniques such as Clustering-based Discriminant Analysis (CDA) [5] and Subclass Discriminant Analysis (SDA) [6] have been proposed. Both of them utilize a specific objective criterion that incorporates the data subclass information in an attempt to discriminate subclasses that belong to different classes, while placing no constraints on subclasses within the same class.

In parallel to the development of subspace learning techniques, a lot of work has been carried out in DR from a graph-theoretic perspective. In this direction, Graph Embedding (GE) has been introduced as a generalized framework, which unifies several existing DR methods and furthermore serves as a platform for developing novel algorithms [7].
In [2], [7] the connection of LPP, PCA and LDA with the GE framework has been illustrated, and in [7], employing GE, the authors propose Marginal Fisher Analysis (MFA). In addition, the ISOMAP [8], Locally Linear Embedding (LLE) [9] and Laplacian Eigenmaps (LE) [10] algorithms have also been interpreted within the GE framework [7].

From the perspective of GE, the data are considered as vertices of a graph, which is accompanied by two matrices, the intrinsic and the penalty matrix, weighing the edges among vertices. The intrinsic matrix encodes the similarity relationships, while the penalty matrix encodes the undesirable connections among the data. In this context, the DR task is translated into the problem of transforming the initial graph into a new one in such a way that the weights of the intrinsic matrix are reinforced, while the weights of the penalty matrix are suppressed.

Apart from the core ideas on GE presented in [7], other interesting works have also been published recently. A graph-based supervised DR method has been proposed in [11] for circumventing the problem of non-Gaussian distributed data. The importance degrees of the same-class and not-same-class vertices are encoded by the intrinsic and extrinsic graphs, respectively, based on a strictly monotonically decreasing function. Moreover, the kernel extension of that approach is also presented. In [12], the selection of the neighborhood parameters of the intrinsic and extrinsic graph matrices is performed adaptively, based on the different local manifold structure of different samples, enhancing in this way the intra-class similarity and inter-class separability.

Methodologies that convert a set of graphs into a vector space have also been presented. For instance, a novel prototype selection method from a class-labeled set of graphs has been proposed in [13]. A dissimilarity metric between a pair of graphs is established and the dissimilarities of a graph from a set of prototypes are calculated, providing an n-dimensional feature vector. Several deterministic algorithms are used to select the prototypes with the most discriminative power [13]. The flexibility of GE has also been combined with the generalization ability of the support vector machine classifier, resulting in improved classification performance. In [14], the authors propose substituting the support vector machine kernel with subspace or submanifold kernels that are constructed based on the GE framework.

Despite the intense activity around GE, no extension of GE has been proposed so far that integrates subclass information. In this paper, such an extension is proposed, leading to the novel Subclass Graph Embedding (SGE) framework, which is the main contribution of our work. Using a subclass block form in both the intrinsic and penalty graph matrices, SGE optimizes a criterion which preserves the subclass structure and, simultaneously, the local geometry of the data. The local geometry may be modelled by any similarity or distance measure, while the subclass structure may be extracted by any clustering algorithm. By choosing the appropriate parameters, SGE reduces to any of the well-known aforementioned algorithms. Along these lines, it is shown in this paper that a variety of unimodal DR algorithms are encapsulated within SGE. Furthermore, the theoretical link between SGE and the CDA and SDA methods is also established, which is another novelty of our work. Finally, the kernelization of SGE (K-SGE) is also presented.
The efficacy of SGE and K-SGE is demonstrated through a comparison between subclass DR methods and a diversity of unimodal ones – all pertaining to the SGE framework – via a series of experiments on various datasets.

The remainder of this paper is organized as follows. The subspace learning algorithms CDA and SDA are presented in Section II in order to pave the way for their connection with SGE. The novel SGE framework, along with its kernelization, is presented in Section III. The connection between the SGE framework and the several subspace learning techniques is given in Section IV. A comparison of the aforementioned methods on real-world datasets is presented in Section V. Finally, conclusions are drawn in Section VI.

II. SUBSPACE LEARNING TECHNIQUES

In this section, we provide the mathematical formulation of the subspace learning techniques CDA and SDA in order to allow their connection with the SGE framework. The other methods mentioned in the Introduction are encapsulated in the proposed SGE framework as well; however, their detailed description is omitted, as they have already been described in [7].

In the following analysis, we consider that each data sample, denoted by $\mathbf{x}$, is an $m$-dimensional real vector, i.e., $\mathbf{x} \in \mathbb{R}^m$. We also denote by $\mathbf{y} \in \mathbb{R}^{m'}$ its projection $\mathbf{y} = \mathbf{V}^T \mathbf{x}$ to a new $m'$-dimensional space using a projection matrix $\mathbf{V} \in \mathbb{R}^{m \times m'}$. CDA and SDA attempt to minimize

$$J(\mathbf{v}) = \frac{\mathbf{v}^T \mathbf{S}_W \mathbf{v}}{\mathbf{v}^T \mathbf{S}_B \mathbf{v}} \, , \quad (1)$$

where $\mathbf{S}_W$ is called the within and $\mathbf{S}_B$ the between scatter matrix [15]. These matrices are symmetric and positive semi-definite. The minimization of the ratio (1) leads to the following generalized eigenvalue decomposition problem for finding the optimal discriminant projection eigenvectors:

$$\mathbf{S}_W \mathbf{v} = \lambda \mathbf{S}_B \mathbf{v} \, . \quad (2)$$

The eigenvalues $\lambda_i$ of the above eigenproblem are by definition positive or zero:

$$0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_m \, . \quad (3)$$

Let $\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_m$ be the corresponding eigenvectors. Then the projection $\mathbf{y} = \mathbf{V}^T \mathbf{x}$ from the initial space to the new space of reduced dimensionality employs the projection matrix $\mathbf{V} = [\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_{m'}]$, whose columns are the eigenvectors $\mathbf{v}_i$, $i = 1, \ldots, m'$, with $m' \ll m$.

Looking for a linear transform that effectively separates the projected data of each class, CDA makes use of potential subclass structure. Let us denote the total number of subclasses inside the $i$-th class by $d_i$ and, for the $j$-th subclass of the $i$-th class, the number of its samples by $n_{ij}$, its $q$-th sample by $\mathbf{x}_q^{ij}$ and its mean vector by $\boldsymbol{\mu}^{ij}$. CDA attempts to minimize (1), where $\mathbf{S}_W^{(CDA)}$ is the within-subclass and $\mathbf{S}_B^{(CDA)}$ the between-subclass scatter matrix, defined in [5]:

$$\mathbf{S}_W^{(CDA)} = \sum_{i=1}^{c} \sum_{j=1}^{d_i} \sum_{q=1}^{n_{ij}} \left( \mathbf{x}_q^{ij} - \boldsymbol{\mu}^{ij} \right) \left( \mathbf{x}_q^{ij} - \boldsymbol{\mu}^{ij} \right)^T \, , \quad (4)$$

$$\mathbf{S}_B^{(CDA)} = \sum_{i=1}^{c-1} \sum_{l=i+1}^{c} \sum_{j=1}^{d_i} \sum_{h=1}^{d_l} \left( \boldsymbol{\mu}^{ij} - \boldsymbol{\mu}^{lh} \right) \left( \boldsymbol{\mu}^{ij} - \boldsymbol{\mu}^{lh} \right)^T \, . \quad (5)$$

The difference between SDA and CDA mainly lies in the definition of the within scatter matrix, while the between scatter matrix of SDA is a modified version of that of CDA. The exact definitions of the two matrices are:

$$\mathbf{S}_W^{(SDA)} = \sum_{q=1}^{n} \left( \mathbf{x}_q - \boldsymbol{\mu} \right) \left( \mathbf{x}_q - \boldsymbol{\mu} \right)^T \, , \quad (6)$$

$$\mathbf{S}_B^{(SDA)} = \sum_{i=1}^{c-1} \sum_{l=i+1}^{c} \sum_{j=1}^{d_i} \sum_{h=1}^{d_l} p_{ij} \, p_{lh} \left( \boldsymbol{\mu}^{ij} - \boldsymbol{\mu}^{lh} \right) \left( \boldsymbol{\mu}^{ij} - \boldsymbol{\mu}^{lh} \right)^T \, , \quad (7)$$

where $p_{ij} = \frac{n_{ij}}{n}$ is the relative frequency of the $j$-th cluster of the $i$-th class [6]. It is worth mentioning that $\mathbf{S}_W^{(SDA)}$ is actually the total covariance matrix of the data.
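As an illustration of the CDA criterion above, the following is a minimal sketch (not the authors' code) that builds the scatter matrices of eqs. (4)-(5) and solves the generalized eigenproblem (2) with NumPy/SciPy. The data layout (samples as columns of X), the integer class/subclass labels and the small ridge on S_B are assumptions made for the sake of a runnable example.

```python
# Minimal sketch: CDA scatter matrices, eqs. (4)-(5), and eigenproblem (2).
# X has shape (m, n) with samples as columns; y holds class labels, z subclass labels.
import numpy as np
from scipy.linalg import eigh

def cda_projection(X, y, z, m_prime, eps=1e-6):
    m, n = X.shape
    Sw = np.zeros((m, m))
    Sb = np.zeros((m, m))
    # subclass means mu^{ij}
    means = {(c, s): X[:, (y == c) & (z == s)].mean(axis=1)
             for c in np.unique(y) for s in np.unique(z[y == c])}
    # within-subclass scatter, eq. (4)
    for (c, s), mu in means.items():
        D = X[:, (y == c) & (z == s)] - mu[:, None]
        Sw += D @ D.T
    # between-subclass scatter over subclasses of *different* classes, eq. (5)
    keys = list(means)
    for a in range(len(keys)):
        for b in range(a + 1, len(keys)):
            if keys[a][0] != keys[b][0]:
                d = (means[keys[a]] - means[keys[b]])[:, None]
                Sb += d @ d.T
    # S_W v = lambda S_B v, eq. (2): keep the eigenvectors of the smallest ratios.
    # The ridge on S_B is an assumption to keep it positive definite in practice.
    w, V = eigh(Sw, Sb + eps * np.eye(m))
    return V[:, :m_prime]          # projection matrix V, columns v_1 .. v_m'
```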
The previously described DR methods, along with LPP, PCA and LDA, can be seen under a common prism, since their basic calculation element towards the construction of the corresponding optimization criteria is the similarity among the samples. Thus, we can unify them in a common framework if we consider that the samples form a graph and we set criteria on the similarities between the nodes of this graph. In the following section we describe this approach in detail.

III. SUBCLASS GRAPH EMBEDDING

In this section, the problem of dimensionality reduction is described from a graph-theoretic perspective. Before presenting the novel SGE, let us first briefly provide the main ideas of the core GE framework.

A. Graph Embedding

In the GE framework, the set of data samples to be projected into a low-dimensional space is represented by two graphs, namely the intrinsic graph $G_{int} = \{\mathcal{X}, \mathbf{W}_{int}\}$ and the penalty graph $G_{pen} = \{\mathcal{X}, \mathbf{W}_{pen}\}$, where $\mathcal{X} = \{\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_n\}$ is the set of data samples in both graphs. The intrinsic graph models the similarity connections between every pair of data samples that have to be reinforced after the projection. The penalty graph contains the connections between the data samples that must be suppressed after the projection. In both of the above matrices, these connections might have negative values, imposing the opposite effect. The choice of the values of the intrinsic and the penalty graph matrices may lead to supervised, unsupervised or semi-supervised DR algorithms.

The problem of DR can now be interpreted in another way: it is desirable to project the initial data to the new low-dimensional space such that the geometrical structure of the data is preserved. The corresponding objective function for optimization is

$$\arg\min_{tr\{\mathbf{Y} \mathbf{B} \mathbf{Y}^T\} = d} J(\mathbf{Y}) \, , \quad (8)$$

$$J(\mathbf{Y}) = \frac{1}{2} tr\Big\{ \sum_q \sum_p \left( \mathbf{y}_q - \mathbf{y}_p \right) \mathbf{W}_{int}(q, p) \left( \mathbf{y}_q - \mathbf{y}_p \right)^T \Big\} \, , \quad (9)$$

where $\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \cdots, \mathbf{y}_n]$ are the projected vectors, $d$ is a constant, $\mathbf{B}$ is a constraint matrix defined to remove an arbitrary scaling factor in the embedding, and $\mathbf{W}_{int}(q, p)$ is the value of $\mathbf{W}_{int}$ at position $(q, p)$ [7]. The structure of the objective function (9) postulates that the larger the value $\mathbf{W}_{int}(q, p)$ is, the smaller the distance between the projections of the data samples $\mathbf{x}_q$ and $\mathbf{x}_p$ has to be. By using some simple algebraic manipulations, equation (9) becomes

$$J(\mathbf{Y}) = tr\{\mathbf{Y} \mathbf{L}_{int} \mathbf{Y}^T\} \, , \quad (10)$$

where $\mathbf{L}_{int} = \mathbf{D}_{int} - \mathbf{W}_{int}$ is the intrinsic Laplacian matrix and $\mathbf{D}_{int}$ is the degree matrix, defined as the diagonal matrix which has at position $(q, q)$ the value $\mathbf{D}_{int}(q, q) = \sum_p \mathbf{W}_{int}(q, p)$. The Laplacian matrix $\mathbf{L}_{pen} = \mathbf{D}_{pen} - \mathbf{W}_{pen}$ of the penalty graph is often used as the constraint matrix $\mathbf{B}$. Thus, the above optimization problem becomes

$$\arg\min_{tr\{\mathbf{Y} \mathbf{L}_{pen} \mathbf{Y}^T\} = d} tr\{\mathbf{Y} \mathbf{L}_{int} \mathbf{Y}^T\} \, . \quad (11)$$

The optimization of the above objective function is achieved by solving the generalized eigenproblem

$$\mathbf{L}_{int} \mathbf{v} = \lambda \mathbf{L}_{pen} \mathbf{v} \, , \quad (12)$$

keeping the eigenvectors which correspond to the smallest eigenvalues. This approach leads to the optimal projection of the given data samples. In order to achieve out-of-sample projection, the linearization [7] of the above approach should be used. If we employ $\mathbf{y} = \mathbf{V}^T \mathbf{x}$, the objective function (9) becomes

$$\arg\min_{tr\{\mathbf{V}^T \mathbf{X} \mathbf{L}_{pen} \mathbf{X}^T \mathbf{V}\} = d} J(\mathbf{V}) \, , \quad (13)$$

where $J(\mathbf{V})$ is defined as

$$J(\mathbf{V}) = \frac{1}{2} tr\Big\{ \mathbf{V}^T \Big( \sum_q \sum_p \left( \mathbf{x}_q - \mathbf{x}_p \right) \mathbf{W}_{int}(q, p) \left( \mathbf{x}_q - \mathbf{x}_p \right)^T \Big) \mathbf{V} \Big\} \, , \quad (14)$$

where $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n]$. By using simple algebraic manipulations, we have

$$J(\mathbf{V}) = tr\{\mathbf{V}^T \mathbf{X} \mathbf{L}_{int} \mathbf{X}^T \mathbf{V}\} \, . \quad (15)$$

Similarly to the straight approach, the optimal eigenvectors are given by solving the generalized eigenproblem

$$\mathbf{X} \mathbf{L}_{int} \mathbf{X}^T \mathbf{v} = \lambda \mathbf{X} \mathbf{L}_{pen} \mathbf{X}^T \mathbf{v} \, . \quad (16)$$
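To make the linearized GE recipe concrete, here is a minimal sketch (not from the paper) that forms the two Laplacians from given weight matrices and solves (16). NumPy/SciPy, the column-wise data layout and the small ridge on the penalty term are assumptions; the helper names are hypothetical.

```python
# Minimal sketch: linearized Graph Embedding, eqs. (10)-(16).
# X: (m, n) data matrix with samples as columns; W_int, W_pen: (n, n) weight matrices.
import numpy as np
from scipy.linalg import eigh

def laplacian(W):
    return np.diag(W.sum(axis=1)) - W              # L = D - W

def linear_graph_embedding(X, W_int, W_pen, m_prime, eps=1e-6):
    L_int, L_pen = laplacian(W_int), laplacian(W_pen)
    A = X @ L_int @ X.T                            # X L_int X^T
    B = X @ L_pen @ X.T                            # X L_pen X^T (constraint)
    # keep the eigenvectors with the smallest eigenvalues of eq. (16);
    # the ridge is an assumption to keep B positive definite.
    w, V = eigh(A, B + eps * np.eye(B.shape[0]))
    return V[:, :m_prime]                          # projection y = V^T x
```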
B. Linear Subclass Graph Embedding

In this section, we propose a GE framework that allows the exploitation of subclass information. In the following analysis, it is assumed that the subclass labels are known. We attempt to minimize the scatter of the data samples within the same subclass, while separating data samples from subclasses that belong to different classes. Finally, we are not concerned about samples that belong to different subclasses of the same class.

Usually, in real-world problems, the local geometry of the data is related to the global supervised structure: samples that belong to the same class or subclass should be "sufficiently close" to each other. SGE actually exploits this fact. It simultaneously handles supervised and unsupervised information and, as a consequence, it combines the global labeling information with the local geometrical characteristics of the data samples. This is achieved by weighing the above connections with the similarities of the data samples. The Gaussian similarity function (17) has been used in this paper for this purpose:

$$S_{qp} = S(\mathbf{x}_q, \mathbf{x}_p) = \exp\left( -\frac{d^2(\mathbf{x}_q, \mathbf{x}_p)}{\sigma^2} \right) \, , \quad (17)$$

where $d(\mathbf{x}_q, \mathbf{x}_p)$ is a distance metric (e.g., Euclidean) and $\sigma^2$ is a parameter (variance) that determines the distance scale.

Let us denote by $\mathbf{P}$ an affinity matrix. Without limiting the generality, we assume that this matrix has block form, depending on the subclass and the class of the data samples. Using the linearized approach, we attempt to optimize a more general discrimination criterion. We consider again that $\mathbf{y} = \mathbf{V}^T \mathbf{x}$ is the projection of $\mathbf{x}$ to the new subspace. Let $\mathbf{P}^{ij}(q, p)$ be the value of $\mathbf{P}$ at position $(q, p)$ of the submatrix that contains the $j$-th subclass of the $i$-th class. Then, the proposed criterion is

$$\arg\min \, J(\mathbf{Y}) \, , \quad (18)$$

$$J(\mathbf{Y}) = \frac{1}{2} tr\Big\{ \sum_{i=1}^{c} \sum_{j=1}^{d_i} \sum_{q=1}^{n_{ij}} \sum_{p=1}^{n_{ij}} \left( \mathbf{y}_q^{ij} - \mathbf{y}_p^{ij} \right) \mathbf{P}^{ij}(q, p) \left( \mathbf{y}_q^{ij} - \mathbf{y}_p^{ij} \right)^T \Big\} \quad (19)$$

$$= \frac{1}{2} tr\Big\{ \mathbf{V}^T \Big( \sum_{i=1}^{c} \sum_{j=1}^{d_i} \sum_{q=1}^{n_{ij}} \sum_{p=1}^{n_{ij}} \left( \mathbf{x}_q^{ij} - \mathbf{x}_p^{ij} \right) \mathbf{P}^{ij}(q, p) \left( \mathbf{x}_q^{ij} - \mathbf{x}_p^{ij} \right)^T \Big) \mathbf{V} \Big\} \quad (20)$$

$$= tr\{ \mathbf{V}^T \mathbf{X} \left( \mathbf{D}_{int} - \mathbf{W}_{int} \right) \mathbf{X}^T \mathbf{V} \} \quad (21)$$

$$= tr\{ \mathbf{V}^T \mathbf{X} \mathbf{L}_{int} \mathbf{X}^T \mathbf{V} \} \, . \quad (22)$$

The derivation of (22) is omitted due to lack of space. The matrix $\mathbf{W}_{int}$ is block diagonal, with blocks that correspond to each class, and is given by

$$\mathbf{W}_{int} = \begin{bmatrix} \mathbf{W}_{int}^{1} & & & \mathbf{0} \\ & \mathbf{W}_{int}^{2} & & \\ & & \ddots & \\ \mathbf{0} & & & \mathbf{W}_{int}^{c} \end{bmatrix} \, . \quad (23)$$

$\mathbf{W}_{int}^{i}$ are block-diagonal submatrices, with blocks that correspond to the subclasses, and are given by

$$\mathbf{W}_{int}^{i} = \begin{bmatrix} \mathbf{P}^{i1} & & & \mathbf{0} \\ & \mathbf{P}^{i2} & & \\ & & \ddots & \\ \mathbf{0} & & & \mathbf{P}^{id_i} \end{bmatrix} \, . \quad (24)$$

$\mathbf{P}^{ij}$ is the submatrix of $\mathbf{P}$ that corresponds to the data of the $j$-th cluster of the $i$-th class. By looking carefully at the form of $\mathbf{W}_{int}$, it is clear that the intrinsic degree matrix $\mathbf{D}_{int}$ has values

$$\mathbf{D}_{int}\Big( \sum_{s=0}^{i-1} \sum_{t=0}^{j-1} n_{st} + q \, , \; \sum_{s=0}^{i-1} \sum_{t=0}^{j-1} n_{st} + q \Big) = \sum_p \mathbf{P}^{ij}(q, p) \, , \quad (25)$$

where $p$ runs over the indices of the $j$-th cluster of the $i$-th class.
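A minimal sketch of the intrinsic graph construction just described (assumed helper names, not the authors' code): the Gaussian affinity of eq. (17) is masked so that only pairs from the same subclass of the same class are connected. When the samples are ordered by class and subclass this yields exactly the block-diagonal W_int of (23)-(24); for arbitrary ordering it gives the same matrix up to a permutation.

```python
# Minimal sketch: SGE intrinsic graph matrix, eqs. (17), (23)-(24).
# X: (m, n) data with samples as columns; y: class labels; z: subclass labels.
import numpy as np

def gaussian_affinity(X, sigma2):
    # S(x_q, x_p) = exp(-d^2(x_q, x_p) / sigma^2), eq. (17), Euclidean distance
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    return np.exp(-sq / sigma2)

def sge_intrinsic(X, y, z, sigma2):
    P = gaussian_affinity(X, sigma2)
    same_subclass = (y[:, None] == y[None, :]) & (z[:, None] == z[None, :])
    return np.where(same_subclass, P, 0.0)        # W_int of eqs. (23)-(24)
```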
In parallel, we demand to maximize a criterion which encodes the similarities among the centroid vectors of the subclasses. Let the value $Q_{ij}^{lh}$ express the similarity between the centroid vectors $\boldsymbol{\mu}^{ij}$ and $\boldsymbol{\mu}^{lh}$. The more similar two centroids that belong to different classes are, the further apart their projections $\mathbf{m}^{ij} = \mathbf{V}^T \boldsymbol{\mu}^{ij}$ have to be from each other:

$$\arg\max \, G(\mathbf{m}^{ij}) \, , \quad (26)$$

$$G(\mathbf{m}^{ij}) = tr\Big\{ \sum_{i=1}^{c-1} \sum_{l=i+1}^{c} \sum_{j=1}^{d_i} \sum_{h=1}^{d_l} Q_{ij}^{lh} \left( \mathbf{m}^{ij} - \mathbf{m}^{lh} \right) \left( \mathbf{m}^{ij} - \mathbf{m}^{lh} \right)^T \Big\} \quad (27)$$

$$= tr\Big\{ \mathbf{V}^T \Big( \sum_{i=1}^{c-1} \sum_{l=i+1}^{c} \sum_{j=1}^{d_i} \sum_{h=1}^{d_l} Q_{ij}^{lh} \left( \boldsymbol{\mu}^{ij} - \boldsymbol{\mu}^{lh} \right) \left( \boldsymbol{\mu}^{ij} - \boldsymbol{\mu}^{lh} \right)^T \Big) \mathbf{V} \Big\} \quad (28)$$

$$= tr\{ \mathbf{V}^T \mathbf{X} \left( \mathbf{D}_{pen} - \mathbf{W}_{pen} \right) \mathbf{X}^T \mathbf{V} \} \quad (29)$$

$$= tr\{ \mathbf{V}^T \mathbf{X} \mathbf{L}_{pen} \mathbf{X}^T \mathbf{V} \} \, . \quad (30)$$

Again, the derivation of (30) is omitted due to lack of space. The block matrix $\mathbf{W}_{pen}$ in (29) consists of block submatrices:

$$\mathbf{W}_{pen} = \begin{bmatrix} \mathbf{W}_{pen}^{1,1} & \mathbf{W}_{pen}^{1,2} & \cdots & \mathbf{W}_{pen}^{1,c} \\ \mathbf{W}_{pen}^{2,1} & \mathbf{W}_{pen}^{2,2} & \cdots & \mathbf{W}_{pen}^{2,c} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{W}_{pen}^{c,1} & \mathbf{W}_{pen}^{c,2} & \cdots & \mathbf{W}_{pen}^{c,c} \end{bmatrix} \, . \quad (31)$$

The submatrices $\mathbf{W}_{pen}^{i,i}$ lying on the main block diagonal are given by

$$\mathbf{W}_{pen}^{i,i} = \begin{bmatrix} \mathbf{W}^{i1} & & & \mathbf{0} \\ & \mathbf{W}^{i2} & & \\ & & \ddots & \\ \mathbf{0} & & & \mathbf{W}^{id_i} \end{bmatrix} \, , \quad (32)$$

where $\mathbf{W}^{ij}$ corresponds to the $j$-th subclass of the $i$-th class and is given by

$$\mathbf{W}^{ij} = -\frac{\sum_{\omega \neq i} \sum_{t=1}^{d_\omega} Q_{ij}^{\omega t}}{(n_{ij})^2} \, \mathbf{e}^{n_{ij}} \left( \mathbf{e}^{n_{ij}} \right)^T \, , \quad (33)$$

where $\mathbf{e}^{n_{ij}} = [1 \, 1 \cdots 1]^T$ is the vector of $n_{ij}$ ones. Respectively, the off-diagonal submatrices of $\mathbf{W}_{pen}$ are given by

$$\mathbf{W}_{pen}^{i,l} = \begin{bmatrix} \mathbf{W}_{i1}^{l1} & \mathbf{W}_{i1}^{l2} & \cdots & \mathbf{W}_{i1}^{ld_l} \\ \mathbf{W}_{i2}^{l1} & \mathbf{W}_{i2}^{l2} & \cdots & \mathbf{W}_{i2}^{ld_l} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{W}_{id_i}^{l1} & \mathbf{W}_{id_i}^{l2} & \cdots & \mathbf{W}_{id_i}^{ld_l} \end{bmatrix} \, , \quad i \neq l \, , \quad (34)$$

where

$$\mathbf{W}_{ij}^{lh} = \frac{Q_{ij}^{lh}}{n_{ij} \, n_{lh}} \, \mathbf{e}^{n_{ij}} \left( \mathbf{e}^{n_{lh}} \right)^T \, . \quad (35)$$

It can be easily shown that $\mathbf{D}_{pen} = \mathbf{0}$, so that $\mathbf{L}_{pen} = -\mathbf{W}_{pen}$.

C. Kernel Subclass Graph Embedding

In this section, the kernelization of SGE is presented. Let us denote by $\mathcal{X}$ the initial data space, by $\mathcal{F}$ a Hilbert space and by $f$ the non-linear mapping function from $\mathcal{X}$ to $\mathcal{F}$. The main idea is to first map the original data from the initial space into another high-dimensional Hilbert space and then perform linear subspace analysis in that space. If we denote by $m_F$ the dimensionality of the Hilbert space, then the above procedure is described as

$$\mathcal{X} \ni \mathbf{x}_q \rightarrow \mathbf{y}_q = f(\mathbf{x}_q) = \begin{bmatrix} \sum_{p=1}^{n} a_{1p} \, k(\mathbf{x}_q, \mathbf{x}_p) \\ \vdots \\ \sum_{p=1}^{n} a_{m_F p} \, k(\mathbf{x}_q, \mathbf{x}_p) \end{bmatrix} \in \mathcal{F} \, , \quad (36)$$

where $k$ is the kernel function. From the above equation it is obvious that

$$\mathbf{Y} = \mathbf{A}^T \mathbf{K} \, , \quad (37)$$

where $\mathbf{K}$ is the Gram matrix, which has at position $(q, p)$ the value $K_{qp} = k(\mathbf{x}_q, \mathbf{x}_p)$, and

$$\mathbf{A} = [\mathbf{a}_1 \cdots \mathbf{a}_{m_F}] = \begin{bmatrix} a_{11} & \cdots & a_{m_F 1} \\ \vdots & \ddots & \vdots \\ a_{1n} & \cdots & a_{m_F n} \end{bmatrix} \quad (38)$$

is the map coefficient matrix. Consequently, the final SGE optimization becomes

$$\arg\min_{tr\{\mathbf{A}^T \mathbf{K} \mathbf{L}_{pen} \mathbf{K} \mathbf{A}\} = d} tr\{\mathbf{A}^T \mathbf{K} \mathbf{L}_{int} \mathbf{K} \mathbf{A}\} \, . \quad (39)$$

Similarly to the linear case, in order to find the optimal projections, we solve the generalized eigenproblem

$$\mathbf{K} \mathbf{L}_{int} \mathbf{K} \mathbf{a} = \lambda \mathbf{K} \mathbf{L}_{pen} \mathbf{K} \mathbf{a} \, , \quad (40)$$

keeping the eigenvectors that correspond to the smallest eigenvalues.
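As a rough illustration of the kernel variant just described, the following sketch (assumed details, not the authors' code) forms an RBF Gram matrix and solves (40); the RBF choice mirrors the kernel parameter discussed in the experiments, and rbf_gram / kernel_sge are hypothetical helper names.

```python
# Minimal sketch: Kernel SGE, eqs. (36)-(40), with an RBF kernel Gram matrix.
import numpy as np
from scipy.linalg import eigh

def rbf_gram(X, sigma2):
    # K_qp = exp(-||x_q - x_p||^2 / sigma^2)
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    return np.exp(-sq / sigma2)

def kernel_sge(X, L_int, L_pen, m_prime, sigma2, eps=1e-6):
    K = rbf_gram(X, sigma2)
    A_mat = K @ L_int @ K
    B_mat = K @ L_pen @ K
    # eq. (40): keep eigenvectors with the smallest eigenvalues; the ridge is
    # an assumption to keep the right-hand side positive definite.
    w, A = eigh(A_mat, B_mat + eps * np.eye(B_mat.shape[0]))
    A = A[:, :m_prime]                 # coefficient matrix of eq. (38)
    return A.T @ K                     # projected training data Y = A^T K, eq. (37)
```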
IV. SGE AS A GENERAL DIMENSIONALITY REDUCTION FRAMEWORK

In this section, it is shown that SGE is a generalized framework that can be used for subspace learning, since all the standard approaches are specific cases of SGE. Let us use the Gaussian similarity function (17) in order to construct the affinity matrix. In the following analysis, we initially let the variance $\sigma^2$ of the Gaussian tend to infinity. Hence, $S(\mathbf{x}_q, \mathbf{x}_p) = 1$, $\forall (q, p) \in \{1, 2, \cdots, n\}^2$.

Let the intrinsic matrix elements be

$$\mathbf{P}^{ij}(q, p) = \begin{cases} \frac{S(\mathbf{x}_q, \mathbf{x}_p)}{n_{ij}} = \frac{1}{n_{ij}} \, , & \text{if } \mathbf{x}_q, \mathbf{x}_p \in C_{ij} \, , \\ 0 \, , & \text{otherwise} \, , \end{cases} \quad (41)$$

where $C_{ij}$ is the set of the samples that belong to the $j$-th subclass of the $i$-th class. Obviously, (20) becomes the within-subclass criterion of CDA (also see eq. (4)). Thus, in this case, $\mathbf{W}_{int}$ is the intrinsic graph matrix of CDA. Let also

$$Q_{ij}^{lh} = S(\boldsymbol{\mu}^{ij}, \boldsymbol{\mu}^{lh}) = 1 \, , \quad \forall \, i, j, h, l \quad (42)$$

be the penalty matrix elements. Then, (28) becomes the between-subclass criterion of CDA (also see eq. (5)). Thus, $\mathbf{W}_{pen}$ is the penalty graph matrix of CDA, and the connection between CDA and GE has been established.

Let us consider that each data sample constitutes its own class, i.e., $c = n$, $d_i = 1$ and $n_i = 1$, $\forall i \in \{1, 2, \cdots, c\}$. Thus, each class-block of the penalty graph matrix reduces to a single element of the matrix. Obviously, each data sample coincides with the mean of its class. By setting

$$Q_{i1}^{l1} = \frac{S(\boldsymbol{\mu}^i, \boldsymbol{\mu}^l)}{n} = \frac{1}{n} \, , \quad \forall (i, l) \in \{1, 2, \cdots, c\}^2 \, , \quad (43)$$

we obtain

$$-\frac{\sum_{\omega \neq i} \sum_{t=1}^{d_\omega} Q_{i1}^{\omega t}}{(n_i)^2} = -\frac{n - 1}{n} = \frac{1}{n} - 1 \, . \quad (44)$$

These values lie on the main diagonal of the penalty graph matrix. Regarding the off-diagonal elements we have

$$\frac{Q_{i1}^{l1}}{n_i \, n_l} = \frac{1}{n} \, . \quad (45)$$

It can be easily shown that the penalty degree matrix is $\mathbf{D}_{pen} = \mathbf{0}$, so that $\mathbf{L}_{pen} = -\mathbf{W}_{pen}$. Obviously, $\mathbf{L}_{pen} = \mathbf{I} - \frac{1}{n} \mathbf{e}^n (\mathbf{e}^n)^T$ and $\mathbf{X} \mathbf{L}_{pen} \mathbf{X}^T$ becomes the covariance matrix $\mathbf{C}$ of the data. By using the identity matrix as intrinsic graph matrix, SGE becomes identical to PCA:

$$\arg\min \frac{tr\{\mathbf{V}^T \mathbf{X} \mathbf{L}_{int} \mathbf{X}^T \mathbf{V}\}}{tr\{\mathbf{V}^T \mathbf{X} \mathbf{L}_{pen} \mathbf{X}^T \mathbf{V}\}} = \arg\min \frac{tr\{\mathbf{V}^T \mathbf{I} \mathbf{V}\}}{tr\{\mathbf{V}^T \mathbf{C} \mathbf{V}\}} \, , \quad (46)$$

leading to the following generalized eigenproblem:

$$\mathbf{I} \mathbf{v} = \lambda \mathbf{C} \mathbf{v} \, , \quad (47)$$

solved by keeping the smallest eigenvalues, or, by setting $\mu = \frac{1}{\lambda}$ (since $\lambda \neq 0$), this leads to

$$\mathbf{C} \mathbf{v} = \mu \mathbf{I} \mathbf{v} \, , \quad (48)$$

solved by keeping the greatest eigenvalues, which is obviously the PCA solution.

Now, consider that every class consists of a unique subclass, thus $d_i = 1$, $\forall i \in \{1, 2, \ldots, c\}$. If we set

$$\mathbf{P}(q, p) = \begin{cases} \frac{S(\mathbf{x}_q, \mathbf{x}_p)}{n_i} = \frac{1}{n_i} \, , & \text{if } \mathbf{x}_q, \mathbf{x}_p \in C_i \, , \\ 0 \, , & \text{otherwise} \, , \end{cases} \quad (49)$$

then the intrinsic graph matrix becomes that of LDA. Furthermore, if we set

$$Q_{i1}^{l1} = \frac{n_i \, n_l}{n} \, , \quad \forall (i, l) \in \{1, \ldots, c\}^2 \, , \quad (50)$$

then

$$-\frac{\sum_{\omega \neq i} \sum_{t=1}^{d_\omega} Q_{i1}^{\omega t}}{(n_i)^2} = \frac{n_i - n}{n \, n_i} \quad (51)$$

and

$$\frac{Q_{i1}^{l1}}{n_i \, n_l} = \frac{1}{n} \, . \quad (52)$$

These are the values of the penalty graph matrix of LDA. So, by taking the Laplacians of the above matrices, we end up with the LDA algorithm.

Let us now reject the assumption that the variance of the Gaussian tends to infinity. Consider that there is only one class which contains the whole set of the data, i.e., $c = 1$. Also consider that there are no subclasses within this unique class, i.e., $d_1 = 1$. In this case the intrinsic graph matrix becomes equal to $\mathbf{P}$. Thus, by setting $\mathbf{P}$ equal to the affinity matrix $\mathbf{S}$, the intrinsic Laplacian matrix becomes that of LPP, and by utilizing the identity matrix as the penalty Laplacian matrix we obviously get the LPP algorithm. Since we consider a unique class which contains a unique subclass, from (31) and (32) we have that $\mathbf{W}_{pen} = \mathbf{W}^{11}$. The values of $\mathbf{W}^{11}$ are given by (33), which in this case reduces to

$$\mathbf{W}^{11} = -\frac{Q_{11}^{11}}{n^2} \, \mathbf{e}^n (\mathbf{e}^n)^T \, . \quad (53)$$

If we set

$$Q_{11}^{11} = \frac{n^2}{1 - n} \, , \quad (54)$$

then $\mathbf{W}_{pen} = \mathbf{W}^{11} = \frac{1}{n-1} \mathbf{e}^n (\mathbf{e}^n)^T$. Consequently,

$$\mathbf{L}_{pen} = \begin{bmatrix} 1 & \frac{1}{1-n} & \cdots & \frac{1}{1-n} \\ \frac{1}{1-n} & 1 & \cdots & \frac{1}{1-n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{1}{1-n} & \frac{1}{1-n} & \cdots & 1 \end{bmatrix} \, . \quad (55)$$

Thus, if we make the assumption that the number of data samples becomes very large, then asymptotically we have $\mathbf{L}_{pen} = \mathbf{I}$.

Finally, to complete the analysis, if we consider as the intrinsic Laplacian matrix the matrix

$$\mathbf{L}_{int} = \mathbf{I} - \frac{1}{n} \mathbf{e}^n (\mathbf{e}^n)^T \quad (56)$$

and if we set

$$Q_{ij}^{lh} = \frac{n_{ij} \, n_{lh}}{n} \quad (57)$$

in (33) and (35), SGE becomes identical to SDA. The parameters that determine the connection of the several methods with SGE are summarized in Table I.

TABLE I: Dimensionality Reduction Using the SGE Framework

Method | P (L_int)                                              | Q (L_pen)                              | sigma^2 | c | d_i | d
LPP    | P^{11}(q,p) = exp(-d^2(x_q, x_p)/sigma^2), forall x_q, x_p | Q^{11}_{11} = n^2/(1-n)  (L_pen = I)   | sigma^2 | 1 | 1   | 1
PCA    | L_int = I                                              | Q^{l1}_{i1} = 1/n                      | inf     | n | 1   | n
LDA    | P^{i1}(q,p) = 1/n_i, x_q, x_p in c_i                   | Q^{l1}_{i1} = n_i n_l / n              | inf     | c | 1   | c
CDA    | P^{ij}(q,p) = 1/n_{ij}, x_q, x_p in c_{ij}             | Q^{lh}_{ij} = 1                        | inf     | c | d_i | d
SDA    | L_int = I - (1/n) e^n (e^n)^T                          | Q^{lh}_{ij} = n_{ij} n_{lh} / n        | inf     | c | d_i | d

V. EXPERIMENTAL RESULTS

We conducted 5-fold cross-validation classification experiments on several real-world datasets using the proposed linear and kernel SGE framework. For extracting the subclass structure automatically, we have utilized the multiple Spectral Clustering technique [16], keeping the most plausible partition for each dataset. For classifying the data, the Nearest Centroid (NC) classifier has been used with the LPP, PCA and LDA algorithms, while the Nearest Cluster Centroid (NCC) classifier [17] has been used with the CDA and SDA algorithms. In NCC, the cluster centroids are calculated and the test sample is assigned to the class of the nearest cluster centroid.
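The NCC rule just described can be sketched as follows (a minimal illustration, not the implementation used in the paper); ncc_fit and ncc_predict are hypothetical helpers operating on data already projected by one of the DR methods.

```python
# Minimal sketch: Nearest Cluster Centroid (NCC) classification.
# Y: projected training data of shape (m', n); y: class labels; z: subclass labels.
import numpy as np

def ncc_fit(Y, y, z):
    cents, labels = [], []
    for c in np.unique(y):
        for s in np.unique(z[y == c]):
            cents.append(Y[:, (y == c) & (z == s)].mean(axis=1))  # cluster centroid
            labels.append(c)                                      # class of that cluster
    return np.array(cents), np.array(labels)

def ncc_predict(cents, labels, y_test):
    # assign the test sample to the class of the nearest cluster centroid
    d = np.linalg.norm(cents - y_test[None, :], axis=1)
    return labels[np.argmin(d)]
```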
NC and NCC were selected because they provide the optimal classification solutions in Bayesian terms, thus showing whether the DR methods have reached the goal described by their specific criterion.

A. Classification experiments

For the classification experiments, we have used diverse publicly available datasets offered for various classification problems. More specifically, FER-AIIA, BU, JAFFE and KANADE were used for facial expression recognition and XM2VTS for face frontal-view recognition, while MNIST and SEMEION were used for optical digit recognition. Finally, IONOSPHERE, MONK and PIMA were used in order to further extend our experimental study to diverse data classification problems.

The cross-validation classification accuracy rates of the several subspace learning methods over the utilized datasets are summarized in Table II. The optimal dimensionality of the projected space that returned these results is given in parentheses. For each dataset, the best performance rate among the linear and the kernel methods separately is highlighted in bold, while the best overall performance rate among all methods, both linear and kernel, is surrounded by a rectangle. The classification performance rank of each method is also reported in the last two rows of Table II. Specific Rank denotes the rank of each method among the linear and the kernel methods, independently. Overall Rank refers to the rank of each method among both the linear and the kernel methods. The ranking has been obtained through a post-hoc Bonferroni test [18].

TABLE II: Cross-Validation Classification Accuracies (%) of the Linear and Kernel Methods on Several Real-World Datasets

DATASET       | LPP       | PCA       | LDA     | CDA      | SDA      | KLPP      | KPCA      | KDA     | KCDA     | KSDA
FER-AIIA      | 40.9(3)   | 31.0(120) | 64.6(6) | 73.2     | 75.5(11) | 50.2(252) | 41.5(29)  | 54.9(6) | 56.1(12) | 53.5(12)
BU            | 39.4(298) | 38.1(49)  | 51.6(6) | 49.1(16) | 52.3(15) | 52.7(317) | 35.9(290) | 46.6(6) | 41.0(13) | 48.0(14)
JAFFE         | 46.8(18)  | 37.6(39)  | 53.2(6) | 40.0(15) | 54.1(6)  | 28.8(98)  | 25.9(58)  | 42.4(6) | 36.1(18) | 46.3(5)
KANADE        | 34.2(92)  | 43.3(46)  | 67.1(6) | 59.7(7)  | 67.1(5)  | 32.7(99)  | 33.2(88)  | 44.3(6) | 40.0(6)  | 38.5(6)
MNIST         | 71.1(259) | 79.9(135) | 84.6(9) | 84.8(15) | 85.1(14) | 81.4(299) | 64.5(155) | 86.0(9) | 83.4(19) | 85.2(15)
SEMEION       | 53.6(99)  | 83.2(55)  | 88.2(9) | 89.2(19) | 89.4(19) | 83.8(99)  | 77.4(77)  | 95.3(9) | 94.1(19) | 95.9(19)
XM2VTS        | 95.7(54)  | 92.0(86)  | 70.5(1) | 98.1(3)  | 97.4(2)  | 71.3(297) | 74.7(56)  | 61.3(1) | 71.5(3)  | 57.3(4)
IONOSPHERE    | 84.6(23)  | 72.3(15)  | 78.9(1) | 80.6(2)  | 83.4(2)  | 83.7(23)  | 70.3(2)   | 92.9(1) | 93.1(1)  | 92.9(1)
MONK 1        | 66.7(3)   | 68.3(5)   | 50.8(1) | 70.0(4)  | 74.2(3)  | 63.3(2)   | 72.5(1)   | 55.8(1) | 58.3(4)  | 61.7(3)
MONK 2        | 56.0(1)   | 53.3(4)   | 52.0(1) | 54.2(1)  | 54.0(2)  | 54.8(1)   | 59.8(3)   | 69.7(1) | 78.7(1)  | 54.5(1)
MONK 3        | 77.2(5)   | 80.9(4)   | 49.4(1) | 74.6(2)  | 66.3(2)  | 62.5(2)   | 79.2(5)   | 51.7(1) | 67.5(2)  | 58.3(1)
PIMA          | 61.8(1)   | 63.5(6)   | 56.5(1) | 60.5(3)  | 73.5(3)  | 50.7(3)   | 67.5(4)   | 48.9(1) | 52.5(3)  | 52.9(1)
SPECIFIC RANK | 3.3       | 3.8       | 3.6     | 2.5      | 1.6      | 3.5       | 3.4       | 2.9     | 2.4      | 2.7
OVERALL RANK  | 5.8       | 6.4       | 6.0     | 4.2      | 3.0      | 6.7       | 6.7       | 5.4     | 5.2      | 5.5

An immediate remark from Table II is that, in both the linear and the kernel case, multimodal methods exhibit better classification performance than the unimodal ones. In particular, the top overall performance is shown by SDA, followed by CDA, while the worst performance is shown by KLPP and KPCA. This result clearly shows that the inclusion of subclass information in the DR process offers a strong potential for improving the performance of the state-of-the-art in many classification domains.

In comparing linear with kernel methods, a simple calculation yields a mean overall rank of 5.08 for the linear methods and 5.90 for the kernel ones. Although the average performance of the linear methods is clearly better than that of the kernel ones, we must admit that there is ample space for improving the kernel results by varying the RBF parameter, as the selection of this parameter is not trivial and may easily lead to over-fitting. In fact, the top performance rates presented in this paper have been obtained by testing indicative values of this parameter. As a matter of fact, it is interesting to observe that the use of kernels proves to be beneficial for some methods on certain datasets, while it deteriorates the performance of others.
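For completeness, the evaluation protocol of this section can be sketched as a plain 5-fold cross-validation loop. Everything beyond "5-fold cross-validation accuracy" is an assumption: fit_projection and classify are placeholders standing in for any of the compared DR methods and for the NC/NCC rules, respectively.

```python
# Minimal sketch: 5-fold cross-validation accuracy for a DR method + centroid classifier.
# X: (m, n) data with samples as columns; y: class labels.
import numpy as np

def five_fold_accuracy(X, y, fit_projection, classify, seed=0):
    n = X.shape[1]
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, 5)
    accs = []
    for k in range(5):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(5) if j != k])
        V = fit_projection(X[:, train], y[train])        # e.g. an SGE/CDA/SDA projection
        Ytr, Yte = V.T @ X[:, train], V.T @ X[:, test]   # project train and test data
        preds = classify(Ytr, y[train], Yte)             # e.g. the NC or NCC rule
        accs.append(np.mean(preds == y[test]))
    return float(np.mean(accs))
```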
VI. CONCLUSIONS

In this paper, data subclass information has been incorporated within Graph Embedding (GE), leading to a novel Subclass Graph Embedding (SGE) framework, which constitutes the main contribution of our work. In particular, it has been shown that SGE constitutes a generalization of GE, encapsulating a number of state-of-the-art unimodal subspace learning techniques already integrated within GE. Besides, the connection of SGE with subspace learning algorithms that use subclass information in the embedding process has also been analytically proven. The kernelization of SGE has also been presented. Through an extensive experimental study, it has been shown that subclass learning techniques outperform a number of state-of-the-art unimodal learning methods on many real-world datasets pertaining to various classification domains. In addition, the experimental results highlight the superiority, in terms of classification performance, of the linear methods against the kernel ones.

In the near future, we intend to employ SGE as a template for designing novel DR methods. For instance, as current subclass methods are strongly dependent on the underlying distribution of the data, we anticipate that novel methods, which use neighbourhood information among the data of the several subclasses, will succeed in alleviating this sort of limitation.

REFERENCES

[1] X. He and P. Niyogi, "Locality preserving projections," in NIPS, S. Thrun, L. K. Saul, and B. Schölkopf, Eds. MIT Press, 2003.
[2] X. He, S. Yan, Y. Hu, P. Niyogi, and H. Zhang, "Face recognition using laplacianfaces," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 3, pp. 328–340, 2005.
[3] I. Jolliffe, Principal Component Analysis. Springer Verlag, 1986.
[4] D. J. Kriegman, J. P. Hespanha, and P. N. Belhumeur, "Eigenfaces vs. fisherfaces: Recognition using class-specific linear projection," in ECCV, 1996, pp. I:43–58.
[5] X. W. Chen and T. S. Huang, "Facial expression recognition: A clustering-based approach," Pattern Recognition Letters, vol. 24, no. 9-10, pp. 1295–1302, Jun. 2003.
[6] M. L. Zhu and A. M. Martinez, "Subclass discriminant analysis," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 8, pp. 1274–1286, Aug. 2006.
[7] S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin, "Graph embedding and extensions: A general framework for dimensionality reduction," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 40–51, 2007.
[8] J. B. Tenenbaum, V. de Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, no. 5500, pp. 2319–2323, Dec. 2000.
[9] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, no. 5500, pp. 2323–2326, Dec. 2000.
[10] M. Belkin and P. Niyogi, "Laplacian eigenmaps and spectral techniques for embedding and clustering," Advances in Neural Information Processing Systems (NIPS), vol. 14, pp. 585–591, 2001.
[11] Y. Cui and L. Fan, "A novel supervised dimensionality reduction algorithm: Graph-based fisher analysis," Pattern Recognition, vol. 45, no. 4, pp. 1471–1481, 2012.
[12] J. Shi, Z. Jiang, and H. Feng, "Adaptive graph embedding discriminant projections," Neural Processing Letters, pp. 1–16, 2013.
[13] E. Zare Borzeshi, M. Piccardi, K. Riesen, and H. Bunke, "Discriminative prototype selection methods for graph embedding," Pattern Recognition, 2012.
[14] G. Arvanitidis and A. Tefas, "Exploiting graph embedding in support vector machines," in Machine Learning for Signal Processing (MLSP), 2012 IEEE International Workshop on. IEEE, 2012, pp. 1–6.
[15] R. A. Fisher, "The statistical utilization of multiple measurements," Annals of Eugenics, vol. 8, pp. 376–386, 1938.
[16] A. Azran and Z. Ghahramani, "Spectral methods for automatic multiscale data clustering," in IEEE Computer Vision and Pattern Recognition (CVPR) (1). IEEE Computer Society, 2006, pp. 190–197.
[17] A. Maronidis, A. Tefas, and I. Pitas, "Frontal view recognition using spectral clustering and subspace learning methods," in ICANN (1), ser. Lecture Notes in Computer Science, K. I. Diamantaras, W. Duch, and L. S. Iliadis, Eds., vol. 6352. Springer, 2010, pp. 460–469.
[18] O. J. Dunn, "Multiple comparisons among means," Journal of the American Statistical Association, vol. 56, no. 293, pp. 52–64, 1961.