International Journal of Computational Intelligence Systems

Volume 9, Issue 6, December 2016, Pages 1041 - 1054

Manifold Regularized Proximal Support Vector Machine via Generalized Eigenvalue

Authors
Jun Liang*, Fei-yun Zhang, Xiao-xia Xiong, Xiao-bo Chen, Long Chen, Guo-hui Lan
Automotive Engineering Research Institute, Jiangsu University, Zhenjiang, 212013, P. R. China
* Corresponding author. E-mail: liangjun@ujs.edu.cn.
Received 8 April 2014, Accepted 11 April 2016, Available Online 1 December 2016.
DOI
10.1080/18756891.2016.1256570
Keywords
Support vector machines; Generalized eigenvalues; Locality preserving projections; Manifold regularization
Abstract

Proximal support vector machine via generalized eigenvalue (GEPSVM) is a recently proposed binary classification technique which seeks two nonparallel planes such that each one is closest to one of the two datasets while furthest away from the other. In this paper, we propose a novel method called Manifold Regularized Proximal Support Vector Machine via Generalized Eigenvalue (MRGEPSVM), which incorporates the local geometry information within each class into GEPSVM through a regularization technique. Each plane is required to fit its dataset as closely as possible while preserving the intrinsic geometric structure of that class via manifold regularization. MRGEPSVM is also extended to the nonlinear case by the kernel trick. The effectiveness of the method is demonstrated on illustrative examples as well as on a number of public data sets. These results show the advantages of the proposed approach in both computation speed and test set correctness.

Copyright
© 2016. the authors. Co-published by Atlantis Press and Taylor & Francis
Open Access
This is an open access article under the CC BY-NC license (http://creativecommons.org/licences/by-nc/4.0/).

1. Introduction

Standard SVMs1, which are powerful tools for data classification and regression, have come to play a dominant role in the machine learning and pattern recognition communities. The approach is systematic and motivated by Statistical Learning Theory2, which seeks the optimal separating hyperplane by minimizing the structural risk instead of the empirical risk. In binary classification, a point is assigned to one of two disjoint half spaces in either the original input space or a higher dimensional feature space. In order to find the optimal separating hyperplane, SVMs need to solve a quadratic programming problem (QPP), which is time-consuming for large sample sizes.

Recently, several simpler classification methods were proposed3,4. In contrast to SVMs, in binary classification they all aim at finding two planes such that each plane is closest to the points of its own class and furthest away from the points of the other class. This idea not only leads to faster and simpler algorithms than conventional SVMs but also avoids the unbalanced sample distribution problem of SVMs. Among these methods, the generalized eigenvalue proximal SVM (GEPSVM)5,6,7 performs binary classification by formulating two eigenvalue problems, instead of a quadratic programming problem, to generate two nonparallel planes, which makes it extremely fast in real-world applications. The eigenvectors corresponding to the smallest eigenvalues determine the above two planes, as shown in Fig. 1.

Fig. 1.

An example of the standard SVM

The last decade has witnessed the emergence of manifold learning as a vigorous paradigm for dimensionality reduction and data visualization. The basic assumption of manifold learning is that, although each data point may consist of thousands of features, it may actually be sampled from a low dimensional manifold embedded in a high dimensional space. Popular manifold learning methods, including Locally Linear Embedding (LLE)8, ISOMAP9, Laplacian Eigenmaps10 and their various extensions11,12, have been applied successfully in many fields. In addition to applications in dimensionality reduction, manifold regularization13 has been proposed as a new form of regularization that allows us to exploit the geometry of the distribution of the data points. It has also been used in SVMs, e.g., Laplacian SVMs14,15, to take advantage of unlabeled data points. In the literature, manifold learning, especially the locality preserving projections criterion16, is widely used to reduce the dimensionality of the original data in feature extraction or to penalize classifiers with unlabeled data samples in semi-supervised learning. However, there is little evidence showing whether a supervised classifier can benefit from manifold structure derived directly from labeled samples. Recently, the idea of manifold learning has been applied to GEPSVM by considering the local information within each class to eliminate the influence of outliers17.

With the geometric intuition of GEPSVM and the essence of manifold learning in mind, we argue that manifold regularization is particularly suitable for GEPSVM and its variants. In contrast with conventional SVMs, GEPSVM and other recently proposed classifiers, e.g., Twin SVMs4,18, are all based on seeking planes that best fit the data points of the same class rather than merely separating points of different classes. Twin SVMs are nonparallel planar classifiers that construct one hyperplane for each of the two classes, so that the samples of a class lie as close as possible to their own hyperplane and as far as possible from the other. In other words, these methods are not only "discriminative" but also "expressive". Therefore, it is natural to attempt to combine manifold structure with these methods. Another reason is that the numerators of the objective functions of GEPSVM have formulations similar to least squares regression, which is known to tend to overfit the data points. That is why Tikhonov regularization terms19 are integrated into GEPSVM to enhance its generalization ability and avoid overfitting. However, they merely penalize the numerators while ignoring the denominators. Furthermore, in many cases, manifold regularization is more powerful since it takes the intrinsic geometric structure of the data points into account. In this paper, manifold regularization, specifically locality preserving projections (LPP)20,21, is first extended to our context and then incorporated into the classical GEPSVM. Locality preserving projections are linear projective maps obtained by solving a variational problem that optimally preserves the neighborhood structure of the data set22,23. LPP can be seen as an alternative to Principal Component Analysis (PCA), a classical linear technique that projects the data along the directions of maximal variance. When the high dimensional data lie on a low dimensional manifold embedded in the ambient space, the locality preserving projections are obtained by finding optimal linear approximations to the eigenfunctions of the Laplace-Beltrami operator on the manifold24,25. LPP shares many of the data representation properties of nonlinear techniques such as Laplacian Eigenmaps or Locally Linear Embedding, yet LPP is linear and, more crucially, is defined everywhere in the ambient space rather than only on the training data points. LPP may be conducted in the original space or in the reproducing kernel Hilbert space into which the data points are mapped, which gives rise to kernel LPP. We should emphasize that the purpose of manifold regularization in our context is different from the classical one, whose primary aim is to utilize the distribution information of unlabeled samples. Unlike semi-supervised learning26, the intention of the proposed approach is to demonstrate that the manifold structure of labeled samples is also beneficial to supervised learning, even when no unlabeled samples are available. The Twin SVM, a nonparallel planar classifier, is illustrated in Fig. 2.

Fig. 2.

An example of the Twin SVM

In this paper, MRGEPSVM is derived by designing a modified locality preserving projections criterion and incorporating it into GEPSVM. Unlike conventional LPP, we project the data points onto planes and require the local geometric structure to be preserved. This demonstrates that manifold regularization is suitable for GEPSVM and its variants.

The remainder of this paper is organized as follows. In Section 2, we give a brief overview of GEPSVM, followed by the construction of the locality preserving projections criterion in Section 3. Section 4 presents the manifold regularized GEPSVM based on the novel regularization, and Section 5 extends it to the nonlinear case. In Section 6, detailed experimental results are given, and we conclude the paper in Section 7.

2. A brief review of GEPSVM

Suppose we are given m1 training samples belonging to class 1 and m2 training samples belonging to class 2 in the n-dimensional real space ℝn, with m1 + m2 = m. Let A and B be the two matrices whose rows are the samples of class 1 and class 2, respectively. Letting w1 and w2 be the normals to the two planes, GEPSVM seeks two nonparallel planes given by:

\[
x^T w_1 + b_1 = 0, \qquad x^T w_2 + b_2 = 0 \tag{1}
\]
so that each one is close to the points of one class but far away from the points of the other. This idea yields the following optimization problems.
\[
\min_{(w_1,b_1)\neq 0} \frac{\|Aw_1 + e_1 b_1\|^2 / \|w_1\|^2}{\|Bw_1 + e_2 b_1\|^2 / \|w_1\|^2}, \qquad
\min_{(w_2,b_2)\neq 0} \frac{\|Bw_2 + e_2 b_2\|^2 / \|w_2\|^2}{\|Aw_2 + e_1 b_2\|^2 / \|w_2\|^2} \tag{2}
\]
where Bw1 + e2b1 ≠ 0 and Aw2 + e1b2 ≠ 0, while e1 and e2 are vectors of ones of appropriate dimensionality. In order to achieve good generalization, a Tikhonov regularization term12, weighted by a nonnegative factor δ, is introduced, leading to the following optimization problems:
\[
\min_{(w_1,b_1)\neq 0} \frac{\|Aw_1 + e_1 b_1\|^2 + \delta \|[w_1; b_1]\|^2}{\|Bw_1 + e_2 b_1\|^2}, \qquad
\min_{(w_2,b_2)\neq 0} \frac{\|Bw_2 + e_2 b_2\|^2 + \delta \|[w_2; b_2]\|^2}{\|Aw_2 + e_1 b_2\|^2} \tag{3}
\]
where Bw1 + e2b1 ≠ 0, and Aw2 + e1b2 ≠ 0. By defining:
\[
z_1^T = [w_1^T,\; b_1], \quad z_2^T = [w_2^T,\; b_2], \quad
P = [A \;\; e_1]^T [A \;\; e_1], \quad Q = [B \;\; e_2]^T [B \;\; e_2] \tag{4}
\]
the objective functions in (3) can be written as the following Rayleigh quotient problems:
\[
\min_{z_1 \neq 0} \frac{z_1^T (P + \delta I) z_1}{z_1^T Q z_1}, \qquad
\min_{z_2 \neq 0} \frac{z_2^T (Q + \delta I) z_2}{z_2^T P z_2} \tag{5}
\]
The optimal z1 and z2, i.e. the coefficients of the two planes, are determined by solving a pair of generalized eigenvalue problems. For the detailed GEPSVM algorithm, please refer to Ref. 3.
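To make this training step concrete, the following is a minimal numpy/scipy sketch of solving the pair of generalized eigenvalue problems implied by (4)-(5). The function name, the default value of δ, and the use of scipy.linalg.eig are our own illustrative choices under the stated formulation, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import eig

def gepsvm_planes(A, B, delta=1e-3):
    """Minimal GEPSVM sketch following (4)-(5); returns z1 = [w1; b1] and
    z2 = [w2; b2]. A and B hold the samples of the two classes as rows."""
    G = np.hstack([A, np.ones((A.shape[0], 1))])   # [A  e1]
    H = np.hstack([B, np.ones((B.shape[0], 1))])   # [B  e2]
    P, Q = G.T @ G, H.T @ H
    I = np.eye(P.shape[0])

    # Plane 1: minimize z^T (P + delta I) z / z^T Q z
    vals, vecs = eig(P + delta * I, Q)
    z1 = np.real(vecs[:, np.argmin(np.real(vals))])

    # Plane 2: minimize z^T (Q + delta I) z / z^T P z
    vals, vecs = eig(Q + delta * I, P)
    z2 = np.real(vecs[:, np.argmin(np.real(vals))])
    return z1, z2
```

In each problem, the eigenvector associated with the smallest generalized eigenvalue yields the coefficients of the corresponding plane.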

3. Local Preserving Projections on Plane

In this section, the idea of locality preserving projections is extended to the case where we expect the intrinsic geometric structure of the original data samples of the same class to be preserved after they are projected onto a plane.

Firstly, in order to preserve the original local geometric structure within each class, we need to construct a graph for each class, called the within-class graph G(c) (c = 1, 2). Each vertex in G(c) corresponds to a sample point belonging to class c, and an edge between a vertex pair is added when the corresponding sample pair are each other's k-nearest neighbors. After the structure of G(c) is fixed, we need to determine its weight matrix S(c). In the literature, there exist several methods for weight calculation, e.g., the RBF kernel or the inverse Euclidean distance. In our experiments, the RBF kernel is selected as follows: $S_{ij}^{(c)} = \exp(-\|x_i^{(c)} - x_j^{(c)}\|^2/\delta)$ if $x_i^{(c)}$ and $x_j^{(c)}$ are neighbors belonging to the same class c, and $S_{ij}^{(c)} = 0$ otherwise.
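As an illustration of this construction, the sketch below builds the within-class weight matrix S(c) for a single class using mutual k-nearest-neighbor edges and RBF weights. The function name, default parameters, and the brute-force distance computation are hypothetical and only meant to mirror the definitions above.

```python
import numpy as np

def within_class_graph(X, k=5, delta=1.0):
    """Sketch of the within-class weight matrix S^(c): mutual k-nearest-neighbor
    edges weighted by the RBF kernel S_ij = exp(-||x_i - x_j||^2 / delta).
    X holds the samples of a single class as rows; k and delta are illustrative."""
    m = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]               # k nearest neighbors, self excluded
    knn = np.zeros((m, m), dtype=bool)
    knn[np.arange(m)[:, None], nn] = True
    mutual = knn & knn.T                                   # keep pairs that are each other's k-NN
    return np.where(mutual, np.exp(-d2 / delta), 0.0)
```

The mutual-neighbor rule makes S(c) symmetric, which is what the Laplacian identity used later in (17) requires.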

After the within-class graphs have been constructed in the input space, we project the points of each class onto their corresponding fitting planes and extend the LPP criterion to this setting. The detailed procedure is given as follows.

Given an arbitrary point x in ℝn, let y be its orthogonal projection onto the fitting plane wT x + b = 0; we can obtain the following equality from Fig. 3:

\[
y + \frac{w}{\|w\|}\, d = x \tag{6}
\]
Fig. 3.

Illustration of the geometry of projecting x onto wT x + b = 0.

After a simple calculation as described in Ref. 15, the value of d can be expressed as:

\[
d = \frac{w^T x + b}{\|w\|} \tag{7}
\]

On substituting (7) into (6), the orthogonal projection of x can be formulated as:

\[
y = x - \frac{w}{\|w\|}\, d = x - (w^T x + b)\frac{w}{\|w\|^2} = x - (w^T x + b)\,\bar{w} \tag{8}
\]
where w¯ is defined as:
\[
\bar{w} = \frac{w}{\|w\|^2} \tag{9}
\]
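As a quick numeric sanity check of (6)-(9), the snippet below (with arbitrary hypothetical values for w, b and x) verifies that the projection y lies on the plane and that x − y is parallel to the normal w.

```python
import numpy as np

# Hypothetical values for a quick check of (6)-(9).
w = np.array([1.0, 2.0, -1.0])
b = 0.5
x = np.array([3.0, -1.0, 2.0])

w_bar = w / np.dot(w, w)                  # eq. (9): w_bar = w / ||w||^2
y = x - (np.dot(w, x) + b) * w_bar        # eq. (8): orthogonal projection of x

print(np.isclose(np.dot(w, y) + b, 0.0))  # True: y lies on the plane w^T y + b = 0
print(np.allclose(np.cross(x - y, w), 0)) # True: x - y is parallel to the normal w
```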

According to our idea, the local geometric structure within the points of each class should be preserved after the projection. Taking class 1 as an example, this leads to the following optimization problem:

\[
\min_{w_1} \sum_{i,j=1}^{m_1} \|y_i^{(1)} - y_j^{(1)}\|^2 S_{ij}^{(1)} \tag{10}
\]
where $y_i^{(1)}$ is the orthogonal projection of the point $x_i^{(1)}$, i.e., the i-th point of class 1, onto its corresponding fitting plane $w_1^T x + b_1 = 0$. Following (8), $y_i^{(1)}$ can be expressed explicitly in the form:
\[
y_i^{(1)} = x_i^{(1)} - (w_1^T x_i^{(1)} + b_1)\frac{w_1}{\|w_1\|^2} = x_i^{(1)} - (w_1^T x_i^{(1)} + b_1)\,\bar{w}_1 \tag{11}
\]
where:
\[
\bar{w}_1 = \frac{w_1}{\|w_1\|^2} \tag{12}
\]
Thus, the objective function of (10) is reformulated as:
\[
\begin{aligned}
\sum_{i,j=1}^{m_1} \|y_i^{(1)} - y_j^{(1)}\|^2 S_{ij}^{(1)}
&= \sum_{i,j=1}^{m_1} \big\| x_i^{(1)} - (w_1^T x_i^{(1)} + b_1)\,\bar{w}_1 - x_j^{(1)} + (w_1^T x_j^{(1)} + b_1)\,\bar{w}_1 \big\|^2 S_{ij}^{(1)} \\
&= \sum_{i,j=1}^{m_1} \big\| (x_i^{(1)} - x_j^{(1)}) - w_1^T (x_i^{(1)} - x_j^{(1)})\,\bar{w}_1 \big\|^2 S_{ij}^{(1)}
\end{aligned} \tag{13}
\]
For the sake of simplicity, let us define:
\[
d_{ij}^{(1)} = x_i^{(1)} - x_j^{(1)} = A^T t_i - A^T t_j = A^T (t_i - t_j) \tag{14}
\]
where $t_i$ is the $m_1$-dimensional unit vector whose i-th element is 1 and whose other elements are 0. Therefore, (13) can be expressed in the form:
\[
\begin{aligned}
\sum_{i,j=1}^{m_1} \|y_i^{(1)} - y_j^{(1)}\|^2 S_{ij}^{(1)}
&= \sum_{i,j=1}^{m_1} \big\| d_{ij}^{(1)} - w_1^T d_{ij}^{(1)}\,\bar{w}_1 \big\|^2 S_{ij}^{(1)} \\
&= \sum_{i,j=1}^{m_1} \big( d_{ij}^{(1)} - w_1^T d_{ij}^{(1)}\,\bar{w}_1 \big)^T \big( d_{ij}^{(1)} - w_1^T d_{ij}^{(1)}\,\bar{w}_1 \big)\, S_{ij}^{(1)} \\
&= \sum_{i,j=1}^{m_1} \Big( d_{ij}^{(1)T} d_{ij}^{(1)} - 2\, w_1^T d_{ij}^{(1)} d_{ij}^{(1)T} \bar{w}_1 + \big(w_1^T d_{ij}^{(1)}\big)^2 \bar{w}_1^T \bar{w}_1 \Big)\, S_{ij}^{(1)}
\end{aligned} \tag{15}
\]

By dropping the terms that do not involve the optimization variables and making use of the expressions for $\bar{w}_1$ and $d_{ij}^{(1)}$, i.e., (12) and (14), (15) can be further formulated as follows:

\[
\begin{aligned}
\sum_{i,j=1}^{m_1} \|y_i^{(1)} - y_j^{(1)}\|^2 S_{ij}^{(1)}
&\approx \sum_{i,j=1}^{m_1} \Big( -2\, w_1^T d_{ij}^{(1)} d_{ij}^{(1)T} \bar{w}_1 + \big(w_1^T d_{ij}^{(1)}\big)^2 \bar{w}_1^T \bar{w}_1 \Big)\, S_{ij}^{(1)} \\
&= \sum_{i,j=1}^{m_1} \Big( -\frac{2\big(w_1^T d_{ij}^{(1)}\big)^2}{\|w_1\|^2} + \frac{\big(w_1^T d_{ij}^{(1)}\big)^2}{\|w_1\|^2} \Big)\, S_{ij}^{(1)} \\
&= -\frac{1}{\|w_1\|^2} \sum_{i,j=1}^{m_1} \big(w_1^T d_{ij}^{(1)}\big)^2 S_{ij}^{(1)} \\
&= -\frac{1}{\|w_1\|^2}\, w_1^T \Big( \sum_{i,j=1}^{m_1} d_{ij}^{(1)} d_{ij}^{(1)T} S_{ij}^{(1)} \Big)\, w_1 \\
&= -\frac{1}{\|w_1\|^2}\, w_1^T A^T \Big( \sum_{i,j=1}^{m_1} (t_i - t_j)(t_i - t_j)^T S_{ij}^{(1)} \Big)\, A\, w_1
\end{aligned} \tag{16}
\]
Now let us derive a more explicit expression for $\sum_{i,j=1}^{m_1} (t_i - t_j)(t_i - t_j)^T S_{ij}^{(1)}$:
\[
\begin{aligned}
\sum_{i,j=1}^{m_1} (t_i - t_j)(t_i - t_j)^T S_{ij}^{(1)}
&= \sum_{i,j=1}^{m_1} (t_i - t_j)(t_i^T - t_j^T)\, S_{ij}^{(1)} \\
&= \sum_{i,j=1}^{m_1} \big( t_i t_i^T - t_i t_j^T - t_j t_i^T + t_j t_j^T \big)\, S_{ij}^{(1)} \\
&= 2\Big( \sum_{i=1}^{m_1} t_i t_i^T \sum_{j=1}^{m_1} S_{ij}^{(1)} - \sum_{i,j=1}^{m_1} t_i t_j^T S_{ij}^{(1)} \Big) \\
&= 2\big( D^{(1)} - S^{(1)} \big) = 2 L^{(1)}
\end{aligned} \tag{17}
\]
where $D^{(1)}$ is a diagonal matrix whose entries are the row sums of $S^{(1)}$, i.e. $D_{ii}^{(1)} = \sum_j S_{ij}^{(1)}$, and $L^{(1)} = D^{(1)} - S^{(1)}$ is the Laplacian matrix of the data points of class 1. On substituting (17) into (16), we have:
\[
\sum_{i,j=1}^{m_1} \|y_i^{(1)} - y_j^{(1)}\|^2 S_{ij}^{(1)} \;\approx\; -\frac{2}{\|w_1\|^2}\, w_1^T A^T L^{(1)} A\, w_1 \tag{18}
\]
Considering the minimization in (10), and noting that (18) holds up to an additive constant that does not depend on $w_1$, we finally obtain the following relationship:
\[
\min_{w_1} \sum_{i,j=1}^{m_1} \|y_i^{(1)} - y_j^{(1)}\|^2 S_{ij}^{(1)}
\;\Longleftrightarrow\;
\max_{w_1} \frac{2}{\|w_1\|^2}\, w_1^T A^T L^{(1)} A\, w_1 \tag{19}
\]
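The identity (17) and the quadratic form underlying (18)-(19) can be checked numerically. The short snippet below uses random hypothetical data and a symmetric weight matrix to confirm that $\sum_{ij}(t_i - t_j)(t_i - t_j)^T S_{ij} = 2L$ and that $\sum_{ij}\big(w_1^T d_{ij}^{(1)}\big)^2 S_{ij} = 2\, w_1^T A^T L A\, w_1$.

```python
import numpy as np

# Hypothetical data: a small class-1 matrix A and a symmetric weight matrix S.
rng = np.random.default_rng(0)
m1, n = 6, 3
A = rng.normal(size=(m1, n))
S = rng.random((m1, m1)); S = (S + S.T) / 2

D = np.diag(S.sum(axis=1))
L = D - S                                          # Laplacian of class 1
T = np.eye(m1)                                     # rows t_i are the unit vectors

lhs = sum(S[i, j] * np.outer(T[i] - T[j], T[i] - T[j])
          for i in range(m1) for j in range(m1))
print(np.allclose(lhs, 2 * L))                     # True: identity (17)

w = rng.normal(size=n)
quad = sum(S[i, j] * (w @ A.T @ (T[i] - T[j])) ** 2
           for i in range(m1) for j in range(m1))
print(np.isclose(quad, 2 * w @ A.T @ L @ A @ w))   # True: quadratic form in (18)-(19)
```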
Now we compare our novel LPP criterion (19) with the traditional one, usually expressed in the form (20), where X is the matrix whose rows are the samples xi:
\[
\min_{w_1} \frac{w_1^T X^T L X\, w_1}{w_1^T X^T D X\, w_1} \tag{20}
\]
which is obtained by seeking the vector w1 onto which each point of X is projected while best preserving the local structure within X. In our context, X = A. We can gain some insight into the difference between (19) and (20): minimization is converted into maximization. This phenomenon is explained as follows. First of all, notice that w1 is merely the normal vector of the fitting plane. Furthermore, our goal is to preserve the local geometric information of the original space after the points are projected onto the fitting plane, not onto its normal vector. Since the normal vector is orthogonal to the plane it defines, it becomes clear how we reach that goal. A simple geometric example that visually illustrates this phenomenon is given in Fig. 4.

Fig. 4.

Illustration of the difference between LPP and the proposed projections

It is obvious that the vector that best preserves the local structure of the data points corresponds to the plane that preserves it worst. As the left panel of Fig. 4 shows, A and B are projected while best preserving the local structure. In the right panel, however, a and b are projected but do not preserve the local structure.

4. Incorporate local geometry structure into GEPSVM

We are now in a position to introduce the new criterion (18) into the classical GEPSVM. As mentioned above, GEPSVM aims to find two distinct nonparallel planes such that each is closest to the points of one class and furthest away from the other. Recalling the objective function (5) of GEPSVM, it is obvious that its numerator is similar to least squares regression, which tends to generate overfitting solutions; that is why a regularization technique is needed. In the original GEPSVM, Tikhonov regularization plays this role. In this paper, besides the Tikhonov term, manifold regularization, i.e., criterion (19), is also used to handle it.

Our idea is to seek two planes such that each one is required not only to be closest to its own data set and furthest from the other data set, but also to preserve the local geometric structure of its own class. To achieve this, (10) is used to penalize the original GEPSVM, leading to the following modified objective functions for class 1:

\[
\left\{
\begin{aligned}
&\min_{w_1, b_1} \frac{\|Aw_1 + e_1 b_1\|^2 / \|w_1\|^2}{\|Bw_1 + e_2 b_1\|^2 / \|w_1\|^2} \\
&\min_{w_1} \sum_{i,j=1}^{m_1} \|y_i^{(1)} - y_j^{(1)}\|^2 S_{ij}^{(1)}
\end{aligned}
\right. \tag{21}
\]

Using the result of (19), (21) can be expressed as follows:

\[
\left\{
\begin{aligned}
&\min_{w_1, b_1} \frac{\|Aw_1 + e_1 b_1\|^2 / \|w_1\|^2}{\|Bw_1 + e_2 b_1\|^2 / \|w_1\|^2} \\
&\max_{w_1} \frac{2}{\|w_1\|^2}\, w_1^T A^T L^{(1)} A\, w_1
\end{aligned}
\right. \tag{22}
\]

We can view the second objective as a manifold regularization term and incorporate it into the first one, as in (23). At the same time, in order to balance the effect of the two objectives, a trade-off factor δM is introduced, which leads to the following optimization problem:

\[
\min_{w_1, b_1} \frac{\|Aw_1 + e_1 b_1\|^2 / \|w_1\|^2}{\|Bw_1 + e_2 b_1\|^2 / \|w_1\|^2 + \delta_M\, w_1^T A^T L^{(1)} A\, w_1 / \|w_1\|^2}
= \min_{w_1, b_1} \frac{\|Aw_1 + e_1 b_1\|^2}{\|Bw_1 + e_2 b_1\|^2 + \delta_M\, w_1^T A^T L^{(1)} A\, w_1} \tag{23}
\]
Under the definition in GEPSVM, (23) can be written as:
\[
\min_{w_1, b_1} \frac{\|Aw_1 + e_1 b_1\|^2}{\|Bw_1 + e_2 b_1\|^2 + \delta_M\, [w_1^T \;\; b_1] \begin{bmatrix} A^T L^{(1)} A & 0_n \\ 0_n^T & 0 \end{bmatrix} \begin{bmatrix} w_1 \\ b_1 \end{bmatrix}}
= \frac{z_1^T P z_1}{z_1^T Q z_1 + \delta_M z_1^T M^{(1)} z_1}
= \frac{z_1^T P z_1}{z_1^T \big(Q + \delta_M M^{(1)}\big) z_1} \tag{24}
\]
where $M^{(1)} = \begin{bmatrix} A^T L^{(1)} A & 0_n \\ 0_n^T & 0 \end{bmatrix}$ and $0_n$ is the n-dimensional zero vector. By combining (24) with Tikhonov regularization, weighted by an appropriate trade-off factor δT, we obtain the following optimization problem:
\[
\min_{z_1 \neq 0} \frac{z_1^T (P + \delta_T I) z_1}{z_1^T \big(Q + \delta_M M^{(1)}\big) z_1} \tag{25}
\]
Then, the optimal z1, i.e. [w1; b1], is the eigenvector corresponding to the smallest eigenvalue of the following generalized eigenvalue problem.
\[
(P + \delta_T I)\, z_1 = \lambda \big(Q + \delta_M M^{(1)}\big) z_1 \tag{26}
\]

Following a similar procedure, we define a minimization problem analogous to (25) for determining $z_2 = [w_2; b_2]$, i.e., the coefficients of the second plane $w_2^T x + b_2 = 0$:

\[
\min_{z_2 \neq 0} \frac{z_2^T (Q + \delta_T I) z_2}{z_2^T \big(P + \delta_M M^{(2)}\big) z_2} \tag{27}
\]
where $M^{(2)} = \begin{bmatrix} B^T L^{(2)} B & 0_n \\ 0_n^T & 0 \end{bmatrix}$ and $L^{(2)} = D^{(2)} - S^{(2)}$.

The minimum of (27) is achieved at the eigenvector corresponding to the smallest eigenvalue of the following generalized eigenvalue problem.

\[
(Q + \delta_T I)\, z_2 = \lambda \big(P + \delta_M M^{(2)}\big) z_2 \tag{28}
\]

Once we obtain the eigenvectors corresponding to the smallest eigenvalues of (26) and (28), the two planes are determined. Therefore, for a new sample x, we first calculate the distances from x to the two planes, given by (29):

\[
d_1 = \frac{|x^T w_1 + b_1|}{\|w_1\|_2}, \qquad d_2 = \frac{|x^T w_2 + b_2|}{\|w_2\|_2} \tag{29}
\]

Then we classify x as belonging to class 1 if d1 ≤ d2, and to class 2 otherwise.

We now describe a simple algorithm for implementing the linear MRGEPSVM.

Given m1 samples in ℝn represented by A ∈ ℝm1×n and m2 samples represented by B ∈ ℝm2×n:
  (i) Define the augmented matrices P, Q, M(1) and M(2) in the input space;
  (ii) Solve the generalized eigenvalue problems (26) and (28);
  (iii) For a new sample x, compute the distances d1 and d2 by (29) and assign x to the class of the nearest plane.

Algorithm 1:

Linear Manifold Regularized Proximal Support Vector Machine via Generalized Eigenvalue
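For concreteness, here is a compact numpy/scipy sketch of Algorithm 1, reusing the within-class graph weights from Section 3 and the definitions in (24)-(29). The function and parameter names, the default trade-off values, and the use of scipy.linalg.eig are our own illustrative choices under the stated formulation, not the authors' code.

```python
import numpy as np
from scipy.linalg import eig

def train_mrgepsvm(A, B, S1, S2, delta_T=1e-2, delta_M=1e-1):
    """Sketch of linear MRGEPSVM (Algorithm 1). A, B hold the samples of the
    two classes as rows; S1, S2 are their within-class weight matrices
    (e.g. from within_class_graph). Parameter values are illustrative."""
    n = A.shape[1]
    G = np.hstack([A, np.ones((A.shape[0], 1))])    # [A  e1]
    H = np.hstack([B, np.ones((B.shape[0], 1))])    # [B  e2]
    P, Q = G.T @ G, H.T @ H                         # as in (4)

    def manifold_block(X, S):
        L = np.diag(S.sum(axis=1)) - S              # L = D - S
        M = np.zeros((n + 1, n + 1))
        M[:n, :n] = X.T @ L @ X                     # [[X^T L X, 0_n], [0_n^T, 0]]
        return M

    M1, M2 = manifold_block(A, S1), manifold_block(B, S2)
    I = np.eye(n + 1)

    def smallest_eigvec(num, den):                  # generalized eigenproblems (26)/(28)
        vals, vecs = eig(num, den)
        return np.real(vecs[:, np.argmin(np.real(vals))])

    z1 = smallest_eigvec(P + delta_T * I, Q + delta_M * M1)   # [w1; b1]
    z2 = smallest_eigvec(Q + delta_T * I, P + delta_M * M2)   # [w2; b2]
    return z1, z2

def classify(x, z1, z2):
    """Assign x to the class whose plane is nearest, as in (29)."""
    d = [abs(x @ z[:-1] + z[-1]) / np.linalg.norm(z[:-1]) for z in (z1, z2)]
    return 1 if d[0] <= d[1] else 2
```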

GEPSVM was originally proposed to tackle XOR problems, which are difficult for conventional linear classifiers such as SVMs, and on which it can obtain nearly 100% accuracy. A "cross planes" example is shown in Fig. 5 for GEPSVM and the proposed MRGEPSVM. We observe that MRGEPSVM obtains nearly the same accuracy as GEPSVM, which implies that the ability of GEPSVM to solve XOR problems is not degraded in MRGEPSVM by introducing the manifold regularization technique.

Fig. 5.

The “cross planes” learned by GEPSVM and MRGEPSVM together with their correctness on the training data set

5. Nonlinear MRGEPSVM

In this section, we extend our results to nonlinear classifiers. Suppose that the Euclidean space ℝn is mapped to a Hilbert space H, called the feature space (as opposed to the input space ℝn), through a nonlinear mapping function ϕ : ℝn → H. Let K(xi, xj) be the kernel function in the feature space satisfying K(xi, xj) = ϕ(xi)T ϕ(xj). The data samples A and B are mapped to ϕ(A) and ϕ(B), respectively. Let the fitting planes in the feature space be denoted by:

\[
\varphi(x^T)\, w_1 + b_1 = 0, \qquad \varphi(x^T)\, w_2 + b_2 = 0 \tag{30}
\]
Therefore, (23) can be expressed in the following form in the feature space:
\[
\min_{w_1, b_1} \frac{\|\varphi(A)\, w_1 + e_1 b_1\|^2}{\|\varphi(B)\, w_1 + e_2 b_1\|^2 + \delta_M\, w_1^T \varphi(A)^T L^{(1)} \varphi(A)\, w_1} \tag{31}
\]
A basic principle in the feature space is that w1 can be expressed as a linear combination of all the mapped data samples in H, leading to:
\[
w_1 = \varphi(C^T)\, u_1 \tag{32}
\]
where CT = [AT BT], and u1 ∈ ℝm.

Substituting (32) into (31), we obtain:

\[
\min_{u_1, b_1} \frac{\|K(A, C^T)\, u_1 + e_1 b_1\|^2}{\|K(B, C^T)\, u_1 + e_2 b_1\|^2 + \delta_M\, u_1^T K(C, A^T)\, L^{(1)} K(A, C^T)\, u_1}
= \frac{z_1^T P z_1}{z_1^T Q z_1 + \delta_M z_1^T M^{(1)} z_1} \tag{33}
\]
where $z_1 = [u_1; b_1]$ and:
\[
P = [K(A, C^T) \;\; e_1]^T [K(A, C^T) \;\; e_1], \qquad
Q = [K(B, C^T) \;\; e_2]^T [K(B, C^T) \;\; e_2], \qquad
M^{(1)} = \begin{bmatrix} K(C, A^T)\, L^{(1)} K(A, C^T) & 0_m \\ 0_m^T & 0 \end{bmatrix} \tag{34}
\]

Similarly, by combining (33) with Tikhonov regularization, we obtain:

\[
\min_{z_1 \neq 0} \frac{z_1^T (P + \delta_T I) z_1}{z_1^T \big(Q + \delta_M M^{(1)}\big) z_1} \tag{35}
\]

By an entirely similar argument, we can define an analogous minimization problem for the fitting plane of class 2.

\[
\min_{z_2 \neq 0} \frac{z_2^T (Q + \delta_T I) z_2}{z_2^T \big(P + \delta_M M^{(2)}\big) z_2} \tag{36}
\]
where:
\[
M^{(2)} = \begin{bmatrix} K(C, B^T)\, L^{(2)} K(B, C^T) & 0_m \\ 0_m^T & 0 \end{bmatrix} \tag{37}
\]

After z1 and z2 are solved for, the two fitting planes in the feature space are determined. In order to derive the decision rule in the feature space, suppose we are given a new data point x ∈ ℝn to be classified. Then the distance between its image in the feature space, i.e. ϕ(x), and the fitting plane of class 1, i.e. ϕ(xT) w1 + b1 = 0, is given by:

\[
d_1 = \frac{|\varphi(x^T)\, w_1 + b_1|}{\|w_1\|_2} \tag{38}
\]

By using (32), (38) can be expressed as:

\[
d_1 = \frac{|\varphi(x^T)\varphi(C^T)\, u_1 + b_1|}{\sqrt{u_1^T \varphi(C)\varphi(C^T)\, u_1}}
= \frac{|K(x^T, C^T)\, u_1 + b_1|}{\sqrt{u_1^T K(C, C^T)\, u_1}} \tag{39}
\]

Similarly, the distance between ϕ(x) and the fitting plane of class 2, i.e., ϕ(xT) w2 + b2 = 0, is given by:

\[
d_2 = \frac{|K(x^T, C^T)\, u_2 + b_2|}{\sqrt{u_2^T K(C, C^T)\, u_2}} \tag{40}
\]

The decision rule for the nonlinear MRGEPSVM is the same as for its linear version, except that the distances from x to the fitting planes need to be calculated by (39) and (40).

We now give an explicit statement of the nonlinear MRGEPSVM algorithm.

Given m1 samples in ℝn represented by A ∈ ℝm1×n and m2 samples represented by B ∈ ℝm2×n.
  (i) Define the matrices P, Q, M(1) and M(2) in the feature space via (34) and (37);
  (ii) Solve the generalized eigenvalue problems derived from (35) and (36);
  (iii) For a new sample x, compute the distances d1 and d2 by (39) and (40) and assign x to the class of the nearest plane.

Algorithm 2:

Nonlinear Manifold Regularized Proximal Support Vector Machine via Generalized Eigenvalue
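Analogously, the following sketch outlines Algorithm 2 with an RBF kernel, following (33)-(40). The kernel choice, its width gamma, and all function names are illustrative assumptions rather than the authors' implementation; the within-class weight matrices S1 and S2 are still built in the input space as in Section 3.

```python
import numpy as np
from scipy.linalg import eig

def rbf_kernel(X, Y, gamma=0.5):
    """K(X, Y^T)_ij = exp(-gamma * ||x_i - y_j||^2); gamma is illustrative."""
    d2 = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def train_kernel_mrgepsvm(A, B, S1, S2, delta_T=1e-2, delta_M=1e-1, gamma=0.5):
    """Sketch of nonlinear MRGEPSVM (Algorithm 2) following (33)-(37).
    S1, S2 are within-class weight matrices built in the input space."""
    C = np.vstack([A, B])                            # C = [A; B], so C^T = [A^T B^T]
    KA, KB = rbf_kernel(A, C, gamma), rbf_kernel(B, C, gamma)
    m = C.shape[0]

    G = np.hstack([KA, np.ones((A.shape[0], 1))])    # [K(A,C^T)  e1]
    H = np.hstack([KB, np.ones((B.shape[0], 1))])    # [K(B,C^T)  e2]
    P, Q = G.T @ G, H.T @ H                          # as in (34)

    def manifold_block(K, S):
        L = np.diag(S.sum(axis=1)) - S               # within-class Laplacian
        M = np.zeros((m + 1, m + 1))
        M[:m, :m] = K.T @ L @ K                      # K(C,.) L K(.,C^T)
        return M

    M1, M2 = manifold_block(KA, S1), manifold_block(KB, S2)
    I = np.eye(m + 1)

    def smallest_eigvec(num, den):                   # problems (35)/(36)
        vals, vecs = eig(num, den)
        return np.real(vecs[:, np.argmin(np.real(vals))])

    z1 = smallest_eigvec(P + delta_T * I, Q + delta_M * M1)   # [u1; b1]
    z2 = smallest_eigvec(Q + delta_T * I, P + delta_M * M2)   # [u2; b2]
    return z1, z2, C

def kernel_classify(x, z1, z2, C, gamma=0.5):
    """Distances (39)-(40): |K(x^T,C^T)u + b| / sqrt(u^T K(C,C^T) u)."""
    Kx = rbf_kernel(x[None, :], C, gamma).ravel()
    KCC = rbf_kernel(C, C, gamma)
    d = [abs(Kx @ z[:-1] + z[-1]) / np.sqrt(z[:-1] @ KCC @ z[:-1]) for z in (z1, z2)]
    return 1 if d[0] <= d[1] else 2
```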

6. Experimental Results

To demonstrate the performance of the proposed method, we conducted experiments on several benchmark datasets from the UCI Repository27. The optimal values of all parameters involved in each method were obtained by using a tuning set comprising 10 percent of the data set. Table 1 compares MRGEPSVM, TSVM, GEPSVM and SVM.

Dataset MRGEPSVM TSVM GEPSVM SVM
Hepatitis 85.16 ± 2.16 80.79 ± 3.74 58.29 ± 2.13 80.00 ± 2.03
Ionosphere 87.47 ± 3.38 88.03 ± 4.69 75.19 ± 3.43 86.04 ± 5.91
Heart-statlog 84.82 ± 1.98 84.44 ± 2.62 84.81 ± 3.11 84.07 ± 4.56
Votes 95.40 ± 1.62 96.08 ± 2.73 91.93 ± 3.08 94.50 ± 2.79
Sonar 79.81 ± 1.08 77.25 ± 3.75 66.76 ± 5.44 79.79 ± 2.13
Pima 74.88 ± 3.09 73.70 ± 5.14 74.60 ± 4.16 76.68 ± 3.82
Table 1.

Classification accuracy (%) using the linear kernel

From the experimental results listed in Table 1, we can see that MRGEPSVM performs better than GEPSVM on nearly all datasets. We also performed experiments on several UCI datasets using the RBF kernel. The performance of each method is shown in Table 2.

Dataset MRGEPSVM TSVM GEPSVM SVM
Hepatitis 81.94 ± 3.14 82.67 ± 5.03 78.25 ± 4.86 83.13 ± 3.56
BUPA 66.09 ± 2.71 67.83 ± 7.01 63.80 ± 3.25 58.32 ± 6.83
WPBC 79.72 ± 2.14 81.92 ± 4.17 62.70 ± 3.36 79.92 ± 3.16
Table 2.

Classification accuracy (%) using the nonlinear (RBF) kernel

The aforementioned methods have been tested on publicly available benchmark data sets, and the results concern their performance in terms of classification accuracy. The results for the linear kernel were obtained using the first two repositories; the third one was used for the nonlinear kernel implementation. For each data set, the latter repository offers 100 predefined random splits into training and test sets. For each algorithm, including SVMs, the results obtained in each trial are recorded. Execution times and the other accuracy results were obtained on an Intel Core i7 CPU at 3.20 GHz with 8 GB RAM, running Windows 8 and Matlab 2013b, during normal daylight operation. In the case of the nonlinear kernel, we observe that there are still perceptible improvements compared with GEPSVM, while the accuracy of MRGEPSVM is close to that of TSVM and SVM. It seems that the manifold structure did not greatly facilitate the performance in the feature space. A possible explanation is that the higher dimensionality offsets the effect brought by the manifold structure within the data samples after they are mapped.

The elapsed times of the different methods with the Gaussian and linear kernels are listed in Table 3 and Table 4, respectively. In the linear case, MRGEPSVM and GEPSVM outperform TSVM and SVM in most cases, and MRGEPSVM is at least twice as fast as GEPSVM. When the Gaussian kernel is used, the SVM implementations often achieve better running times than the eigenvalue-based methods. In all cases, MRGEPSVM is faster than GEPSVM.

Dataset MRGEPSVM GEPSVM TSVM SVM
Votes 1.1473 5.8744 0.4523 0.2022
Sonar 3.8717 5.8947 0.1543 0.4080
Pima 0.0296 0.1143 0.0302 7.1968
Flare-solar 1.9839 16.1654 2.1429 4.4562
Waveform 0.5998 4.480 0.9016 0.2284
Thyroid 0.0246 0.1280 0.0503 0.0718
Heart 0.0361 0.2187 0.0278 0.1732
Banana 0.4989 3.1102 0.0346 1.3505
Breast-cancer 0.0688 0.3544 0.0429 0.1188
Table 3.

Elapsed time in seconds using Gaussian kernel

Dataset MRGEPSVM GEPSVM TSVM SVM
Votes 0.0119 0.0277 0.0024 0.0019
Sonar 0.0364 0.0854 0.0589 0.0395
Pima 0.0015 0.2858 0.6809 0.0013
Flare-solar 0.0158 0.1673 0.1092 0.0893
Waveform 0.0013 0.0934 0.0438 0.0472
Thyroid 0.0011 0.0183 0.00524 0.0018
Heart 0.0012 0.1091 0.0019 0.0011
Banana 0.0024 0.1578 0.0063 0.0038
Breast-cancer 0.0002 0.0158 0.0016 0.0009
Table 4.

Elapsed time in seconds using linear kernel

The USPS database is a handwritten digit recognition database provided by the United States Postal Service. It contains gray-scale images of the 10 digits 0 to 9, with normalized gray values and 1100 images of 16 × 16 pixels per digit. Table 5 reports the average accuracy and standard deviation of the different algorithms on the USPS dataset.

Num. SVM Test±Std (%) LPP Test±Std (%) TSVM Test±Std (%) GEPSVM Test±Std (%) MRGEPSVM Test±Std (%)
l = 10 77.14 ± 1.97 82.26 ± 1.73 81.12 ± 1.67 81.41 ± 1.85 82.26 ± 1.37
l = 20 84.67 ± 0.93 89.19 ± 1.00 89.40 ± 1.09 87.78 ± 1.34 89.97 ± 1.06
l = 30 86.07 ± 1.62 91.60 ± 1.07 91.93 ± 0.87 90.51 ± 1.13 92.99 ± 0.98
l = 40 88.13 ± 1.13 93.22 ± 1.02 93.49 ± 0.95 91.88 ± 0.99 94.33 ± 0.93
l = 50 90.48 ± 1.34 94.50 ± 0.88 94.46 ± 1.05 93.40 ± 1.26 95.50 ± 1.08
Table 5.

The average accuracy and standard deviation of the different algorithms on the USPS dataset

In this experiment, we randomly selected 110 images from each class and then randomly selected 10% and 20% of the data as the training set to verify the MRGEPSVM algorithm. The penalty parameter C of SVM and TWSVM, as well as the regularization parameters of MRGEPSVM and GEPSVM, are selected from the set {2^i | i = −9, …, 9}. The number of nearest neighbors k used for the within-class graph is selected from the set {3, 4, …, 20}, with the tuning loop sketched below.
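A minimal sketch of such a tuning loop is given below; `fit` and `accuracy` are placeholders for the user's own training and scoring routines, and the loop merely illustrates the quoted parameter grids rather than reproducing the authors' exact protocol.

```python
import numpy as np
from itertools import product

def select_parameters(fit, accuracy, X_train, y_train, X_tune, y_tune):
    """Illustrative grid search over the parameter ranges quoted above:
    trade-off factors from {2^i | i = -9,...,9} and graph size k from {3,...,20}.
    `fit` and `accuracy` are user-supplied training and scoring callables."""
    best_params, best_acc = None, -np.inf
    for i, j, k in product(range(-9, 10), range(-9, 10), range(3, 21)):
        params = {"delta_T": 2.0 ** i, "delta_M": 2.0 ** j, "k": k}
        model = fit(X_train, y_train, **params)          # train on the training split
        acc = accuracy(model, X_tune, y_tune)            # score on the tuning split
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc
```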

The experimental set-up is meant to simulate a real-world situation: we considered binary classification problems defined by splits of the training data, in which all cases of one driver were labeled and all the rest were left unlabeled. The test set is composed of entirely new drivers, forming a separate group.

In Fig. 6, we compare the error rates of the GEPSVM and MRGEPSVM algorithms on 30 binary classification problems. We train on the training set of one driver and test on the remaining five sets of samples. The task of classifying whether a driver encounters obstacles is treated as a binary classification problem.

Fig. 6.

Error rates of 30 binary classification problems

7. Conclusions

In this paper, a novel proximal support vector machine, called manifold regularized GEPSVM (MRGEPSVM), is presented. Our analysis allows us to see clearly the role of manifold regularization in the proximal support vector machine via generalized eigenvalues. MRGEPSVM is derived by designing a modified locality preserving projections criterion and incorporating it into GEPSVM. Unlike conventional LPP, we project the data points onto planes and require the local geometric structure to be preserved. As in GEPSVM, we solve a pair of generalized eigenvalue problems to determine the two fitting planes. One advantage of MRGEPSVM is that it demonstrates that manifold regularization is suitable for GEPSVM and its variants. As future work, we believe the proposed regularization technique may be applied to more powerful classifiers, e.g. Twin SVM, to further boost their performance.

Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (Grant U1564201, No. 51108209, No. 50875112, No. 61573171 and No. 70972048), the Natural Science Foundation of Jiangsu Province (Grant No. BK2010339 and No. BK20140570), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No. 10KJD580001), the College Graduate Research and Innovation Program of Jiangsu Province (Dr. Innovation) (Grant No. CXLX11 0593), the Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), the Project Funded by the Department of Transport Information (Grant 2013-364-836-900), the "Six Talent Peak" project in Jiangsu Province (DZXX-048), and the National Statistical Science Research Project (2014596).

References

1. C. J. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, Vol. 2, No. 2, 1998, pp. 1-43.
10. X. He and P. Niyogi, Locality Preserving Projections, Advances in Neural Information Processing Systems, Vol. 16, MIT Press, 2004, pp. 153-160.
14. C. M. Bishop, Pattern Recognition and Machine Learning, Springer-Verlag, New York, 2006.
16. A. N. Tikhonov and V. Y. Arsenin, Solutions of Ill-posed Problems, V. H. Winston and Sons, Washington DC, 1977.
19. D. Zhang, F. Wang, C. Zhang, et al., Multi-View Local Learning, in Proceedings of the 23rd AAAI Conference on Artificial Intelligence (2008), Vol. 2, pp. 752-757.
27. M. Lichman, UCI Machine Learning Repository, University of California, Irvine, 2013.
