Face Spoof Attack Detection with Hypergraph Capsule Convolutional Neural Networks

Face authentication has been widely used in personal identification. However, face authentication systems can be attacked by fake images. Existing methods try to detect such attacks with different features. Among them, using color images has become popular since it is flexible and generic. In this paper, a novel feature representation for face spoof attack detection, namely hypergraph capsule convolutional neural networks (HGC-CNNs), is proposed, which takes advantage of multiple features. To achieve this, capsule neural networks are used to integrate different types of features. In addition, hypergraph regularization is applied to learn the correlations among samples. In this way, the descriptive power is improved. The proposed feature representation is compared with existing features for face spoof attack detection, and experimental results on two commonly used datasets demonstrate the effectiveness of HGC-CNN.


INTRODUCTION
In the past few years, biometric identification technologies have been widely applied for authentication. Among them, face recognition attracts considerable attention since it is contactless and easy to use. It is therefore used in many applications, such as access control and mobile payment. As authentication with face recognition becomes popular, security becomes critical in these applications. Faces are private personal information, yet it is easy to capture facial images or make fake ones. If these images are abused by malicious attackers, face spoof attacks can be mounted against face recognition systems, which leaves applications that rely on facial authentication vulnerable.
To counter face spoof attacks, researchers have made various attempts, which are referred to as face spoof attack detection or face liveness detection. The key is how to distinguish real faces from fake photos or videos. A straightforward method is to use motion cues. Some applications, such as mobile payment and online banking systems, may require users to perform specific actions. These actions can be recognized with existing landmark detection methods [1]. In this way, the success rate of spoof attacks is significantly reduced. However, passing the authentication takes considerable time. Besides, in many applications, users may not cooperate with such requirements. Therefore, methods that need no interaction are required.
One choice for face spoof attack detection without interaction is still to use motion features. These motion-based methods operate on videos. Anjos et al. used optical flow to explore the foreground/background motion correlation [2], which outperformed previous methods based on motion analysis. Bharadwaj et al. used motion magnification to enhance the facial expressions in videos [3]. Facial images are then represented by multi-modal features, namely local binary patterns (LBP) in images and histograms of oriented optical flow (HOOF) in videos.
Another choice is to work with static images. Some of these methods require additional sensors. For example, Liu and Kumar used both color images and near-infrared illumination to detect face spoof attacks that use 3D face masks [4]. Convolutional neural network (CNN)-based configurations were investigated to improve the performance. Wang et al. proposed to use the depth information extracted by a Kinect sensor [5]. Depth information was integrated with 2D texture information learned from 2D facial image regions. As in the aforementioned method, the texture information is extracted with a CNN. Bresan et al. proposed to combine intrinsic image properties and deep neural networks [6]. Depth, salience and illumination maps are used as features, and a meta-learning method is used as the classifier.
Since additional sensors are not always available, face spoof attack detection with color images is much more commonly used. Most of this research focuses on image features that describe the different characteristics of real faces and fake photos. Based on improvements of LBP, Peng et al. proposed the guided scale-based local binary pattern (GS-LBP) and the local guided binary pattern (LGBP) [7]. In this way, texture features are extracted and a support vector machine (SVM) is used for classification. Galbally and Marcel recast the problem of spoof attack detection as image quality assessment [8]. They used four different features for image distortion analysis.
Owing to its ability in concept abstraction and description, deep learning is also used in face spoof attack detection. Yang et al. first proposed to use neural networks for spoof feature detection and SVM for classification [9]. Feng et al. combined Shearlet features, RGB images and optical flow in neural networks [10]. In this way, three different features covering image quality, pixel colors and motion cues are used. Lei et al. divided faces into parts and used the inherent representation in a CNN as features [11]. Atoum et al. used HSV and YCbCr images instead of RGB images, with a two-stream strategy to detect features in color images and depth images. Liu et al. used both spatial and temporal features in a CNN for attack detection [12]. To detect unseen attacks, Shao et al. focused on the domain generalization problem [13]. They made use of meta-learning, and the feature space is regularized by the trained domain knowledge.
According to the review above, the trend in face spoof attack detection is to use color images and to apply multiple features in order to improve universality [14][15][16]. However, existing methods usually use traditional image features and ignore the semantic information. In this paper, we propose a method for face spoof attack detection with hypergraph capsule convolutional neural networks (HGC-CNNs). It improves descriptive power through the correlations of multiple samples and the integration of multiple features. The contributions can be summarized as follows:
1. First, a novel method for feature representation is proposed for face spoof attack detection. It is based on the combination of graph convolutional neural networks (GCNNs) and capsule networks. Correlations of multiple samples are learned with GCNNs, while multiple features are integrated in capsules.
2. Second, the process of graph construction is further improved with hypergraph regularization, in which different types of features are represented in a uniform feature space. Adjacency can be computed and applied in the construction of the graph Laplacian.
The remainder of this paper is organized as follows. In Section 2, we first outline the proposed HGC-CNN and then introduce it in detail. In Section 3, we show the performance of HGC-CNN on face spoof attack detection through experimental evaluations and comparisons. Finally, Section 4 concludes the paper.

Outline
The overall process of the proposed method is shown in Figure 1.
First, different features of facial images are extracted. In our implementation, we use color pixels, LBP and image quality [8]. The framework is flexible, and features can be easily added or removed. Second, a capsule layer is used, which is the key contribution of the proposed method. In this layer, different features are enclosed in capsules and then integrated with hypergraph regularization. Third, the integrated features are further processed by three fully connected layers to obtain the final feature representation, which is similar to the implementation of the original capsule networks.
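As an illustration of one of the feature types named above, the following is a minimal sketch of a basic LBP histogram. The paper does not state which LBP variant or neighborhood it uses, so the 8-neighbor, 3 × 3 form, the `>=` comparison and the function name `lbp_histogram` are assumptions made only for this example.

```python
import numpy as np

def lbp_histogram(gray):
    """Basic 8-neighbor LBP histogram (256 bins, normalized to sum to 1).

    Assumption: plain 3x3 LBP; the paper does not specify its variant.
    """
    g = np.asarray(gray, dtype=np.float64)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # Neighbor offsets in clockwise order, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbor is at least as bright as the center.
        codes += (neighbor >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()

# Usage on a synthetic 200 x 200 grayscale face crop (random data, illustration only).
face = np.random.default_rng(0).integers(0, 256, size=(200, 200))
lbp_feature = lbp_histogram(face)
```

Each feature type (color pixels, LBP, image quality) would yield its own vector of this kind, and these vectors are what the capsule layer receives.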

Graph Regularized Capsule Convolutional Neural Networks
CNNs have proven successful in image description due to their high-level semantic abstraction capabilities. Let X = {x_i | i = 1, 2, ..., N} be a set of input samples. In a traditional multi-layer CNN, the forward output for x_i at the l-th layer is obtained by applying f(⋅) to the output of the previous layer mapped by W^(l), where f(⋅) is the activation function and W^(l) denotes the mapping parameters of the l-th layer.
There are several drawbacks in the traditional definition of CNNs. The first is that CNNs ignore the semantic correlations among samples. To address this, Kipf and Welling proposed GCNNs [17]. In GCNNs, the output for x_i depends on its K neighbors: the neighbors' outputs are combined using the adjacency weights a_ik between nodes i and k, which are usually normalized.
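The neighbor-dependent forward step described above can be sketched as follows. This is a simplified illustration rather than the authors' implementation: the binary K-nearest-neighbor weights, the row normalization, the ReLU activation and the names `knn_adjacency` and `graph_layer` are all assumptions.

```python
import numpy as np

def knn_adjacency(features, k):
    """Row-normalized adjacency a_ik over the k nearest samples.

    Assumption: binary weights; each sample is normally its own nearest
    neighbor, so it is included in its own neighborhood.
    """
    x = np.asarray(features, dtype=np.float64)
    sq = np.sum(x ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)  # squared L2 distances
    n = x.shape[0]
    nearest = np.argsort(dist2, axis=1)[:, :k]
    a = np.zeros((n, n))
    a[np.arange(n)[:, None], nearest] = 1.0
    return a / a.sum(axis=1, keepdims=True)

def graph_layer(h, a, w):
    """One forward step: aggregate neighbor outputs, map with w, apply ReLU."""
    return np.maximum(a @ h @ w, 0.0)

# Usage: two tight clusters; each sample is averaged with its nearest neighbor.
x = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
a = knn_adjacency(x, k=2)
out = graph_layer(x, a, np.eye(2))
```

With the identity mapping used here, each output row is simply the mean of a sample and its nearest neighbor, which shows how correlated samples pull each other's representations together.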
The second drawback of traditional CNNs is that they work only on scalar values, which limits their descriptive ability. To improve this, Sabour et al. proposed capsule neural networks [18]. In capsule networks, the input is formed as a vector. This vector is called a capsule, and it can contain more properties. In this way, highly informative outputs can be computed by combining the P types of properties in one capsule.
Inspired by GCNNs and capsule networks, the proposed graph-regularized capsule convolutional neural networks (GRC-CNNs) combine the above two ideas into a novel approach for multiple-feature learning. The key forward operation applies the neighbor-weighted aggregation of GCNNs to the capsule inputs. The general form of GRC-CNN (Eq. (5)) can then be rewritten in terms of L, the graph Laplacian matrix, which contains the semantic correlations among samples in X.

Hypergraph Construction
When multiple features are used, the traditional routine of capsule networks simply sums them. Accordingly, Eq. (5) can be transformed into a sum over the feature types, each with its own graph Laplacian L_p. In the proposed method, however, we integrate multiple features by hypergraph regularization. Traditional graph regularization methods assume that the relationships among features are pairwise, which is over-simplified and limits the descriptive power. In a hypergraph, features with the same property are connected by hyperedges, and a hyperedge can connect more than two features. Therefore, the key to good performance is the hypergraph construction of a uniform L that integrates the multiple L_p.
To simplify the notation, X_p is used to represent the output of the p-th feature type instead of W_p X_p, and it can be regarded as a new feature. To integrate these features, we use the patch alignment framework [19,20]. In this framework, feature vectors are represented by vertices in the graph. Notations are shown in Table 1, and the process of multiple-feature integration is shown in Figure 2. In detail, the Laplacian matrix L_p for the p-th feature and the global L are computed in two stages.
1. Part optimization stage: In Eq. (8), we use all pairs of vertices contained in a hyperedge e and compute the summed value of w(e) (e). By expanding Eq. (9) and combining terms, the patch optimization formulation for each hyperedge (Eq. (10)) is obtained, where Y = {y_i | i = 1, 2, …, n} denotes the labels of the images. For out-of-sample images, Y can be estimated by KNN. The matrix E in this formulation is defined in terms of e = [1, …, 1]^T and I, an n × n identity matrix. The computational complexity of this stage is K × n × d^2.
2. Whole alignment stage: In the hypergraph, the weight of a hyperedge is computed by summing the adjacency scores of all the pairs of vertices contained in this hyperedge. The adjacency score of any pair of vertices is defined from the L2 distance between their image features, where feat(u) represents the image feature vector of vertex u. The scaling parameter of this score is calculated as the standard deviation of the distances. In this way, Ω can be computed.

Figure 2 The process of multiple-feature integration with hypergraph regularization.
Then, DE and DV can be computed. With the hyperedge weighting matrix, the multi-feature hypergraph Laplacian is computed by summing the patch optimization defined in Eq. (10) over all the hyperedges. The computational complexity of this stage is n^2.
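The construction above can be sketched for a single feature type as follows. Several choices here are assumptions rather than the authors' formulation: each hyperedge groups a vertex with its k nearest neighbors, the adjacency score is a Gaussian of the L2 distance scaled by the standard deviation of the distances, and the Laplacian uses the widely used normalized hypergraph form I − Dv^(−1/2) H W De^(−1) Hᵀ Dv^(−1/2) instead of the paper's sum of patch optimizations. The function name `hypergraph_laplacian` is hypothetical.

```python
import numpy as np
from itertools import combinations

def hypergraph_laplacian(features, k):
    """Normalized hypergraph Laplacian for one feature type (illustrative sketch)."""
    x = np.asarray(features, dtype=np.float64)
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=2))        # pairwise L2 distances
    sigma = dist[np.triu_indices(n, k=1)].std()      # scale: std of the distances
    affinity = np.exp(-(dist ** 2) / (sigma ** 2))   # assumed Gaussian adjacency score

    h = np.zeros((n, n))   # incidence matrix: rows are vertices, columns are hyperedges
    w = np.zeros(n)        # one weight per hyperedge
    for e in range(n):
        members = np.argsort(dist[e])[:k + 1]        # vertex e plus its k nearest neighbors
        h[members, e] = 1.0
        # Hyperedge weight: sum of adjacency scores over all vertex pairs inside it.
        w[e] = sum(affinity[u, v] for u, v in combinations(members, 2))

    dv = h @ w              # vertex degrees
    de = h.sum(axis=0)      # hyperedge degrees
    dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    theta = dv_inv_sqrt @ h @ np.diag(w / de) @ h.T @ dv_inv_sqrt
    return np.eye(n) - theta

# Usage on random features (illustration only; requires distinct samples and n > k).
feats = np.random.default_rng(0).normal(size=(8, 3))
lap = hypergraph_laplacian(feats, k=3)
```

A multi-feature variant would build one such matrix per feature type and combine them into a single L; how the combination is weighted is not recoverable from the text, so it is left out of this sketch.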
Then, the overall output can be computed by standard eigendecomposition of L. The dimensionality D of the resulting vectors is determined by the number of eigenvalues retained.
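A minimal sketch of this eigendecomposition step is given below. It assumes that the retained eigenvectors are those with the smallest eigenvalues, as is usual for Laplacian-based embeddings; the text itself does not say which ones are kept, and the function name `spectral_embedding` is hypothetical.

```python
import numpy as np

def spectral_embedding(laplacian, d):
    """Return n x d coordinates from the d smallest-eigenvalue eigenvectors.

    Assumption: smallest eigenvalues are retained; the paper only says that
    the output dimensionality equals the number of retained eigenvalues.
    """
    lap = np.asarray(laplacian, dtype=np.float64)
    lap = (lap + lap.T) / 2.0                  # enforce exact symmetry
    eigvals, eigvecs = np.linalg.eigh(lap)     # eigenvalues in ascending order
    return eigvecs[:, :d], eigvals[:d]

# Usage: Laplacian of a 4-node path graph, embedded into D = 2 dimensions.
path_lap = np.array([[ 1.0, -1.0,  0.0,  0.0],
                     [-1.0,  2.0, -1.0,  0.0],
                     [ 0.0, -1.0,  2.0, -1.0],
                     [ 0.0,  0.0, -1.0,  1.0]])
embedding, retained = spectral_embedding(path_lap, d=2)
```

For a connected graph the first retained eigenvalue is zero and its eigenvector carries no discriminative information, so in practice that column is sometimes dropped; whether the authors do so is not stated.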

Datasets and Settings
In the experiments, we use two challenging datasets for face spoof attack detection. The first one is the NUAA Photograph Imposter Database (NUAA). NUAA was collected with generic, commonly used webcams [21]. It was collected in three sessions, and the place and illumination conditions of each session are different. There are 5105 real faces and 7509 fake images from 15 subjects in total. Some sample images are shown in Figure 3.
The second dataset is the Multispectral-Spoof face spoofing database built at the Idiap Research Institute (MSSPOOF) [22]. It contains both color images and infrared images. Similar to NUAA,

In the experiments, performance is measured by two error rates. In face spoof attack detection, we pay more attention to the ratio of fake images classified as real faces, which is known as the false negative (FN) rate. However, the ratio of real faces classified as fake images, which is known as the false positive (FP) rate, is also used in our experiments. For cross validation, we randomly choose a subset of samples for training, and the remaining samples are used for testing.
The training proportion ranges from 10% to 50%. This process is repeated 10 times and the average results are shown. All the facial parts are detected and resized to 200 × 200. SVM is used as the classifier for image features. A laptop with an i7-9750H CPU, 16 GB RAM and a GTX1650 GPU is used. Evaluations are run on MATLAB R2017a.
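The two error rates and the random splitting protocol can be sketched as follows. The authors ran their evaluation in MATLAB; this Python sketch only illustrates the definitions. The label encoding (1 = fake, 0 = real), the use of per-class counts as denominators, and the function names are assumptions.

```python
import random

def error_rates(y_true, y_pred):
    """Return (FN rate, FP rate) with labels 1 = fake image, 0 = real face.

    FN rate: fraction of fake images classified as real faces.
    FP rate: fraction of real faces classified as fake images.
    """
    fake_preds = [p for t, p in zip(y_true, y_pred) if t == 1]
    real_preds = [p for t, p in zip(y_true, y_pred) if t == 0]
    fn_rate = sum(1 for p in fake_preds if p == 0) / len(fake_preds)
    fp_rate = sum(1 for p in real_preds if p == 1) / len(real_preds)
    return fn_rate, fp_rate

def random_split(n_samples, train_fraction, seed):
    """Randomly pick a training subset; the remaining indices form the test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * n_samples))
    return indices[:cut], indices[cut:]

# Usage: ten repetitions at a 50% training proportion on a toy set of 10 samples.
splits = [random_split(10, 0.5, seed) for seed in range(10)]
fn, fp = error_rates([1, 1, 1, 1, 0, 0], [1, 0, 1, 1, 0, 1])
```

In a full evaluation, a classifier would be trained on each training subset, `error_rates` would be computed on the matching test subset, and the ten results would be averaged.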

Parameter Sensitivities
There are two parameters to be tuned in the proposed method: the number of neighbors K and the dimensionality D of the output features. To demonstrate the effectiveness of hypergraph regularization, we compare it with existing graph learning methods, such as LDA, DLA, LPP, NPE, LSDA and ISOMAP [20]. For simplicity, only the FN rates under a 50% training proportion are shown in Figures 5 and 6. Although the proposed hypergraph regularization is not the best at every number of neighbors and every dimensionality, it still achieves the overall best performance. However, the conditions for the best performance differ between datasets. For NUAA, the best performance is achieved with K = 20 and D = 150. For MSSPOOF, the best performance is achieved with K = 25 and D = 200.

Comparison with Existing Methods
To demonstrate the performance on face spoof attack detection, the following methods, including the proposed HGC-CNN, are compared.
• Face anti-spoofing with domain generalization (FAS-DG) [13]. FAS-DG estimates the depth information to get depth loss and
• GS-LBP [7]. GS-LBP makes use of the edge-preserving property of the guided scale space. Besides, joint quantization is used to encode the spatial locality. SVM is used as the classifier.
• HGC-CNNs. The proposed multiple-feature method applies capsule networks and hypergraph regularization.
The results on NUAA are shown in Table 2, while the results on MSSPOOF are shown in Table 3. The best results in each row are highlighted. From them, we make the following observations.
1. Generally speaking, the performance on NUAA is better than that on MSSPOOF. Fake images in MSSPOOF are taken with good quality, which makes it more difficult to distinguish fake images from real faces.
2. FP rates are much lower than FN rates. However, FN rates are more important.
3. The proposed HGC-CNN achieves the best performance on FN rates. However, in some cases, SeetaFace6 achieves the best performance on FP rates.
4. On average, HGC-CNN achieves better performance than existing methods.
In Tables 2 and 3, the proposed method is not always the best on FP. We have looked into the samples on which the proposed method fails and found that blurred images and dark images may cause failures. Therefore, in the future, image preprocessing can be introduced to further improve the performance.

CONCLUSIONS
In this paper, a novel feature representation for facial images is proposed. It is used in face spoof attack detection with color images and is called HGC-CNNs. It is a multiple-feature learning framework combining the ideas of capsule neural networks and hypergraph regularization. Capsule neural networks are able to integrate different types of features, such as pixel intensities, LBP and image quality. Furthermore, hypergraph regularization can be used to learn the correlations among samples. Locality information further improves the descriptive power of the output features. The novel representation is compatible with existing classifiers, and SVM is chosen in the experiments. Experimental results on the NUAA Photograph Imposter Database and the Multispectral-Spoof face spoofing database show that the proposed method outperforms previous methods on face spoof attack detection with color images.

Finally, based on the novel representations, face spoof attacks are detected by classification. Comprehensive experiments are conducted to validate our method on two commonly used benchmark datasets. The experimental results indicate the improvement of the proposed method.

Figure 1 The flowchart of face spoof attack detection with hypergraph capsule convolutional neural networks. (1) Face detection is performed to obtain facial images. (2) The features of facial images are extracted. Different types of features can be used, such as color features, LBP and image quality. (3) Different features are enclosed in capsules. (4) Different types of features are integrated with hypergraph regularization. (5) Three FC layers are used to compute the hidden representations. They can be used in classification with different classifiers, such as SVM, softmax and so on.

Figure 3 Sample images in the NUAA dataset. The images in the left two columns are real and those in the other two columns are fake.

Figure 4 Sample images in the MSSPOOF dataset. The first row shows images taken in the visible spectrum (VIS), while the second row shows images taken in the near-infrared spectrum (NIR). The first column shows real accesses, the second column VIS attacks, and the third column NIR attacks.

Figure 5 Comparison under different numbers of neighbors. The left figure is the result on NUAA, while the right one is the result on MSSPOOF.

Figure 6 Comparison under different dimensionalities. The left figure is the result on NUAA, while the right one is the result on MSSPOOF.

Table 1 Notations in the hypergraph.
Notation | Description
DE | The diagonal matrix containing the edge degrees
H | In this matrix, H(v, e) = 1 if v ∈ e

1. Part optimization stage: We define one patch to be the vertices connected by one hyperedge. Thus, a patch in this