In this section, we conduct experiments on two public hyper-spectral facial image databases to evaluate the effectiveness of the proposed algorithm in comparison with the standard FCM, spatial FCM and FRFCM algorithms.
4.2. Skin Feature Segmentation of the PolyU Hyper-Spectral Face Database
The PolyU-HSFD, gathered by Di et al. [15] and acquired with the CRI’s VariSpec Liquid Crystal Tunable Filter (LCTF) and a halogen light system, consists of 25 data subjects of Asian descent (8 females and 17 males; 21–38 years old), captured in varying poses (frontal, right and left views of a subject) and over time periods that include changes in hairstyle and skin condition. Each data cube (see Figure 7 for one example) has a size of 220 × 180 × 33, that is, 220 × 180 pixels in each of 33 bands covering the spectral range of 400–720 nm with a step size of 10 nm.
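The band layout described above can be checked with a few lines of NumPy. This is only an illustrative sketch: the 1-based band numbering, the helper name `band_wavelength` and the (rows, columns, bands) axis order of the placeholder cube are assumptions, not details taken from the database documentation.

```python
import numpy as np

# Band centers implied by the text: 400-720 nm in 10 nm steps.
wavelengths_nm = np.arange(400, 721, 10)


def band_wavelength(band_number: int) -> int:
    """Return the center wavelength (nm) of a band, assuming 1-based numbering."""
    if not 1 <= band_number <= wavelengths_nm.size:
        raise ValueError("band number out of range")
    return int(wavelengths_nm[band_number - 1])


# Placeholder cube with the stated dimensions; the axis order is an assumption.
cube = np.zeros((220, 180, wavelengths_nm.size), dtype=np.float32)
```

Under these assumptions the range yields exactly 33 band centers, which agrees with the stated cube depth.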
Figure 8a,b present the basic clustering of facial images in the front and left views, respectively. In Figure 8a, some basic clusterings contain little effective feature information, which is consistent with the band selection results for low-SNR bands. We can also see that specific local features stand out on distinct bands, while the bands that show all of the local features are severely affected by noise (such as the last third basic clustering of Figure 8b). Hence, it is necessary to integrate discriminative basic clusterings to obtain a robust result.
Next, we evaluate whether the patch size affects the clustering ensemble’s results. Recall that we depict the clustering distribution as patches and represent each patch’s feature information by the mean spectral characteristic described in Section 3.2, because a patch is treated as a high-homogeneity neighborhood.
Figure 9 illustrates the clustering ensemble results for different patch sizes on the left view of a facial image from PolyU-HSFD. The separation between features is poor with a large patch size of 4 × 4 pixels (Figure 9c). Because the contour boundary of a local feature corresponds to an abrupt change between pixels, the boundary can be blurred if the neighborhood is relatively large, especially when two local features are close to each other, such as the eyes and eyebrows in Figure 9c. In contrast, the clustering ensemble with a patch size of 2 × 2 pixels is more sensitive to the contour boundaries of local features, as illustrated in Figure 9a. A patch size of 3 × 3 pixels (Figure 9b) completely confuses the background and features.
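To make the role of the patch size concrete, the following minimal sketch averages the spectrum over square patches of a data cube. It assumes non-overlapping patches and crops rows and columns that do not fill a complete patch; the exact patch construction of Section 3.2 may differ, and the function name `patch_mean_spectra` is hypothetical.

```python
import numpy as np


def patch_mean_spectra(cube: np.ndarray, patch: int) -> np.ndarray:
    """Average the spectrum over non-overlapping patch x patch blocks.

    cube has shape (height, width, bands). Rows and columns that do not
    fill a complete patch are cropped (an assumption of this sketch).
    """
    h, w, b = cube.shape
    h_crop, w_crop = (h // patch) * patch, (w // patch) * patch
    # Split both spatial axes into (block index, offset inside block).
    blocks = cube[:h_crop, :w_crop].reshape(
        h_crop // patch, patch, w_crop // patch, patch, b
    )
    # Average over the two within-block axes, keeping the band axis.
    return blocks.mean(axis=(1, 3))
```

With a 220 × 180 × 33 cube, a patch size of 2 gives a 110 × 90 grid of mean spectra and a patch size of 4 gives a 55 × 45 grid, so larger patches average over more pixels on either side of a feature boundary.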
The performance of the proposed algorithm with patch sizes of 2 × 2 and 4 × 4 pixels is presented in Table 1, in terms of precision, recall and F1-score. The recall is the proportion of the true positive samples (TP) among all actual positive samples, which comprise the true positive samples (TP) and the false negative samples (FN). It indicates the ability to identify positive samples. The recall can be expressed as:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
The higher the recall is, the stronger the recognition ability of the positive samples is.
In addition, the precision is the proportion of the true positive samples (TP) among all samples predicted as positive, which comprise the true positive samples (TP) and the false positive samples (FP). It indicates how reliably negative samples are kept out of the predicted positives. The precision can be expressed as:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
The higher the precision is, the fewer negative samples are mistaken for positive ones.
The F1-score combines the precision and the recall as their harmonic mean, and can be seen as their average effect. It reflects the robustness of the classification or segmentation model. The higher the F1-score is, the more robust the model is. The F1-score can be given according to:
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
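As an illustration, the three metrics can be computed for one local feature from a predicted binary mask and a ground-truth binary mask. This is a minimal sketch: the function name `segmentation_scores`, the mask representation and the convention of returning 0 when a denominator is empty are assumptions, not part of the evaluation protocol described here.

```python
import numpy as np


def segmentation_scores(predicted: np.ndarray, truth: np.ndarray) -> dict:
    """Precision, recall and F1-score of a binary mask against ground truth."""
    predicted = predicted.astype(bool)
    truth = truth.astype(bool)
    tp = int(np.sum(predicted & truth))    # feature pixels correctly found
    fp = int(np.sum(predicted & ~truth))   # non-feature pixels marked as feature
    fn = int(np.sum(~predicted & truth))   # feature pixels that were missed
    # Empty denominators are mapped to 0.0 (a convention chosen for this sketch).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, a prediction with two true positives, one false positive and one false negative gives a precision, recall and F1-score of 2/3 each.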
In this database, we select three closed non-skin local features for assessment, namely the brows, eyes and mouth. The 2 × 2 patch size achieves better performance overall, changing the F1-score by 0.67%, −0.02% and 0.29% for the three local features. In general, the non-skin area features are more prominent with a 2 × 2 patch size, because the contour boundary is more obvious. Therefore, we choose a patch size of 2 × 2 pixels for the proposed algorithm in all the following experiments.
Finally, the performance of the proposed algorithm on PolyU-HSFD is compared with that of the previously mentioned standard FCM, spatial FCM, FRFCM and basic clustering. In Figure 10a (front facial image), the standard FCM clustering loses numerous features, because the spectral reflectance of the face is not a reliable biometric: it changes slightly with the external environment [31]. Figure 10b illustrates spatial FCM clustering, which performs better than the standard FCM algorithm because it makes full use of the spatial and spectral information, as expressed by Equation (7). This is achieved by adding the spatial function to the membership function, as expressed by Equation (8). Figure 10c illustrates the performance of the proposed algorithm, in which almost all local features are clearly visible. Figure 10d shows that the fast and robust FCM (FRFCM) algorithm also gives a good result.
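The idea of adding a spatial function to the membership function can be sketched as follows. Equations (7) and (8) are not reproduced in this section, so the code assumes one commonly used spatial FCM formulation: the spatial function of a pixel is the sum of memberships in a square window around it, and the updated membership is proportional to u^p · h^q, normalized over clusters. The window size, the exponents `p` and `q` and the edge padding are illustrative choices, and the function name is hypothetical.

```python
import numpy as np


def spatial_membership(u: np.ndarray, window: int = 3,
                       p: float = 1.0, q: float = 1.0) -> np.ndarray:
    """Reweight fuzzy memberships with a neighborhood (spatial) function.

    u has shape (clusters, height, width) and sums to 1 over clusters at
    each pixel. window is an odd window width (assumed, not from the paper).
    """
    _, h, w = u.shape
    r = window // 2
    padded = np.pad(u, ((0, 0), (r, r), (r, r)), mode="edge")
    # Spatial function: sum of memberships over the window around each pixel.
    spatial = np.zeros_like(u, dtype=float)
    for dy in range(window):
        for dx in range(window):
            spatial += padded[:, dy:dy + h, dx:dx + w]
    weighted = (u ** p) * (spatial ** q)
    # Renormalize so memberships again sum to 1 over clusters.
    return weighted / weighted.sum(axis=0, keepdims=True)
```

In this formulation, an isolated pixel whose membership disagrees with its neighbors is pulled toward the neighborhood's dominant cluster, which is one way spatial information can suppress noise that a purely spectral FCM retains.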
Table 2 and Figure 11 summarize the results. Compared with spatial FCM, which makes full use of all spectral and spatial information, our method performs better, and the F1-score improves by 0.11%, 0.43% and 0.77% for the local features. Compared with the better basic clustering (the 23rd band) and with FRFCM, the F1-score changes by −0.04%, 0.13% and 0.61% and by −0.11%, 0.11% and 0.02%, respectively, for the local features. The non-skin area features are more prominent in our proposed method.
However, the feature characteristics of the right wing of the nose are not obvious. These features are highlighted in the low-SNR bands, which are not selected for the ensemble process. Hence, basic clustering has a significant impact on the ensemble results.
4.3. Skin Feature Segmentation of the UWA Hyper-Spectral Face Database
UWA-HSFD was acquired with the CRI’s VariSpec Liquid Crystal Tunable Filter (LCTF) integrated with a photon focus camera. It consists of 79 data subjects in the frontal view taken over 1–4 sessions (see Figure 12 for one example) [32,33]. Each data cube of a hyper-spectral facial image contains 33 bands covering the visible spectral range from 400 to 720 nm with a 10 nm step. The SNR in this database is relatively lower because Uzair et al. [31] used a novel algorithm that automatically adjusted the camera exposure time based on the filter’s transmittance, illumination intensity and CCD sensitivity for each frequency band. Most subjects had slight head movements and eye blinking during the image collection process; therefore, there were alignment errors between individual bands.
In our experiment, each subject in the database was cropped to a different size according to his/her position relative to the background. We selected data cubes with little inter-band misalignment in three sessions.
Figure 13 illustrates basic clustering with three sessions of the same subject. Once again, this shows that local features are highlighted on distinct bands. We selected three local features for assessment: beard, eyes and brows, which highlight the biological characteristics of the beard. According to the band selection results, the bands highlighting different features with less noise are selected as the input of the clustering ensemble.
In addition, we can observe that the basic clustering of session one performed poorly on the nose characteristics. In session two, several bands in the middle have some noise in the front of the images. This may be due to the illumination used during imaging.
The performance of the proposed algorithm in Figure 15a,c,e is compared with that of spatial FCM in Figure 15b,d,f and of the basic clustering. With spatial FCM, the local features nose and mouth are not recognized for session one (Figure 15b), and the recognition is neither complete nor clear for session two (Figure 15d). The most unsatisfactory result is that of session three, which contains little useful information (Figure 15f). In comparison, local features are clear and complete in the clustering ensemble.
Tables 3, 4 and 5 and Figure 16 illustrate the performance evaluations of three local features (brows, eyes and beard) for three sessions. Compared with the better basic clustering of the 12th band, the F1-score changed by 0.03%, −0.12% and 0.05% for local features in session one; 0.11%, 0.25% and 0% for local features in session two; and 0%, −0.05% and 0.45% for local features in session three. Compared with the spatial FCM, the F1-score improved by 0.28%, 0.31% and 0.80% for local features in session one; 0%, 0.06% and 0.82% for local features in session two; and 0.89%, 0.64% and 0.86% for local features in session three. Overall, the proposed algorithm is superior to the other two algorithms.
However, as illustrated in Figure 15a, the nose feature is unsatisfactory in the clustering ensemble of session one, which is caused by the basic clustering illustrated in Figure 13a. Noise interference is also visible in the front of the clustering ensemble in Figure 15c. In this case, we use the location information for feature extraction to avoid the noise interference in the front of the facial image. Therefore, basic clustering plays an essential role in the ensemble process.